Summary
All callback functions in lightgbm.callback should be serializable with pickle, cloudpickle, and joblib.
Motivation
As described in #5012, in some interfaces to LightGBM, it's necessary to broadcast LightGBM parameters or objects from the lightgbm Python library to multiple processes / machines. The primary mechanism for this in Python is to serialize objects with a library like pickle, cloudpickle, or joblib.
#5012 made lgb.callback.early_stopping serializable for the benefit of lightgbm-ray. I expect this would also be necessary to use callbacks in lightgbm.dask, and in any other setting where users want to pickle and unpickle LightGBM objects.
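Part of why this matters is how the standard pickle module handles functions. It stores a function as a reference to its importable, qualified name. A module-level function therefore round-trips. A function defined inside another function (a closure, which is a common way to write callback factories) does not. cloudpickle, by contrast, can serialize such functions by value. The sketch below uses only the standard library. `is_picklable`, `module_level_callback`, and `make_closure_callback` are hypothetical names for illustration and are not part of lightgbm.

```python
import pickle


def is_picklable(obj) -> bool:
    """Return True if `obj` survives a round trip through the standard pickle module."""
    try:
        pickle.loads(pickle.dumps(obj))
    except (pickle.PicklingError, AttributeError, TypeError):
        # Depending on the Python version, an unpicklable local function raises
        # PicklingError or AttributeError; other objects may raise TypeError.
        return False
    return True


def module_level_callback(env) -> None:
    # Defined at module level, so pickle can store it as a reference by name.
    return None


def make_closure_callback(period: int):
    # A factory that returns a nested function, which pickle treats as a "local object".
    def _callback(env) -> int:
        return period

    return _callback


print(is_picklable(module_level_callback))     # True
print(is_picklable(make_closure_callback(5)))  # False
```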
Description
Functions in lightgbm.callback:
- `early_stopping()` ([python] make `early_stopping` callback pickleable #5012)
- `log_evaluation()` ([python] make `log_evaluation` callback pickleable #5101)
- `record_evaluation()` ([python] make `record_evaluation` callback pickleable #5107)
- `reset_parameter()` ([python] make `reset_parameter` callback pickleable #5109)
Tests similar to the following should be added for each of these functions. They ensure that the callbacks are pickleable and that they stay pickleable after future changes to this project.
From LightGBM/tests/python_package_test/test_callback.py (lines 9 to 22 in f77e0ad):

```python
# Imports assumed to come from the top of test_callback.py (not part of lines 9 to 22).
import pytest

import lightgbm as lgb

from .utils import pickle_obj, unpickle_obj


@pytest.mark.parametrize('serializer', ["pickle", "joblib", "cloudpickle"])
def test_early_stopping_callback_is_picklable(serializer, tmp_path):
    callback = lgb.early_stopping(stopping_rounds=5)
    tmp_file = tmp_path / "early_stopping.pkl"
    pickle_obj(
        obj=callback,
        filepath=tmp_file,
        serializer=serializer
    )
    callback_from_disk = unpickle_obj(
        filepath=tmp_file,
        serializer=serializer
    )
    # The restored callback should keep its configuration.
    assert callback.stopping_rounds == callback_from_disk.stopping_rounds
```
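The test relies on two helpers, pickle_obj() and unpickle_obj(), whose definitions are not shown here. The following is a sketch of what such helpers could look like. It assumes they simply dispatch on the serializer name. The bodies are an assumption for illustration and are not copied from LightGBM's test utilities. joblib and cloudpickle are imported only when requested, so the "pickle" path works with the standard library alone.

```python
import pickle
import tempfile
from pathlib import Path


def pickle_obj(obj, filepath, serializer):
    """Hypothetical helper: write `obj` to `filepath` using the named serializer."""
    if serializer == "pickle":
        with open(filepath, "wb") as f:
            pickle.dump(obj, f)
    elif serializer == "joblib":
        import joblib  # third-party; imported only when requested

        joblib.dump(obj, filepath)
    elif serializer == "cloudpickle":
        import cloudpickle  # third-party; imported only when requested

        with open(filepath, "wb") as f:
            cloudpickle.dump(obj, f)
    else:
        raise ValueError(f"Unrecognized serializer: {serializer}")


def unpickle_obj(filepath, serializer):
    """Hypothetical helper: read back an object written by pickle_obj()."""
    if serializer == "pickle":
        with open(filepath, "rb") as f:
            return pickle.load(f)
    if serializer == "joblib":
        import joblib

        return joblib.load(filepath)
    if serializer == "cloudpickle":
        import cloudpickle

        with open(filepath, "rb") as f:
            return cloudpickle.load(f)
    raise ValueError(f"Unrecognized serializer: {serializer}")


# Round-trip a plain object through a temporary file with the standard library.
with tempfile.TemporaryDirectory() as tmp_dir:
    tmp_file = Path(tmp_dir) / "obj.pkl"
    pickle_obj(obj={"stopping_rounds": 5}, filepath=tmp_file, serializer="pickle")
    restored = unpickle_obj(filepath=tmp_file, serializer="pickle")

print(restored)  # {'stopping_rounds': 5}
```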
It's possible that adding such tests will be enough and that the remaining functions are already pickleable. If not, the functions will need to be changed to support this.
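If a callback turns out not to be pickleable, one common fix is to change what the factory function returns. Instead of a closure, it returns an instance of a small class that implements `__call__` and keeps its configuration in instance attributes. pickle then stores a reference to the class plus the instance's attributes. Below is a minimal sketch under that assumption. `_PeriodicCallback` and `periodic_callback` are hypothetical names, and the integer `iteration` argument stands in for whatever object the training loop actually passes to callbacks. This is not lightgbm's implementation.

```python
import pickle


class _PeriodicCallback:
    """Hypothetical callable-class callback: configuration and state live in instance attributes."""

    def __init__(self, period: int = 1) -> None:
        self.period = period
        self.logged_iterations = []

    def __call__(self, iteration: int) -> None:
        # Record every `period`-th iteration; `iteration` stands in for the real callback argument.
        if iteration % self.period == 0:
            self.logged_iterations.append(iteration)


def periodic_callback(period: int = 1) -> _PeriodicCallback:
    # Public factory keeps a function-style API but returns a picklable object.
    return _PeriodicCallback(period=period)


callback = periodic_callback(period=10)
for iteration in range(25):
    callback(iteration)

callback_from_bytes = pickle.loads(pickle.dumps(callback))
print(callback_from_bytes.period)             # 10
print(callback_from_bytes.logged_iterations)  # [0, 10, 20]
```

Keeping the public factory function means callers' code doesn't change. It also gives tests like the one above a concrete attribute (here `period`, in the existing test `stopping_rounds`) to compare after unpickling.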