Description
When using LGBMRegressor.fit() with eval_set and lgb.early_stopping() callback, model_to_string() intermittently returns an empty model string containing only pandas_categorical metadata (no tree structure). This causes model_from_string() at engine.py:350 to fail with:
```
LightGBMError: Model file doesn't specify the number of classes
```
The booster itself is valid (correct num_trees(), num_feature(), current_iteration()), but the C API LGBM_BoosterSaveModelToString produces an incomplete output.
Reproducibility
- Failure rate: ~5-10% per `fit()` call
- Affected versions: 4.3.0, 4.5.0, 4.6.0 (all tested)
- Platform: Linux (WSL2), Python 3.11
- Non-deterministic: same data/params sometimes succeeds, sometimes fails
Minimal Reproduction
```python
import lightgbm as lgb
import pandas as pd
import numpy as np

url = "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/diamonds.csv"
df = pd.read_csv(url)
y = df["price"].values
X = df.drop(columns=["price"])
for col in X.select_dtypes(include="object").columns:
    X[col] = X[col].astype("category").cat.codes

idx = np.random.RandomState(42).permutation(len(X))
X_train, y_train = X.iloc[idx[:40000]], y[idx[:40000]]
X_valid, y_valid = X.iloc[idx[40000:45000]], y[idx[40000:45000]]

# Repeatedly fit new models — ~5-10% will crash
for i in range(100):
    n_est = np.random.randint(600, 2500)
    obj = np.random.choice(["huber", "mae"])
    model = lgb.LGBMRegressor(
        n_estimators=n_est, objective=obj,
        learning_rate=0.01, max_depth=8, verbose=-1,
    )
    model.fit(
        X_train, y_train,
        eval_set=[(X_valid, y_valid)],
        callbacks=[lgb.early_stopping(150, verbose=False), lgb.log_evaluation(-1)],
    )
    # Crash happens inside fit() at engine.py:350
```
Condition Isolation
| Condition | Failure rate |
|---|---|
| eval_set + early_stopping callback | ~8% |
| eval_set + log_evaluation only | 0% |
| eval_set only (no callbacks) | 0% |
| No eval_set | 0% |
The early_stopping callback does not need to actually trigger (models run to num_boost_round). Its mere presence in the callback list is sufficient.
Debug Findings
By patching engine.train() with keep_training_booster=True and manually inspecting model_to_string():
- Empty model string: 167 bytes, containing only `pandas_categorical:[...]`
- Booster state is valid: `num_trees()=1597`, `num_feature()=9`, `current_iteration()=1597`
- The C API `LGBM_BoosterSaveModelToString` returns a truncated result despite the booster holding a valid model
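A quick way to classify a dump as truncated or complete, used during the debugging above. This is a sketch: the check relies on the per-tree `Tree=0` section marker that LightGBM's text model format emits, and the exact heuristic (string containment) is an assumption, not an official API.

```python
def model_string_is_complete(model_str: str) -> bool:
    """Heuristic: a full LightGBM text dump contains per-tree sections
    ("Tree=0", "Tree=1", ...); the truncated output seen in this bug
    carries only the pandas_categorical metadata line."""
    return "Tree=0" in model_str

# Truncated output observed in this issue: metadata only, no trees.
truncated = "pandas_categorical:[[0, 1, 2]]\n"
# Abbreviated shape of a healthy dump: header, tree sections, metadata.
healthy = "tree\nversion=v4\nnum_class=1\nTree=0\npandas_categorical:null\n"

print(model_string_is_complete(truncated))  # False
print(model_string_is_complete(healthy))    # True
```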
Expected Behavior
model_to_string() should always return the complete model string including tree structures when num_trees() > 0.
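Until the root cause is fixed, the non-determinism suggests a retry works around it. A minimal sketch, assuming the failure always surfaces as the "number of classes" error above; `fit_with_retry` is a hypothetical helper, not part of the LightGBM API, and takes any zero-argument callable that performs one fit:

```python
def fit_with_retry(fit_once, max_attempts=3):
    """Workaround sketch (not a fix): retry a fit that intermittently
    raises "Model file doesn't specify the number of classes".
    fit_once: zero-argument callable performing one fit() call."""
    last_err = None
    for _ in range(max_attempts):
        try:
            return fit_once()
        except Exception as e:  # in practice: lightgbm.basic.LightGBMError
            if "number of classes" not in str(e):
                raise  # unrelated error, do not swallow it
            last_err = e
    raise last_err
```

Usage would be `fit_with_retry(lambda: model.fit(X_train, y_train, eval_set=..., callbacks=...))`; with an ~8% failure rate, three attempts make a residual failure very unlikely.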
Environment
LightGBM: 4.3.0 / 4.5.0 / 4.6.0 (all reproduce)
OS: Linux 6.6.87 (WSL2)
Python: 3.11.14