
[python-package] model_to_string() returns empty string intermittently when early_stopping callback is used #7186

@nbx-liz

Description


When using LGBMRegressor.fit() with eval_set and lgb.early_stopping() callback, model_to_string() intermittently returns an empty model string containing only pandas_categorical metadata (no tree structure). This causes model_from_string() at engine.py:350 to fail with:

LightGBMError: Model file doesn't specify the number of classes

The booster itself is valid (correct num_trees(), num_feature(), current_iteration()), but the C API LGBM_BoosterSaveModelToString produces an incomplete output.
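The truncated output can be detected before it ever reaches model_from_string() by checking the saved string for tree structure. The sketch below is illustrative (the helper name and the "Tree=0" marker check are mine, not part of the LightGBM API): a complete LightGBM model string contains Tree=N sections followed by "end of trees", while the failing output observed here holds only the pandas_categorical metadata line.

```python
# Illustrative helper (not a LightGBM API): a complete model string
# contains "Tree=<n>" sections; the truncated output in this report
# holds only the pandas_categorical metadata line.
def looks_truncated(model_str: str) -> bool:
    return "Tree=0" not in model_str

truncated = "pandas_categorical:[]\n"                       # shape of the ~167-byte failing output
complete = "tree\nversion=v4\nTree=0\nend of trees\n"       # shape of a healthy string (abridged)
print(looks_truncated(truncated), looks_truncated(complete))  # True False
```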

Reproducibility

  • Failure rate: ~5-10% per fit() call
  • Affected versions: 4.3.0, 4.5.0, 4.6.0 (all tested)
  • Platform: Linux (WSL2), Python 3.11
  • Non-deterministic: same data/params sometimes succeeds, sometimes fails

Minimal Reproduction

import lightgbm as lgb
import pandas as pd
import numpy as np

url = "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/diamonds.csv"
df = pd.read_csv(url)
y = df["price"].values
X = df.drop(columns=["price"])
for col in X.select_dtypes(include="object").columns:
    X[col] = X[col].astype("category").cat.codes

idx = np.random.RandomState(42).permutation(len(X))
X_train, y_train = X.iloc[idx[:40000]], y[idx[:40000]]
X_valid, y_valid = X.iloc[idx[40000:45000]], y[idx[40000:45000]]

# Repeatedly fit new models — ~5-10% will crash
for i in range(100):
    n_est = np.random.randint(600, 2500)
    obj = np.random.choice(["huber", "mae"])
    model = lgb.LGBMRegressor(
        n_estimators=n_est, objective=obj,
        learning_rate=0.01, max_depth=8, verbose=-1,
    )
    model.fit(
        X_train, y_train,
        eval_set=[(X_valid, y_valid)],
        callbacks=[lgb.early_stopping(150, verbose=False), lgb.log_evaluation(-1)],
    )
    # Crash happens inside fit() at engine.py:350

Condition Isolation

| Condition                          | Failure rate |
| ---------------------------------- | ------------ |
| eval_set + early_stopping callback | ~8%          |
| eval_set + log_evaluation only     | 0%           |
| eval_set only (no callbacks)       | 0%           |
| No eval_set                        | 0%           |

The early_stopping callback does not need to actually trigger (models run to num_boost_round). Its mere presence in the callback list is sufficient.

Debug Findings

By patching the call into engine.train() to pass keep_training_booster=True and manually inspecting the model_to_string() output:

  • Empty model string: 167 bytes, containing only pandas_categorical:[...]
  • Booster state is valid: num_trees()=1597, num_feature()=9, current_iteration()=1597
  • The C API LGBM_BoosterSaveModelToString returns a truncated result despite the booster holding a valid model
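Since the booster state is valid while only the serialization is flaky, a stopgap while debugging is to re-invoke the save and reject truncated results. The sketch below is illustrative only (the retry helper and the stand-in class are mine, and the report does not establish whether a retry actually succeeds against the real bug); it uses a fake booster to demonstrate the retry path without lightgbm installed.

```python
def model_to_string_with_retry(booster, max_tries: int = 3) -> str:
    # Illustrative workaround, not a fix for the underlying C API issue:
    # re-invoke model_to_string() until the result contains tree structure.
    last = ""
    for _ in range(max_tries):
        last = booster.model_to_string()
        if "Tree=0" in last:
            return last
    raise RuntimeError(
        f"model_to_string() returned a truncated string {max_tries} times "
        f"(last length: {len(last)} bytes)"
    )

# Stand-in mimicking the observed behavior (first call truncated, second
# call complete) -- purely for demonstrating the retry logic.
class _FlakyBooster:
    def __init__(self):
        self._calls = iter(["pandas_categorical:[]\n",
                            "tree\nTree=0\nend of trees\n"])
    def model_to_string(self):
        return next(self._calls)

s = model_to_string_with_retry(_FlakyBooster())
print("Tree=0" in s)  # True
```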

Expected Behavior

model_to_string() should always return the complete model string including tree structures when num_trees() > 0.

Environment

LightGBM: 4.3.0 / 4.5.0 / 4.6.0 (all reproduce)
OS: Linux 6.6.87 (WSL2)
Python: 3.11.14
