Description
The R interface uses a "handle" object which, as far as I can tell, stores a pointer to a C++ object. The objects these handles point to do not survive serialization in R: a handle that has been serialized and de-serialized causes a segmentation fault when it is used again. This happens, for example, when an R session is restarted with a saved workspace.
Example:
    library(lightgbm)
    data(agaricus.train, package = "lightgbm")
    train <- agaricus.train
    dtrain <- lgb.Dataset(train$data, label = train$label)
    data(agaricus.test, package = "lightgbm")
    test <- agaricus.test
    dtest <- lgb.Dataset.create.valid(dtrain, test$data, label = test$label)
    params <- list(objective = "regression", metric = "l2")
    valids <- list(test = dtest)
    model <- lgb.train(
        params = params
        , data = dtrain
        , nrounds = 5L
        , valids = valids
        , min_data = 1L
        , learning_rate = 1.0
        , early_stopping_rounds = 3L
    )

After running it, restart the R session (assuming it is configured to save the environment between restarts) and run these two statements again:
    library(lightgbm)
    model <- lgb.train(
        params = params
        , data = dtrain
        , nrounds = 5L
        , valids = valids
        , min_data = 1L
        , learning_rate = 1.0
        , early_stopping_rounds = 3L
    )

At that point the call produces a segmentation fault, crashing the R process.
I suppose the correct solution would be to store the handle in R's own external pointer type (externalptr) and register the destructor as a finalizer on that same external pointer, so that R's garbage collector frees the C++ object. External pointers are also reset to nullptr upon de-serialization, so each function can check for that before dereferencing the handle and raise an R error instead of producing a segmentation fault.
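
To illustrate the null check proposed above, here is a minimal sketch in plain R. The helpers `is_null_handle()` and `check_handle()` are hypothetical names and not part of lightgbm. Base R cannot create a non-null external pointer without compiled code, so the example starts from an already-null pointer; the check would be the same for a real handle restored from a saved session. The helper also assumes the handle is a bare external pointer without extra attributes, because `identical()` compares attributes as well as the address.

```r
library(methods)

# Hypothetical helper (not part of lightgbm): TRUE when `handle` is an
# external pointer whose address is NULL. By default identical() compares
# external pointers by address, and new("externalptr") has a NULL address.
is_null_handle <- function(handle) {
  typeof(handle) == "externalptr" && identical(handle, new("externalptr"))
}

# Hypothetical guard that an R-level wrapper could call before handing
# the pointer to compiled code.
check_handle <- function(handle) {
  if (is_null_handle(handle)) {
    stop("handle is no longer valid; was this object restored from a saved session?")
  }
  invisible(handle)
}

# Round-trip an external pointer through serialization, as saving and
# restoring a workspace would do.
restored <- unserialize(serialize(new("externalptr"), NULL))

# The guard turns what would be a crash into an ordinary R error.
result <- tryCatch(
  check_handle(restored),
  error = function(e) conditionMessage(e)
)
print(is_null_handle(restored))
print(result)
```

On the C++ side, the matching pieces would presumably be creating the handle with `R_MakeExternalPtr`, registering cleanup with `R_RegisterCFinalizerEx`, and testing `R_ExternalPtrAddr(handle) == NULL` at the top of each entry point.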