
[R-package] Improvements, readability, and bug fixes #378

Merged
guolinke merged 2 commits into microsoft:master from Laurae2:patch-13 on Apr 3, 2017

Conversation

@Laurae2 (Contributor) commented on Apr 2, 2017

This PR contains large-sized changes for the R package. There are also known issues which will not be fixed in this PR, as they already exist on the master branch.

R improvements and bug fixes:

  • All changes are intended to be free of code regressions (code was only added on top of, or extended from, what already exists).
  • Allow the use of the regression_l1, regression_l2, huber, fair, and poisson objectives, which were previously unusable due to hard-coded rules.
  • Auto-define default evaluation metrics for regression_l1 (MAE), regression_l2 (MSE), huber (MAE), fair (MAE), and poisson (Poisson loss):

(screenshot of the auto-defined default metrics omitted)

  • Changed the way zero-valued vectors are pre-allocated (slightly faster).
  • Users who load the xgboost library after lightgbm can now use the lightgbm examples again (fixes a global-environment name clash between getinfo, setinfo, and slice).
  • Added lgb.unloader, which wipes the LightGBM environment so R does not need to be restarted when an object gets stuck in memory for no apparent reason (for instance, when repeatedly training different models using the same variable names).
  • lgb.unloader can fully wipe LightGBM objects (lgb.Booster, lgb.Dataset) from the specified environment, and does not raise an error when lightgbm has already been detached from the R session.
  • Added a new example for lgb.unloader.
  • Removed the "free booster handle" message on Predictor types.
  • Fixed the dontrun tag in the examples of the dim.lgb.Dataset and dimnames.lgb.Dataset functions.
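The idea behind lgb.unloader, as described above, is to remove lgb.Booster / lgb.Dataset objects from a given environment. Below is a minimal base-R sketch of that idea; it is an illustration only, not the actual lgb.unloader implementation, and wipe_lgb_objects is a hypothetical name:

```r
# Hypothetical sketch: remove objects of class lgb.Booster / lgb.Dataset
# from an environment, leaving everything else untouched.
wipe_lgb_objects <- function(envir = globalenv()) {
  objs <- ls(envir = envir)
  is_lgb <- vapply(
    objs,
    function(nm) inherits(get(nm, envir = envir), c("lgb.Booster", "lgb.Dataset")),
    logical(1)
  )
  rm(list = objs[is_lgb], envir = envir)
  invisible(objs[is_lgb])
}

# Usage sketch: a scratch environment with a fake "stuck" booster object
e <- new.env()
assign("booster", structure(list(), class = "lgb.Booster"), envir = e)
assign("keep_me", 42, envir = e)
wipe_lgb_objects(e)
ls(envir = e)  # only "keep_me" remains
```

The real lgb.unloader additionally handles detaching the lightgbm package itself, which this sketch does not attempt.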

R readability/stylistic changes:

  • Commented the whole R code base from scratch (everything is now commented, making editing easier for newcomers).
  • Changed all single quotes to double quotes when defining character strings (consistency fix).
  • Fixed all spacing issues (the code is much easier to edit, as we no longer have to mash the backspace key to align code correctly).
  • Improved code readability by splitting functions into "paragraph" chunks (groups of related actions), to ease editing and make critical chunks easier to spot.
  • Made the demonstration code easier to read.

R issues to solve in future PRs, not now:

  • Reverse the following, or use Travis as an alternative: examples are currently not run during devtools::check on any function (we might reverse this later, but it would require a Travis setup).
  • [Windows only (?)] Do not lock the DLL in R when building the library; the lock prevents using devtools::check() to inspect the whole code base, which is essential for checking code correctness without running everything by hand. Possibly a regression from PR #340 (Support build self-contained R package), which addressed issue #339 (Relative paths in R-package prevent source package build); this is not certain, as it is a strange lock bug. The package will not pass CRAN checks in its current state, since the lock prevents the CRAN tests from running (the library needs to be unloaded properly). As there is no CRAN release yet, this is not a major issue currently.
  • Get rid of library(lightgbm) in examples (not allowed outside \dontrun{}).
  • Stop relying on the global environment for some allocations (<<-); this is not easy to fix at all.
  • Provide users with demo code to convert the list of evaluation metrics into a matrix or data.table (data.table::rbindlist).
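As a sketch of the rbindlist demo proposed in the last bullet: the mock `record` below only mirrors the general shape of a per-iteration evaluation log (the nesting and names are assumptions for illustration, not guaranteed by the package), with the two values taken from the training log shown later in this PR.

```r
# Mock evaluation log: one metric ("poisson") on one validation set ("test"),
# with one value per boosting iteration.
record <- list(
  test = list(
    poisson = list(eval = list(0.523205, 0.482608))
  )
)

# Convert the nested list into one row per iteration, then bind into a data.table.
rows <- lapply(seq_along(record$test$poisson$eval), function(i) {
  list(iter = i, poisson = record$test$poisson$eval[[i]])
})
dt <- data.table::rbindlist(rows)
dt
```

The same pattern extends to several metrics or validation sets by looping over the outer list levels as well.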

Tests performed:

  • Run all R function examples
  • Run extra examples, see below for Poisson loss example

Example for testing a new loss (e.g., Poisson loss):

# Load the example data shipped with lightgbm
library(lightgbm)
data(agaricus.train, package = "lightgbm")
train <- agaricus.train
dtrain <- lgb.Dataset(train$data, label = train$label)
data(agaricus.test, package = "lightgbm")
test <- agaricus.test
dtest <- lgb.Dataset.create.valid(dtrain, test$data, label = test$label)

# Do not specify a metric, only the objective (e.g. `regression_l1`,
# `regression_l2`, `huber`, `fair`, `poisson`): the default metric is auto-defined
params <- list(objective = "poisson")
valids <- list(test = dtest)
model <- lgb.train(params,
                   dtrain,
                   100,
                   valids,
                   min_data = 1,
                   learning_rate = 1,
                   early_stopping_rounds = 10)

Beginning of the training log (it does not converge, which is expected: this is obviously not the right objective for this dataset, and the many zero labels make Poisson regression nearly impossible here):

[LightGBM] [Info] Total Bins 232
[LightGBM] [Info] Number of data: 6513, number of used features: 116
[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=14 and max_depth=6
[1]:	test's poisson:0.523205 
[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=24 and max_depth=8
[2]:	test's poisson:0.482608 
[LightGBM] [Info] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Trained a tree with leaves=24 and max_depth=7


msftclas commented Apr 2, 2017

@Laurae2,
Thanks for having already signed the Contribution License Agreement. Your agreement was validated by Microsoft. We will now review your pull request.
Thanks,
Microsoft Pull Request Bot

@guolinke guolinke merged commit b6c973a into microsoft:master Apr 3, 2017
@lock lock bot locked as resolved and limited conversation to collaborators Mar 12, 2020