[python-package] Use scikit-learn interpretation of negative `n_jobs` and change default to number of cores (#5105)
Conversation
@david-cortes Thanks a lot for this PR! I support changing the default value of `n_jobs`. I think we can at least check the transformed value of `n_jobs`.
@StrikerRUS What would you think of changing the default to the number of physical cores instead?
TBH, I'm not sure. I see some inconsistency: scikit-learn in their glossary talks about CPUs (e.g. "If set to -1, all CPUs are used"), but under the hood joblib detects the number of threads (according to its default …).
I'll also point out that the linter is complaining about the order of the imports. According to the docs here, the Python code should be PEP8-compliant save for a few exceptions; what this linter complains about is outside the scope of PEP8 and not mentioned in the docs. I'm also not sure what the right order of imports would be according to the linter (the logs don't mention which linter is complaining about it).
I mean, what would you think about setting the default value to the output returned from …?
That refers to what is defined here.
So the …
According to the scikit-learn docs and our own recommendations, we should set … I'm for using …
Updated.
Well, it seems joblib's CPU count function wasn't very robust after all.
Looks like the failing tests are because the Dask interface has different default arguments from the scikit-learn interface. I'm not familiar with Dask, so I'll ask: should the Dask interface also default to the same …?
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
Now that Linux jobs at Azure Pipelines always fail but Windows jobs fail irregularly (32b73a2: all successful; a31fced: Windows regular; 21cd3b2: Windows sdist), I'm almost sure that the reason is randomness caused by different numbers of threads.
|
Well, the issue of non-deterministic results of the linear tree model given different numbers of threads isn't related to this PR anyway. @david-cortes Let's unblock this PR by specifying LightGBM's default number of threads (0) here, because in the config file that param is commented out: `gbm = lgb.LGBMClassifier(**fd.params, n_jobs=0)`
Updated, but I think you should open an issue so as not to forget about those differences. There might be some big bug underneath.
Thanks a lot! Sure, will open a separate issue for investigating this.
@jmoralez @jameslamb I'd really appreciate your review of this PR.
StrikerRUS left a comment
Thank you so much for the hard work on this PR!
LGTM except for one minor codebase-simplification suggestion and using a temp folder in tests.
```python
n_jobs = self.n_jobs
for alias in _ConfigAliases.get("num_threads"):
    if alias in predict_params:
        n_jobs = predict_params.pop(alias)
predict_params["num_threads"] = self._process_n_jobs(n_jobs)
```
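For context, this loop folds any `num_threads` alias found in `predict_params` into a single canonical value. A standalone toy version of that resolution step (the alias tuple and function name are my own illustration, not LightGBM's actual `_ConfigAliases` registry):

```python
# Toy alias-resolution sketch; the alias list is an assumption,
# not LightGBM's real alias registry.
NUM_THREADS_ALIASES = ("num_threads", "num_thread", "nthread", "nthreads", "n_jobs")

def resolve_num_threads(predict_params: dict, constructor_n_jobs):
    """Pop every known alias from predict_params (keeping the last one found)
    and store the winner under the canonical "num_threads" key."""
    n_jobs = constructor_n_jobs
    for alias in NUM_THREADS_ALIASES:
        if alias in predict_params:
            n_jobs = predict_params.pop(alias)
    predict_params["num_threads"] = n_jobs
    return predict_params
```

So a caller who passes `nthread=4` at predict time overrides the constructor's `n_jobs`, and the estimator only ever forwards `num_threads` to the booster.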
This can be simplified with `_choose_param_value()`:
```diff
-n_jobs = self.n_jobs
-for alias in _ConfigAliases.get("num_threads"):
-    if alias in predict_params:
-        n_jobs = predict_params.pop(alias)
-predict_params["num_threads"] = self._process_n_jobs(n_jobs)
+predict_params = _choose_param_value("num_threads", predict_params, self.n_jobs)
+predict_params["num_threads"] = self._process_n_jobs(predict_params["num_threads"])
```
Problem is, `_choose_param_value` treats `None` as if it were not passed, whereas after this PR, `None` has a special meaning. For example, one can pass `n_jobs=2` in the constructor, then `n_jobs=None` in the call to `predict`, and if that function were used, it would not have the desired effect.
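To illustrate the pitfall, here is a toy stand-in for the helper (a sketch of the behavior described here, not LightGBM's real implementation): because a `None` value is treated as "not passed", the fallback value wins and the special meaning of an explicit `None` is lost.

```python
def choose_param_value(main_name: str, params: dict, default_value):
    """Toy sketch: prefer a non-None value already present in params,
    otherwise fall back to default_value. A key whose value is None is
    treated as absent, which is exactly the problem discussed above."""
    if params.get(main_name) is not None:
        return params
    params = dict(params)
    params[main_name] = default_value
    return params

# Constructor had n_jobs=2; predict() was called with n_jobs=None.
resolved = choose_param_value("num_threads", {"num_threads": None}, 2)
# The explicit None is silently replaced by 2, so None's new
# "use the number of cores" meaning would never take effect.
```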
Ah, I see now! Thanks for the explanation!
@jameslamb Maybe we can refactor the `_choose_param_value()` function so that it handles a `None` value of a param correctly?
Ok yeah, I think I understand. Will put up a PR shortly.
Created: #5266
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
Kindly ping @jameslamb and @jmoralez for #5105 (comment).
Changes look good to me! I have no additional comments, and it's ok with me if we merge this and then later come back and apply the suggestion from https://github.com/microsoft/LightGBM/pull/5105/files#r889631120 after #5289 is merged.
This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.
The scikit-learn interface for lightgbm has a parameter `n_jobs` with a default value of `-1`, which is interpreted as using the default number of OMP threads. According to the scikit-learn glossary, a negative `n_jobs` has a different meaning (it uses joblib's formula), so `-1` means to use all available threads: https://scikit-learn.org/stable/glossary.html#term-n_jobs

This PR changes the meaning of negative `n_jobs` to match that of scikit-learn, which I think is how most people would expect scikit-learn-compatible libraries to behave.

I'm not sure what kind of requirements there are for the tests that are run on Python. This change could be tested, but it would imply fitting models using multiple threads per function call; I'm not sure if there's any issue with that, like in the R interface.
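The joblib convention referred to above can be sketched in a few lines (a minimal illustration of the formula, not LightGBM's actual code; the function name is my own):

```python
import os

def effective_threads(n_jobs: int) -> int:
    """Map a scikit-learn-style n_jobs to a thread count using the
    joblib convention: -1 means all CPUs, -2 means all but one,
    and so on, clamped to at least 1."""
    if n_jobs < 0:
        return max(os.cpu_count() + 1 + n_jobs, 1)
    return n_jobs
```

So on an 8-core machine, `effective_threads(-1)` yields 8 and `effective_threads(-2)` yields 7, matching the glossary's "n_cpus + 1 + n_jobs" rule.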