Skip to content

fix: Adjust LGBM_DatasetCreateFromSampledColumn to handle distributed data#5344

Merged
shiyu1994 merged 5 commits intomicrosoft:masterfrom
svotaw:sampleddata
Jul 21, 2022
Merged

fix: Adjust LGBM_DatasetCreateFromSampledColumn to handle distributed data#5344
shiyu1994 merged 5 commits intomicrosoft:masterfrom
svotaw:sampleddata

Conversation

@svotaw
Copy link
Contributor

@svotaw svotaw commented Jul 1, 2022

This PR adds a new API related to LGBM_DatasetCreateFromSampledColumn that splits the current single parameter for "num_total_data" into 2, one for total local data and one for total distributed data. The new API is called LGBM_DatasetDistCreateFromSampledColumn, but that name can be changed. I kept the old one for back-compat.

See issue here: #5343

@svotaw svotaw requested review from guolinke and shiyu1994 as code owners July 1, 2022 19:19
@StrikerRUS
Copy link
Collaborator

I believe we can take the opportunity to make breaking changes in the next major release and split the current num_total_row argument into two new ones in the current LGBM_DatasetCreateFromSampledColumn() function. For the local training these new arguments will be equal to each other. I think this is better for user experience than two similar separate functions.

@svotaw
Copy link
Contributor Author

svotaw commented Jul 3, 2022

I believe we can take the opportunity to make breaking changes in the next major release and split the current num_total_row argument into two new ones in the current LGBM_DatasetCreateFromSampledColumn() function. For the local training these new arguments will be equal to each other. I think this is better for user experience than two similar separate functions.

Sounds fine to me. Should I go ahead and make those changes? Also, does that mean this will get checked in when approved, or will it need to wait for some other condition related to incrementing the major version?

@StrikerRUS
Copy link
Collaborator

Next release will be v4.0.0 - that's a fact, and I believe we should include this PR in it (#5153).

Should I go ahead and make those changes?

I think it's better to wait for some other opinion on this (@guolinke @shiyu1994 ).

@StrikerRUS StrikerRUS mentioned this pull request Jul 3, 2022
60 tasks
@guolinke
Copy link
Collaborator

guolinke commented Jul 4, 2022

I agree with @StrikerRUS , please go ahead

@mhamilton723
Copy link
Contributor

Hey @StrikerRUS, @jameslamb, @shiyu1994 just wanted to gently ping on this, anything else Scott should tackle before approval. Thanks so much for all of your help, time, and thoughtful review comments!

Copy link
Collaborator

@shiyu1994 shiyu1994 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@svotaw Thanks for your great contribution. The changes LGTM!

@shiyu1994 shiyu1994 merged commit f94050a into microsoft:master Jul 21, 2022
@svotaw svotaw deleted the sampleddata branch July 21, 2022 22:01
@jameslamb jameslamb mentioned this pull request Oct 7, 2022
40 tasks
@github-actions
Copy link
Contributor

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 19, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants