Feature/better defaults improves performance through better torch dataloader usage #83
Conversation
Commits:
- pack `resolve_data` and `batch_to_device`
- force y to be a vector
- add validation_data test with `num_workers`
- clean commented lines
- fix pretrain dataset has no `y`
- lighten the num_workers test
- better if else syntax
- add `resolve_data` tests
- secure empty values for cat_idx and cat_dims (sketched below)
- add tests for `resolve_data`
- …nto feature/better_defaults
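The "secure empty values" commit suggests normalizing missing categorical metadata before it reaches the dataset code. A hedged sketch of the idea (the helper name and defaults are hypothetical, not the package's actual code):

    # Hypothetical helper: coerce NULL categorical metadata to zero-length
    # integer vectors, so length() and seq_along() work without special cases.
    secure_cat_values <- function(cat_idx = NULL, cat_dims = NULL) {
      list(
        cat_idx  = if (is.null(cat_idx)) integer(0) else as.integer(cat_idx),
        cat_dims = if (is.null(cat_dims)) integer(0) else as.integer(cat_dims)
      )
    }

    secure_cat_values()                                        # both empty
    secure_cat_values(cat_idx = c(1, 3), cat_dims = c(4, 7))   # passthrough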
Codecov Report

|          | main | #83   | +/- |
|----------|------|-------|-----|
| Coverage | ?    | 0.00% |     |
| Files    | ?    | 10    |     |
| Lines    | ?    | 1054  |     |
| Branches | ?    | 0     |     |
| Hits     | ?    | 0     |     |
| Misses   | ?    | 1054  |     |
| Partials | ?    | 0     |     |

Continue to review the full report at Codecov.
- …e two twin models
- reduce early_stopping_tolerance in test as it was not adapted to the data (see the sketch below)
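A hedged illustration of that early-stopping commit (assuming `tabnet_fit()` forwards the early-stopping options to `tabnet_config()`; the data and values here are made up, not the test's actual numbers):

    library(tabnet)

    x <- data.frame(matrix(rnorm(1000), ncol = 10))
    y <- rnorm(100)

    # If early_stopping_tolerance is large relative to the loss scale of the
    # data, every epoch counts as "no improvement" and training stops at once;
    # reducing it lets early stopping behave as intended.
    fit <- tabnet_fit(
      x, y,
      epochs = 100,
      valid_split = 0.2,
      early_stopping_patience = 5,
      early_stopping_tolerance = 1e-4
    )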
dfalbel left a comment:
Looks great to me.
Unfortunately, the dataset/dataloader interface is adding a lot of overhead here. What would actually make it really fast is to get rid of it: datasets and dataloaders only make sense when `.getitem` is slow enough to compensate for their overhead, like reading an image from disk.
See also mlverse/torch#776
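To make the overhead point concrete, a minimal sketch (not code from this PR) contrasting the two routes over the same in-memory tensors:

    library(torch)

    x <- torch_randn(10000, 20)
    y <- torch_randn(10000)

    # Route 1: dataset/dataloader. Each batch pays .getitem dispatch and
    # collation costs, which dominate when items are cheap in-memory rows.
    ds <- tensor_dataset(x = x, y = y)
    dl <- dataloader(ds, batch_size = 256, shuffle = TRUE)
    coro::loop(for (b in dl) {
      # b$x and b$y are the collated batch tensors
    })

    # Route 2: plain tensor slicing, no dataloader machinery at all.
    n <- x$size(1)
    ord <- sample(n)  # 1-based R indices
    for (start in seq(1, n, by = 256)) {
      i <- ord[start:min(start + 255, n)]
      xb <- x[i, ]
      yb <- y[i]
    }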
    test-coverage:
      runs-on: ubuntu-latest
      env:
        GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
I think we need `TORCH_TEST=1` and `TORCH_INSTALL=1` so coverage is correct.
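For concreteness, the env block would then look something like this (a sketch; `TORCH_TEST=1` enables the torch-dependent tests and `TORCH_INSTALL=1` lets torch install its backend non-interactively on CI):

    test-coverage:
      runs-on: ubuntu-latest
      env:
        GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
        TORCH_TEST: 1
        TORCH_INSTALL: 1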
Thanks a lot for the push! (I must admit I'm lost in that domain.)
Sure! (I was a bit afraid of the image size.)

Co-authored-by: Daniel Falbel <[email protected]>
- align with Luz as a reference
This is an improvement step in usability and performance:

- better `batch_size` by default, for a 70% performance gain
- pack `resolve_data` into dataloader and dataset preprocessing, and thus fix "`tabnet_explain()` do not provides correct result" #82 (as per test result)

Conclusion: as expected, 4 workers is not enough to accelerate preprocessing. (Note: testing `num_workers` with GPU is limited here, because my GPU cannot afford more than two R processes in memory; see the sketch below.)
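As a torch-level illustration of that conclusion (a sketch, not the benchmark actually run here): with tensors already in RAM, worker processes add startup and inter-process transfer costs that a small `num_workers` cannot amortize.

    library(torch)

    ds <- tensor_dataset(x = torch_randn(50000, 20), y = torch_randn(50000))

    time_one_pass <- function(workers) {
      dl <- dataloader(ds, batch_size = 1024, num_workers = workers)
      system.time(coro::loop(for (b in dl) NULL))[["elapsed"]]
    }

    time_one_pass(0)  # load batches in the main process
    time_one_pass(4)  # 4 worker R processes; often no faster for in-memory data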