Conversation

@cregouby (Collaborator) commented Feb 6, 2022

This is a step forward in usability and performance:

| num_workers | CPU  | GPU  |
|-------------|------|------|
| 0L          | 134s | 46s  |
| 1L          |      | 105s |
| 4L          | 227s |      |

Conclusion: as expected, 4 workers are not enough to accelerate preprocessing. (Note: num_workers is limited on GPU here because my GPU cannot hold more than two R processes in memory.)
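For context, here is a minimal sketch of how `num_workers` is passed to a `torch` dataloader in R. The dataset and sizes are illustrative, not the PR's actual code; the point is that with an in-memory dataset, `.getitem` is cheap and extra worker processes mostly add overhead:

```r
library(torch)

# Toy in-memory dataset: with tensors already in RAM, fetching an item is
# cheap, so background worker processes mainly add serialization overhead.
ds <- tensor_dataset(x = torch_randn(1000, 10), y = torch_randn(1000, 1))

# num_workers = 0L loads batches in the main process;
# num_workers = 4L would spawn 4 background R processes instead.
dl <- dataloader(ds, batch_size = 256, num_workers = 0L)

coro::loop(for (batch in dl) {
  # a training step on batch$x / batch$y would go here
})
```

This matches the table above: going from `0L` to more workers pays a per-process cost that in-memory preprocessing cannot amortize.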

@cregouby cregouby requested a review from dfalbel February 6, 2022 16:21
@cregouby cregouby marked this pull request as draft February 6, 2022 17:52
@codecov (codecov bot) commented Feb 7, 2022

Codecov Report

❗ No coverage uploaded for pull request base (main@f4f815f).
The diff coverage is n/a.

❗ Current head c2e5f25 differs from pull request most recent head 928037f. Consider uploading reports for the commit 928037f to get more accurate results


@@          Coverage Diff           @@
##             main     #83   +/-   ##
======================================
  Coverage        ?   0.00%           
======================================
  Files           ?      10           
  Lines           ?    1054           
  Branches        ?       0           
======================================
  Hits            ?       0           
  Misses          ?    1054           
  Partials        ?       0           

Last update f4f815f...928037f.

@cregouby cregouby marked this pull request as ready for review February 7, 2022 23:33
cregouby added a commit that referenced this pull request Feb 9, 2022
reduce early_stopping_tolerance in test as it was not adapted to the data
@dfalbel (Member) left a comment:

Looks great to me.

Unfortunately, the datasets/dataloaders interface adds a lot of overhead here. What would actually make it much faster is getting rid of it: dataloaders only make sense when `.getitem` is slow enough to compensate for their overhead, such as when reading an image from disk.

See also mlverse/torch#776
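The suggestion above, bypassing the dataset/dataloader pair entirely, can be sketched as manual batching over tensors. This is an illustrative sketch under the assumption that all data fits in memory, not the code this PR adopts:

```r
library(torch)

x <- torch_randn(1000, 10)
n <- x$size(1)
batch_size <- 256

# Manual batching: index the tensor directly with a shuffled index vector,
# avoiding per-item .getitem dispatch and inter-process transfer.
idx <- sample(n)
for (start in seq(1, n, by = batch_size)) {
  end <- min(start + batch_size - 1, n)
  batch <- x[idx[start:end], ]
  # a training step on `batch` would go here
}
```

The trade-off is losing the dataloader's conveniences (workers, pinned memory, lazy loading), which only pay off when item fetches are genuinely slow.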

test-coverage:
runs-on: ubuntu-latest
env:
GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
Member

I think we need TORCH_TEST=1 and TORCH_INSTALL=1 so coverage is correct.
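A sketch of how the suggested variables could be added to the quoted workflow's `env` block (assuming the torch R package's `TORCH_TEST` / `TORCH_INSTALL` environment-variable convention; exact placement in the real workflow file may differ):

```yaml
test-coverage:
  runs-on: ubuntu-latest
  env:
    GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
    TORCH_TEST: 1      # enable tests that require a working torch backend
    TORCH_INSTALL: 1   # allow torch to install its backend non-interactively
```

Without these, torch-dependent tests are skipped on CI and the coverage numbers understate what the suite actually exercises.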

Collaborator Author

Thanks a lot for the push! (I must admit I'm lost in that domain.)

@cregouby cregouby merged commit e08f76d into main Feb 11, 2022
@cregouby cregouby deleted the feature/better_defaults branch February 11, 2022 17:44


Development

Successfully merging this pull request may close these issues:

- tabnet_explain() does not provide correct results
- Dataloader cannot use num_workers>0L
- Add code coverage measure and badge

3 participants