pt: explicitly set device #3307

Merged
njzjz merged 8 commits into deepmodeling:devel from njzjz:pt-set-device
Feb 21, 2024

Conversation

@njzjz
Member

@njzjz njzjz commented Feb 20, 2024

Require explicitly setting the device to env.DEVICE or cpu for functions like torch.tensor, torch.zeros, torch.ones, torch.rand, torch.eye, LayerNorm, and the data loader.
This ensures that no OP runs on the wrong device. The trick here is torch.set_default_device("cuda:9999999") in the tests, so errors will be thrown if the default device is used.
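
As a rough illustration of the test trick described above, the sketch below makes an unusable CUDA index the default device, so any allocation that forgets `device=` raises instead of silently landing somewhere. It is only a sketch under assumptions: the helper names are hypothetical, and the bogus device is scoped with `with torch.device(...)` rather than the global `torch.set_default_device` call mentioned in the description, so the setting does not leak into other code.

```python
import torch

# Same bogus device string as in the PR description; no such GPU should exist.
BOGUS_DEVICE = "cuda:9999999"


def default_device_alloc_fails() -> bool:
    """Return True if an allocation without an explicit device raises."""
    try:
        with torch.device(BOGUS_DEVICE):
            torch.zeros(3)  # no device= given, so the bogus default is used
    except Exception:
        # The exact exception type depends on the PyTorch build
        # (CPU-only vs. CUDA), so catch broadly in this sketch.
        return True
    return False


def explicit_device_alloc() -> torch.Tensor:
    """Allocate with an explicit device while the bogus default is active."""
    with torch.device(BOGUS_DEVICE):
        return torch.zeros(3, device="cpu")


leak_detected = default_device_alloc_fails()
safe_tensor = explicit_device_alloc()
```

With this in place, a code path that relies on the default device surfaces as a test error, while code that passes `device=` explicitly is unaffected.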

Tips:
(1) Avoid torch.zeros(...).to(device=...). This first initializes memory on CPUs and copies it to GPUs.
(2) Use with torch.device(...) for a module that cannot set a device (i.e., the data loader).
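
The two tips can be sketched as follows. This is a minimal example, not code from the PR: `DEVICE` is a hypothetical stand-in for `env.DEVICE`, and the tiny `TensorDataset` is invented purely for illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for deepmd's env.DEVICE.
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Tip (1), discouraged form: allocate on the default device, then copy.
copied = torch.zeros(4, 3).to(device=DEVICE)

# Tip (1), preferred form: allocate directly on the target device.
direct = torch.zeros(4, 3, device=DEVICE)

# Modules such as LayerNorm also accept a device argument.
norm = torch.nn.LayerNorm(3, device=DEVICE)

# Tip (2): for code with no device argument of its own, such as a data
# loader, scope a default device with the torch.device context manager.
with torch.device("cpu"):
    dataset = TensorDataset(torch.arange(8.0))
    loader = DataLoader(dataset, batch_size=4)
    batches = [batch[0] for batch in loader]
```

Both forms in tip (1) end up with a tensor on `DEVICE`; the difference is that the preferred form skips the intermediate allocation and copy.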

Require explicitly setting the device to `env.DEVICE` or `cpu` for functions like `torch.tensor`, `torch.zeros`, `torch.ones`, `torch.rand`, `torch.eye`, and the data loader. This ensures that no OPs that should run on GPUs run on CPUs.
The trick here is `torch.set_default_device("cuda:9999999")` in the tests, so errors will be thrown if no device is set.

Tips:
(1) Avoid `torch.zeros(...).to(device=...)`. This first initializes memory on CPUs and then copies it to GPUs.
(2) Use `with torch.device(...)` for third-party modules.

Signed-off-by: Jinzhe Zeng <jinzhe.zeng@rutgers.edu>
@codecov
codecov Bot commented Feb 20, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (8b1ed14) 75.29% compared to head (6ba2914) 75.29%.

Additional details and impacted files
@@           Coverage Diff           @@
##            devel    #3307   +/-   ##
=======================================
  Coverage   75.29%   75.29%           
=======================================
  Files         398      398           
  Lines       33684    33694   +10     
  Branches     1604     1604           
=======================================
+ Hits        25361    25371   +10     
  Misses       7462     7462           
  Partials      861      861           


Comment thread deepmd/pt/train/training.py
Comment thread deepmd/pt/utils/dataloader.py
@wanghan-iapcm wanghan-iapcm added this pull request to the merge queue Feb 21, 2024
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to a conflict with the base branch Feb 21, 2024
@njzjz njzjz enabled auto-merge February 21, 2024 02:37
@njzjz njzjz added this pull request to the merge queue Feb 21, 2024
Merged via the queue into deepmodeling:devel with commit 139721f Feb 21, 2024
@njzjz njzjz deleted the pt-set-device branch February 21, 2024 04:02
@njzjz njzjz mentioned this pull request Apr 2, 2024

4 participants