
Bump pytorch to 2.0 for AMD Users on Linux #10465

Merged
AUTOMATIC1111 merged 3 commits into AUTOMATIC1111:dev from
baptisterajaut:master
May 18, 2023

Conversation

@baptisterajaut
Contributor

@baptisterajaut baptisterajaut commented May 17, 2023

Describe what this pull request is trying to achieve.
This pull request gets PyTorch working again for AMD users. It seems the torch 1.13 and matching torchvision builds are no longer available on the PyTorch repos, so bumping to this version makes installation work again.

ERROR: Could not find a version that satisfies the requirement torch==1.13.1+rocm5.2 (from versions: 1.13.0, 1.13.1, 2.0.0, 2.0.1)
ERROR: No matching distribution found for torch==1.13.1+rocm5.2

Additional notes and description of your changes

Weirdly, torch 2 didn't seem to work before, but this version does. Maybe it requires ROCm 5.4.2 or above?
I also pushed this to master so the webui would work again for everyone coming in.
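For reference, the bump boils down to pointing the install command at the ROCm wheel index. A minimal sketch of an equivalent manual override — assuming the webui reads TORCH_COMMAND from the environment; the exact version pins are taken from a working setup reported later in this thread and may need adjusting for your card:

```shell
# Sketch: pin the torch 2.0 ROCm wheels via TORCH_COMMAND before launching.
# Versions and index URL are assumptions based on the rocm5.4.2 wheel index;
# adjust them for your card and ROCm install.
export TORCH_COMMAND="pip install torch==2.0.0+rocm5.4.2 torchvision==0.15.1+rocm5.4.2 --index-url https://download.pytorch.org/whl/rocm5.4.2"
echo "$TORCH_COMMAND"
```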

Environment this was tested in

  • OS: Arch Linux, up to date (kernel 6.2.10-xanmod1, rocm-cmake-5.4.3-1 and related packages)
  • Browser: Opera
  • Graphics card: AMD RX6900XT

So apparently it works now? Before you would get "PyTorch can't use the GPU", but not anymore.
If only I proofread what I wrote.
@AUTOMATIC1111
Owner

Since I cannot verify any of this I'd like some comments from AMD users.

@baptisterajaut
Contributor Author

[Screenshot: console output showing torch 2.x with ROCm in use.]

@AUTOMATIC1111
Owner

well, yeah, but maybe it works on your card and is fucked on another

@JeffreyBytes

I haven't tested this PR, but for what it's worth I've been using: TORCH_COMMAND="pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.4.2"

Ubuntu 22.04.2 LTS
RX 6700 XT

pip list:

torch                   2.0.0+rocm5.4.2
torchaudio              2.0.1+rocm5.4.2
torchvision             0.15.1+rocm5.4.2

@AUTOMATIC1111 AUTOMATIC1111 changed the base branch from master to dev May 18, 2023 07:26
@AUTOMATIC1111 AUTOMATIC1111 merged commit 7fd8095 into AUTOMATIC1111:dev May 18, 2023
@Enferlain

Enferlain commented May 20, 2023

6800 XT works. Tried with --opt-split-attention, about 2x or a bit more faster than the Colab free GPU; hires fix is slow af tho.

I'm getting OOM when doing hires fix: DPM++ 2M Karras, 20 steps, 640x1024, 1.5x nearest-exact, 0.55 denoise.

torch.cuda.OutOfMemoryError: HIP out of memory. Tried to allocate 6.33 GiB (GPU 0; 15.98 GiB total capacity; 3.78 GiB already allocated; 6.47 GiB free; 9.42 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_HIP_ALLOC_CONF

Happens even when I use --opt-sdp-attention, which keeps my RAM usage hovering between 4-8 GB.
Tried using this line export PYTORCH_CUDA_ALLOC_CONF="garbage_collection_threshold:0.6,max_split_size_mb:128" but same result.
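A side note on the variable name: the OOM message above points at PYTORCH_HIP_ALLOC_CONF, which is the name ROCm builds of PyTorch read, so — an untested guess, not a confirmed fix — the same options may need to go under that name instead:

```shell
# Untested guess: on ROCm builds the allocator options are read from
# PYTORCH_HIP_ALLOC_CONF (the name the OOM message above references),
# not PYTORCH_CUDA_ALLOC_CONF. Same option string as the attempt above.
export PYTORCH_HIP_ALLOC_CONF="garbage_collection_threshold:0.6,max_split_size_mb:128"
echo "$PYTORCH_HIP_ALLOC_CONF"
```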

This behavior reminds me of what happened on the colab gpus when I first tried torch 2.0 on them.

I can go through with this setup --opt-split-attention-invokeai --medvram

[Screenshot: generation speed readout for this setup.]

Compared to a Colab T4:

  • normal: 2 it/s
  • hires: 1.7 s/it
  • total: 1.17 s/it, 46 sec

If these results can be consistent, that would be great. Unfortunately, the VRAM usage during hires fix with this setup is ~14500 MB, meaning if I were to load extensions such as ControlNet, I would crash from running out of memory.

@olinorwell

well, yeah, but maybe it works on your card and is fucked on another

Unfortunately confirmed... for RX 5000 series owners, to be precise.

Manual downgrading isn't working: unfortunately the PyTorch webpage lists previous versions, but when you try to install them they aren't found. We need to figure out how to get a version of torch 1.13, I think, unless somebody can find a workaround to get torch 2 working on this range of cards.

@DGdev91
Contributor

DGdev91 commented Jun 5, 2023

Hi everybody, I'm the guy who wrote the comment "# AMD users will still use torch 1.13 because 2.0 does not seem to work." which was removed by this PR (see #9404).
I confirm that I made that PR because of the problems I had with my 5700XT, and I confirm that there are still issues on that series. Most likely it's because Navi1 and Navi2 cards only run thanks to the "HSA_OVERRIDE_GFX_VERSION=10.3.0" workaround.
That workaround is sadly still needed (I get a segmentation fault without it) but probably conflicts with pytorch 2.
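For anyone who hasn't used it, the workaround described above is just an environment variable set before launch. A minimal sketch — the webui.sh launch line is an assumption about your checkout, and the value is the one quoted in the comment above:

```shell
# Sketch of the Navi1/Navi2 workaround described above: spoof the gfx
# version so ROCm treats the card as gfx1030 (value from the comment).
export HSA_OVERRIDE_GFX_VERSION=10.3.0
# ./webui.sh   # launch afterwards; per the comment, omitting the override segfaults
echo "$HSA_OVERRIDE_GFX_VERSION"
```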

Anyway, I agree that we shouldn't prevent every AMD card from using pytorch 2 (...even if it's already possible to force TORCH_COMMAND manually) if the problem is only on older cards.

I've come up with a solution which should be fine for everyone, or at least I hope: #11048

@DGdev91
Contributor

DGdev91 commented Jun 6, 2023

Manual downgrading isn't working: unfortunately the PyTorch webpage lists previous versions, but when you try to install them they aren't found. We need to figure out how to get a version of torch 1.13, I think, unless somebody can find a workaround to get torch 2 working on this range of cards.

If you are on Python 3.11, that's probably because the older PyTorch builds are for Python 3.10 only.

You can try with a conda env:

conda create -p /your_condaenv_path python=3.10.11
ln -sf /usr/lib/x86_64-linux-gnu/libstdc++.so.6 /your_condaenv_path/lib/libstdc++.so.6
conda activate /your_condaenv_path
./webui.sh

The ln command is a workaround to make the TCMalloc code work.
