Conversation
(cherry picked from commit 7eadbdc)
Change the order from `model.eval().to(device)` to `model.to(device).eval()` to ensure that the model is first moved to the correct device and then set to evaluation mode.
Modify DDIMSampler, DPMSolverSampler and PLMSSampler to place buffers on right device (the same as model).
Remove model.cuda() from model loading functions.
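The reordering in the first commit can be illustrated with a minimal sketch (the toy `nn.Linear` model and the `cpu` device are assumptions for illustration; on Gaudi the device would be `"hpu"`):

```python
import torch
import torch.nn as nn

device = torch.device("cpu")  # stand-in; on Gaudi this would be torch.device("hpu")

model = nn.Linear(4, 2)

# Before: model.eval().to(device) switched to eval mode first, then moved.
# After: move to the target device first, then switch to evaluation mode.
model = model.to(device).eval()

# eval() disables training-only behaviors (dropout, batch-norm running-stat updates)
print(model.training)  # False
```

Both orderings return the same module object (`.to()` and `.eval()` each return `self`), so the change only fixes the sequence of side effects, not the final state of this toy example.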
Feat/hpu support br
Running the code (we have also added it to README.md):

```python
from torch import autocast
import time

from optimum.habana.diffusers import GaudiDDIMScheduler, GaudiStableDiffusionPipeline

model_name = "CompVis/stable-diffusion-v1-4"
scheduler = GaudiDDIMScheduler.from_pretrained(model_name, subfolder="scheduler")
pipe = GaudiStableDiffusionPipeline.from_pretrained(
    model_name,
    scheduler=scheduler,
    use_habana=True,
    use_hpu_graphs=True,
    gaudi_config="Habana/stable-diffusion",
)

from habana_frameworks.torch.utils.library_loader import load_habana_module
from optimum.habana.transformers.modeling_utils import adapt_transformers_to_gaudi

load_habana_module()
# Adapt transformers models to Gaudi for optimization
adapt_transformers_to_gaudi()

pipe = pipe.to("hpu")

prompt = "a photo of an astronaut riding a horse on mars"
with autocast("hpu"):
    t1 = time.perf_counter()
    outputs = pipe(
        prompt=[prompt],
        num_images_per_prompt=2,
        batch_size=4,
        output_type="pil",
    )
    print(f"Time taken: {time.perf_counter() - t1:.2f}s")
```

gives us:

```
[INFO|pipeline_stable_diffusion.py:610] 2025-02-20 12:21:28,751 >> Speed metrics: {'generation_runtime': 99.9386, 'generation_samples_per_second': 0.735, 'generation_steps_per_second': 36.744}
Time taken: 168.90s
```
README.md (Outdated)

```python
        num_images_per_prompt=2,
        batch_size=4,
        output_type="pil",
    )
```
I do not see the generated image?
Also, the generation time is somewhat long. I recall it was faster in a demo I have seen!
Could you also compare to the CPU time?
When running the generation code a second time, the elapsed time looks okay.
Just compare with the CPU generation time.
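On the second run being faster: with `use_hpu_graphs=True` the first call includes graph compilation, so a hedged way to time steady-state throughput is to discard warm-up iterations. A generic sketch (the `time_call` helper and the stand-in workload below are illustrative, not part of the PR; in the PR's setting `fn` would wrap the `pipe(...)` call):

```python
import time

def time_call(fn, warmup=1, iters=3):
    """Average wall-clock time of fn(), discarding warm-up runs
    (on HPU the first run typically includes graph compilation)."""
    for _ in range(warmup):
        fn()
    t0 = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - t0) / iters

# Cheap stand-in workload; replace with lambda: pipe(prompt=[prompt], ...) to
# compare HPU vs. CPU pipelines under the same harness.
avg_s = time_call(lambda: sum(range(100_000)))
print(f"avg: {avg_s:.6f}s")
```

Running the same harness once with the HPU pipeline and once with a CPU-backed one would make the comparison requested above apples-to-apples.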
@orionsBeltWest Image generation has been added; the image is saved to a file with:

```python
with autocast("hpu"):
    t1 = time.perf_counter()
    upscaled_image = pipe(
        prompt=[prompt],
        num_images_per_prompt=2,
        batch_size=4,
        output_type="pil",
    ).images[0]
    upscaled_image.save("astronaut_rides_horse.png")
    print(f"Time taken: {time.perf_counter() - t1:.2f}s")
```
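Note that indexing `.images[0]` keeps only the first of the `num_images_per_prompt=2` results. A small sketch of naming every image instead, using a stand-in output object (the `FakeOutput` class and the placeholder strings are hypothetical; the real pipeline returns an object whose `.images` attribute is a list of PIL images):

```python
class FakeOutput:
    """Hypothetical stand-in for a diffusers-style pipeline output."""
    def __init__(self, images):
        self.images = images

outputs = FakeOutput(images=["<PIL image 0>", "<PIL image 1>"])

# Enumerate so each generated image gets its own filename rather than
# overwriting a single astronaut_rides_horse.png
filenames = [f"astronaut_rides_horse_{i}.png" for i, _ in enumerate(outputs.images)]
print(filenames)  # ['astronaut_rides_horse_0.png', 'astronaut_rides_horse_1.png']
```

With real pipeline output, each `img.save(name)` call would replace the single `upscaled_image.save(...)` above.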
Dockerfile.hpu (Outdated)

```dockerfile
    && pip install -e /workspace/sd/src/taming-transformers

# Clone and install CLIP
RUN git clone --depth 1 https://github.com/openai/CLIP.git /workspace/sd/src/clip \
```
Should you add a Gaudi-specific CLIP, or is this the standard CLIP?
This section has been removed from the Dockerfile. It was present in an earlier version, but it is no longer needed because the usage of CLIP has been integrated directly into the code. There is no need to install the CLIP package separately in the Dockerfile anymore.
```
python3 scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --precision full
```

The above command is throwing a segfault.
@orionsBeltWest Did you follow the README section below? Or could you show the error log? I can't reproduce the same problem.

> We provide a reference sampling script, which incorporates
> After obtaining the and sample with
The instructions are not clear. The weight links point to datasets, not checkpoints. Could you list the steps explicitly?
@orionsBeltWest
If it's still crashing, try using
Downloaded the weights using:

```
conda env create -f environment.yaml
```

But still segfault.
No more segfault, but now a `ModuleNotFoundError` even after `pip install pytorch-lightning`:

```
File "/workspace/sd/ldm/models/diffusion/ddpm.py", line 19, in
ModuleNotFoundError: No module named 'pytorch_lightning.utilities.distributed'
```
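That module path was removed in newer pytorch-lightning releases; `rank_zero_only`, for example, moved to `pytorch_lightning.utilities.rank_zero`. One hedged workaround (assuming `rank_zero_only` is the symbol `ddpm.py` needs at that line; adjust to whatever it actually imports) is a tolerant import that tries both locations:

```python
def import_rank_zero_only():
    """Try the old pytorch-lightning import path first, then the newer one."""
    try:
        # Older pytorch-lightning releases
        from pytorch_lightning.utilities.distributed import rank_zero_only
    except ImportError:
        # Newer releases moved the rank-zero utilities here
        from pytorch_lightning.utilities.rank_zero import rank_zero_only
    return rank_zero_only
```

Alternatively, pinning an older `pytorch-lightning` version in the environment file sidesteps the rename entirely.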
```
root@stable-docker-pod-basem:/workspace/sd# python3 scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --plms
```
