Open
Labels
bug
Description
Describe the bug
! python3 -m pip install -U phonemizer
! sudo apt install -y --no-install-recommends espeak-ng
import scipy.io.wavfile  # import the submodule explicitly so scipy.io.wavfile.write is available
import transformers
import torch
from diffusers import AudioLDM2Pipeline
repo_id = "anhnct/audioldm2_gigaspeech"
pipe = AudioLDM2Pipeline.from_pretrained(repo_id, torch_dtype=torch.float16)
pipe = pipe.to("cuda")
# define the prompts
prompt = "An female actor say with angry voice"
transcript = "wish you have a good day, i hope you never forget me"
negative_prompt = "low quality"
# run the generation
audio = pipe(
    prompt,
    negative_prompt=negative_prompt,
    transcription=transcript,
    num_inference_steps=200,
    audio_length_in_s=8.0,
    num_waveforms_per_prompt=1,
    max_new_tokens=512,
).audios
# save the best audio sample (index 0) as a .wav file
scipy.io.wavfile.write("techno_2.wav", rate=16000, data=audio[0])
from IPython.display import Audio
Audio("techno_2.wav")
Running this keeps failing with: 'GPT2Model' object has no attribute '_get_initial_cache_position'
The problem seems to originate in transformers: downgrading to transformers==4.47.0 makes it work.
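Until the incompatibility is resolved, a notebook can fail fast instead of crashing mid-generation. Below is a minimal sketch of such a check: it warns when the installed transformers is newer than 4.47.0, the last version this report confirms working. The report does not establish which release first breaks, so the check only flags "untested", not "broken". The constant `KNOWN_GOOD` and the helpers `parse_version`, `is_newer` and `warn_if_untested` are hypothetical names, and versions are compared with a simplified numeric parser rather than the `packaging` library.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple

# Hypothetical constant: the last transformers version this report
# confirms works with AudioLDM2Pipeline. Newer releases are untested here.
KNOWN_GOOD = "4.47.0"


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a version like '4.53.3' into (4, 53, 3).

    Parsing stops at the first non-numeric component (e.g. 'dev0'), and
    short versions are zero-padded to three parts, so '4.47' == '4.47.0'.
    This is a rough comparison, not a full PEP 440 implementation.
    """
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_newer(installed: str, reference: str = KNOWN_GOOD) -> bool:
    """Return True if `installed` is strictly newer than `reference`."""
    return parse_version(installed) > parse_version(reference)


def warn_if_untested(package: str = "transformers") -> Optional[str]:
    """Return a warning message, or None if the installed version is not newer.

    `package` defaults to transformers; it is a parameter mainly so the
    not-installed branch can be exercised.
    """
    try:
        installed = version(package)
    except PackageNotFoundError:
        return f"{package} is not installed"
    if is_newer(installed):
        return (
            f"{package} {installed} is newer than {KNOWN_GOOD}, the last version "
            "confirmed to work with AudioLDM2Pipeline in this report"
        )
    return None


if __name__ == "__main__":
    message = warn_if_untested()
    if message:
        print(message)
```

Calling `warn_if_untested()` right after the imports surfaces the mismatch before the 200-step generation starts. The workaround itself is a pinned install in a notebook cell: `! python3 -m pip install transformers==4.47.0`.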
Reproduction
See the code above.
Logs
System Info
diffusers 0.34.0, transformers 4.53.3; tested on Kaggle with a P100 GPU.
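Since the bug depends on the transformers version, the exact versions of every package involved matter for triage. A minimal sketch for collecting them from the same notebook uses only the standard library. The package list and the helper names `collect_versions` and `format_report` are illustrative assumptions. diffusers also provides a `diffusers-cli env` command that prints a similar report.

```python
import platform
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable

# Packages used in this report's reproduction; adjust as needed.
DEFAULT_PACKAGES = ("diffusers", "transformers", "torch", "scipy", "phonemizer")


def collect_versions(packages: Iterable[str] = DEFAULT_PACKAGES) -> Dict[str, str]:
    """Map each package name to its installed version, or 'not installed'."""
    report = {"python": platform.python_version()}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = "not installed"
    return report


def format_report(report: Dict[str, str]) -> str:
    """Render the mapping as 'name: version' lines for pasting into an issue."""
    return "\n".join(f"{name}: {ver}" for name, ver in report.items())


if __name__ == "__main__":
    print(format_report(collect_versions()))
```

The GPU model (a P100 here) is not covered; with PyTorch installed, `torch.cuda.get_device_name(0)` returns it.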
Who can help?