AttributeError: 'GPT2Model' object has no attribute '_get_initial_cache_position' in AudioLDM2Pipeline #12630

@chaowenguo

Describe the bug

! python3 -m pip install -U phonemizer
! sudo apt install -y --no-install-recommends espeak-ng
import scipy.io.wavfile  # import the submodule explicitly so scipy.io.wavfile.write is available
import torch
import transformers
from diffusers import AudioLDM2Pipeline

repo_id = "anhnct/audioldm2_gigaspeech"
pipe = AudioLDM2Pipeline.from_pretrained(repo_id, torch_dtype=torch.float16)
pipe = pipe.to("cuda")

# define the prompts
prompt = "An female actor say with angry voice"
transcript = "wish you have a good day, i hope you never forget me"
negative_prompt = "low quality"

# run the generation
audio = pipe(
    prompt,
    negative_prompt=negative_prompt,
    transcription=transcript,
    num_inference_steps=200,
    audio_length_in_s=8.0,
    num_waveforms_per_prompt=1,
    max_new_tokens=512
).audios

# save the best audio sample (index 0) as a .wav file
scipy.io.wavfile.write("techno_2.wav", rate=16000, data=audio[0])
from IPython.display import Audio
Audio("techno_2.wav")

Running this keeps raising AttributeError: 'GPT2Model' object has no attribute '_get_initial_cache_position'.

The problem seems to come from transformers: downgrading to transformers==4.47.0 makes it work.
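For anyone checking whether their environment is affected, here is a minimal sketch that compares the installed transformers version against the two versions in this report. The only data points are 4.47.0 (worked) and 4.53.3 (failed). Treating everything at or below 4.47.0 as working and everything at or above 4.53.3 as failing is an assumption, and `parse_version`, `classify` and `installed_transformers_status` are hypothetical helper names, not part of diffusers or transformers.

```python
from importlib import metadata

# The only two data points from this report.
KNOWN_WORKING = "4.47.0"  # downgrading to this version avoided the error
KNOWN_FAILING = "4.53.3"  # this version raised the AttributeError


def parse_version(text):
    """Turn '4.53.3' into (4, 53, 3); stop at the first non-numeric part (e.g. 'dev0')."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def classify(version_text):
    """Hypothetical helper: place a version relative to the two reported versions."""
    version = parse_version(version_text)
    if version <= parse_version(KNOWN_WORKING):
        # Assumption: versions older than the reported working one also work.
        return "at or below the version reported to work"
    if version >= parse_version(KNOWN_FAILING):
        # Assumption: versions newer than the reported failing one also fail.
        return "at or above the version reported to fail"
    return "untested range between the two reported versions"


def installed_transformers_status():
    """Classify the installed transformers package, if there is one."""
    try:
        return classify(metadata.version("transformers"))
    except metadata.PackageNotFoundError:
        return "transformers is not installed"


print(installed_transformers_status())
```

In a notebook, the pin itself would be `! python3 -m pip install transformers==4.47.0`, followed by a kernel restart so the downgraded package is the one that gets imported.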

Reproduction

See the code under "Describe the bug" above.

Logs

System Info

diffusers 0.34.0, transformers 4.53.3. Tested on a Kaggle P100 GPU.

Who can help?

@yiyixuxu @DN6

Labels: bug
