To support gemma-3 it looks like we need a few changes:
(FSDP1PolicyWorker[rank=0] pid=1810886) It is strongly recommended to train Gemma3 models with the eager attention implementation instead of sdpa. Use eager with AutoModelForCausalLM.from_pretrained('<path-to-checkpoint>', attn_implementation='eager').
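One possible way to act on that warning is to pass `attn_implementation='eager'` only when the checkpoint is a Gemma 3 model. This is a minimal sketch, not the project's actual loading code: `attn_kwargs_for` and `load_policy_model` are hypothetical helper names, and matching on the substring `gemma-3` in the model name is an assumption about how checkpoints are named.

```python
def attn_kwargs_for(model_name: str) -> dict:
    """Return extra from_pretrained kwargs for a given model name.

    Hypothetical helper: detects Gemma 3 by a substring match on the
    name, which is an assumption and may miss local checkpoint paths.
    """
    if "gemma-3" in model_name.lower():
        # The warning recommends eager attention instead of sdpa for training.
        return {"attn_implementation": "eager"}
    return {}


def load_policy_model(model_name: str):
    """Load a causal LM, forcing eager attention for Gemma 3 (sketch)."""
    # Imported lazily so the helper above can be used without transformers.
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(
        model_name, **attn_kwargs_for(model_name)
    )
```

Keeping the model-specific choice in one small function leaves other models on their default attention implementation.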