[Issue]: ATOM fails on Qwen3 model when the flag "--enable_prefix_caching" is enabled #221

@vecheruk-amd

Problem Description

I am trying to evaluate the Qwen3 model with the "--enable_prefix_caching" option, but I am running into shape mismatch errors. The setup works fine without the flag. I attached the error log file (Qwen3_log_with_prefix_cache.log).

I investigated a bit on my end and believe the issue is related to scheduling. In "scheduler.py", the prefill scheduling loop calculates "num_new_tokens" before calling block_manager.allocate(). The allocate() call updates "seq.num_cached_tokens", but "num_new_tokens" is not recomputed afterwards, which I believe causes the shape mismatch. Recomputing "num_new_tokens" after block_manager.allocate() seems to fix the issue on the first run, but when I use the same client command a second time, I get memory access fault errors. I attached the error report for this run as well (qwen_log_with_scheduler_fix.log).
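To make the suspected ordering bug concrete, here is a minimal standalone sketch of what I believe is happening. The class and attribute names (Sequence, BlockManager, num_cached_tokens, prompt_len) are simplified stand-ins modeled on the description above, not ATOM's actual scheduler API:

```python
# Hypothetical sketch of the prefix-cache token-accounting issue.
# All names are illustrative assumptions, not ATOM's real interfaces.

class Sequence:
    def __init__(self, prompt_len: int):
        self.prompt_len = prompt_len
        self.num_cached_tokens = 0  # updated by BlockManager.allocate()


class BlockManager:
    def __init__(self, cached_prefix_len: int):
        self.cached_prefix_len = cached_prefix_len

    def allocate(self, seq: Sequence) -> None:
        # With prefix caching enabled, allocation detects how many prompt
        # tokens are already present in the KV cache.
        seq.num_cached_tokens = min(self.cached_prefix_len, seq.prompt_len)


def schedule_prefill_buggy(seq: Sequence, bm: BlockManager) -> int:
    # num_new_tokens is computed BEFORE allocate(), so it still counts
    # tokens that allocate() will mark as cached -> shape mismatch later.
    num_new_tokens = seq.prompt_len - seq.num_cached_tokens
    bm.allocate(seq)
    return num_new_tokens


def schedule_prefill_fixed(seq: Sequence, bm: BlockManager) -> int:
    # Recomputing AFTER allocate() excludes the cached prefix tokens.
    bm.allocate(seq)
    return seq.prompt_len - seq.num_cached_tokens
```

With a 5600-token prompt and a 4096-token cached prefix, the buggy path schedules all 5600 tokens while the fixed path schedules only 1504, which matches the mismatch I saw on cache-hit requests.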

Operating System

Ubuntu 24.04.3 LTS (Noble Numbat)

CPU

AMD EPYC 9575F 64-Core Processor

GPU

AMD Instinct MI325X VF

ROCm Version

7.2.0.70200-43~24.04

ROCm Component

No response

Steps to Reproduce

Docker image: rocm/atom-dev:nightly_202602020423

Server command:

ATOM_ENABLE_QK_NORM_ROPE_CACHE_QUANT_FUSION=1 python -m atom.entrypoints.openai_server --model Qwen/Qwen3-235B-A22B-Instruct-2507-FP8 -tp 4 --kv_cache_dtype fp8 --enable-expert-parallel --max-model-len 32768 --max-num-batched-tokens 32768 --cudagraph-capture-sizes "[1,2,4,8,16,32,48,64,128,256,512]" --enable_prefix_caching

Client command:

python -m atom.benchmarks.benchmark_serving --model Qwen/Qwen3-235B-A22B-Instruct-2507-FP8 --backend vllm --base-url http://localhost:8000 --dataset-name random --random-input-len 5600 --random-output-len 140 --random-range-ratio 1.0 --num-prompts 100 --request-rate inf --ignore-eos --percentile-metrics "ttft,tpot"

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response
