Hi, I am applying SharedObservation training on Pi0.5. However, I see that the state is processed differently compared to the original Pi0.5.
In the original Pi0.5, the state is discretized and treated as text tokens, which are passed to the VLM as part of prefix_embs. However, when training with the SharedObservation config, I see that the state is not processed by the VLM; instead, it is handled like a suffix embedding.
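To illustrate my understanding, here is a rough sketch of the two paths (hypothetical names and binning scheme, not the actual openpi code):

```python
import numpy as np

def discretize_state(state, n_bins=256, low=-1.0, high=1.0):
    """Original Pi0.5 style (as I understand it): map each continuous
    state dimension to a bin index, so the state can be rendered as
    discrete tokens in the VLM prefix alongside the text."""
    clipped = np.clip(state, low, high)
    bins = ((clipped - low) / (high - low) * (n_bins - 1)).astype(int)
    return bins  # discrete token ids for prefix_embs

def project_state(state, W):
    """SharedObservation style (as I understand it): a continuous
    linear projection of the state, fed to the action expert as part
    of the suffix embeddings, bypassing the VLM."""
    return state @ W

state = np.array([0.1, -0.5, 0.9])
prefix_tokens = discretize_state(state)                    # e.g. [140, 63, 242]
suffix_emb = project_state(state, np.random.randn(3, 8))   # continuous vector
```

So in the first path the VLM attends over the state as tokens, while in the second the state only enters through the action expert's suffix.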
Can anyone help me clarify this? Is my understanding correct?
If yes, how does this difference affect model performance?