UI Navigation Fine-tuning #64

@0bi0n3

Description

Is there a recommended script or workflow for applying the SoM preprocessing (e.g., get_som_labeled_img) to datasets such as Mind2Web or AITW?
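For context, here is a minimal sketch of what I currently assume the SoM labeling step produces for each screenshot: a mapping from numeric Mark IDs to detected element boxes. The `assign_mark_ids` helper below is my own stand-in for illustration, not code from this repo, and the ordering rule is a guess:

```python
def assign_mark_ids(boxes):
    """Map each detected element box (x1, y1, x2, y2) to a Mark ID string.

    Boxes are sorted top-to-bottom, then left-to-right, so Mark IDs stay
    stable across runs for the same detection output. This is only my
    assumption about how marks are numbered, not the repo's actual logic.
    """
    ordered = sorted(boxes, key=lambda b: (b[1], b[0]))
    return {str(i + 1): box for i, box in enumerate(ordered)}

# Example: three detected UI elements on one screenshot.
boxes = [(120, 40, 200, 60), (10, 10, 90, 30), (10, 40, 90, 60)]
marks = assign_mark_ids(boxes)
# marks["1"] is the top-left element (10, 10, 90, 30)
```

If the actual preprocessing numbers marks differently (e.g., by detection confidence), knowing that rule would also help.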

Could you clarify the expected format of the annotation file (e.g., the "conversations" list in the JSON) when the target output is an SoM-based action (such as a Mark ID) for the UI navigation tasks?
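For reference, this is the LLaVA-style record layout I am currently assuming; the field names, file paths, and the "Mark 12" answer encoding are all my guesses, not something confirmed by the repo:

```python
import json

# One training record in the "conversations" layout I am assuming
# (field names mirror the common LLaVA format; the Mark-ID target
# string is my guess at how an SoM-based action would be encoded).
record = {
    "id": "mind2web_000001",
    "image": "mind2web/screenshots/000001_som.png",  # SoM-labeled screenshot
    "conversations": [
        {"from": "human",
         "value": "<image>\nTask: open the booking page. Which mark do you click?"},
        {"from": "gpt",
         "value": "Mark 12"},  # assumed SoM-based action target
    ],
}

line = json.dumps(record)      # one record per line, as in JSONL-style data
parsed = json.loads(line)
```

A confirmed example record from your actual annotation files would resolve most of my uncertainty here.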

Are there specific modifications needed in the fine-tuning script (e.g., finetune_magma_820k.sh) or the config files when fine-tuning for SoM-based UI action prediction, compared to the instruction-following setup used for Magma-820K?

Any clarification you could offer would be greatly appreciated and would significantly help in reproducing your UI navigation results.
