UI Navigation Fine-tuning #64

@0bi0n3

Description

Is there a recommended script or workflow for applying the SoM preprocessing (e.g., get_som_labeled_img) to datasets such as Mind2Web or AITW?
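For context, here is a minimal sketch of what I currently assume the SoM labeling step produces for each screenshot: a mapping from numeric Mark IDs to detected element boxes. The `assign_mark_ids` helper below is my own stand-in for illustration, not code from this repo, and the ordering rule is a guess:

```python
def assign_mark_ids(boxes):
    """Map each detected element box (x1, y1, x2, y2) to a Mark ID string.

    Boxes are sorted top-to-bottom, then left-to-right, so Mark IDs stay
    stable across runs for the same detection output. This is only my
    assumption about how marks are numbered, not the repo's actual logic.
    """
    ordered = sorted(boxes, key=lambda b: (b[1], b[0]))
    return {str(i + 1): box for i, box in enumerate(ordered)}

# Example: three detected UI elements on one screenshot.
boxes = [(120, 40, 200, 60), (10, 10, 90, 30), (10, 40, 90, 60)]
marks = assign_mark_ids(boxes)
# marks["1"] is the top-left element (10, 10, 90, 30)
```

If the actual preprocessing numbers marks differently (e.g., by detection confidence), knowing that rule would also help.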

Could you clarify the expected format of the annotation file (e.g., the "conversations" list in the JSON) when the target output is an SoM-based action (such as a Mark ID) for the UI navigation tasks?
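For reference, this is the LLaVA-style record layout I am currently assuming; the field names, file paths, and the "Mark 12" answer encoding are all my guesses, not something confirmed by the repo:

```python
import json

# One training record in the "conversations" layout I am assuming
# (field names mirror the common LLaVA format; the Mark-ID target
# string is my guess at how an SoM-based action would be encoded).
record = {
    "id": "mind2web_000001",
    "image": "mind2web/screenshots/000001_som.png",  # SoM-labeled screenshot
    "conversations": [
        {"from": "human",
         "value": "<image>\nTask: open the booking page. Which mark do you click?"},
        {"from": "gpt",
         "value": "Mark 12"},  # assumed SoM-based action target
    ],
}

line = json.dumps(record)      # one record per line, as in JSONL-style data
parsed = json.loads(line)
```

A confirmed example record from your actual annotation files would resolve most of my uncertainty here.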

Are there specific modifications needed in the fine-tuning script (e.g., finetune_magma_820k.sh) or the config files when fine-tuning for SoM-based UI action prediction, compared to the instruction-following setup used for Magma-820K?

Any clarification you could offer would be greatly appreciated and would significantly help in reproducing your UI navigation results.
