-
Notifications
You must be signed in to change notification settings - Fork 155
Closed
Description
Is there a recommended script or workflow for applying the SoM preprocessing (e.g.get_som_labeled_img) to datasets such as Mind2Web or AITW?
Could you clarify the expected format for the annotation file (e.g., JSON's "conversations" list) when the target output is an SoM-based action (like a Mark ID) for the UI navigation tasks?
Are there specific modifications needed for the fine-tuning script (e.g., finetune_magma_820k.sh) or config files when fine-tuning for SoM-based UI action prediction compared to instruction following in Magma-820K?
Any clarification you could offer would be greatly appreciated and would significantly help in reproducing your UI navigation results.
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
No labels