Skip to content

and script to download 20 datasets and update print config tree function#193

Merged
tanganke merged 3 commits into
mainfrom
develop
Jan 5, 2026
Merged

and script to download 20 datasets and update print config tree function#193
tanganke merged 3 commits into
mainfrom
develop

Conversation

@tanganke
Copy link
Copy Markdown
Owner

@tanganke tanganke commented Jan 5, 2026

Summary by CodeRabbit

  • New Features

    • Added a new script to download datasets for the CLIP Vision benchmark.
  • Improvements

    • Enhanced configuration display with customizable theme options and improved organization of configuration sections.
    • Improved error handling for configuration file operations.

✏️ Tip: You can customize this high-level summary in your review settings.

Copilot AI review requested due to automatic review settings January 5, 2026 04:55
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jan 5, 2026

Caution

Review failed

The pull request is closed.

📝 Walkthrough

Walkthrough

This PR adds a new dataset download script for CLIP Vision benchmarking and refactors the rich_utils module. The print_config_tree function gains theme and background_color parameters, introduces an "others" section for unmapped fields, and updates path references from cfg.paths to cfg.path with safety checks.

Changes

Cohort / File(s) Summary
Dataset Download Script
examples/clip_finetune/download_20_datasets.py
New script that loads Hydra configuration and instantiates a CLIPVisionModelPool to download all TALL-20 CLIP Vision benchmark datasets using load_train_dataset with tqdm progress tracking.
Rich Utilities Module
fusion_bench/utils/rich_utils.py
print_config_tree: Added keyword-only parameters theme and background_color; changed default print_order from "paths" to "path"; introduces new "others" section for unmapped fields rendered as YAML; updated file-saving logic with guard for missing cfg.path.output_dir and path reference change from cfg.paths.output_dir to cfg.path.output_dir. print_bordered: Docstring indentation refactoring without functional changes.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A script to gather datasets wide,
Rich configs with colors as guide,
New "others" shine bright,
Path guards set right,
The benchmark now has a smooth ride! 🚀

✨ Finishing touches
  • 📝 Generate docstrings

📜 Recent review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d519e74 and 32f8043.

📒 Files selected for processing (2)
  • examples/clip_finetune/download_20_datasets.py
  • fusion_bench/utils/rich_utils.py

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@tanganke tanganke merged commit ca6e66b into main Jan 5, 2026
5 of 6 checks passed
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds a utility script to download all 20 datasets in the TALL-20 benchmark and improves the print_config_tree function in the rich_utils module. The changes include transitioning from "paths" to "path" config naming and enhancing the config tree display with better organization of unlisted fields.

Key Changes

  • New script for batch downloading TALL-20 benchmark datasets using Hydra configuration
  • Updated print_config_tree to display unlisted config fields in an "[others]" section
  • Transition from "paths" to "path" in config naming convention
  • Improved docstring formatting from reStructuredText to Google style

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
examples/clip_finetune/download_20_datasets.py New utility script that iterates through TALL-20 train datasets and downloads them using the CLIPVisionModelPool
fusion_bench/utils/rich_utils.py Updates print_config_tree function to use "path" instead of "paths", improves documentation, reorganizes logic to group unlisted config fields, and adds error handling for missing output_dir

Args:
cfg (DictConfig): A DictConfig composed by Hydra.
print_order (Sequence[str], optional): Determines in what order config components are printed.
Defaults to ``("data", "model", "callbacks", "logger", "trainer", "paths", "extras")``.
Copy link

Copilot AI Jan 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inconsistency in documentation: The docstring at line 185 still mentions "paths" in the default print_order, but the actual parameter at line 171 has been changed to "path". The documentation should be updated to reflect the actual parameter value.

Suggested change
Defaults to ``("data", "model", "callbacks", "logger", "trainer", "paths", "extras")``.
Defaults to ``("data", "model", "callbacks", "logger", "trainer", "path", "extras")``.

Copilot uses AI. Check for mistakes.
"Cannot save config tree to file. 'paths.output_dir' is not specified in the config."
)
else:
with open(Path(cfg.path.output_dir, "config_tree.log"), "w") as file:
Copy link

Copilot AI Jan 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inconsistency between validation and usage: Line 250 checks for "paths" (cfg.get("paths") or not cfg.paths.get("output_dir")) and line 252 references "paths.output_dir" in the error message, but line 255 uses "path.output_dir" (cfg.path.output_dir). This will cause an AttributeError when trying to save the file since the validation checks for "paths" but then tries to access "path". Either both should use "paths" or both should use "path" consistently.

Suggested change
with open(Path(cfg.path.output_dir, "config_tree.log"), "w") as file:
with open(Path(cfg.paths.output_dir, "config_tree.log"), "w") as file:

Copilot uses AI. Check for mistakes.
config_name="fabric_model_fusion",
overrides=[
"method=dummy",
"modelpool=CLIPVisionModelPool/clip-vit-base-patch16_TALL20.yaml",
Copy link

Copilot AI Jan 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The modelpool configuration path includes ".yaml" extension in the override, but Hydra typically expects config names without the extension. This should be "modelpool=CLIPVisionModelPool/clip-vit-base-patch16_TALL20" instead of "modelpool=CLIPVisionModelPool/clip-vit-base-patch16_TALL20.yaml".

Suggested change
"modelpool=CLIPVisionModelPool/clip-vit-base-patch16_TALL20.yaml",
"modelpool=CLIPVisionModelPool/clip-vit-base-patch16_TALL20",

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants