Extend functionality of Wandb Config Diff script#687
Conversation
…nt compare wandb config script to also flatten list dicts
scripts/compare_wandb_configs.py
Outdated
| } | ||
| if len(keys_with_differences) > 0: | ||
| for k in sorted(keys_with_differences): | ||
| if isinstance(left_config[k], list) and isinstance(right_config[k], list): |
There was a problem hiding this comment.
You also don't need this if you treat lists as Mapping[int, Any]. And it will work right even if the list entries are complex. On the other hand, the output will look different / be less compact.
There was a problem hiding this comment.
Can you simplify this now that lists are properly flattened?
There was a problem hiding this comment.
Actually, it seems this will never happen now?
| dictionary (dict): The nested dictionary to be flattened. | ||
| parent_key (str, optional): The parent key to be prepended to the keys of the flattened dictionary. Defaults to "". | ||
| separator (str, optional): The separator to be used between the parent key and the keys of the flattened dictionary. Defaults to ".". | ||
| include_lists (bool, optional): Whether to convert lists to dictionaries with integer keys. Defaults to False. |
There was a problem hiding this comment.
Do we ever want to turn this off?
There was a problem hiding this comment.
not for now, but seems fine to extend this in case someone wants to add different logic for dealing w list-valued config params?
scripts/compare_wandb_configs.py
Outdated
| } | ||
| if len(keys_with_differences) > 0: | ||
| for k in sorted(keys_with_differences): | ||
| if isinstance(left_config[k], list) and isinstance(right_config[k], list): |
There was a problem hiding this comment.
Can you simplify this now that lists are properly flattened?
scripts/compare_wandb_configs.py
Outdated
| components = path.split("/") | ||
|
|
||
| # Remove common suffixes like 'allenai' | ||
| components = [c for c in components if c != "allenai"] |
There was a problem hiding this comment.
You don't think that's confusing? For copy and paste?
scripts/compare_wandb_configs.py
Outdated
|
|
||
| return |
scripts/compare_wandb_configs.py
Outdated
| } | ||
| if len(keys_with_differences) > 0: | ||
| for k in sorted(keys_with_differences): | ||
| if isinstance(left_config[k], list) and isinstance(right_config[k], list): |
There was a problem hiding this comment.
Actually, it seems this will never happen now?
|
You let me know when this is ready for another look? |
flatten_dict()in utilsflatten_dict()to also flatten any dicts that exist in Listsflatten_dict()Motivation is, while comparing configs, the current implementation doesn't perform comparison of some key aspects of the configs, namely config keys representing dataset paths (which are all List[str]) as well as keys like
config["evaluators.value"]which areList[Dict].The current behavior looks something like this:

where we can see that the fields
data.value.pathsandevaluators.valuearen't easily comparable.The new behavior looks like this:

where it preserves behavior of original script under old keys, but performs a specialized diff just for
data.pathsandevaluators.for
data.paths, the idea is that we just need to know the names of the datasets from the paths + how many files used. Added new function that does that specifically.