Skip to content

Treat paths in config files as relative to config dir#269

Merged
alexdewar merged 2 commits intodevelopfrom
config_file_relative_paths
Aug 3, 2023
Merged

Treat paths in config files as relative to config dir#269
alexdewar merged 2 commits intodevelopfrom
config_file_relative_paths

Conversation

@alexdewar
Copy link
Copy Markdown
Collaborator

Description

In some places in config files, paths to other (netCDF) files are used, e.g.:

[[core.data.variable]]
file = "dummy_data/ERA5_land_dummy.nc"
var_name = "air_temperature_ref"

Currently these paths can be absolute or relative. The problem with the relative paths is that they are relative to wherever VR is being invoked from (i.e. the process's CWD) rather than the config file itself. This means that users have to always invoke vr_run from the same path relative to the config file otherwise things will break or manually edit the config file. Among other things, this presents difficulties in sharing data.

This PR fixes this problem by adding another step to the config loading which looks for file paths in the configuration and, if they are relative paths, adjusts them to be relative to the config file they were loaded from. This step happens after loading the TOML files, but before validation.

At present, this involves looking for keys called "file" as these were the only examples of file paths in the config files I could find. Admittedly, this is a bit of a hack, but unfortunately there are a limited number of ways we can solve this problem as fixing up the file paths has to occur before the various config dicts are merged together, so that we still know which config file each value came from (different config files may be in different folders and so their relative paths will have a different root).

I haven't written tests for this yet, as I wanted to get feedback first, but can do if people are happy with the implementation.

Fixes #264.

Type of change

  • New feature (non-breaking change which adds functionality)
  • Optimization (back-end change that speeds up the code)
  • Bug fix (non-breaking change which fixes an issue)

(Though technically this is a breaking change!)

Key checklist

  • Make sure you've run the pre-commit checks: $ pre-commit run -a
  • All tests pass: $ poetry run pytest

Further checks

  • Code is commented, particularly in hard-to-understand areas
  • Tests added that prove fix is effective or that feature works

(No tests added yet, but can do if people are happy with the implementation.)

Copy link
Copy Markdown
Collaborator

@jacobcook1995 jacobcook1995 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

I don't know if the iteration is needed actually, as I think we were planning on supplying all file paths within the [[core.data.variable]] table of tables. But I guess it's easier to have a general solution in place in case we change our minds

Copy link
Copy Markdown
Collaborator

@davidorme davidorme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm. This is a pain to do cleanly, isn't it. Have to admit I don't love the file test.

Having said that, the other way I can see is to introduce an isPath = true property on appropriate JSONSchema entries. That way, the code can unambiguously identify paths that need to be made absolute using an existing canonical source of validation information.
However, I think that then requires two passes and a new Validator:

  1. Merge the original config content to identify the modules and hence build the schema.
  2. Run a PathResolverValidator using that schema over the individual files in self.toml_contents: this validates contents but also look for entries with isPath=true and updates the value with the absolute paths of the toml_contents keys. It doesn't add defaults though, because has to happen on the last validation pass to avoid creating clashes.
  3. Now merge the updated toml_contents again and then run this through the current default filling validation.

This is a lot more complex but is also explicit about which inputs are paths.

@alexdewar
Copy link
Copy Markdown
Collaborator Author

I don't know if the iteration is needed actually, as I think we were planning on supplying all file paths within the [[core.data.variable]] table of tables. But I guess it's easier to have a general solution in place in case we change our minds

Aha. In that case, I'm tempted to make it simpler and just look at the [[core.data.variable]] entries.

@davidorme I'm not keen on making it much more complex without an obvious benefit. Would you be happy if we just look at the [[core.data.variable]] entries for now rather than every "file" entry? It is still a bit of a hack, but I don't want to spend ages on this if it's not helping with #239. We can always open an issue to implement this more generically at some point in the future.

@davidorme
Copy link
Copy Markdown
Collaborator

@alexdewar I'm happy with that. Maybe we raise 'stretch goal' issue off of my comment.

This is simpler. We can always add support for fixing other entries in future.
@alexdewar alexdewar force-pushed the config_file_relative_paths branch from 2bf7375 to 17fe32c Compare August 2, 2023 09:50
@alexdewar
Copy link
Copy Markdown
Collaborator Author

Done. I've also added tests.

Copy link
Copy Markdown
Collaborator

@jacobcook1995 jacobcook1995 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Copy Markdown
Collaborator

@davidorme davidorme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM too

@alexdewar alexdewar requested a review from dalonsoa August 3, 2023 08:27
Copy link
Copy Markdown
Collaborator

@dalonsoa dalonsoa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@alexdewar alexdewar merged commit 948e3e8 into develop Aug 3, 2023
@alexdewar alexdewar deleted the config_file_relative_paths branch August 3, 2023 10:10
@alexdewar alexdewar mentioned this pull request Aug 3, 2023
7 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

In config files, paths are relative to CWD

4 participants