Skip to content

Alternative implementation of variables.py#1296

Merged
davidorme merged 20 commits intodevelopfrom
1295-problem-with-known_variables
Jan 30, 2026
Merged

Alternative implementation of variables.py#1296
davidorme merged 20 commits intodevelopfrom
1295-problem-with-known_variables

Conversation

@davidorme
Copy link
Copy Markdown
Collaborator

@davidorme davidorme commented Jan 28, 2026

Description

I was trying to implement a couple of things:

  • The data science team wanted units and description in NetCDF outputs. The obvious route there is to add attributes to the xarray.Dataset in Data from the KNOWN_VARIABLES when the Data.__set_item__ method ran. Then they're dumped automatically. You could also add them right before outputting them...
  • Whilst I was in that method, I thought it made sense to check that only things in KNOWN_VARIABLES can be added to Data - it turns out this is quite important for checking that developers do actually update the data_variables.toml (or are alerted to changes in names or deprecated variables)!

There's a problem though - because KNOWN_VARIABLES is a registry, it has a state that is dependent on the particular sequence of code execution. That's really hard in tests - the test_variables.py explicitly has a bunch of fixtures to reset the registries constantly because of this. If we want to use KNOWN_VARIABLES outside of just the variable setup (which I think we do) then all of the tests suffer the same issue.

I think the answer is to discard the registry approach for KNOWN_VARIABLES and RUNTIME_VARIABLES and make these explicit variables within the model.

I've added ve.core.variables_new.py:

  • Drops the registries.
  • Drops some functions and arguments that were only there to support the creation of the variables.rst file.
  • Merges the Variable and the VariableMetadata: they were very similar but Variable had the post init fields for the model vars and VariableMetadata used pydantic for validation. We now have a single pydantic dataclass that loads and validates data_variables.toml and that we can use for tracking variable usage in models
  • The VariableMetadata validation now handles the validation of the axes data - unique names that appear in the VALIDATORS registry.
  • Adds a load_known_variables function that returns dict[str, VariableMetadata]. I've used this in the main code to populate a new Data.known_variables attribute which seemed like a sensible home for it. This replaces register_all_variables.
  • Updates setup_variables and the various variables._collect... functions:
    • I've moved the checking that variables are known into a stand alone function (_check_model_variables_are_known) - less repetition, separation of concerns.
    • I've updated the arguments so that they accept known_variables and use that as necessary to add variables to a second runtime_variables dictionary, which is then used to track variable usage by models.
    • The setup_variables function now returns a dictionary of variables with populated model usage attributes.
  • I've fixed a bunch of missing variables
  • I've added the metadata to NetCDF outputs.

@dalonsoa Does this seem ok to you - it seems less brittle to me and allows us to use the variable database more widely. If so I'll polish up the rough edges and take it out of draft

Fixes #1295

Type of change

  • New feature (non-breaking change which adds functionality)
  • Optimization (back-end change that speeds up the code)
  • Bug fix (non-breaking change which fixes an issue)

Key checklist

  • Make sure you've run the pre-commit checks: $ pre-commit run -a
  • All tests pass: $ poetry run pytest

Further checks

  • Code is commented, particularly in hard-to-understand areas
  • Tests added that prove fix is effective or that feature works
  • Relevant documentation reviewed and updated

@davidorme davidorme linked an issue Jan 28, 2026 that may be closed by this pull request
@davidorme davidorme requested a review from dalonsoa January 28, 2026 16:18
Copy link
Copy Markdown
Collaborator

@dalonsoa dalonsoa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love it. Way cleaner and robust that what I coded back in the day. Very well done. I have a few comments and suggestions, none of them blockers.

for model in models:
for var in model.vars_populated_by_init:
if var in runtime_variables:
raise ValueError(
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe collect first all the things that are wrong and then raise the exception, as done in _check_model_variables_are_known

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah - I think this would be good. I think the easiest way to do it is to have these functions return a list of error messages. If we concatenate those at the bottom of setup_variables then it is either [] (hooray!) or not, in which case we log all the contents and raise a general error.

However, I'm going to kick that down the road!

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #1302

Comment on lines +138 to +147
def load_known_variables() -> dict[str, VariableMetadata]:
"""Loads the known variables from the variable database.

The pydantic classes handle validation of the input.
"""

with open(
str(resources.files("virtual_ecosystem") / "data_variables.toml"), "rb"
) as f:
known_vars = tomllib.load(f)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than hard coding this path here, make it an input argument, defaulting to this value, so that an alternative can be provided if needed, eg, via tests. Moreover, it could be useful to provide the known_vars object as input as well, so they can be loaded from elsewhere combining multiple variable files, if needed at some point via configuration. This will be one step to make VE extensible with new models without changing the base source code.

Something like:

Suggested change
def load_known_variables() -> dict[str, VariableMetadata]:
"""Loads the known variables from the variable database.
The pydantic classes handle validation of the input.
"""
with open(
str(resources.files("virtual_ecosystem") / "data_variables.toml"), "rb"
) as f:
known_vars = tomllib.load(f)
def load_known_variables(known_vars: dict[str, Any] | None = None, var_file: str | None = None) -> dict[str, VariableMetadata]:
"""Loads the known variables from the variable database.
The pydantic classes handle validation of the input.
"""
if var_file is None:
var_file = str(resources.files("virtual_ecosystem") / "data_variables.toml")
if known_vars is None:
with open(var_file) as f:
known_vars = tomllib.load(f)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added the filepath option - but I've stopped there for now because it opens further questions about how to resolve duplication and how to add new models. That is something I think we need to do - it shouldn't just be to add them into the package source - but I don't want to get deep into the weeds second guessing myself 😄

Or at least not today...

@davidorme
Copy link
Copy Markdown
Collaborator Author

Thanks @dalonsoa - I'll shift the testing from variables over to the new system and work through your comments. Once everything is passing, I'll send it back over, but if the main structure is sound then it shouldn't be a big change from this.

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Jan 29, 2026

Codecov Report

❌ Patch coverage is 93.61702% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 94.97%. Comparing base (10f6a5c) to head (f6954e5).

Files with missing lines Patch % Lines
virtual_ecosystem/core/base_model.py 71.42% 6 Missing ⚠️
virtual_ecosystem/core/data.py 66.66% 3 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1296      +/-   ##
===========================================
- Coverage    95.00%   94.97%   -0.04%     
===========================================
  Files           71       71              
  Lines         7394     7358      -36     
===========================================
- Hits          7025     6988      -37     
- Misses         369      370       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@davidorme davidorme marked this pull request as ready for review January 29, 2026 21:09
@davidorme
Copy link
Copy Markdown
Collaborator Author

@dalonsoa Could you give the new module a final check?

  • I've tidied up the new variables module and updated the docstrings.
  • I've ported your original test suite for variables over to the new structure.
  • The number of files has ballooned horribly but they are 90% changing variable names in test modules and input files, because adhoc variable names cannot now be added to the Data object.
  • I've worked around that in a couple of tests, where it would be less clear to use real variable names.
  • I've moved to_camel_case and _discover_models into core.base_model - they're more at home there.
  • I've removed the old implementation.
  • I've fixed the niggles in tests and docs. That includes removing the old variables.rst mechanism completely. That is being replaced by that new Java powered table coming in via [META-META!] Feature branch for docs update #1248.

@davidorme davidorme requested a review from dalonsoa January 29, 2026 23:40
Copy link
Copy Markdown
Collaborator

@dalonsoa dalonsoa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All good from my side. Much, much nicer, indeed.

@davidorme davidorme merged commit 9375b28 into develop Jan 30, 2026
13 checks passed
@davidorme davidorme deleted the 1295-problem-with-known_variables branch January 30, 2026 10:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Problem with KNOWN_VARIABLES

3 participants