Skip to content

Add initial version of Variables#371

Closed
dalonsoa wants to merge 7 commits intodevelopfrom
variables
Closed

Add initial version of Variables#371
dalonsoa wants to merge 7 commits intodevelopfrom
variables

Conversation

@dalonsoa
Copy link
Copy Markdown
Collaborator

@dalonsoa dalonsoa commented Jan 23, 2024

Description

Initial attempt at creating a Variable class that can be used to define the metadata of all the variables in Virtual Rainforest. For now, a proof of concept to gather feedback on the implementation.

I'm assuming here that:

  • Only one class for any given variable can exist.
  • The name given to a variable must be unique.
  • Variables are singleton - so only one instance of each variable can exist.
  • Variables are immutable and will not be changed anywhere in the project.

Mind that this approach prevents to create two variables with the same names, no matter where in the program they are created, so it suppresses any ambiguity.

Outstanding questions

  • How is the registry meant to be used?
  • Is it registering the variables by variable.name ok or is shall we use a different field?
  • Do we need to protect it somehow so that it can only be modified when creating a new subclass, eg. by making it private and only exposing a copy of it with a function like get_variable_registry()?
  • What fields do you want to include? Need full list with types.
  • What variables do you want to include? Need full list with field values.

Fixes # (issue)

Type of change

  • New feature (non-breaking change which adds functionality)
  • Optimization (back-end change that speeds up the code)
  • Bug fix (non-breaking change which fixes an issue)

Key checklist

  • Make sure you've run the pre-commit checks: $ pre-commit run -a
  • All tests pass: $ poetry run pytest

Further checks

  • Code is commented, particularly in hard-to-understand areas
  • Tests added that prove fix is effective or that feature works

@dalonsoa dalonsoa marked this pull request as draft January 23, 2024 11:06
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Jan 23, 2024

Codecov Report

Attention: 40 lines in your changes are missing coverage. Please review.

Comparison is base (e9365f0) 93.30% compared to head (c147b5d) 92.01%.

Files Patch % Lines
virtual_rainforest/core/variables.py 0.00% 40 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #371      +/-   ##
===========================================
- Coverage    93.30%   92.01%   -1.29%     
===========================================
  Files           59       60       +1     
  Lines         2866     2906      +40     
===========================================
  Hits          2674     2674              
- Misses         192      232      +40     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@jacobcook1995
Copy link
Copy Markdown
Collaborator

The approach looks generally sensible to me.

In terms of the first outstanding question I guess one of the main pieces of information we are looking to have recorded for variables is basically which models require particular variables to run properly, and which models create particular variables. The idea being that this would allow us to determine whether a certain combination of models can sensibly be run together and if not to give us information about why not. I don't have a particularly clear idea of where this would take place though

@vgro
Copy link
Copy Markdown
Collaborator

vgro commented Jan 29, 2024

The approach looks generally sensible to me.

In terms of the first outstanding question I guess one of the main pieces of information we are looking to have recorded for variables is basically which models require particular variables to run properly, and which models create particular variables. The idea being that this would allow us to determine whether a certain combination of models can sensibly be run together and if not to give us information about why not. I don't have a particularly clear idea of where this would take place though

I think another point we discussed is that such a registry could help with setting the order of selected models in setup and update

Overwritting __setattr__ is the way to prevent that both. Using `slots=True`
does not play well with `__init_subclass__`.
"""
raise TypeError("Variables are immutable.")
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would like to create/add variables during the setup() and first update(), would this be possible with this function? More generally, when would this registration happen? I think a lot of variables are initialized in the first update call.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we'd either define them in core.variables (which we might want to break up into smaller units somehow because it could get big) or in a model.variables file? We would have to be able to do something like model.variables to support additional models. Then you would import the variables you need from those locations.

@vgro
Copy link
Copy Markdown
Collaborator

vgro commented Jan 29, 2024

As far as I can tell, this looks like a good starting point.

It isn't clear to me how this works/ when this happens, but I think that is your first question? If we want it to be created automatically from the selected models, it would have to contain all variables that are provided as required_init_vars and updated_variables ( I think this has changed recently but the idea is the same).

I think registering the variables by variable name makes sense.

Why would you want to protect the variable classes? In other words, why/how would anyone change them?

I think for now, we would want to include the same fields as currently in data_variables.toml https://github.com/ImperialCollegeLondon/virtual_rainforest/blob/develop/virtual_rainforest/data_variables.toml plus the type. The file doesn't include types at the moment, and the list is not complete but gives you an idea. Example:

name = "air_temperature"
description = "Air temperature"
unit = "C"
type="DataArray"
initialised_by = ["abiotic", "abiotic_simple"]
updated_by = ["abiotic", "abiotic_simple"]
used_by = [ "abiotic", "abiotic_simple", "plants", "animals"]

Would it be easiest to update this file for you to get the required information on all variables?

@davidorme
Copy link
Copy Markdown
Collaborator

I think for now, we would want to include the same fields as currently in data_variables.toml https://github.com/ImperialCollegeLondon/virtual_rainforest/blob/develop/virtual_rainforest/data_variables.toml plus the type. The file doesn't include types at the moment, and the list is not complete but gives you an idea. Example:

name = "air_temperature"
description = "Air temperature"
unit = "C"
type="DataArray"
initialised_by = ["abiotic", "abiotic_simple"]
updated_by = ["abiotic", "abiotic_simple"]
used_by = [ "abiotic", "abiotic_simple", "plants", "animals"]

I think we'd want to add on axis dimensionality as well. I think we'd like to be able to replace the last three lines by some kind of dynamic system. The first two seem easy - when a model is configured (and hence registered), the required_init_var and update_vars attributes can be used to populate those lists. Not so sure about the last one, unless we add a requirement for a full list of variables used. Can we capture when a registered module simply imports a variable?

Copy link
Copy Markdown
Collaborator

@davidorme davidorme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if we actually need classes for each variable here? We could have a VariableInfo dataclass and then register an instance of that for each name. That way we have a centralised registry but without needing to declare an actual classes for each variable.

In general, we're mostly retrieving variables from Data by a string name and we want to make sure that those strings are known variables with a central definition. So we'd be able to do something like:

var_name = 'air_temperature_ref'
var_info = VARIABLE_REGISTRY[var_name]
var_data = data[var_name]
print(f"Minimum {var_name} = {var_data.min()} {var_info.units}")

I've pushed a sketch to core.variables_alt.py.

@dalonsoa
Copy link
Copy Markdown
Collaborator Author

dalonsoa commented Jan 31, 2024

The simplified version of using directly the Variable dataclass looks good to me, if it fulfills your needs.

My main concern, in relation to @vgro comment is that if initialised_by, updated_by, and used_by are part of the Variable information then it will be challenging to add another model as a bunch of variables will need to be updated manually. I think it will make things easier if this information is part of the model definition and then have a workflow like:

  • The config information on the models to be run is loaded
  • Lists of initialised_by, updated_by, and used_by are compiled at runtime from the definition of the models to be run (they need to provide this information as class attributes)
  • There is some validation done on these lists to ensure that:
    • No variable is initialised by two models
    • There is no variable used or updated that has not been initialised
    • No two models update the same variable (or they do, I don't know if that is a use case)

There might be other checks that are sensible to write that help defining in what sequence models need to run (or at least identify incompatible models). If all these checks pass, then the overall simulation is deemed to be compatible and then the models are initialised - and with them the unique copy of the variables they are in charge of initialising, populating the registry.

@davidorme
Copy link
Copy Markdown
Collaborator

I think it will make things easier if this information is part of the model definition and then have a workflow

Yup - I completely agree. The model initialisation would involve some kind of Variable resolution step from the configured model selection.

@dalonsoa
Copy link
Copy Markdown
Collaborator Author

dalonsoa commented Feb 8, 2024

Ok, so I'm booking myself Thursday and Friday next week to work on this. My plan would be:

Step 1 (one PR)

  • Start from the simplified version created by @davidorme
  • Add a step somewhere in the Config class after the required models have been pulled out of the configuration to populate the initialised_by, updated_by, and used_by lists. The models might need updating.
  • Run some validation based initially on the following conditions, raising an error if they are broken:
    • No variable is initialised by two models
    • There is no variable used or updated that has not been initialised
    • No two models update the same variable (or they do, I don't know if that is a use case)

Step 2 (another PR, after the above is merged)

  • Update the get_model_sequence function to use this information, as well as (or in place of?) the explicit dependencies already in place, to try to come up with the right sequence.
  • Update the models so they initialise the variables when they are initialised and the registry of initialised variables is populated.

Step 3 (yet another PR)

  • Update the models so they use the information from this registry.

I'm sure I'm missing something and I will need to improvise, but is there anything obvious I'm forgetting?

@davidorme
Copy link
Copy Markdown
Collaborator

Looks good to me - I think it is likely that we will have more than one model updating a variable. Those are likely to be particularly tricky parts of the simulation, so having a warning or some kind of flag that this is allowed for this variable might make sense.

Somewhere there needs to be a module/modules where variables are declared defined. That has to be flexible so that new user models can add to the variable registry, but we might want to make our lives simple by having core.variables. Although given that there will be a registry it doesn't really matter where a variable is created, other than consistenct.

@dalonsoa
Copy link
Copy Markdown
Collaborator Author

The variables will be declared by the model that initialises them, and then added automatically to the registry, right? The models using or updating the variables will get them from the registry. So, we do not really need to specify where to declare the variables. Am I missing anything?

@davidorme
Copy link
Copy Markdown
Collaborator

That's tricky actually! We have initial input data - so the forcing variables provided from file - that are loaded in the initial data setup and configuration. Then the internal variables have to be initialised by a model and are then read or updated by a model (or models). So there are a couple of places that variables should be registered from.

I'm wondering if it might make sense for both the models and the core to have variables.py files that can then be handled during the model registration process. That is used to always load the core but then only loads configured models.

@dalonsoa
Copy link
Copy Markdown
Collaborator Author

Requiring a variables.py to initialise the variables sounds like a good plan. Let's see if I can figure out how to put this together.

@davidorme
Copy link
Copy Markdown
Collaborator

davidorme commented Feb 15, 2024

I think it makes for a neatly modular setup. Those variable registrations will be quite verbose and keeping them out of the model file seems desirable. It's also then a very clear step in the model creation guide - before creating your model, you will need to think about variables and units and populate variables.py.

This was referenced Feb 15, 2024
@dalonsoa
Copy link
Copy Markdown
Collaborator Author

Closing this one as work is moving on in a separate PR. Opened an issue to monitor progress.

@dalonsoa dalonsoa closed this Feb 15, 2024
@dalonsoa dalonsoa deleted the variables branch June 10, 2024 05:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants