Add initial version of Variables by dalonsoa · Pull Request #371 · ImperialCollegeLondon/virtual_ecosystem

dalonsoa · 2024-01-23T11:06:34Z

Description

Initial attempt at creating a Variable class that can be used to define the metadata of all the variables in Virtual Rainforest. For now, a proof of concept to gather feedback on the implementation.

I'm assuming here that:

Only one class for any given variable can exist.
The name given to a variable must be unique.
Variables are singleton - so only one instance of each variable can exist.
Variables are immutable and will not be changed anywhere in the project.

Mind that this approach prevents to create two variables with the same names, no matter where in the program they are created, so it suppresses any ambiguity.

Outstanding questions

How is the registry meant to be used?
Is it registering the variables by variable.name ok or is shall we use a different field?
Do we need to protect it somehow so that it can only be modified when creating a new subclass, eg. by making it private and only exposing a copy of it with a function like get_variable_registry()?
What fields do you want to include? Need full list with types.
What variables do you want to include? Need full list with field values.

Fixes # (issue)

Type of change

New feature (non-breaking change which adds functionality)
Optimization (back-end change that speeds up the code)
Bug fix (non-breaking change which fixes an issue)

Key checklist

Make sure you've run the pre-commit checks: $ pre-commit run -a
All tests pass: $ poetry run pytest

Further checks

Code is commented, particularly in hard-to-understand areas
Tests added that prove fix is effective or that feature works

codecov-commenter · 2024-01-23T11:09:19Z

Codecov Report

Attention: 40 lines in your changes are missing coverage. Please review.

Comparison is base (e9365f0) 93.30% compared to head (c147b5d) 92.01%.

Files	Patch %	Lines
virtual_rainforest/core/variables.py	0.00%	40 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##           develop     #371      +/-   ##
===========================================
- Coverage    93.30%   92.01%   -1.29%     
===========================================
  Files           59       60       +1     
  Lines         2866     2906      +40     
===========================================
  Hits          2674     2674              
- Misses         192      232      +40

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

jacobcook1995 · 2024-01-24T10:08:37Z

The approach looks generally sensible to me.

In terms of the first outstanding question I guess one of the main pieces of information we are looking to have recorded for variables is basically which models require particular variables to run properly, and which models create particular variables. The idea being that this would allow us to determine whether a certain combination of models can sensibly be run together and if not to give us information about why not. I don't have a particularly clear idea of where this would take place though

vgro · 2024-01-29T08:48:22Z

The approach looks generally sensible to me.

In terms of the first outstanding question I guess one of the main pieces of information we are looking to have recorded for variables is basically which models require particular variables to run properly, and which models create particular variables. The idea being that this would allow us to determine whether a certain combination of models can sensibly be run together and if not to give us information about why not. I don't have a particularly clear idea of where this would take place though

I think another point we discussed is that such a registry could help with setting the order of selected models in setup and update

vgro · 2024-01-29T08:51:43Z

+        Overwritting __setattr__ is the way to prevent that both. Using `slots=True`
+        does not play well with `__init_subclass__`.
+        """
+        raise TypeError("Variables are immutable.")


We would like to create/add variables during the setup() and first update(), would this be possible with this function? More generally, when would this registration happen? I think a lot of variables are initialized in the first update call.

I think we'd either define them in core.variables (which we might want to break up into smaller units somehow because it could get big) or in a model.variables file? We would have to be able to do something like model.variables to support additional models. Then you would import the variables you need from those locations.

vgro · 2024-01-29T09:07:02Z

As far as I can tell, this looks like a good starting point.

It isn't clear to me how this works/ when this happens, but I think that is your first question? If we want it to be created automatically from the selected models, it would have to contain all variables that are provided as required_init_vars and updated_variables ( I think this has changed recently but the idea is the same).

I think registering the variables by variable name makes sense.

Why would you want to protect the variable classes? In other words, why/how would anyone change them?

I think for now, we would want to include the same fields as currently in data_variables.toml https://github.com/ImperialCollegeLondon/virtual_rainforest/blob/develop/virtual_rainforest/data_variables.toml plus the type. The file doesn't include types at the moment, and the list is not complete but gives you an idea. Example:

name = "air_temperature"
description = "Air temperature"
unit = "C"
type="DataArray"
initialised_by = ["abiotic", "abiotic_simple"]
updated_by = ["abiotic", "abiotic_simple"]
used_by = [ "abiotic", "abiotic_simple", "plants", "animals"]

Would it be easiest to update this file for you to get the required information on all variables?

davidorme · 2024-01-29T09:57:27Z

I think for now, we would want to include the same fields as currently in data_variables.toml https://github.com/ImperialCollegeLondon/virtual_rainforest/blob/develop/virtual_rainforest/data_variables.toml plus the type. The file doesn't include types at the moment, and the list is not complete but gives you an idea. Example:
name = "air_temperature"
description = "Air temperature"
unit = "C"
type="DataArray"
initialised_by = ["abiotic", "abiotic_simple"]
updated_by = ["abiotic", "abiotic_simple"]
used_by = [ "abiotic", "abiotic_simple", "plants", "animals"]

I think we'd want to add on axis dimensionality as well. I think we'd like to be able to replace the last three lines by some kind of dynamic system. The first two seem easy - when a model is configured (and hence registered), the required_init_var and update_vars attributes can be used to populate those lists. Not so sure about the last one, unless we add a requirement for a full list of variables used. Can we capture when a registered module simply imports a variable?

davidorme

I'm wondering if we actually need classes for each variable here? We could have a VariableInfo dataclass and then register an instance of that for each name. That way we have a centralised registry but without needing to declare an actual classes for each variable.

In general, we're mostly retrieving variables from Data by a string name and we want to make sure that those strings are known variables with a central definition. So we'd be able to do something like:

var_name = 'air_temperature_ref'
var_info = VARIABLE_REGISTRY[var_name]
var_data = data[var_name]
print(f"Minimum {var_name} = {var_data.min()} {var_info.units}")

I've pushed a sketch to core.variables_alt.py.

dalonsoa · 2024-01-31T09:02:03Z

The simplified version of using directly the Variable dataclass looks good to me, if it fulfills your needs.

My main concern, in relation to @vgro comment is that if initialised_by, updated_by, and used_by are part of the Variable information then it will be challenging to add another model as a bunch of variables will need to be updated manually. I think it will make things easier if this information is part of the model definition and then have a workflow like:

The config information on the models to be run is loaded
Lists of initialised_by, updated_by, and used_by are compiled at runtime from the definition of the models to be run (they need to provide this information as class attributes)
There is some validation done on these lists to ensure that:
- No variable is initialised by two models
- There is no variable used or updated that has not been initialised
- No two models update the same variable (or they do, I don't know if that is a use case)

There might be other checks that are sensible to write that help defining in what sequence models need to run (or at least identify incompatible models). If all these checks pass, then the overall simulation is deemed to be compatible and then the models are initialised - and with them the unique copy of the variables they are in charge of initialising, populating the registry.

davidorme · 2024-01-31T11:57:46Z

I think it will make things easier if this information is part of the model definition and then have a workflow

Yup - I completely agree. The model initialisation would involve some kind of Variable resolution step from the configured model selection.

dalonsoa · 2024-02-08T14:59:46Z

Ok, so I'm booking myself Thursday and Friday next week to work on this. My plan would be:

Step 1 (one PR)

Start from the simplified version created by @davidorme
Add a step somewhere in the Config class after the required models have been pulled out of the configuration to populate the initialised_by, updated_by, and used_by lists. The models might need updating.
Run some validation based initially on the following conditions, raising an error if they are broken:
- No variable is initialised by two models
- There is no variable used or updated that has not been initialised
- No two models update the same variable (or they do, I don't know if that is a use case)

Step 2 (another PR, after the above is merged)

Update the get_model_sequence function to use this information, as well as (or in place of?) the explicit dependencies already in place, to try to come up with the right sequence.
Update the models so they initialise the variables when they are initialised and the registry of initialised variables is populated.

Step 3 (yet another PR)

Update the models so they use the information from this registry.

I'm sure I'm missing something and I will need to improvise, but is there anything obvious I'm forgetting?

davidorme · 2024-02-08T16:09:00Z

Looks good to me - I think it is likely that we will have more than one model updating a variable. Those are likely to be particularly tricky parts of the simulation, so having a warning or some kind of flag that this is allowed for this variable might make sense.

Somewhere there needs to be a module/modules where variables are declared defined. That has to be flexible so that new user models can add to the variable registry, but we might want to make our lives simple by having core.variables. Although given that there will be a registry it doesn't really matter where a variable is created, other than consistenct.

dalonsoa · 2024-02-15T09:57:32Z

The variables will be declared by the model that initialises them, and then added automatically to the registry, right? The models using or updating the variables will get them from the registry. So, we do not really need to specify where to declare the variables. Am I missing anything?

davidorme · 2024-02-15T10:25:06Z

That's tricky actually! We have initial input data - so the forcing variables provided from file - that are loaded in the initial data setup and configuration. Then the internal variables have to be initialised by a model and are then read or updated by a model (or models). So there are a couple of places that variables should be registered from.

I'm wondering if it might make sense for both the models and the core to have variables.py files that can then be handled during the model registration process. That is used to always load the core but then only loads configured models.

dalonsoa · 2024-02-15T10:38:29Z

Requiring a variables.py to initialise the variables sounds like a good plan. Let's see if I can figure out how to put this together.

davidorme · 2024-02-15T11:21:31Z

I think it makes for a neatly modular setup. Those variable registrations will be quite verbose and keeping them out of the model file seems desirable. It's also then a very clear step in the model creation guide - before creating your model, you will need to think about variables and units and populate variables.py.

dalonsoa · 2024-02-15T16:16:30Z

Closing this one as work is moving on in a separate PR. Opened an issue to monitor progress.

dalonsoa added 3 commits January 23, 2024 10:54

Add initial version of Variables

8a5e49a

Add another debug check

d309849

Remove fields from init

a0337d5

dalonsoa marked this pull request as draft January 23, 2024 11:06

dalonsoa requested review from davidorme, jacobcook1995, robewers01 and vgro January 23, 2024 11:07

Fix typo

f8f24a8

Make variables inmutbale

c147b5d

vgro reviewed Jan 29, 2024

View reviewed changes

davidorme reviewed Jan 29, 2024

View reviewed changes

Alternative sketch of variable registry

4860a31

Merge branch 'develop' into variables

43282b0

This was referenced Feb 15, 2024

Add variables setup #388

Merged

Setup Variables (umbrella) #389

Closed

dalonsoa closed this Feb 15, 2024

davidorme mentioned this pull request Mar 1, 2024

Clarify role of required init vars #349

Closed

dalonsoa deleted the variables branch June 10, 2024 05:53

Conversation

dalonsoa commented Jan 23, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Outstanding questions

Type of change

Key checklist

Further checks

Uh oh!

codecov-commenter commented Jan 23, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

jacobcook1995 commented Jan 24, 2024

Uh oh!

vgro commented Jan 29, 2024

Uh oh!

vgro Jan 29, 2024

Choose a reason for hiding this comment

Uh oh!

davidorme Jan 29, 2024

Choose a reason for hiding this comment

Uh oh!

vgro commented Jan 29, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

davidorme commented Jan 29, 2024

Uh oh!

davidorme left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dalonsoa commented Jan 31, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

davidorme commented Jan 31, 2024

Uh oh!

dalonsoa commented Feb 8, 2024

Step 1 (one PR)

Step 2 (another PR, after the above is merged)

Step 3 (yet another PR)

Uh oh!

davidorme commented Feb 8, 2024

Uh oh!

dalonsoa commented Feb 15, 2024

Uh oh!

davidorme commented Feb 15, 2024

Uh oh!

dalonsoa commented Feb 15, 2024

Uh oh!

davidorme commented Feb 15, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dalonsoa commented Feb 15, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

dalonsoa commented Jan 23, 2024 •

edited

Loading

codecov-commenter commented Jan 23, 2024 •

edited

Loading

vgro commented Jan 29, 2024 •

edited

Loading

davidorme left a comment •

edited

Loading

dalonsoa commented Jan 31, 2024 •

edited

Loading

davidorme commented Feb 15, 2024 •

edited

Loading