Conversation
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## develop #371 +/- ##
===========================================
- Coverage 93.30% 92.01% -1.29%
===========================================
Files 59 60 +1
Lines 2866 2906 +40
===========================================
Hits 2674 2674
- Misses 192 232 +40 ☔ View full report in Codecov by Sentry. |
|
The approach looks generally sensible to me. In terms of the first outstanding question I guess one of the main pieces of information we are looking to have recorded for variables is basically which models require particular variables to run properly, and which models create particular variables. The idea being that this would allow us to determine whether a certain combination of models can sensibly be run together and if not to give us information about why not. I don't have a particularly clear idea of where this would take place though |
I think another point we discussed is that such a registry could help with setting the order of selected models in setup and update |
| Overwritting __setattr__ is the way to prevent that both. Using `slots=True` | ||
| does not play well with `__init_subclass__`. | ||
| """ | ||
| raise TypeError("Variables are immutable.") |
There was a problem hiding this comment.
We would like to create/add variables during the setup() and first update(), would this be possible with this function? More generally, when would this registration happen? I think a lot of variables are initialized in the first update call.
There was a problem hiding this comment.
I think we'd either define them in core.variables (which we might want to break up into smaller units somehow because it could get big) or in a model.variables file? We would have to be able to do something like model.variables to support additional models. Then you would import the variables you need from those locations.
|
As far as I can tell, this looks like a good starting point. It isn't clear to me how this works/ when this happens, but I think that is your first question? If we want it to be created automatically from the selected models, it would have to contain all variables that are provided as I think registering the variables by variable name makes sense. Why would you want to protect the variable classes? In other words, why/how would anyone change them? I think for now, we would want to include the same fields as currently in data_variables.toml https://github.com/ImperialCollegeLondon/virtual_rainforest/blob/develop/virtual_rainforest/data_variables.toml plus the type. The file doesn't include types at the moment, and the list is not complete but gives you an idea. Example: Would it be easiest to update this file for you to get the required information on all variables? |
I think we'd want to add on axis dimensionality as well. I think we'd like to be able to replace the last three lines by some kind of dynamic system. The first two seem easy - when a model is configured (and hence registered), the |
There was a problem hiding this comment.
I'm wondering if we actually need classes for each variable here? We could have a VariableInfo dataclass and then register an instance of that for each name. That way we have a centralised registry but without needing to declare an actual classes for each variable.
In general, we're mostly retrieving variables from Data by a string name and we want to make sure that those strings are known variables with a central definition. So we'd be able to do something like:
var_name = 'air_temperature_ref'
var_info = VARIABLE_REGISTRY[var_name]
var_data = data[var_name]
print(f"Minimum {var_name} = {var_data.min()} {var_info.units}")I've pushed a sketch to core.variables_alt.py.
|
The simplified version of using directly the My main concern, in relation to @vgro comment is that if
There might be other checks that are sensible to write that help defining in what sequence models need to run (or at least identify incompatible models). If all these checks pass, then the overall simulation is deemed to be compatible and then the models are initialised - and with them the unique copy of the variables they are in charge of initialising, populating the registry. |
Yup - I completely agree. The model initialisation would involve some kind of |
|
Ok, so I'm booking myself Thursday and Friday next week to work on this. My plan would be: Step 1 (one PR)
Step 2 (another PR, after the above is merged)
Step 3 (yet another PR)
I'm sure I'm missing something and I will need to improvise, but is there anything obvious I'm forgetting? |
|
Looks good to me - I think it is likely that we will have more than one model updating a variable. Those are likely to be particularly tricky parts of the simulation, so having a warning or some kind of flag that this is allowed for this variable might make sense. Somewhere there needs to be a module/modules where variables are declared defined. That has to be flexible so that new user models can add to the variable registry, but we might want to make our lives simple by having |
|
The variables will be declared by the model that initialises them, and then added automatically to the registry, right? The models using or updating the variables will get them from the registry. So, we do not really need to specify where to declare the variables. Am I missing anything? |
|
That's tricky actually! We have initial input data - so the forcing variables provided from file - that are loaded in the initial data setup and configuration. Then the internal variables have to be initialised by a model and are then read or updated by a model (or models). So there are a couple of places that variables should be registered from. I'm wondering if it might make sense for both the models and the core to have |
|
Requiring a |
|
I think it makes for a neatly modular setup. Those variable registrations will be quite verbose and keeping them out of the model file seems desirable. It's also then a very clear step in the model creation guide - before creating your model, you will need to think about variables and units and populate |
|
Closing this one as work is moving on in a separate PR. Opened an issue to monitor progress. |
Description
Initial attempt at creating a Variable class that can be used to define the metadata of all the variables in Virtual Rainforest. For now, a proof of concept to gather feedback on the implementation.
I'm assuming here that:
Mind that this approach prevents to create two variables with the same names, no matter where in the program they are created, so it suppresses any ambiguity.
Outstanding questions
variable.nameok or is shall we use a different field?get_variable_registry()?Fixes # (issue)
Type of change
Key checklist
pre-commitchecks:$ pre-commit run -a$ poetry run pytestFurther checks