New configuration generator by davidorme · Pull Request #1093 · ImperialCollegeLondon/virtual_ecosystem

davidorme · 2025-10-16T20:31:09Z

Description

This PR:

Adds the core.config_builder, which is a rework of core.config but using the new pydantic configuration classes.
It retains - with some minor alterations - the main helper functions from core.config for loading and merging dictionaries.
It adds the generate_configuration method, which takes the a compiled config dictionary and does two things:
- It dynamically combines the core and model pydantic configuration models into a simulation specific validation model.
- It passes the config data into that model, looks for any validation issues, and returns a validated pydantic dataclass containing the complete configuration settings for the simulation.
The existing home-brew Config class, which is a dict of config settings with a bunch of loading methods, is converted to ConfigurationLoader. Creating a class instance functions almost identically to Config and builds a single compiled configuration dictionary but this is no longer the endpoint. The ConfigurationLoader.get_configuration() method is used to pass the loaded data into generate_configuration and return the validated dataclass. This step now applies all of the validation that was previously handled through JSON Schema.
The ve_run function is extended to insert this new mechanism alongside the existing config loader. That turned out to work right out the box - the API for handling the inputs is basically the same, but now we can turn them into an integrated config model.
The PR updates the names of the model_name.model_config.ModelConfiguration classes to be model specific, using a simple expected name pattern from the model name model_name --> ModelNameConfiguration.
Tests - basically just a restructure of test_config.py for the new architecture.

LATE BREAKING DESIGN QUESTION

This needs resolving before a more wholesale review of the PR

Rolling out this change into the next steps of the configuration meta-issue #1085 has brought up a problem with using a dynamically generated configuration class. It definitely feels like the right thing to do to validate the configuration for the whole simulation once and to do that early - that's what the code currently does.

But... because the resulting Pydantic model instance is generated dynamically, the code does not explicitly define the class fields. We have to use a generic high level typing (Configuration) and then mypy makes it unusable because the nested fields are unknown. For example, if you then try to use configuration.plants.constants in PlantsModel.from_config(..., configuration, ...), the mypy throws all its toys out the cot, because it doesn't know that the plants attribute exists.

Ways forward:

We can type all of the configuration stuff as Any. That will shut mypy up but equally we lose type checking on a core system. This is bad.
We could validate early as is, and then say basically "OK - we know the data is good, we can then pass around the ConfigurationLoader and extract the model specific dictionary from data to get the specific model configuration instances inside models". That seems repetitive to me - we could use pydantic's BaseModel.model_construct() method to bypass validating, it still wasteful.
```
plants_config: ModelConfiguration = ModelConfiguration.model_construct(
    configuration_loader_instance.data['plants']
)
```
We could drop the early validation and basically do the same as above, but validation only occurs at the Model level. It is possible that some models then need to load multiple sections and so validation gets repeated. This seems kinda clunky though and defers finding configuration problems until late in the startup. One of the key advantages to switching to pydantic is to be able to get the configuration validated right out of the gate.
I think my preferred solution is that we create a specific Pydantic class to use as the base class for the compiled Configuration that provides an accessor for the first layer of fields:
```
class CompiledConfiguration(Configuration):
    
    def get_subconfiguration(self, name: str):
        # handle core vs model.name
        try:
            return getattr(name)(self)
        except AttributeError:
            raise AttributeError(f"Model configuration for {name} not loaded")
```
The in the from_config methods, we can write:
```
plants_config: ModelConfiguration = configuration.get_subconfiguration("plants")
```
Obviously you have to correctly type the output from that method with the actual ModelConfiguration class, but then mypy know about all the class internals and we've only validated once.

I think this option also requires a change to the ModelConfiguration naming style. Before I spotted this issue, I'd intended that people could just use configuration.plants.etc and so the different implementations between models of plants.model_config.ModelConfiguration and abiotic.model_config.ModelConfiguration could be invisible. If we need to use those classes for typing within model code, having them all called the same thing is going to get awfully confusing and we're going to have to import ... as PlantsConfiguration if a model needs access to configuration settings from other models. So, I think it would be clearer if the classes actually have model specific names (as @dalonsoa actually originally suggested for config discovery).

TODO

I haven't quite made up my mind about the naming of all these things. I might wait until taking out the old system in Remove old configuration and constants systems #1091 when the existing namespace becomes less cluttered.
The docs will be wildly inaccurate - that is probably another PR after the dust settles in Remove old configuration and constants systems #1091

Fixes #1087
Fixes #1088

Type of change

New feature (non-breaking change which adds functionality)
Optimization (back-end change that speeds up the code)
Bug fix (non-breaking change which fixes an issue)

Key checklist

Make sure you've run the pre-commit checks: $ pre-commit run -a
All tests pass: $ poetry run pytest

Further checks

Code is commented, particularly in hard-to-understand areas
Tests added that prove fix is effective or that feature works
Relevant documentation reviewed and updated

codecov-commenter · 2025-10-16T20:39:57Z

Codecov Report

❌ Patch coverage is 91.81818% with 18 lines in your changes missing coverage. Please review.
✅ Project coverage is 94.52%. Comparing base (ad53286) to head (5e1477f).
⚠️ Report is 3 commits behind head on develop.

Files with missing lines	Patch %	Lines
virtual_ecosystem/core/config_builder.py	93.71%	12 Missing ⚠️
virtual_ecosystem/core/configuration.py	60.00%	4 Missing ⚠️
virtual_ecosystem/core/registry.py	71.42%	2 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #1093      +/-   ##
===========================================
- Coverage    94.60%   94.52%   -0.08%     
===========================================
  Files           89       90       +1     
  Lines         7651     7856     +205     
===========================================
+ Hits          7238     7426     +188     
- Misses         413      430      +17

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

davidorme · 2025-10-20T06:18:18Z

@dalonsoa Could you have a quick look at this to get your thoughts on the design question in the PR description? The question hinges on this line in the new congfig_builder.py code, which assembles and validates a compiled class with the requested models:

https://github.com/ImperialCollegeLondon/virtual_ecosystem/pull/1093/files#diff-d295f93ddf30694239fe83cab2d4575ddc33b9b0f3163d9dbb88eca8d4138b8aR526-R531

dalonsoa · 2025-10-20T13:11:48Z

4. I think my preferred solution is that we create a specific Pydantic class to use as the base class for the compiled Configuration that provides an accessor for the first layer of fields:
```python
class CompiledConfiguration(Configuration):
    
    def get_subconfiguration(self, name: str):
        # handle core vs model.name
        try:
            return getattr(name)(self)
        except AttributeError:
            raise AttributeError(f"Model configuration for {name} not loaded")
```
The in the from_config methods, we can write:
```python
plants_config: ModelConfiguration = configuration.get_subconfiguration("plants")
```
Obviously you have to correctly type the output from that method with the actual ModelConfiguration class, but then mypy know about all the class internals and we've only validated once.

I don't have it clear at all how you can type annotate get_subconfiguration in such a way that it works for any model. Even if you refactor things so you have PlantsConfiguration, AnimalConfiguration, etc. you would need to indicate the type of the output in a static way, so mypy can handle it.

One option could be to use, in a somewhat hacky way, Generics. It could look something like this (not tested):

from typing import TypeVar, Callable

T = TypeVar("T")
...

    def get_subconfiguration(self, name: str, _: Callable[..., T]) -> T:

        try:
            return getattr(name)(self)
        except AttributeError:
            raise AttributeError(f"Model configuration for {name} not loaded")

And then using it as:

plants_config: PlantsConfiguration = configuration.get_subconfiguration("plants", PlantsConfiguration)

It is a bit hacky because the only purpose of the second input argument is to indicate the type to mypy, and it has no role at runtime, but yet it needs to be there.

I might be missing something, though.

Keeping aside the above, as mypy is an static type checker, I don't see many other options to type annotate this if we want to keep it fully flexible and working regardless of how many models we have on a particular run.

davidorme · 2025-10-20T13:46:07Z

@dalonsoa - indeed you're right and it doesn't work. Running it through into the models still leads to a mypy assignment error.

In my prototyping this afternoon, I've simply been using type: ignore in XYZ.from_config.

        model_configuration: AbioticConfiguration = configuration.get_subconfiguration(
            "abiotic"
        )  # type: ignore[assignment]

Keeping aside the above, as mypy is an static type checker, I don't see many other options to type annotate this if we want to keep it fully flexible and working regardless of how many models we have on a particular run.

Yup - I couldn't see anything either. My strong feeling is that we do want to keep this flexibility. I think the only way to avoid some kind of a mypy hack is to define a compiled configuration class that formally defines fields for all models as ModelConfiguration | None and then only populate the required ones for a given set of models. But that is hard-coding the set of possible models.

I then think it's a toss up between your solution - many thanks! - or my type: ignore[assignment].

I guess, would you agree that the dynamic system is what we want and that one of these two solutions is acceptable to stop mypy crippling what is actually a perfectly reasonable system?

dalonsoa

The changes look sensible. As you said, it does not look that different from the existing config. I've just spotted something that I would recommend to change, if not now, in the future, so approving.

dalonsoa · 2025-10-21T08:03:47Z

virtual_ecosystem/core/config_builder.py

+
+.. code-block:: python
+
+    config_object = ConfigurationLoader(...).get_configuration()


You can equally do:

Suggested change

config_object = ConfigurationLoader(...).get_configuration()

config_object = get_configuration(ConfigurationLoader(...).data)

And save an unnecessary method.

Indeed, this kind of usage of classes where the generated object is directly ditched just after its creation, using ever a function or attribute of the object, is often considered an anti pattern. Why creating an object that you are not using anywhere else?

I would suggest you re-write this using normal functions, passing arguments as needed.

Definitely true on the get_configuration method - will make that change. I can definitely see what you mean about the ConfigurationLoader class being a poltergeist. I'm going to make that a new issue - I'd like to get the new system pushed through so would prefer to park that for now.

davidorme added 2 commits October 16, 2025 15:48

New configuration builder - overrides not implemented yet

be2afca

Relocated pydantic tester and removed spammy file output

18025ed

Base automatically changed from config_via_pydantic to develop October 17, 2025 13:44

Merge branch 'config_via_pydantic' into 1087-new-configuration-generator

936a40a

davidorme linked an issue Oct 17, 2025 that may be closed by this pull request

New configuration generator #1087

Closed

Merge branch 'develop' into 1087-new-configuration-generator

f6d73b7

davidorme mentioned this pull request Oct 17, 2025

Added config_builder to wrong PR #1094

Closed

davidorme added 2 commits October 18, 2025 00:07

Revised usage pattern

c091a37

Merge branch 'develop' into 1087-new-configuration-generator

bee787a

davidorme requested a review from dalonsoa October 20, 2025 06:12

Making changes for proposed option 4 in design Q in #1093

b40cc26

@dalonsoa's cunning type hinting fix

bc23c23

davidorme marked this pull request as ready for review October 20, 2025 15:03

Doc fix

c6dd49d

dalonsoa approved these changes Oct 21, 2025

View reviewed changes

davidorme mentioned this pull request Oct 21, 2025

ConfigurationLoader is a poltergeist. #1100

Open

davidorme added 2 commits October 21, 2025 10:54

Remove micromethod from ConfigLoader

7104dba

Merge branch 'develop' into 1087-new-configuration-generator

5e1477f

davidorme merged commit 5b885bc into develop Oct 21, 2025
13 checks passed

davidorme deleted the 1087-new-configuration-generator branch October 21, 2025 10:53

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

New configuration generator#1093

New configuration generator#1093
davidorme merged 11 commits intodevelopfrom
1087-new-configuration-generator

davidorme commented Oct 16, 2025 •

edited

Loading

Uh oh!

codecov-commenter commented Oct 16, 2025 •

edited

Loading

Uh oh!

davidorme commented Oct 20, 2025

Uh oh!

dalonsoa commented Oct 20, 2025

Uh oh!

davidorme commented Oct 20, 2025 •

edited

Loading

Uh oh!

dalonsoa left a comment

Uh oh!

dalonsoa Oct 21, 2025

Uh oh!

davidorme Oct 21, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants


		.. code-block:: python

		config_object = ConfigurationLoader(...).get_configuration()

	config_object = ConfigurationLoader(...).get_configuration()
	config_object = get_configuration(ConfigurationLoader(...).data)

Conversation

davidorme commented Oct 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

LATE BREAKING DESIGN QUESTION

TODO

Type of change

Key checklist

Further checks

Uh oh!

codecov-commenter commented Oct 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

davidorme commented Oct 20, 2025

Uh oh!

dalonsoa commented Oct 20, 2025

Uh oh!

davidorme commented Oct 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dalonsoa left a comment

Choose a reason for hiding this comment

Uh oh!

dalonsoa Oct 21, 2025

Choose a reason for hiding this comment

Uh oh!

davidorme Oct 21, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

davidorme commented Oct 16, 2025 •

edited

Loading

codecov-commenter commented Oct 16, 2025 •

edited

Loading

davidorme commented Oct 20, 2025 •

edited

Loading