Skip to content

New configuration generator#1093

Merged
davidorme merged 11 commits intodevelopfrom
1087-new-configuration-generator
Oct 21, 2025
Merged

New configuration generator#1093
davidorme merged 11 commits intodevelopfrom
1087-new-configuration-generator

Conversation

@davidorme
Copy link
Copy Markdown
Collaborator

@davidorme davidorme commented Oct 16, 2025

Description

This PR:

  • Adds the core.config_builder, which is a rework of core.config but using the new pydantic configuration classes.

  • It retains - with some minor alterations - the main helper functions from core.config for loading and merging dictionaries.

  • It adds the generate_configuration method, which takes the a compiled config dictionary and does two things:

    • It dynamically combines the core and model pydantic configuration models into a simulation specific validation model.
    • It passes the config data into that model, looks for any validation issues, and returns a validated pydantic dataclass containing the complete configuration settings for the simulation.
  • The existing home-brew Config class, which is a dict of config settings with a bunch of loading methods, is converted to ConfigurationLoader. Creating a class instance functions almost identically to Config and builds a single compiled configuration dictionary but this is no longer the endpoint. The ConfigurationLoader.get_configuration() method is used to pass the loaded data into generate_configuration and return the validated dataclass. This step now applies all of the validation that was previously handled through JSON Schema.

  • The ve_run function is extended to insert this new mechanism alongside the existing config loader. That turned out to work right out the box - the API for handling the inputs is basically the same, but now we can turn them into an integrated config model.

  • The PR updates the names of the model_name.model_config.ModelConfiguration classes to be model specific, using a simple expected name pattern from the model name model_name --> ModelNameConfiguration.

  • Tests - basically just a restructure of test_config.py for the new architecture.

LATE BREAKING DESIGN QUESTION

This needs resolving before a more wholesale review of the PR

Rolling out this change into the next steps of the configuration meta-issue #1085 has brought up a problem with using a dynamically generated configuration class. It definitely feels like the right thing to do to validate the configuration for the whole simulation once and to do that early - that's what the code currently does.

But... because the resulting Pydantic model instance is generated dynamically, the code does not explicitly define the class fields. We have to use a generic high level typing (Configuration) and then mypy makes it unusable because the nested fields are unknown. For example, if you then try to use configuration.plants.constants in PlantsModel.from_config(..., configuration, ...), the mypy throws all its toys out the cot, because it doesn't know that the plants attribute exists.

Ways forward:

  1. We can type all of the configuration stuff as Any. That will shut mypy up but equally we lose type checking on a core system. This is bad.

  2. We could validate early as is, and then say basically "OK - we know the data is good, we can then pass around the ConfigurationLoader and extract the model specific dictionary from data to get the specific model configuration instances inside models". That seems repetitive to me - we could use pydantic's BaseModel.model_construct() method to bypass validating, it still wasteful.

    plants_config: ModelConfiguration = ModelConfiguration.model_construct(
        configuration_loader_instance.data['plants']
    )
  3. We could drop the early validation and basically do the same as above, but validation only occurs at the Model level. It is possible that some models then need to load multiple sections and so validation gets repeated. This seems kinda clunky though and defers finding configuration problems until late in the startup. One of the key advantages to switching to pydantic is to be able to get the configuration validated right out of the gate.

  4. I think my preferred solution is that we create a specific Pydantic class to use as the base class for the compiled Configuration that provides an accessor for the first layer of fields:

    class CompiledConfiguration(Configuration):
        
        def get_subconfiguration(self, name: str):
            # handle core vs model.name
            try:
                return getattr(name)(self)
            except AttributeError:
                raise AttributeError(f"Model configuration for {name} not loaded")

    The in the from_config methods, we can write:

    plants_config: ModelConfiguration = configuration.get_subconfiguration("plants")

    Obviously you have to correctly type the output from that method with the actual ModelConfiguration class, but then mypy know about all the class internals and we've only validated once.

    I think this option also requires a change to the ModelConfiguration naming style. Before I spotted this issue, I'd intended that people could just use configuration.plants.etc and so the different implementations between models of plants.model_config.ModelConfiguration and abiotic.model_config.ModelConfiguration could be invisible. If we need to use those classes for typing within model code, having them all called the same thing is going to get awfully confusing and we're going to have to import ... as PlantsConfiguration if a model needs access to configuration settings from other models. So, I think it would be clearer if the classes actually have model specific names (as @dalonsoa actually originally suggested for config discovery).

TODO

Fixes #1087
Fixes #1088

Type of change

  • New feature (non-breaking change which adds functionality)
  • Optimization (back-end change that speeds up the code)
  • Bug fix (non-breaking change which fixes an issue)

Key checklist

  • Make sure you've run the pre-commit checks: $ pre-commit run -a
  • All tests pass: $ poetry run pytest

Further checks

  • Code is commented, particularly in hard-to-understand areas
  • Tests added that prove fix is effective or that feature works
  • Relevant documentation reviewed and updated

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Oct 16, 2025

Codecov Report

❌ Patch coverage is 91.81818% with 18 lines in your changes missing coverage. Please review.
✅ Project coverage is 94.52%. Comparing base (ad53286) to head (5e1477f).
⚠️ Report is 3 commits behind head on develop.

Files with missing lines Patch % Lines
virtual_ecosystem/core/config_builder.py 93.71% 12 Missing ⚠️
virtual_ecosystem/core/configuration.py 60.00% 4 Missing ⚠️
virtual_ecosystem/core/registry.py 71.42% 2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1093      +/-   ##
===========================================
- Coverage    94.60%   94.52%   -0.08%     
===========================================
  Files           89       90       +1     
  Lines         7651     7856     +205     
===========================================
+ Hits          7238     7426     +188     
- Misses         413      430      +17     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Base automatically changed from config_via_pydantic to develop October 17, 2025 13:44
@davidorme davidorme linked an issue Oct 17, 2025 that may be closed by this pull request
@davidorme davidorme requested a review from dalonsoa October 20, 2025 06:12
@davidorme
Copy link
Copy Markdown
Collaborator Author

@dalonsoa Could you have a quick look at this to get your thoughts on the design question in the PR description? The question hinges on this line in the new congfig_builder.py code, which assembles and validates a compiled class with the requested models:

https://github.com/ImperialCollegeLondon/virtual_ecosystem/pull/1093/files#diff-d295f93ddf30694239fe83cab2d4575ddc33b9b0f3163d9dbb88eca8d4138b8aR526-R531

@dalonsoa
Copy link
Copy Markdown
Collaborator

4. I think my preferred solution is that we create a specific Pydantic class to use as the base class for the compiled Configuration that provides an accessor for the first layer of fields:

```python
class CompiledConfiguration(Configuration):
    
    def get_subconfiguration(self, name: str):
        # handle core vs model.name
        try:
            return getattr(name)(self)
        except AttributeError:
            raise AttributeError(f"Model configuration for {name} not loaded")
```

The in the from_config methods, we can write:

```python
plants_config: ModelConfiguration = configuration.get_subconfiguration("plants")
```

Obviously you have to correctly type the output from that method with the actual ModelConfiguration class, but then mypy know about all the class internals and we've only validated once.

I don't have it clear at all how you can type annotate get_subconfiguration in such a way that it works for any model. Even if you refactor things so you have PlantsConfiguration, AnimalConfiguration, etc. you would need to indicate the type of the output in a static way, so mypy can handle it.

One option could be to use, in a somewhat hacky way, Generics. It could look something like this (not tested):

from typing import TypeVar, Callable

T = TypeVar("T")
...

    def get_subconfiguration(self, name: str, _: Callable[..., T]) -> T:

        try:
            return getattr(name)(self)
        except AttributeError:
            raise AttributeError(f"Model configuration for {name} not loaded")

And then using it as:

plants_config: PlantsConfiguration = configuration.get_subconfiguration("plants", PlantsConfiguration)

It is a bit hacky because the only purpose of the second input argument is to indicate the type to mypy, and it has no role at runtime, but yet it needs to be there.

I might be missing something, though.


Keeping aside the above, as mypy is an static type checker, I don't see many other options to type annotate this if we want to keep it fully flexible and working regardless of how many models we have on a particular run.

@davidorme
Copy link
Copy Markdown
Collaborator Author

davidorme commented Oct 20, 2025

@dalonsoa - indeed you're right and it doesn't work. Running it through into the models still leads to a mypy assignment error.

In my prototyping this afternoon, I've simply been using type: ignore in XYZ.from_config.

        model_configuration: AbioticConfiguration = configuration.get_subconfiguration(
            "abiotic"
        )  # type: ignore[assignment]

Keeping aside the above, as mypy is an static type checker, I don't see many other options to type annotate this if we want to keep it fully flexible and working regardless of how many models we have on a particular run.

Yup - I couldn't see anything either. My strong feeling is that we do want to keep this flexibility. I think the only way to avoid some kind of a mypy hack is to define a compiled configuration class that formally defines fields for all models as ModelConfiguration | None and then only populate the required ones for a given set of models. But that is hard-coding the set of possible models.

I then think it's a toss up between your solution - many thanks! - or my type: ignore[assignment].

I guess, would you agree that the dynamic system is what we want and that one of these two solutions is acceptable to stop mypy crippling what is actually a perfectly reasonable system?

@davidorme davidorme marked this pull request as ready for review October 20, 2025 15:03
Copy link
Copy Markdown
Collaborator

@dalonsoa dalonsoa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes look sensible. As you said, it does not look that different from the existing config. I've just spotted something that I would recommend to change, if not now, in the future, so approving.


.. code-block:: python

config_object = ConfigurationLoader(...).get_configuration()
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can equally do:

Suggested change
config_object = ConfigurationLoader(...).get_configuration()
config_object = get_configuration(ConfigurationLoader(...).data)

And save an unnecessary method.

Indeed, this kind of usage of classes where the generated object is directly ditched just after its creation, using ever a function or attribute of the object, is often considered an anti pattern. Why creating an object that you are not using anywhere else?

I would suggest you re-write this using normal functions, passing arguments as needed.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely true on the get_configuration method - will make that change. I can definitely see what you mean about the ConfigurationLoader class being a poltergeist. I'm going to make that a new issue - I'd like to get the new system pushed through so would prefer to park that for now.

@davidorme davidorme merged commit 5b885bc into develop Oct 21, 2025
13 checks passed
@davidorme davidorme deleted the 1087-new-configuration-generator branch October 21, 2025 10:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add new configuration generator into ve_run New configuration generator

3 participants