
feat!: Rewrite dataset Schema to fix issue 911 #916

Merged
anaprietonem merged 11 commits into main from fix/dataset_schema_911
Feb 20, 2026


@anaprietonem
Contributor

@anaprietonem anaprietonem commented Feb 18, 2026

Description

Breaking change to the dataloader dataset config: per-dataset reader options now live under dataset_config, with dataset as the (inner) source key, aligned with open_dataset({"dataset": ..., ...}).
This PR updates:

  • schema validation (dataset_config/dataset only)
  • dataset loading flow
  • default dataloader templates (native_grid, multi)
  • unit tests
  • docs and migration guidance.

Old dataset/name shapes are no longer supported.
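To make the new shape concrete, here is a minimal Python sketch. It assumes that a per-dataset entry holds the reader options under dataset_config (with dataset as the source key), keeps start/end as outer keys, and that both end up merged into the single dict passed to open_dataset. The entry layout, the path, the frequency option and the helper to_open_dataset_spec are hypothetical illustrations, not the actual anemoi-training schema or code.

```python
from __future__ import annotations

from typing import Any

# Hypothetical per-dataset entry (illustrative only, not the real schema):
# reader options sit under "dataset_config", "dataset" is the inner source key,
# and start/end stay outside dataset_config.
EXAMPLE_ENTRY: dict[str, Any] = {
    "dataset_config": {
        "dataset": "path/to/dataset.zarr",  # hypothetical source
        "frequency": "6h",  # example reader option
    },
    "start": 2020,
    "end": 2021,
}


def to_open_dataset_spec(entry: dict[str, Any]) -> dict[str, Any]:
    """Merge dataset_config with the outer start/end into one open_dataset-style dict.

    Hypothetical helper: it only builds the dict, it does not call anemoi-datasets.
    """
    spec = dict(entry["dataset_config"])  # shallow copy, the config is left untouched
    if "dataset" not in spec:
        raise KeyError("dataset_config must contain a 'dataset' key")
    for key in ("start", "end"):
        # Only forward outer dates that are actually set.
        if entry.get(key) is not None:
            spec[key] = entry[key]
    return spec


print(to_open_dataset_spec(EXAMPLE_ENTRY))
# {'dataset': 'path/to/dataset.zarr', 'frequency': '6h', 'start': 2020, 'end': 2021}
```

Under this assumption, the resulting dict has the same form as the open_dataset({"dataset": ..., ...}) argument, which is the point of aligning the config with that call.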

What problem does this change solve?

#911

What issue or task does this change relate to?

#911

Additional notes

As a contributor to the Anemoi framework, please ensure that your changes include unit tests, updates to any affected dependencies and documentation, and have been tested in a parallel setting (i.e., with multiple GPUs). As a reviewer, you are also responsible for verifying these aspects and requesting changes if they are not adequately addressed. For guidelines about those please refer to https://anemoi.readthedocs.io/en/latest/

By opening this pull request, I affirm that all authors agree to the Contributor License Agreement.


📚 Documentation preview 📚: https://anemoi-training--916.org.readthedocs.build/en/916/


📚 Documentation preview 📚: https://anemoi-graphs--916.org.readthedocs.build/en/916/


📚 Documentation preview 📚: https://anemoi-models--916.org.readthedocs.build/en/916/

@anaprietonem anaprietonem added ATS Approval Needed Approval needed by ATS labels Feb 19, 2026
@anaprietonem anaprietonem requested review from JPXKQX, VeraChristina, dietervdb-meteo and icedoom888 and removed request for icedoom888 February 19, 2026 15:52
@icedoom888
Contributor

LGTM, running some jobs with this rn

@dietervdb-meteo
Contributor

Ok, for me as well. [didn't have time to look in detail though]

@anaprietonem anaprietonem force-pushed the fix/dataset_schema_911 branch from 4669d8e to 502e372 February 20, 2026 08:51
@anaprietonem anaprietonem added ATS Approved Approved by ATS and removed ATS Approval Needed Approval needed by ATS labels Feb 20, 2026
@anaprietonem
Contributor Author

@ecmwf/anemoi_technical_subgroup - this PR was discussed and it was agreed that the breaking changes are accepted. Implementation details should be discussed at PR level.

@anaprietonem
Contributor Author

Many thanks @dietervdb-meteo @icedoom888! I am now running the integration tests to confirm. Dieter, the current implementation keeps start/end as outer keys (not inside dataset_config). There is a check (with a test) that they can't be passed at the inner level, to avoid confusion. What do you think about this?
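The check mentioned in this comment could look roughly like the following sketch, which rejects start/end when they appear at the top level of dataset_config. The function name check_no_inner_dates, the error type and the message are assumptions for illustration; the check in the PR may be implemented differently.

```python
from __future__ import annotations

from typing import Any

# Keys that, in this sketch, must be given next to dataset_config, not inside it.
OUTER_ONLY_KEYS = ("start", "end")


def check_no_inner_dates(dataset_config: dict[str, Any]) -> None:
    """Raise ValueError if start/end are set at the top level of dataset_config.

    Hypothetical helper, not the anemoi-training validator.
    """
    clashes = [key for key in OUTER_ONLY_KEYS if key in dataset_config]
    if clashes:
        raise ValueError(
            f"{', '.join(clashes)} must be set next to dataset_config, not inside it"
        )


check_no_inner_dates({"dataset": "path/to/dataset.zarr"})  # passes silently
try:
    check_no_inner_dates({"dataset": "path/to/dataset.zarr", "start": 2020})
except ValueError as err:
    print(err)  # start must be set next to dataset_config, not inside it
```

Failing early at validation time keeps a single, unambiguous place for the training period, at the cost of rejecting configs that used to set dates inside the reader options.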

@dietervdb-meteo
Copy link
Contributor

Hi @anaprietonem, thanks. I found peace with leaving start and end as separate arguments :) So far so good. Not sure a check forbidding them inside is a good idea, though. Are we sure this still allows use cases where, e.g., a user concatenates two datasets by providing start and end for each of them, then selects sub-periods for training and validation by providing start and end in the dataloader?

@dietervdb-meteo
Copy link
Contributor

I.e. although I see the point of a check to avoid confusion, we should make sure it doesn't exclude valid use cases that existed before. Maybe a warning? Or just leave it as is? [We had the 'confusing' situation before as well]
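One way to address this concern, sketched under assumptions: inspect only the top level of dataset_config and emit a warning rather than an error, with an opt-in strict mode. The sketch also assumes the source under dataset can itself be a concat spec whose members carry their own start/end, as in the use case above; such nested dates are not flagged, while the dataloader-level start/end remain the outer keys. All names and paths are hypothetical, and this PR does not say whether the real check looks at nested keys.

```python
from __future__ import annotations

import warnings
from typing import Any

OUTER_ONLY_KEYS = ("start", "end")


def inner_date_keys(dataset_config: dict[str, Any]) -> list[str]:
    """Return the start/end keys found at the top level of dataset_config only."""
    return [key for key in OUTER_ONLY_KEYS if key in dataset_config]


def check_inner_dates(dataset_config: dict[str, Any], strict: bool = False) -> None:
    """Warn (default) or raise (strict=True) about top-level start/end.

    Hypothetical helper; nested dicts such as concat members are not inspected.
    """
    clashes = inner_date_keys(dataset_config)
    if not clashes:
        return
    message = f"{', '.join(clashes)} found inside dataset_config; prefer the outer keys"
    if strict:
        raise ValueError(message)
    warnings.warn(message, UserWarning, stacklevel=2)


# The use case from the comment (assumed layout): two datasets concatenated,
# each restricted to its own period inside the concat members.
CONCAT_CONFIG = {
    "dataset": {
        "concat": [
            {"dataset": "dataset-a.zarr", "start": 1979, "end": 1999},
            {"dataset": "dataset-b.zarr", "start": 2000, "end": 2020},
        ]
    }
}

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    check_inner_dates(CONCAT_CONFIG)  # nested dates only: no warning
    check_inner_dates({"dataset": "dataset-a.zarr", "start": 2000})  # top level: warns
print(len(caught))  # 1
```

A warning keeps previously valid configs working while still pointing users to the outer keys; restricting the check to the top level is what lets per-member concat periods pass.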

Contributor

@dietervdb-meteo dietervdb-meteo left a comment


OK with the current config layout. As @JPXKQX wrote, we can revisit this at a later stage in a wider context.

@icedoom888
Copy link
Contributor

Tested and working!

@github-project-automation github-project-automation bot moved this from To be triaged to For merging in Anemoi-dev Feb 20, 2026
@anaprietonem anaprietonem merged commit b198783 into main Feb 20, 2026
14 checks passed
@anaprietonem anaprietonem deleted the fix/dataset_schema_911 branch February 20, 2026 14:00
@github-project-automation github-project-automation bot moved this from For merging to Done in Anemoi-dev Feb 20, 2026
@DeployDuck DeployDuck mentioned this pull request Feb 20, 2026

Projects

Status: Done


4 participants