feat(training): Deprecate tensorboard and Harmonise loggers interface#850
…ecmwf/anemoi-core into deprecate_tensorboard_and_clean_loggers
@anaprietonem
Hey Alberto, we flag PRs with breaking config changes by adding a ! to the title. If I missed that here, it is because this PR should have provided backward compatibility. What error are you getting? If you send the error log I can have a look. Probably this shouldn't have been flagged as a breaking change PR. Alberto, sorry for the oversight.
Hey @anaprietonem, no worries! I would get: One for the whole tensorboard entry (which is gone) and another for the missing My question is: how do we make this more transparent?
Ideally I would like to know right away how to fix a config and which config entries have to be updated! Here is a dumb example generated with Gemini to get the feeling:
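A minimal sketch of that idea (not anemoi-core code, and not the example originally posted in the thread): check a config against a table of removed entries and report, for each hit, what to change. It assumes the config is a plain nested dictionary; `REMOVED_ENTRIES`, `path_exists`, `explain_outdated_entries`, and the `diagnostics.log.tensorboard` path are hypothetical names chosen for illustration.

```python
from typing import Any

# Hypothetical rule table: dotted config path -> hint shown to the user.
# The path below is an assumption made for illustration only.
REMOVED_ENTRIES = {
    "diagnostics.log.tensorboard": "TensorBoard logging was deprecated; delete this entry.",
}


def path_exists(config: dict[str, Any], dotted_path: str) -> bool:
    """Return True if every key of the dotted path exists in the nested config."""
    node: Any = config
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return False
        node = node[key]
    return True


def explain_outdated_entries(
    config: dict[str, Any],
    removed: dict[str, str] = REMOVED_ENTRIES,
) -> list[str]:
    """Collect one human-readable message per outdated entry found in the config."""
    return [
        f"'{path}' is no longer supported: {hint}"
        for path, hint in removed.items()
        if path_exists(config, path)
    ]


if __name__ == "__main__":
    user_config = {"diagnostics": {"log": {"tensorboard": {"enabled": False}}}}
    for message in explain_outdated_entries(user_config):
        print(message)
```

Reporting all outdated entries in one pass, instead of failing on the first one, would let a user fix a config in a single edit.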
Yes, so the ticket was initially discussed at ATS, and that's the one that had the label: #538. Still, it's true the PR could have used the ATS approved label too, to be more consistent. Regarding your question, it's an imperfect system at the moment and very much dependent on how you have your configs set up. For example, if you have a top-level config and just overwrite some parameters there, you likely don't get errors, because you get the defaults directly from the source folder (if you do get errors, it should be because the configs are broken on purpose, so that the user is aware something major has changed). If you don't use the repo defaults or have a different setup, I appreciate this process is a painful one at the moment. A usual way to get a feeling for what has changed is to look at the changes in the AICON integration test. The migration system is a good idea. But note that with checkpoints we don't migrate them on the fly either (it's still the user's responsibility to migrate and understand those migrations).
@anaprietonem FYI no one (except ECMWF probably) is running with configs built within the anemoi-core config folder. Every time a config change is introduced, it triggers a 1/2 hours effort to migrate or change all existing configs (without even mentioning the versioning issue: experiment X was trained using config Y and version Z, now config Y does not work with version Z.1, etc.). I think we should really prioritize a migration function that the user can trigger:
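A minimal sketch of what such a user-triggered migration could look like, assuming configs are plain nested dictionaries. Nothing here is existing anemoi-core API: `migrate_config`, `drop_tensorboard`, `MIGRATIONS`, and the `diagnostics.log.tensorboard` path are hypothetical names used only to illustrate the idea.

```python
import copy
from typing import Any, Callable

Config = dict[str, Any]
# A migration is a (description, step) pair; the step edits the config in place
# and returns True if it changed anything.
Migration = tuple[str, Callable[[Config], bool]]


def drop_tensorboard(config: Config) -> bool:
    """Remove the (assumed) diagnostics.log.tensorboard entry if it is present."""
    diagnostics = config.get("diagnostics")
    log = diagnostics.get("log") if isinstance(diagnostics, dict) else None
    if isinstance(log, dict) and "tensorboard" in log:
        del log["tensorboard"]
        return True
    return False


# Ordered list of migrations; new config changes would append a step here.
MIGRATIONS: list[Migration] = [
    ("remove diagnostics.log.tensorboard", drop_tensorboard),
]


def migrate_config(
    config: Config,
    migrations: list[Migration] = MIGRATIONS,
) -> tuple[Config, list[str]]:
    """Return a migrated copy of the config and the descriptions of applied steps."""
    migrated = copy.deepcopy(config)  # never modify the user's original config
    applied = [name for name, step in migrations if step(migrated)]
    return migrated, applied


if __name__ == "__main__":
    old = {"diagnostics": {"log": {"tensorboard": {"enabled": False}, "mlflow": {"enabled": True}}}}
    new, applied = migrate_config(old)
    print(applied)
    print(new)
```

Returning the list of applied steps keeps the migration transparent: the user can see exactly which entries were touched, and running the function again on an already migrated config applies nothing.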
Looking at the AICON integration test is a nice trick, but you can agree this is not really sustainable and doesn't really help with scaling changes to 100 configs.

Description
PR to clean up loggers:
Changes here include proposed changes in #530 (cc: @floriankrb )
What problem does this change solve?
What issue or task does this change relate to?
Additional notes
As a contributor to the Anemoi framework, please ensure that your changes include unit tests, updates to any affected dependencies and documentation, and have been tested in a parallel setting (i.e., with multiple GPUs). As a reviewer, you are also responsible for verifying these aspects and requesting changes if they are not adequately addressed. For guidelines about those please refer to https://anemoi.readthedocs.io/en/latest/
By opening this pull request, I affirm that all authors agree to the Contributor License Agreement.