feat(mlflow auth): Support for multiple servers #217
Merged
anaprietonem
approved these changes
Sep 29, 2025
Collaborator
anaprietonem
left a comment
Left a question about using this feature with syncing, but otherwise LGTM!
anaprietonem
pushed a commit
that referenced
this pull request
Sep 30, 2025
🤖 Automated Release PR

This PR was created by `release-please` to prepare the next release. Once merged:

1. A new version tag will be created
2. A GitHub release will be published
3. The changelog will be updated

Changes to be included in the next release:

---

## [0.4.37](0.4.36...0.4.37) (2025-09-30)

### Features

* **mlflow auth:** Support for multiple servers ([#217](#217)) ([8ccfb1a](8ccfb1a))

### Bug Fixes

* Update s3 chunk size to 10 MB ([#220](#220)) ([aa20fa8](aa20fa8))
* Use `yaml` and `json` flag in metadata get command ([#222](#222)) ([6af46c4](6af46c4))

---

> [!IMPORTANT]
> Please do not change the PR title, manifest file, or any other automatically generated content in this PR unless you understand the implications. Changes here can break the release process.

> ⚠️ Merging this PR will:
> - Create a new release
> - Trigger deployment pipelines
> - Update package versions

**Before merging:**

- Ensure all tests pass
- Review the changelog carefully
- Get required approvals

[Release-please documentation](https://github.com/googleapis/release-please)
gmertes
added a commit
to ecmwf/anemoi-core
that referenced
this pull request
Oct 1, 2025
## Description

Update the `mlflow login` command and anemoi-utils dependency to use the multi-server functionality introduced in ecmwf/anemoi-utils#217.

Also, introduce two new options that make use of this:

- `--list` list all known servers and their expiry time
- `--all` log in to all known servers

***As a contributor to the Anemoi framework, please ensure that your changes include unit tests, updates to any affected dependencies and documentation, and have been tested in a parallel setting (i.e., with multiple GPUs). As a reviewer, you are also responsible for verifying these aspects and requesting changes if they are not adequately addressed. For guidelines about those please refer to https://anemoi.readthedocs.io/en/latest/***

By opening this pull request, I affirm that all authors agree to the [Contributor License Agreement.](https://github.com/ecmwf/codex/blob/main/Legal/contributor_license_agreement.md)
ssmmnn11
pushed a commit
to ecmwf/anemoi-core
that referenced
this pull request
Oct 7, 2025
Description
Implement storing tokens from multiple mlflow servers on disk, so that the user can switch between them without having to log in every time.
For example, the following snippet logs in once to a prod and a test server and stores both server tokens on disk; authentication to either server can then happen without logging in again:
In practice, for users of anemoi-training, this means your training runs can switch seamlessly between logging to prod and test (assuming you have logged in to both servers once before).
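To make the idea concrete, here is a minimal, dependency-free sketch of a token file keyed by server URL. It is not the snippet from this PR and not the anemoi-utils API: `TokenStore`, `save_token`, `has_valid_token`, and the example URLs are hypothetical names, and it assumes `refresh_expires` is a Unix timestamp.

```python
import json
import tempfile
import time
from pathlib import Path
from typing import Optional


class TokenStore:
    """Hypothetical on-disk store holding one refresh token per server URL."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def _load(self) -> dict:
        # A missing file simply means no server has been logged in to yet.
        if not self.path.exists():
            return {}
        return json.loads(self.path.read_text())

    def save_token(self, url: str, refresh_token: str, refresh_expires: int) -> None:
        # Adding or updating one server leaves the other entries untouched.
        servers = self._load()
        servers[url] = {"refresh_token": refresh_token, "refresh_expires": refresh_expires}
        self.path.write_text(json.dumps(servers, indent=2))

    def has_valid_token(self, url: str, now: Optional[float] = None) -> bool:
        # Assumption: refresh_expires is a Unix timestamp in seconds.
        now = time.time() if now is None else now
        entry = self._load().get(url)
        return entry is not None and entry["refresh_expires"] > now


# "Log in" once to two servers; both tokens end up in the same file.
store = TokenStore(Path(tempfile.mkdtemp()) / "tokens.json")
store.save_token("https://prod.example", "prod-token", refresh_expires=2_000)
store.save_token("https://test.example", "test-token", refresh_expires=2_000)
```

Because each server has its own entry, checking or refreshing one server never invalidates the stored token of another.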
Notes
Pydantic is used to validate the data on disk. The token file is loaded into a `ServerStore`, which is itself a RootModel dictionary of `ServerConfig` objects, where the URL of each server is the root dictionary key. Each `ServerConfig` contains the token information for a given server.
I am deprecating the `load_config` function, which used to return the underlying data format directly. That function was there so that downstream code in anemoi-training could get the last used server URL from the config file. Now that there are multiple servers, that is handled by a different abstraction, `get_servers`. There is also no need for the underlying data format to be part of the public API. For now it remains available in deprecated form, and everything is still backwards compatible.
After merging this PR I will update the downstream code in anemoi-training to stop using `load_config`, and then we can eventually remove it after some time.
Data format
The legacy data format stored in the JSON file was a simple dictionary with all the information for one server:

    {
      "url": "https://server.url",
      "refresh_token": "refresh-token",
      "refresh_expires": 123
    }

The format has now changed into a dictionary of servers, where `url` is no longer a member but is used as the key:

    {
      "https://server-1.url": {
        "refresh_token": "refresh-token-1",
        "refresh_expires": 1
      },
      "https://server-2.url": {
        "refresh_token": "refresh-token-2",
        "refresh_expires": 2
      }
    }

Going from the legacy to the new format should be seamless for users: config files will be converted into the new format the next time a user logs in to a server. There are tests for both the legacy and the new format.
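The legacy-to-new conversion can be sketched roughly as follows. This is an illustration using plain dictionaries rather than the Pydantic models used in the PR; the function name `to_multi_server` is hypothetical, and detecting the legacy layout by a top-level `url` key is an assumption.

```python
import json


def to_multi_server(data: dict) -> dict:
    """Return `data` in the multi-server layout, converting the legacy layout if needed."""
    if "url" in data:
        # Legacy layout: one flat dictionary describing a single server.
        # The URL becomes the key; the remaining fields become the entry.
        entry = {key: value for key, value in data.items() if key != "url"}
        return {data["url"]: entry}
    # Already keyed by server URL: return a shallow copy unchanged.
    return dict(data)


legacy = json.loads(
    '{"url": "https://server.url", "refresh_token": "refresh-token", "refresh_expires": 123}'
)
converted = to_multi_server(legacy)
print(json.dumps(converted, indent=2))
```

Applying the function to data that is already in the new layout returns it unchanged, so running the conversion on every load would be safe in this sketch.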
What issue or task does this change relate to?
#188
As a contributor to the Anemoi framework, please ensure that your changes include unit tests, updates to any affected dependencies and documentation, and have been tested in a parallel setting (i.e., with multiple GPUs). As a reviewer, you are also responsible for verifying these aspects and requesting changes if they are not adequately addressed. For guidelines about those please refer to https://anemoi.readthedocs.io/en/latest/
By opening this pull request, I affirm that all authors agree to the Contributor License Agreement.