feat(mlflow auth): Support for multiple servers by gmertes · Pull Request #217 · ecmwf/anemoi-utils

gmertes · 2025-09-16T15:29:36Z

Description

Store tokens for multiple mlflow servers on disk, so that the user can switch between them without having to log in every time.

For example, the following snippet logs in once to a prod and a test server and stores both tokens on disk. Authentication against either server can then happen without logging in again:

from anemoi.utils.mlflow.auth import TokenAuth

# log in to prod and test, stores both tokens on disk
TokenAuth("http://prod.server").login()
TokenAuth("http://test.server").login()

# read the respective server tokens from disk and authenticate
TokenAuth("http://prod.server").authenticate()
TokenAuth("http://test.server").authenticate()

In practice, for users of anemoi-training, this means your training runs can switch seamlessly between logging to prod and test (assuming you have logged in to both servers once before).

Notes

Pydantic is used to validate the data on disk. The token file is loaded into a ServerStore, which is a RootModel dictionary of ServerConfig objects, where the URL of each server is the root dictionary key. Each ServerConfig contains the token information for a given server.
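A minimal sketch of what such a model pair could look like, assuming Pydantic v2. The class names come from the description above and the field names from the JSON examples in the Data format section; the actual classes in `auth.py` may carry more fields or validators.

```python
from pydantic import BaseModel, RootModel, ValidationError


class ServerConfig(BaseModel):
    """Token information for one server (fields assumed from the JSON example)."""

    refresh_token: str
    refresh_expires: int


class ServerStore(RootModel[dict[str, ServerConfig]]):
    """Mapping of server URL to ServerConfig; the URL is the root dictionary key."""


raw = {
    "https://server-1.url": {"refresh_token": "refresh-token-1", "refresh_expires": 1},
    "https://server-2.url": {"refresh_token": "refresh-token-2", "refresh_expires": 2},
}

# Validation happens on load: malformed entries raise ValidationError.
store = ServerStore.model_validate(raw)
first = store.root["https://server-1.url"]

try:
    ServerStore.model_validate({"https://bad.url": {"refresh_token": 42}})
    rejected = False
except ValidationError:
    rejected = True
```

Using a `RootModel` keeps the on-disk JSON a plain dictionary keyed by URL, with no wrapper field around it.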

I am deprecating the load_config function, which used to directly return the underlying data format. That function was there so that downstream code in anemoi-training could get the last used server URL from the config file. Now that there are multiple servers, that is handled by a different abstraction, get_servers. There is also no need for the underlying data format to be part of the public API. For now, load_config is still there in deprecated form, and everything remains backwards compatible.
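The usual shape of such a deprecation is a thin shim that warns and then delegates. The sketch below is illustrative only: `_read_store` and `list_server_urls` are hypothetical stand-ins, and the real `load_config` and `get_servers` in anemoi-utils may have different signatures and return values.

```python
import warnings


def _read_store() -> dict:
    """Hypothetical private helper standing in for reading the token file."""
    return {
        "https://server.url": {"refresh_token": "refresh-token", "refresh_expires": 123}
    }


def list_server_urls() -> list:
    """Hypothetical replacement accessor: exposes server URLs, not the raw format."""
    return list(_read_store())


def load_config() -> dict:
    """Deprecated shim: keeps working, but warns callers to migrate."""
    warnings.warn(
        "load_config is deprecated; use the server-listing abstraction instead",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not to this shim
    )
    return _read_store()
```

Downstream code keeps running unchanged and sees a `DeprecationWarning` until it switches to the new accessor.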

After merging this PR I will update the downstream code in anemoi-training to stop using load_config, and then we can eventually remove it after some time.

Data format

The legacy data format stored in the JSON file was a simple dictionary with all the information for one server:

{
  "url": "https://server.url",
  "refresh_token": "refresh-token",
  "refresh_expires": 123
}

The format has now changed into a dictionary of servers, where url is no longer a member but is used as the key:

{
  "https://server-1.url": {
    "refresh_token": "refresh-token-1",
    "refresh_expires": 1
  },
  "https://server-2.url": {
    "refresh_token": "refresh-token-2",
    "refresh_expires": 2
  }
}

Going from the legacy to the new format should be seamless for users: config files will be converted into the new format the next time a user logs in to a server. There are tests for both the legacy and the new format.
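The conversion between the two layouts can be sketched as below. This is an illustration under assumptions, not the code from the PR: the function name `migrate_legacy` is hypothetical, and detecting a legacy file by a top-level string-valued `"url"` key is a heuristic chosen for this example.

```python
import json


def migrate_legacy(data: dict) -> dict:
    """Return data in the multi-server layout, converting a legacy dict if needed."""
    if not data:
        # An empty or missing config becomes an empty store.
        return {}
    if isinstance(data.get("url"), str):
        # Legacy layout: flat dict with the server URL stored under "url".
        entry = dict(data)        # copy so the caller's dict is not mutated
        url = entry.pop("url")    # the URL becomes the dictionary key
        return {url: entry}
    # Already in the multi-server layout.
    return data


legacy = json.loads(
    '{"url": "https://server.url", "refresh_token": "refresh-token", "refresh_expires": 123}'
)
converted = migrate_legacy(legacy)
```

Applying the function to data that is already in the new layout returns it unchanged, so it is safe to call on every load.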

What issue or task does this change relate to?

#188

As a contributor to the Anemoi framework, please ensure that your changes include unit tests, updates to any affected dependencies and documentation, and have been tested in a parallel setting (i.e., with multiple GPUs). As a reviewer, you are also responsible for verifying these aspects and requesting changes if they are not adequately addressed. For guidelines about those please refer to https://anemoi.readthedocs.io/en/latest/

By opening this pull request, I affirm that all authors agree to the Contributor License Agreement.

src/anemoi/utils/mlflow/auth.py

anaprietonem

left question about the use of this feature with syncing but otherwise LGTM!

🤖 Automated Release PR

This PR was created by `release-please` to prepare the next release. Once merged:

1. A new version tag will be created
2. A GitHub release will be published
3. The changelog will be updated

Changes to be included in the next release:

---

## [0.4.37](0.4.36...0.4.37) (2025-09-30)

### Features

* **mlflow auth:** Support for multiple servers ([#217](#217)) ([8ccfb1a](8ccfb1a))

### Bug Fixes

* Update s3 chunk size to 10 MB ([#220](#220)) ([aa20fa8](aa20fa8))
* Use `yaml` and `json` flag in metadata get command ([#222](#222)) ([6af46c4](6af46c4))

---

> [!IMPORTANT]
> Please do not change the PR title, manifest file, or any other automatically generated content in this PR unless you understand the implications. Changes here can break the release process.

> ⚠️ Merging this PR will:
> - Create a new release
> - Trigger deployment pipelines
> - Update package versions

**Before merging:**
- Ensure all tests pass
- Review the changelog carefully
- Get required approvals

[Release-please documentation](https://github.com/googleapis/release-please)

## Description

Update the `mlflow login` command and anemoi-utils dependency to use the multi-server functionality introduced in ecmwf/anemoi-utils#217

Also, introduce two new options that make use of this:

- `--list` list all known servers and their expiry time
- `--all` log in to all known servers

***As a contributor to the Anemoi framework, please ensure that your changes include unit tests, updates to any affected dependencies and documentation, and have been tested in a parallel setting (i.e., with multiple GPUs). As a reviewer, you are also responsible for verifying these aspects and requesting changes if they are not adequately addressed. For guidelines about those please refer to https://anemoi.readthedocs.io/en/latest/***

By opening this pull request, I affirm that all authors agree to the [Contributor License Agreement.](https://github.com/ecmwf/codex/blob/main/Legal/contributor_license_agreement.md)

gmertes added 7 commits September 15, 2025 11:32

multi-server config format

b504ee6

add tests

791d717

use pydantic

54eaa0a

fix empty config

212e3ec

refactor store handling

a6b78b8

update tests

3d92ce9

deprecate load_config

4c5d15d

gmertes requested a review from anaprietonem September 16, 2025 15:29

gmertes linked an issue Sep 16, 2025 that may be closed by this pull request

Store tokens of multiple mlflow servers #188

Closed

github-project-automation bot added this to Anemoi-dev Sep 16, 2025

github-project-automation bot moved this to To be triaged in Anemoi-dev Sep 16, 2025

github-actions bot added the tests label Sep 16, 2025

gmertes added the ATS Approval not needed label Sep 16, 2025

gmertes and others added 4 commits September 16, 2025 16:41

Merge branch 'main' into feat/188-multiple-mlflow-servers

e0aa060

typo

b831dc9

update get_servers

a1f0bb1

fix test

ce156b5

anaprietonem reviewed Sep 29, 2025

src/anemoi/utils/mlflow/auth.py

anaprietonem reviewed Sep 29, 2025

src/anemoi/utils/mlflow/auth.py

anaprietonem approved these changes Sep 29, 2025

Merge branch 'main' into feat/188-multiple-mlflow-servers

4e3e72e

gmertes merged commit 8ccfb1a into main Sep 30, 2025
70 checks passed

gmertes deleted the feat/188-multiple-mlflow-servers branch September 30, 2025 09:45

github-project-automation bot moved this from To be triaged to Done in Anemoi-dev Sep 30, 2025

DeployDuck mentioned this pull request Sep 30, 2025

chore(main): Release 0.4.37 #221

Merged

gmertes mentioned this pull request Sep 30, 2025

feat(mlflow login): Support for multiple servers ecmwf/anemoi-core#573

Merged

gmertes merged 12 commits into main from feat/188-multiple-mlflow-servers

2 participants
