feat(mlflow auth): Support for multiple servers#217

Merged
gmertes merged 12 commits into main from feat/188-multiple-mlflow-servers
Sep 30, 2025
@gmertes gmertes commented Sep 16, 2025

Description

Implement storing tokens from multiple mlflow servers on disk, so that the user can switch between them without having to log in every time.

For example, the following snippet logs in once to a prod and a test server and stores both tokens on disk. Authentication to either server can then happen without logging in again:

from anemoi.utils.mlflow.auth import TokenAuth

# log in to prod and test, stores both tokens on disk
TokenAuth("http://prod.server").login()
TokenAuth("http://test.server").login()

# read the respective server tokens from disk and authenticate
TokenAuth("http://prod.server").authenticate()
TokenAuth("http://test.server").authenticate()

In practice, for users of anemoi-training, this means your training runs can switch seamlessly between logging to prod and test (assuming you have logged in to both servers once before).

Notes

Pydantic is used to validate the data on disk. The token file is loaded into a ServerStore, which is a RootModel wrapping a dictionary of ServerConfig objects, with the URL of each server as the dictionary key. Each ServerConfig contains the token information for a given server.
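
As a rough illustration of the structure described above, the two models could look like the sketch below. It assumes Pydantic v2. The class names and the two field names come from this PR description, but the field types and the sample data are assumptions, not the actual implementation.

```python
from pydantic import BaseModel, RootModel


class ServerConfig(BaseModel):
    """Token information for one server (field types are assumed)."""

    refresh_token: str
    refresh_expires: int


class ServerStore(RootModel[dict[str, ServerConfig]]):
    """All known servers, keyed by server URL."""


# Sample file contents in the new format (illustrative values only).
raw = '{"https://server-1.url": {"refresh_token": "refresh-token-1", "refresh_expires": 1}}'

# Validation fails with a pydantic.ValidationError if the data is malformed.
store = ServerStore.model_validate_json(raw)
config = store.root["https://server-1.url"]
```

With a RootModel, the URL-keyed dictionary is reached through `store.root`, so the JSON on disk needs no extra wrapping key.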

I am deprecating the load_config function, which used to return the underlying data format directly. It existed so that downstream code in anemoi-training could get the last used server URL from the config file. Now that there are multiple servers, that is handled by a different abstraction, get_servers. There is also no need for the underlying data format to be part of the public API. For now, load_config remains available in deprecated form, and everything is still backwards compatible.

After merging this PR I will update the downstream code in anemoi-training to stop using load_config, so that we can eventually remove it.
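
For readers unfamiliar with this pattern, a deprecation shim of this kind is commonly written with the standard `warnings` module. The sketch below is only an assumption about the general shape. The real signatures and return types of load_config and get_servers are not shown in this PR description, and `_STORE` is a hypothetical in-memory stand-in for the token file.

```python
import warnings

# Hypothetical stand-in for the parsed token file (new, URL-keyed format).
_STORE = {
    "https://server-1.url": {"refresh_token": "refresh-token-1", "refresh_expires": 1},
}


def get_servers() -> list[str]:
    """Illustrative replacement API: return the known server URLs."""
    return list(_STORE)


def load_config() -> dict:
    """Deprecated shim: still returns the raw data, but warns the caller."""
    warnings.warn(
        "load_config is deprecated; use get_servers instead",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not to this shim
    )
    return dict(_STORE)
```

Existing callers keep working and see a DeprecationWarning pointing at their own call site, which gives downstream code time to migrate before the function is removed.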

Data format

The legacy format stored in the JSON file was a flat dictionary holding all information for a single server:

{
  "url": "https://server.url",
  "refresh_token": "refresh-token",
  "refresh_expires": 123
}

The new format is a dictionary of servers, where url is no longer a field but is used as the key:

{
  "https://server-1.url": {
    "refresh_token": "refresh-token-1",
    "refresh_expires": 1
  },
  "https://server-2.url": {
    "refresh_token": "refresh-token-2",
    "refresh_expires": 2
  }
}

Going from the legacy to the new format should be seamless for users: config files are converted to the new format the next time a user logs in to a server. There are tests for both the legacy and the new format.
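
The conversion between the two formats shown above can be sketched in a few lines. This is a minimal illustration, not the code in this PR: the function name `to_new_format` is hypothetical, and detecting the legacy format by the presence of a top-level "url" key is an assumption.

```python
def to_new_format(data: dict) -> dict:
    """Convert a legacy single-server dict into the URL-keyed format.

    Data that does not look like the legacy format is returned unchanged.
    """
    if "url" in data:
        legacy = dict(data)      # copy so the caller's dict is not mutated
        url = legacy.pop("url")  # the URL becomes the key, not a field
        return {url: legacy}
    return data


legacy = {
    "url": "https://server.url",
    "refresh_token": "refresh-token",
    "refresh_expires": 123,
}
converted = to_new_format(legacy)
```

Here `converted` holds a single entry keyed by "https://server.url" with the two remaining token fields, and passing it through `to_new_format` again leaves it unchanged.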

What issue or task does this change relate to?

#188 (Store tokens of multiple mlflow servers)

As a contributor to the Anemoi framework, please ensure that your changes include unit tests, updates to any affected dependencies and documentation, and have been tested in a parallel setting (i.e., with multiple GPUs). As a reviewer, you are also responsible for verifying these aspects and requesting changes if they are not adequately addressed. For guidelines about those please refer to https://anemoi.readthedocs.io/en/latest/

By opening this pull request, I affirm that all authors agree to the Contributor License Agreement.

@gmertes gmertes linked an issue Sep 16, 2025 that may be closed by this pull request
@github-project-automation github-project-automation bot moved this to To be triaged in Anemoi-dev Sep 16, 2025
@github-actions github-actions bot added the tests label Sep 16, 2025
@anaprietonem anaprietonem left a comment

Left a question about using this feature with syncing, but otherwise LGTM!

@gmertes gmertes merged commit 8ccfb1a into main Sep 30, 2025
70 checks passed
@gmertes gmertes deleted the feat/188-multiple-mlflow-servers branch September 30, 2025 09:45
@github-project-automation github-project-automation bot moved this from To be triaged to Done in Anemoi-dev Sep 30, 2025
anaprietonem pushed a commit that referenced this pull request Sep 30, 2025
🤖 Automated Release PR

This PR was created by `release-please` to prepare the next release.
Once merged:

1. A new version tag will be created
2. A GitHub release will be published
3. The changelog will be updated

Changes to be included in the next release:
---


## [0.4.37](0.4.36...0.4.37) (2025-09-30)


### Features

* **mlflow auth:** Support for multiple servers ([#217](#217)) ([8ccfb1a](8ccfb1a))


### Bug Fixes

* Update s3 chunk size to 10 MB ([#220](#220)) ([aa20fa8](aa20fa8))
* Use `yaml` and `json` flag in metadata get command ([#222](#222)) ([6af46c4](6af46c4))

---
> [!IMPORTANT]
> Please do not change the PR title, manifest file, or any other automatically generated content in this PR unless you understand the implications. Changes here can break the release process.
> ⚠️ Merging this PR will:
> - Create a new release
> - Trigger deployment pipelines
> - Update package versions

 **Before merging:**
 - Ensure all tests pass
 - Review the changelog carefully
 - Get required approvals

[Release-please documentation](https://github.com/googleapis/release-please)
gmertes added a commit to ecmwf/anemoi-core that referenced this pull request Oct 1, 2025
## Description
Update the `mlflow login` command and anemoi-utils dependency to use the
multi-server functionality introduced in ecmwf/anemoi-utils#217

Also, introduce two new options that make use of this:
- `--list` list all known servers and their expiry time
- `--all` log in to all known servers
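
To illustrate what a `--list` style report could contain, here is a minimal sketch. It is not the actual command implementation: the function name `format_servers` is hypothetical, and it assumes that `refresh_expires` is a Unix timestamp in seconds, which is not stated on this page.

```python
from datetime import datetime, timezone


def format_servers(store: dict) -> list[str]:
    """Return one line per known server with its token expiry time (UTC)."""
    lines = []
    for url, config in sorted(store.items()):
        # Assumption: refresh_expires holds seconds since the Unix epoch.
        expires = datetime.fromtimestamp(config["refresh_expires"], tz=timezone.utc)
        lines.append(f"{url}  expires {expires.isoformat()}")
    return lines


store = {
    "https://server-2.url": {"refresh_token": "refresh-token-2", "refresh_expires": 2},
    "https://server-1.url": {"refresh_token": "refresh-token-1", "refresh_expires": 1},
}
report = format_servers(store)
```

An `--all` style login could iterate over the same dictionary keys and log in to each URL in turn.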

ssmmnn11 pushed a commit to ecmwf/anemoi-core that referenced this pull request Oct 7, 2025