Track high level performance of training in mlflow

### Describe the task. Describe the task. It can be a feature, a set of experiments, documentation, etc.

We print in the logs some telemetry bits with the duration of various training tasks (reading data, running the encoder, decoder, etc.). We should collect them in mlflow to track performance over time. The current bottlenecks are unclear to me.

The goal of this issue:
- propose a few key durations investigate
- propose the mlflow schema to store them
- implement it in the training pipeline

Guidelines:
- just a few high level metrics. Going deeper can be done with NV nsigts + pytorch profiler + flamegraphs etc. 
- we already log the config, including all the information with number of channels etc. no need log that


### Hedgedoc URL, if you are keeping notes, plots, logs in hedgedoc.

_No response_

### URL to the design document

_No response_

### Area

- [ ] datasets, data readers, data preparation and transfer
- [ ] model
- [ ] science
- [x] infrastructure and engineering
- [ ] evaluation, export and visualization
- [ ] documentation

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Track high level performance of training in mlflow #1930

Describe the task. Describe the task. It can be a feature, a set of experiments, documentation, etc.

Hedgedoc URL, if you are keeping notes, plots, logs in hedgedoc.

URL to the design document

Area

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Track high level performance of training in mlflow #1930

Description

Describe the task. Describe the task. It can be a feature, a set of experiments, documentation, etc.

Hedgedoc URL, if you are keeping notes, plots, logs in hedgedoc.

URL to the design document

Area

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions