
Memory management in training process #7

@KIM-JAKE


Hello,

Thank you for providing the code.

The training process in the provided code calculates the loss for each dataset and aggregates the losses to update the coefficients.

Therefore, the coefficients (lambdas) remain constant throughout the entire data iteration within a single epoch.

However, in the original code (below), the merged parameters are recomputed, with the coefficients moved to the CPU, on every forward pass:

def forward(self, inp, dataset_name):
    alph = self.lambdas()
    params = tuple(sum(tuple(pi * lambdasi for pi, lambdasi in zip(p, alph[j].cpu()))) for j, p in enumerate(zip(*self.paramslist)))
    params = tuple(p.cuda(0) for p in params)
    load_weights(self.model, self.names, params)
    feature = self.model(inp)
    layer_name = 'classifier_{}'.format(dataset_name)
    classification_head = getattr(self, layer_name)
    out = classification_head(feature)
    return out
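For readers parsing the dense one-liner: it merges the per-model parameters position by position as a weighted sum. A minimal plain-Python sketch of the same arithmetic, with scalars standing in for the real tensors and the `paramslist` / `alph` names taken from the snippet (the values here are illustrative only):

```python
# Two merged models, each contributing two parameters
# (scalars stand in for the real tensors).
paramslist = [
    (1.0, 10.0),  # parameters of model 0
    (3.0, 30.0),  # parameters of model 1
]

# alph[j] holds one coefficient per model for parameter position j.
alph = [(0.5, 0.5), (0.25, 0.75)]

# zip(*paramslist) regroups parameters by position across models:
# position 0 -> (1.0, 3.0), position 1 -> (10.0, 30.0)
merged = tuple(
    sum(pi * li for pi, li in zip(p, alph[j]))
    for j, p in enumerate(zip(*paramslist))
)
print(merged)  # (2.0, 25.0)
```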

In my environment, repeatedly loading these onto the CPU caused memory issues.

Therefore, I modified the code as follows, so that the merged coefficient-weighted parameters are loaded into the model once at the beginning of each epoch, and the data is then processed with those fixed weights.

def loading_weights(self):
    alph = self.lambdas()
    params = tuple(sum(tuple(pi * lambdasi for pi, lambdasi in zip(p, alph[j].cpu()))) for j, p in enumerate(zip(*self.paramslist)))
    params = tuple(p.cuda(0) for p in params)
    load_weights(self.model, self.names, params)

def forward(self, inp, dataset_name):
    # For memory efficiency, the weights are loaded in advance (once per epoch).
    # alph = self.lambdas()
    # params = tuple(sum(tuple(pi * lambdasi for pi, lambdasi in zip(p, alph[j].cpu()))) for j, p in enumerate(zip(*self.paramslist)))
    # params = tuple(p.cuda(0) for p in params)
    # load_weights(self.model, self.names, params)
    feature = self.model(inp)
    layer_name = 'classifier_{}'.format(dataset_name)
    classification_head = getattr(self, layer_name)
    out = classification_head(feature)
    return out

In the training process:

for epoch in range(epochs):
    losses = 0.

    adamerging_mtl_model.loading_weights()

    for dataset_name in exam_datasets:
        # load dataset and calculate loss
        losses += loss
        
    optimizer.zero_grad()
    losses.backward()
    optimizer.step()
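The restructuring above changes how often the merge-and-load step runs: once per epoch instead of once per forward pass. A toy stand-in for the loop (all names hypothetical) that only counts the calls makes the difference concrete:

```python
# Toy stand-in for the modified training loop; it only counts how often
# the merge/load step and the forward pass run (all names hypothetical).
class StubMerger:
    def __init__(self):
        self.merges = 0
        self.forwards = 0

    def loading_weights(self):
        self.merges += 1    # merge + load weights once per epoch

    def forward(self, inp, dataset_name):
        self.forwards += 1  # no per-call merging any more
        return inp

model = StubMerger()
exam_datasets = ['MNIST', 'SVHN', 'GTSRB']
epochs = 2

for epoch in range(epochs):
    model.loading_weights()
    for dataset_name in exam_datasets:
        model.forward(0, dataset_name)

# With the original forward, the merge would have run
# epochs * len(exam_datasets) = 6 times; here it runs only twice.
print(model.merges, model.forwards)  # 2 6
```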

Is there any aspect of this approach that differs from the author's intent, or could there be any other issues arising from this modification?

Thank you.
