
Memory management in training process #7

@KIM-JAKE


Hello,

Thank you for providing the code.

The training process in the provided code calculates the loss for each dataset and aggregates the losses to update the coefficients.

Therefore, the coefficients (lambdas) remain constant throughout the entire data iteration within a single epoch.

However, in the original code (below), the merged parameters are recomputed, with the coefficients moved to the CPU, on every forward pass:

def forward(self, inp, dataset_name):
    alph = self.lambdas()
    params = tuple(sum(tuple(pi * lambdasi for pi, lambdasi in zip(p, alph[j].cpu()))) for j, p in enumerate(zip(*self.paramslist)))
    params = tuple(p.cuda(0) for p in params)
    load_weights(self.model, self.names, params)
    feature = self.model(inp)
    layer_name = 'classifier_{}'.format(dataset_name)
    classification_head = getattr(self, layer_name)
    out = classification_head(feature)
    return out
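For readers parsing the dense one-liner: it merges the per-model parameters position by position as a weighted sum. A minimal plain-Python sketch of the same arithmetic, with scalars standing in for the real tensors and the `paramslist` / `alph` names taken from the snippet (the values here are illustrative only):

```python
# Two merged models, each contributing two parameters
# (scalars stand in for the real tensors).
paramslist = [
    (1.0, 10.0),  # parameters of model 0
    (3.0, 30.0),  # parameters of model 1
]

# alph[j] holds one coefficient per model for parameter position j.
alph = [(0.5, 0.5), (0.25, 0.75)]

# zip(*paramslist) regroups parameters by position across models:
# position 0 -> (1.0, 3.0), position 1 -> (10.0, 30.0)
merged = tuple(
    sum(pi * li for pi, li in zip(p, alph[j]))
    for j, p in enumerate(zip(*paramslist))
)
print(merged)  # (2.0, 25.0)
```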

In my environment, repeatedly loading these onto the CPU caused memory issues.

Therefore, I modified the code as follows, so that the merged coefficient-weighted parameters are loaded into the model once at the beginning of each epoch, and the data is then processed with those fixed weights.

def loading_weights(self):
    alph = self.lambdas()
    params = tuple(sum(tuple(pi * lambdasi for pi, lambdasi in zip(p, alph[j].cpu()))) for j, p in enumerate(zip(*self.paramslist)))
    params = tuple(p.cuda(0) for p in params)
    load_weights(self.model, self.names, params)

def forward(self, inp, dataset_name):
    # For memory efficiency, the weights are loaded in advance (once per epoch).
    # alph = self.lambdas()
    # params = tuple(sum(tuple(pi * lambdasi for pi, lambdasi in zip(p, alph[j].cpu()))) for j, p in enumerate(zip(*self.paramslist)))
    # params = tuple(p.cuda(0) for p in params)
    # load_weights(self.model, self.names, params)
    feature = self.model(inp)
    layer_name = 'classifier_{}'.format(dataset_name)
    classification_head = getattr(self, layer_name)
    out = classification_head(feature)
    return out

In the training process:

for epoch in range(epochs):
    losses = 0.

    adamerging_mtl_model.loading_weights()

    for dataset_name in exam_datasets:
        # load dataset and calculate loss
        losses += loss
        
    optimizer.zero_grad()
    losses.backward()
    optimizer.step()
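The restructuring above changes how often the merge-and-load step runs: once per epoch instead of once per forward pass. A toy stand-in for the loop (all names hypothetical) that only counts the calls makes the difference concrete:

```python
# Toy stand-in for the modified training loop; it only counts how often
# the merge/load step and the forward pass run (all names hypothetical).
class StubMerger:
    def __init__(self):
        self.merges = 0
        self.forwards = 0

    def loading_weights(self):
        self.merges += 1    # merge + load weights once per epoch

    def forward(self, inp, dataset_name):
        self.forwards += 1  # no per-call merging any more
        return inp

model = StubMerger()
exam_datasets = ['MNIST', 'SVHN', 'GTSRB']
epochs = 2

for epoch in range(epochs):
    model.loading_weights()
    for dataset_name in exam_datasets:
        model.forward(0, dataset_name)

# With the original forward, the merge would have run
# epochs * len(exam_datasets) = 6 times; here it runs only twice.
print(model.merges, model.forwards)  # 2 6
```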

Is there any aspect of this approach that differs from the author's intent, or could there be any other issues arising from this modification?

Thank you.
