Hello,
Thank you for providing the code.
The training process in the provided code calculates the loss for each dataset and aggregates it to update the coefficients.
Therefore, the coefficients (lambdas) remain constant throughout the entire data iteration within a single epoch.
However, the original code (below) recomputes the merged parameters, loading them onto the CPU, during each forward pass:
def forward(self, inp, dataset_name):
    alph = self.lambdas()
    params = tuple(sum(tuple(pi * lambdasi for pi, lambdasi in zip(p, alph[j].cpu()))) for j, p in enumerate(zip(*self.paramslist)))
    params = tuple(p.cuda(0) for p in params)
    load_weights(self.model, self.names, params)
    feature = self.model(inp)
    layer_name = 'classifier_{}'.format(dataset_name)
    classification_head = getattr(self, layer_name)
    out = classification_head(feature)
    return out

In my environment, repeatedly loading these onto the CPU caused memory issues.
Therefore, I modified the code as follows, loading the coefficient parameters into the model at the beginning of each epoch and processing the data accordingly.
def loading_weights(self):
    alph = self.lambdas()
    params = tuple(sum(tuple(pi * lambdasi for pi, lambdasi in zip(p, alph[j].cpu()))) for j, p in enumerate(zip(*self.paramslist)))
    params = tuple(p.cuda(0) for p in params)
    load_weights(self.model, self.names, params)

def forward(self, inp, dataset_name):
    # For memory efficiency, the weights are loaded in advance.
    # alph = self.lambdas()
    # params = tuple(sum(tuple(pi * lambdasi for pi, lambdasi in zip(p, alph[j].cpu()))) for j, p in enumerate(zip(*self.paramslist)))
    # params = tuple(p.cuda(0) for p in params)
    # load_weights(self.model, self.names, params)
    feature = self.model(inp)
    layer_name = 'classifier_{}'.format(dataset_name)
    classification_head = getattr(self, layer_name)
    out = classification_head(feature)
    return out

In the training process:
for epoch in range(epochs):
    losses = 0.
    adamerging_mtl_model.loading_weights()
    for dataset_name in exam_datasets:
        # load the dataset and calculate the loss
        losses += loss
    optimizer.zero_grad()
    losses.backward()
    optimizer.step()

Is there any aspect of this approach that differs from the author's intent, or could any other issues arise from this modification?
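For what it's worth, the numerical equivalence can be sketched in pure Python (scalar lists stand in for tensors; `merge`, `paramslist`, and `alph` are illustrative names, not the repository's API): when the coefficients are fixed within an epoch, merging once per epoch gives the same weights as re-merging on every forward pass.

```python
def merge(paramslist, alph):
    # Layer-wise weighted sum of per-model parameters,
    # mirroring the comprehension in the snippets above.
    return tuple(
        sum(pi * li for pi, li in zip(p, alph[j]))
        for j, p in enumerate(zip(*paramslist))
    )

paramslist = [[1.0, 2.0], [3.0, 4.0]]
alph = [[0.5, 0.5], [0.25, 0.75]]

per_epoch = merge(paramslist, alph)                        # computed once, reused
per_forward = [merge(paramslist, alph) for _ in range(3)]  # recomputed per batch
assert all(p == per_epoch for p in per_forward)
```

One caveat worth checking in the real model: after optimizer.step() updates the lambdas, the new coefficients only take effect at the next loading_weights() call, and the autograd graph from the single merge is shared by all datasets in the epoch, so backward must run only after the full loss accumulation.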
Thank you.