The overridden `training_step()` (line 76 of `collect_bert_task_grads.py`) backpropagates the loss through the model with `self.accelerator.backward(loss)` (line 95). From my reading of the Hugging Face `Trainer` class, this loss is later used in `Trainer.train()` to update the model weights. That means the weights change after every mini-batch, which might make the Fisher calculation incorrect. I tried running `collect_bert_task_grads.py`, but the trainer fails with an unexpected-argument error. That error aside, I mainly want to confirm: were the model weights being trained while gradients were collected for the Fisher information matrix?
If yes, won't that make the Fisher calculation wrong?
Here are references to the Hugging Face `Trainer` code that ends up calling `optimizer.step()`:
definition of `inner_training_loop()` --> Link
call to your overridden `training_step()` inside `inner_training_loop()` --> Link
call to `optimizer.step()` on the model --> Link
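For contrast, here is a minimal sketch of what I would expect weight-preserving Fisher collection to look like: gradients are accumulated in `p.grad` via `backward()`, but `optimizer.step()` is never called, so the parameters stay fixed at their initial values. This is my own illustration, not code from the repository; the function name and the assumption that each batch's forward pass returns an object with a `.loss` attribute (as Hugging Face models do) are mine:

```python
import torch

def collect_diag_fisher(model, dataloader, device="cpu"):
    """Estimate the diagonal of the Fisher information matrix by
    averaging squared per-parameter gradients over the dataloader.
    No optimizer is involved, so the model weights never change."""
    model.eval()  # freeze dropout / batch-norm statistics
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()
              if p.requires_grad}
    n_batches = 0
    for batch in dataloader:
        model.zero_grad()
        # Assumes the model returns an output object exposing `.loss`
        outputs = model(**{k: v.to(device) for k, v in batch.items()})
        outputs.loss.backward()  # fills p.grad; weights are untouched
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2
        n_batches += 1
    model.zero_grad()
    return {n: f / n_batches for n, f in fisher.items()}
```

If the `Trainer` loop instead calls `optimizer.step()` between batches, each batch's gradient is taken at a different point in weight space, so the accumulated squares no longer estimate the Fisher at the original (pre-trained) parameters.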