Change tie_weights to no-op and add docstring for weight tying clarification#503

Merged
2015aroras merged 8 commits into allenai:main from djliden:dl/hf-weight-tying
Apr 2, 2024
Conversation

@djliden (Contributor) commented Mar 15, 2024

This pull request addresses #485. It updates the tie_weights function in the OLMoForCausalLM class to be a no-op and adds a docstring explaining the reasoning behind this change.

The tie_weights function is being made a no-op because weight tying is already handled explicitly in other parts of the codebase. This change avoids redundant weight tying logic and improves code clarity.

The added docstring explains why the function is a no-op and directs developers to the specific locations where weight tying is handled. This improves code readability and maintainability.

This doesn't fix the warning message from the accelerate library, which expects a fundamentally different mechanism for weight tying.

@2015aroras ready for your initial review—thanks!
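A minimal sketch of the shape of the change described above (the surrounding class internals are omitted; this is an illustration under assumed names, not the merged diff itself):

```python
class OLMoForCausalLM:
    """Hypothetical skeleton illustrating the no-op tie_weights change."""

    def tie_weights(self):
        """
        This function is intentionally made a no-op.

        Weight tying is handled explicitly elsewhere in the codebase:
        the ff_out layer is only created when weight_tying is disabled,
        and the forward pass reuses wte.weight to compute logits when it
        is enabled. Tying weights here again would be redundant.
        """
        # Deliberately do nothing.
```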

djliden and others added 4 commits March 15, 2024 13:04
The `tie_weights` function in the `OLMoForCausalLM` class is now a no-op
because weight tying is already handled in other parts of the codebase:

1. During model initialization in `olmo/model.py`:
   - The `ff_out` layer is conditionally defined based on the `weight_tying`
     configuration using `if not config.weight_tying: self.transformer.update(...)`.

2. When computing logits in the `forward` method:
   - The `wte` weights are used directly if `weight_tying` is enabled using
     `if self.config.weight_tying: logits = F.linear(x, self.transformer.wte.weight, None)`.

Since weight tying is handled explicitly in these places, there is no need
for the `tie_weights` function to perform any additional weight tying.

The function is updated with a docstring that explains the reasoning behind
the no-op and provides references to the relevant code snippets where weight
tying is handled.

This change improves code clarity and avoids redundant weight tying logic.
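The two code paths listed above can be sketched in miniature. The names below echo the real ones (`wte`, `ff_out`, `weight_tying`), but the class is a dependency-free toy stand-in, not the actual OLMo model:

```python
class TinyDecoder:
    """Toy model showing the two explicit weight-tying code paths."""

    def __init__(self, vocab_size, d_model, weight_tying=True):
        self.weight_tying = weight_tying
        # wte: the embedding matrix, shape (vocab_size, d_model).
        self.wte = [[0.01 * (i + j) for j in range(d_model)]
                    for i in range(vocab_size)]
        if not weight_tying:
            # Path 1: an untied output projection (analogous to ff_out)
            # is only created when weight tying is disabled.
            self.ff_out = [[0.0] * d_model for _ in range(vocab_size)]

    def logits(self, x):
        # Path 2: when tying is enabled, the embedding matrix itself is
        # used as the output projection (mirrors
        # F.linear(x, self.transformer.wte.weight, None)).
        w = self.wte if self.weight_tying else self.ff_out
        # logits = x @ w.T for a single hidden-state vector x.
        return [sum(xi * wi for xi, wi in zip(x, row)) for row in w]
```

Because the logits are computed directly from the shared embedding matrix when tying is on, there is never a second parameter to synchronize, which is why tie_weights has nothing left to do.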
@2015aroras 2015aroras requested a review from AkshitaB March 15, 2024 22:31
@2015aroras 2015aroras self-requested a review March 18, 2024 23:38
@2015aroras (Collaborator) left a comment

Thank you!

@2015aroras 2015aroras merged commit db2dee2 into allenai:main Apr 2, 2024