Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
cc @jxmorris12 I have an implementation of TinyLoRA, could you kindly have a look?
githubnemo
left a comment
Hey @kashif :)
Thanks for the PR, this is already solid.
Merging with main should hopefully resolve the CI errors.
Some questions and comments below.
Thanks, fixing.
Co-authored-by: githubnemo <githubnemo@users.noreply.github.com>
@githubnemo this should be ready for another review, thanks!
githubnemo
left a comment
Thanks for the quick response :)
Implementation-wise, I think this is good to go, except for two nits.
Let's add an example that showcases the primary use-case and add the method to the method comparison suite (maybe copy from `method_comparison/MetaMathQA/experiments/lora/...` and see where it takes us).
It'd also be stellar to have a commit message / PR description that is meaningful in the commit history.
Ready @githubnemo
Thanks for the PR Kashif. I ran the experiments on my machine and got a test accuracy of 0% and 0.002% :)
Yes @BenjaminBossan, I will test with the RL setup; we can wait if that's OK. I also want to double-check that nothing is wrong.
Out of curiosity, I wanted to check if TinyLoRA can achieve better scores if we increase the number of trainable parameters. So I took the default* setting and increased
Given the still tiny number of trainable parameters, this result is quite respectable. This is also a nice confirmation that there is no major bug in the implementation. I wonder if it would make sense to have a "maximalist" and a "minimalist" config, i.e. one with more trainable parameters and a better score, and one with extremely few trainable parameters (basically the current default).

*One more change I did was to increase
I think that's a good thing to have! I also wondered if it would make sense to extend the target modules to
Should we just document this, or add it somewhere else?
I'd rather target either the attention xor the MLP part, for consistency with other experiments.
What does "this" refer to here?
Ah sorry, I meant the minimal/maximal config.
Yes, let's add a 'maximalist' config with (possibly) r > 2 and u >= 2048. VeRA has ~128k parameters with 37.6% task accuracy according to https://huggingface.co/spaces/peft-internal-testing/PEFT-method-comparison, so maybe it makes sense to match that setting (
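For context, a quick back-of-the-envelope check of how a 'maximalist' TinyLoRA budget compares with VeRA's ~128k parameters. The model shape and targeted modules here are assumptions for illustration (a 32-layer model targeting two projections per layer, no weight tying), not settings from this PR:

```python
# Rough trainable-parameter budget for a hypothetical 'maximalist' config.
n_layers = 32          # assumed number of transformer layers
modules_per_layer = 2  # e.g. q_proj and v_proj (assumption)
u = 2048               # trainable values per target module

trainable = n_layers * modules_per_layer * u
print(trainable)  # 131072, i.e. roughly VeRA's ~128k parameter budget
```

With weight tying enabled, the count would shrink further, since `v` vectors are shared across layers.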
Adds TinyLoRA, a new PEFT method based on "TinyLoRA: Learning to Reason in 13 Parameters". TinyLoRA achieves extreme parameter efficiency by replacing LoRA's trainable low-rank matrices with a tiny trainable vector projected through fixed random bases.
The key idea: given a frozen SVD decomposition `W ≈ B @ A` (where `B = U @ sqrt(S)` and `A = sqrt(S) @ V^T`), the weight update is `delta_W = B @ R @ A`, where `R` is an `r x r` trainable matrix (following LoRA-XS). TinyLoRA takes this further by parameterizing `R` as a linear combination of fixed random projection matrices:

R = sum_i v_i * P_i

where `v` is the only trainable parameter (as small as 13 values) and the `P_i` are fixed random matrices seeded deterministically.

Features

- Trainable parameters: `u` per target module (or even less with weight tying), compared to `r * (in + out)` for LoRA
- Sharing of `v` vectors across layers via `weight_tying` (0.0 = no sharing, 1.0 = all layers share one `v`)
- `A` and `B` matrices computed from truncated SVD of pretrained weights, with singular values distributed equally via `sqrt(S)`
- Supports `nn.Linear`, `Conv1D`, and `nn.Embedding`
- `supports_lora_conversion() -> True`: delta weights can be converted to standard LoRA format via `get_delta_weight`
- `P` matrices are seeded per-layer for reproducibility; optionally saved in checkpoints (`save_projection=True`)

Config
Architecture
- `get_delta_weight`, `supports_lora_conversion`
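The decomposition and update described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions; `tinylora_delta` is a hypothetical helper, and the shapes and initialisation are not taken from the PEFT implementation:

```python
import numpy as np

def tinylora_delta(W, v, r=2, seed=0):
    """Sketch of the TinyLoRA update: delta_W = B @ R @ A with
    R = sum_i v[i] * P[i]. Hypothetical helper, not the PEFT API."""
    # Frozen truncated SVD of the pretrained weight; singular values are
    # split equally between the two factors via sqrt(S).
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    sqrt_S = np.sqrt(S[:r])
    B = U[:, :r] * sqrt_S             # (out, r), frozen
    A = sqrt_S[:, None] * Vt[:r]      # (r, in), frozen
    # Fixed random projection bases P_i, seeded deterministically.
    P = np.random.default_rng(seed).standard_normal((len(v), r, r))
    R = np.einsum("i,ijk->jk", v, P)  # r x r combination matrix
    return B @ R @ A

rng = np.random.default_rng(42)
W = rng.standard_normal((16, 8))    # stand-in for a pretrained weight
v = rng.standard_normal(13) * 0.01  # the only trainable parameter: 13 values
delta = tinylora_delta(W, v)
print(delta.shape)  # (16, 8)
```

Note that the update has rank at most `r` regardless of `u`, since `R` is only `r x r`; `u` controls how many directions in that `r x r` space the optimizer can combine.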