Problem
The unit test `test_convert_dcp_to_hf` shows that if the model is small enough, it does not save properly with the online method. I see these logs when the test tries to load the checkpoint that the online saving method wrote to disk:
```
Some weights of the model checkpoint at /tmp/tmpgsnuh3yy/test_hf_and_dcp-hf were not used when initializing LlamaForCausalLM: {'_flat_param'}
- This IS expected if you are initializing LlamaForCausalLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing LlamaForCausalLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of LlamaForCausalLM were not initialized from the model checkpoint at /tmp/tmpgsnuh3yy/test_hf_and_dcp-hf and are newly initialized: ['embed_tokens.weight', 'layers.0.input_layernorm.weight', 'layers.0.mlp.down_proj.weight', 'layers.0.mlp.gate_proj.weight', 'layers.0.mlp.up_proj.weight', 'layers.0.post_attention_layernorm.weight', 'layers.0.self_attn.k_proj.weight', 'layers.0.self_attn.o_proj.weight', 'layers.0.self_attn.q_proj.weight', 'layers.0.self_attn.v_proj.weight', 'layers.1.input_layernorm.weight', 'layers.1.mlp.down_proj.weight', 'layers.1.mlp.gate_proj.weight', 'layers.1.mlp.up_proj.weight', 'layers.1.post_attention_layernorm.weight', 'layers.1.self_attn.k_proj.weight', 'layers.1.self_attn.o_proj.weight', 'layers.1.self_attn.q_proj.weight', 'layers.1.self_attn.v_proj.weight', 'lm_head.weight', 'norm.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
```
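
For context, `_flat_param` is the name FSDP uses for its flattened parameter, so the warning above may mean the online path captured the state dict while the model was still flattened. Here's a minimal diagnostic sketch to check what actually landed on disk (the `pytorch_model.bin` filename is an assumption; adjust if the save writes safetensors):

```python
# Minimal sketch: list the keys the online save actually wrote.
# The path is the temp dir from the logs above and will differ per run.
import torch

ckpt = "/tmp/tmpgsnuh3yy/test_hf_and_dcp-hf/pytorch_model.bin"
state_dict = torch.load(ckpt, map_location="cpu")

# A correct HF checkpoint has one entry per parameter (embed_tokens.weight,
# layers.0.self_attn.q_proj.weight, ...); the warning suggests this file
# instead contains the single FSDP key '_flat_param'.
print(sorted(state_dict.keys()))
```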
Repro
- Change these model refs to:

```python
from tests.unit.conftest import TEST_ASSETS

TEST_ASSETS.TINY_LLAMA_MODEL_PATH  # <-- replace with this
```
- Run:

```bash
uv run --group test bash tests/run_unit.sh -k test_convert_dcp_to_hf
```
Workaround
Oddly, if you increase the `TINY_LLAMA_MODEL_PATH` config to:
```python
config = LlamaConfig(
    num_hidden_layers=2*20,
    hidden_size=64*16,
    intermediate_size=32*16,
    num_attention_heads=2*16,
    vocab_size=128256,
    tie_word_embeddings=False,
    num_key_value_heads=None,
)
```
the test passes. So, unexpectedly, checkpointing appears to be model-size dependent. AFAICT the offline script works even for that small model.
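
To quantify the size gap, here is a rough sketch comparing parameter counts of the failing and passing configs (assuming the tiny config is the un-multiplied version of the numbers above, i.e. 2 layers, hidden_size=64, and so on; that factoring is my inference, not confirmed):

```python
# Hypothetical sketch: compare parameter counts of the tiny config that fails
# with the scaled-up config that passes.
from transformers import LlamaConfig, LlamaForCausalLM

def n_params(layers: int, hidden: int, intermediate: int, heads: int) -> int:
    config = LlamaConfig(
        num_hidden_layers=layers,
        hidden_size=hidden,
        intermediate_size=intermediate,
        num_attention_heads=heads,
        vocab_size=128256,
        tie_word_embeddings=False,
        num_key_value_heads=None,
    )
    return sum(p.numel() for p in LlamaForCausalLM(config).parameters())

print(n_params(2, 64, 32, 2))                       # tiny config that fails
print(n_params(2 * 20, 64 * 16, 32 * 16, 2 * 16))   # scaled-up config that passes
```

If the failure flips somewhere between those two sizes, bisecting the multipliers should narrow down whether a particular dimension or the total byte size is the trigger.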