Problem Statement
Implement the FastMTP model architecture - a single MTP head with position-shared weights that performs recursive K-step prediction.
We need to understand the reference implementation's exact architecture and implement it within the speculators framework.
What We Need
- FastMTPConfig - Configuration class registered with the speculators framework
- FastMTPSpeculator - Model class implementing the MTP head and recursive prediction
- Tests - Verify model instantiation, forward pass, verifier attachment
- Documentation - Architecture explanation and usage
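As a starting point for discussion, the configuration could look roughly like the sketch below. It is a plain dataclass, not the speculators base class: the class name FastMTPConfigSketch, every field name, and all default values are hypothetical placeholders, and registration with the framework is deliberately left out because it should follow whatever the existing models do.

```python
from dataclasses import dataclass, asdict


@dataclass
class FastMTPConfigSketch:
    """Hypothetical stand-in for FastMTPConfig; all fields are assumptions."""

    model_type: str = "fastmtp"     # assumed identifier, not confirmed
    hidden_size: int = 4096         # placeholder value
    vocab_size: int = 32000         # placeholder value
    num_speculative_steps: int = 3  # K in "recursive K-step prediction"

    def __post_init__(self) -> None:
        # Reject values that can never describe a usable head.
        if self.num_speculative_steps < 1:
            raise ValueError("num_speculative_steps must be >= 1")
        if self.hidden_size < 1 or self.vocab_size < 1:
            raise ValueError("hidden_size and vocab_size must be positive")


cfg = FastMTPConfigSketch(num_speculative_steps=4)
print(asdict(cfg))
```

Validating K at construction time keeps a misconfigured checkpoint from failing later, inside the recursive loop.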
Success Criteria
- Model can be loaded via SpeculatorModel.from_pretrained()
- Forward pass produces correct output shapes for K-step prediction
- Verifier attachment works (train_only mode for training)
- Integration tests pass
Notes
The architecture must match the reference implementation. Key questions: How is the MTP head structured? How are hidden states and embeddings combined? How do position-shared weights work?
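To make the questions above concrete, here is one possible reading, sketched in NumPy rather than as framework code. It assumes that "position-shared weights" means the same head parameters are reused at every draft step, and that hidden states and token embeddings are normalized, concatenated, and projected back to the hidden size. Both points are assumptions that still need to be checked against the reference implementation. SharedMTPHead, recursive_predict, and the single tanh mixing layer (a stand-in for a real transformer block) are hypothetical names and simplifications.

```python
import numpy as np


def rms_norm(x, eps=1e-6):
    # Parameter-free RMS normalization over the last axis.
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)


class SharedMTPHead:
    """Hypothetical single MTP head; one set of weights serves every step."""

    def __init__(self, hidden_size, vocab_size, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(hidden_size)
        self.embed = rng.normal(0, scale, (vocab_size, hidden_size))
        self.proj = rng.normal(0, scale, (2 * hidden_size, hidden_size))
        self.mix = rng.normal(0, scale, (hidden_size, hidden_size))
        self.lm_head = rng.normal(0, scale, (hidden_size, vocab_size))

    def step(self, hidden, token_ids):
        # hidden: (batch, hidden_size); token_ids: (batch,)
        fused = np.concatenate(
            [rms_norm(hidden), rms_norm(self.embed[token_ids])], axis=-1
        )
        new_hidden = np.tanh(fused @ self.proj @ self.mix)
        logits = new_hidden @ self.lm_head
        return new_hidden, logits


def recursive_predict(head, hidden, token_ids, k):
    # Feed each step's hidden state and greedy token into the same head.
    all_logits = []
    for _ in range(k):
        hidden, logits = head.step(hidden, token_ids)
        token_ids = logits.argmax(axis=-1)
        all_logits.append(logits)
    return np.stack(all_logits, axis=1)  # (batch, k, vocab_size)


head = SharedMTPHead(hidden_size=16, vocab_size=50)
hidden = np.random.default_rng(1).normal(size=(2, 16))
tokens = np.array([3, 7])
draft_logits = recursive_predict(head, hidden, tokens, k=3)
print(draft_logits.shape)  # (2, 3, 50)
```

Because the loop calls the same head object K times, the parameter count does not grow with K. That is the property the shape and instantiation tests would need to confirm for the real model.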
References
- src/speculators/models/eagle3/ for patterns