Official implementation of RAT+, a dense-pretraining architecture that augments attention with full-sequence recurrence and active recurrence learning.
A single RAT+ model is pretrained densely once and can then be flexibly switched at inference time to dilated attention (optionally with local windows) or hybrid layer/head compositions. This requires only a short 1B-token resolution adaptation rather than retraining separate sparse models.
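As a rough illustration of what "dilated attention, optionally with local windows" means here, the sketch below builds a boolean attention mask combining a causal constraint, a dilation stride, and an optional dense local window. This is a hypothetical minimal example for intuition only — the function name, signature, and masking scheme are assumptions, not this repository's API.

```python
import torch

def dilated_causal_mask(seq_len: int, dilation: int, local_window: int = 0) -> torch.Tensor:
    """Boolean mask (True = attend). Illustrative only; not the repo's implementation."""
    q = torch.arange(seq_len).unsqueeze(1)  # query positions, shape (seq_len, 1)
    k = torch.arange(seq_len).unsqueeze(0)  # key positions, shape (1, seq_len)
    causal = k <= q                          # no attending to the future
    dilated = (q - k) % dilation == 0        # keep every `dilation`-th past key
    mask = causal & dilated
    if local_window > 0:                     # optionally add a dense local window
        mask |= causal & (q - k < local_window)
    return mask

# Example: dilation of 2 plus a local window of 2 recent tokens.
mask = dilated_causal_mask(8, dilation=2, local_window=2)
```

A mask like this could be passed as `attn_mask` to a standard attention call; the point is that the same dense-pretrained weights are reused and only the sparsity pattern changes at inference time.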
This repository currently provides the core architecture implementation. The full codebase, including training scripts and evaluation pipelines, is being cleaned up and will be released in a future update.
The repository structure is built upon https://github.com/CLAIRE-Labo/RAT.

If you find this work useful, please cite:
@article{wei2025rat,
  title={RAT: Bridging RNN Efficiency and Attention Accuracy via Chunk-based Sequence Modeling},
  author={Wei, Xiuying and Yadav, Anunay and Pascanu, Razvan and Gulcehre, Caglar},
  journal={arXiv preprint arXiv:2507.04416},
  year={2025}
}

@article{wei2026ratplus,
  title={RAT+: Train Dense, Infer Sparse--Recurrence Augmented Attention for Dilated Inference},
  author={Wei, Xiuying and Gulcehre, Caglar},
  journal={arXiv preprint arXiv:2602.18196},
  year={2026}
}