A micro GPT implementation and training pipeline in PyTorch. I built this to understand a few things:
- Basics of attention and RoPE (rotary position embeddings); a minimal sketch follows this list
- Training a GPT-like model on multiple GPUs, including checkpointing and other considerations needed to make the run succeed
- Multi-phase training, including combining/souping the model weights from 3 runs on smaller amounts of high-quality data in the second stage
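For intuition, here is a minimal sketch of RoPE in PyTorch using the "rotate-half" layout. It is not this repo's implementation; the function name, tensor shapes, and defaults are illustrative assumptions.

```python
# Minimal RoPE sketch (rotate-half layout). Illustrative only, not the repo's code.
import torch

def rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Apply rotary position embeddings to x of shape (batch, seq, heads, head_dim)."""
    _, t, _, d = x.shape
    half = d // 2
    # One rotation frequency per dimension pair, decaying geometrically.
    freqs = 1.0 / (base ** (torch.arange(half, dtype=torch.float32) / half))
    angles = torch.outer(torch.arange(t, dtype=torch.float32), freqs)  # (seq, half)
    cos = angles.cos()[None, :, None, :]  # broadcast over batch and heads
    sin = angles.sin()[None, :, None, :]
    x1, x2 = x[..., :half], x[..., half:]
    # Rotate each (x1, x2) pair by its position-dependent angle.
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

# Queries and keys are rotated the same way before attention, so their dot
# products depend only on relative position.
q = rope(torch.randn(1, 8, 4, 64))
k = rope(torch.randn(1, 8, 4, 64))
```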
The model itself is very simple and I didn't include any fancy new features. A better future version might include some of these, but they are out of scope for this project (it's very costly!):
- Post-training features:
  - Supervised finetuning
  - Reinforcement learning
- Model features:
  - Variations of attention mechanisms - GQA, MHLA, etc.
  - Variations of normalizations - LayerNorm, RMSNorm, etc.
  - Experimenting with post-normalization similar to OLMo
  - Mixture of experts
The pretrained models can be found in the `pretrained` directory. Training was done in 2 stages:
- Stage 1: Training on large amounts of mostly web-based data
- Stage 2: Training 3 runs on smaller amounts of high-quality data and combining/souping the model weights (a sketch of souping follows the cost breakdown below)
Hardware used and cost:
- 8x H200 SXM GPUs (80GB) on runpod.io
  - Time taken: ~4 hours
  - Hourly cost: $32 per hour
  - Total cost: ~$128
- 1 c8g.4xlarge instance on AWS
  - Time taken: ~16 hours
  - Hourly cost: $0.43184 per hour
  - Total cost: ~$6.75
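Since souping comes up a few times above: it just means averaging the weights of the separately trained runs. A minimal sketch of uniform averaging over three checkpoints is below; the file names are placeholders, not this repo's layout.

```python
# Minimal "model souping" sketch: uniform averaging of checkpoints from 3 runs.
# File names are placeholders.
import torch

paths = ["run1.pt", "run2.pt", "run3.pt"]
state_dicts = [torch.load(p, map_location="cpu") for p in paths]

souped = {
    # Average each parameter elementwise across the runs.
    key: torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
    for key in state_dicts[0]
}
torch.save(souped, "souped.pt")
```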
Features:
- Tokenizer
  - Loading pretrained GPT tokenizers
  - Training custom byte-pair encoding tokenizers (see the sketch after this list)
  - Loading custom byte-pair encoding tokenizers from files
- Micro GPT model implementation
  - Loading pretrained GPT models
  - Training custom GPT models with support for DDP (see the DDP sketch after this list)
  - Training checkpoints
  - Loading custom GPT models from files
  - Training using text, files, URLs, or Hugging Face datasets
- RoPE implementation
- Reproducing GPT-2 with a custom tokenizer and model
- HellaSwag eval
- 2-stage model training, including combining/souping the model weights from 3 runs on smaller amounts of high-quality data in the second stage
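To give a rough idea of what training a byte-pair encoding tokenizer involves, here is a minimal sketch using the Hugging Face `tokenizers` package. It is not this repo's tokenizer code; the corpus file, vocab size, and special token are placeholder assumptions.

```python
# Minimal byte-level BPE tokenizer training sketch with the `tokenizers` package.
# corpus.txt, the vocab size, and the special token are placeholders.
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

tokenizer = Tokenizer(models.BPE())
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel(add_prefix_space=False)

trainer = trainers.BpeTrainer(vocab_size=32000, special_tokens=["<|endoftext|>"])
tokenizer.train(files=["corpus.txt"], trainer=trainer)
tokenizer.save("tokenizer.json")

# Load it back and encode some text.
tok = Tokenizer.from_file("tokenizer.json")
print(tok.encode("Hi, I'm a language model,").ids)
```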
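And here is a minimal sketch of what DDP training with periodic checkpointing looks like, assuming a `torchrun --nproc_per_node=8 train.py` launch. The toy linear model and random batches stand in for the real GPT and dataloader; none of the names come from this repo.

```python
# Minimal DDP training loop with checkpointing. Launch with:
#   torchrun --nproc_per_node=8 train.py
import os
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")      # one process per GPU
    local_rank = int(os.environ["LOCAL_RANK"])   # set by torchrun
    torch.cuda.set_device(local_rank)
    rank = dist.get_rank()

    # Toy model standing in for the GPT; DDP all-reduces gradients across ranks.
    model = DDP(nn.Linear(128, 128).to(local_rank), device_ids=[local_rank])
    optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

    for step in range(1000):
        x = torch.randn(32, 128, device=local_rank)  # stand-in batch
        loss = model(x).pow(2).mean()                # stand-in loss
        optimizer.zero_grad(set_to_none=True)
        loss.backward()
        optimizer.step()

        # Checkpoint from rank 0 only; all ranks hold identical weights after step().
        if rank == 0 and step % 200 == 0:
            torch.save(
                {"model": model.module.state_dict(),
                 "optimizer": optimizer.state_dict(),
                 "step": step},
                f"checkpoint_{step:06d}.pt",
            )

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```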
- Create a virtual environment and activate it:
```bash
uv venv --python 3.12
source .venv/bin/activate
```
- Install dependencies:
```bash
make sync
```
- Download the pretrained model and tokenizer:
```bash
python scripts/download_pretrained.py
```
- Example usage (note the `await`; run this in an async context such as a notebook):
```python
from microgpt.model import (
    load_model,
    PretrainedModelConfig,
)

model = await load_model(
    config=PretrainedModelConfig(),
)
generated_text = model.generate_text(
    text="Hi, I'm a language model,",
    max_new_tokens=50,
)
```
- Go through the notebooks to understand how to use the library.
This project is licensed under the MIT License. See the LICENSE file for details.
