Dynamic sparse inference for autoregressive Transformers
ANEE is a lightweight research library that adds per-token adaptive computation to autoregressive Transformer models (GPT-2 and other GPT-style open models). It uses a per-layer profiler plus a controller (heuristic or learned) to skip redundant layers while keeping the KV-cache aligned, enabling coherent generation even when large portions of the network are bypassed.
ANEE has been tested on GPT-2 small, GPT-2 medium, GPT-2 large, and GPT-2-XL, achieving up to 50–55% theoretical FLOPs savings on large models at low compute budgets.
ANEE decides, for every token, which layers are necessary and which can be skipped.
Each layer is evaluated using:
- entropy
- hidden-state L2 norm
- delta-norm
- activation variance
- remaining compute budget
- depth position
These form the controller’s state vector.
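A minimal sketch of how these signals might be packed into such a state vector (the function and argument names here are illustrative, not ANEE's actual profiler API):

```python
import torch

def build_state(entropy, hidden, prev_hidden, budget_left, layer_idx, n_layers):
    """Illustrative only: combine per-layer signals into a controller state."""
    delta = hidden - prev_hidden
    return torch.tensor([
        entropy,                    # predictive entropy at this layer
        hidden.norm(p=2).item(),    # hidden-state L2 norm
        delta.norm(p=2).item(),     # delta-norm vs. the previous layer
        hidden.var().item(),        # activation variance
        budget_left,                # remaining compute budget in [0, 1]
        layer_idx / n_layers,       # normalized depth position
    ], dtype=torch.float32)
```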
Skipped layers still update the KV-cache (keys/values only), keeping attention alignment intact while avoiding heavy matrix multiplications.
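Conceptually, a skipped layer only runs the key/value projections so the cache keeps growing in step with the sequence. A rough sketch, assuming Hugging Face's GPT-2 block layout (`ln_1`, `attn.c_attn`) and omitting the reshape into attention heads that the real cache format requires:

```python
import torch

@torch.no_grad()
def skip_layer_kv_update(block, hidden, kv_cache):
    """Illustrative only: append this token's K/V without running the full block."""
    # GPT-2 packs the query/key/value projections into a single matmul (c_attn);
    # slice out K and V and discard Q, since this layer's attention is never computed.
    qkv = block.attn.c_attn(block.ln_1(hidden))   # (batch, 1, 3 * hidden_dim)
    _, k, v = qkv.chunk(3, dim=-1)
    kv_cache.append((k, v))                       # keeps later attention offsets aligned
    return hidden                                 # the hidden state passes through unchanged
```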
The skip/process decision can be made by either:
- a simple heuristic
- a learned controller trained via REINFORCE
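As a rough illustration of the REINFORCE option (not ANEE's actual training code; the class and function names are placeholders), the controller can be treated as a stochastic policy over per-layer actions and updated with reward-weighted log-probabilities:

```python
import torch
import torch.nn as nn

ACTIONS = ("PROCESS", "SKIP", "EXIT")

class SkipPolicy(nn.Module):
    """Illustrative MLP policy mapping a profiler state to an action distribution."""
    def __init__(self, state_dim=6, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(), nn.Linear(hidden, len(ACTIONS))
        )

    def forward(self, state):
        return torch.distributions.Categorical(logits=self.net(state))

def reinforce_step(policy, optimizer, states, actions, reward):
    """One REINFORCE update: push up the log-likelihood of taken actions, scaled by reward."""
    dist = policy(torch.stack(states))                # batch of per-layer states
    log_probs = dist.log_prob(torch.tensor(actions))
    loss = -(reward * log_probs).mean()               # reward trades quality vs. compute
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

A scalar reward balancing output quality against layers executed would plug into `reward` here; that trade-off is what the reward functions in the project layout below are responsible for.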
Current adapters:
- GPT-2 family (all sizes)
Ready for extension (via model adapters) to:
- GPT-J
- LLaMA
- Falcon
- Mistral
```bash
pip install anee
```

```python
import torch
from transformers import GPT2TokenizerFast

from anee import ANEEConfig
from anee.wrapper import ANEEWrapper

config = ANEEConfig(model_name="gpt2-xl", energy_budget=0.2)
model = ANEEWrapper(config).eval()
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2-xl")

text = model.generate(
    tokenizer=tokenizer,
    prompt="The future of AI",
    max_new_tokens=30,
)
print(text)
```

For each token, ANEE:
- Profiles the hidden states using entropy, max-softmax probability, L2 norm, delta-norm, variance, and remaining budget
- Builds a controller state vector
- Passes the state into an MLP controller to choose:
  - PROCESS: run the full layer
  - SKIP: update the KV-cache only
  - EXIT: optional early stop
- Maintains safe KV-cache alignment
- Produces logits through the model's final LN + LM head
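A condensed, illustrative version of that per-token loop (the `profile_fn`, `controller`, `process_fn`, and `skip_fn` callables are placeholders for ANEE's profiler, controller, full layer forward, and KV-only update; this sketches the control flow, not the library's internals):

```python
import torch

@torch.no_grad()
def run_token(blocks, final_ln, lm_head, hidden, kv_cache,
              profile_fn, controller, process_fn, skip_fn, budget=0.2):
    """Illustrative per-token pass deciding PROCESS / SKIP / EXIT for each layer."""
    layers_left = int(budget * len(blocks))            # crude remaining-compute tracker
    for i, block in enumerate(blocks):
        state = profile_fn(hidden, i, len(blocks), layers_left)
        action = controller(state)                     # "PROCESS", "SKIP", or "EXIT"
        if action == "PROCESS" and layers_left > 0:
            hidden = process_fn(block, hidden, kv_cache)   # full attention + MLP
            layers_left -= 1
        elif action == "EXIT":
            break                                      # optional early stop
        else:
            hidden = skip_fn(block, hidden, kv_cache)  # keys/values only, cache stays aligned
    return lm_head(final_ln(hidden))                   # logits from the final LN + LM head
```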
Budget = 0.2 (GPT-2-XL, 48 layers):
- Layers executed per token: ~19–21 of 48
- Layers skipped per token: 28–29
- Average theoretical savings: 53–55%
The largest models show the strongest redundancy and highest savings.
```
src/anee/
  wrapper.py              – core KV-safe executor
  controller.py           – heuristic + learned controller
  profiler.py             – entropy/norm/variance metrics
  utils.py                – FLOPs estimates, helpers
  reward.py               – RL reward functions
  config.py               – configuration dataclass
experiments/
  01_sanity_check.py      – simple text generation test
  visualize_heatmap.py    – layer-usage heatmaps
  train_controller.py     – supervised controller (optional)
  train_controller_rl.py  – RL controller training
```
| Model | Status |
|---|---|
| GPT-2 (all sizes) | ✔ Full support |
| GPT-J 6B | ☐ Adapter planned |
| LLaMA / Mistral | ☐ Adapter planned |
| Falcon | ☐ Adapter planned |
Adapters can be added by implementing a ModelAdapter subclass.
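A rough sketch of what a new adapter could look like, e.g. for LLaMA. The `ModelAdapter` name comes from the note above, but the import path and method names are assumptions about the interface rather than ANEE's documented API; the attribute paths follow Hugging Face's `LlamaForCausalLM`:

```python
# Illustrative sketch: the import path and method names below are assumptions.
from anee.adapters import ModelAdapter  # hypothetical module path

class LlamaAdapter(ModelAdapter):
    """Maps a LLaMA-style decoder onto the hooks an adaptive executor needs."""

    def blocks(self, model):
        # The stack of decoder layers to iterate over per token.
        return model.model.layers

    def final_norm_and_head(self, model):
        # Final RMSNorm and LM head used to produce logits.
        return model.model.norm, model.lm_head

    def kv_projections(self, block, hidden):
        # Keys/values only, so skipped layers keep the cache aligned.
        attn = block.self_attn
        return attn.k_proj(hidden), attn.v_proj(hidden)
```

Whatever the real hook set is, it would mirror what the GPT-2 path already needs: the block list, a KV-only projection, the final norm, and the LM head.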
Standard Transformers spend a full forward pass on every token, even when much of that computation is redundant. ANEE reduces theoretical FLOPs while preserving sequence coherence by:
- identifying redundant layers
- skipping only semantic-middle layers
- preserving structure and output formatting layers
This creates a "Sandwich Pattern" (layers at the bottom and top of the stack are kept while the middle is heavily skipped) that appears consistently across models.
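A minimal way to hard-code that bias, assuming one simply protects the first and last few layers from skipping (the protected counts are arbitrary illustrations, not ANEE's values):

```python
def skippable(layer_idx, n_layers, protect_front=4, protect_back=4):
    """Illustrative depth gate: only 'semantic-middle' layers may be skipped."""
    return protect_front <= layer_idx < n_layers - protect_back
```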
If ANEE is useful in your work, please cite:
```bibtex
@software{anee2025,
  author = {Ahmed Bin Khalid},
  title = {ANEE: Adaptive Neural Execution Engine},
  year = {2025},
  doi = {10.5281/zenodo.17741880},
  note = {Dynamic sparse inference for autoregressive Transformers}
}
```

License: Apache 2.0