self-improving-loop

A regression guard for AI agents.

Wrap any LangGraph / Hermes / custom agent node. Record traces. Detect success-rate or latency regression. Roll back bad config changes. Preserve event evidence.

Latest verified release: v0.1.1 · Agent data strategy: AGENT_DATA_STRATEGY.md · Eval labels: docs/ANNOTATION_GUIDELINE.md · External repro: EXTERNAL_REPRO.md · Hermes-style guard: docs/HERMES_SKILL_GUARD.md

中文定位：self-improving-loop 是 AI Agent 的回归保护层。它包住 LangGraph / Hermes / 自定义 agent 节点，记录 trace，检测成功率或延迟退化，回滚坏配置，并保留可复查事件证据。

Most "self-improving agent" projects stop at "log the failures, let the next run read the log". That's a methodology, not a loop. This package is the loop, implemented as a compact pure-stdlib Python runtime — no framework lock-in, no LLM dependency, no cloud.

The optional Yijing strategy is an internal policy router: runtime signals are mapped into six engineering lines, recognized as a hexagram state, converted into a bounded policy patch, then verified through the same rollback guard as any other strategy. It is a state machine, not fortune telling.

Overhead is negligible for normal LLM/HTTP agent calls; for sub-10ms in-memory functions, measure before wrapping.

Wrap any function, get:

📊 Automatic execution tracking (success rate, latency, rolling window)
🗄 Trace storage choice: readable JSONL by default, SQLite/WAL for multi-worker deployments
🧠 Adaptive thresholds per agent profile (high-freq / mid-freq / low-freq / critical)
☯️ Optional hexagram state strategy: six runtime dimensions → policy patch
🛠 Strategy hook for proposing improvement configs when failure patterns are detected
🧩 ConfigAdapter contract for real config backup / patch / restore
🛡 Rollback trigger when the new config regresses (>10% success drop, >20% latency gain, or 5 consecutive failures)
📬 Pluggable notifier (stub by default — swap in Telegram / Slack / whatever)

Extracted from TaijiOS, where the same six-line state model is used for production-scale agent workloads.

Demo artifacts: terminal transcript · asciinema cast

Install

Latest verified GitHub release:

pip install https://github.com/yangfei222666-9/self-improving-loop/releases/download/v0.1.1/self_improving_loop-0.1.1-py3-none-any.whl

From source:

pip install git+https://github.com/yangfei222666-9/self-improving-loop.git@v0.1.1

Zero required dependencies. Everything is Python stdlib, including optional SQLite trace storage via sqlite3.

Not a...

...methodology doc. Many "self-improving agent" repos are markdown templates that ask you to log learnings to CLAUDE.md. This is the runtime loop that does it for you.
...heavyweight framework. Compact stdlib code. Drop it next to your existing code. No decorators forced on you. No background process.
...LLM-dependent. The analysis is statistical, not LLM-based. If you want LLM-authored config tweaks, pass an improvement_strategy object and ask your favorite LLM inside its analyze() method.

What is stable today

Stable:

Execution tracking
Adaptive failure thresholds
JSONL trace storage with a cross-process lock
Optional SQLite/WAL trace storage
Strategy-triggered config patching
ConfigAdapter-backed rollback when a patch regresses
Optional Yijing hexagram strategy as a deterministic state router: runtime traces -> six engineering lines -> hexagram -> bounded policy patch

Experimental:

Choosing the best config patch automatically. The loop calls your improvement_strategy; it does not pretend to know your agent better than your production tests.
Full 64-hexagram policy coverage. The first Yijing strategy supports only eight core states and should be treated as a bounded policy router.

30-second example

from self_improving_loop import SelfImprovingLoop

loop = SelfImprovingLoop()

def my_agent_work():
    # Your actual agent call / LLM chain / tool invocation
    return {"status": "ok", "data": ...}

result = loop.execute_with_improvement(
    agent_id="my-agent",
    task="handle user query",
    execute_fn=my_agent_work,
)

if result["improvement_triggered"]:
    print(f"Strategy applied {result['improvement_applied']} config tweaks")

if result["rollback_executed"]:
    print(f"Rolled back because: {result['rollback_executed']['reason']}")

That's it. The loop watches every execution and decides when to trigger tuning. To mutate and restore real agent config, provide a strategy hook plus either a ConfigAdapter or the legacy strategy current_config/apply/rollback methods.

Run the six useful examples

From a repo checkout, start here:

python3 examples/01_basic_tracking.py
python3 examples/02_config_rollback.py
python3 examples/03_langgraph_adapter.py
python3 examples/04_yijing_strategy.py
python3 examples/05_langgraph_regression_guard.py
python3 examples/06_hermes_skill_regression_guard.py

They prove the six important contracts:

01_basic_tracking.py: wrapper records traces and exposes stats.
02_config_rollback.py: a bad patch is applied, regression is detected, and ConfigAdapter.rollback_config() restores the previous config.
03_langgraph_adapter.py: a LangGraph-style node can be wrapped without adopting a new framework.
04_yijing_strategy.py: traces become six runtime lines, a hexagram state, and a bounded policy patch.
05_langgraph_regression_guard.py: a LangGraph-style node regresses, traces are recorded, rollback runs, and an event trail survives.
06_hermes_skill_regression_guard.py: a Hermes-style skill call regresses, rollback restores the skill config, and an event trail survives.

For the verbose rollback event trail, run:

python3 examples/regression_rollback_demo.py --data-dir .repro-demo
python3 examples/verify_regression_rollback_event_trail.py .repro-demo/regression_rollback_event_trail.jsonl

For the bundled agent-failure eval packet, run:

python3 examples/verify_agent_eval_cases.py examples/agent_eval_cases.jsonl

The packet contains 30 non-authorizing cases for silent failure, stale artifacts, provider drift, missing event trails, rollback gaps, and unsafe action escalation. It is eval data only: no judgment, paper-buy, trade, or promote.

Use it as a safety layer for your current agent

This package is not trying to replace LangGraph, CrewAI, AutoGen, OpenAI Agents, or your own internal runner. It wraps the callable you already trust:

result = loop.execute_with_improvement(
    agent_id="support-agent",
    task="answer ticket",
    execute_fn=lambda: existing_agent.run(ticket),
    context={"framework": "your-current-stack"},
)

loop.track(...) is also available as a shorter alias for the same API.

Dependency-free examples show the integration seam:

python3 examples/03_langgraph_adapter.py
python3 examples/05_langgraph_regression_guard.py
python3 examples/06_hermes_skill_regression_guard.py
python3 examples/wrap_existing_agent.py

The goal is narrow: traces, thresholds, guarded strategy application, and rollback evidence around an agent you already have.

Trace storage

By default, traces are written to a readable traces.jsonl file with a cross-platform sidecar lock. For multi-worker deployments, switch to SQLite:

from self_improving_loop import SelfImprovingLoop

loop = SelfImprovingLoop(storage="sqlite")

This writes traces.sqlite3 with WAL mode enabled. The public API is unchanged: execute_with_improvement() records traces, and the loop reads them back for thresholds, metrics, and rollback checks.

For long-running JSONL deployments, enable size-based rotation and call compaction from cron or your scheduler:

loop = SelfImprovingLoop(
    storage="jsonl",
    jsonl_max_bytes=50 * 1024 * 1024,
    jsonl_max_archives=7,
)

# Keep the latest 100k valid active traces.
loop.trace_store.compact(max_entries=100_000)

Rotated JSONL files are gzipped under trace_archives/ by default. Compaction drops corrupt rows and keeps the latest valid entries in the active trace file.

Optional Yijing policy strategy

The Yijing layer is implemented as a deterministic state machine, not as a fortune-telling layer:

runtime traces -> six engineering lines -> hexagram state -> policy patch

The six lines are:

stability
efficiency
learning activity
routing accuracy
collaboration
governance

Use it as the strategy:

from self_improving_loop import SelfImprovingLoop, YijingEvolutionStrategy

loop = SelfImprovingLoop(
    strategy=YijingEvolutionStrategy(),
    config_adapter=my_config_adapter,
)

improvement_strategy= remains supported for backward compatibility.

The engineering mapping is explicit:

Line	Dimension	Yang means	Yin means
1	stability	dependencies are healthy	API/network/dependency failure
2	efficiency	high success, low latency	low success or high latency
3	learning activity	feedback / recovery signal exists	repeated failure without learning
4	routing accuracy	model/tool choice looks correct	wrong model/tool/schema drift
5	collaboration	tools / agents hand off cleanly	conflicts or context breaks
6	governance	cost and rollout are bounded	quota, cost, or policy drift

The first version supports eight core policy states: Qian, Kun, Zhen, Kan, Bo, Fu, Ji Ji, and Wei Ji. It returns a bounded config patch and relies on the same canary/rollback path as any other strategy.

Why this exists

Most agents have this failure mode:

You ship an agent.
It works for a week.
Something upstream changes (rate limits, schema drift, a new edge case).
Your agent starts failing.
You find out three days later from angry users.
You tweak a config, hope for the best, ship it.
The tweak makes another scenario worse.
You roll it back manually, losing the original learning.

self-improving-loop turns steps 3–8 into a tight feedback loop that runs inside your process, without needing observability infra, Kubernetes, or a dedicated ML team.

Adaptive thresholds (no magic numbers)

Different agents have different "pulse rates". A critical alerting agent should reconsider after 1 failure; a batch classifier can tolerate 5 before triggering analysis. The library classifies agents by execution frequency and adjusts:

The automatic profile is based on exec_count_24h; override it with set_manual_threshold() when production semantics matter more than raw frequency.

Agent profile	Failure trigger	Analysis window	Cooldown
High-frequency (>10/day)	5 failures	48h	3h
Medium-frequency (3-10/day)	3 failures	24h	6h
Low-frequency (<3/day)	2 failures	72h	12h
Critical (user-marked)	1 failure	24h	6h

Or bypass the classifier and set manually:

from self_improving_loop import AdaptiveThreshold

adaptive = AdaptiveThreshold()
adaptive.set_manual_threshold(
    "critical-agent",
    failure_threshold=1,
    analysis_window_hours=12,
    cooldown_hours=1,
    is_critical=True,
)

Auto-rollback (the safety net)

When a config change ships, the loop keeps watching. It rolls back if any of these become true:

Success rate drops >10%
Average latency increases >20%
≥5 consecutive failures after the change

Real rollback requires a config hook. Prefer an explicit ConfigAdapter:

from self_improving_loop import SelfImprovingLoop

class MyConfigAdapter:
    def get_config(self, agent_id):
        return load_agent_config(agent_id)

    def apply_config(self, agent_id, config_patch):
        save_agent_config(agent_id, {**load_agent_config(agent_id), **config_patch})
        return True

    def rollback_config(self, agent_id, backup_config):
        save_agent_config(agent_id, backup_config)

loop = SelfImprovingLoop(
    improvement_strategy=my_strategy,
    config_adapter=MyConfigAdapter(),
)

Without a config adapter or strategy rollback hook, the loop will record the rollback decision but will not claim that your external agent config was restored.

# See recent rollbacks
rollback_history = loop.auto_rollback.get_rollback_history("my-agent")
for event in rollback_history:
    print(event["reason"], event["timestamp"])

Pluggable notifier

The built-in TelegramNotifier is a stub — it logs to stdout. Override _send_message() to hook any channel:

from self_improving_loop import TelegramNotifier

class MySlackNotifier(TelegramNotifier):
    def __init__(self, webhook_url, **kw):
        super().__init__(**kw)
        self.webhook_url = webhook_url

    def _send_message(self, message, priority="normal"):
        import requests
        requests.post(self.webhook_url, json={"text": f"[{priority}] {message}"})

loop = SelfImprovingLoop(notifier=MySlackNotifier(webhook_url="https://hooks..."))

Performance

Measured locally with benchmarks/overhead.py (200 iterations per workload, Python 3.12, Windows):

Workload profile	Absolute overhead	Relative overhead
~100 ms agent call (typical LLM)	+0.27 ms	+0.3%
~10 ms agent call (tool call)	+0.31 ms	+3.0%
sub-millisecond call	+0.08 ms	>>% (don't wrap these)

The wrapper adds a stable ~300 μs of fixed cost per call (trace append + threshold check). Whether that's negligible depends on your workload:

LLM calls (>500 ms): overhead is ≤0.06% — invisible
HTTP / DB calls (~30-100 ms): ≤1%
Fast in-memory work (<10 ms): 3%+ — reconsider whether you need this for those

Rerun the benchmark on your own machine with python benchmarks/overhead.py.

Restart / recovery startup cost can be checked with:

python benchmarks/startup_recovery.py --traces 1000 10000 100000

SelfImprovingLoop.__init__ only loads loop_state.json; trace history is loaded on demand for stats, thresholds, and rollback checks.

Separate operation costs (triggered occasionally, not per-call):

Operation	Cost
Failure analysis (only when threshold crossed)	~100 ms
Applying improvement config	~200 ms
Rollback execution	~10 ms

Background

Extracted from TaijiOS — a self-learning AI operating system with 5 I Ching–bound engines and a 346-heartbeat Ising physics experiment. The parent project has 14 modules; this one is the most generally reusable, so it lives as a standalone package.

TaijiOS started on Chinese New Year 2026-02-17 and has been built through multi-AI collaboration since then.

License

MIT. Ship it wherever.

Contact / Feedback

This is a very early release. Every bug report, every "didn't work for me", every "I wish it did X" is read:

GitHub Issue: open one
Parent project: TaijiOS

"Safety first, then automation."

Name		Name	Last commit message	Last commit date
Latest commit History 64 Commits
.github		.github
assets/demo		assets/demo
benchmarks		benchmarks
docs		docs
examples		examples
self_improving_loop		self_improving_loop
tests		tests
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
AGENT_DATA_STRATEGY.md		AGENT_DATA_STRATEGY.md
CHANGELOG.md		CHANGELOG.md
EXTERNAL_REPRO.md		EXTERNAL_REPRO.md
LAUNCH_COPY_BILINGUAL.md		LAUNCH_COPY_BILINGUAL.md
LAUNCH_OBSERVATION_72H.md		LAUNCH_OBSERVATION_72H.md
LAUNCH_POST_SELF_IMPROVING_LOOP.md		LAUNCH_POST_SELF_IMPROVING_LOOP.md
LICENSE		LICENSE
MANIFEST.in		MANIFEST.in
PATCH_REPORT.md		PATCH_REPORT.md
README.md		README.md
SHOW_HN_DRAFT.md		SHOW_HN_DRAFT.md
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

self-improving-loop

Install

Not a...

What is stable today

30-second example

Run the six useful examples

Use it as a safety layer for your current agent

Trace storage

Optional Yijing policy strategy

Why this exists

Adaptive thresholds (no magic numbers)

Auto-rollback (the safety net)

Pluggable notifier

Performance

Background

License

Contact / Feedback

About

Uh oh!

Releases 1

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

self-improving-loop

Install

Not a...

What is stable today

30-second example

Run the six useful examples

Use it as a safety layer for your current agent

Trace storage

Optional Yijing policy strategy

Why this exists

Adaptive thresholds (no magic numbers)

Auto-rollback (the safety net)

Pluggable notifier

Performance

Background

License

Contact / Feedback

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages