
self-improving-loop

A regression guard for AI agents.

Wrap any LangGraph / Hermes / custom agent node. Record traces. Detect success-rate or latency regression. Roll back bad config changes. Preserve event evidence.


Latest verified release: v0.1.1 · Agent data strategy: AGENT_DATA_STRATEGY.md · Eval labels: docs/ANNOTATION_GUIDELINE.md · External repro: EXTERNAL_REPRO.md · Hermes-style guard: docs/HERMES_SKILL_GUARD.md

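Positioning: self-improving-loop is the regression-protection layer for AI agents. It wraps LangGraph / Hermes / custom agent nodes, records traces, detects success-rate or latency degradation, rolls back bad configs, and preserves reviewable event evidence.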

Most "self-improving agent" projects stop at "log the failures, let the next run read the log". That's a methodology, not a loop. This package is the loop, implemented as a compact pure-stdlib Python runtime — no framework lock-in, no LLM dependency, no cloud.

The optional Yijing strategy is an internal policy router: runtime signals are mapped into six engineering lines, recognized as a hexagram state, converted into a bounded policy patch, then verified through the same rollback guard as any other strategy. It is a state machine, not fortune telling.

Overhead is negligible for normal LLM/HTTP agent calls; for sub-10ms in-memory functions, measure before wrapping.

Wrap any function, get:

  • 📊 Automatic execution tracking (success rate, latency, rolling window)
  • 🗄 Trace storage choice: readable JSONL by default, SQLite/WAL for multi-worker deployments
  • 🧠 Adaptive thresholds per agent profile (high-freq / mid-freq / low-freq / critical)
  • ☯️ Optional hexagram state strategy: six runtime dimensions → policy patch
  • 🛠 Strategy hook for proposing improvement configs when failure patterns are detected
  • 🧩 ConfigAdapter contract for real config backup / patch / restore
  • 🛡 Rollback trigger when the new config regresses (>10% success drop, >20% latency gain, or 5 consecutive failures)
  • 📬 Pluggable notifier (stub by default — swap in Telegram / Slack / whatever)

Extracted from TaijiOS, where the same six-line state model is used for production-scale agent workloads.


Demo artifacts: terminal transcript · asciinema cast


Install

Latest verified GitHub release:

pip install https://github.com/yangfei222666-9/self-improving-loop/releases/download/v0.1.1/self_improving_loop-0.1.1-py3-none-any.whl

From source:

pip install git+https://github.com/yangfei222666-9/self-improving-loop.git@v0.1.1

Zero required dependencies. Everything is Python stdlib, including optional SQLite trace storage via sqlite3.


Not a...

  • ...methodology doc. Many "self-improving agent" repos are markdown templates that ask you to log learnings to CLAUDE.md. This is the runtime loop that does it for you.
  • ...heavyweight framework. Compact stdlib code. Drop it next to your existing code. No decorators forced on you. No background process.
  • ...LLM-dependent. The analysis is statistical, not LLM-based. If you want LLM-authored config tweaks, pass an improvement_strategy object and ask your favorite LLM inside its analyze() method.

What is stable today

Stable:

  • Execution tracking
  • Adaptive failure thresholds
  • JSONL trace storage with a cross-process lock
  • Optional SQLite/WAL trace storage
  • Strategy-triggered config patching
  • ConfigAdapter-backed rollback when a patch regresses
  • Optional Yijing hexagram strategy as a deterministic state router: runtime traces -> six engineering lines -> hexagram -> bounded policy patch

Experimental:

  • Choosing the best config patch automatically. The loop calls your improvement_strategy; it does not pretend to know your agent better than your production tests.
  • Full 64-hexagram policy coverage. The first Yijing strategy supports only eight core states and should be treated as a bounded policy router.

30-second example

from self_improving_loop import SelfImprovingLoop

loop = SelfImprovingLoop()

def my_agent_work():
    # Your actual agent call / LLM chain / tool invocation
    return {"status": "ok", "data": ...}

result = loop.execute_with_improvement(
    agent_id="my-agent",
    task="handle user query",
    execute_fn=my_agent_work,
)

if result["improvement_triggered"]:
    print(f"Strategy applied {result['improvement_applied']} config tweaks")

if result["rollback_executed"]:
    print(f"Rolled back because: {result['rollback_executed']['reason']}")

That's it. The loop watches every execution and decides when to trigger tuning. To mutate and restore real agent config, provide a strategy hook plus either a ConfigAdapter or the legacy strategy current_config/apply/rollback methods.
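Conceptually, the tracking half of the loop is a bounded record of recent outcomes per agent. The sketch below illustrates that idea with the standard library only; RollingStats and run_tracked are hypothetical names and the window size is an assumption, so treat it as a model of the behavior rather than the package's internals.

```python
import time
from collections import deque
from typing import Any, Callable, Optional


class RollingStats:
    """Hypothetical per-agent tracker over the last `window` executions."""

    def __init__(self, window: int = 100) -> None:
        # deque(maxlen=...) discards the oldest record once the window is full.
        self.records: deque = deque(maxlen=window)

    def record(self, success: bool, latency_ms: float) -> None:
        self.records.append((success, latency_ms))

    def success_rate(self) -> Optional[float]:
        if not self.records:
            return None  # no data yet
        return sum(1 for ok, _ in self.records if ok) / len(self.records)

    def avg_latency_ms(self) -> Optional[float]:
        if not self.records:
            return None
        return sum(ms for _, ms in self.records) / len(self.records)


def run_tracked(stats: RollingStats, execute_fn: Callable[[], Any]) -> Any:
    """Call execute_fn, record outcome and latency, and re-raise failures."""
    start = time.perf_counter()
    try:
        result = execute_fn()
    except Exception:
        stats.record(False, (time.perf_counter() - start) * 1000)
        raise  # the caller still sees the original error
    stats.record(True, (time.perf_counter() - start) * 1000)
    return result
```

A fixed-size window lets an old bad stretch age out of the success rate instead of dominating it forever.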


Run the six useful examples

From a repo checkout, start here:

python3 examples/01_basic_tracking.py
python3 examples/02_config_rollback.py
python3 examples/03_langgraph_adapter.py
python3 examples/04_yijing_strategy.py
python3 examples/05_langgraph_regression_guard.py
python3 examples/06_hermes_skill_regression_guard.py

Together they demonstrate six core contracts:

  • 01_basic_tracking.py: wrapper records traces and exposes stats.
  • 02_config_rollback.py: a bad patch is applied, regression is detected, and ConfigAdapter.rollback_config() restores the previous config.
  • 03_langgraph_adapter.py: a LangGraph-style node can be wrapped without adopting a new framework.
  • 04_yijing_strategy.py: traces become six runtime lines, a hexagram state, and a bounded policy patch.
  • 05_langgraph_regression_guard.py: a LangGraph-style node regresses, traces are recorded, rollback runs, and an event trail survives.
  • 06_hermes_skill_regression_guard.py: a Hermes-style skill call regresses, rollback restores the skill config, and an event trail survives.

For the verbose rollback event trail, run:

python3 examples/regression_rollback_demo.py --data-dir .repro-demo
python3 examples/verify_regression_rollback_event_trail.py .repro-demo/regression_rollback_event_trail.jsonl

For the bundled agent-failure eval packet, run:

python3 examples/verify_agent_eval_cases.py examples/agent_eval_cases.jsonl

The packet contains 30 non-authorizing cases covering silent failure, stale artifacts, provider drift, missing event trails, rollback gaps, and unsafe action escalation. It is eval data only: no case authorizes a judgment, paper-buy, trade, or promote action.
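If you want a quick per-category count of the packet before running the bundled verifier, a minimal sketch follows. It assumes each JSONL row carries a string category field; that field name is hypothetical, so check the packet (or docs/ANNOTATION_GUIDELINE.md) for the real schema.

```python
import json
from collections import Counter
from typing import Iterable


def tally_eval_cases(lines: Iterable[str], field: str = "category") -> Counter:
    """Count JSONL eval cases per category; blank lines are skipped.

    The default "category" key is a hypothetical placeholder for whatever
    key the bundled packet actually uses.
    """
    counts: Counter = Counter()
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        case = json.loads(line)  # raises json.JSONDecodeError on a corrupt row
        value = case.get(field)
        if not isinstance(value, str):
            raise ValueError(f"line {lineno}: missing or non-string {field!r}")
        counts[value] += 1
    return counts
```

Usage: open examples/agent_eval_cases.jsonl with encoding="utf-8" and pass the file object directly, since iterating a file yields lines.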


Use it as a safety layer for your current agent

This package is not trying to replace LangGraph, CrewAI, AutoGen, OpenAI Agents, or your own internal runner. It wraps the callable you already trust:

result = loop.execute_with_improvement(
    agent_id="support-agent",
    task="answer ticket",
    execute_fn=lambda: existing_agent.run(ticket),
    context={"framework": "your-current-stack"},
)

loop.track(...) is also available as a shorter alias for the same API.

Dependency-free examples show the integration seam:

python3 examples/03_langgraph_adapter.py
python3 examples/05_langgraph_regression_guard.py
python3 examples/06_hermes_skill_regression_guard.py
python3 examples/wrap_existing_agent.py

The goal is narrow: traces, thresholds, guarded strategy application, and rollback evidence around an agent you already have.


Trace storage

By default, traces are written to a readable traces.jsonl file with a cross-platform sidecar lock. For multi-worker deployments, switch to SQLite:

from self_improving_loop import SelfImprovingLoop

loop = SelfImprovingLoop(storage="sqlite")

This writes traces.sqlite3 with WAL mode enabled. The public API is unchanged: execute_with_improvement() records traces, and the loop reads them back for thresholds, metrics, and rollback checks.
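For intuition about what a SQLite backend involves, here is a minimal stdlib sqlite3 sketch of a WAL-mode trace table. The table name, columns, and helper functions are hypothetical and not the package's schema; only PRAGMA journal_mode=WAL and the sqlite3 calls are standard.

```python
import json
import os
import sqlite3
import tempfile
import time


def open_trace_db(path):
    conn = sqlite3.connect(path)
    # WAL lets readers keep reading while a single writer appends.
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    if mode.lower() != "wal":
        raise RuntimeError(f"could not enable WAL (journal_mode={mode!r})")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS traces ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " agent_id TEXT NOT NULL,"
        " ts REAL NOT NULL,"
        " success INTEGER NOT NULL,"
        " latency_ms REAL NOT NULL,"
        " payload TEXT)"
    )
    conn.commit()
    return conn


def append_trace(conn, agent_id, success, latency_ms, payload=None):
    with conn:  # commits on success, rolls back if the insert fails
        conn.execute(
            "INSERT INTO traces (agent_id, ts, success, latency_ms, payload)"
            " VALUES (?, ?, ?, ?, ?)",
            (agent_id, time.time(), int(success), latency_ms, json.dumps(payload)),
        )


def recent_success_rate(conn, agent_id, limit=100):
    rows = conn.execute(
        "SELECT success FROM traces WHERE agent_id = ? ORDER BY id DESC LIMIT ?",
        (agent_id, limit),
    ).fetchall()
    if not rows:
        return None
    return sum(r[0] for r in rows) / len(rows)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        conn = open_trace_db(os.path.join(tmp, "traces.sqlite3"))
        append_trace(conn, "my-agent", True, 120.0)
        append_trace(conn, "my-agent", False, 900.0, {"error": "timeout"})
        print(recent_success_rate(conn, "my-agent"))  # 0.5
        conn.close()  # close before the directory is removed
```

WAL is what makes the backend suit multi-worker setups: readers are not blocked by the writer, although concurrent writers still take turns on SQLite's write lock.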

For long-running JSONL deployments, enable size-based rotation and call compaction from cron or your scheduler:

loop = SelfImprovingLoop(
    storage="jsonl",
    jsonl_max_bytes=50 * 1024 * 1024,
    jsonl_max_archives=7,
)

# Keep the latest 100k valid active traces.
loop.trace_store.compact(max_entries=100_000)

Rotated JSONL files are gzipped under trace_archives/ by default. Compaction drops corrupt rows and keeps the latest valid entries in the active trace file.
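The rotation and compaction behavior described above can be sketched as follows. This is an illustration under assumptions, not the package's code: the archive naming and helper names are hypothetical, and the sketch omits the cross-process lock, so with multiple processes you would need to hold a lock around both functions.

```python
import gzip
import json
import os
import shutil
import time


def rotate_if_needed(path, max_bytes, archive_dir, max_archives):
    """Gzip the active JSONL file into archive_dir once it reaches max_bytes."""
    if not os.path.exists(path) or os.path.getsize(path) < max_bytes:
        return None
    os.makedirs(archive_dir, exist_ok=True)
    archive = os.path.join(archive_dir, f"traces-{time.time_ns()}.jsonl.gz")
    with open(path, "rb") as src, gzip.open(archive, "wb") as dst:
        shutil.copyfileobj(src, dst)
    open(path, "w").close()  # start a fresh, empty active file
    # Prune the oldest archives (names sort by timestamp) beyond max_archives.
    archives = sorted(f for f in os.listdir(archive_dir) if f.endswith(".jsonl.gz"))
    for old in archives[: max(len(archives) - max_archives, 0)]:
        os.remove(os.path.join(archive_dir, old))
    return archive


def compact(path, max_entries):
    """Drop corrupt rows and keep only the latest max_entries valid rows."""
    valid = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            try:
                valid.append(json.loads(line))
            except json.JSONDecodeError:
                continue  # blank, truncated, or otherwise corrupt row
    kept = valid[-max_entries:] if max_entries > 0 else []
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        for row in kept:
            f.write(json.dumps(row) + "\n")
    os.replace(tmp_path, path)  # atomic swap on the same filesystem
    return len(kept)
```

Writing to a temporary file and swapping it in with os.replace means a crash mid-compaction leaves the previous trace file intact.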


Optional Yijing policy strategy

The Yijing layer is implemented as a deterministic state machine, not as a fortune-telling layer:

runtime traces -> six engineering lines -> hexagram state -> policy patch

The six lines are:

  1. stability
  2. efficiency
  3. learning activity
  4. routing accuracy
  5. collaboration
  6. governance

Use it as the strategy:

from self_improving_loop import SelfImprovingLoop, YijingEvolutionStrategy

loop = SelfImprovingLoop(
    strategy=YijingEvolutionStrategy(),
    config_adapter=my_config_adapter,
)

The older improvement_strategy= keyword argument remains supported for backward compatibility.

The engineering mapping is explicit:

| Line | Dimension | Yang means | Yin means |
|---|---|---|---|
| 1 | stability | dependencies are healthy | API/network/dependency failure |
| 2 | efficiency | high success, low latency | low success or high latency |
| 3 | learning activity | feedback / recovery signal exists | repeated failure without learning |
| 4 | routing accuracy | model/tool choice looks correct | wrong model/tool/schema drift |
| 5 | collaboration | tools / agents hand off cleanly | conflicts or context breaks |
| 6 | governance | cost and rollout are bounded | quota, cost, or policy drift |
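The recognition step can be pictured as a lookup from six yang/yin lines to a named state. The line patterns below are the traditional ones for the eight named hexagrams (bottom line first). How YijingEvolutionStrategy actually derives each line, the efficiency thresholds, and the patch keys are assumptions for illustration only.

```python
from typing import Dict, Optional

LINE_NAMES = (
    "stability", "efficiency", "learning_activity",
    "routing_accuracy", "collaboration", "governance",
)

# Traditional line patterns of the eight core hexagrams, bottom line first
# (True = yang, False = yin).
CORE_STATES = {
    (True, True, True, True, True, True): "Qian",
    (False, False, False, False, False, False): "Kun",
    (True, False, False, True, False, False): "Zhen",
    (False, True, False, False, True, False): "Kan",
    (False, False, False, False, False, True): "Bo",
    (True, False, False, False, False, False): "Fu",
    (True, False, True, False, True, False): "Ji Ji",
    (False, True, False, True, False, True): "Wei Ji",
}

# Hypothetical, deliberately small patches; the keys are illustrative only.
POLICY_PATCHES: Dict[str, Dict[str, object]] = {
    "Kun": {"max_retries": 1, "canary_fraction": 0.05},
    "Fu": {"canary_fraction": 0.10},
}


def efficiency_line(success_rate: float, avg_latency_ms: float,
                    min_success: float = 0.9, max_latency_ms: float = 2000.0) -> bool:
    """Line 2 is yang when success is high and latency is low (thresholds assumed)."""
    return success_rate >= min_success and avg_latency_ms <= max_latency_ms


def recognize(lines: Dict[str, bool]) -> Optional[str]:
    """Map six named lines to one of the eight core states, or None."""
    pattern = tuple(bool(lines[name]) for name in LINE_NAMES)
    return CORE_STATES.get(pattern)


def policy_patch(state: Optional[str]) -> Dict[str, object]:
    # Unknown or unmapped states yield an empty patch: change nothing.
    return dict(POLICY_PATCHES.get(state, {})) if state else {}
```

Any pattern outside the eight core states maps to no state and an empty patch, which is one way to keep a router bounded.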

The first version supports eight core policy states: Qian, Kun, Zhen, Kan, Bo, Fu, Ji Ji, and Wei Ji. It returns a bounded config patch and relies on the same canary/rollback path as any other strategy.


Why this exists

Most agents have this failure mode:

  1. You ship an agent.
  2. It works for a week.
  3. Something upstream changes (rate limits, schema drift, a new edge case).
  4. Your agent starts failing.
  5. You find out three days later from angry users.
  6. You tweak a config, hope for the best, ship it.
  7. The tweak makes another scenario worse.
  8. You roll it back manually, losing the original learning.

self-improving-loop turns steps 3–8 into a tight feedback loop that runs inside your process, without needing observability infra, Kubernetes, or a dedicated ML team.


Adaptive thresholds (no magic numbers)

Different agents have different "pulse rates". A critical alerting agent should reconsider after 1 failure; a batch classifier can tolerate 5 before triggering analysis. The library classifies agents by execution frequency (exec_count_24h) and adjusts thresholds per profile. Override the automatic profile with set_manual_threshold() when production semantics matter more than raw frequency.

| Agent profile | Failure trigger | Analysis window | Cooldown |
|---|---|---|---|
| High-frequency (>10/day) | 5 failures | 48h | 3h |
| Medium-frequency (3-10/day) | 3 failures | 24h | 6h |
| Low-frequency (<3/day) | 2 failures | 72h | 12h |
| Critical (user-marked) | 1 failure | 24h | 6h |

Or bypass the classifier and set manually:

from self_improving_loop import AdaptiveThreshold

adaptive = AdaptiveThreshold()
adaptive.set_manual_threshold(
    "critical-agent",
    failure_threshold=1,
    analysis_window_hours=12,
    cooldown_hours=1,
    is_critical=True,
)
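The automatic profile table reduces to a small lookup. This sketch reuses the table's numbers; Thresholds, classify, and should_analyze are hypothetical names, and the sketch leaves out cooldown handling and the counting of failures inside the analysis window.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    failure_threshold: int
    analysis_window_hours: int
    cooldown_hours: int


# Values copied from the profile table above.
PROFILES = {
    "critical": Thresholds(1, 24, 6),
    "high": Thresholds(5, 48, 3),
    "medium": Thresholds(3, 24, 6),
    "low": Thresholds(2, 72, 12),
}


def classify(exec_count_24h: int, is_critical: bool = False) -> str:
    if is_critical:
        return "critical"  # a user mark overrides raw frequency
    if exec_count_24h > 10:
        return "high"
    if exec_count_24h >= 3:
        return "medium"
    return "low"


def should_analyze(recent_failures: int, exec_count_24h: int,
                   is_critical: bool = False) -> bool:
    """True once failures reach the profile's trigger (window counting is the caller's job)."""
    profile = PROFILES[classify(exec_count_24h, is_critical)]
    return recent_failures >= profile.failure_threshold
```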

Auto-rollback (the safety net)

When a config change ships, the loop keeps watching. It rolls back if any of these become true:

  • Success rate drops >10%
  • Average latency increases >20%
  • ≥5 consecutive failures after the change
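The three rollback conditions can be expressed as one comparison between the pre-change and post-change windows. WindowStats and rollback_reason are hypothetical names, and the README does not say whether the 10% success drop is absolute or relative; this sketch assumes absolute percentage points.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WindowStats:
    success_rate: float        # 0.0 to 1.0
    avg_latency_ms: float
    consecutive_failures: int


def rollback_reason(before: WindowStats, after: WindowStats,
                    max_success_drop: float = 0.10,
                    max_latency_gain: float = 0.20,
                    max_consecutive_failures: int = 5) -> Optional[str]:
    """Return which condition the post-change window tripped, or None."""
    # Assumption: the success drop is measured in absolute points (0.95 -> 0.84).
    if before.success_rate - after.success_rate > max_success_drop:
        return "success_rate"
    # Latency gain is relative to the pre-change average.
    if before.avg_latency_ms > 0:
        gain = (after.avg_latency_ms - before.avg_latency_ms) / before.avg_latency_ms
        if gain > max_latency_gain:
            return "latency"
    if after.consecutive_failures >= max_consecutive_failures:
        return "consecutive_failures"
    return None
```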

Real rollback requires a config hook. Prefer an explicit ConfigAdapter:

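from self_improving_loop import SelfImprovingLoop

# load_agent_config / save_agent_config stand for your own config persistence.
class MyConfigAdapter:
    def get_config(self, agent_id):
        # Current config for agent_id.
        return load_agent_config(agent_id)

    def apply_config(self, agent_id, config_patch):
        # Merge the patch over the current config and persist it.
        save_agent_config(agent_id, {**load_agent_config(agent_id), **config_patch})
        return True

    def rollback_config(self, agent_id, backup_config):
        # Restore the config captured before the patch.
        save_agent_config(agent_id, backup_config)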

loop = SelfImprovingLoop(
    improvement_strategy=my_strategy,
    config_adapter=MyConfigAdapter(),
)

Without a config adapter or strategy rollback hook, the loop will record the rollback decision but will not claim that your external agent config was restored.

# See recent rollbacks
rollback_history = loop.auto_rollback.get_rollback_history("my-agent")
for event in rollback_history:
    print(event["reason"], event["timestamp"])
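For tests or local experiments, the same three-method shape used by MyConfigAdapter can be backed by a dict. InMemoryConfigAdapter is a hypothetical helper, not shipped with the package; it copies configs defensively so that a stored backup cannot be mutated by later patches.

```python
import copy


class InMemoryConfigAdapter:
    """Hypothetical dict-backed adapter with get/apply/rollback methods."""

    def __init__(self, initial=None):
        self._configs = copy.deepcopy(initial or {})

    def get_config(self, agent_id):
        # Return a copy so callers (and backups) never alias stored state.
        return copy.deepcopy(self._configs.get(agent_id, {}))

    def apply_config(self, agent_id, config_patch):
        # Shallow-merge the patch over the current config.
        self._configs[agent_id] = {**self._configs.get(agent_id, {}), **config_patch}
        return True

    def rollback_config(self, agent_id, backup_config):
        self._configs[agent_id] = copy.deepcopy(backup_config)
```

Passing an instance as config_adapter= lets you exercise the apply, regress, and restore path without touching real config files.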

Pluggable notifier

The built-in TelegramNotifier is a stub — it logs to stdout. Override _send_message() to hook any channel:

from self_improving_loop import SelfImprovingLoop, TelegramNotifier

class MySlackNotifier(TelegramNotifier):
    def __init__(self, webhook_url, **kw):
        super().__init__(**kw)
        self.webhook_url = webhook_url

    def _send_message(self, message, priority="normal"):
        import requests  # third-party; not a dependency of this package
        requests.post(
            self.webhook_url,
            json={"text": f"[{priority}] {message}"},
            timeout=10,  # don't let a slow webhook stall the agent call
        )

loop = SelfImprovingLoop(notifier=MySlackNotifier(webhook_url="https://hooks..."))
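The subclass above depends on the third-party requests package. To stay dependency-free, the HTTP call can be made with urllib.request instead. WebhookNotifier below is a hypothetical standalone class that only mirrors the _send_message(message, priority) hook; the {"text": ...} payload assumes a Slack-style incoming webhook.

```python
import json
import urllib.request


class WebhookNotifier:
    """Hypothetical stdlib-only sender mirroring the _send_message hook."""

    def __init__(self, webhook_url, timeout_s=10.0):
        self.webhook_url = webhook_url
        self.timeout_s = timeout_s

    def _build_request(self, message, priority="normal"):
        body = json.dumps({"text": f"[{priority}] {message}"}).encode("utf-8")
        return urllib.request.Request(
            self.webhook_url,
            data=body,
            headers={"Content-Type": "application/json"},
            method="POST",
        )

    def _send_message(self, message, priority="normal"):
        req = self._build_request(message, priority)
        # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
        with urllib.request.urlopen(req, timeout=self.timeout_s) as resp:
            return 200 <= resp.status < 300
```

To plug it into the loop, move the _send_message body into a TelegramNotifier subclass as in the previous example.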

Performance

Measured locally with benchmarks/overhead.py (200 iterations per workload, Python 3.12, Windows):

| Workload profile | Absolute overhead | Relative overhead |
|---|---|---|
| ~100 ms agent call (typical LLM) | +0.27 ms | +0.3% |
| ~10 ms agent call (tool call) | +0.31 ms | +3.0% |
| sub-millisecond call | +0.08 ms | disproportionate (don't wrap these) |

In these runs the wrapper added roughly 0.1 to 0.3 ms of fixed cost per call (trace append + threshold check). Whether that is negligible depends on your workload:

  • LLM calls (>500 ms): overhead is ≤0.06%, effectively invisible
  • HTTP / DB calls (~30-100 ms): ≤1%
  • Fast in-memory work (<10 ms): 3% or more; reconsider whether you need the wrapper there

Rerun the benchmark on your own machine with python benchmarks/overhead.py.
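The idea behind an overhead benchmark can be sketched in a few lines: time the bare call and the wrapped call, then compare medians. This is not benchmarks/overhead.py; the median statistic, the helper names, and the stand-in wrapper are assumptions.

```python
import statistics
import time
from typing import Any, Callable


def median_ms(call: Callable[[], Any], iterations: int) -> float:
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        call()
        samples.append((time.perf_counter() - start) * 1000)
    return statistics.median(samples)


def overhead_ms(fn, wrap, iterations=200):
    """Median latency of wrap(fn) minus median latency of fn, in ms.

    Timer noise can make the result slightly negative for very cheap wrappers.
    """
    direct = median_ms(fn, iterations)
    wrapped = median_ms(lambda: wrap(fn), iterations)
    return wrapped - direct


if __name__ == "__main__":
    traces = []

    def record_trace(fn):
        # Stand-in wrapper: time the call and append a trace dict.
        start = time.perf_counter()
        result = fn()
        traces.append({"ms": (time.perf_counter() - start) * 1000})
        return result

    def fake_tool_call():
        time.sleep(0.010)  # ~10 ms simulated workload

    print(f"{overhead_ms(fake_tool_call, record_trace, iterations=50):+.3f} ms")
```

Medians are less sensitive than means to the occasional scheduler hiccup, which matters when the quantity being measured is a fraction of a millisecond.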

Restart / recovery startup cost can be checked with:

python benchmarks/startup_recovery.py --traces 1000 10000 100000

SelfImprovingLoop.__init__ only loads loop_state.json; trace history is loaded on demand for stats, thresholds, and rollback checks.

Separate operation costs (triggered occasionally, not per-call):

| Operation | Cost |
|---|---|
| Failure analysis (only when threshold crossed) | ~100 ms |
| Applying improvement config | ~200 ms |
| Rollback execution | ~10 ms |

Background

Extracted from TaijiOS — a self-learning AI operating system with 5 I Ching–bound engines and a 346-heartbeat Ising physics experiment. The parent project has 14 modules; this one is the most generally reusable, so it lives as a standalone package.

TaijiOS started on Chinese New Year 2026-02-17 and has been built through multi-AI collaboration since then.


License

MIT. Ship it wherever.

Contact / Feedback

This is a very early release. Every bug report, every "didn't work for me", every "I wish it did X" is read:


"Safety first, then automation."

About

Rollback-first reliability layer for AI agents: trace failures, apply guarded changes, and rollback on regression.
