template-document-processor-AGET

Status: 🟡 DORMANT (stable template - v2.8.0 released 2025-11-03)

Production-ready template. Activated when new document processing agents are needed.

Template Version: 2.8.0 AGET Framework: v2.7.0 Type: AGET Template (specialized from worker template) Domain: Document Processing

A production-ready template for creating document processing agents with LLM pipelines, security protocols, format preservation, and multi-provider support.

Note: This template is based on template-worker-aget v2.7.0 with specialized document processing capabilities. Template version (v2.8.0) tracks template-specific features independently from AGET framework version (v2.7.0).

Overview

This template provides a complete foundation for agents that:

✅ Process documents using LLM assistance (OpenAI, Anthropic, Google)
✅ Preserve DOCX formatting (Track Changes, comments, annotations)
✅ Prevent catastrophic format loss (L245-type failures)
✅ Support batch operations with validation pipelines
✅ Implement security protocols (injection prevention, content filtering)
✅ Provide caching, metrics, and observability
✅ Enable task decomposition for large documents
✅ Support rollback and version management

Based on L208

This template is based on L208: Document Processing Agent Template Pattern, which analyzed production document processing agents to extract common patterns and best practices.

Time Savings: 60-70% reduction in new agent setup (3-5 hours → 1-2 hours)

Quick Start

1. Clone Template

gh repo clone aget-framework/template-document-processor-AGET my-document-agent
cd my-document-agent

2. Customize Configuration

Update these files:

.aget/version.json:

{
  "agent_name": "my-document-agent",
  "domain": "your-domain"
}

configs/validation_rules.yaml:

max_file_size_mb: 10
allowed_extensions: [".pdf", ".docx", ".txt", ".md"]
required_validations:
  - file_size
  - file_format
  - content_safety

configs/llm_providers.yaml:

providers:
  openai:
    api_key_env: OPENAI_API_KEY
    enabled: true
  anthropic:
    api_key_env: ANTHROPIC_API_KEY
    enabled: false
budget:
  monthly_limit_usd: 300.0

3. Update Mission

Edit AGENTS.md to describe your agent's specific purpose and domain.

4. Run Tests

python3 -m pytest tests/ -v

5. Deploy

git remote set-url origin <your-repo-url>
git add .
git commit -m "feat: Initialize from template-document-processor-AGET v2.7.0"
git push -u origin main

Architecture

Core Components

src/ingestion/ - Queue management, validation, batch processing
src/processing/ - LLM providers, model routing, caching, schema validation
src/output/ - Publishing, version management, rollback
src/security/ - Input sanitization, content filtering, resource limiting
src/pipeline/ - Task decomposition, orchestration, metrics
src/wikitext/ - Document format support (docx, wikitext, extensible for additional formats)

Configuration Files (9 YAMLs)

configs/validation_rules.yaml - Document validation criteria
configs/llm_providers.yaml - LLM provider configuration
configs/model_routing.yaml - Model selection strategy
configs/models.yaml - Model definitions and capabilities
configs/security_policy.yaml - Security and content filtering
configs/processing_limits.yaml - Resource limits (tokens, time, cost)
configs/caching.yaml - Cache settings and TTL
configs/metrics.yaml - Metrics collection and alerts
configs/orchestration.yaml - Task decomposition and pipeline

Customization Points

1. Document Validation

Customize configs/validation_rules.yaml for your document format:

File extensions
Size limits
Format-specific validation rules

2. LLM Providers

Set up providers in configs/llm_providers.yaml:

API keys (use environment variables)
Model selection (cost vs quality)
Fallback chain
Budget limits

3. Security

Configure security in configs/security_policy.yaml:

Input sanitization rules
Content filtering (PII detection)
Resource limits (tokens, time, cost)

4. Metrics

Define metrics in configs/metrics.yaml:

Accuracy tracking
Latency monitoring (p50/p95/p99)
Cost tracking
Alert thresholds

Protocols

The template includes 10 operational protocols in .aget/docs/protocols/:

Format Preservation - Prevent L245-type catastrophic failures (Track Changes loss)
Queue Management - Managing document queues
Processing Authorization - Approval gates and STOP protocol
Validation Pipeline - Pre/post validation
Rollback - Version management and recovery
Security Validation - Input/output sanitization
Task Decomposition - Breaking large documents into subtasks
Model Routing - Selecting optimal LLM for each task
Caching - LLM response caching for cost/speed
Metrics Collection - Tracking accuracy/latency/cost

Each protocol includes bash commands and code examples.

Format Preservation Guide: See .aget/docs/FORMAT_PRESERVATION_GUIDE.md for complete usage guide.

Scripts

The template provides 17 operational tools (15 scripts + 2 helper tools):

Session Management:

session_protocol.py - Wake up/wind down/sign off
queue_status.py - Queue management CLI
health_check.py - System diagnostics

Core Operations (scripts/):

validate.py - Document validation CLI
process.py - End-to-end processing pipeline
queue_status.py - Queue status and management
rollback.py - Version rollback operations
cache_setup.py - Cache initialization
metrics.py - Metrics display and export

Supporting Operations (scripts/):

health_check.py - System health diagnostics
security_check.py - Security validation
audit.py - Audit trail viewer
model_router.py - Model routing recommendations
cache_stats.py - Cache statistics
cache_clear.py - Cache clearing

Specialized Tools (scripts/ and .aget/tools/):

session_protocol.py - Session lifecycle (wake/wind-down/sign-off)
checkpoint.py - Checkpoint save/load/list
task_planner.py - Task decomposition planning
.aget/tools/analyze_agent_fit.py - Use case fit analysis
.aget/tools/instantiate_template.py - Template instantiation helper

Testing

Template includes 30 contract tests (100% passing):

# Run all tests
python3 -m pytest tests/ -v

# Run specific test category
python3 -m pytest tests/test_processing.py -v

Test coverage:

Smoke tests (20 tests): All 20 modules tested
Integration tests (10 tests): End-to-end workflows, script integration, contract validation

# Run smoke tests only
python3 tests/smoke_test.py

# Run integration tests only
python3 tests/test_integration.py

Helper Tools

Two helper tools are provided for template users:

Analyze Agent Fit:

# Check if use case fits this template
python3 .aget/tools/analyze_agent_fit.py "Process legal contracts and extract structured data"

# Interactive mode
python3 .aget/tools/analyze_agent_fit.py --interactive

Instantiate Template:

# Create new agent from template
python3 .aget/tools/instantiate_template.py invoice-processor ~/github/invoice-processor-AGET

# Verify instantiation
python3 .aget/tools/instantiate_template.py --check ~/github/invoice-processor-AGET

Version History

Template Versioning

This template uses dual versioning:

Template Version (v2.8.0): Template-specific features and enhancements
AGET Framework (v2.7.0): Framework compliance version

Template v2.8.0 tracks format preservation capabilities added to this specialized template. AGET v2.7.0 indicates compliance with AGET framework standards and base worker template.

Template v2.8.0 (2025-11-02) - AGET Framework v2.7.0

Format Preservation Framework: Prevent L245-type catastrophic failures
New capabilities:
- OOXML verification for Track Changes, comments, annotations
- Round-trip validation (before → process → after)
- Multi-stage checkpoint system for pipeline verification
- L245 failure detection (100% format loss prevention)
Documentation:
- FORMAT_PRESERVING_DECISION_PROTOCOL.md (5-question architecture checklist)
- FORMAT_PRESERVATION_GUIDE.md (implementation guide)
Test coverage: 17 tests, 38% coverage (critical paths validated)
API: 5-module framework with simple and advanced usage patterns

Template v2.7.0 (2025-10-26) - AGET Framework v2.7.0

Initial template release
Based on L208 document processing pattern analysis
20 source modules (Gate 2A: 8, Gate 2B: 7, Gate 2C: 5)
9 configuration files (YAML-based)
9 operational protocols (tested and validated)
3 formal specifications (168 EARS requirements)
17 operational tools (15 scripts + 2 helper tools)
30 contract tests (20 smoke + 10 integration, 100% passing)
Multi-provider LLM support (OpenAI, Anthropic, Google)
Security protocols (input sanitization, content filtering, resource limits)
Document format support (docx, wikitext, extensible to additional formats)

Support

For template issues or questions:

Issue tracker: https://github.com/aget-framework/aget/issues
Template repository: https://github.com/aget-framework/template-document-processor-AGET
AGET framework: https://github.com/aget-framework

License

Licensed under the Apache License, Version 2.0. See LICENSE file for details.

Generated by AGET v2.7.0

Name		Name	Last commit message	Last commit date
Latest commit History 49 Commits
.aget		.aget
configs		configs
docs		docs
examples		examples
knowledge		knowledge
scripts		scripts
sessions		sessions
src		src
tests		tests
.gitignore		.gitignore
AGENTS.md		AGENTS.md
CHANGELOG.md		CHANGELOG.md
CLAUDE.md		CLAUDE.md
DEPLOYMENT_VERIFICATION.md		DEPLOYMENT_VERIFICATION.md
LICENSE		LICENSE
MIGRATION_v2.7_to_v2.8.md		MIGRATION_v2.7_to_v2.8.md
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

template-document-processor-AGET

Overview

Based on L208

Quick Start

1. Clone Template

2. Customize Configuration

3. Update Mission

4. Run Tests

5. Deploy

Architecture

Core Components

Configuration Files (9 YAMLs)

Customization Points

1. Document Validation

2. LLM Providers

3. Security

4. Metrics

Protocols

Scripts

Testing

Helper Tools

Version History

Template Versioning

Support

License

About

Uh oh!

Releases 4

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

template-document-processor-AGET

Overview

Based on L208

Quick Start

1. Clone Template

2. Customize Configuration

3. Update Mission

4. Run Tests

5. Deploy

Architecture

Core Components

Configuration Files (9 YAMLs)

Customization Points

1. Document Validation

2. LLM Providers

3. Security

4. Metrics

Protocols

Scripts

Testing

Helper Tools

Version History

Template Versioning

Support

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 4

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages