Status: 🟡 DORMANT (stable template - v2.8.0 released 2025-11-03)
Production-ready template. Activated when new document processing agents are needed.
- Template Version: 2.8.0
- AGET Framework: v2.7.0
- Type: AGET Template (specialized from worker template)
- Domain: Document Processing
A production-ready template for creating document processing agents with LLM pipelines, security protocols, format preservation, and multi-provider support.
Note: This template is based on template-worker-aget v2.7.0 with specialized document processing capabilities. Template version (v2.8.0) tracks template-specific features independently from AGET framework version (v2.7.0).
This template provides a complete foundation for agents that:
- ✅ Process documents using LLM assistance (OpenAI, Anthropic, Google)
- ✅ Preserve DOCX formatting (Track Changes, comments, annotations)
- ✅ Prevent catastrophic format loss (L245-type failures, such as lost Track Changes)
- ✅ Support batch operations with validation pipelines
- ✅ Implement security protocols (injection prevention, content filtering)
- ✅ Provide caching, metrics, and observability
- ✅ Enable task decomposition for large documents
- ✅ Support rollback and version management
This template is based on L208: Document Processing Agent Template Pattern, which analyzed production document processing agents to extract common patterns and best practices.
Time Savings: 60-70% reduction in new agent setup (3-5 hours → 1-2 hours)
```bash
gh repo clone aget-framework/template-document-processor-AGET my-document-agent
cd my-document-agent
```
Update these files:
.aget/version.json:
```json
{
  "agent_name": "my-document-agent",
  "domain": "your-domain"
}
```
configs/validation_rules.yaml:
```yaml
max_file_size_mb: 10
allowed_extensions: [".pdf", ".docx", ".txt", ".md"]
required_validations:
  - file_size
  - file_format
  - content_safety
```
configs/llm_providers.yaml:
```yaml
providers:
  openai:
    api_key_env: OPENAI_API_KEY
    enabled: true
  anthropic:
    api_key_env: ANTHROPIC_API_KEY
    enabled: false
budget:
  monthly_limit_usd: 300.0
```
Edit AGENTS.md to describe your agent's specific purpose and domain.
Run the test suite:
```bash
python3 -m pytest tests/ -v
```
Point the repository at your own remote, then commit and push:
```bash
git remote set-url origin <your-repo-url>
git add .
git commit -m "feat: Initialize from template-document-processor-AGET v2.8.0"
git push -u origin main
```
Source modules:
- src/ingestion/ - Queue management, validation, batch processing
- src/processing/ - LLM providers, model routing, caching, schema validation
- src/output/ - Publishing, version management, rollback
- src/security/ - Input sanitization, content filtering, resource limiting
- src/pipeline/ - Task decomposition, orchestration, metrics
- src/wikitext/ - Document format support (docx, wikitext, extensible to additional formats)
Configuration files:
- configs/validation_rules.yaml - Document validation criteria
- configs/llm_providers.yaml - LLM provider configuration
- configs/model_routing.yaml - Model selection strategy
- configs/models.yaml - Model definitions and capabilities
- configs/security_policy.yaml - Security and content filtering
- configs/processing_limits.yaml - Resource limits (tokens, time, cost)
- configs/caching.yaml - Cache settings and TTL
- configs/metrics.yaml - Metrics collection and alerts
- configs/orchestration.yaml - Task decomposition and pipeline
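The split between configs/model_routing.yaml (the selection strategy) and configs/models.yaml (the model catalogue) can be illustrated with a small routing sketch. It assumes one plausible strategy: pick the cheapest model that has every required capability and whose context window fits the prompt. The ModelSpec fields, the route function, the model names, and the prices are hypothetical placeholders, not the template's schema or real provider pricing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    # Hypothetical fields; the real configs/models.yaml schema may differ
    name: str
    provider: str
    max_context_tokens: int
    cost_per_1k_tokens: float  # placeholder prices, not real provider pricing
    capabilities: frozenset


def route(models: list[ModelSpec], needed: set[str], prompt_tokens: int) -> ModelSpec:
    """Return the cheapest model that has every needed capability and fits the prompt."""
    candidates = [
        m
        for m in models
        if needed <= m.capabilities and prompt_tokens <= m.max_context_tokens
    ]
    if not candidates:
        raise LookupError("no configured model satisfies this task")
    return min(candidates, key=lambda m: m.cost_per_1k_tokens)


MODELS = [
    ModelSpec("small-model", "openai", 16_000, 0.5, frozenset({"extraction"})),
    ModelSpec("large-model", "anthropic", 200_000, 3.0,
              frozenset({"extraction", "reasoning"})),
]

print(route(MODELS, {"extraction"}, 4_000).name)  # small-model
```

A "cheapest capable model" rule is only one point on the cost-versus-quality trade-off; a quality-first strategy would sort candidates differently.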
Customize configs/validation_rules.yaml for your document format:
- File extensions
- Size limits
- Format-specific validation rules
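The following sketch shows roughly how the file_size and file_format rules from configs/validation_rules.yaml might be applied. It checks a file against a dict that mirrors the example YAML. The validate_document function and its list-of-errors return value are hypothetical, not the API of validate.py. YAML loading and the content_safety check are left out so the example uses only the standard library.

```python
from pathlib import Path

# Hypothetical in-memory copy of the example configs/validation_rules.yaml
RULES = {
    "max_file_size_mb": 10,
    "allowed_extensions": [".pdf", ".docx", ".txt", ".md"],
}


def validate_document(path: Path, rules: dict = RULES) -> list[str]:
    """Check one file against the rules; an empty list means it passed."""
    errors = []
    # file_format: compare the lower-cased extension against the allow-list
    if path.suffix.lower() not in rules["allowed_extensions"]:
        errors.append(f"file_format: {path.suffix or '(none)'} not allowed")
    # file_size: a missing file fails outright
    if not path.is_file():
        errors.append("file_size: file not found")
    elif path.stat().st_size > rules["max_file_size_mb"] * 1024 * 1024:
        errors.append(f"file_size: exceeds {rules['max_file_size_mb']} MB")
    return errors
```

Returning every violation at once, rather than raising on the first one, makes it easier to report all problems in a batch run.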
Set up providers in configs/llm_providers.yaml:
- API keys (use environment variables)
- Model selection (cost vs quality)
- Fallback chain
- Budget limits
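A fallback chain with a budget guard could look roughly like the sketch below. Provider calls are modeled as plain callables so the example runs without any SDK. In a real setup, each callable would read its key from the environment variable named by api_key_env. The complete_with_fallback function, the BudgetExceeded exception, and the per-call cost estimate are all assumptions. The default limit simply mirrors monthly_limit_usd from the example config.

```python
from typing import Callable

Provider = Callable[[str], str]  # stand-in for a real SDK call: prompt -> text


class BudgetExceeded(RuntimeError):
    """Raised before any call that would push spend past the monthly limit."""


def complete_with_fallback(
    prompt: str,
    chain: list[tuple[str, Provider]],
    spent_usd: float,
    est_cost_usd: float,
    monthly_limit_usd: float = 300.0,  # mirrors monthly_limit_usd in llm_providers.yaml
) -> tuple[str, str]:
    """Try each provider in order and return (provider_name, response)."""
    if spent_usd + est_cost_usd > monthly_limit_usd:
        raise BudgetExceeded(
            f"estimated spend {spent_usd + est_cost_usd:.2f} exceeds {monthly_limit_usd:.2f} USD"
        )
    failures = []
    for name, call in chain:
        try:
            return name, call(prompt)
        except Exception as exc:  # a real client would catch its SDK's error types
            failures.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(failures))
```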
Configure security in configs/security_policy.yaml:
- Input sanitization rules
- Content filtering (PII detection)
- Resource limits (tokens, time, cost)
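The sanitization and filtering steps can be made concrete with a deliberately small sketch. It does four things:

- enforces an input size limit
- strips control characters
- flags two common prompt-injection phrases
- redacts e-mail addresses and US-style SSNs

The patterns, the MAX_INPUT_CHARS limit, and the sanitize function are illustrative assumptions. Production PII detection needs much broader coverage than two regular expressions.

```python
import re

# Illustrative patterns only; real PII detection needs far broader coverage
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}
# Two phrases typical of prompt-injection attempts (hypothetical list)
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"disregard (the )?system prompt", re.IGNORECASE),
]
MAX_INPUT_CHARS = 200_000  # hypothetical resource limit
CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")  # keeps \t, \n, \r


def sanitize(text: str) -> tuple[str, list[str]]:
    """Return (cleaned_text, findings) for text about to be sent to an LLM."""
    findings = []
    if len(text) > MAX_INPUT_CHARS:
        findings.append("resource_limit: input truncated")
        text = text[:MAX_INPUT_CHARS]
    text = CONTROL_CHARS.sub("", text)
    for pattern in INJECTION_PATTERNS:
        if pattern.search(text):
            findings.append(f"injection: matched {pattern.pattern!r}")
    for label, pattern in PII_PATTERNS.items():
        text, count = pattern.subn(f"[{label}]", text)
        if count:
            findings.append(f"pii: redacted {count} {label}")
    return text, findings
```

Injection matches are reported rather than removed, so an approval gate can decide whether to stop processing.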
Define metrics in configs/metrics.yaml:
- Accuracy tracking
- Latency monitoring (p50/p95/p99)
- Cost tracking
- Alert thresholds
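You can compute latency percentiles from raw samples using only the standard library. The sketch below uses statistics.quantiles with linear interpolation between samples. It assumes latencies are recorded in milliseconds and that alert thresholds are keyed by percentile name. Both function names are hypothetical and are not the API of metrics.py.

```python
import statistics


def latency_summary(samples_ms: list[float]) -> dict[str, float]:
    """Return p50/p95/p99 for a list of latency samples (use at least two samples)."""
    # n=100 yields the 99 cut points for percentiles 1..99; index k-1 is the k-th percentile
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


def breaches(summary: dict[str, float], thresholds_ms: dict[str, float]) -> list[str]:
    """Return the percentile names whose latency exceeds the alert threshold."""
    return [name for name, limit in thresholds_ms.items() if summary.get(name, 0.0) > limit]


summary = latency_summary([float(ms) for ms in range(1, 102)])
print(summary)  # {'p50': 51.0, 'p95': 96.0, 'p99': 100.0}
print(breaches(summary, {"p95": 90.0, "p99": 250.0}))  # ['p95']
```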
The template includes 10 operational protocols in .aget/docs/protocols/:
- Format Preservation - Prevent L245-type catastrophic failures (Track Changes loss)
- Queue Management - Managing document queues
- Processing Authorization - Approval gates and STOP protocol
- Validation Pipeline - Pre/post validation
- Rollback - Version management and recovery
- Security Validation - Input/output sanitization
- Task Decomposition - Breaking large documents into subtasks
- Model Routing - Selecting optimal LLM for each task
- Caching - LLM response caching for cost/speed
- Metrics Collection - Tracking accuracy/latency/cost
Each protocol includes bash commands and code examples.
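The sketch below illustrates one common approach to the Task Decomposition protocol. It packs a document's paragraphs into chunks that stay under a size budget. It assumes paragraphs are separated by blank lines and uses word count as a rough proxy for tokens. The decompose helper is hypothetical and is not the template's task_planner.py.

```python
def decompose(text: str, max_words: int) -> list[str]:
    """Pack blank-line-separated paragraphs into chunks of at most max_words words.

    Word count is a crude stand-in for tokens. A single paragraph longer than
    max_words becomes its own oversized chunk instead of being split mid-paragraph.
    """
    chunks, current, count = [], [], 0
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        words = len(para.split())
        # Close the current chunk if adding this paragraph would exceed the budget
        if current and count + words > max_words:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(para)
        count += words
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Splitting only at paragraph boundaries keeps each subtask's context coherent. Joining the chunks with blank lines reconstructs the normalized document, which gives you a simple check that nothing was dropped.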
Format Preservation Guide: See .aget/docs/FORMAT_PRESERVATION_GUIDE.md for the complete usage guide.
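The core idea of round-trip validation can be sketched with the standard library alone. A .docx file is a ZIP archive. Its word/document.xml part records tracked insertions (w:ins), tracked deletions (w:del), and comment anchors (w:commentReference). The sketch counts those elements before and after processing and reports any count that dropped. It is a standalone illustration of the technique, not the template's format preservation API. It also ignores other parts of the package, such as headers, footers, and word/comments.xml.

```python
import xml.etree.ElementTree as ET
import zipfile

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
# Elements whose disappearance signals lost review markup
TRACKED = {"ins": "insertions", "del": "deletions", "commentReference": "comment_refs"}


def review_markup_counts(docx_path: str) -> dict[str, int]:
    """Count Track Changes and comment anchors in a .docx main document part."""
    with zipfile.ZipFile(docx_path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    return {
        label: sum(1 for _ in root.iter(f"{{{W_NS}}}{tag}"))
        for tag, label in TRACKED.items()
    }


def check_round_trip(before_path: str, after_path: str) -> list[str]:
    """Describe every markup count that dropped between the input and output files."""
    before = review_markup_counts(before_path)
    after = review_markup_counts(after_path)
    return [f"{k}: {before[k]} -> {after[k]}" for k in before if after[k] < before[k]]
```

An empty result means no tracked change or comment anchor was lost. A non-empty result is the kind of signal that should stop a pipeline before output is published.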
The template provides 17 operational tools (15 scripts + 2 helper tools):
Session Management:
- session_protocol.py - Wake up/wind down/sign off
- queue_status.py - Queue management CLI
- health_check.py - System diagnostics
Core Operations (scripts/):
- validate.py - Document validation CLI
- process.py - End-to-end processing pipeline
- queue_status.py - Queue status and management
- rollback.py - Version rollback operations
- cache_setup.py - Cache initialization
- metrics.py - Metrics display and export
Supporting Operations (scripts/):
- health_check.py - System health diagnostics
- security_check.py - Security validation
- audit.py - Audit trail viewer
- model_router.py - Model routing recommendations
- cache_stats.py - Cache statistics
- cache_clear.py - Cache clearing
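cache_setup.py, cache_stats.py, and cache_clear.py manage the LLM response cache configured in configs/caching.yaml. Below is a minimal in-memory version of such a cache. Entries are keyed by a SHA-256 hash of the model name and prompt, and they expire after a TTL. The ResponseCache class, its hit/miss counters, and the injectable clock are hypothetical.

```python
import hashlib
import json
import time
from typing import Callable


class ResponseCache:
    """In-memory LLM response cache with a TTL, keyed by model name + prompt."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store: dict[str, tuple[float, str]] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def key(model: str, prompt: str) -> str:
        # Canonical JSON, so identical requests always produce the same SHA-256 key
        payload = json.dumps({"model": model, "prompt": prompt}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_call(self, model: str, prompt: str, call: Callable[[str], str]) -> str:
        k = self.key(model, prompt)
        entry = self._store.get(k)
        if entry is not None and self.clock() - entry[0] < self.ttl:
            self.hits += 1
            return entry[1]
        # Miss or expired entry: call the model and store the fresh response
        self.misses += 1
        response = call(prompt)
        self._store[k] = (self.clock(), response)
        return response

    def clear(self) -> None:
        self._store.clear()
```

The model name is included in the key because the same prompt sent to two different models should not share a cached answer.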
Specialized Tools (scripts/ and .aget/tools/):
- session_protocol.py - Session lifecycle (wake/wind-down/sign-off)
- checkpoint.py - Checkpoint save/load/list
- task_planner.py - Task decomposition planning
- .aget/tools/analyze_agent_fit.py - Use case fit analysis
- .aget/tools/instantiate_template.py - Template instantiation helper
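checkpoint.py exposes save, load, and list operations. Below is a file-based sketch of that idea that stores one JSON file per named checkpoint. The CheckpointStore class, the on-disk layout, and the record fields are assumptions, not the script's actual format.

```python
import json
import time
from pathlib import Path


class CheckpointStore:
    """Save, load, and list named JSON checkpoints, one file per checkpoint."""

    def __init__(self, directory: Path):
        self.dir = Path(directory)
        self.dir.mkdir(parents=True, exist_ok=True)

    def _path(self, name: str) -> Path:
        return self.dir / f"{name}.json"

    def save(self, name: str, state: dict) -> Path:
        record = {"name": name, "saved_at": time.time(), "state": state}
        path = self._path(name)
        # Write a temp file, then rename it into place, so readers never see a partial file
        tmp = self.dir / f"{name}.json.tmp"
        tmp.write_text(json.dumps(record, indent=2), encoding="utf-8")
        tmp.replace(path)
        return path

    def load(self, name: str) -> dict:
        record = json.loads(self._path(name).read_text(encoding="utf-8"))
        return record["state"]

    def list_names(self) -> list[str]:
        return sorted(p.stem for p in self.dir.glob("*.json"))
```

Saving a checkpoint after each pipeline stage (for example, after ingestion and after the LLM call) lets a failed run resume, or roll back, from the last good state.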
The template includes 30 contract tests (100% passing):
```bash
# Run all tests
python3 -m pytest tests/ -v

# Run specific test category
python3 -m pytest tests/test_processing.py -v
```
Test coverage:
- Smoke tests (20 tests): All 20 modules tested
- Integration tests (10 tests): End-to-end workflows, script integration, contract validation
```bash
# Run smoke tests only
python3 tests/smoke_test.py

# Run integration tests only
python3 tests/test_integration.py
```
Two helper tools are provided for template users:
Analyze Agent Fit:
```bash
# Check if use case fits this template
python3 .aget/tools/analyze_agent_fit.py "Process legal contracts and extract structured data"

# Interactive mode
python3 .aget/tools/analyze_agent_fit.py --interactive
```
Instantiate Template:
```bash
# Create new agent from template
python3 .aget/tools/instantiate_template.py invoice-processor ~/github/invoice-processor-AGET

# Verify instantiation
python3 .aget/tools/instantiate_template.py --check ~/github/invoice-processor-AGET
```
This template uses dual versioning:
- Template Version (v2.8.0): Template-specific features and enhancements
- AGET Framework (v2.7.0): Framework compliance version
Template v2.8.0 tracks the format preservation capabilities added to this specialized template. AGET v2.7.0 indicates compliance with the AGET framework standards and with the base worker template.
Template v2.8.0 (2025-11-02) - AGET Framework v2.7.0
- Format Preservation Framework: Prevent L245-type catastrophic failures
- New capabilities:
  - OOXML verification for Track Changes, comments, annotations
  - Round-trip validation (before → process → after)
  - Multi-stage checkpoint system for pipeline verification
  - L245 failure detection (100% format loss prevention)
- Documentation:
  - FORMAT_PRESERVING_DECISION_PROTOCOL.md (5-question architecture checklist)
  - FORMAT_PRESERVATION_GUIDE.md (implementation guide)
- Test coverage: 17 tests, 38% coverage (critical paths validated)
- API: 5-module framework with simple and advanced usage patterns
Template v2.7.0 (2025-10-26) - AGET Framework v2.7.0
- Initial template release
- Based on L208 document processing pattern analysis
- 20 source modules (Gate 2A: 8, Gate 2B: 7, Gate 2C: 5)
- 9 configuration files (YAML-based)
- 9 operational protocols (tested and validated)
- 3 formal specifications (168 EARS requirements)
- 17 operational tools (15 scripts + 2 helper tools)
- 30 contract tests (20 smoke + 10 integration, 100% passing)
- Multi-provider LLM support (OpenAI, Anthropic, Google)
- Security protocols (input sanitization, content filtering, resource limits)
- Document format support (docx, wikitext, extensible to additional formats)
For template issues or questions:
- Issue tracker: https://github.com/aget-framework/aget/issues
- Template repository: https://github.com/aget-framework/template-document-processor-AGET
- AGET framework: https://github.com/aget-framework
Licensed under the Apache License, Version 2.0. See LICENSE file for details.
Copyright 2025 AGET Framework Contributors
Generated by AGET v2.7.0