[FEATURE][POLICY]: Policy audit trail and decision logging #2225

@crivetimihai

📝 Feature: Policy Audit Trail & Decision Logging

Goal

Implement comprehensive audit logging for all policy decisions with structured records, decision explanations, SIEM integration, and long-term retention for compliance and forensics.

Why Now?

  1. Compliance Requirement: FedRAMP, HIPAA, and SOC 2 all require audit trails for access decisions
  2. Forensic Capability: Security incidents require understanding who accessed what and when
  3. Scattered Logging: Current logging is inconsistent across policy engines
  4. No Decision Explanation: Users don't know why they were denied access
  5. SIEM Gap: Security teams need to integrate access decisions into their SIEM

📖 User Stories

US-1: Security Analyst - Query Access Decisions

As a Security Analyst
I want to query all access decisions for a user or resource
So that I can investigate suspicious activity

Acceptance Criteria:

Given comprehensive audit logging is enabled:
When I query decisions:
  GET /audit/decisions?
    subject_email=user@example.com&
    decision=deny&
    start_time=2024-01-01&
    end_time=2024-01-31
Then I receive all matching decisions with:
  - Timestamp and request_id
  - Subject details (email, roles, teams, clearance)
  - Action and resource
  - Decision (allow/deny) and reason
  - Matching policies with explanations
  - Context (IP, time, MFA status)
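The following is a minimal in-memory sketch of the filtering behind the US-1 query. It assumes records shaped like the audit record schema under Architecture, and it treats date-only start_time/end_time values as whole UTC days, with end_time inclusive. filter_decisions and its helpers are hypothetical names. A production endpoint would push the same filters down into a database query rather than scanning records in Python.

```python
from datetime import date, datetime, time, timezone
from typing import Any, Dict, Iterable, List, Optional

Record = Dict[str, Any]


def _parse_ts(value: str) -> datetime:
    # Audit timestamps are ISO 8601 in UTC with a trailing "Z", as in the schema example.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def _day_bound(day: str, end_of_day: bool) -> datetime:
    # Expand a date-only parameter (e.g. "2024-01-31") to the start or end of that UTC day,
    # so end_time includes every decision made on that date.
    t = time.max if end_of_day else time.min
    return datetime.combine(date.fromisoformat(day), t, tzinfo=timezone.utc)


def filter_decisions(
    records: Iterable[Record],
    subject_email: Optional[str] = None,
    decision: Optional[str] = None,
    start_time: Optional[str] = None,
    end_time: Optional[str] = None,
) -> List[Record]:
    """Return records matching every supplied filter; a None filter matches everything."""
    start = _day_bound(start_time, end_of_day=False) if start_time else None
    end = _day_bound(end_time, end_of_day=True) if end_time else None
    matches = []
    for rec in records:
        if subject_email and rec["subject"].get("email") != subject_email:
            continue
        if decision and rec["decision"] != decision:
            continue
        ts = _parse_ts(rec["timestamp"])
        if start and ts < start:
            continue
        if end and ts > end:
            continue
        matches.append(rec)
    return matches
```

Applied to the example query, a deny for user@example.com at 2024-01-31T23:59:00Z is returned, while one at 2024-02-01T00:00:00Z is not.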
US-2: Security Team - Export to SIEM

As a Security Team Lead
I want access decisions exported to our SIEM
So that we can correlate with other security events

Acceptance Criteria:

Given SIEM integration is configured:
  siem:
    type: splunk
    endpoint: https://splunk.example.com:8088
Then all policy decisions are:
  - Sent in near real time via the Splunk HTTP Event Collector (HEC)
  - Formatted as structured JSON
  - Tagged with correlation IDs for tracing
  - Batched for performance (at most 100 events per batch)
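One way to meet the batching criterion is sketched below. It assumes each audit record is wrapped in a Splunk HEC event object (time, host, source, sourcetype, event) and that several event objects are concatenated into a single request body, which HEC accepts. HecBatcher, the injected send callable, and the "gateway-audit" source name are hypothetical. In practice, send would POST the body to the /services/collector/event endpoint on the configured host with an "Authorization: Splunk <token>" header.

```python
import json
from datetime import datetime
from typing import Any, Callable, Dict, List


class HecBatcher:
    """Hypothetical buffer that flushes audit records as one Splunk HEC request body."""

    def __init__(self, send: Callable[[str], None], batch_size: int = 100,
                 source: str = "gateway-audit") -> None:
        self._send = send              # injected transport, e.g. an HTTP POST wrapper
        self._batch_size = batch_size  # mirrors siem.batch_size in the config example
        self._source = source
        self._buffer: List[Dict[str, Any]] = []

    def add(self, record: Dict[str, Any]) -> None:
        self._buffer.append(record)
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        if not self._buffer:
            return
        # HEC accepts multiple event objects concatenated in one body; one per line here.
        body = "\n".join(json.dumps(self._to_hec(r), separators=(",", ":")) for r in self._buffer)
        self._send(body)
        # Clear only after send returns, so a failed POST keeps the batch for a retry.
        self._buffer.clear()

    def _to_hec(self, record: Dict[str, Any]) -> Dict[str, Any]:
        # HEC "time" is epoch seconds; the full audit record (including request_id,
        # which serves as the correlation ID) travels unchanged in "event".
        ts = datetime.fromisoformat(record["timestamp"].replace("Z", "+00:00")).timestamp()
        return {
            "time": ts,
            "host": record.get("gateway_node"),
            "source": self._source,
            "sourcetype": "_json",
            "event": record,
        }
```

Injecting send keeps the batching logic testable without a Splunk instance. The time-based trigger (flush_interval_seconds) would be a timer that calls flush() periodically, which is not shown here.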

🏗 Architecture

Audit Record Schema

{
  "id": "decision-uuid",
  "timestamp": "2024-01-15T10:30:00Z",
  "request_id": "req-12345",
  "gateway_node": "gateway-1",
  
  "subject": {
    "type": "user",
    "id": "user-uuid",
    "email": "user@example.com",
    "roles": ["developer"],
    "teams": ["engineering"],
    "clearance_level": 2
  },
  
  "action": "tools.invoke",
  "resource": {
    "type": "tool",
    "id": "db-query",
    "server": "production-db"
  },
  
  "decision": "deny",
  "reason": "Insufficient clearance level",
  
  "matching_policies": [
    {
      "id": "mac-policy-1",
      "name": "production-data-access",
      "engine": "mac",
      "result": "deny",
      "explanation": "User clearance (2) < Resource classification (4)"
    }
  ],
  
  "context": {
    "ip_address": "10.0.0.50",
    "user_agent": "claude-desktop/1.0",
    "mfa_verified": true,
    "time_of_day": "10:30"
  },
  
  "duration_ms": 5
}
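The schema can be mirrored as typed Python dataclasses, so records are validated when built and serialized consistently. This sketch makes a few assumptions. Field names follow the JSON example above, and context is left as a free-form dict. mac_explanation assumes the MAC rule is "clearance must be at least the resource classification". That rule reproduces the example explanation string, but this issue does not confirm it. All class and function names are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class Subject:
    type: str
    id: str
    email: str
    roles: List[str] = field(default_factory=list)
    teams: List[str] = field(default_factory=list)
    clearance_level: int = 0


@dataclass
class Resource:
    type: str
    id: str
    server: str


@dataclass
class PolicyMatch:
    id: str
    name: str
    engine: str       # e.g. "mac"
    result: str       # "allow" or "deny"
    explanation: str


@dataclass
class DecisionRecord:
    id: str
    timestamp: str    # ISO 8601, UTC
    request_id: str
    gateway_node: str
    subject: Subject
    action: str
    resource: Resource
    decision: str
    reason: str
    matching_policies: List[PolicyMatch] = field(default_factory=list)
    context: Dict[str, Any] = field(default_factory=dict)
    duration_ms: int = 0

    def __post_init__(self) -> None:
        # Reject anything other than the two outcomes the schema allows.
        if self.decision not in ("allow", "deny"):
            raise ValueError(f"decision must be 'allow' or 'deny', got {self.decision!r}")

    def to_json(self) -> str:
        # asdict() recurses into nested dataclasses and lists of them.
        return json.dumps(asdict(self))


def mac_explanation(clearance: int, classification: int) -> str:
    """Assumed MAC rule: access requires clearance >= resource classification."""
    op = ">=" if clearance >= classification else "<"
    return f"User clearance ({clearance}) {op} Resource classification ({classification})"
```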

📋 Implementation Tasks

  • Define audit record schema
  • Create database table for decisions
  • Implement decision logger service
  • Add structured JSON logging
  • Implement decision query API
  • Add SIEM integrations:
    • Splunk HEC
    • Elasticsearch
    • Webhook (generic)
  • Create Admin UI for audit viewer
  • Add log retention policies
  • Add log rotation
  • Write unit tests
  • Create documentation
  • Pass make verify checks
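The "decision logger service" and "structured JSON logging" tasks could share one small service. That service would honour the audit.decisions flags from the Configuration Example section (log_allowed, log_denied, include_context, include_explanation). The sketch below uses the standard logging module and makes two assumptions: the DecisionLogger class is hypothetical and is not an existing gateway API, and logging denials at WARNING and allows at INFO is a design choice of this sketch.

```python
import json
import logging
from typing import Any, Dict, Optional


class DecisionLogger:
    """Hypothetical decision logger mirroring the audit.decisions settings."""

    def __init__(
        self,
        logger: logging.Logger,
        log_allowed: bool = True,
        log_denied: bool = True,
        include_context: bool = True,
        include_explanation: bool = True,
    ) -> None:
        self._logger = logger
        self.log_allowed = log_allowed
        self.log_denied = log_denied
        self.include_context = include_context
        self.include_explanation = include_explanation

    def log(self, record: Dict[str, Any]) -> Optional[str]:
        """Emit one JSON line for the decision; return it, or None if filtered out."""
        decision = record["decision"]
        if decision == "allow" and not self.log_allowed:
            return None
        if decision == "deny" and not self.log_denied:
            return None

        entry = dict(record)  # shallow copy so the caller's record is not mutated
        if not self.include_context:
            entry.pop("context", None)
        if not self.include_explanation:
            # Rebuild each policy entry without its free-text explanation.
            entry["matching_policies"] = [
                {k: v for k, v in policy.items() if k != "explanation"}
                for policy in entry.get("matching_policies", [])
            ]

        line = json.dumps(entry, sort_keys=True)
        # Assumed convention: denials at WARNING so they stand out in ordinary log pipelines.
        level = logging.WARNING if decision == "deny" else logging.INFO
        self._logger.log(level, line)
        return line
```

Returning the emitted line is only a convenience for tests. The same JSON string could also be handed to the database writer or the SIEM batcher.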

⚙️ Configuration Example

audit:
  decisions:
    enabled: true
    log_allowed: true
    log_denied: true
    include_context: true
    include_explanation: true
    
  storage:
    type: database
    retention_days: 365
    partition_by: month
    
  siem:
    enabled: true
    type: splunk  # splunk | elasticsearch | webhook
    endpoint: "https://splunk.example.com:8088"
    token_env: "SPLUNK_HEC_TOKEN"
    batch_size: 100
    flush_interval_seconds: 5
    
  real_time:
    enabled: true
    websocket_endpoint: "/ws/audit"
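With partition_by: month and retention_days: 365, retention enforcement reduces to dropping whole monthly partitions once they fall out of the window. The sketch below assumes partitions are identified by the first day of their month. It also assumes a partition is dropped only when every day in it is older than the cutoff, so up to one extra month may be retained. expired_partitions and the audit_decisions_YYYY_MM naming are hypothetical.

```python
from datetime import date, timedelta
from typing import List


def next_month(first_of_month: date) -> date:
    """First day of the following month (input must be the 1st of a month)."""
    rolls_over = first_of_month.month == 12
    return date(first_of_month.year + (1 if rolls_over else 0), first_of_month.month % 12 + 1, 1)


def partition_name(first_of_month: date) -> str:
    # Hypothetical naming scheme for monthly partitions, e.g. audit_decisions_2023_04.
    return f"audit_decisions_{first_of_month:%Y_%m}"


def expired_partitions(partitions: List[date], today: date, retention_days: int) -> List[date]:
    """Monthly partitions in which every day falls before the retention cutoff."""
    cutoff = today - timedelta(days=retention_days)
    # next_month(p) is the exclusive end of partition p.
    return sorted(p for p in partitions if next_month(p) <= cutoff)
```

For example, with today = 2024-06-15 and retention_days = 365, the cutoff is 2023-06-16. The April and May 2023 partitions are therefore dropped. June 2023 is kept because part of that month is still inside the window.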

✅ Success Criteria

  • All policy decisions logged with full context
  • Query API functional with filtering
  • SIEM integration (Splunk, Elasticsearch)
  • Admin UI audit viewer
  • Log retention and rotation working
  • Real-time decision stream
  • 80%+ test coverage

🔗 Related Issues

Metadata

Assignees

No one assigned

    Labels

    • COULD (P3): Nice-to-have features with minimal impact if left out; included if time permits
    • enhancement: New feature or request
    • plugins
    • python: Python / backend development (FastAPI)
    • security: Improves security
    • sweng-group-5: Group 5 - Policy-as-Code Security & Compliance Automation
    • tcd: SwEng Projects

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests