NOISSUE - Introduce computation runner, log forwarder, ingress, and egress proxy services. #559

Merged
drasko merged 24 commits into ultravioletrs:main from SammyOina:new-arch
Feb 9, 2026

@SammyOina SammyOina commented Dec 17, 2025

Agent Architecture Documentation

Overview

The Agent has been refactored from a monolithic service into a microservices architecture with specialized components for security, isolation, and maintainability. The Agent now acts as an orchestrator rather than executing computations directly.

Architecture Diagram

graph TB
    subgraph "External"
        CLI[CLI/User]
        Manager[Computation management service]
    end
    
    subgraph "CVM Boundary"
        subgraph "Ingress"
            IngressProxy[Ingress Proxy<br/>ATLS Termination]
        end
        
        subgraph "Core Services"
            Agent[Agent Service<br/>Orchestrator]
            AttSvc[Attestation Service<br/>Quote Provider]
            LogFwd[Log Forwarder<br/>Event Aggregator]
            Runner[Computation Runner<br/>Execution Engine]
        end
        
        subgraph "Egress"
            EgressProxy[Egress Proxy<br/>Traffic Filter]
        end
    end
    
    CLI -->|gRPC/ATLS| IngressProxy
    IngressProxy -->|gRPC/Plaintext| Agent
    Manager -->|gRPC| Agent
    
    Agent -->|Unix Socket| AttSvc
    Agent -->|Unix Socket| Runner
    Agent -->|Unix Socket| LogFwd
    
    Runner -->|Unix Socket| LogFwd
    Runner -->|HTTP/HTTPS| EgressProxy
    Agent -->|HTTP/HTTPS| EgressProxy
    
    EgressProxy -->|Filtered| Internet[Internet]
    
    style IngressProxy fill:#e1f5ff
    style EgressProxy fill:#ffe1e1
    style Agent fill:#fff4e1
    style Runner fill:#f0e1ff

Service Components

1. Ingress Proxy (New)

Purpose: Single entry point for external connections with ATLS termination.

Responsibilities:

  • Terminates ATLS/TLS connections from the CLI
  • Validates client certificates when mTLS is enabled
  • Forwards decrypted traffic to the Agent backend
  • Supports dynamic, per-computation configuration

Configuration: Receives AgentConfig from Manager per computation

  • Port, certificates, ATLS flag
  • Starts/stops dynamically with computation lifecycle

Location: cmd/ingress-proxy, pkg/ingress

Communication:

  • Inbound: gRPC/ATLS from CLI (port 7002 by default)
  • Outbound: gRPC/Plaintext to Agent (localhost:7001)
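
The forwarding and start/stop behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code in pkg/ingress: plain TCP stands in for ATLS termination, and the helpers `serve`, `forwardTo`, `echo`, and `roundTrip` are hypothetical names. An echo server plays the role of the Agent backend on localhost.

```go
package main

import (
	"fmt"
	"io"
	"net"
)

// serve listens on addr and handles each connection in its own goroutine.
// Closing the returned listener stops the accept loop, which is how a
// per-computation proxy could be torn down.
func serve(addr string, handle func(net.Conn)) (net.Listener, error) {
	ln, err := net.Listen("tcp", addr)
	if err != nil {
		return nil, err
	}
	go func() {
		for {
			conn, err := ln.Accept()
			if err != nil {
				return // listener closed
			}
			go handle(conn)
		}
	}()
	return ln, nil
}

// forwardTo returns a handler that copies bytes in both directions between
// the client and backendAddr. A real ingress proxy would terminate ATLS here.
func forwardTo(backendAddr string) func(net.Conn) {
	return func(client net.Conn) {
		defer client.Close()
		backend, err := net.Dial("tcp", backendAddr)
		if err != nil {
			return
		}
		defer backend.Close()
		go func() {
			io.Copy(backend, client)
			backend.Close() // unblock the copy below once the client is done
		}()
		io.Copy(client, backend)
	}
}

// echo is a throwaway backend that stands in for the Agent.
func echo(c net.Conn) {
	defer c.Close()
	io.Copy(c, c)
}

// roundTrip sends msg to addr and reads back the same number of bytes.
func roundTrip(addr, msg string) (string, error) {
	conn, err := net.Dial("tcp", addr)
	if err != nil {
		return "", err
	}
	defer conn.Close()
	if _, err := io.WriteString(conn, msg); err != nil {
		return "", err
	}
	buf := make([]byte, len(msg))
	if _, err := io.ReadFull(conn, buf); err != nil {
		return "", err
	}
	return string(buf), nil
}

func main() {
	backend, err := serve("127.0.0.1:0", echo)
	if err != nil {
		fmt.Println("backend:", err)
		return
	}
	defer backend.Close()

	proxy, err := serve("127.0.0.1:0", forwardTo(backend.Addr().String()))
	if err != nil {
		fmt.Println("proxy:", err)
		return
	}
	defer proxy.Close() // the Stop() half of the lifecycle

	reply, err := roundTrip(proxy.Addr().String(), "ping")
	fmt.Println(reply, err)
}
```

Returning the listener keeps the sketch small: closing it is the only "stop" operation needed, and connections already in flight finish on their own.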

2. Agent Service (Refactored)

Purpose: Orchestrator and state machine for computation lifecycle.

Responsibilities:

  • Manages computation state (Idle → ReceivingManifest → ReceivingAlgorithm → ReceivingData → Running → ConsumingResults → Complete)
  • Validates manifests and data hashes
  • Delegates execution to Computation Runner
  • Coordinates with Manager via gRPC stream
  • Does NOT execute code or handle network directly
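
As an illustration of the state machine, here is a minimal sketch that assumes the lifecycle is strictly linear, so each state may only advance to its immediate successor. The type `State`, the function `Transition`, and the error value are hypothetical names and are not taken from agent/service.go.

```go
package main

import (
	"errors"
	"fmt"
)

// State is a computation lifecycle state, named after the list above.
type State int

const (
	Idle State = iota
	ReceivingManifest
	ReceivingAlgorithm
	ReceivingData
	Running
	ConsumingResults
	Complete
)

var stateNames = [...]string{
	"Idle", "ReceivingManifest", "ReceivingAlgorithm", "ReceivingData",
	"Running", "ConsumingResults", "Complete",
}

// String returns the state name, or "Unknown" for out-of-range values.
func (s State) String() string {
	if s < Idle || int(s) >= len(stateNames) {
		return "Unknown"
	}
	return stateNames[s]
}

// ErrInvalidTransition is returned when target does not directly follow cur.
var ErrInvalidTransition = errors.New("invalid state transition")

// Transition returns target if it is the immediate successor of cur.
// Assumption: no states are skipped and Complete is terminal.
func Transition(cur, target State) (State, error) {
	if cur >= Idle && cur < Complete && target == cur+1 {
		return target, nil
	}
	return cur, fmt.Errorf("%w: %s -> %s", ErrInvalidTransition, cur, target)
}

func main() {
	s := Idle
	for s != Complete {
		next, err := Transition(s, s+1)
		if err != nil {
			fmt.Println(err)
			return
		}
		fmt.Println(s, "->", next)
		s = next
	}
	// Skipping ahead is rejected.
	_, err := Transition(Idle, Running)
	fmt.Println(err)
}
```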

Configuration:

  • AGENT_ENABLE_ATLS=false (ATLS now handled by Ingress Proxy)
  • Binds to localhost:7001 (not exposed externally)

Location: cmd/agent, agent/service.go

Communication:

  • Inbound: gRPC from Ingress Proxy (localhost), Manager (CVM stream)
  • Outbound: Unix sockets to Attestation Service, Runner, Log Forwarder

3. Computation Runner (New)

Purpose: Isolated execution environment for algorithms.

Responsibilities:

  • Executes Python/Binary algorithms in isolated process
  • Manages algorithm requirements (pip install, etc.)
  • Writes results to shared filesystem
  • Sends execution logs to Log Forwarder

Configuration:

  • RUNNER_LOG_LEVEL or AGENT_LOG_LEVEL
  • Uses egress proxy for network access

Location: cmd/computation-runner, agent/runner/service

Communication:

  • Inbound: Unix socket from Agent (/run/cocos/runner.sock)
  • Outbound: Unix socket to Log Forwarder, HTTP/HTTPS via Egress Proxy

4. Log Forwarder (New)

Purpose: Centralized logging and event aggregation.

Responsibilities:

  • Receives logs from Agent, Runner, and other services
  • Forwards logs to Manager via gRPC
  • Buffers logs during network interruptions
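
One way to picture the buffering behaviour is a bounded in-memory queue that retries in order once the link is back. This is a sketch under stated assumptions: the `Forwarder` type, the drop-oldest policy, and the `send` callback (a stand-in for the gRPC call to Manager) are hypothetical, and the actual service may persist messages differently.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrLinkDown simulates a failed delivery to Manager.
var ErrLinkDown = errors.New("manager unreachable")

// Forwarder buffers log lines that could not be delivered and retries them
// in order. send is a hypothetical stand-in for the gRPC call to Manager.
type Forwarder struct {
	send    func(msg string) error
	pending []string
	max     int
}

// NewForwarder creates a Forwarder that keeps at most max undelivered lines.
func NewForwarder(send func(string) error, max int) *Forwarder {
	return &Forwarder{send: send, max: max}
}

// Forward queues msg, dropping the oldest line if the buffer is full
// (an assumed policy), then tries to flush.
func (f *Forwarder) Forward(msg string) {
	if f.max > 0 && len(f.pending) >= f.max {
		f.pending = f.pending[1:]
	}
	f.pending = append(f.pending, msg)
	f.Flush()
}

// Flush sends pending lines in order, stops at the first failure, and
// returns how many lines are still buffered.
func (f *Forwarder) Flush() int {
	for len(f.pending) > 0 {
		if err := f.send(f.pending[0]); err != nil {
			break
		}
		f.pending = f.pending[1:]
	}
	return len(f.pending)
}

func main() {
	up := false
	f := NewForwarder(func(msg string) error {
		if !up {
			return ErrLinkDown
		}
		fmt.Println("delivered:", msg)
		return nil
	}, 100)

	f.Forward("Starting execution") // buffered, link is down
	f.Forward("Execution complete") // buffered
	fmt.Println("buffered:", f.Flush())

	up = true
	fmt.Println("buffered:", f.Flush()) // both lines delivered in order
}
```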

Configuration:

  • LOG_FORWARDER_LOG_LEVEL or AGENT_LOG_LEVEL
  • AGENT_CVM_GRPC_* for Manager connection

Location: cmd/log-forwarder, agent/log/service

Communication:

  • Inbound: Unix socket from Agent, Runner (/run/cocos/log.sock)
  • Outbound: gRPC to Manager

5. Attestation Service (Existing, now standalone)

Purpose: Provides attestation quotes for the CVM.

Responsibilities:

  • Generates SNP/vTPM/Azure attestation reports
  • Handles nonce-based attestation requests
  • Isolated from Agent for security

Configuration: Minimal, uses platform-specific attestation APIs

Location: cmd/attestation-service

Communication:

  • Inbound: Unix socket from Agent (/run/cocos/attestation.sock)
  • Outbound: Platform attestation APIs (SNP, vTPM)

6. Egress Proxy (New)

Purpose: Controls and monitors all outbound network traffic.

Responsibilities:

  • HTTP/HTTPS proxy for Runner and Agent
  • Enforces allowlist policies (TODO)
  • Logs all outbound requests
  • Prevents unauthorized network access
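
Allowlist enforcement is still marked TODO above, so the following is only a sketch of what a host check could look like, not a description of pkg/egress. The function `hostAllowed` and the entry syntax (exact host, or a leading dot for subdomains) are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// hostAllowed reports whether the host part of hostport is permitted.
// Hypothetical entry syntax: "pypi.org" matches that host exactly, while
// ".pythonhosted.org" matches any subdomain of pythonhosted.org.
func hostAllowed(hostport string, allowlist []string) bool {
	host := hostport
	if h, _, err := net.SplitHostPort(hostport); err == nil {
		host = h // strip the port when one is present
	}
	host = strings.ToLower(strings.TrimSuffix(host, "."))

	for _, entry := range allowlist {
		entry = strings.ToLower(entry)
		if strings.HasPrefix(entry, ".") {
			if strings.HasSuffix(host, entry) {
				return true
			}
			continue
		}
		if host == entry {
			return true
		}
	}
	return false
}

func main() {
	allow := []string{"pypi.org", ".pythonhosted.org"}
	for _, target := range []string{
		"pypi.org:443",
		"files.pythonhosted.org:443",
		"example.com:443",
	} {
		fmt.Println(target, hostAllowed(target, allow))
	}
}
```

Matching on the full suffix including the leading dot avoids accepting look-alike hosts such as notpypi.org for an entry of pypi.org.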

Configuration:

  • COCOS_PROXY_PORT=3128
  • Services use HTTP_PROXY/HTTPS_PROXY environment variables

Location: cmd/egress-proxy, pkg/egress

Communication:

  • Inbound: HTTP/HTTPS from Agent, Runner (localhost:3128)
  • Outbound: Filtered HTTP/HTTPS to Internet
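
To show how a service might route a child process through the proxy, here is a minimal sketch that builds the environment for an algorithm process. It assumes the proxy listens on localhost at COCOS_PROXY_PORT (default 3128, as listed above); the helpers `proxyPort` and `withProxyEnv`, and the `algo.py` file name, are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strconv"
	"strings"
)

// proxyPort reads COCOS_PROXY_PORT and falls back to 3128.
func proxyPort(getenv func(string) string) int {
	if p, err := strconv.Atoi(getenv("COCOS_PROXY_PORT")); err == nil && p > 0 {
		return p
	}
	return 3128
}

// withProxyEnv returns a copy of base in which HTTP_PROXY and HTTPS_PROXY
// point at the local egress proxy, replacing any existing values.
func withProxyEnv(base []string, port int) []string {
	proxy := "http://localhost:" + strconv.Itoa(port)
	env := make([]string, 0, len(base)+2)
	for _, kv := range base {
		if strings.HasPrefix(kv, "HTTP_PROXY=") || strings.HasPrefix(kv, "HTTPS_PROXY=") {
			continue
		}
		env = append(env, kv)
	}
	return append(env, "HTTP_PROXY="+proxy, "HTTPS_PROXY="+proxy)
}

func main() {
	// The command is only prepared here, never started.
	cmd := exec.Command("python3", "algo.py")
	cmd.Env = withProxyEnv(os.Environ(), proxyPort(os.Getenv))
	for _, kv := range cmd.Env[len(cmd.Env)-2:] {
		fmt.Println(kv)
	}
}
```

Environment variables only steer well-behaved clients, so this complements rather than replaces network-level restrictions on the Runner.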

Data Flow

Computation Lifecycle

sequenceDiagram
    participant M as Manager
    participant A as Agent
    participant IP as Ingress Proxy
    participant R as Runner
    participant L as Log Forwarder
    participant AS as Attestation Svc
    
    M->>A: ComputationRunReq (manifest + AgentConfig)
    A->>IP: Start(AgentConfig, Computation)
    Note over IP: Starts ATLS listener on port 7002
    A->>A: Transition to ReceivingManifest
    
    CLI->>IP: Algo(algorithm, requirements)
    IP->>A: Forward (plaintext)
    A->>A: Validate hash, save to disk
    A->>A: Transition to ReceivingData
    
    CLI->>IP: Data(dataset)
    IP->>A: Forward
    A->>A: Validate hash, save to /datasets
    A->>A: Transition to Running
    
    A->>R: Run(algorithm, requirements, args)
    R->>L: SendLog("Starting execution")
    L->>M: Forward logs
    R->>R: Execute algorithm
    R->>L: SendLog("Execution complete")
    R->>A: RunResponse(results)
    
    A->>A: Zip results, transition to ConsumingResults
    
    CLI->>IP: Result()
    IP->>A: Forward
    A->>CLI: Stream results
    
    A->>A: Transition to Complete
    M->>A: StopComputation
    A->>IP: Stop()
    A->>R: Stop()
    Note over IP: Stops ATLS listener
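
The "Validate hash" steps in the lifecycle above amount to comparing a digest of the uploaded bytes with the value recorded in the manifest. The sketch below assumes SHA-256 and hex-encoded manifest values purely for illustration; the hash function the Agent actually uses is not stated here, and `hashHex` and `verifyHash` are hypothetical helpers.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"errors"
	"fmt"
)

// ErrHashMismatch is returned when the payload does not match the manifest.
var ErrHashMismatch = errors.New("payload hash does not match manifest")

// hashHex returns the hex-encoded SHA-256 digest of payload.
// SHA-256 is an assumption made for this sketch.
func hashHex(payload []byte) string {
	sum := sha256.Sum256(payload)
	return hex.EncodeToString(sum[:])
}

// verifyHash checks payload against the expected hex digest using a
// constant-time comparison.
func verifyHash(payload []byte, wantHex string) error {
	want, err := hex.DecodeString(wantHex)
	if err != nil {
		return fmt.Errorf("decode expected hash: %w", err)
	}
	got := sha256.Sum256(payload)
	if subtle.ConstantTimeCompare(got[:], want) != 1 {
		return ErrHashMismatch
	}
	return nil
}

func main() {
	algo := []byte("print('hello')")
	manifestHash := hashHex(algo) // what the manifest would carry

	fmt.Println(verifyHash(algo, manifestHash))              // accepted
	fmt.Println(verifyHash([]byte("tampered"), manifestHash)) // rejected
}
```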

Attestation Flow

sequenceDiagram
    participant CLI
    participant IP as Ingress Proxy
    participant A as Agent
    participant AS as Attestation Svc
    
    CLI->>IP: Attestation(nonce)
    IP->>A: Forward
    A->>AS: GetAttestation(nonce, type)
    AS->>AS: Generate SNP/vTPM quote
    AS->>A: Return quote
    A->>IP: Forward quote
    IP->>CLI: Stream attestation
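
On the client side, the nonce in this flow exists so that a returned quote can be tied to a fresh request. A minimal sketch, assuming a 64-byte nonce that the platform echoes back unchanged in the quote's report data (sizes and encoding differ between SNP, vTPM, and Azure, and `newNonce` and `nonceMatches` are hypothetical helpers):

```go
package main

import (
	"crypto/rand"
	"crypto/subtle"
	"fmt"
)

// newNonce returns size random bytes from the system CSPRNG.
func newNonce(size int) ([]byte, error) {
	b := make([]byte, size)
	if _, err := rand.Read(b); err != nil {
		return nil, err
	}
	return b, nil
}

// nonceMatches compares the report data embedded in a quote with the nonce
// that was sent, in constant time. It assumes the nonce is echoed verbatim.
func nonceMatches(reportData, nonce []byte) bool {
	return subtle.ConstantTimeCompare(reportData, nonce) == 1
}

func main() {
	nonce, err := newNonce(64)
	if err != nil {
		fmt.Println("nonce:", err)
		return
	}

	// Stand-in for the report data field of a returned quote.
	reportData := append([]byte(nil), nonce...)
	fmt.Println(len(nonce), nonceMatches(reportData, nonce))

	reportData[0] ^= 0xff // a stale or forged quote would not match
	fmt.Println(nonceMatches(reportData, nonce))
}
```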

Configuration Management

Manager → Agent Configuration Flow

  1. Manager creates the /etc/cocos/environment file with:

    AGENT_LOG_LEVEL=info
    AGENT_CVM_GRPC_URL=<manager-url>
    AGENT_CVM_ID=<cvm-id>
    AGENT_CERTS_TOKEN=<token>
    AGENT_CVM_CA_URL=<ca-url>

  2. Systemd services load this file via EnvironmentFile=/etc/cocos/environment

  3. All services respect AGENT_LOG_LEVEL for consistent logging

  4. Per-computation config sent via ComputationRunReq.AgentConfig:

    • Port, certificates, ATLS flag
    • Ingress Proxy and Agent Server start with these settings
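
The shared log-level setting can be sketched as a small lookup with a fallback. This assumes a service-specific variable such as RUNNER_LOG_LEVEL takes precedence over AGENT_LOG_LEVEL and that "info" is the default; neither the precedence nor the helper name `resolveLogLevel` is confirmed by the text above.

```go
package main

import (
	"fmt"
	"log/slog"
	"os"
)

// resolveLogLevel checks the service-specific variable first, then
// AGENT_LOG_LEVEL, and falls back to info when neither parses.
func resolveLogLevel(getenv func(string) string, serviceVar string) slog.Level {
	for _, key := range []string{serviceVar, "AGENT_LOG_LEVEL"} {
		raw := getenv(key)
		if raw == "" {
			continue
		}
		var lvl slog.Level
		// UnmarshalText accepts names such as "debug", "INFO", or "warn".
		if err := lvl.UnmarshalText([]byte(raw)); err == nil {
			return lvl
		}
	}
	return slog.LevelInfo
}

func main() {
	lvl := resolveLogLevel(os.Getenv, "RUNNER_LOG_LEVEL")
	logger := slog.New(slog.NewTextHandler(os.Stderr, &slog.HandlerOptions{Level: lvl}))
	logger.Info("runner starting", "level", lvl.String())
	fmt.Println(lvl)
}
```

Passing the lookup function in, rather than calling os.Getenv directly, keeps the helper easy to exercise with a plain map in tests.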

Service Dependencies (Systemd)

egress-proxy.service
  ↓
log-forwarder.service
  ↓
attestation-service.service
  ↓
computation-runner.service
  ↓
ingress-proxy.service
  ↓
cocos-agent.service

Security Improvements

Attack Surface Reduction

  • Agent: No longer executes untrusted code
  • Runner: Isolated process, network restricted via egress proxy
  • Ingress Proxy: Single point for ATLS validation
  • Egress Proxy: Prevents unauthorized outbound connections

Network Isolation

  • Internal: Unix sockets (Agent ↔ Runner, Agent ↔ Attestation, etc.)
  • External Ingress: ATLS-only via Ingress Proxy
  • External Egress: Filtered via Egress Proxy
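
For the internal links, a Unix socket is just a filesystem path that only local processes can reach. The sketch below shows a single request/response exchange over such a socket. It uses a newline-delimited text protocol and a temporary directory purely for illustration; the real services presumably speak gRPC over sockets under /run/cocos/, and `serveOnce`, `request`, and `demo` are hypothetical helpers.

```go
package main

import (
	"bufio"
	"fmt"
	"net"
	"os"
	"path/filepath"
	"strings"
)

// serveOnce accepts one connection and acknowledges one line.
func serveOnce(ln net.Listener) {
	conn, err := ln.Accept()
	if err != nil {
		return
	}
	defer conn.Close()
	line, err := bufio.NewReader(conn).ReadString('\n')
	if err != nil {
		return
	}
	fmt.Fprintf(conn, "ack: %s", line) // line still ends with '\n'
}

// request dials the socket, sends one line, and returns the reply line.
func request(sockPath, msg string) (string, error) {
	conn, err := net.Dial("unix", sockPath)
	if err != nil {
		return "", err
	}
	defer conn.Close()
	if _, err := fmt.Fprintf(conn, "%s\n", msg); err != nil {
		return "", err
	}
	reply, err := bufio.NewReader(conn).ReadString('\n')
	if err != nil {
		return "", err
	}
	return strings.TrimSuffix(reply, "\n"), nil
}

// demo wires a listener and a client together over a socket in a temp dir.
func demo(msg string) (string, error) {
	dir, err := os.MkdirTemp("", "cocos-sketch")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	sock := filepath.Join(dir, "runner.sock")
	ln, err := net.Listen("unix", sock)
	if err != nil {
		return "", err
	}
	defer ln.Close()

	go serveOnce(ln)
	return request(sock, msg)
}

func main() {
	fmt.Println(demo("run"))
}
```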

Privilege Separation

  • Each service runs as separate process
  • Runner can be sandboxed further (seccomp, namespaces)
  • Attestation Service isolated from computation

File System Layout

/run/cocos/
├── runner.sock          # Computation Runner socket
├── log.sock             # Log Forwarder socket
└── attestation.sock     # Attestation Service socket

/var/lib/cocos/agent/    # Agent storage
└── pending_messages/    # Buffered messages

/cocos/
├── algo                 # Algorithm binary/script
├── datasets/            # Input datasets
└── results/             # Computation results

/var/log/cocos/
├── agent.stdout
├── agent.stderr
├── runner.stdout
├── runner.stderr
├── log-forwarder.stdout
└── log-forwarder.stderr

Key Design Decisions

  1. Unix Sockets for Internal Communication: Lower latency, no network exposure
  2. HTTP/2 for Ingress Proxy: Supports gRPC, efficient multiplexing
  3. Shared Filesystem: Agent writes datasets, Runner reads/writes results
  4. Dynamic Proxy Configuration: Ingress Proxy starts/stops per computation
  5. Generic Proxy Design: Ingress Proxy reusable for Manager, Attestation Service

Migration from Old Architecture

Before (Monolithic Agent)

  • Agent handled ATLS, execution, logging, attestation
  • Direct network access from Agent
  • Single point of failure

After (Microservices)

  • Agent orchestrates, delegates execution
  • Ingress Proxy handles ATLS
  • Egress Proxy controls network
  • Log Forwarder aggregates logs
  • Runner executes in isolation

Backward Compatibility

  • CLI protocol unchanged (gRPC)
  • Manager protocol unchanged (CVM stream)
  • AgentConfig per-computation still supported
  • Attestation API unchanged

@SammyOina SammyOina self-assigned this Dec 17, 2025
…s proxy services.

Signed-off-by: Sammy Oina <sammyoina@gmail.com>
…new architecture and repository.

Signed-off-by: Sammy Oina <sammyoina@gmail.com>
…mmit, add log-forwarder pre-start hook, and rename proxy binaries.

Signed-off-by: Sammy Oina <sammyoina@gmail.com>
…e logging for service connections and message processing.

Signed-off-by: Sammy Oina <sammyoina@gmail.com>
…t logging to slog, and adjust ingress/egress proxy build and install steps.

Signed-off-by: Sammy Oina <sammyoina@gmail.com>
Signed-off-by: Sammy Oina <sammyoina@gmail.com>
Signed-off-by: Sammy Oina <sammyoina@gmail.com>
Signed-off-by: Sammy Oina <sammyoina@gmail.com>
…se specific commit hashes

Signed-off-by: Sammy Oina <sammyoina@gmail.com>
…and add gRPC test utility

Signed-off-by: Sammy Oina <sammyoina@gmail.com>
…rom a new repository, change agent gRPC port to 7001, and add a gRPC test client.

Signed-off-by: Sammy Oina <sammyoina@gmail.com>
…ingress proxy to port 7002, and update build hashes.

Signed-off-by: Sammy Oina <sammyoina@gmail.com>
…component versions.

Signed-off-by: Sammy Oina <sammyoina@gmail.com>
…ent versions across several packages.

Signed-off-by: Sammy Oina <sammyoina@gmail.com>
Signed-off-by: Sammy Oina <sammyoina@gmail.com>
…update component versions.

Signed-off-by: Sammy Oina <sammyoina@gmail.com>
…d reduce agent logging verbosity.

Signed-off-by: Sammy Oina <sammyoina@gmail.com>
…gRPC test

Signed-off-by: Sammy Oina <sammyoina@gmail.com>
Signed-off-by: Sammy Oina <sammyoina@gmail.com>
codecov bot commented Dec 18, 2025

Codecov Report

❌ Patch coverage is 76.02459% with 117 lines in your changes missing coverage. Please review.
✅ Project coverage is 69.98%. Comparing base (c422afe) to head (42f2ff0).
⚠️ Report is 13 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| pkg/egress/proxy.go | 59.39% | 49 Missing and 5 partials ⚠️ |
| agent/runner/service/service.go | 73.97% | 10 Missing and 9 partials ⚠️ |
| agent/cvms/api/grpc/client.go | 37.50% | 13 Missing and 2 partials ⚠️ |
| pkg/ingress/proxy.go | 91.26% | 5 Missing and 4 partials ⚠️ |
| pkg/server/grpc/grpc.go | 40.00% | 6 Missing and 3 partials ⚠️ |
| agent/service.go | 80.00% | 4 Missing and 3 partials ⚠️ |
| pkg/clients/grpc/log/client.go | 91.66% | 1 Missing and 1 partial ⚠️ |
| pkg/clients/grpc/runner/client.go | 87.50% | 1 Missing and 1 partial ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #559      +/-   ##
==========================================
+ Coverage   68.80%   69.98%   +1.17%     
==========================================
  Files          77       85       +8     
  Lines        5594     6043     +449     
==========================================
+ Hits         3849     4229     +380     
- Misses       1402     1446      +44     
- Partials      343      368      +25     


Signed-off-by: Sammy Oina <sammyoina@gmail.com>
Signed-off-by: Sammy Oina <sammyoina@gmail.com>
Signed-off-by: Sammy Oina <sammyoina@gmail.com>
…oxy tests.

Signed-off-by: Sammy Oina <sammyoina@gmail.com>
Signed-off-by: Sammy Oina <sammyoina@gmail.com>
@drasko drasko merged commit a3265bc into ultravioletrs:main Feb 9, 2026
10 checks passed