System Architecture

Overview of Probe's architecture and component relationships.

TL;DR

Probe is a three-layer system:

Rust Core: High-performance search and extraction
Node.js SDK: AI agent orchestration
Interfaces: CLI, Chat, MCP Server

System Overview

┌─────────────────────────────────────────────────────────────┐
│                    USER INTERFACES                          │
├──────────────────┬──────────────────┬──────────────────────┤
│  CLI             │  Chat CLI/Web    │  MCP Server          │
│  (search,        │  (examples/chat) │  (npm/src/mcp)       │
│   extract,       │                  │                      │
│   query)         │                  │                      │
└────────┬─────────┴──────────┬───────┴──────────┬───────────┘
         │                    │                   │
    ┌────▼────────────────────▼───────────────────▼───────────┐
    │           NODE.JS SDK LAYER (@probelabs/probe)          │
    ├─────────────────────────────────────────────────────────┤
    │ • ProbeAgent - AI orchestration                         │
    │ • Tool Definitions - search, query, extract, etc.       │
    │ • Binary Execution - Rust CLI wrapper                   │
    │ • Session Management - History, storage                 │
    │ • Telemetry - OpenTelemetry integration                │
    └────┬────────────────────────────────────────────────────┘
         │
    ┌────▼────────────────────────────────────────────────────┐
    │              RUST CORE (src/)                           │
    ├─────────────────────────────────────────────────────────┤
    │  ┌──────────────┬──────────────┬──────────────┐        │
    │  │ SEARCH       │ EXTRACT      │ LANGUAGE     │        │
    │  │              │              │              │        │
    │  │ • Ripgrep    │ • Tree-sitter│ • Parsers    │        │
    │  │ • BM25/TF-IDF│ • Line/Symbol│ • AST Blocks │        │
    │  │ • Tokenization│             │ • Test Detection │    │
    │  └──────────────┴──────────────┴──────────────┘        │
    └─────────────────────────────────────────────────────────┘

Rust Core

Search Pipeline

SearchOptions
    ↓
Parse Query (elastic_query.rs)
    ↓
Tokenize & Normalize
    ↓
Scan Files (ripgrep)
    ↓
Parse AST (tree-sitter)
    ↓
Early Ranking (BM25)
    ↓
Extract Code Blocks
    ↓
Final Ranking
    ↓
Merge Adjacent Blocks
    ↓
Apply Limits
    ↓
LimitedSearchResults

Key Modules

Module	Path	Purpose
Search Runner	`src/search/search_runner.rs`	Main orchestration
Ripgrep	`src/search/ripgrep_searcher.rs`	Fast file scanning
Elastic Query	`src/search/elastic_query.rs`	Query parsing
Ranking	`src/ranking.rs`	BM25/TF-IDF scoring
SIMD Ranking	`src/simd_ranking.rs`	Optimized scoring
Tokenization	`src/search/tokenization.rs`	NLP processing
Language	`src/language/`	Tree-sitter parsers
Extract	`src/extract/`	Code extraction

Language Support

All languages implement the LanguageImpl trait:

pub trait LanguageImpl {
    fn get_tree_sitter_language(&self) -> TSLanguage;
    fn is_acceptable_parent(&self, node: &Node) -> bool;
    fn is_test_node(&self, node: &Node, source: &[u8]) -> bool;
    fn get_symbol_signature(&self, node: &Node, source: &[u8]) -> Option<String>;
}

Supported: Rust, JavaScript, TypeScript, Python, Go, C, C++, Java, Ruby, PHP, Swift, C#, HTML, Markdown, YAML

Performance Optimizations

Optimization	Purpose
Parser Pool	Reuse tree-sitter parsers
Tree Cache	Cache parsed ASTs
SIMD Scoring	Vector operations
Early Ranking	Skip non-relevant files
Session Cache	Avoid duplicate results
Rayon	Parallel processing

Node.js SDK Layer

ProbeAgent

The intelligent AI agent:

const agent = new ProbeAgent({
  path: './src',
  provider: 'anthropic'
});

const response = await agent.answer('How does auth work?');

Responsibilities:

Multi-turn conversation management
Tool execution loop (max 30 iterations)
JSON/Mermaid validation
Token tracking
Retry and fallback

Tool System

Tool	Function
`search`	Semantic code search
`query`	AST pattern matching
`extract`	Code block extraction
`grep`	Ripgrep search
`bash`	Shell execution
`edit`	File modification
`delegate`	Sub-agent creation

Supporting Infrastructure

Module	Purpose
`probeTool.js`	Rust binary execution
`delegate.js`	Sub-agent orchestration
`storage/`	Session persistence
`hooks/`	Event callbacks
`mcp/`	MCP server
`tokenCounter.js`	Token tracking
`RetryManager.js`	Automatic retries
`FallbackManager.js`	Provider fallback

User Interfaces

CLI (Rust)

Direct command-line interface:

probe search "query" ./path
probe extract file.rs:42
probe query "pattern" --language rust

Chat (Node.js)

Interactive AI chat:

probe-chat ./project          # CLI mode
probe-chat --web ./project    # Web mode

MCP Server

Model Context Protocol integration:

npx -y @probelabs/probe mcp

Tools exposed: search_code, query_code, extract_code

Data Flow

Search Request

CLI Args → SearchOptions
    ↓
Rust: perform_probe()
    ↓
JSON Output → Node.js
    ↓
ProbeAgent processes result
    ↓
AI Response

AI Agent Loop

User Message
    ↓
System Prompt + Context
    ↓
AI Provider (Anthropic/OpenAI/Google)
    ↓
Tool Call → Execute → Result
    ↓
Continue (up to 30 iterations)
    ↓
Final Response

Integration Points

Rust → Node.js

// Binary execution
const result = await execFile('probe', ['search', query, path]);
const parsed = JSON.parse(result);

Node.js → AI Providers

import { streamText } from 'ai';
import { createAnthropic } from '@ai-sdk/anthropic';

const result = await streamText({
  model: createAnthropic()('claude-sonnet-4-6'),
  system: systemPrompt,
  messages: history,
  tools: toolDefinitions
});

MCP Integration

// STDIO transport
server.connect(new StdioServerTransport());

// Tools registered
server.setRequestHandler(ListToolsRequestSchema, handleListTools);
server.setRequestHandler(CallToolRequestSchema, handleCallTool);

Module Organization

Rust Core

src/
├── main.rs             # CLI entry point
├── cli.rs              # Argument parsing
├── lib.rs              # Public API
├── models.rs           # Data structures
├── ranking.rs          # BM25/TF-IDF
├── simd_ranking.rs     # SIMD scoring
├── query.rs            # AST patterns
├── grep.rs             # Ripgrep wrapper
├── search/             # Search pipeline
│   ├── mod.rs
│   ├── search_runner.rs
│   ├── ripgrep_searcher.rs
│   ├── elastic_query.rs
│   ├── tokenization.rs
│   ├── result_ranking.rs
│   └── ...
├── extract/            # Code extraction
│   ├── mod.rs
│   ├── processor.rs
│   ├── formatter.rs
│   └── ...
├── language/           # Language support
│   ├── mod.rs
│   ├── factory.rs
│   ├── rust.rs
│   ├── javascript.rs
│   └── ...
└── path_resolver/      # Dependency resolution

Node.js SDK

npm/src/
├── index.js            # Public exports
├── agent/
│   ├── ProbeAgent.js   # Main agent class
│   └── tools.js        # Tool instantiation
├── tools/
│   ├── common.js       # Shared schemas
│   ├── vercel.js       # Vercel AI SDK
│   └── langchain.js    # LangChain
├── mcp/
│   └── index.ts        # MCP server
├── hooks/
│   └── index.js        # Hook system
├── storage/
│   └── JsonChatStorage.js
└── ...

Chat Application

examples/chat/
├── index.js            # Entry point
├── webServer.js        # Web server
├── probeChat.js        # Chat wrapper
├── ChatSessionManager.js
├── auth.js             # Authentication
├── index.html          # Web UI
├── storage/
│   └── JsonChatStorage.js
└── implement/          # Code editing

Key Design Patterns

Error Handling

// Rust: Result<T> with anyhow
pub fn search(options: SearchOptions) -> Result<Vec<SearchResult>> {
    fs::read_to_string(path)
        .context("Failed to read file")?
}

// Node.js: try/catch with context
try {
  const result = await search(options);
} catch (error) {
  throw new ProbeError(`Search failed: ${error.message}`);
}

Thread Safety

// Parser pool with Arc
Arc<DashMap<String, Vec<Parser>>>

// Cache with concurrent access
DashMap<PathBuf, CachedTree>

Event System

// Tool execution events
agent.events.on('toolCall', (event) => {
  console.log(`Tool: ${event.name}, Status: ${event.status}`);
});

Security Boundaries

Boundary	Protection
File Access	`allowedFolders` configuration
Path Traversal	Canonicalization
Bash Execution	Allow/deny patterns
API Keys	Environment variables
Web Access	Basic authentication

Performance Characteristics

Component	Performance
Ripgrep	~1GB/s scanning
Tree-sitter	~1ms per file parsing
SIMD Ranking	4-8x faster scoring
Parser Pool	Avoid re-initialization
Session Cache	Deduplicated results

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

System Architecture

TL;DR

System Overview

Rust Core

Search Pipeline

Key Modules

Language Support

Performance Optimizations

Node.js SDK Layer

ProbeAgent

Tool System

Supporting Infrastructure

User Interfaces

CLI (Rust)

Chat (Node.js)

MCP Server

Data Flow

Search Request

AI Agent Loop

Integration Points

Rust → Node.js

Node.js → AI Providers

MCP Integration

Module Organization

Rust Core

Node.js SDK

Chat Application

Key Design Patterns

Error Handling

Thread Safety

Event System

Security Boundaries

Performance Characteristics

Related Documentation

FilesExpand file tree

architecture.md

Latest commit

History

architecture.md

File metadata and controls

System Architecture

TL;DR

System Overview

Rust Core

Search Pipeline

Key Modules

Language Support

Performance Optimizations

Node.js SDK Layer

ProbeAgent

Tool System

Supporting Infrastructure

User Interfaces

CLI (Rust)

Chat (Node.js)

MCP Server

Data Flow

Search Request

AI Agent Loop

Integration Points

Rust → Node.js

Node.js → AI Providers

MCP Integration

Module Organization

Rust Core

Node.js SDK

Chat Application

Key Design Patterns

Error Handling

Thread Safety

Event System

Security Boundaries

Performance Characteristics

Related Documentation