Overview of Probe's architecture and component relationships.
Probe is a three-layer system:
- Rust Core: High-performance search and extraction
- Node.js SDK: AI agent orchestration
- Interfaces: CLI, Chat, MCP Server
┌─────────────────────────────────────────────────────────────┐
│                       USER INTERFACES                       │
├──────────────────┬──────────────────┬───────────────────────┤
│       CLI        │   Chat CLI/Web   │      MCP Server       │
│     (search,     │ (examples/chat)  │     (npm/src/mcp)     │
│      extract,    │                  │                       │
│      query)      │                  │                       │
└────────┬─────────┴────────┬─────────┴───────────┬───────────┘
         │                  │                     │
    ┌────▼──────────────────▼─────────────────────▼───────────┐
    │          NODE.JS SDK LAYER (@probelabs/probe)           │
    ├─────────────────────────────────────────────────────────┤
    │  • ProbeAgent         - AI orchestration                │
    │  • Tool Definitions   - search, query, extract, etc.    │
    │  • Binary Execution   - Rust CLI wrapper                │
    │  • Session Management - History, storage                │
    │  • Telemetry          - OpenTelemetry integration       │
    └────┬────────────────────────────────────────────────────┘
         │
    ┌────▼────────────────────────────────────────────────────┐
    │                    RUST CORE (src/)                     │
    ├─────────────────────────────────────────────────────────┤
    │ ┌────────────────┬─────────────────┬──────────────────┐ │
    │ │     SEARCH     │     EXTRACT     │     LANGUAGE     │ │
    │ │                │                 │                  │ │
    │ │ • Ripgrep      │ • Tree-sitter   │ • Parsers        │ │
    │ │ • BM25/TF-IDF  │ • Line/Symbol   │ • AST Blocks     │ │
    │ │ • Tokenization │                 │ • Test Detection │ │
    │ └────────────────┴─────────────────┴──────────────────┘ │
    └─────────────────────────────────────────────────────────┘
SearchOptions
↓
Parse Query (elastic_query.rs)
↓
Tokenize & Normalize
↓
Scan Files (ripgrep)
↓
Parse AST (tree-sitter)
↓
Early Ranking (BM25)
↓
Extract Code Blocks
↓
Final Ranking
↓
Merge Adjacent Blocks
↓
Apply Limits
↓
LimitedSearchResults
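
The last two pipeline stages, "Merge Adjacent Blocks" and "Apply Limits", can be illustrated with a small sketch. The block shape (`file`, `startLine`, `endLine`, `score`), the `gap` parameter, and both function names are assumptions made for this example; they are not taken from Probe's Rust implementation.

```javascript
// Hypothetical block shape: { file, startLine, endLine, score }.
// Merge blocks from the same file whose line ranges overlap or sit within `gap` lines.
function mergeAdjacentBlocks(blocks, gap = 1) {
  const sorted = [...blocks].sort(
    (a, b) => a.file.localeCompare(b.file) || a.startLine - b.startLine
  );
  const merged = [];
  for (const block of sorted) {
    const last = merged[merged.length - 1];
    if (last && last.file === block.file && block.startLine <= last.endLine + gap) {
      // Extend the previous block and keep the better of the two scores.
      last.endLine = Math.max(last.endLine, block.endLine);
      last.score = Math.max(last.score, block.score);
    } else {
      merged.push({ ...block });
    }
  }
  return merged;
}

// Keep the highest-scoring blocks until the result-count limit is reached.
function applyLimits(blocks, maxResults) {
  return [...blocks].sort((a, b) => b.score - a.score).slice(0, maxResults);
}

const blocks = [
  { file: 'a.rs', startLine: 10, endLine: 20, score: 1.5 },
  { file: 'a.rs', startLine: 21, endLine: 30, score: 2.0 },
  { file: 'b.rs', startLine: 5, endLine: 8, score: 0.7 },
];
const merged = mergeAdjacentBlocks(blocks);
const limited = applyLimits(merged, 1);
```

Merging before limiting means a single large block counts once against the limit instead of several small neighbours each taking a slot.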
| Module | Path | Purpose |
|---|---|---|
| Search Runner | src/search/search_runner.rs | Main orchestration |
| Ripgrep | src/search/ripgrep_searcher.rs | Fast file scanning |
| Elastic Query | src/search/elastic_query.rs | Query parsing |
| Ranking | src/ranking.rs | BM25/TF-IDF scoring |
| SIMD Ranking | src/simd_ranking.rs | Optimized scoring |
| Tokenization | src/search/tokenization.rs | NLP processing |
| Language | src/language/ | Tree-sitter parsers |
| Extract | src/extract/ | Code extraction |
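
To make the BM25 scoring handled by the ranking module concrete, here is a minimal sketch of the textbook Okapi BM25 formula over pre-tokenized documents. The function name, the `k1 = 1.2` and `b = 0.75` defaults, and the sample documents are assumptions for illustration; Probe's actual parameters and its SIMD-optimized implementation may differ.

```javascript
// Okapi BM25 over pre-tokenized documents (arrays of terms).
// k1 and b are common textbook defaults, not values taken from Probe.
function bm25Scores(queryTerms, docs, k1 = 1.2, b = 0.75) {
  const N = docs.length;
  const avgLen = docs.reduce((sum, d) => sum + d.length, 0) / N;

  // Document frequency: number of documents containing each query term.
  const df = new Map();
  for (const term of new Set(queryTerms)) {
    df.set(term, docs.filter((d) => d.includes(term)).length);
  }

  return docs.map((doc) => {
    let score = 0;
    for (const [term, n] of df) {
      const tf = doc.filter((t) => t === term).length;
      if (tf === 0) continue;
      // Rare terms get a higher inverse document frequency.
      const idf = Math.log(1 + (N - n + 0.5) / (n + 0.5));
      // Term frequency saturates with k1 and is normalized by document length via b.
      score += (idf * tf * (k1 + 1)) / (tf + k1 * (1 - b + (b * doc.length) / avgLen));
    }
    return score;
  });
}

const docs = [
  ['fn', 'parse', 'query', 'tokens'],
  ['fn', 'read', 'file'],
  ['parse', 'query', 'parse', 'ast'],
];
const scores = bm25Scores(['parse', 'query'], docs);
```

In this sample the second document contains neither query term and scores zero, while the third scores highest because `parse` occurs twice in a document of the same length as the first.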
All languages implement the LanguageImpl trait:

pub trait LanguageImpl {
    fn get_tree_sitter_language(&self) -> TSLanguage;
    fn is_acceptable_parent(&self, node: &Node) -> bool;
    fn is_test_node(&self, node: &Node, source: &[u8]) -> bool;
    fn get_symbol_signature(&self, node: &Node, source: &[u8]) -> Option<String>;
}

Supported: Rust, JavaScript, TypeScript, Python, Go, C, C++, Java, Ruby, PHP, Swift, C#, HTML, Markdown, YAML
| Optimization | Purpose |
|---|---|
| Parser Pool | Reuse tree-sitter parsers |
| Tree Cache | Cache parsed ASTs |
| SIMD Scoring | Vector operations |
| Early Ranking | Skip non-relevant files |
| Session Cache | Avoid duplicate results |
| Rayon | Parallel processing |
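
As an illustration of the "Session Cache" row, the sketch below shows one way a per-session cache could filter out blocks that were already returned. The `SessionCache` class, its `filterNew` method, and the block key format are hypothetical; the source does not describe how Probe's cache is keyed or stored.

```javascript
// Hypothetical session cache: remembers which blocks a session has already received.
class SessionCache {
  constructor() {
    this.seen = new Map(); // sessionId -> Set of block keys
  }

  // Return only blocks not yet returned in this session, and record them.
  filterNew(sessionId, blocks) {
    if (!this.seen.has(sessionId)) this.seen.set(sessionId, new Set());
    const seen = this.seen.get(sessionId);
    const fresh = [];
    for (const block of blocks) {
      const key = `${block.file}:${block.startLine}-${block.endLine}`;
      if (!seen.has(key)) {
        seen.add(key);
        fresh.push(block);
      }
    }
    return fresh;
  }
}

const cache = new SessionCache();
const first = cache.filterNew('s1', [{ file: 'a.rs', startLine: 1, endLine: 9 }]);
const second = cache.filterNew('s1', [
  { file: 'a.rs', startLine: 1, endLine: 9 },
  { file: 'b.rs', startLine: 3, endLine: 7 },
]);
```

Keying by session keeps follow-up searches in one conversation from repeating code the agent has already seen, without affecting other sessions.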
ProbeAgent is the AI agent that drives the SDK:

import { ProbeAgent } from '@probelabs/probe';

const agent = new ProbeAgent({
  path: './src',
  provider: 'anthropic'
});
const response = await agent.answer('How does auth work?');

Responsibilities:
- Multi-turn conversation management
- Tool execution loop (max 30 iterations)
- JSON/Mermaid validation
- Token tracking
- Retry and fallback
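
The "Retry and fallback" responsibility can be sketched as two small helpers: retry one provider with exponential backoff, then move to the next provider once retries are exhausted. The names `withRetry` and `withFallback`, the provider shape (`name`, `call`), and the option names are assumptions for this example and do not reflect the actual RetryManager.js or FallbackManager.js APIs.

```javascript
// Retry a single async call with exponential backoff between attempts.
async function withRetry(fn, { retries = 2, baseDelayMs = 10 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      if (attempt < retries) {
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
      }
    }
  }
  throw lastError;
}

// Try each provider in order; move on only after its retries are exhausted.
// Hypothetical provider shape: { name, call(request) }.
async function withFallback(providers, request, retryOptions) {
  const errors = [];
  for (const provider of providers) {
    try {
      return await withRetry(() => provider.call(request), retryOptions);
    } catch (error) {
      errors.push(`${provider.name}: ${error.message}`);
    }
  }
  throw new Error(`All providers failed: ${errors.join('; ')}`);
}
```

Collecting one error message per provider keeps the final failure informative when every provider is down.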
| Tool | Function |
|---|---|
| `search` | Semantic code search |
| `query` | AST pattern matching |
| `extract` | Code block extraction |
| `grep` | Ripgrep search |
| `bash` | Shell execution |
| `edit` | File modification |
| `delegate` | Sub-agent creation |
| Module | Purpose |
|---|---|
| `probeTool.js` | Rust binary execution |
| `delegate.js` | Sub-agent orchestration |
| `storage/` | Session persistence |
| `hooks/` | Event callbacks |
| `mcp/` | MCP server |
| `tokenCounter.js` | Token tracking |
| `RetryManager.js` | Automatic retries |
| `FallbackManager.js` | Provider fallback |
Direct command-line interface:

probe search "query" ./path
probe extract file.rs:42
probe query "pattern" --language rust

Interactive AI chat:

probe-chat ./project        # CLI mode
probe-chat --web ./project  # Web mode

Model Context Protocol integration:

npx -y @probelabs/probe mcp

Tools exposed: search_code, query_code, extract_code
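
For readers unfamiliar with MCP, the sketch below shows roughly what a client sends to invoke one of the exposed tools. The `tools/call` method and the `name`/`arguments` parameters come from the MCP specification; the `buildToolCall` helper and the argument names passed to `search_code` (`query`, `path`) are assumptions for illustration, since the tool's input schema is not listed here.

```javascript
// Build a JSON-RPC 2.0 request for the MCP `tools/call` method.
// The argument names passed to search_code below are assumptions for illustration.
function buildToolCall(id, name, args) {
  return { jsonrpc: '2.0', id, method: 'tools/call', params: { name, arguments: args } };
}

const request = buildToolCall(1, 'search_code', { query: 'auth', path: './src' });

// Over the STDIO transport, each message is written as a single line of JSON.
const wire = JSON.stringify(request) + '\n';
```

An MCP client library normally builds and sends this message for you; the raw form is shown only to make the protocol boundary visible.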
CLI Args → SearchOptions
↓
Rust: perform_probe()
↓
JSON Output → Node.js
↓
ProbeAgent processes result
↓
AI Response
User Message
↓
System Prompt + Context
↓
AI Provider (Anthropic/OpenAI/Google)
↓
Tool Call → Execute → Result
↓
Continue (up to 30 iterations)
↓
Final Response
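
The agent flow above boils down to a bounded loop: ask the model, run any requested tool, feed the result back, and stop at a final answer or at the iteration cap. The sketch below is a minimal stand-in, not ProbeAgent's code: `runAgentLoop`, the `model` callback, its reply shape (`{ toolCall }` or `{ text }`), and the `tools` map are all hypothetical.

```javascript
// Minimal agent loop with an iteration cap.
// `model(messages)` is a hypothetical callback returning either
// { toolCall: { name, args } } or { text }.
async function runAgentLoop(model, tools, userMessage, maxIterations = 30) {
  const messages = [{ role: 'user', content: userMessage }];
  for (let i = 0; i < maxIterations; i++) {
    const reply = await model(messages);
    if (!reply.toolCall) return reply.text; // final response

    const { name, args } = reply.toolCall;
    const tool = tools[name];
    // Report unknown tools back to the model instead of crashing the loop.
    const result = tool ? await tool(args) : `Unknown tool: ${name}`;

    messages.push({ role: 'assistant', content: JSON.stringify(reply.toolCall) });
    messages.push({ role: 'tool', name, content: String(result) });
  }
  throw new Error(`No final response after ${maxIterations} iterations`);
}
```

The cap guarantees termination even if the model keeps requesting tools and never produces a final answer.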
// Binary execution
import { execFile } from 'node:child_process';
import { promisify } from 'node:util';
const { stdout } = await promisify(execFile)('probe', ['search', query, path]);
const parsed = JSON.parse(stdout); // assumes the CLI was asked to emit JSON

import { streamText } from 'ai';
import { createAnthropic } from '@ai-sdk/anthropic';
const result = await streamText({
  model: createAnthropic()('claude-sonnet-4-6'),
  system: systemPrompt,
  messages: history,
  tools: toolDefinitions
});

// STDIO transport
await server.connect(new StdioServerTransport());
// Tools registered
server.setRequestHandler(ListToolsRequestSchema, handleListTools);
server.setRequestHandler(CallToolRequestSchema, handleCallTool);

src/
├── main.rs # CLI entry point
├── cli.rs # Argument parsing
├── lib.rs # Public API
├── models.rs # Data structures
├── ranking.rs # BM25/TF-IDF
├── simd_ranking.rs # SIMD scoring
├── query.rs # AST patterns
├── grep.rs # Ripgrep wrapper
├── search/ # Search pipeline
│ ├── mod.rs
│ ├── search_runner.rs
│ ├── ripgrep_searcher.rs
│ ├── elastic_query.rs
│ ├── tokenization.rs
│ ├── result_ranking.rs
│ └── ...
├── extract/ # Code extraction
│ ├── mod.rs
│ ├── processor.rs
│ ├── formatter.rs
│ └── ...
├── language/ # Language support
│ ├── mod.rs
│ ├── factory.rs
│ ├── rust.rs
│ ├── javascript.rs
│ └── ...
└── path_resolver/ # Dependency resolution
npm/src/
├── index.js # Public exports
├── agent/
│ ├── ProbeAgent.js # Main agent class
│ └── tools.js # Tool instantiation
├── tools/
│ ├── common.js # Shared schemas
│ ├── vercel.js # Vercel AI SDK
│ └── langchain.js # LangChain
├── mcp/
│ └── index.ts # MCP server
├── hooks/
│ └── index.js # Hook system
├── storage/
│ └── JsonChatStorage.js
└── ...
examples/chat/
├── index.js # Entry point
├── webServer.js # Web server
├── probeChat.js # Chat wrapper
├── ChatSessionManager.js
├── auth.js # Authentication
├── index.html # Web UI
├── storage/
│ └── JsonChatStorage.js
└── implement/ # Code editing
// Rust: Result<T> with anyhow
pub fn search(options: SearchOptions) -> Result<Vec<SearchResult>> {
    let content = fs::read_to_string(path)
        .context("Failed to read file")?;
    // ... search `content` and return Ok(results) ...
}

// Node.js: try/catch with context
try {
  const result = await search(options);
} catch (error) {
  throw new ProbeError(`Search failed: ${error.message}`);
}

// Parser pool with Arc
Arc<DashMap<String, Vec<Parser>>>
// Cache with concurrent access
DashMap<PathBuf, CachedTree>

// Tool execution events
agent.events.on('toolCall', (event) => {
  console.log(`Tool: ${event.name}, Status: ${event.status}`);
});

| Boundary | Protection |
|---|---|
| File Access | allowedFolders configuration |
| Path Traversal | Canonicalization |
| Bash Execution | Allow/deny patterns |
| API Keys | Environment variables |
| Web Access | Basic authentication |
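
The "File Access" and "Path Traversal" rows combine into one check: normalize the requested path, then verify it falls under an allowed folder. The sketch below does this lexically for POSIX-style absolute paths. The function names are hypothetical, and a real implementation would likely also canonicalize against the filesystem (for example with `fs.realpath`) so that symlinks cannot escape the allowed folders.

```javascript
// Lexically normalize a POSIX-style absolute path ('.' and '..' segments resolved).
function normalize(path) {
  const out = [];
  for (const segment of path.split('/')) {
    if (segment === '' || segment === '.') continue;
    if (segment === '..') out.pop();
    else out.push(segment);
  }
  return '/' + out.join('/');
}

// Allow a path only if its normalized form is an allowed folder or lies under one.
function isPathAllowed(requested, allowedFolders) {
  const target = normalize(requested);
  return allowedFolders.some((folder) => {
    const root = normalize(folder);
    // Compare against "root/" so that /repo/src does not match /repo/src-secret.
    return target === root || target.startsWith(root === '/' ? '/' : root + '/');
  });
}

const allowed = ['/repo/src'];
const okInside = isPathAllowed('/repo/src/../src/main.rs', allowed);
const okEscape = isPathAllowed('/repo/src/../../etc/passwd', allowed);
```

Normalizing before the prefix comparison is what defeats `..` traversal: the second path resolves to `/etc/passwd` and is rejected.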
| Component | Performance |
|---|---|
| Ripgrep | ~1 GB/s scanning |
| Tree-sitter | ~1 ms per file parsing |
| SIMD Ranking | 4-8x faster scoring |
| Parser Pool | Avoid re-initialization |
| Session Cache | Deduplicated results |
- Search Command - Search details
- API Reference - SDK API
- MCP Protocol - MCP integration
- Performance - Optimization guide