Krawall

Prove that your AI chatbot's API bill is an unguarded attack surface.

Krawall is a chatbot stress-testing platform that behaves like an overly engaged, perfectly valid user. No prompt injection. No jailbreaking. No auth bypass. Just very, very costly conversations.

Why Krawall?

Companies are shipping AI chatbots - sometimes customer-facing - that are thin wrappers around commercial LLM APIs with zero cost controls. Most deployments share the same blind spots:

No per-session token limits - the chatbot will happily generate 10,000+ tokens per response if you ask nicely
No meaningful rate limiting - or if a limit exists, it's set too high to matter
No cost awareness - the backend forwards everything to the API and someone pays whatever the bill says
No conversation depth limits - sessions grow indefinitely, accumulating context window costs

This is the cloud billing equivalent of leaving your database exposed on the internet without a password. Except it's worse, because the "attacker" looks exactly like a legitimate user.

Krawall exploits this by mimicking natural conversation flows that are perfectly valid but maximally expensive: repetitive prompts, requests for verbose structured output (hi, XML), "helpful clarifications" on every response, and multi-turn context accumulation. Every request is something a real customer might send. Nothing to see here.
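
To make the cost of multi-turn context accumulation concrete, here is a rough back-of-the-envelope model. It assumes the backend resends the entire conversation history as input on every turn and never truncates it; the function names and token counts are illustrative and not part of Krawall.

```typescript
// Hypothetical cost model: every turn resends the full history as input tokens.
interface TurnCost {
  turn: number;
  inputTokens: number;
  outputTokens: number;
}

function simulateSession(turns: number, promptTokens: number, replyTokens: number): TurnCost[] {
  const costs: TurnCost[] = [];
  let history = 0; // tokens already accumulated in the conversation
  for (let turn = 1; turn <= turns; turn++) {
    const inputTokens = history + promptTokens; // old history plus the new user message
    costs.push({ turn, inputTokens, outputTokens: replyTokens });
    history = inputTokens + replyTokens; // the reply joins the history for the next turn
  }
  return costs;
}

function totalTokens(costs: TurnCost[]): { input: number; output: number } {
  let input = 0;
  let output = 0;
  for (const c of costs) {
    input += c.inputTokens;
    output += c.outputTokens;
  }
  return { input, output };
}
```

Under these assumptions, output tokens grow linearly with the number of turns while input tokens grow quadratically, which is why long sessions with verbose replies are disproportionately expensive.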

The goal isn't to cause damage - it's to prove the attack vector is real, trivial to exploit, and that companies need to take API cost security as seriously as application security. Read the full story behind the project in Let's Burn Some Tokens.

Highlights

Multi-Protocol - HTTP/REST, WebSocket, gRPC, and Server-Sent Events; test any chatbot endpoint regardless of protocol.
8 Provider Presets - One-click setup for OpenAI, Anthropic, Google Gemini, Azure OpenAI, Ollama, and custom endpoints.
12 Scenario Templates - Pre-built tests across 7 categories: Stress, Edge Case, Context, Performance, Logic, Krawall, and Attack Surface.
Real-Time Metrics - Response time, token usage, error rates, repetition detection, and quality scoring with P50/P95/P99 percentiles.
Fire-and-Forget Execution - Async session processing via BullMQ with concurrency control, rate limiting, and automatic retry.
A/B Comparison - Side-by-side statistical comparison of chatbot responses across different providers or configurations.
Browser WebSocket Discovery - Automatically discovers WebSocket endpoints by navigating to chat pages with heuristic widget detection, WebSocket capture via CDP, Socket.IO auto-detection, and background token refresh.
Token Refresh - Background BullMQ workers keep browser-discovered credentials fresh with Redis Pub/Sub hot-swap, no reconnect needed.
Live Discovery Logs - Real-time SSE streaming of browser discovery progress with step-by-step timeline and raw response inspection.

Real-time command center with live sessions, quick execute, and recent activity

Features

Setup & Configuration

Interactive Setup Wizard - 8-step guided configurator with inline connection testing and live session monitoring
Provider Presets - Pre-configured templates for OpenAI, Anthropic, Gemini, Azure, Ollama, custom HTTP/WS/gRPC
Inline Connection Testing - Verify endpoints before committing to a configuration
Centralized Settings - Manage all application configuration from /settings

Scenario Management

Visual Flow Builder - Drag-and-drop editor with message, loop, delay, and conditional steps
12 Pre-built Templates - Stress tests, edge cases, context testing, rapid fire, branching logic, attack surface patterns
Handlebars Templating - Dynamic variable substitution with message index, timestamps, last response, and custom variables
YAML Import/Export - Version-control-friendly scenario format with bulk import
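
As a rough illustration of what variable substitution in a scenario message does, the sketch below implements a minimal `{{name}}` replacement. It is a stand-in written with the standard library only, not the Handlebars library that Krawall uses, and the variable names `messageIndex` and `lastResponse` are hypothetical.

```typescript
// Minimal stand-in for Handlebars-style substitution (NOT the Handlebars library).
type TemplateVars = Record<string, string | number>;

function renderTemplate(template: string, vars: TemplateVars): string {
  // Replace each {{name}} with its value; unknown placeholders are left untouched.
  return template.replace(/\{\{\s*([\w.]+)\s*\}\}/g, (match: string, name: string) =>
    Object.prototype.hasOwnProperty.call(vars, name) ? String(vars[name]) : match,
  );
}

// Hypothetical variable names, for illustration only.
const rendered = renderTemplate(
  'Message {{messageIndex}}: please expand on "{{lastResponse}}" in more detail.',
  { messageIndex: 3, lastResponse: "Sure, here is a summary." },
);
```

Real Handlebars adds helpers, blocks, and escaping on top of this basic idea.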

Execution Engine

Async Job Queue - BullMQ-powered fire-and-forget execution with configurable worker concurrency
Conversation Context - Stateful session memory with message history, conversation ID tracking, and context windowing
Concurrency Control - Semaphore-based limiting (1–100 parallel sessions)
Rate Limiting - Token bucket algorithm with automatic 429 detection and exponential backoff
Browser WebSocket Discovery
- Widget Detection - Three strategies: heuristic (auto-detects widgets using provider patterns for Intercom, Drift, Zendesk, LiveChat, Tawk, HubSpot, Crisp, Tidio, and more, plus positional matching), selector (direct CSS selector), and steps (ordered browser actions: click, type, wait, evaluate)
- WebSocket Capture via CDP - Chrome DevTools Protocol captures HTTP upgrade headers and frames for connection replay outside the browser
- Socket.IO Auto-Detection - Detects Socket.IO by URL patterns, Engine.IO handshake frames, and frame signal analysis; installs dedicated SocketIOHandler for proper framing and heartbeat
- Token Refresh - BullMQ scheduler with configurable refresh intervals, Redis Pub/Sub notifications for credential hot-swap without disconnecting
- Discovery Caching - Redis-backed cache with configurable TTL, force-fresh bypass, and full credential snapshots (cookies, headers, localStorage, sessionStorage)
Configurable Error Handling - Per-scenario retry policies, timeouts, and error injection for resilience testing
Session Actions - Restart, cancel, or delete sessions mid-flight
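
The token bucket rate limiting and exponential backoff mentioned in this list can be sketched as follows. This is a minimal illustration, not Krawall's implementation: the class name, method names, and default delays are assumptions, and timestamps are passed in explicitly so the behaviour is easy to test.

```typescript
// Minimal token bucket sketch: `capacity` tokens, refilled continuously.
class TokenBucket {
  private tokens: number;
  private lastRefill: number;
  private readonly capacity: number;
  private readonly refillPerSecond: number;

  constructor(capacity: number, refillPerSecond: number, now: number = Date.now()) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity; // start with a full bucket
    this.lastRefill = now;
  }

  // Returns true if a request may be sent at time `now` (milliseconds).
  tryRemove(now: number = Date.now()): boolean {
    const elapsedSeconds = Math.max(0, now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

// Delay before retry number `attempt` (0-based) after an HTTP 429, capped at maxMs.
function backoffDelayMs(attempt: number, baseMs: number = 500, maxMs: number = 30000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}
```

A caller would check `tryRemove()` before each request and, when the target answers with 429, wait `backoffDelayMs(attempt)` before trying again.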

Monitoring & Analytics

Live Dashboard - Auto-refreshing widgets for active sessions, completion rate, response time, error rate, and token usage
Chart Visualizations - Response time (line/bar), token distribution (doughnut), error rate trends via Chart.js
Session Replay - Step-through playback with timeline, anomaly highlighting, and per-message metrics
Quality Scoring - Automated relevance, coherence, and completeness assessment
Data Export - CSV and JSON export for metrics and aggregated results

Integrations & Automation

Webhook Notifications - HMAC-SHA256 signed delivery for session.completed and session.failed events with retry
Batch Execution - Run the same scenario against multiple targets in parallel with aggregated results
Cron Scheduling - Standard cron expressions with timezone support for recurring test runs
Plugin System - Extensible architecture with Multi-Step Auth, OpenAI, Anthropic, and Audit plugins
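
For readers consuming the webhooks, HMAC-SHA256 signing and verification generally look like the sketch below. The README does not specify the signature header, encoding, or payload shape, so the hex encoding, the function names, and the example payload are assumptions.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Sign the raw request body with a shared secret (hex encoding is an assumption).
function signPayload(secret: string, body: string): string {
  return createHmac("sha256", secret).update(body, "utf8").digest("hex");
}

// Verify a received signature in constant time.
function verifySignature(secret: string, body: string, signature: string): boolean {
  const expected = Buffer.from(signPayload(secret, body), "hex");
  const received = Buffer.from(signature, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}

// Hypothetical payload for a session.completed event.
const exampleBody = JSON.stringify({ event: "session.completed", sessionId: "example" });
const exampleSignature = signPayload("shared-secret", exampleBody);
```

The receiver should verify against the raw body bytes it received, before any JSON parsing or re-serialization.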

Developer Experience

API Documentation - Built-in Swagger/OpenAPI explorer at /api-docs
Mock Chatbot Server - OpenAI-compatible mock with 5 personas (verbose, XML, ecommerce, support, repetitive)
Command Palette - Cmd+K keyboard shortcuts for power users
Worker Auto-Start - BullMQ workers launch automatically via Next.js instrumentation.ts - no separate process
File-Based Logging - High-performance JSONL format for session data without database bloat
40+ Task Commands - Comprehensive Taskfile for dev, test, build, database, and Docker operations

Quick Start

Prerequisites

Node.js >= 20 | pnpm >= 8 | Docker Desktop

Setup

# 1. Install dependencies
pnpm install

# 2. Install the Task runner, then start PostgreSQL & Redis
pnpm add -g @go-task/cli
task docker:up

# 3. Initialize database
task db:generate && task db:push && task db:seed

# 4. Start dev server (includes workers)
task dev:full

Or run everything at once:

task setup

Access

URL	Description
localhost:3000	Dashboard
localhost:3000/guide	Interactive setup wizard
localhost:3000/api-docs	Swagger API explorer
localhost:3000/api/health	Health check
localhost:8081	Redis Commander

Tech Stack

Layer	Technology
Frontend	Next.js 16.1 (App Router), TypeScript, Tailwind CSS, Chart.js
Backend	Next.js API Routes, Prisma ORM, BullMQ, Zod
Database	PostgreSQL 16
Cache & Queue	Redis 7, BullMQ
Protocols	HTTP/REST (axios), WebSocket (ws), gRPC (grpc-js), SSE (eventsource), Browser WebSocket (Playwright)
Testing	Vitest, Testing Library, Mock Chatbot Server
DevOps	Docker Compose, GitLab CI/CD, Taskfile, Nginx

Architecture

Krawall follows a Next.js App Router architecture with background workers for async job processing. The frontend renders a rich dashboard UI while API routes handle CRUD operations and fire-and-forget execution. BullMQ workers pick up jobs from Redis and execute test scenarios through the connector abstraction layer.

Connector System

All chatbot protocols extend a BaseConnector abstract class with a registry pattern for dynamic protocol resolution:

Protocol	Connector	Features
HTTP/REST	`HTTPConnector`	Request/response templating, auth injection
WebSocket	`WebSocketConnector`	Bidirectional messaging, auto-reconnect
gRPC	`GRPCConnector`	Proto loading, TLS support
SSE	`SSEConnector`	Streaming response handling
Browser WebSocket	`BrowserWebSocketConnector`	Heuristic widget detection, CDP capture, Socket.IO/raw WS, token refresh
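
The abstract-class-plus-registry pattern described above might look roughly like this. Only the name `BaseConnector` comes from this README; the method names, the `ConnectorRegistry` class, and the toy `EchoConnector` are hypothetical and exist purely to show the shape of the pattern.

```typescript
// Hypothetical connector contract; the real BaseConnector may differ.
abstract class BaseConnector {
  abstract connect(): Promise<void>;
  abstract send(message: string): Promise<string>;
  abstract disconnect(): Promise<void>;
}

type ConnectorFactory = (endpoint: string) => BaseConnector;

// Registry mapping a protocol name to a factory, resolved at runtime.
class ConnectorRegistry {
  private readonly factories = new Map<string, ConnectorFactory>();

  register(protocol: string, factory: ConnectorFactory): void {
    this.factories.set(protocol, factory);
  }

  create(protocol: string, endpoint: string): BaseConnector {
    const factory = this.factories.get(protocol);
    if (!factory) {
      throw new Error(`No connector registered for protocol "${protocol}"`);
    }
    return factory(endpoint);
  }
}

// Toy connector that echoes messages back, standing in for a real protocol.
class EchoConnector extends BaseConnector {
  private connected = false;
  private readonly endpoint: string;

  constructor(endpoint: string) {
    super();
    this.endpoint = endpoint;
  }

  async connect(): Promise<void> {
    this.connected = true;
  }

  async send(message: string): Promise<string> {
    if (!this.connected) throw new Error("not connected");
    return `echo(${this.endpoint}): ${message}`;
  }

  async disconnect(): Promise<void> {
    this.connected = false;
  }
}
```

With this shape, supporting a new protocol means writing one subclass and registering one factory; the execution code only ever talks to the base type.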

Provider Presets

Preset	Provider	Auth
`openai-chat`	OpenAI Chat Completions	Bearer Token
`anthropic-messages`	Anthropic Messages API	Custom Header
`google-gemini`	Google Gemini	API Key
`azure-openai`	Azure OpenAI	API Key
`ollama`	Ollama (local)	None
`custom-http`	Custom HTTP	Configurable
`custom-websocket`	Custom WebSocket	Configurable
`custom-grpc`	Custom gRPC	Configurable

The custom presets (Auth: Configurable) support every authentication method: Bearer Token, API Key (custom header name), Basic Auth, Custom Headers, and None.

Worker Pipeline

Session Execution - Execute test scenarios with connector lifecycle management
Metrics Aggregation - Compute P50/P95/P99 percentiles from raw session data
Webhook Delivery - HMAC-signed event delivery with exponential backoff retry
Token Refresh - Background credential refresh for browser-discovered WebSocket sessions with Redis Pub/Sub notification

Workers start automatically via instrumentation.ts and shut down gracefully on SIGTERM/SIGINT.
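
As an illustration of the metrics aggregation step, percentiles can be computed from raw response times with the nearest-rank method shown below. The README does not say which percentile definition Krawall uses, so nearest-rank is an assumption, as are the function name and sample values.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of
// all samples are less than or equal to it.
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new RangeError("p must be in (0, 100]");
  const sorted = [...values].sort((a, b) => a - b); // copy, then numeric sort
  // Multiply before dividing to keep the rank computation in integers where possible.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

// Hypothetical response times in milliseconds.
const responseTimesMs = [120, 80, 200, 95, 110];
const summary = {
  p50: percentile(responseTimesMs, 50),
  p95: percentile(responseTimesMs, 95),
  p99: percentile(responseTimesMs, 99),
};
```

With only a handful of samples, P95 and P99 collapse onto the maximum; the tail percentiles become informative once a session has produced many responses.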

Browser Discovery Pipeline

Krawall can automatically discover and connect to WebSocket-based chatbots embedded in web pages — no manual endpoint configuration required.

How it works:

1. Playwright launches headless Chromium and navigates to the target page
2. CDP listener attaches to capture WebSocket upgrade headers
3. Widget detector locates and activates the chat widget using the configured strategy
4. WebSocket capture intercepts the resulting connection and collects frames
5. Protocol detector analyzes frames to determine raw WS vs Socket.IO
6. Credential extractor harvests cookies, localStorage, and sessionStorage
7. Result is cached in Redis for reuse within the session TTL

Widget Detection Strategies:

Strategy	Use Case	How It Works
Heuristic	Unknown widgets (default)	Tries hint-derived selectors, then common provider patterns (Intercom, Drift, Zendesk, etc.), then positional matching
Selector	Known implementations	Clicks a user-provided CSS selector directly
Steps	Complex interactions	Executes ordered browser actions (click, type, wait, evaluate)

Supported Protocols:

Protocol	Detection	Features
Raw WebSocket	Default	Direct message relay, auto-reconnect
Socket.IO	URL patterns, handshake analysis, frame signals	Engine.IO heartbeat, namespace support, event framing
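
The detection signals in the table above can be approximated with a few simple checks. The sketch below is a heuristic, not Krawall's detector: it relies on the default Socket.IO URL conventions (`/socket.io/` path, `EIO` query parameter) and on Engine.IO text framing, where an open packet starts with `0` followed by JSON containing `sid`, and Socket.IO connect and event packets start with `40` and `42[`. The function name and return type are assumptions.

```typescript
type WsProtocol = "socket.io" | "raw";

function detectProtocol(url: string, frames: string[]): WsProtocol {
  const parsed = new URL(url);

  // Signal 1: default Socket.IO URL conventions.
  const urlHint = parsed.pathname.includes("/socket.io/") || parsed.searchParams.has("EIO");

  // Signal 2: Engine.IO open packet, e.g. 0{"sid":"...","pingInterval":25000}
  const handshake = frames.some((frame) => /^0\{.*"sid"\s*:/.test(frame));

  // Signal 3: Socket.IO connect ("40") or event ("42[...") packets.
  const packetPrefix = frames.some((frame) => /^4(0|2\[)/.test(frame));

  return urlHint || handshake || packetPrefix ? "socket.io" : "raw";
}
```

A raw WebSocket chatbot that happens to send text beginning with `40` would be misclassified by the third signal, so a stricter detector could require two signals to agree before choosing Socket.IO.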

Project Structure

krawall/
├── app/                        # Next.js App Router
│   ├── (dashboard)/            # UI routes (dashboard, guide, targets, scenarios, sessions, etc.)
│   ├── api/                    # 20+ API route handlers
│   └── globals.css
├── components/                 # 47 React components
│   ├── ui/                     # 19 design system primitives
│   ├── guide/                  # Setup wizard (8 steps)
│   ├── sessions/               # LogViewer, SessionReplay
│   ├── scenarios/              # FlowBuilder, YamlImportExport
│   └── ...                     # targets, batches, webhooks, jobs, metrics
├── lib/                        # Core business logic
│   ├── connectors/             # HTTP, WebSocket, gRPC, SSE + registry + presets + plugins
│   ├── jobs/                   # BullMQ queue, workers, scheduler
│   ├── metrics/                # MetricsCollector, QualityScorer
│   ├── webhooks/               # Signer, emitter, delivery
│   ├── context/                # ConversationContext (stateful memory)
│   ├── rate-limit/             # Token bucket algorithm
│   └── utils/                  # Encryption (AES-256-GCM), helpers
├── prisma/                     # Schema, migrations, seed
├── tests/                      # 70+ tests (unit + integration)
├── infra/                      # Docker Compose (dev + prod)
├── docs/                       # API.md, DEPLOYMENT.md, MOCK_CHATBOT.md
└── instrumentation.ts          # Worker auto-start hook

Commands

Development

task dev              # Start development server
task dev:full         # Start dev + workers (recommended)
task build            # Production build
task type-check       # TypeScript checking
task lint             # ESLint
task format           # Prettier formatting

Database

task db:generate      # Generate Prisma client
task db:push          # Push schema changes
task db:migrate:dev   # Create migration
task db:seed          # Seed sample data
task db:studio        # Open Prisma Studio

Docker

task docker:up        # Start PostgreSQL + Redis
task docker:down      # Stop services
task docker:logs      # View logs
task docker:clean     # Remove volumes

Testing

task test             # Run tests
task test:watch       # Watch mode
task test:coverage    # With coverage report
task worker:status    # Check queue health

Security

Credential Encryption - AES-256-GCM at rest for all stored secrets
Input Validation - Zod schemas on every API endpoint
SQL Injection Prevention - Prisma parameterized queries
Webhook Signing - HMAC-SHA256 payload verification
Rate Limiting - Token bucket per target
Security Headers - Configured via Next.js middleware
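
For context on the credential encryption item, AES-256-GCM encryption with Node's built-in crypto module generally follows the pattern below. This is a sketch under assumptions: the storage format (base64 fields for IV, auth tag, and ciphertext), the function names, and the key handling are illustrative and not taken from Krawall's code.

```typescript
import { createCipheriv, createDecipheriv } from "node:crypto";
import { randomBytes } from "node:crypto";

// Hypothetical storage shape for an encrypted secret.
interface EncryptedSecret {
  iv: string;
  tag: string;
  ciphertext: string;
}

// `key` must be exactly 32 bytes for AES-256.
function encryptSecret(plaintext: string, key: Buffer): EncryptedSecret {
  const iv = randomBytes(12); // fresh 96-bit IV for every encryption
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return {
    iv: iv.toString("base64"),
    tag: cipher.getAuthTag().toString("base64"), // authenticates the ciphertext
    ciphertext: ciphertext.toString("base64"),
  };
}

// Throws if the key is wrong or the ciphertext or tag was modified.
function decryptSecret(secret: EncryptedSecret, key: Buffer): string {
  const decipher = createDecipheriv("aes-256-gcm", key, Buffer.from(secret.iv, "base64"));
  decipher.setAuthTag(Buffer.from(secret.tag, "base64"));
  const plaintext = Buffer.concat([
    decipher.update(Buffer.from(secret.ciphertext, "base64")),
    decipher.final(),
  ]);
  return plaintext.toString("utf8");
}
```

Because GCM is authenticated, decryption fails loudly on tampering instead of returning corrupted plaintext; the IV must never be reused with the same key.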

Documentation

Document	Description
API Reference	Complete REST API documentation
Deployment Guide	Production deployment with Docker & Nginx
Mock Chatbot	Mock server configuration and personas
Scenario Templates	Pre-built template documentation
Plugin Development	Writing custom plugins for connectors
Configuration	Environment variables and infrastructure options
Installation	Step-by-step setup guide
Changelog	Implementation history and milestones

Built with Next.js, TypeScript, and Tailwind CSS
