mcp-check

⚠️ This library is not usable yet. Docs and a stable release are coming in the next few weeks. Stay tuned.

The TypeScript MCP testing library.

A TypeScript library for testing MCP (Model Context Protocol) servers with AI models. It executes prompts against an MCP server using one or more AI models (Claude, GPT) and lets you verify which tools were used and what they returned. Streaming output is supported through chunk handlers.

Installation

npm install mcp-check
# or
pnpm add mcp-check

Quick Start

import { client, McpServer } from "mcp-check";

// Configure your MCP server
const mcpServer = new McpServer({
  url: "https://example.com/api/mcp",
  authorizationToken: process.env.MCP_TOKEN!,
  name: "example-server",
  type: "url",
});

// Execute a prompt with multiple AI models
const result = await client(mcpServer, ["claude-3-haiku-20240307", "gpt-4"])
  .prompt("What tools are available and how do they work?");

// Get comprehensive results
const executionResult = result.getExecutionResult();
console.log("Execution time:", executionResult.summary.executionTime);
console.log("Successful models:", result.getSuccessfulAgents());
console.log("Common tools used:", executionResult.summary.commonTools);

// Check specific model responses
const claudeResponse = result.getResponse("claude-3-haiku-20240307");
console.log("Claude content:", claudeResponse?.content);
console.log("Tools used by Claude:", claudeResponse?.usedTools);

API Reference

McpServer

Configure your MCP server connection:

const mcpServer = new McpServer({
  url: "https://example.com/api/mcp",        // string: MCP server URL
  authorizationToken: process.env.MCP_TOKEN, // string, optional: authorization token
  name: "example-server",                    // string: server name
  type: "url",                               // string: server type (e.g., "url")
});

client(mcpServer, models, scorers?, config?)

Create a client instance to execute prompts:

// scorers (third argument) are left undefined here and set through .scorers() below
const result = await client(mcpServer, ["claude-3-haiku-20240307", "gpt-4"], undefined, {
  silent: true, // Suppress console output
  anthropicApiKey: process.env.ANTHROPIC_API_KEY,
  openaiApiKey: process.env.OPENAI_API_KEY,
  chunkHandlers: {
    onTextDelta: (data) => console.log("Text:", data.text),
    onToolCallStart: (data) => console.log("Tool started:", data.toolName),
    onError: (data) => console.error("Error:", data.error)
  }
})
  .scorers([
    {
      name: "contains id",
      tool: "list_branches",
      scorer: ({ output }) => {
        try {
          const branches = JSON.parse(output[0]?.text);
          return branches.some(branch => branch.id) ? 1 : 0;
        } catch {
          return 0;
        }
      }
    }
  ])
  .prompt("Your prompt here");

Parameters:

mcpServer: Configured MCP server instance
models: Array of AI model names to use
scorers?: Optional array of Scorer instances for tool evaluation
config?: Optional configuration including API keys, silent mode, and chunk handlers

Supported Models:

Claude models: claude-3-haiku-20240307, claude-3-5-sonnet-20240620, etc.
OpenAI models: gpt-4, gpt-3.5-turbo, etc.
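
The `chunkHandlers` option shown in the example above can also capture a streamed response instead of logging it. The following is a minimal sketch, not part of mcp-check: `createChunkCollector` and `ChunkCollector` are hypothetical names, and the payload fields (`data.text`, `data.toolName`, `data.error`) are assumed from the handlers in that example.

```typescript
// Hypothetical helper, not part of mcp-check. The payload shapes are
// assumptions taken from the chunkHandlers example (text, toolName, error).
interface ChunkCollector {
  handlers: {
    onTextDelta: (data: { text: string }) => void;
    onToolCallStart: (data: { toolName: string }) => void;
    onError: (data: { error: unknown }) => void;
  };
  text: () => string;
  toolCalls: () => string[];
  errors: () => unknown[];
}

function createChunkCollector(): ChunkCollector {
  const textParts: string[] = [];
  const tools: string[] = [];
  const errs: unknown[] = [];
  return {
    handlers: {
      // Append each streamed text fragment in arrival order
      onTextDelta: (data) => { textParts.push(data.text); },
      // Record the name of every tool call that starts
      onToolCallStart: (data) => { tools.push(data.toolName); },
      // Keep errors for later inspection instead of printing them
      onError: (data) => { errs.push(data.error); },
    },
    text: () => textParts.join(""),
    toolCalls: () => tools.slice(),
    errors: () => errs.slice(),
  };
}
```

Under these assumptions, you would pass `collector.handlers` as `chunkHandlers` in the config and read `collector.text()` once the prompt has finished. With several models, chunks from all of them may end up in the same buffer, so this sketch is likely most useful with a single model.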

Agent Methods

`.prompt(text: string)`

Execute the prompt against the MCP server with the configured models and return the results. Calling this method starts execution immediately.

`.scorers(scorers: Array<{name: string; tool: string; scorer: Function}>)`

Configure scorers to evaluate tool call results:

.scorers([
  {
    name: "contains_data",
    tool: "fetch_data", 
    scorer: ({ output, input }) => {
      return output?.data ? 1 : 0;
    }
  }
])

`.allowTools(tools: string[])`

Restrict which tools can be used by the models.

Result Methods

The prompt() method returns an AgentsResult object with these methods:

`getResponse(model)`

Get the response for a specific model:

const response = result.getResponse("claude-3-haiku-20240307");
console.log(response?.content, response?.usedTools);

`getAllResponses()`

Get all model responses as a record.
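
As an illustration of working with that record, the sketch below lists the models that answered without calling any tool. It assumes the record is keyed by model name and that each entry has the `content` and `usedTools` fields seen on `getResponse()`; `ModelResponse` and `modelsWithoutToolCalls` are hypothetical names, not part of mcp-check.

```typescript
// Assumed response shape: only the fields shown for getResponse().
interface ModelResponse {
  content?: string;
  usedTools?: string[];
}

// Hypothetical helper: return the models whose response lists no tools.
function modelsWithoutToolCalls(
  responses: Record<string, ModelResponse | undefined>
): string[] {
  return Object.keys(responses).filter((model) => {
    const response = responses[model];
    const used = response && response.usedTools ? response.usedTools : [];
    return used.length === 0;
  });
}
```

A call such as `modelsWithoutToolCalls(result.getAllResponses())` could then flag models that answered from their own knowledge rather than through the MCP server.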

`getExecutionResult()`

Get comprehensive execution statistics:

const execution = result.getExecutionResult();
console.log("Total models:", execution.summary.totalModels);
console.log("Successful:", execution.summary.successfulExecutions);
console.log("Common tools:", execution.summary.commonTools);
console.log("Execution time:", execution.summary.executionTime);

`getToolStats()`

Get detailed tool usage statistics:

const stats = result.getToolStats();
stats.forEach(stat => {
  console.log(`${stat.toolName}: ${stat.callCount} calls`);
});
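
If you want the most frequently called tools first, the stats can be ranked before printing. This is a small sketch that assumes each stat has at least the `toolName` and `callCount` fields used above; `ToolStat`, `rankToolsByCalls`, and `formatToolReport` are hypothetical names.

```typescript
// Assumed shape: only the two fields used in the getToolStats() example.
interface ToolStat {
  toolName: string;
  callCount: number;
}

// Sort by call count (highest first), then by name, without mutating the input.
function rankToolsByCalls(stats: ToolStat[]): ToolStat[] {
  return stats
    .slice()
    .sort((a, b) => b.callCount - a.callCount || a.toolName.localeCompare(b.toolName));
}

// Produce one "<tool>: <n> calls" line per tool, in ranked order.
function formatToolReport(stats: ToolStat[]): string {
  return rankToolsByCalls(stats)
    .map((stat) => `${stat.toolName}: ${stat.callCount} calls`)
    .join("\n");
}
```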

`getContent(model)`

Get the content from a specific model.

`getUsedTools(model)`

Get tools used by a specific model.

`hasUsedTool(model, tool)`

Check if a model used a specific tool.

`getToolCallCount(model, tool)`

Get the number of times a tool was called by a model.
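
When the same tools are expected from every model, repeating `hasUsedTool` checks by hand gets verbose. The sketch below wraps the check in a loop. `ToolUsageResult` is a local interface that declares only the documented `hasUsedTool(model, tool)` method, and `modelsMissingTools` is a hypothetical helper, not part of mcp-check.

```typescript
// Minimal view of the result object: only the method this helper needs.
interface ToolUsageResult {
  hasUsedTool(model: string, tool: string): boolean;
}

// Hypothetical helper: map each model to the expected tools it did not use.
// Models that used every expected tool are omitted from the returned record.
function modelsMissingTools(
  result: ToolUsageResult,
  models: string[],
  tools: string[]
): Record<string, string[]> {
  const missing: Record<string, string[]> = {};
  for (const model of models) {
    const notUsed = tools.filter((tool) => !result.hasUsedTool(model, tool));
    if (notUsed.length > 0) {
      missing[model] = notUsed;
    }
  }
  return missing;
}
```

In a test, asserting that the returned record is empty gives one failure message that names every model and tool that fell short.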

`getSuccessfulAgents()`

Get list of models that executed successfully.

`getFailedAgents()`

Get list of models that failed to execute.

`getScores(model)`

Get evaluation scores for a specific model's tool calls:

const scores = result.getScores("claude-3-haiku-20240307");
scores.forEach(score => {
  console.log(`${score.name}: ${score.score} for tool ${score.tool}`);
});
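
Scores can be aggregated before asserting on them. The sketch below averages scores per scorer name, assuming each entry has the `name`, `tool`, and `score` fields used above and that a scorer may produce more than one entry (for example, one per tool call). `ToolScore` and `averageScoreByScorer` are hypothetical names.

```typescript
// Assumed shape: the three fields used in the getScores() example.
interface ToolScore {
  name: string;
  tool: string;
  score: number;
}

// Hypothetical helper: average the scores for each scorer name.
function averageScoreByScorer(scores: ToolScore[]): Record<string, number> {
  const totals: Record<string, { sum: number; count: number }> = {};
  for (const entry of scores) {
    const running = totals[entry.name] ?? { sum: 0, count: 0 };
    running.sum += entry.score;
    running.count += 1;
    totals[entry.name] = running;
  }
  const averages: Record<string, number> = {};
  for (const name of Object.keys(totals)) {
    const running = totals[name];
    if (running) {
      averages[name] = running.sum / running.count;
    }
  }
  return averages;
}
```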

Scorer System

The scorer system allows you to evaluate and validate tool call results automatically:

const result = await client(mcpServer, ["claude-3-haiku-20240307"])
  .scorers([
    {
      name: "valid_branches",
      tool: "list_branches",
      scorer: ({ output, input }) => {
        try {
          const branches = JSON.parse(output[0]?.text);
          return branches.every(b => b.id && b.name) ? 1 : 0;
        } catch {
          return 0;
        }
      }
    },
    {
      name: "has_results",
      tool: "search_content", 
      scorer: ({ output }) => {
        return (output?.results?.length ?? 0) > 0 ? 1 : 0;
      }
    }
  ])
  .prompt("List all branches and search for content");

// Get scores for evaluation
const scores = result.getScores("claude-3-haiku-20240307");
console.log("Evaluation results:", scores);

Scorer functions receive:

output: The tool's result/response
input: The tool's input arguments

Return a number (typically 0-1) representing the evaluation score.
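
Because the score is a number rather than a boolean, a scorer can also return partial credit. The sketch below builds such a scorer: it returns the fraction of items in a JSON array that have all required keys. It assumes the tool output looks like `[{ text: "<json>" }]`, as in the `list_branches` examples above; `fractionWithKeys` and `ToolOutput` are hypothetical names, not part of mcp-check.

```typescript
// Assumed output shape, based on the output[0]?.text access in the examples.
type ToolOutput = Array<{ text?: string }> | undefined;

// Hypothetical scorer factory: the returned scorer yields a value in [0, 1],
// the share of array items where every required key holds a truthy value.
function fractionWithKeys(requiredKeys: string[]) {
  return ({ output }: { output: ToolOutput }): number => {
    try {
      const first = output ? output[0] : undefined;
      // JSON.parse throws on an empty or invalid string; the catch returns 0
      const items: unknown = JSON.parse(first && first.text ? first.text : "");
      if (!Array.isArray(items) || items.length === 0) {
        return 0;
      }
      const valid = items.filter(
        (item) =>
          typeof item === "object" &&
          item !== null &&
          requiredKeys.every((key) => Boolean((item as Record<string, unknown>)[key]))
      );
      return valid.length / items.length;
    } catch {
      return 0;
    }
  };
}
```

It could be plugged in as `{ name: "valid_branches", tool: "list_branches", scorer: fractionWithKeys(["id", "name"]) }`. Like the examples above, it tests for truthy values, so an `id` of `0` would count as missing.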

Testing Example

import { client, McpServer } from "mcp-check";

const mcpServer = new McpServer({
  url: "https://example.com/api/mcp",
  authorizationToken: process.env.MCP_TOKEN!,
  name: "example-server", 
  type: "url",
});

// describe, test, and expect are assumed to be globals from a Jest-compatible test runner
describe("MCP Server Tests", () => {
  test("should use expected tools across multiple models", async () => {
    const result = await client(mcpServer, ["claude-3-haiku-20240307", "gpt-4"])
      .scorers([
        {
          name: "tool_success",
          tool: "update_blocks",
          scorer: ({ output }) => output?.success ? 1 : 0
        }
      ])
      .prompt("Update the content using the available tools.");

    // Verify execution summary
    const execution = result.getExecutionResult();
    expect(execution.summary.totalModels).toBe(2);
    expect(execution.summary.successfulExecutions).toBe(2);
    expect(execution.summary.commonTools).toContain("query_content");

    // Verify specific model responses
    const claudeResponse = result.getResponse("claude-3-haiku-20240307");
    expect(claudeResponse?.usedTools).toEqual(
      expect.arrayContaining(["query_content", "update_blocks"])
    );

    const gptResponse = result.getResponse("gpt-4");
    expect(gptResponse?.usedTools).toEqual(
      expect.arrayContaining(["query_content", "update_blocks"])
    );

    // Verify tool call details
    expect(result.hasUsedTool("claude-3-haiku-20240307", "query_content")).toBe(true);
    expect(result.getToolCallCount("claude-3-haiku-20240307", "update_blocks")).toBeGreaterThan(0);

    // Verify successful agents
    expect(result.getSuccessfulAgents()).toContain("claude-3-haiku-20240307");
    expect(result.getSuccessfulAgents()).toContain("gpt-4");
  }, 90000);
});

Environment Variables

Set the following environment variables to authenticate with the AI model providers and your MCP server:

ANTHROPIC_API_KEY=your_anthropic_key_here
OPENAI_API_KEY=your_openai_key_here
MCP_TOKEN=your_mcp_server_token_here
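
A missing key usually surfaces as an authentication error partway through a run, so it can help to check the variables up front. The sketch below is a hypothetical helper, not part of mcp-check. It takes the environment as a parameter so it has no dependency on Node typings, and it assumes all three variables are needed; pass a shorter list if you only use one provider.

```typescript
// Variable names from the list above.
const REQUIRED_ENV_VARS: string[] = ["ANTHROPIC_API_KEY", "OPENAI_API_KEY", "MCP_TOKEN"];

// Hypothetical helper: return the names of variables that are unset or blank.
function missingEnvVars(
  env: Record<string, string | undefined>,
  required: string[] = REQUIRED_ENV_VARS
): string[] {
  return required.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}
```

In a Node test setup file this could be called as `missingEnvVars(process.env)`, throwing an error that lists the returned names when the array is not empty.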
