Conversation

@MrFlounder (Contributor)

No description provided.

@promptfoo-scanner bot left a comment


I reviewed this PR for LLM security vulnerabilities and found several high-severity issues related to the new Dropbox MCP tool integrations. The main concerns are prompt injection risks with unrestricted file access, overly permissive agent instructions, and a guardrails gap in the new multi-turn conversation feature.

Minimum severity threshold for this scan: High

Comment on lines 32 to +44
}) => {
  // TODO: Unimplemented
  // Mock implementation - returns sample retention offers
  return {
    offers: [
      {
        type: "discount",
        description: "20% off for 12 months",
        monthly_savings: 15,
      },
      {
        type: "upgrade",
        description: "Free upgrade to premium plan for 3 months",
        value: 45,


🟠 High

The Return Agent has unrestricted read access to Dropbox with requireApproval: "never", making it vulnerable to prompt injection attacks. An attacker could manipulate the agent through crafted user messages to search for and exfiltrate sensitive files beyond return policy documents (e.g., HR records, financial data, customer information).

💡 Suggested Fix

Enable approval gates and restrict tool capabilities to follow the principle of least privilege:

const mcp = hostedMcpTool({
  serverLabel: "dropbox",
  connectorId: "connector_dropbox",
  serverDescription: "Return Policy Knowledge",
  allowedTools: [
    "fetch_file",  // Only allow fetching specific files
    // Remove: "search", "search_files", "list_recent_files", "fetch", "get_profile"
  ],
  requireApproval: "always",  // Changed from "never"
});

Also update the Return Agent instructions (line 206) to specify exact file paths rather than vague "check dropbox" guidance.

🤖 AI Agent Prompt

The Return Agent at agent.ts:204-221 uses an MCP tool configured at agent.ts:32-44 that grants unrestricted Dropbox read access with no approval gates (requireApproval: "never"). This creates a prompt injection vulnerability where attackers can manipulate the agent to access arbitrary files.

Investigate the hostedMcpTool SDK documentation to determine what granular access controls are available (path restrictions, directory scoping, etc.). The goal is to:

  1. Change requireApproval from "never" to "always"
  2. Reduce allowedTools to only what's strictly necessary (likely just fetch_file)
  3. Restrict file access to specific paths if the SDK supports it (e.g., only /policies/returns/ directory)
  4. Update the Return Agent's instructions to reference specific file paths rather than broad "check dropbox" guidance

Consider whether the Dropbox connector configuration allows folder-level access restrictions, or if separate connectors should be created for different agents.
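If the SDK turns out not to support path scoping directly, a fallback is to validate requested paths before a tool call is dispatched. Below is a minimal sketch of such a guard; the /policies/returns/ directory, the helper name, and the interception point are all illustrative assumptions, not part of the hostedMcpTool API:

```typescript
// Hypothetical guard: only permit fetches under approved directories.
// ALLOWED_PREFIXES is an illustrative value for this codebase.
const ALLOWED_PREFIXES = ["/policies/returns/"];

function isPathAllowed(path: string): boolean {
  // Normalize the path to defeat "../" traversal before prefix-checking.
  const normalized = path.split("/").reduce<string[]>((parts, seg) => {
    if (seg === "..") parts.pop();
    else if (seg !== "" && seg !== ".") parts.push(seg);
    return parts;
  }, []);
  const canonical = "/" + normalized.join("/");
  return ALLOWED_PREFIXES.some((prefix) => canonical.startsWith(prefix));
}
```

A guard like this is defense in depth on top of requireApproval, not a replacement for it: even an injected instruction that convinces the agent to request /policies/returns/../../hr/salaries.xlsx is rejected after normalization.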

Comment on lines +237 to +241
Always check dropbox to give the accurate answers
Check all files first that you can see, use any files available to your access.
You can use all files but really only reference customer QA related content, nothing else`,


🟠 High

The Information Agent's instructions actively encourage broad file access ("Check all files first that you can see, use any files available to your access"), making it highly vulnerable to prompt injection. Combined with the same unrestricted Dropbox access as the Return Agent, this creates an easy path for attackers to exfiltrate sensitive data. The soft instruction to "really only reference customer QA related content" provides no technical enforcement.

💡 Suggested Fix

Rewrite the instructions to specify exact file paths and remove dangerous language encouraging broad access:

instructions: `You are an information agent for answering customer service questions.

Reference the customer FAQ document at /knowledge-base/customer-qa/faq.pdf in Dropbox.
Only access and reference information from this specific document.
Do not access any other files or directories in Dropbox.
If the FAQ doesn't contain the answer, inform the user you need to escalate to a human agent.`

Also apply the same MCP tool restrictions as the Return Agent (change requireApproval to "always", reduce allowed tools).

🤖 AI Agent Prompt

The Information Agent at agent.ts:232-249 has instructions that explicitly encourage accessing all available files ("Check all files first that you can see, use any files available to your access"). This is paired with the same unrestricted Dropbox MCP tool access (mcp1 at lines 46-59) that has requireApproval: "never".

Your task is to:

  1. Rewrite the agent instructions to reference specific document paths rather than encouraging broad file access
  2. Remove the dangerous "check all files" and "use any files available" language
  3. Apply the same MCP tool restrictions as needed for the Return Agent (see previous issue)
  4. Consider whether the current soft instruction ("really only reference customer QA related content") can be technically enforced through tool configuration

The key issue is that relying on the LLM to self-limit access is insufficient: prompt injection can easily override these soft instructions. Technical controls are needed.
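One way to make that enforcement concrete is a startup check that fails fast when a tool config grants more than the approved minimum. A sketch, where MINIMAL_TOOLS and the helper are illustrative, not SDK features:

```typescript
// Approved minimal tool set for knowledge-base agents (illustrative).
const MINIMAL_TOOLS = new Set(["fetch_file"]);

// Returns any tools granted beyond the minimal set, so deployment can
// refuse to start (or alert) when the config drifts toward broad access.
function excessiveTools(allowedTools: string[]): string[] {
  return allowedTools.filter((tool) => !MINIMAL_TOOLS.has(tool));
}
```

Running this against the PR's current allowedTools array would flag search, list_recent_files, fetch, and get_profile as excess grants.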

Comment on lines +344 to +353

// Extract text for guardrails check - use last user message
const lastUserMsg = conversationHistory
  .filter((m) => "role" in m && m.role === "user")
  .pop();
const guardrailsInputtext =
  lastUserMsg &&
  "content" in lastUserMsg &&
  Array.isArray(lastUserMsg.content) &&
  lastUserMsg.content[0] &&

🟠 High

The guardrails only check the last user message in multi-turn conversations, not the full conversation history. This allows attackers to spread prompt injection payloads across multiple turns - earlier malicious turns that passed guardrails individually won't be re-checked, but their context remains active when the agent processes later turns.

💡 Suggested Fix

Modify the guardrails to validate all user messages, not just the last one:

// Extract ALL user messages for guardrails check
const allUserMessages = conversationHistory
  .filter((m) => "role" in m && m.role === "user")
  .map((m) => {
    if ("content" in m && Array.isArray(m.content) &&
        m.content[0] && "text" in m.content[0]) {
      return m.content[0].text;
    }
    return "";
  })
  .filter(text => text.length > 0);

// Concatenate all user messages for guardrails
const guardrailsInputtext = allUserMessages.join("\n\n---\n\n");

Consider adding conversation length limits to bound the attack surface.
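The length limit can be as simple as a turn budget checked before guardrails run. A minimal sketch; MAX_USER_TURNS and the message shape are illustrative assumptions:

```typescript
// Illustrative cap on user turns per conversation.
const MAX_USER_TURNS = 20;

// Counts user turns and flags conversations past the budget, bounding
// how much attacker-controlled text the agent ever processes.
function exceedsTurnBudget(history: Array<{ role?: string }>): boolean {
  return history.filter((m) => m.role === "user").length > MAX_USER_TURNS;
}
```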

🤖 AI Agent Prompt

At agent.ts:344-353, the code extracts only the last user message for guardrails checking using .pop(), but then passes the full conversationHistory to agents. This creates a bypass where attackers can inject malicious instructions in earlier conversation turns that passed guardrails, then reference them in later turns.

Investigate the appropriate balance between security and performance:

  1. Check all user messages in the conversation (most secure but potentially expensive)
  2. Check the last N messages (e.g., last 3 turns - balanced approach)
  3. Implement caching of guardrails results from previous turns to avoid redundant checks

You'll need to modify the text extraction logic to collect multiple messages and potentially adjust how they're concatenated before being passed to runGuardrails(). Also consider whether conversation length limits should be added to bound the attack surface.

Comment on lines +46 to +59
    ],
  };
  },
});

const mcp = hostedMcpTool({
  serverLabel: "dropbox",
  connectorId: "connector_dropbox",
  serverDescription: "Return Policy Knowledge",
  allowedTools: [
    "fetch",
    "fetch_file",
    "get_profile",
    "list_recent_files",

🟠 High

The Information Agent's MCP tool has the same excessive agency issues as the Return Agent - unrestricted file access with broad capabilities (search, list_recent_files, fetch, fetch_file, get_profile, search_files) and requireApproval: "never". This violates the principle of least privilege and amplifies the impact of any prompt injection attack.

💡 Suggested Fix

Apply the same restrictions as recommended for the Return Agent's MCP tool:

const mcp1 = hostedMcpTool({
  serverLabel: "dropbox",
  connectorId: "connector_dropbox",
  serverDescription: "Knowledge Base",
  allowedTools: [
    "fetch_file",  // Only allow fetching specific files
  ],
  requireApproval: "always",  // Changed from "never"
});

Remove search and listing capabilities that enable reconnaissance. The agent should only be able to fetch specific known documents.

🤖 AI Agent Prompt

The mcp1 tool at agent.ts:46-59 used by the Information Agent has the same excessive agency issues as the mcp tool used by the Return Agent. Apply similar restrictions here.

Your approach should be:

  1. Change requireApproval from "never" to "always"
  2. Reduce allowedTools to the minimum necessary (likely just fetch_file)
  3. Investigate whether path-based restrictions can be configured
  4. Consider whether this tool should use a separate Dropbox connector with access only to the knowledge-base folder

The goal is to ensure that even if the Information Agent is compromised via prompt injection (see previous vulnerabilities), the attacker's ability to access sensitive files is limited by technical controls, not just by agent instructions.
