feat: new agent #5
base: master
I reviewed this PR for LLM security vulnerabilities and found several high-severity issues related to the new Dropbox MCP tool integrations. The main concerns are prompt injection risks with unrestricted file access, overly permissive agent instructions, and a guardrails gap in the new multi-turn conversation feature.
Minimum severity threshold for this scan: High
```typescript
}) => {
  // TODO: Unimplemented
  // Mock implementation - returns sample retention offers
  return {
    offers: [
      {
        type: "discount",
        description: "20% off for 12 months",
        monthly_savings: 15,
      },
      {
        type: "upgrade",
        description: "Free upgrade to premium plan for 3 months",
        value: 45,
```
🟠 High
The Return Agent has unrestricted read access to Dropbox with requireApproval: "never", making it vulnerable to prompt injection attacks. An attacker could manipulate the agent through crafted user messages to search for and exfiltrate sensitive files beyond return policy documents (e.g., HR records, financial data, customer information).
💡 Suggested Fix
Enable approval gates and restrict tool capabilities to follow the principle of least privilege:
```typescript
const mcp = hostedMcpTool({
  serverLabel: "dropbox",
  connectorId: "connector_dropbox",
  serverDescription: "Return Policy Knowledge",
  allowedTools: [
    "fetch_file", // Only allow fetching specific files
    // Remove: "search", "search_files", "list_recent_files", "fetch", "get_profile"
  ],
  requireApproval: "always", // Changed from "never"
});
```

Also update the Return Agent instructions (line 206) to specify exact file paths rather than vague "check dropbox" guidance.
🤖 AI Agent Prompt
The Return Agent at agent.ts:204-221 uses an MCP tool configured at agent.ts:32-44 that grants unrestricted Dropbox read access with no approval gates (requireApproval: "never"). This creates a prompt injection vulnerability where attackers can manipulate the agent to access arbitrary files.
Investigate the hostedMcpTool SDK documentation to determine what granular access controls are available (path restrictions, directory scoping, etc.). The goal is to:
- Change `requireApproval` from "never" to "always"
- Reduce `allowedTools` to only what's strictly necessary (likely just `fetch_file`)
- Restrict file access to specific paths if the SDK supports it (e.g., only the `/policies/returns/` directory)
- Update the Return Agent's instructions to reference specific file paths rather than broad "check dropbox" guidance
Consider whether the Dropbox connector configuration allows folder-level access restrictions, or if separate connectors should be created for different agents.
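If the SDK turns out not to support path scoping directly, a fallback is to enforce a directory allowlist in application code before any fetch reaches Dropbox. A minimal sketch, assuming a hypothetical `isAllowedPath` helper wired in front of the fetch tool (the `/policies/returns/` prefix mirrors the directory suggested above; none of this is part of the `hostedMcpTool` SDK):

```typescript
// Sketch only: enforce a directory allowlist before any Dropbox fetch.
// The allowlist and helper are illustrative, not part of the hostedMcpTool SDK.
const ALLOWED_PREFIXES = ["/policies/returns/"];

function isAllowedPath(path: string): boolean {
  // Canonicalize first so "/policies/returns/../hr/x.xlsx" cannot slip through.
  const parts: string[] = [];
  for (const seg of path.split("/")) {
    if (seg === "..") parts.pop();
    else if (seg && seg !== ".") parts.push(seg);
  }
  const canonical = "/" + parts.join("/");
  return ALLOWED_PREFIXES.some((prefix) => canonical.startsWith(prefix));
}
```

Rejected paths should fail closed (return an error to the agent) rather than fall back to an unrestricted fetch.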
```typescript
  Always check dropbox to give the accurate answers
  Check all files first that you can see, use any files available to your access.
  You can use all files but really only reference customer QA related content, nothing else`,
```
🟠 High
The Information Agent's instructions actively encourage broad file access ("Check all files first that you can see, use any files available to your access"), making it highly vulnerable to prompt injection. Combined with the same unrestricted Dropbox access as the Return Agent, this creates an easy path for attackers to exfiltrate sensitive data. The soft instruction to "really only reference customer QA related content" provides no technical enforcement.
💡 Suggested Fix
Rewrite the instructions to specify exact file paths and remove dangerous language encouraging broad access:
```typescript
instructions: `You are an information agent for answering customer service questions.
Reference the customer FAQ document at /knowledge-base/customer-qa/faq.pdf in Dropbox.
Only access and reference information from this specific document.
Do not access any other files or directories in Dropbox.
If the FAQ doesn't contain the answer, inform the user you need to escalate to a human agent.`
```

Also apply the same MCP tool restrictions as the Return Agent (change requireApproval to "always", reduce allowed tools).
🤖 AI Agent Prompt
The Information Agent at agent.ts:232-249 has instructions that explicitly encourage accessing all available files ("Check all files first that you can see, use any files available to your access"). This is paired with the same unrestricted Dropbox MCP tool access (mcp1 at lines 46-59) that has requireApproval: "never".
Your task is to:
- Rewrite the agent instructions to reference specific document paths rather than encouraging broad file access
- Remove the dangerous "check all files" and "use any files available" language
- Apply the same MCP tool restrictions as needed for the Return Agent (see previous issue)
- Consider whether the current soft instruction ("really only reference customer QA related content") can be technically enforced through tool configuration
The key issue is that relying on the LLM to self-limit access is insufficient - prompt injection can easily override these soft instructions. Technical controls are needed.
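One way to make the limit technical rather than instructional is to expose a tool with no path parameter at all, hard-wired to the single document the agent may cite — injected text then has no argument to redirect. A sketch under that assumption (the tool name, shape, and FAQ path below are illustrative, not the project's API):

```typescript
// Sketch: a zero-argument tool hard-wired to one document. Because the model
// supplies no path, a prompt-injected "fetch /hr/..." has nothing to override.
type FetchFn = (path: string) => string;

function makeFaqTool(fetchFile: FetchFn) {
  const FAQ_PATH = "/knowledge-base/customer-qa/faq.pdf"; // illustrative path
  return {
    name: "get_customer_faq",
    description: "Fetch the customer FAQ document; no other file is reachable.",
    invoke: () => fetchFile(FAQ_PATH),
  };
}
```

The same pattern generalizes: one fixed-path tool per document the agent legitimately needs, instead of one general-purpose fetch.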
```typescript
// Extract text for guardrails check - use last user message
const lastUserMsg = conversationHistory
  .filter((m) => "role" in m && m.role === "user")
  .pop();
const guardrailsInputtext =
  lastUserMsg &&
  "content" in lastUserMsg &&
  Array.isArray(lastUserMsg.content) &&
  lastUserMsg.content[0] &&
```
🟠 High
The guardrails only check the last user message in multi-turn conversations, not the full conversation history. This allows attackers to spread prompt injection payloads across multiple turns - earlier malicious turns that passed guardrails individually won't be re-checked, but their context remains active when the agent processes later turns.
💡 Suggested Fix
Modify the guardrails to validate all user messages, not just the last one:
```typescript
// Extract ALL user messages for guardrails check
const allUserMessages = conversationHistory
  .filter((m) => "role" in m && m.role === "user")
  .map((m) => {
    if ("content" in m && Array.isArray(m.content) &&
        m.content[0] && "text" in m.content[0]) {
      return m.content[0].text;
    }
    return "";
  })
  .filter((text) => text.length > 0);

// Concatenate all user messages for guardrails
const guardrailsInputtext = allUserMessages.join("\n\n---\n\n");
```

Consider adding conversation length limits to bound the attack surface.
🤖 AI Agent Prompt
At agent.ts:344-353, the code extracts only the last user message for guardrails checking using .pop(), but then passes the full conversationHistory to agents. This creates a bypass where attackers can inject malicious instructions in earlier conversation turns that passed guardrails, then reference them in later turns.
Investigate the appropriate balance between security and performance:
- Check all user messages in the conversation (most secure but potentially expensive)
- Check the last N messages (e.g., last 3 turns - balanced approach)
- Implement caching of guardrails results from previous turns to avoid redundant checks
You'll need to modify the text extraction logic to collect multiple messages and potentially adjust how they're concatenated before being passed to runGuardrails(). Also consider whether conversation length limits should be added to bound the attack surface.
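The caching idea can be sketched as a small wrapper that remembers the verdict for each message already screened, so validating the full history each turn costs only one new check. Here `checkMessage` stands in for the real runGuardrails() call, and the wrapper shape is an assumption, not the project's API:

```typescript
// Sketch of per-message guardrail caching: each user message is screened once,
// keyed by its text, so re-checking the whole history every turn stays cheap.
function makeCachedGuardrail(checkMessage: (text: string) => boolean) {
  const cache = new Map<string, boolean>();
  let checks = 0;
  return {
    // True only if every message in the history passes the guardrail.
    passesAll(messages: string[]): boolean {
      return messages.every((text) => {
        if (!cache.has(text)) {
          checks++; // cache miss: run the underlying guardrail once
          cache.set(text, checkMessage(text));
        }
        return cache.get(text) === true;
      });
    },
    get checkCount() {
      return checks;
    },
  };
}
```

Note the cache only helps if guardrail verdicts are deterministic per message; if the check is model-based and non-deterministic, keying by message text trades a small amount of rigor for cost.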
```typescript
    ],
  };
},
});

const mcp = hostedMcpTool({
  serverLabel: "dropbox",
  connectorId: "connector_dropbox",
  serverDescription: "Return Policy Knowledge",
  allowedTools: [
    "fetch",
    "fetch_file",
    "get_profile",
    "list_recent_files",
```
🟠 High
The Information Agent's MCP tool has the same excessive agency issues as the Return Agent - unrestricted file access with broad capabilities (search, list_recent_files, fetch, fetch_file, get_profile, search_files) and requireApproval: "never". This violates the principle of least privilege and amplifies the impact of any prompt injection attack.
💡 Suggested Fix
Apply the same restrictions as recommended for the Return Agent's MCP tool:
```typescript
const mcp1 = hostedMcpTool({
  serverLabel: "dropbox",
  connectorId: "connector_dropbox",
  serverDescription: "Knowledge Base",
  allowedTools: [
    "fetch_file", // Only allow fetching specific files
  ],
  requireApproval: "always", // Changed from "never"
});
```

Remove search and listing capabilities that enable reconnaissance. The agent should only be able to fetch specific known documents.
🤖 AI Agent Prompt
The mcp1 tool at agent.ts:46-59 used by the Information Agent has the same excessive agency issues as the mcp tool used by the Return Agent. Apply similar restrictions here.
Your approach should be:
- Change `requireApproval` from "never" to "always"
- Reduce `allowedTools` to the minimum necessary (likely just `fetch_file`)
- Investigate whether path-based restrictions can be configured
- Consider whether this tool should use a separate Dropbox connector with access only to the knowledge-base folder
The goal is to ensure that even if the Information Agent is compromised via prompt injection (see previous vulnerabilities), the attacker's ability to access sensitive files is limited by technical controls, not just by agent instructions.