
Feature/feature web fetch#42

Merged
Aas-ee merged 2 commits into main from feature/feature-web-fetch on Mar 7, 2026

Conversation


@Aas-ee Aas-ee commented Mar 7, 2026

close #19

Aas-ee added 2 commits March 7, 2026 23:51
- Implement fetchWebContent function to scrape HTTP(S) link content
- Support automatic identification and parsing of Markdown files and regular web pages
- Integrate Cheerio library for HTML content extraction and cleaning
- Add content length limit and truncation functionality
- Support proxy configuration and HTTPS proxy
- Add web page title and metadata extraction functionality
- Implement content fallback mechanism for SPA pages
- Register the fetchWebContent tool in tool settings
- Update README document to add new feature descriptions
- Add support for unit tests and integration tests
- Configure test scripts for content fetching validation
- Add normalizeEngineName function to handle different client representations of engine names
- Implement standardization for engine names such as Bing, DuckDuckGo, linux.do, etc.
- Integrate engine name conversion logic into the Zod validation schema
- Add dedicated engine normalization test files and test cases
- Add new test command test:engine-normalization in package.json
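From the commit messages above, normalizeEngineName maps client-supplied engine names onto canonical ones. A minimal sketch of what that might look like (the alias table and canonical names here are illustrative assumptions, not the PR's actual mapping):

```typescript
// Hypothetical alias table; the real PR's mapping is not shown in this page.
const ENGINE_ALIASES: Record<string, string> = {
  bing: "bing",
  duckduckgo: "duckduckgo",
  ddg: "duckduckgo", // assumed shorthand some clients send
  "linux.do": "linux.do",
  linuxdo: "linux.do",
};

function normalizeEngineName(raw: string): string {
  const key = raw.trim().toLowerCase();
  return ENGINE_ALIASES[key] ?? key; // pass unknown names through unchanged
}
```

Per the commit notes, this conversion is wired into Zod validation, presumably via something like `z.string().transform(normalizeEngineName)` so every tool call sees a canonical engine name.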

Aas-ee commented Mar 7, 2026

This PR also adds general web-page content scraping functionality.
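The commits describe fetchWebContent as dispatching between Markdown files and regular HTML pages, extracting a title, and truncating to a length limit. A self-contained sketch of that flow, under stated assumptions: the PR uses Cheerio for real HTML extraction, while this sketch uses naive regexes purely to illustrate the dispatch, and parseFetchedBody and its shape are hypothetical names:

```typescript
interface FetchedContent {
  title: string | null;
  text: string;
  truncated: boolean;
}

// Sketch only: dispatch on URL extension, extract a title, strip markup,
// and enforce a character cap, mirroring the behavior listed in the commits.
function parseFetchedBody(url: string, body: string, maxChars: number): FetchedContent {
  const isMarkdown = /\.(md|markdown)$/i.test(new URL(url).pathname);
  let title: string | null = null;
  let text: string;
  if (isMarkdown) {
    const m = body.match(/^#\s+(.+)$/m); // first ATX heading as the title
    title = m ? m[1].trim() : null;
    text = body;
  } else {
    const t = body.match(/<title[^>]*>([^<]*)<\/title>/i);
    title = t ? t[1].trim() : null;
    text = body
      .replace(/<script[\s\S]*?<\/script>/gi, "") // drop inline scripts
      .replace(/<[^>]+>/g, " ")                   // strip remaining tags
      .replace(/\s+/g, " ")
      .trim();
  }
  const truncated = text.length > maxChars;
  return { title, text: truncated ? text.slice(0, maxChars) : text, truncated };
}
```

In the real implementation, Cheerio replaces the regex-based tag stripping, and the SPA fallback mentioned in the commits would kick in when the extracted text is empty.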

@Aas-ee Aas-ee merged commit b7c8d0b into main Mar 7, 2026

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d1a702eff4

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

```typescript
const validateWebUrl = (url: string): boolean => {
  try {
    const urlObj = new URL(url);
    return urlObj.protocol === 'http:' || urlObj.protocol === 'https:';
  } catch {
    return false;
  }
};
```

P1: Block private-network targets in web URL validation

fetchWebContent is exposed to MCP clients as a “public HTTP(S)” fetch tool, but this validator only checks the scheme and therefore accepts hosts like 127.0.0.1, localhost, RFC1918 ranges, or cloud metadata endpoints (for example 169.254.169.254). In deployments where the server can reach internal services, this becomes an SSRF primitive that lets callers read internal-only resources through the tool.

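A hedged sketch of the stricter validator the review is asking for (isAllowedWebUrl and the blocklists are assumed names, not code from the PR; a production version should additionally resolve DNS and re-check the resolved address to defeat rebinding):

```typescript
import * as net from "node:net";

// Loopback, RFC1918 private, link-local (incl. 169.254.169.254), and 0.0.0.0/8.
const BLOCKED_V4 = [
  /^127\./, /^10\./, /^192\.168\./, /^0\./,
  /^172\.(1[6-9]|2\d|3[01])\./, /^169\.254\./,
];

function isAllowedWebUrl(url: string): boolean {
  let u: URL;
  try {
    u = new URL(url);
  } catch {
    return false;
  }
  if (u.protocol !== "http:" && u.protocol !== "https:") return false;
  const host = u.hostname.replace(/^\[|\]$/g, ""); // URL keeps IPv6 in brackets
  if (host === "localhost") return false;
  if (net.isIPv4(host)) return !BLOCKED_V4.some((re) => re.test(host));
  if (net.isIPv6(host)) {
    // Simplified: loopback, link-local (fe80::/10 partially), unique-local.
    const h = host.toLowerCase();
    return !(h === "::1" || h.startsWith("fe8") || h.startsWith("fc") || h.startsWith("fd"));
  }
  return true; // plain hostname: must also be re-validated after DNS resolution
}
```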

```typescript
  requestOptions.httpsAgent = proxyAgent;
}

const response = await axios.get(parsedUrl.toString(), requestOptions);
```


P1: Enforce response size limits before fetching page bodies

This request fetches arbitrary URLs with responseType: 'text' but does not set any download/body limits, so Axios will buffer the full response in memory before the later maxChars truncation is applied. A large file or intentionally oversized response can exhaust memory or stall the process even when callers request a small maxChars value.

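One way to address this, sketched under the assumption that the PR keeps axios: Node's axios supports maxContentLength (cap on the downloaded body) and maxBodyLength (cap on the outgoing request body), and a chunk-level character cap enforces maxChars while reading instead of after the whole body is buffered. The option values and capText helper below are illustrative, not the PR's code:

```typescript
// Assumed request options: axios (Node) rejects once the body exceeds the cap.
const MAX_BYTES = 2 * 1024 * 1024; // illustrative 2 MiB ceiling
const boundedRequestOptions = {
  responseType: "text",
  maxContentLength: MAX_BYTES, // limit on the downloaded response body
  maxBodyLength: MAX_BYTES,    // limit on the outgoing request body
  timeout: 15_000,
} as const;

// Chunk-level cap for streamed content: stop accumulating as soon as
// maxChars is reached, so a huge response never sits fully in memory.
function capText(chunks: string[], maxChars: number): { text: string; truncated: boolean } {
  let out = "";
  for (const chunk of chunks) {
    if (out.length + chunk.length > maxChars) {
      return { text: out + chunk.slice(0, maxChars - out.length), truncated: true };
    }
    out += chunk;
  }
  return { text: out, truncated: false };
}
```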

@Aas-ee Aas-ee deleted the feature/feature-web-fetch branch March 7, 2026 16:58


Development

Successfully merging this pull request may close these issues.

Pb of mapping with MPCO from openwebui

1 participant