Fetch a URL, get clean Markdown. No API keys. No browser automation required for most pages.
Built for LLM agents and developers who want web content without the noise.
npx @psarno/fetchmd "https://docs.python.org/3/library/asyncio.html"
That's it. Output goes to stdout. Errors go to stderr.
npm install -g @psarno/fetchmd
Most static pages (docs, blogs, news, reference sites) work without Playwright. If a page is blank or returns too little content, it's probably a JavaScript-rendered SPA (React, Angular, Vue, etc.). Install Playwright to handle those:
npm install -g playwright
npx playwright install chromium
fetchmd detects Playwright at runtime. If it's not installed, the SPA stage is silently skipped.
fetchmd is a plain CLI — no server, no protocol, no API keys. Any agent with shell access can use it directly after a global install:
npm install -g @psarno/fetchmd
From there, fetchmd <url> behaves like any other shell tool (curl, jq, etc.). The agent runs it, reads clean Markdown from stdout, and uses that content in its response.
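As a rough sketch of what "behaves like any other shell tool" can look like in a wrapper script, the function below captures the Markdown from stdout and sends stderr to a log file. The name `fetch_page` and the default log path are hypothetical, and treating empty output as a failure is an assumption, since fetchmd's exit codes are not described here.

```shell
# Hypothetical helper: fetch a page, keeping Markdown and errors apart.
# Assumes fetchmd is on PATH (after `npm install -g @psarno/fetchmd`).
fetch_page() {
  url=$1
  err_log=${2:-fetchmd-errors.log}   # hypothetical default log path

  # stdout carries the Markdown; stderr is redirected to the log file.
  # `|| true` keeps the script alive under `set -e`; emptiness is checked next.
  content=$(fetchmd "$url" 2>"$err_log") || true

  # Treat empty output as a failure (an assumption, not documented behaviour).
  if [ -z "$content" ]; then
    echo "fetchmd returned no content for $url; see $err_log" >&2
    return 1
  fi

  printf '%s\n' "$content"
}

# Usage (not run here):
#   fetch_page "https://docs.python.org/3/library/asyncio.html" > asyncio.md
```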
Agents don't automatically discover tools installed on your system. You have to tell them. The standard mechanism is a plain text instruction file in your project root that the agent reads at the start of every session. Think of it as a README written for the agent rather than a human.
The filename convention varies by agent:
| Agent | Instruction file |
|---|---|
| Claude Code | CLAUDE.md |
| Codex, OpenCode, and most others | AGENTS.md |
Tip: Some agents read both. If you're unsure, creating both files with the same content is harmless.
Create or open that file and add a section like this:
## Available tools
- **fetchmd** — fetches a URL and returns clean Markdown to stdout. Prefer this
over any built-in web fetch or browser tool when reading documentation,
articles, or reference pages. It produces cleaner output, supports
JavaScript-rendered pages via Playwright, and accepts `--max-chars` to cap
output size and protect context budget.
Usage: `fetchmd [--max-chars N] <url>`
That's all. The agent will call fetchmd as a shell command and read the output. No server, no MCP, no further setup.
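If you maintain several projects, adding the section by hand gets repetitive. The sketch below appends an abbreviated version of the snippet to an instruction file, and skips files that already mention fetchmd. The function name `add_fetchmd_section` is hypothetical, and the wording inside the heredoc is a shortened paraphrase of the section above, so adjust it to taste.

```shell
# Hypothetical helper: append a fetchmd section to an agent instruction file
# unless the file already mentions fetchmd.
add_fetchmd_section() {
  file=$1

  # Skip files that already mention fetchmd, so reruns do not duplicate it.
  if [ -f "$file" ] && grep -q 'fetchmd' "$file"; then
    return 0
  fi

  # Quoted delimiter: backticks and angle brackets are written literally.
  cat >>"$file" <<'EOF'
## Available tools
- **fetchmd**: fetches a URL and returns clean Markdown to stdout. Prefer this
  over any built-in web fetch or browser tool when reading documentation,
  articles, or reference pages.
  Usage: `fetchmd [--max-chars N] <url>`
EOF
}

# Usage (not run here):
#   add_fetchmd_section AGENTS.md
#   add_fetchmd_section CLAUDE.md
```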
Many agents ship with their own web fetch capability. When both are available, the agent will pick one, and without guidance it may default to whichever feels more "native" to it.
The "Prefer this over any built-in web fetch or browser tool" line in the snippet above is intentional. It gives the agent an explicit tie-breaker. If you omit it, you may find the agent ignoring fetchmd in favour of its own tool, even when fetchmd would produce better output.
Warning: Some agents treat their built-in tools as higher priority than user instructions regardless of what the instruction file says. This is uncommon, but if you notice the agent consistently bypassing fetchmd, try strengthening the wording: "Always use fetchmd for web content. Do not use built-in web fetch tools."
Read a page before answering a question about it:
fetchmd https://docs.python.org/3/library/asyncio.html
Cap output to protect context window budget:
fetchmd --max-chars 15000 https://some-long-reference.com
When output is truncated, fetchmd appends a comment (<!-- fetchmd: truncated at N chars -->), so the agent knows content was cut and can decide whether to fetch more or proceed.
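A script can look for that comment too. The following is a minimal sketch: `was_truncated` and `fetch_capped` are hypothetical names, and the pattern assumes that N in the comment is a plain run of digits.

```shell
# Hypothetical helper: succeed if the Markdown on stdin contains
# fetchmd's truncation comment. Assumes N is written as plain digits.
was_truncated() {
  grep -q '<!-- fetchmd: truncated at [0-9][0-9]* chars -->'
}

# Hypothetical wrapper: fetch with a cap, pass the Markdown through on stdout,
# and report on stderr when the cap was hit.
fetch_capped() {
  url=$1
  cap=${2:-15000}

  page=$(fetchmd --max-chars "$cap" "$url") || true
  printf '%s\n' "$page"

  if printf '%s\n' "$page" | was_truncated; then
    echo "fetchmd: output for $url was cut at $cap chars" >&2
  fi
}

# Usage (not run here):
#   fetch_capped "https://some-long-reference.com" 15000 > reference.md
```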
Check which extraction stage fired (useful when debugging agent behaviour):
fetchmd --stage https://example.com
fetchmd [options] <url>
  --min-length N   Minimum characters to accept from extraction (default: 200)
  --max-chars N    Truncate output at N chars, paragraph-aligned (default: 50000, 0 to disable)
  --no-spa         Skip Playwright even if installed
  --stage          Prefix output with which extraction stage succeeded
  --help           Show this help
# Static page — works out of the box
fetchmd "https://docs.python.org/3/library/asyncio.html"
# JS-rendered SPA (requires Playwright)
fetchmd "https://my-angular-app.com"
# See which extraction stage fired
fetchmd --stage "https://example.com"
# Tighter output cap (good for LLM context limits)
fetchmd --max-chars 20000 "https://some-framework.org/reference"
# No truncation
fetchmd --max-chars 0 "https://example.com/short-page"
# Save to file
fetchmd "https://example.com/article" > article.md
# Low-content pages (like example.com) need a lower threshold
fetchmd --min-length 50 "https://example.com"
Two extraction stages, tried in order. fetchmd moves to the next stage only if the current one returns nothing or too little content.
Stage 1 — Defuddle (always runs)
Fetches the page over HTTP and extracts content using Defuddle, the engine behind Obsidian Web Clipper. Converts to clean Markdown. Handles most static pages: blogs, docs, news, reference pages. Standardizes code blocks, tables, and footnotes.
Stage 2 — Playwright (optional, only if stage 1 fails)
Launches headless Chromium, renders the JavaScript, then feeds the resulting DOM back through Defuddle. Only runs if stage 1 returned too little content and playwright is installed.
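To see the fallback from a script, the `--stage` prefix can be parsed. This sketch assumes the prefix is the first line of output and has the form `<!-- fetchmd: defuddle -->` or `<!-- fetchmd: playwright -->`, as given in the troubleshooting notes; `stage_of` is a hypothetical name.

```shell
# Hypothetical helper: print which extraction stage produced the output.
# Reads `fetchmd --stage` output on stdin; prints nothing if no prefix is found.
stage_of() {
  # Only the first line is inspected, since --stage prefixes the output.
  head -n 1 | sed -n 's/^<!-- fetchmd: \([a-z]*\) -->.*$/\1/p'
}

# Usage (not run here):
#   fetchmd --stage "https://example.com" | stage_of
```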
The page is probably a JS-rendered SPA. Install Playwright:
npm install -g playwright
npx playwright install chromium
Use --stage to confirm which stage fired (or didn't):
fetchmd --stage "https://example.com"
# Output starts with: <!-- fetchmd: defuddle --> or <!-- fetchmd: playwright -->
Some very minimal pages (like example.com) genuinely have fewer than 200 characters of content. Lower the threshold:
fetchmd --min-length 50 "https://example.com"
Make sure the Chromium browser binary is installed separately from the npm package:
npx playwright install chromium
The npm package and the browser binary are two separate installs. The npm package alone is not enough.
Use --max-chars to cap the output. fetchmd truncates at a paragraph boundary and appends a comment so the model knows content was cut:
fetchmd --max-chars 10000 "https://some-long-page.com"
Core: defuddle — content extraction and Markdown conversion. Installed automatically.
Optional: playwright — headless Chromium for JS-rendered pages. Install manually if needed (see above).
MIT