---
name: url-to-markdown
description: Convert a public web URL into clean markdown suitable for LLM context. Use when the user asks to read an article, summarize a webpage, compare sources, or ingest HTML content into a conversation. Strips scripts, styles, navigation, and boilerplate automatically.
version: 0.1.0
license: MIT
homepage: https://url2md.automators.work
---

# url-to-markdown

Fetch a public URL through the url2md service and receive clean markdown: article-extracted by default, full-page when `raw=1` is set.

## When to use

- User pastes a URL and asks to read, summarize, translate, quote, or analyze its content.
- User wants to compare or aggregate content from multiple URLs.
- Agent needs to ingest HTML content but its own fetch tool returns unprocessed markup.

Do NOT use this skill for:
- Private, authenticated, or intranet URLs. The service only fetches public http(s) pages.
- Non-HTML resources (PDFs, images, binary downloads). The API returns `415`.
- Real-time or frequently changing data (prices, scores). Results are cached for 10 minutes.

## When to prefer this over alternatives

- **Over raw `fetch` / `curl`**: when you need only the article text, not navigation, ads, cookie banners, or scripts. Readability extraction removes boilerplate that wastes context tokens.
- **Over a headless browser**: when the target is a static article and JS rendering is not required. A plain HTTP conversion is much cheaper and faster than launching a browser.
- **Over pasting HTML directly into context**: when the page is large. Markdown is roughly 3–5× smaller in tokens than equivalent HTML.

If the page is JS-heavy and the article content only appears after hydration, this skill returns little or none of that content. Use a browser-based fetch tool in that case.

## Base URL

```
https://url2md.automators.work
```

## Build the request

```
GET {base}/md?url={urlEncoded}[&raw=1]
```

- `url` (required): the public page to convert. URL-encode it.
- `raw=1` (optional): skip Readability extraction and convert the whole page body. Use it when default extraction returns `422` or misses the content you need.
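
A minimal sketch of building this request in Python with only the standard library. The helper name `build_request_url` is hypothetical; the base URL, the `/md` path, and the `url` and `raw` parameters are the ones described in this section.

```python
from urllib.parse import quote

BASE_URL = "https://url2md.automators.work"


def build_request_url(page_url: str, raw: bool = False, base: str = BASE_URL) -> str:
    """Return the GET URL for the /md endpoint (hypothetical helper)."""
    # safe="" percent-encodes every reserved character, including ':' and '/'.
    query = "url=" + quote(page_url, safe="")
    if raw:
        # Ask the service to skip Readability extraction.
        query += "&raw=1"
    return f"{base}/md?{query}"


print(build_request_url("https://en.wikipedia.org/wiki/Unicode"))
# https://url2md.automators.work/md?url=https%3A%2F%2Fen.wikipedia.org%2Fwiki%2FUnicode
```

Passing `safe=""` matters here: the default for `quote` leaves `/` unescaped, which would not match the fully encoded form used in the examples.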

## Examples

| Intent | Request |
|---|---|
| Summarize a blog post | `GET /md?url=https%3A%2F%2Fblog.example%2Fpost` |
| Fetch a Wikipedia article | `GET /md?url=https%3A%2F%2Fen.wikipedia.org%2Fwiki%2FUnicode` |
| Convert a page that extraction fails on | `GET /md?url=...&raw=1` |

## Response shape

A successful response body has this shape:

```
# {page title}

> Source: {original url}

{markdown content}
```

Always pass the full body, including the `Source:` line, into the LLM context. The `Source:` line preserves provenance.
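
If an agent also wants the title and source URL as separate fields (for citations, say), the header can be split off as sketched below. This assumes the body follows the shape shown above exactly; `ConvertedPage` and `parse_body` are hypothetical names, not part of the service.

```python
from typing import NamedTuple


class ConvertedPage(NamedTuple):
    title: str
    source: str
    content: str


def parse_body(body: str) -> ConvertedPage:
    """Split a url2md response into title, source URL, and markdown content."""
    lines = body.splitlines()
    title, source = "", ""
    content_start = 0
    # Only scan the first few lines so a quoted "> Source:" deeper in the
    # article is never mistaken for the header.
    for i, line in enumerate(lines[:10]):
        if not title and line.startswith("# "):
            title = line[2:].strip()
        elif line.startswith("> Source: "):
            source = line[len("> Source: "):].strip()
            content_start = i + 1
            break
    # If no Source line is found, content_start stays 0 and the whole body is kept.
    content = "\n".join(lines[content_start:]).strip()
    return ConvertedPage(title, source, content)


sample = "# Unicode\n\n> Source: https://en.wikipedia.org/wiki/Unicode\n\nUnicode is a standard.\n"
page = parse_body(sample)
print(page.title)   # Unicode
print(page.source)  # https://en.wikipedia.org/wiki/Unicode
```

The parsed fields are a convenience for the agent; the unmodified body is still what should go into the LLM context.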

## Error handling

| Status | Meaning | Agent action |
|---|---|---|
| 400 | invalid URL / disallowed scheme | re-prompt the user for a valid public URL |
| 413 | upstream response larger than 5 MB | pick a smaller page or scrape a specific section |
| 415 | upstream is not HTML | fetch the resource differently (e.g. PDF parser) |
| 422 | could not extract | retry once with `&raw=1` |
| 502 | upstream unreachable | retry once; if still failing, report to user |
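
The retry rules in the table can be sketched as below. This is an illustration under stated assumptions: success is assumed to be status `200`, the 15-second client timeout is an arbitrary choice, and `http_get` and `fetch_markdown` are hypothetical helper names. The `fetch` parameter exists only so the logic can be exercised without network access.

```python
import urllib.error
import urllib.request
from typing import Callable, Tuple
from urllib.parse import quote

BASE_URL = "https://url2md.automators.work"
Fetch = Callable[[str], Tuple[int, str]]

# Agent actions copied from the error table.
HINTS = {
    400: "invalid URL or disallowed scheme; ask the user for a valid public URL",
    413: "upstream larger than 5 MB; pick a smaller page",
    415: "upstream is not HTML; fetch the resource differently",
    422: "could not extract, even with raw=1",
    502: "upstream unreachable after one retry",
}


def http_get(url: str) -> Tuple[int, str]:
    """GET a URL and return (status, body); HTTP errors are returned, not raised."""
    try:
        with urllib.request.urlopen(url, timeout=15) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        return err.code, err.read().decode("utf-8", errors="replace")


def fetch_markdown(page_url: str, fetch: Fetch = http_get) -> str:
    """Fetch markdown for page_url, applying the single retries from the table."""
    url = f"{BASE_URL}/md?url={quote(page_url, safe='')}"
    status, body = fetch(url)
    if status == 422:
        # Extraction failed: retry once with the whole page body.
        status, body = fetch(url + "&raw=1")
    elif status == 502:
        # Upstream unreachable: retry the same request once.
        status, body = fetch(url)
    if status == 200:  # assumption: success is reported as 200
        return body
    raise RuntimeError(f"url2md returned {status}: {HINTS.get(status, 'unexpected status')}")
```

Calling `fetch_markdown("https://blog.example/post")` would perform the real request. Network-level failures such as DNS errors still surface as `urllib.error.URLError` and are left to the caller.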

## Constraints

- GET only. Stateless. No auth.
- CORS: `*`.
- Cache: 10 min per URL.
- Response cap: 5 MB. Timeout: 10 s.
- Do not call with private or loopback hosts. The service blocks them.
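
An agent can avoid a wasted round trip by screening obviously non-public URLs before calling the service. The sketch below is a client-side heuristic only, with a hypothetical helper name `looks_public`: it checks the scheme, `localhost`, and literal IP addresses, and does not resolve DNS names, so the service's own blocking remains the authority.

```python
import ipaddress
from urllib.parse import urlsplit


def looks_public(page_url: str) -> bool:
    """Heuristic pre-check: False for URLs the service is expected to reject."""
    parts = urlsplit(page_url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    host = parts.hostname  # already lowercased, IPv6 brackets removed
    if host == "localhost" or host.endswith(".localhost"):
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal, so it is a DNS name; this sketch does not resolve it.
        return True
    # is_global is False for loopback, private, and link-local addresses.
    return ip.is_global


print(looks_public("https://en.wikipedia.org/wiki/Unicode"))  # True
print(looks_public("http://127.0.0.1:8080/admin"))            # False
```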
