Fetch
Fetch raw HTML from URLs through a scraping-proxy provider (ScrapeOps, ScraperAPI, ScrapingBee, Zyte). Distinct from `scrape`.
marmot fetch <url> [<url> …] [flags…]
fetch is the proxy-aggregator verb. Given a URL, it issues the request through an unblock/proxy service and returns the raw HTML the target served. It is distinct from scrape (which returns clean markdown from a content-extraction provider): fetch puts you closer to the wire, with toggles for JS rendering, country targeting, and residential proxies, then optionally formats the result via --format markdown | text | next.
When to use fetch vs. scrape
- fetch: when you want raw HTML, when the page is bot-protected, or when you need to control geo / device / JS rendering. Providers: scrapeops, scraperapi, scrapingbee, zyte.
- scrape: when you want clean markdown extracted by a content-aware provider. Providers: exa, firecrawl, parallel, tavily.
Providers
scrapeops, scraperapi, scrapingbee, zyte.
| Provider | Auth | Strengths |
|---|---|---|
| ScrapeOps | SCRAPEOPS_API_KEY (query param) | Aggregates 30+ proxy backends; 1,000 free credits/month at signup |
| ScraperAPI | SCRAPERAPI_API_KEY (query param) | Strong JS rendering via real Chromium; 5,000 free credits/month |
| ScrapingBee | SCRAPINGBEE_API_KEY (query param) | Residential proxies; defaults to render_js=true (marmot opts out by default) |
| Zyte | ZYTE_API_KEY (HTTP Basic) | POST-based with surfacing of target HTTP status; AI-extraction features available |
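As a quick pre-flight check, the sketch below reports which providers have a key available in the environment. The variable names come from the table above; the helper name `configured_providers` is hypothetical and not part of marmot.

```python
import os

# Provider slug -> environment variable, as listed in the providers table.
API_KEY_ENV_VARS = {
    "scrapeops": "SCRAPEOPS_API_KEY",
    "scraperapi": "SCRAPERAPI_API_KEY",
    "scrapingbee": "SCRAPINGBEE_API_KEY",
    "zyte": "ZYTE_API_KEY",
}


def configured_providers(env=None):
    """Return the provider slugs whose API key variable is set and non-empty."""
    env = os.environ if env is None else env
    return [slug for slug, var in API_KEY_ENV_VARS.items() if env.get(var)]


if __name__ == "__main__":
    # Hypothetical helper for illustration; prints e.g. the slugs you can use
    # with --provider without also passing --api-key.
    print(configured_providers())
```

A provider missing from the result can still be used by passing --api-key on the call.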
Flags
| Flag | Description |
|---|---|
| --provider <slug> | One of the four. Falls back to defaults.fetch.provider. |
| --api-key <key> | Override the env var for this call. |
| --format <html\|markdown\|text\|next> | Output format. Default html (raw). markdown runs through turndown. text strips tags and script/style. next extracts __NEXT_DATA__ JSON from Next.js Pages Router pages into pages[].nextData. |
| --render-js | Run a headless browser to execute JavaScript before returning HTML. Slower; more credits. |
| --country <code> | ISO-2 country code for geo-targeting the upstream proxy (us, gb, de, …). |
| --premium | Use residential / premium-tier proxies. Provider-specific cost multiplier. Zyte ignores this (single tier). |
| --wait <ms> | Milliseconds to wait after page load. Only meaningful with --render-js. |
| --device <desktop\|mobile> | User-agent / viewport hint. |
| --raw | Emit the provider's native response under raw. |
| --no-cache | Bypass the response cache for this call. |
| --refresh | Skip cache read but write the fresh response. |
| --retries <n> | Retry retryable provider errors up to N times. Default 0, max 10. |
| --timeout <seconds> | Per-attempt request timeout. Default 120. |
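When driving marmot from a script, it can help to build the argument list in one place and check the documented limits before spawning the process. The following is a minimal sketch covering a subset of the flags above; `build_fetch_argv` is a hypothetical helper, and rejecting --wait without --render-js is a choice made by this sketch, not documented marmot behaviour.

```python
VALID_FORMATS = ("html", "markdown", "text", "next")


def build_fetch_argv(urls, provider=None, fmt="html", render_js=False,
                     country=None, premium=False, wait_ms=None,
                     retries=0, timeout=120):
    """Build the argument list for one `marmot fetch` call."""
    if not urls:
        raise ValueError("at least one URL is required")
    if fmt not in VALID_FORMATS:
        raise ValueError(f"unknown format: {fmt!r}")
    if not 0 <= retries <= 10:
        # The flags table documents a default of 0 and a maximum of 10.
        raise ValueError("--retries must be between 0 and 10")
    if wait_ms is not None and not render_js:
        # Sketch-level strictness: --wait only has an effect with --render-js.
        raise ValueError("--wait is only meaningful with --render-js")

    argv = ["marmot", "fetch", *urls]
    if provider:
        argv += ["--provider", provider]
    if fmt != "html":  # html is the default, so it is omitted
        argv += ["--format", fmt]
    if render_js:
        argv.append("--render-js")
    if country:
        argv += ["--country", country]
    if premium:
        argv.append("--premium")
    if wait_ms is not None:
        argv += ["--wait", str(wait_ms)]
    if retries:
        argv += ["--retries", str(retries)]
    if timeout != 120:  # 120 seconds is the documented default
        argv += ["--timeout", str(timeout)]
    return argv


if __name__ == "__main__":
    print(" ".join(build_fetch_argv(["https://example.com"],
                                    provider="scrapeops", fmt="markdown")))
```

The returned list can be passed to `subprocess.run(argv, capture_output=True, text=True)`; passing a list rather than a shell string avoids quoting problems with URLs that contain `&` or `?`.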
Output shape
{
  "ok": true,
  "provider": "scrapeops",
  "verb": "fetch",
  "cached": false,
  "data": {
    "pages": [
      {
        "url": "https://example.com",
        "status": 200,
        "content": "<!doctype html>…",
        "format": "html",
        "title": "Example Domain",
        "bytes": 528
      }
    ],
    "failed": []
  }
}
Per-URL failures (auth, 4xx, 5xx) land in data.failed[]; they don't fail the whole call. Use --retries to retry transient ones.
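Because failures are reported per URL, a consumer should read both data.pages and data.failed instead of relying on the top-level result alone. Here is a minimal sketch of that, run against a sample payload in the shape shown above. The helper name `split_results` is hypothetical, and since the fields of a data.failed[] entry are not documented here, the sketch passes those entries through untouched.

```python
import json

# Sample payload mirroring the documented output shape.
SAMPLE = """
{
  "ok": true,
  "provider": "scrapeops",
  "verb": "fetch",
  "cached": false,
  "data": {
    "pages": [
      {
        "url": "https://example.com",
        "status": 200,
        "content": "<!doctype html>",
        "format": "html",
        "title": "Example Domain",
        "bytes": 528
      }
    ],
    "failed": []
  }
}
"""


def split_results(payload):
    """Return (pages keyed by URL, list of failed entries) from a fetch response."""
    data = payload.get("data", {})
    pages = {page["url"]: page for page in data.get("pages", [])}
    # Entries in data.failed[] are kept opaque: their shape is an unknown here.
    failed = list(data.get("failed", []))
    return pages, failed


if __name__ == "__main__":
    pages, failed = split_results(json.loads(SAMPLE))
    for url, page in pages.items():
        print(url, page["status"], page["bytes"])
    print("failed:", len(failed))
```

Keying pages by URL makes it easy to spot which of the requested URLs are missing from the successful set when several are fetched in one call.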
Examples
# Default: raw HTML, no JS rendering
marmot fetch https://example.com --provider scrapeops
# JS rendering with geo-targeting
marmot fetch https://news.ycombinator.com --provider scraperapi --render-js --country us
# Residential proxy + wait for dynamic content
marmot fetch https://example.com --provider scrapingbee --premium --render-js --wait 3000
# Convert to markdown after fetching
marmot fetch https://example.com --provider scrapeops --format markdown
# Pull __NEXT_DATA__ JSON from a Next.js Pages Router site
marmot fetch https://example-pages-router-site.com --provider zyte --format next
Presets
Fetch-mode presets accept urls (a list; URLs given at runtime are appended to it), format, renderJs, country, premium, waitMs, device, cache, refresh, output, raw, retries, timeout, session.
marmot preset create unblock --mode fetch --provider scrapeops --render-js --country us
marmot @unblock https://target.example.com
Config keys
{ "defaults": { "fetch": { "provider": "scrapeops" } } }
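The --provider flag falls back to defaults.fetch.provider, so the lookup order can be sketched as below. This only illustrates the documented precedence. `resolve_provider` is a hypothetical name, and raising when neither source supplies a provider is an assumption of the sketch, since marmot's behaviour in that case is not described here.

```python
FETCH_PROVIDERS = ("scrapeops", "scraperapi", "scrapingbee", "zyte")


def resolve_provider(cli_provider, config):
    """Pick the fetch provider: the explicit flag wins, then the config default."""
    provider = cli_provider or (
        config.get("defaults", {}).get("fetch", {}).get("provider")
    )
    if provider is None:
        # Assumption: treat a missing provider as an error in this sketch.
        raise ValueError("no provider given and defaults.fetch.provider is unset")
    if provider not in FETCH_PROVIDERS:
        raise ValueError(f"unknown fetch provider: {provider!r}")
    return provider


if __name__ == "__main__":
    config = {"defaults": {"fetch": {"provider": "scrapeops"}}}
    print(resolve_provider(None, config))    # scrapeops (config default)
    print(resolve_provider("zyte", config))  # zyte (flag overrides the default)
```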