Fetch

Fetch raw HTML from URLs through a scraping-proxy provider (ScrapeOps, ScraperAPI, ScrapingBee, Zyte). Distinct from `scrape`.

marmot fetch <url> [<url> …] [flags…]

fetch is the proxy-aggregator verb. Given a URL, it issues the request through an unblock/proxy service and returns the raw HTML the target served. Unlike scrape, which returns clean markdown from a content-extraction provider, fetch puts you closer to the wire: it exposes toggles for JS rendering, country targeting, and residential proxies, and can optionally convert the result with --format markdown | text | next.

When to use fetch vs. scrape

  • fetch when you want raw HTML, when the page is bot-protected, or when you need to control geo / device / JS rendering. Providers: scrapeops, scraperapi, scrapingbee, zyte.
  • scrape when you want clean markdown extracted by a content-aware provider. Providers: exa, firecrawl, parallel, tavily.

Providers

scrapeops, scraperapi, scrapingbee, zyte.

| Provider | Auth | Strengths |
| --- | --- | --- |
| ScrapeOps | SCRAPEOPS_API_KEY (query param) | Aggregates 30+ proxy backends; 1,000 free credits/month at signup |
| ScraperAPI | SCRAPERAPI_API_KEY (query param) | Strong JS rendering via real Chromium; 5,000 free credits/month |
| ScrapingBee | SCRAPINGBEE_API_KEY (query param) | Residential proxies; defaults to render_js=true (marmot opts out by default) |
| Zyte | ZYTE_API_KEY (HTTP Basic) | POST-based with surfacing of target HTTP status; AI-extraction features available |
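
As a quick pre-flight check, the environment variables in the table above can be inspected before choosing a provider. The following is a minimal sketch, not part of marmot: the `PROVIDER_ENV` mapping and the `configured_providers` helper are hypothetical names, and only the variable names themselves come from the table.

```python
import os

# Env var names are copied from the provider table; the mapping and
# helper below are illustrative and not part of marmot itself.
PROVIDER_ENV = {
    "scrapeops": "SCRAPEOPS_API_KEY",
    "scraperapi": "SCRAPERAPI_API_KEY",
    "scrapingbee": "SCRAPINGBEE_API_KEY",
    "zyte": "ZYTE_API_KEY",
}


def configured_providers(env=None):
    """Return provider slugs whose API-key variable is set and non-empty."""
    env = os.environ if env is None else env
    return [slug for slug, var in PROVIDER_ENV.items() if env.get(var)]
```

Calling `configured_providers()` with no argument reads the real process environment; passing a dict makes the check easy to exercise in isolation.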

Flags

| Flag | Description |
| --- | --- |
| `--provider <slug>` | One of the four. Falls back to defaults.fetch.provider. |
| `--api-key <key>` | Override the env var for this call. |
| `--format <html\|markdown\|text\|next>` | Output format. Default html (raw). markdown runs through turndown. text strips tags and script/style. next extracts `__NEXT_DATA__` JSON from Next.js Pages Router pages into `pages[].nextData`. |
| `--render-js` | Run a headless browser to execute JavaScript before returning HTML. Slower; more credits. |
| `--country <code>` | ISO-2 country code for geo-targeting the upstream proxy (us, gb, de, …). |
| `--premium` | Use residential / premium-tier proxies. Provider-specific cost multiplier. Zyte ignores this (single tier). |
| `--wait <ms>` | Milliseconds to wait after page load. Only meaningful with --render-js. |
| `--device <desktop\|mobile>` | User-agent / viewport hint. |
| `--raw` | Emit the provider's native response under raw. |
| `--no-cache` | Bypass the response cache for this call. |
| `--refresh` | Skip cache read but write the fresh response. |
| `--retries <n>` | Retry retryable provider errors up to N times. Default 0, max 10. |
| `--timeout <seconds>` | Per-attempt request timeout. Default 120. |
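
To make the next format more concrete: Next.js Pages Router pages embed their page data in a script tag whose id is `__NEXT_DATA__`. The sketch below shows one way such a payload could be pulled out of raw HTML using only the Python standard library. It illustrates the idea and is not marmot's actual implementation; `extract_next_data` and the parser class are hypothetical names.

```python
import json
from html.parser import HTMLParser


class _NextDataParser(HTMLParser):
    """Collects the text inside <script id="__NEXT_DATA__">."""

    def __init__(self):
        super().__init__()
        self._capturing = False
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("id") == "__NEXT_DATA__":
            self._capturing = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._capturing = False

    def handle_data(self, data):
        if self._capturing:
            self._chunks.append(data)

    @property
    def payload(self):
        return "".join(self._chunks)


def extract_next_data(html):
    """Return the parsed __NEXT_DATA__ JSON, or None if the tag is absent."""
    parser = _NextDataParser()
    parser.feed(html)
    parser.close()
    return json.loads(parser.payload) if parser.payload.strip() else None
```

Pages that do not use the Pages Router will usually lack this tag, in which case the sketch returns None.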

Output shape

{
  "ok": true,
  "provider": "scrapeops",
  "verb": "fetch",
  "cached": false,
  "data": {
    "pages": [
      {
        "url": "https://example.com",
        "status": 200,
        "content": "<!doctype html>…",
        "format": "html",
        "title": "Example Domain",
        "bytes": 528
      }
    ],
    "failed": []
  }
}

Per-URL failures (auth, 4xx, 5xx) land in data.failed[] rather than failing the whole call. Use --retries to retry transient ones.
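
Because successes and failures arrive in the same response, a consumer typically separates the two. The following sketch assumes the response has already been parsed into a dict with the shape shown above. The structure of individual data.failed[] entries is not documented here, so they are passed through untouched; `split_results` is a hypothetical helper name.

```python
def split_results(envelope):
    """Return (content_by_url, failed) from a parsed fetch response.

    Entries in data.failed[] are returned as-is because their exact
    fields are not specified in this document.
    """
    data = envelope.get("data", {})
    content_by_url = {
        page["url"]: page["content"] for page in data.get("pages", [])
    }
    return content_by_url, data.get("failed", [])
```

A caller can then process `content_by_url` normally and decide separately whether the failed entries warrant a rerun with --retries.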

Examples

# Default: raw HTML, no JS rendering
marmot fetch https://example.com --provider scrapeops

# JS rendering with geo-targeting
marmot fetch https://news.ycombinator.com --provider scraperapi --render-js --country us

# Residential proxy + wait for dynamic content
marmot fetch https://example.com --provider scrapingbee --premium --render-js --wait 3000

# Convert to markdown after fetching
marmot fetch https://example.com --provider scrapeops --format markdown

# Pull __NEXT_DATA__ JSON from a Next.js Pages Router site
marmot fetch https://example-pages-router-site.com --provider zyte --format next
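
The same invocations can be driven from a script. This is a minimal sketch that builds the argument list from the flags documented above and runs it with subprocess. It assumes marmot is on PATH, prints the JSON response to stdout, and exits 0 on success; none of that is confirmed here. `build_fetch_argv` and `run_fetch` are hypothetical helper names.

```python
import json
import subprocess


def build_fetch_argv(urls, provider=None, fmt=None, render_js=False,
                     country=None, premium=False, wait_ms=None, retries=None):
    """Assemble a `marmot fetch` command line from the documented flags."""
    if retries is not None and not 0 <= retries <= 10:
        raise ValueError("--retries accepts 0 through 10")
    argv = ["marmot", "fetch", *urls]
    if provider:
        argv += ["--provider", provider]
    if fmt:
        argv += ["--format", fmt]
    if render_js:
        argv.append("--render-js")
    if country:
        argv += ["--country", country]
    if premium:
        argv.append("--premium")
    if wait_ms is not None:
        argv += ["--wait", str(wait_ms)]
    if retries is not None:
        argv += ["--retries", str(retries)]
    return argv


def run_fetch(urls, **options):
    # Assumption: marmot writes the JSON response to stdout and exits 0.
    proc = subprocess.run(build_fetch_argv(urls, **options),
                          capture_output=True, text=True, check=True)
    return json.loads(proc.stdout)
```

Passing the arguments as a list avoids shell quoting problems with URLs that contain `&` or `?`.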

Presets

fetch-mode presets accept urls (a list; URLs passed at runtime are appended to it), format, renderJs, country, premium, waitMs, device, cache, refresh, output, raw, retries, timeout, session.

marmot preset create unblock --mode fetch --provider scrapeops --render-js --country us
marmot @unblock https://target.example.com

Config keys

{ "defaults": { "fetch": { "provider": "scrapeops" } } }
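
The fallback described for --provider can be sketched as a small lookup: an explicit flag value wins, otherwise the defaults.fetch.provider config key is used. This is an illustration of the documented precedence, not marmot's code; `resolve_provider` is a hypothetical name, and how presets fit into the order is not covered.

```python
def resolve_provider(flag_value, config):
    """Prefer the --provider value; otherwise use defaults.fetch.provider."""
    if flag_value:
        return flag_value
    # Missing keys at any level simply yield None.
    return config.get("defaults", {}).get("fetch", {}).get("provider")
```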