Crawl

Walk a domain and return its pages.

marmot crawl <url> [flags…]

Providers

Supported providers are firecrawl and tavily. Their behavior differs:

  • Firecrawl: async. Submits a job, then either polls until it finishes (--wait, the default) or returns the task id immediately (--async).
  • Tavily: sync. Runs to completion; the server caps the run at 150 seconds.

Flags

Flag                    Description
--provider <slug>       firecrawl or tavily. Falls back to defaults.crawl.provider.
--api-key <key>         Override the env var.
--max-pages <n>         Cap pages crawled.
--max-depth <n>         Discovery depth.
--instructions <text>   Natural-language guidance (Tavily; doubles cost).
--include-paths <csv>   Regex patterns of paths to include.
--exclude-paths <csv>   Regex patterns of paths to exclude.
--allow-external        Follow off-domain links.
--wait                  Block until done (default for Firecrawl async).
--async                 Submit and return the task id immediately (Firecrawl only).
--raw                   Emit the provider's native response under raw.
--json                  Emit the structured envelope (default).
--retries <n>           Retry the initial submission up to N times. Polling is unaffected. Default 0, max 10.
--timeout <seconds>     Per-attempt submit timeout. Default 120.
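
The interaction between --retries and --timeout can be pictured with a small Python sketch. It is not marmot's implementation: the function name submit_with_retries, the choice to treat every exception as retryable, and the absence of any backoff are assumptions made for illustration. It also reads "retry up to N times" as N extra attempts after the first.

```python
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")

MAX_RETRIES = 10  # documented upper bound for --retries


def submit_with_retries(
    submit: Callable[[float], T],
    retries: int = 0,        # documented default for --retries
    timeout: float = 120.0,  # documented default for --timeout (seconds)
) -> T:
    """Call submit(timeout) once, then up to `retries` more times on failure."""
    if not 0 <= retries <= MAX_RETRIES:
        raise ValueError(f"retries must be between 0 and {MAX_RETRIES}")
    last_error: Optional[Exception] = None
    for _attempt in range(retries + 1):
        try:
            # Each attempt gets the full per-attempt timeout.
            return submit(timeout)
        except Exception as error:  # assumption: any failure is retryable
            last_error = error
    assert last_error is not None
    raise last_error


# Hypothetical submit function that fails twice, then succeeds.
calls: List[float] = []


def flaky_submit(timeout: float) -> str:
    calls.append(timeout)
    if len(calls) < 3:
        raise ConnectionError("simulated submit failure")
    return "task-123"  # made-up task id


result = submit_with_retries(flaky_submit, retries=2, timeout=120.0)
```

Only the submission is wrapped here; polling an already-submitted job would sit outside this loop, matching the note that polling is unaffected by --retries.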

Async behavior

When Firecrawl is the provider, the call is async. By default the command waits (--wait) and polls until the job is done. Pass --async to get the task id immediately, then follow up with marmot get <id>.

See Async tasks.

Presets

Crawl-mode presets accept the keys url, maxPages, maxDepth, instructions, includePaths, excludePaths, allowExternal, wait, async, output, raw, retries, timeout, and session. The instructions key is concatenated with a runtime --instructions rather than replaced by it. The positional URL is optional when the preset supplies it. New negation flags: --no-allow-external, --no-wait, --no-async, --no-raw.

Breaking change in 0.7.0: --instructions now concatenates instead of replacing. A preset's instructions and a runtime --instructions are joined with \n\n; the preset value is no longer dropped.

marmot preset create site-crawl --mode crawl --provider firecrawl \
  --max-pages 100 --instructions "Skip auth pages"
marmot @site-crawl https://example.com --instructions "and skip /admin"
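
As a rough model of what the two commands above produce, the Python sketch below joins the preset and runtime instructions with a blank line. The function name merge_instructions is hypothetical, and two details are assumptions rather than documented behavior: the preset text is placed first, and empty values are skipped.

```python
from typing import Optional


def merge_instructions(preset: Optional[str], runtime: Optional[str]) -> Optional[str]:
    """Join preset and runtime instructions with "\\n\\n" (0.7.0+ behavior).

    Assumptions: preset text comes first; missing or empty values are skipped.
    """
    parts = [text for text in (preset, runtime) if text]
    return "\n\n".join(parts) if parts else None


merged = merge_instructions("Skip auth pages", "and skip /admin")
print(repr(merged))  # 'Skip auth pages\n\nand skip /admin'
```

Before 0.7.0 the equivalent result would have been the runtime value alone, since the preset value was dropped.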

See Presets — Merge rules.

Config keys

{ "defaults": { "crawl": { "provider": "firecrawl" } } }
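
The fallback from --provider to defaults.crawl.provider can be sketched in Python as follows. The function name resolve_provider is hypothetical, and raising an error when neither source yields a supported provider is an assumption; the documentation does not say what marmot does in that case.

```python
import json
from typing import Any, Dict, Optional

SUPPORTED_PROVIDERS = ("firecrawl", "tavily")


def resolve_provider(flag_value: Optional[str], config: Dict[str, Any]) -> str:
    """Prefer the --provider flag, else fall back to defaults.crawl.provider."""
    configured = config.get("defaults", {}).get("crawl", {}).get("provider")
    provider = flag_value or configured
    if provider not in SUPPORTED_PROVIDERS:
        # Assumption: an unset or unknown provider is treated as an error.
        raise ValueError(f"unsupported or missing crawl provider: {provider!r}")
    return provider


config = json.loads('{ "defaults": { "crawl": { "provider": "firecrawl" } } }')

from_config = resolve_provider(None, config)    # no flag: config value is used
from_flag = resolve_provider("tavily", config)  # flag overrides the config
```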