Transcription
Turn audio into a transcript.
marmot transcribe <audio> [flags…]
Providers: openai, openrouter, vercel, cloudflare. On first run, marmot detects available API keys in the environment and auto-configures a default provider in this order: openrouter → vercel → cloudflare → openai. Override this at any time with marmot setup, marmot config set, or --provider.
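The detection order above can be sketched as a small shell function. This is only an illustration of the priority chain, not marmot's actual implementation: the function name `pick_default_provider` is hypothetical, and the environment variable names are assumptions, since this page does not say which variables marmot reads.

```shell
# Hypothetical sketch of the first-run provider detection order.
# ASSUMPTION: the variable names below are illustrative; the variables
# marmot actually inspects are not documented on this page.
pick_default_provider() {
  if [ -n "${OPENROUTER_API_KEY:-}" ]; then
    echo openrouter
  elif [ -n "${AI_GATEWAY_API_KEY:-}" ]; then
    echo vercel
  elif [ -n "${CLOUDFLARE_API_TOKEN:-}" ]; then
    echo cloudflare
  elif [ -n "${OPENAI_API_KEY:-}" ]; then
    echo openai
  else
    # No key found: nothing to auto-configure.
    echo "no provider key found" >&2
    return 1
  fi
}
```

With both a Cloudflare and an OpenAI key set, this sketch picks cloudflare, because it comes earlier in the chain. Passing --provider explicitly sidesteps the detection entirely.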
Output
Default is plain text on stdout — pipe-friendly.
| Flag | Output |
|---|---|
| (none) | Plain transcribed text |
| --json (alias for --format json) | Structured envelope |
| --format text | Plain text (same as default) |
| --format srt | SRT subtitles |
| --format vtt | WebVTT subtitles |
| --format verbose-json | Envelope + raw provider response |
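When scripting, it can be convenient to derive the --format value from the output file's extension. The helper below is a hypothetical wrapper-side convenience, not a marmot feature; the name `format_for` and the extension mapping are assumptions, and it only emits format names listed in the table above.

```shell
# Hypothetical helper: map an output path to one of the documented
# --format values. marmot itself is not claimed to infer the format
# from the file extension.
format_for() {
  case "$1" in
    *.srt)  echo srt ;;
    *.vtt)  echo vtt ;;
    *.json) echo json ;;
    *)      echo text ;;   # fall back to the default plain-text output
  esac
}

# Possible usage (requires marmot to be installed):
#   out=./meeting.vtt
#   marmot transcribe ./meeting.mp3 --format "$(format_for "$out")" -o "$out"
```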
Examples
marmot transcribe ./meeting.mp3
marmot transcribe ./meeting.mp3 --json
marmot transcribe ./meeting.mp3 --format srt -o ./meeting.srt
# Pipe audio in
cat ./meeting.mp3 | marmot transcribe
# Round-trip with speak
marmot speak 'hello world' --play | marmot transcribe
# Bias with context
marmot transcribe ./call.mp3 --prompt 'technical interview, names: Ada, Linus'
# Cloudflare Whisper turbo
marmot transcribe ./meeting.mp3 --provider cloudflare \
  --model @cf/openai/whisper-large-v3-turbo --language en
Flags
For cross-cutting flags see Common flags. Transcribe-specific:
| Flag | Description |
|---|---|
| --model <id> | Transcription model. Defaults to provider's default. |
| -o, --output <path> | Write rendered output to a file. With -o set on a TTY, stdout stays silent; when piped, the transcript still flows to the pipe. See Stdout decision matrix. |
| -q, --quiet | Suppress stdout. File output via -o is still written; stderr unaffected. |
| --language <code> | ISO-639-1 language hint (e.g. en, es). |
| --prompt <text> | Bias prompt to guide the transcription. Concatenates with a preset's prompt field when both are set. |
| --format <fmt> | text (default), json, srt, vtt, verbose-json. |
| --provider-option <key=value> | Generic passthrough. Repeatable. Lands in providerOptions[<provider>] for niche STT params (timestamp_granularities, etc.). |
| --text / --no-text | Plain text (now the default; flag kept for back-compat). |
| --json / --no-json | Alias for --format json. |
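The interaction between -o, -q, and a TTY described in the table can be sketched as a small decision function. This is a reading of the two table rows only, not the full Stdout decision matrix, and the function name `should_write_stdout` is hypothetical.

```shell
# Hypothetical sketch: would the transcript be written to stdout?
# Usage: should_write_stdout <has_output_file: yes|no> <quiet: yes|no>
# Returns 0 (true) when stdout would receive the transcript.
should_write_stdout() {
  has_output="${1:-no}"
  quiet="${2:-no}"
  if [ "$quiet" = yes ]; then
    return 1   # -q suppresses stdout in every case
  fi
  if [ "$has_output" = yes ] && [ -t 1 ]; then
    return 1   # -o on an interactive terminal: write the file only
  fi
  return 0     # no -o, or stdout is a pipe: the transcript flows to stdout
}
```

The `[ -t 1 ]` test checks whether file descriptor 1 (stdout) is a terminal, which is the usual way for a CLI to distinguish interactive use from piping.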
Audio source priority: positional path → preset audio field → piped binary stdin. At least one is required. (The legacy -i, --input flag was removed in 0.7.0; pass the audio path positionally or set audio in a preset.)
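The source priority can be pictured as a simple fallthrough. The function below is a minimal sketch under that reading; `resolve_audio_source` and its output labels are hypothetical and only illustrate the order positional, then preset, then stdin.

```shell
# Hypothetical sketch of the audio source priority.
# Usage: resolve_audio_source <positional_path> <preset_audio>
# Pass an empty string for a source that is not set.
resolve_audio_source() {
  positional="${1:-}"
  preset_audio="${2:-}"
  if [ -n "$positional" ]; then
    echo "positional:$positional"
  elif [ -n "$preset_audio" ]; then
    echo "preset:$preset_audio"
  elif [ ! -t 0 ]; then
    echo "stdin"             # stdin is not a terminal, so data may be piped in
  else
    echo "error: no audio source" >&2
    return 1                 # at least one source is required
  fi
}
```

A positional path wins even when a preset supplies audio and data is also being piped in, which matches the order stated above.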
Presets
transcription-mode presets accept audio (positional), language, format, prompt (concatenates with runtime), providerOption, output, text, json, retries, timeout, session. See Presets — Merge rules.
marmot preset create whisper-en --mode transcription --provider openai \
--language en --prompt "Technical vocabulary."
marmot @whisper-en ./meeting.mp3
marmot @whisper-en ./call.mp3 --prompt "Names: Ada, Linus." # both prompts apply
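To make the prompt concatenation concrete, here is a rough sketch of how the two prompts in the last command might combine. The page only says the prompts concatenate; the preset-first ordering and the single-space separator are assumptions, and `merge_prompts` is a hypothetical name.

```shell
# Hypothetical sketch of preset + runtime prompt concatenation.
# ASSUMPTIONS: preset prompt comes first, joined by a single space.
merge_prompts() {
  preset_prompt="${1:-}"
  runtime_prompt="${2:-}"
  if [ -n "$preset_prompt" ] && [ -n "$runtime_prompt" ]; then
    printf '%s %s\n' "$preset_prompt" "$runtime_prompt"
  else
    # Only one (or neither) is set: pass it through unchanged.
    printf '%s\n' "${preset_prompt}${runtime_prompt}"
  fi
}

merge_prompts "Technical vocabulary." "Names: Ada, Linus."
# prints: Technical vocabulary. Names: Ada, Linus.
```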