Transcription

Turn audio into a transcript.

marmot transcribe <audio> [flags…]

Providers: openai, openrouter, vercel, cloudflare. On first run, marmot detects available API keys in the environment and auto-configures a default provider in this order: openrouter → vercel → cloudflare → openai. Override it at any time with marmot setup, marmot config set, or --provider.
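The first-match order above can be sketched as a small shell function. This is only an illustration of the priority rule, not marmot's actual detection code; the function name and the environment variable names (OPENROUTER_API_KEY, VERCEL_API_KEY, CLOUDFLARE_API_TOKEN, OPENAI_API_KEY) are assumptions and may differ from what marmot really checks.

```shell
# Hypothetical sketch: pick the first provider, in documented priority
# order, whose API key variable is non-empty.
# The variable names below are assumptions, not confirmed marmot settings.
pick_default_provider() {
  for provider in openrouter vercel cloudflare openai; do
    case $provider in
      openrouter) key=${OPENROUTER_API_KEY:-} ;;
      vercel)     key=${VERCEL_API_KEY:-} ;;
      cloudflare) key=${CLOUDFLARE_API_TOKEN:-} ;;
      openai)     key=${OPENAI_API_KEY:-} ;;
    esac
    if [ -n "$key" ]; then
      printf '%s\n' "$provider"
      return 0
    fi
  done
  # No key found: print nothing and signal failure.
  return 1
}
```

Under these assumptions, a machine with both a Cloudflare and an OpenAI key set would default to cloudflare, because cloudflare comes earlier in the order.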

Output

The default is plain text on stdout, which makes the output easy to pipe.

| Flag | Output |
| --- | --- |
| (none) | Plain transcribed text |
| --json (alias for --format json) | Structured envelope |
| --format text | Plain text (same as default) |
| --format srt | SRT subtitles |
| --format vtt | WebVTT subtitles |
| --format verbose-json | Envelope + raw provider response |

Examples

marmot transcribe ./meeting.mp3
marmot transcribe ./meeting.mp3 --json
marmot transcribe ./meeting.mp3 --format srt -o ./meeting.srt

# Pipe audio in
cat ./meeting.mp3 | marmot transcribe

# Round-trip with speak
marmot speak 'hello world' --play | marmot transcribe

# Bias with context
marmot transcribe ./call.mp3 --prompt 'technical interview, names: Ada, Linus'

# Cloudflare Whisper turbo
marmot transcribe ./meeting.mp3 --provider cloudflare \
  --model @cf/openai/whisper-large-v3-turbo --language en

Flags

For cross-cutting flags see Common flags. Transcribe-specific:

| Flag | Description |
| --- | --- |
| --model <id> | Transcription model. Defaults to the provider's default. |
| -o, --output <path> | Write rendered output to a file. With -o set on a TTY, stdout stays silent; when piped, the transcript still flows to the pipe. See Stdout decision matrix. |
| -q, --quiet | Suppress stdout. File output via -o is still written; stderr is unaffected. |
| --language <code> | ISO-639-1 language hint (e.g. en, es). |
| --prompt <text> | Bias prompt to guide the transcription. Concatenates with a preset's prompt field when both are set. |
| --format <fmt> | text (default), json, srt, vtt, verbose-json. |
| --provider-option <key=value> | Generic passthrough. Repeatable. Lands in providerOptions[<provider>] for niche STT params (timestamp_granularities, etc.). |
| --text / --no-text | Plain text (now the default; flag kept for back-compat). |
| --json / --no-json | Alias for --format json. |
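
The interaction between -o, -q, and a TTY described above can be sketched as a single decision function. This is a simplified reading of the -o and -q descriptions only; the full Stdout decision matrix may cover more cases. The function name and its arguments are hypothetical, chosen for illustration.

```shell
# Hypothetical sketch of the stdout rule described for -o and -q.
# Args: $1 = output path ("" when -o is not given)
#       $2 = quiet flag (1 when -q is given, otherwise 0)
#       $3 = kind of stdout: "tty" or "pipe"
# Returns 0 when the transcript should be written to stdout, 1 otherwise.
should_write_stdout() {
  output_path=$1
  quiet=$2
  stdout_kind=$3

  # -q always suppresses stdout.
  if [ "$quiet" = 1 ]; then
    return 1
  fi
  # With -o on a terminal, stdout stays silent.
  if [ -n "$output_path" ] && [ "$stdout_kind" = tty ]; then
    return 1
  fi
  # Otherwise (no -o, or stdout is piped) the transcript goes to stdout.
  return 0
}
```

In a real script, the third argument could be derived with the standard test `[ -t 1 ]`, which succeeds when stdout is a terminal.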

Audio source priority: positional path → preset audio field → piped binary stdin. At least one is required. (The legacy -i, --input flag was removed in 0.7.0; pass the audio path positionally or set audio in a preset.)
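
The source priority above amounts to a first-non-empty lookup. A minimal sketch follows, assuming a hypothetical helper name and argument layout; it illustrates the ordering only and is not marmot's implementation.

```shell
# Hypothetical sketch of the audio source priority:
# positional path, then preset audio field, then piped stdin.
# Args: $1 = positional path ("" if absent)
#       $2 = preset audio field ("" if absent)
#       $3 = 1 when binary audio is piped on stdin, otherwise 0
resolve_audio_source() {
  positional=$1
  preset_audio=$2
  stdin_is_piped=$3

  if [ -n "$positional" ]; then
    printf 'positional:%s\n' "$positional"
  elif [ -n "$preset_audio" ]; then
    printf 'preset:%s\n' "$preset_audio"
  elif [ "$stdin_is_piped" = 1 ]; then
    printf 'stdin\n'
  else
    # At least one source is required.
    echo 'error: no audio source given' >&2
    return 1
  fi
}
```

For example, when a preset sets audio and a path is also passed positionally, the positional path wins under this reading.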

Presets

transcription-mode presets accept audio (positional), language, format, prompt (concatenates with runtime), providerOption, output, text, json, retries, timeout, session. See Presets — Merge rules.

marmot preset create whisper-en --mode transcription --provider openai \
  --language en --prompt "Technical vocabulary."
marmot @whisper-en ./meeting.mp3
marmot @whisper-en ./call.mp3 --prompt "Names: Ada, Linus."   # both prompts apply