Transcription (STT)

marmot transcribe takes audio in and emits plain text by default. Use --json for the envelope.

marmot transcribe <audio> [flags…]

Providers: openai, openrouter, vercel, cloudflare. On first run, marmot detects available API keys in the env and auto-configures a default in this order: openrouter → vercel → cloudflare → openai. Override any time with marmot setup, marmot config set, or --provider.
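
The detection order can be pictured with a small shell sketch. Only the provider order comes from this page; the function name and the *_API_KEY variable names are assumptions made for illustration, not documented marmot settings.

```shell
# Hypothetical sketch of first-run provider detection.
# The *_API_KEY variable names are assumptions, not confirmed marmot settings.
detect_default_provider() {
  if [ -n "${OPENROUTER_API_KEY:-}" ]; then
    echo openrouter
  elif [ -n "${VERCEL_API_KEY:-}" ]; then
    echo vercel
  elif [ -n "${CLOUDFLARE_API_KEY:-}" ]; then
    echo cloudflare
  elif [ -n "${OPENAI_API_KEY:-}" ]; then
    echo openai
  else
    # No key found: nothing to auto-configure.
    return 1
  fi
}
```

With keys for both cloudflare and openai present, this sketch would pick cloudflare, because it comes earlier in the order. Passing --provider on a single invocation bypasses the configured default.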

Output

Default is plain text on stdout — pipe-friendly.

Flag                                Output
(none)                              Plain transcribed text
--json (alias for --format json)    Structured envelope
--format text                       Plain text (same as default)
--format srt                        SRT subtitles
--format vtt                        WebVTT subtitles
--format verbose-json               Envelope + raw provider response

Examples

marmot transcribe ./meeting.mp3
marmot transcribe ./meeting.mp3 --json
marmot transcribe ./meeting.mp3 --format srt -o ./meeting.srt

# Pipe audio in
cat ./meeting.mp3 | marmot transcribe

# Round-trip with speak
marmot speak 'hello world' --play | marmot transcribe

# Bias with context
marmot transcribe ./call.mp3 --prompt 'technical interview, names: Ada, Linus'

# Cloudflare Whisper turbo
marmot transcribe ./meeting.mp3 --provider cloudflare \
  --model @cf/openai/whisper-large-v3-turbo --language en
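
For batch work, the single-file commands above can be wrapped in a loop. The following is a minimal sketch, assuming a directory of .mp3 files; transcribe_dir is a hypothetical helper name, and it uses only the --format and -o flags described on this page.

```shell
# Sketch: write an .srt file next to every .mp3 in a directory.
# transcribe_dir is a hypothetical helper, not part of marmot.
transcribe_dir() {
  dir="$1"
  for audio in "$dir"/*.mp3; do
    # Skip the literal pattern when the glob matches nothing.
    [ -e "$audio" ] || continue
    # ${audio%.mp3} strips the extension so the subtitle sits beside the audio.
    marmot transcribe "$audio" --format srt -o "${audio%.mp3}.srt"
  done
}
```

Calling transcribe_dir ./recordings would run one transcription per file, sequentially.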

Flags

For cross-cutting flags see Common flags. Transcribe-specific:

Flag                   Description
--model <id>           Transcription model. Defaults to provider's default.
-i, --input <path>     Audio file path (alternative to positional arg).
-o, --output <path>    Write rendered output to a file.
--language <code>      ISO-639-1 language hint (e.g. en, es).
--prompt <text>        Bias prompt to guide the transcription.
--format <fmt>         text (default), json, srt, vtt, verbose-json.
--text                 Plain text (now the default; flag kept for back-compat).
--json                 Alias for --format json.

Audio source priority: positional path → --input → piped binary stdin. At least one is required.
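
The priority rule can be illustrated with a short shell sketch. This is not marmot's actual implementation; resolve_audio_source and its output labels are hypothetical, and the stdin check simply assumes that "piped" means standard input is not a terminal.

```shell
# Hypothetical sketch of the audio source priority; not marmot's own code.
# $1 = positional path (may be empty), $2 = --input value (may be empty).
resolve_audio_source() {
  positional="$1"
  input_flag="$2"
  if [ -n "$positional" ]; then
    echo "file:$positional"      # positional path wins
  elif [ -n "$input_flag" ]; then
    echo "file:$input_flag"      # then --input
  elif [ ! -t 0 ]; then
    echo "stdin"                 # then piped stdin (stdin is not a terminal)
  else
    echo "error: no audio source given" >&2
    return 1
  fi
}
```

Under this sketch, giving both a positional path and --input resolves to the positional path, and piped input is only considered when neither is present.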