Transcription (STT)
marmot transcribe takes audio as input and emits plain text by default. Use --json for the structured envelope.
marmot transcribe <audio> [flags…]
Providers: openai, openrouter, vercel, cloudflare. On first run, marmot detects available API keys in the environment and auto-configures a default provider in this order: openrouter → vercel → cloudflare → openai. Override at any time with marmot setup, marmot config set, or --provider.
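The detection order can be pictured as a first-match chain over environment variables. The sketch below is illustrative only: detect_provider is a hypothetical helper, and the *_API_KEY variable names are assumptions, not names confirmed by marmot.

```shell
# Echo the provider that would win under the documented detection order:
# openrouter -> vercel -> cloudflare -> openai.
# NOTE: the *_API_KEY variable names are assumed for illustration only.
detect_provider() {
  if [ -n "${OPENROUTER_API_KEY:-}" ]; then
    echo openrouter
  elif [ -n "${VERCEL_API_KEY:-}" ]; then
    echo vercel
  elif [ -n "${CLOUDFLARE_API_KEY:-}" ]; then
    echo cloudflare
  elif [ -n "${OPENAI_API_KEY:-}" ]; then
    echo openai
  else
    echo "no provider key found" >&2
    return 1
  fi
}
```

With both a Cloudflare and an OpenAI key set, this chain picks cloudflare, because it comes earlier in the order. Passing --provider on the command line overrides whichever default was configured.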
Output
The default is plain text on stdout, which makes the command pipe-friendly.
| Flag | Output |
|---|---|
| (none) | Plain transcribed text |
| --json (alias for --format json) | Structured envelope |
| --format text | Plain text (same as default) |
| --format srt | SRT subtitles |
| --format vtt | WebVTT subtitles |
| --format verbose-json | Envelope + raw provider response |
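A wrapper script might choose the --format value from the extension of the file it is writing. The following is a minimal sketch under that assumption; format_for_path is a hypothetical helper and not part of marmot.

```shell
# Map an output path to one of the documented --format values.
# format_for_path is a hypothetical helper, not a marmot command.
format_for_path() {
  case "$1" in
    *.srt)  echo srt ;;
    *.vtt)  echo vtt ;;
    *.json) echo json ;;
    *)      echo text ;;   # anything else falls back to the plain-text default
  esac
}

# Example use (requires marmot to be installed):
#   out=./meeting.vtt
#   marmot transcribe ./meeting.mp3 --format "$(format_for_path "$out")" -o "$out"
```

The fallback branch mirrors the CLI default, so unknown extensions still produce plain text.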
Examples
marmot transcribe ./meeting.mp3
marmot transcribe ./meeting.mp3 --json
marmot transcribe ./meeting.mp3 --format srt -o ./meeting.srt
# Pipe audio in
cat ./meeting.mp3 | marmot transcribe
# Round-trip with speak
marmot speak 'hello world' --play | marmot transcribe
# Bias with context
marmot transcribe ./call.mp3 --prompt 'technical interview, names: Ada, Linus'
# Cloudflare Whisper turbo
marmot transcribe ./meeting.mp3 --provider cloudflare \
  --model @cf/openai/whisper-large-v3-turbo --language en
Flags
For cross-cutting flags see Common flags. Transcribe-specific:
| Flag | Description |
|---|---|
| --model <id> | Transcription model. Defaults to the provider's default. |
| -i, --input <path> | Audio file path (alternative to positional arg). |
| -o, --output <path> | Write rendered output to a file. |
| --language <code> | ISO-639-1 language hint (e.g. en, es). |
| --prompt <text> | Bias prompt to guide the transcription. |
| --format <fmt> | text (default), json, srt, vtt, verbose-json. |
| --text | Plain text (now the default; flag kept for back-compat). |
| --json | Alias for --format json. |
Audio source priority: positional path → --input → piped binary stdin. At least one is required.
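The priority rule can be read as a first-match chain. The sketch below shows that logic only and is not marmot's actual implementation: resolve_audio_source is a hypothetical helper, and printing "-" to stand for stdin is a convention chosen here for illustration.

```shell
# Resolve the audio source in the documented order:
# positional path, then --input, then piped stdin.
# $1 = positional path (may be empty), $2 = --input value (may be empty).
# resolve_audio_source is a hypothetical helper for illustration.
resolve_audio_source() {
  positional="${1:-}"
  input_flag="${2:-}"
  if [ -n "$positional" ]; then
    echo "$positional"
  elif [ -n "$input_flag" ]; then
    echo "$input_flag"
  elif [ ! -t 0 ]; then
    echo "-"   # stdin is not a terminal, so audio is assumed to be piped in
  else
    echo "no audio source given" >&2
    return 1
  fi
}
```

When both a positional path and --input are supplied, the positional path wins. Piped stdin is consulted only when neither is present.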