Speech
Turn text into speech, played or piped.
marmot speak <text> [flags…]
Providers: openai, openrouter, vercel, cloudflare. On first run, marmot detects available API keys in the env and auto-configures a default in this order: openrouter → vercel → cloudflare → openai. Override any time with marmot setup, marmot config set, or --provider.
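The detection order above can be read as a first-match priority list. The following is only a sketch of that idea, not marmot's actual implementation: the function name `pick_default_provider` is hypothetical, and the environment variable names are assumptions, since this page does not say which variables marmot inspects.

```shell
# Hypothetical sketch: return the first provider whose API key is set,
# using the documented order openrouter -> vercel -> cloudflare -> openai.
# ASSUMPTION: the variable names below are placeholders; this page does
# not state which variables marmot actually checks.
pick_default_provider() {
  if [ -n "${OPENROUTER_API_KEY:-}" ]; then
    echo openrouter
  elif [ -n "${VERCEL_API_KEY:-}" ]; then
    echo vercel
  elif [ -n "${CLOUDFLARE_API_TOKEN:-}" ]; then
    echo cloudflare
  elif [ -n "${OPENAI_API_KEY:-}" ]; then
    echo openai
  else
    # No key found: nothing to auto-configure.
    return 1
  fi
}
```

Under these assumptions, a wrapper script could pin the result explicitly with the documented flag, for example `marmot speak 'Hello' --provider "$(pick_default_provider)"`.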
Output
Default behavior is TTY-aware:
| Invocation | Output |
|---|---|
| marmot speak '...' (terminal) | Plays through speakers (writes a temp file, plays in foreground, deletes after). |
| marmot speak '...' > out.mp3 | Writes raw audio bytes to stdout (auto-binary). |
| marmot speak '...' \| next | Same: raw bytes on stdout. |
| marmot speak '...' -o hi.mp3 | Writes to hi.mp3, prints the path. |
| marmot speak '...' --play | Plays. When piped, also emits bytes downstream so the pipeline continues. |
| marmot speak '...' --binary | Forces raw bytes regardless. |
| marmot speak '...' --b64 | JSON envelope with inline base64. |
| marmot speak '...' --json | Writes file, emits full JSON envelope. |
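The first three rows of the table hinge on whether stdout is a terminal. A minimal sketch of that check, using the standard `[ -t 1 ]` test, is shown here; the function name `default_output_mode` and the `mode` variable are hypothetical, and this is not taken from marmot's source.

```shell
# Hypothetical sketch of a TTY-aware default, covering only the rows
# without explicit output flags. Not marmot's actual code.
default_output_mode() {
  if [ -t 1 ]; then
    # stdout is a terminal: play through speakers.
    mode=play
  else
    # stdout is redirected or piped: emit raw audio bytes.
    mode=bytes
  fi
}
```

The result is stored in a variable rather than printed, because capturing the function's output with `$(...)` would itself turn stdout into a pipe and always select `bytes`.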
Examples
marmot speak 'Hello from marmot' # plays on TTY
marmot speak 'Hola mundo' --provider cloudflare --model @cf/myshell-ai/melotts
marmot speak 'Welcome' --voice nova -o ./hello.mp3
# Pipe bytes to a player
marmot speak 'Hello' | mpv -
# Play AND continue piping (e.g. round-trip transcribe)
marmot speak 'Hello from marmot' --play | marmot transcribe
# Steerable voice
marmot speak 'Welcome aboard' --model gpt-4o-mini-tts --voice ash \
--instructions 'cheerful, slow, slightly British'
Flags
For cross-cutting flags see Common flags. Speak-specific:
| Flag | Description |
|---|---|
| --model <id> | Speech model. Defaults to provider's default. |
| --voice <name> | Voice id (provider-specific). |
| --format <fmt> | Audio format: mp3 (default), wav, flac, aac, opus. |
| --speed <n> | Playback speed multiplier (0.25–4.0). OpenAI only. |
| --instructions <text> | Steering text for steerable voices (e.g. gpt-4o-mini-tts). |
| --provider-option <key=value> | Generic passthrough. Repeatable. Lands in providerOptions[<provider>] for niche TTS params. |
| -o, --output <path> | Output audio path. With -o set on a TTY, the path-print on stdout is suppressed; auto-playback still happens unless --quiet. See Stdout decision matrix. |
| -q, --quiet | Suppress stdout. File output via -o is still written; stderr unaffected. |
| -p, --prompt-file <path> | Read text from a file. |
| --play / --no-play | Play through speakers. Default on a TTY. When piped, also emits bytes downstream. --no-play overrides a preset's play: true. |
| --wait / --no-wait | With --play, block until playback finishes. |
| --binary / --no-binary | Force raw audio bytes to stdout. |
| --b64 / --no-b64 | JSON envelope with inline base64. |
| --json / --no-json | JSON envelope on stdout (instead of just the path). |
--binary and --b64 are mutually exclusive. --play can be combined with --binary or with piped stdout; this is the "play AND continue piping" mode shown under Examples.
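A wrapper script that builds flag lists dynamically may want to catch the --binary/--b64 conflict before invoking marmot. The following is a rough sketch of such a pre-flight check; the function name `check_stdout_flags`, its error message, and its exit code are all hypothetical, and how marmot itself reports the conflict is not stated on this page.

```shell
# Hypothetical pre-flight check for a wrapper script: reject argument
# lists that contain both --binary and --b64.
# ASSUMPTION: the message and return code are illustrative only.
check_stdout_flags() {
  seen_binary=0
  seen_b64=0
  for arg in "$@"; do
    case "$arg" in
      --binary) seen_binary=1 ;;
      --b64)    seen_b64=1 ;;
    esac
  done
  if [ "$seen_binary" -eq 1 ] && [ "$seen_b64" -eq 1 ]; then
    echo "error: --binary and --b64 are mutually exclusive" >&2
    return 2
  fi
  return 0
}
```

A script could then guard the call with something like `check_stdout_flags "$@" && marmot speak "$@"`. Note that --play is deliberately not checked, since it can be combined with either output mode.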
Presets
speech-mode presets accept text (positional), promptFile, voice, format, speed, instructions, providerOption, output, binary, b64, json, play, wait, retries, timeout, session. The preset's text concatenates with the runtime positional text. See Presets — Merge rules.
marmot preset create narrator --mode speech --provider openai \
--voice nova --format mp3 --no-play
marmot @narrator "Welcome to the show."
marmot @narrator --play "Now playing through speakers." # override preset