Speech

Turn text into speech, played or piped.

marmot speak <text> [flags…]

Providers: openai, openrouter, vercel, cloudflare. On first run, marmot detects available API keys in the environment and auto-configures a default provider in this order: openrouter, vercel, cloudflare, openai. Override it at any time with marmot setup, marmot config set, or --provider.
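
The first-run lookup order can be pictured as a chain of environment checks. The sketch below is illustrative only: the *_API_KEY variable names are assumptions (the page does not name the variables marmot reads), and pick_default_provider is a hypothetical helper, not a marmot command.

```shell
# Sketch of the first-run detection order described above.
# ASSUMPTION: the *_API_KEY names are placeholders, not confirmed marmot names.
pick_default_provider() {
  if [ -n "${OPENROUTER_API_KEY:-}" ]; then
    echo openrouter
  elif [ -n "${VERCEL_API_KEY:-}" ]; then
    echo vercel
  elif [ -n "${CLOUDFLARE_API_KEY:-}" ]; then
    echo cloudflare
  elif [ -n "${OPENAI_API_KEY:-}" ]; then
    echo openai
  else
    return 1   # no key found, so there is nothing to auto-configure
  fi
}
```

Passing --provider on the command line sidesteps this kind of lookup for a single call.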

Output

Default behavior is TTY-aware:

Invocation                       Output
marmot speak '...' (terminal)    Plays through speakers (writes a temp file, plays in foreground, deletes after).
marmot speak '...' > out.mp3     Writes raw audio bytes to stdout (auto-binary).
marmot speak '...' | next        Same: bytes on stdout.
marmot speak '...' -o hi.mp3     Writes to hi.mp3, prints the path.
marmot speak '...' --play        Plays. When piped, also emits bytes downstream so the pipeline continues.
marmot speak '...' --binary      Forces raw bytes regardless.
marmot speak '...' --b64         JSON envelope with inline base64.
marmot speak '...' --json        Writes file, emits full JSON envelope.
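
For readers who prefer code to tables, the rows above can be modelled as a small shell function. This is a rough model of the documented behavior, not marmot's implementation: output_mode and its result labels are hypothetical, and it handles only one flag at a time because the table does not describe flag combinations.

```shell
# Rough model of the Output table; not marmot's actual implementation.
# Usage: output_mode <tty|pipe> [flag]
# Result labels such as "play+bytes" are made up for this sketch.
output_mode() {
  om_kind=$1
  om_flag=${2:-}
  case "$om_flag" in
    -o|--output) echo "file+path" ;;
    --binary)    echo "bytes" ;;
    --b64)       echo "json-b64" ;;
    --json)      echo "file+json" ;;
    --play)
      if [ "$om_kind" = tty ]; then echo "play"; else echo "play+bytes"; fi
      ;;
    "")
      if [ "$om_kind" = tty ]; then echo "play"; else echo "bytes"; fi
      ;;
    *)
      echo "unknown flag: $om_flag" >&2
      return 2
      ;;
  esac
}

# A script can find out which case applies to it with the POSIX -t test,
# which is true when file descriptor 1 (stdout) is a terminal.
if [ -t 1 ]; then kind=tty; else kind=pipe; fi
output_mode "$kind"
```

The same -t check is useful in wrapper scripts that want to pass --binary or --no-play explicitly rather than rely on the TTY-aware default.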

Examples

marmot speak 'Hello from marmot'                 # plays on TTY
marmot speak 'Hola mundo' --provider cloudflare --model @cf/myshell-ai/melotts
marmot speak 'Welcome' --voice nova -o ./hello.mp3

# Pipe bytes to a player
marmot speak 'Hello' | mpv -

# Play AND continue piping (e.g. round-trip transcribe)
marmot speak 'Hello from marmot' --play | marmot transcribe

# Steerable voice
marmot speak 'Welcome aboard' --model gpt-4o-mini-tts --voice ash \
  --instructions 'cheerful, slow, slightly British'

Flags

For cross-cutting flags see Common flags. Speak-specific:

Flag                             Description
--model <id>                     Speech model. Defaults to provider's default.
--voice <name>                   Voice id (provider-specific).
--format <fmt>                   Audio format: mp3 (default), wav, flac, aac, opus.
--speed <n>                      Playback speed multiplier (0.25–4.0). OpenAI only.
--instructions <text>            Steering text for steerable voices (e.g. gpt-4o-mini-tts).
--provider-option <key=value>    Generic passthrough. Repeatable. Lands in providerOptions[<provider>] for niche TTS params.
-o, --output <path>              Output audio path. With -o set on a TTY, the path-print on stdout is suppressed; auto-playback still happens unless --quiet. See Stdout decision matrix.
-q, --quiet                      Suppress stdout. File output via -o is still written; stderr unaffected.
-p, --prompt-file <path>         Read text from a file.
--play / --no-play               Play through speakers. Default on a TTY. When piped, also emits bytes downstream. --no-play overrides a preset's play: true.
--wait / --no-wait               With --play, block until playback finishes.
--binary / --no-binary           Force raw audio bytes to stdout.
--b64 / --no-b64                 JSON envelope with inline base64.
--json / --no-json               JSON envelope on stdout (instead of just the path).

--binary and --b64 are mutually exclusive. --play can be combined with --binary or a pipe; this is the documented "play AND continue piping" mode.
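
A wrapper script may want to catch these constraints before calling marmot. The following is a minimal sketch under stated assumptions: check_speak_args is a hypothetical helper, it only understands the space-separated form --speed <n>, and how marmot itself reports these errors is not covered on this page.

```shell
# Pre-flight check for a wrapper script: a sketch, not part of marmot.
# Returns 0 when the arguments pass both checks, 1 otherwise.
check_speak_args() {
  csa_binary=0
  csa_b64=0
  csa_speed=""
  while [ $# -gt 0 ]; do
    case "$1" in
      --binary) csa_binary=1 ;;
      --b64)    csa_b64=1 ;;
      --speed)
        if [ $# -gt 1 ]; then
          csa_speed=$2
          shift
        fi
        ;;
    esac
    shift
  done

  # Documented rule: --binary and --b64 cannot be used together.
  if [ "$csa_binary" -eq 1 ] && [ "$csa_b64" -eq 1 ]; then
    echo "error: --binary and --b64 are mutually exclusive" >&2
    return 1
  fi

  if [ -n "$csa_speed" ]; then
    # Accept plain decimals only, then compare against the documented range.
    if ! awk -v s="$csa_speed" 'BEGIN {
           ok = (s ~ /^[0-9]*\.?[0-9]+$/) && (s + 0 >= 0.25) && (s + 0 <= 4.0)
           exit !ok
         }'; then
      echo "error: --speed must be between 0.25 and 4.0" >&2
      return 1
    fi
  fi
  return 0
}
```

A call such as check_speak_args "$@" && marmot speak "$@" would then forward only argument lists that pass both checks.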

Presets

speech-mode presets accept text (positional), promptFile, voice, format, speed, instructions, providerOption, output, binary, b64, json, play, wait, retries, timeout, session. The preset's text concatenates with the runtime positional text. See Presets — Merge rules.

marmot preset create narrator --mode speech --provider openai \
  --voice nova --format mp3 --no-play
marmot @narrator "Welcome to the show."
marmot @narrator --play "Now playing through speakers."   # override preset