Audio models

What model should I use for text-to-speech?

For new projects, use stepaudio-2.5-tts, our flagship contextual TTS model with zero-shot voice cloning and natural-language control over emotion and style. Use step-tts-2 or step-tts-mini if you rely on tag-based voice/emotion control or preset voice libraries. See Audio Models for a full comparison.

Where can I find the TTS API parameters?

See Generate audio for the full request schema and examples.

What audio formats are supported?

The supported formats are wav, mp3, flac, opus, and pcm. The default is mp3.
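If you want to catch an unsupported format before sending a request, a small client-side check can help. The sketch below is illustrative only: the helper name `resolve_audio_format` and the constants are hypothetical and not part of any SDK, and the format list and default are simply copied from the answer above.

```python
# Hypothetical client-side helper; not part of the official API or SDK.
SUPPORTED_FORMATS = ("wav", "mp3", "flac", "opus", "pcm")  # from the FAQ answer
DEFAULT_FORMAT = "mp3"  # documented default


def resolve_audio_format(requested=None):
    """Return a supported format name, falling back to the default."""
    if requested is None:
        return DEFAULT_FORMAT
    fmt = requested.strip().lower()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(
            f"Unsupported format {requested!r}; expected one of {SUPPORTED_FORMATS}"
        )
    return fmt
```

Normalizing case and whitespace is a convenience choice in this sketch; whether the API itself accepts mixed-case values is not stated here.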

Is there a limit on input length?

Yes. The maximum input length is 1,000 characters per request.
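For longer text, one common approach is to split the input into pieces that each fit within the limit and send one request per piece. The following is a minimal sketch under stated assumptions: `split_for_tts` is a hypothetical helper, it assumes the limit counts characters the way Python's `len()` does, and it splits on simple Latin sentence punctuation, so adjust it for other languages.

```python
import re

MAX_CHARS = 1000  # per-request limit stated in the FAQ answer


def split_for_tts(text, limit=MAX_CHARS):
    """Split text into chunks of at most `limit` characters.

    Prefers sentence boundaries; a single sentence longer than the
    limit is cut at the limit. Hypothetical helper, not an SDK function.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    chunks = []
    current = ""
    # Split after ., ! or ? when followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        if not sentence:
            continue
        # Hard-split a sentence that cannot fit in one request on its own.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be sent as a separate request and the resulting audio concatenated in order. Splitting at sentence boundaries tends to avoid audible cuts mid-phrase, though prosody may still differ slightly between requests.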