Clone a voice from a previously uploaded WAV or MP3 file so it can be used for TTS audio generation.
Endpoint
POST https://api.stepfun.ai/v1/audio/voices
Request parameters
- model (string, required): TTS model to use. Options: step-tts-2, step-tts-mini.
- text (string, optional): Transcript of the source audio file. If omitted, automatic speech recognition is used. For best results, provide the transcript.
- file_id (string, required): File ID of the source audio used for cloning. Obtain the ID by uploading the file with purpose set to storage. Supported formats: mp3, wav. Audio length should be 5–10 seconds.
- sample_text (string, optional): Text (max 50 characters) used to create a preview clip.
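
The request parameters above can be assembled and checked before sending. The following is a minimal Python sketch, not an official client: the helper names build_clone_payload and clone_voice are hypothetical, and the JSON request body and Bearer-token Authorization header are assumptions that this page does not state.

```python
import json
import urllib.request

ENDPOINT = "https://api.stepfun.ai/v1/audio/voices"
ALLOWED_MODELS = {"step-tts-2", "step-tts-mini"}
MAX_SAMPLE_TEXT_CHARS = 50


def build_clone_payload(model, file_id, text=None, sample_text=None):
    """Check the documented constraints and return the request body as a dict."""
    if model not in ALLOWED_MODELS:
        raise ValueError(f"model must be one of {sorted(ALLOWED_MODELS)}")
    if not file_id:
        raise ValueError("file_id is required")
    if sample_text is not None and len(sample_text) > MAX_SAMPLE_TEXT_CHARS:
        raise ValueError("sample_text is limited to 50 characters")
    payload = {"model": model, "file_id": file_id}
    # Optional fields are only included when the caller provides them.
    if text is not None:
        payload["text"] = text
    if sample_text is not None:
        payload["sample_text"] = sample_text
    return payload


def clone_voice(api_key, payload):
    """POST the payload to the endpoint.

    Assumption: JSON body and Bearer-token auth; neither is stated on this page.
    """
    request = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Keeping validation separate from the network call lets the 50-character limit and the model options be checked locally before any request is made.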
Response
- id (string): Voice ID for subsequent audio generation.
- object (string): Object type, always audio.voice.
- duplicated (boolean): Whether this request duplicates an earlier one; returned on repeated calls.
- sample_text (string): Text used for the preview audio.
- sample_audio (string): Preview audio as base64-encoded WAV. Decode it and write it to a file to play it.
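
Because sample_audio arrives as a base64 string, it has to be decoded before it can be played. A minimal sketch, assuming the response has already been parsed into a Python dict with the fields listed above; decode_preview and save_preview are hypothetical helper names.

```python
import base64
from pathlib import Path


def decode_preview(voice):
    """Return the raw WAV bytes from the base64 sample_audio field."""
    return base64.b64decode(voice["sample_audio"])


def save_preview(voice, path):
    """Write the decoded preview audio to `path` and return the byte count."""
    audio_bytes = decode_preview(voice)
    Path(path).write_bytes(audio_bytes)
    return len(audio_bytes)
```

For example, save_preview(voice, "preview.wav") writes a file that a standard audio player can open.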
Example
- curl