Endpoint
POST https://api.stepfun.ai/v1/audio/voices
Request parameters
- model (string, required)
  TTS model to use. Options: step-tts-2.
- text (string, required)
  Transcript of the source audio file.
- file_id (string, required)
  File ID of the source audio used for cloning. Obtain the ID via file upload; set purpose to storage. Supported formats: mp3, wav. Audio length should be 5–10 seconds.
- sample_text (string, optional)
  Text (max 50 characters) used to create a preview clip.
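As a rough illustration of how these parameters fit together, the Python sketch below assembles the request without sending it. The endpoint and field names come from this page. The JSON request body, the Bearer-token Authorization header, and the helper name build_clone_request are assumptions made for the example and are not confirmed here.

```python
import json
import urllib.request

ENDPOINT = "https://api.stepfun.ai/v1/audio/voices"


def build_clone_request(api_key, file_id, text, sample_text=None, model="step-tts-2"):
    """Build (but do not send) a POST request for the voice-cloning endpoint.

    Hypothetical helper: the JSON body and Bearer auth header are assumptions.
    """
    # The documented limit for sample_text is 50 characters.
    if sample_text is not None and len(sample_text) > 50:
        raise ValueError("sample_text must be at most 50 characters")

    # Required fields: model, text (transcript of the source audio), file_id.
    payload = {"model": model, "text": text, "file_id": file_id}
    if sample_text is not None:
        payload["sample_text"] = sample_text  # optional preview text

    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",    # assumed body encoding
        },
        method="POST",
    )
```

To send the request, pass the returned object to urllib.request.urlopen and parse the response body with json.load.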
Response
- id (string)
  Voice ID for subsequent audio generation.
- object (string)
  Object type, always audio.voice.
- duplicated (boolean)
  Indicates the request was duplicated (returned on repeated calls).
- sample_text (string)
  Text used for the preview audio.
- sample_audio (string)
  Preview audio in base64 (wav). Convert to a file to play.
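Converting sample_audio to a playable file could look like the following Python sketch. It assumes the response has already been parsed into a dict with the fields listed above. The helper name save_preview is hypothetical, and skipping the write when sample_audio is missing is a choice made for this example, not documented behavior.

```python
import base64
from pathlib import Path


def save_preview(voice: dict, path: str) -> int:
    """Decode the base64 preview clip and write it to `path`.

    Returns the number of bytes written, or 0 if the response
    carries no sample_audio field (handling chosen for this sketch).
    """
    encoded = voice.get("sample_audio")
    if not encoded:
        return 0
    audio_bytes = base64.b64decode(encoded)  # raw wav bytes
    return Path(path).write_bytes(audio_bytes)
```

For example, save_preview(voice, "preview.wav") writes the clip next to the script, and voice["id"] can then be used for subsequent audio generation.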
Example
- curl