Clone a voice from a previously uploaded WAV or MP3 file so it can be used for TTS audio generation.

Endpoint

POST https://api.stepfun.ai/v1/audio/voices

Request parameters

  • model string required
    TTS model to use. Options: step-tts-2.
  • text string required
    Transcript of the source audio file.
  • file_id string required
    File ID of the source audio used for cloning. Obtain the ID by uploading the file with purpose set to storage. Supported formats: mp3, wav. The audio should be 5–10 seconds long.
  • sample_text string optional
    Text (max 50 characters) used to create a preview clip.
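
The 50-character limit on sample_text can be checked on the client before the request is sent. The following is a minimal sketch, not part of the API: check_sample_text is a hypothetical helper name, and how the server counts characters is not stated here, so the local count may differ (bash counts characters in the current locale, while some other shells count bytes).

```shell
# Hypothetical client-side check for the documented 50-character limit
# on sample_text. Returns 0 if the text fits, 1 otherwise.
check_sample_text() {
    text="$1"
    # ${#text} is the length of the string as seen by the shell
    if [ "${#text}" -gt 50 ]; then
        echo "sample_text too long: ${#text} characters (max 50)" >&2
        return 1
    fi
    return 0
}
```

For example, check_sample_text "Nice weather today" succeeds, so the request can go ahead with that preview text.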

Response

  • id string
    Voice ID for subsequent audio generation.
  • object string
    Object type, always audio.voice.
  • duplicated boolean
    Whether the request duplicates an earlier one (returned on repeated calls).
  • sample_text string
    Text used for the preview audio.
  • sample_audio string
    Preview audio as a base64-encoded WAV. Decode it to a file to play it.
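
Since sample_audio arrives as a base64 string, it has to be decoded before it can be played. The following is a minimal sketch under a few assumptions: decode_sample_audio is a hypothetical helper name, the base64 tool accepts --decode (as in GNU coreutils and recent macOS), and the base64 string has already been extracted from the JSON response.

```shell
# Hypothetical helper: read a base64 string from stdin and write the
# decoded bytes to the file named by the first argument.
decode_sample_audio() {
    out_file="$1"
    # Strip spaces and line breaks that may wrap the payload, then decode
    tr -d ' \n\r' | base64 --decode > "$out_file"
}

# Usage, assuming jq is installed and response.json holds the API response:
#   jq -r '.sample_audio' response.json | decode_sample_audio preview.wav
```

The resulting preview.wav can then be opened in any audio player. If base64 on your system uses a different flag for decoding, adjust the command accordingly.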

Example

curl -L 'https://api.stepfun.ai/v1/audio/voices' \
-H 'Content-Type: application/json' \
-H "Authorization: Bearer $STEP_API_KEY" \
-d '{
    "file_id":"file-Ckyl3cV09A",
    "model":"step-tts-2",
    "text":"StepFun intelligence, 10x possibilities for everyone.",
    "sample_text":"Nice weather today"
}'