

StepFun provides developers with voice interaction models that support audio generation and voice cloning. By integrating these models, applications can go beyond text-only large language model capabilities and support voice interaction.

Quick Start

Quickly Generate an Audio Clip

Run the following command to generate an audio file. It reads your API key from the STEP_API_KEY environment variable and saves the result as step.mp3.
curl --location 'https://api.stepfun.ai/v1/audio/speech' \
  --header 'Content-Type: application/json' \
  --header "Authorization: Bearer $STEP_API_KEY" \
  --data '{
    "model": "step-tts-2",
    "input": "StepFun is building the next generation of AGI.",
    "voice": "lively-girl"
  }' \
  --output "step.mp3"
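The command above embeds the request body in single quotes. That works for fixed text, but an input containing an apostrophe breaks the shell quoting, and a double quote inside the text breaks the JSON. The following is a minimal sketch of building the same body from shell variables. It uses POSIX sh plus sed and awk, and it escapes backslashes, double quotes and newlines. The helper names json_escape and build_speech_payload are hypothetical, and other control characters are not escaped.

```shell
#!/bin/sh
# Sketch: build the Quick Start request body from variables instead of a
# hard-coded single-quoted string. json_escape and build_speech_payload
# are hypothetical helper names, not part of the StepFun API.

# Escape backslashes, double quotes and newlines for a JSON string literal.
# Other control characters (tabs, carriage returns, ...) are NOT handled.
json_escape() {
  printf '%s' "$1" |
    sed -e 's/\\/\\\\/g' -e 's/"/\\"/g' |
    awk 'NR > 1 { printf "%s", "\\n" } { printf "%s", $0 }'
}

# Print a JSON body with the same three fields as the Quick Start command:
# $1 model, $2 input text, $3 voice ID.
build_speech_payload() {
  printf '{"model":"%s","input":"%s","voice":"%s"}' \
    "$(json_escape "$1")" "$(json_escape "$2")" "$(json_escape "$3")"
}
```

To use it, replace the literal `--data '{...}'` argument with `--data "$(build_speech_payload step-tts-2 "$TEXT" lively-girl)"`. If jq is installed, `jq -n --arg input "$TEXT" '{"model": "step-tts-2", "input": $input, "voice": "lively-girl"}'` produces a fully escaped body without custom helpers.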

Voice Recommendations by Scenario

StepFun offers dozens of recommended voices across seven major scenarios. You can preview the voices here and use any of them through the API by passing its Voice ID in the voice field. We strongly recommend using voice cloning to create custom voices: the step-tts-2 model delivers industry-leading cloning performance, and cloned voices support all emotion and style controls at no additional cost.

1. Marketing

Marketing scenarios require voices with charisma, persuasiveness, and warmth that can effectively convey product value and inspire purchase intent. Step-TTS delivers full emotional expression that builds trust and conveys professionalism, making marketing content more compelling.
Supported Models | Voice Name | Voice ID | Audio Samples
stepaudio-2.5-tts / step-tts-2 / step-tts-mini | Lively Breezy | livelybreezy-female | Sample 1 · Sample 2
stepaudio-2.5-tts / step-tts-2 / step-tts-mini | Upright Youth | zhengpaiqingnian | Sample 1 · Sample 2

2. Customer Service

Customer service scenarios require voices that are warm, patient, and professional, capable of calming users and providing clear solutions. We offer two types of customer service voices. The first four recommendations below are available on stepaudio-2.5-tts and step-tts-2; they stand out for rich audio quality, full emotional range, and a lifelike human feel, which makes them especially suited to phone scenarios. The remaining voices are also available on step-tts-mini.
Supported Models | Voice Name | Voice ID | Audio Samples
stepaudio-2.5-tts / step-tts-2 | Straightforward Male | shuangkuainansheng | Sample 1 · Sample 2 · Sample 3
stepaudio-2.5-tts / step-tts-2 | Capable Female | ganliannvsheng | Sample 1 · Sample 2 · Sample 3
stepaudio-2.5-tts / step-tts-2 | Warm Female | qinhenvsheng | Sample 1 · Sample 2 · Sample 3
stepaudio-2.5-tts / step-tts-2 | Energetic Female | huolinvsheng | Sample 1 · Sample 2 · Sample 3
stepaudio-2.5-tts / step-tts-2 / step-tts-mini | Elegant Gentle | elegantgentle-female | Sample 1 · Sample 2
stepaudio-2.5-tts / step-tts-2 / step-tts-mini | Lively Breezy | livelybreezy-female | Sample 1 · Sample 2
stepaudio-2.5-tts / step-tts-2 / step-tts-mini | Gentle Male | wenrounansheng | Sample 1 · Sample 2
stepaudio-2.5-tts / step-tts-2 / step-tts-mini | Classic Female | jingdiannvsheng | Sample 1 · Sample 2
stepaudio-2.5-tts / step-tts-2 / step-tts-mini | Mature Gentle | wenroushunv | Sample 1 · Sample 2
stepaudio-2.5-tts / step-tts-2 / step-tts-mini | Sweet Female | tianmeinvsheng | Sample 1 · Sample 2
stepaudio-2.5-tts / step-tts-2 / step-tts-mini | Pure Girl | qingchunshaonv | Sample 1 · Sample 2
stepaudio-2.5-tts / step-tts-2 / step-tts-mini | Spirited Male | yuanqinansheng | Sample 1 · Sample 2

3. Audiobook

Audiobooks require voices that are expressive and emotionally engaging, capable of vividly bringing different characters and story atmospheres to life. Our TTS stands out for its nuanced emotional expression and versatile vocal styles, letting listeners fully immerse themselves in the world of the story.
Supported Models | Voice Name | Voice ID | Audio Samples
stepaudio-2.5-tts / step-tts-2 | Lively Girl | lively-girl | Sample 1 · Sample 2
stepaudio-2.5-tts / step-tts-2 / step-tts-mini | Scholarly Gentleman | ruyananshi | Sample 1 · Sample 2
stepaudio-2.5-tts / step-tts-2 / step-tts-mini | Gentle Female | wenrounvsheng | Sample 1 · Sample 2
stepaudio-2.5-tts / step-tts-2 / step-tts-mini | Tender Gentleman | wenrougongzi | Sample 1 · Sample 2
stepaudio-2.5-tts / step-tts-2 / step-tts-mini | Magnetic Male | cixingnansheng | Sample 1 · Sample 2
stepaudio-2.5-tts / step-tts-2 / step-tts-mini | Spirited Girl | yuanqishaonv | Sample 1 · Sample 2
stepaudio-2.5-tts / step-tts-2 / step-tts-mini | Upright Youth | zhengpaiqingnian | Sample 1 · Sample 2
stepaudio-2.5-tts / step-tts-2 / step-tts-mini | Spirited Male | yuanqinansheng | Sample 1 · Sample 2
stepaudio-2.5-tts / step-tts-2 / step-tts-mini | Broadcast Male | boyinnansheng | Sample 1 · Sample 2
stepaudio-2.5-tts / step-tts-2 / step-tts-mini | Deep Male | shenchennanyin | Sample 1 · Sample 2

4. Emotional Companionship

Emotional companionship requires voices that are warm, gentle, and empathetic, capable of providing users with comfort and psychological support. Our TTS features delicate, soothing voice timbres with strong emotional expressiveness, helping you create a safe and comforting interaction environment for users.
Supported Models | Voice Name | Voice ID | Audio Samples
stepaudio-2.5-tts / step-tts-2 | Soft-spoken Gentleman | soft-spoken-gentleman | Sample 1 · Sample 2
stepaudio-2.5-tts / step-tts-2 / step-tts-mini | Elegant Gentle | elegantgentle-female | Sample 1 · Sample 2
stepaudio-2.5-tts / step-tts-2 / step-tts-mini | Lively Breezy | livelybreezy-female | Sample 1 · Sample 2
stepaudio-2.5-tts / step-tts-2 / step-tts-mini | Gentle Male | wenrounansheng | Sample 1 · Sample 2
stepaudio-2.5-tts / step-tts-2 / step-tts-mini | Tender Gentleman | wenrougongzi | Sample 1 · Sample 2
stepaudio-2.5-tts / step-tts-2 / step-tts-mini | Classic Female | jingdiannvsheng | Sample 1 · Sample 2
stepaudio-2.5-tts / step-tts-2 / step-tts-mini | Friendly Female | qinqienvsheng | Sample 1 · Sample 2
stepaudio-2.5-tts / step-tts-2 / step-tts-mini | Sweet Female | tianmeinvsheng | Sample 1 · Sample 2
stepaudio-2.5-tts / step-tts-2 / step-tts-mini | Magnetic Male | cixingnansheng | Sample 1 · Sample 2
stepaudio-2.5-tts / step-tts-2 / step-tts-mini | Spirited Girl | yuanqishaonv | Sample 1 · Sample 2
stepaudio-2.5-tts / step-tts-2 / step-tts-mini | Girl Next Door | linjiajiejie | Sample 1 · Sample 2
stepaudio-2.5-tts / step-tts-2 / step-tts-mini | Scholarly Gentleman | ruyananshi | Sample 1 · Sample 2
stepaudio-2.5-tts / step-tts-2 / step-tts-mini | Deep Male | shenchennanyin | Sample 1 · Sample 2
stepaudio-2.5-tts / step-tts-2 / step-tts-mini | Gentle Female | wenrounvsheng | Sample 1 · Sample 2
stepaudio-2.5-tts / step-tts-2 / step-tts-mini | Cute Soft Female | ruanmengnvsheng | Sample 1 · Sample 2

5. Voice Assistant

Voice assistant scenarios require voices that are clear, natural, and efficient, capable of accurately understanding and responding to user commands. Our TTS features natural prosody and full emotional expression, making your voice assistant both professional and approachable.
Supported Models | Voice Name | Voice ID | Audio Samples
stepaudio-2.5-tts / step-tts-2 / step-tts-mini | Elegant Gentle | elegantgentle-female | Sample 1 · Sample 2
stepaudio-2.5-tts / step-tts-2 / step-tts-mini | Lively Breezy | livelybreezy-female | Sample 1 · Sample 2
stepaudio-2.5-tts / step-tts-2 / step-tts-mini | Pure Girl | qingchunshaonv | Sample 1 · Sample 2
stepaudio-2.5-tts / step-tts-2 / step-tts-mini | Spirited Girl | yuanqishaonv | Sample 1 · Sample 2
stepaudio-2.5-tts / step-tts-2 / step-tts-mini | Girl Next Door | linjiajiejie | Sample 1 · Sample 2
stepaudio-2.5-tts / step-tts-2 / step-tts-mini | Scholarly Gentleman | ruyananshi | Sample 1 · Sample 2
stepaudio-2.5-tts / step-tts-2 / step-tts-mini | Clever Girl | jilingshaonv | Sample 1 · Sample 2
stepaudio-2.5-tts / step-tts-2 / step-tts-mini | Cute Soft Female | ruanmengnvsheng | Sample 1 · Sample 2
stepaudio-2.5-tts / step-tts-2 / step-tts-mini | Kid Sister | linjiameimei | Sample 1 · Sample 2
stepaudio-2.5-tts / step-tts-2 / step-tts-mini | Intellectual Lady | zhixingjiejie | Sample 1 · Sample 2

6. Video Dubbing

Video dubbing requires voices that are expressive, rhythmic, and visually evocative, capable of blending seamlessly with visual content. Our TTS excels in precise emotional delivery and fine-grained speech rhythm control, enhancing the impact and overall appeal of your videos.
Supported Models | Voice Name | Voice ID | Audio Samples
stepaudio-2.5-tts / step-tts-2 | Vibrant Youth | vibrant-youth | Sample 1 · Sample 2
stepaudio-2.5-tts / step-tts-2 | Magnetic-voiced Male | magnetic-voiced-male | Sample 1 · Sample 2
stepaudio-2.5-tts / step-tts-2 / step-tts-mini | Girl Next Door | linjiajiejie | Sample 1 · Sample 2
stepaudio-2.5-tts / step-tts-2 / step-tts-mini | Kid Sister | linjiameimei | Sample 1 · Sample 2
stepaudio-2.5-tts / step-tts-2 / step-tts-mini | College Student | qingniandaxuesheng | Sample 1 · Sample 2
stepaudio-2.5-tts / step-tts-2 / step-tts-mini | Cute Soft Female | ruanmengnvsheng | Sample 1 · Sample 2
stepaudio-2.5-tts / step-tts-2 / step-tts-mini | Elegant Female | youyanvsheng | Sample 1 · Sample 2
stepaudio-2.5-tts / step-tts-2 / step-tts-mini | Cool Beauty | lengyanyujie | Sample 1 · Sample 2
stepaudio-2.5-tts / step-tts-2 / step-tts-mini | Intellectual Lady | zhixingjiejie | Sample 1 · Sample 2
stepaudio-2.5-tts / step-tts-2 / step-tts-mini | Bold Sister | shuangkuaijiejie | Sample 1 · Sample 2
stepaudio-2.5-tts / step-tts-2 / step-tts-mini | Quiet Scholar | wenjingxuejie | Sample 1 · Sample 2

7. Education & Training

Education and training scenarios require voices that are clear, accurate, and inspiring, capable of effectively conveying knowledge and sparking learning interest. Our TTS excels at capturing the vocal characteristics of instructors across different emotional states.
Supported Models | Voice Name | Voice ID | Audio Samples
stepaudio-2.5-tts / step-tts-2 / step-tts-mini | Elegant Gentle | elegantgentle-female | Sample 1 · Sample 2
stepaudio-2.5-tts / step-tts-2 / step-tts-mini | Gentle Male | wenrounansheng | Sample 1 · Sample 2
stepaudio-2.5-tts / step-tts-2 / step-tts-mini | Lively Breezy | livelybreezy-female | Sample 1 · Sample 2
stepaudio-2.5-tts / step-tts-2 / step-tts-mini | Mature Gentle | wenroushunv | Sample 1 · Sample 2

System Voice ID List

Voice Name | Voice ID | Supported Models | Recommended Use Cases
Vibrant Youth | vibrant-youth | stepaudio-2.5-tts, step-tts-2 | Audiobook, video dubbing
Lively Girl | lively-girl | stepaudio-2.5-tts, step-tts-2 | Audiobook, video dubbing
Soft-spoken Gentleman | soft-spoken-gentleman | stepaudio-2.5-tts, step-tts-2 | Emotional companionship, audiobook
Magnetic-voiced Male | magnetic-voiced-male | stepaudio-2.5-tts, step-tts-2 | Audiobook, video dubbing
Confident Male | zixinnansheng | stepaudio-2.5-tts, step-tts-2 | Audiobook, emotional companionship, education, marketing
Elegant Gentle | elegantgentle-female | stepaudio-2.5-tts, step-tts-2, step-tts-mini | Customer service, voice-over, education, emotional companionship
Lively Breezy | livelybreezy-female | stepaudio-2.5-tts, step-tts-2, step-tts-mini | Emotional companionship, customer service, education, marketing
Gentle Male | wenrounansheng | stepaudio-2.5-tts, step-tts-2, step-tts-mini | Voice-over, emotional companionship, customer service, education
Tender Gentleman | wenrougongzi | stepaudio-2.5-tts, step-tts-2, step-tts-mini | Emotional companionship, audiobook
Spirited Male | yuanqinansheng | stepaudio-2.5-tts, step-tts-2, step-tts-mini | Audiobook, voice-over, customer service
Classic Female | jingdiannvsheng | stepaudio-2.5-tts, step-tts-2, step-tts-mini | Customer service, emotional companionship
Mature Gentle | wenroushunv | stepaudio-2.5-tts, step-tts-2, step-tts-mini | Customer service, voice-over, education
Sweet Female | tianmeinvsheng | stepaudio-2.5-tts, step-tts-2, step-tts-mini | Emotional companionship, customer service
Pure Girl | qingchunshaonv | stepaudio-2.5-tts, step-tts-2, step-tts-mini | Customer service, voice assistant
Magnetic Male | cixingnansheng | stepaudio-2.5-tts, step-tts-2, step-tts-mini | Audiobook, emotional companionship
Spirited Girl | yuanqishaonv | stepaudio-2.5-tts, step-tts-2, step-tts-mini | Audiobook, emotional companionship, voice assistant
Girl Next Door | linjiajiejie | stepaudio-2.5-tts, step-tts-2, step-tts-mini | Voice-over, emotional companionship, voice assistant, video dubbing
Upright Youth | zhengpaiqingnian | stepaudio-2.5-tts, step-tts-2, step-tts-mini | Marketing, audiobook
College Student | qingniandaxuesheng | stepaudio-2.5-tts, step-tts-2, step-tts-mini | Voice-over
Broadcast Male | boyinnansheng | stepaudio-2.5-tts, step-tts-2, step-tts-mini | Audiobook, voice-over
Scholarly Gentleman | ruyananshi | stepaudio-2.5-tts, step-tts-2, step-tts-mini | Audiobook, emotional companionship, voice-over, voice assistant
Deep Male | shenchennanyin | stepaudio-2.5-tts, step-tts-2, step-tts-mini | Emotional companionship, audiobook
Friendly Female | qinqienvsheng | stepaudio-2.5-tts, step-tts-2, step-tts-mini | Voice-over
Gentle Female | wenrounvsheng | stepaudio-2.5-tts, step-tts-2, step-tts-mini | Audiobook, emotional companionship
Clever Girl | jilingshaonv | stepaudio-2.5-tts, step-tts-2, step-tts-mini | Voice assistant, voice-over
Cute Soft Female | ruanmengnvsheng | stepaudio-2.5-tts, step-tts-2, step-tts-mini | Emotional companionship, voice assistant, video dubbing
Elegant Female | youyanvsheng | stepaudio-2.5-tts, step-tts-2, step-tts-mini | Video dubbing
Cool Beauty | lengyanyujie | stepaudio-2.5-tts, step-tts-2, step-tts-mini | Video dubbing
Bold Sister | shuangkuaijiejie | stepaudio-2.5-tts, step-tts-2, step-tts-mini | Voice-over
Quiet Scholar | wenjingxuejie | stepaudio-2.5-tts, step-tts-2, step-tts-mini | Voice-over
Kid Sister | linjiameimei | stepaudio-2.5-tts, step-tts-2, step-tts-mini | Video dubbing, voice-over, voice assistant
Intellectual Lady | zhixingjiejie | stepaudio-2.5-tts, step-tts-2, step-tts-mini | Video dubbing, voice-over, voice assistant
Straightforward Male | shuangkuainansheng | stepaudio-2.5-tts, step-tts-2 | Customer service, voice assistant
Capable Female | ganliannvsheng | stepaudio-2.5-tts, step-tts-2 | Customer service, voice assistant
Warm Female | qinhenvsheng | stepaudio-2.5-tts, step-tts-2 | Customer service, voice assistant
Energetic Female | huolinvsheng | stepaudio-2.5-tts, step-tts-2 | Customer service, voice assistant
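In the System Voice ID List, nine voices are listed only for stepaudio-2.5-tts and step-tts-2. All other voices are also listed for step-tts-mini. The following is a minimal sketch of a client-side check that catches a mismatched voice and model before any request is sent. It encodes the table as two word lists. The helpers voice_supports_model and in_list are hypothetical and not part of the StepFun API. The lists must be kept in sync with this table by hand, and custom cloned voices are not in the lists, so the check would reject them.

```shell
#!/bin/sh
# Sketch: check a (voice ID, model) pair against the System Voice ID List.
# voice_supports_model and in_list are hypothetical helpers. Cloned voices
# are not listed here and are therefore rejected.

# Voice IDs listed for stepaudio-2.5-tts and step-tts-2 only.
TWO_MODEL_VOICES="vibrant-youth lively-girl soft-spoken-gentleman magnetic-voiced-male
zixinnansheng shuangkuainansheng ganliannvsheng qinhenvsheng huolinvsheng"

# Voice IDs listed for stepaudio-2.5-tts, step-tts-2 and step-tts-mini.
ALL_MODEL_VOICES="elegantgentle-female livelybreezy-female wenrounansheng wenrougongzi
yuanqinansheng jingdiannvsheng wenroushunv tianmeinvsheng qingchunshaonv cixingnansheng
yuanqishaonv linjiajiejie zhengpaiqingnian qingniandaxuesheng boyinnansheng ruyananshi
shenchennanyin qinqienvsheng wenrounvsheng jilingshaonv ruanmengnvsheng youyanvsheng
lengyanyujie shuangkuaijiejie wenjingxuejie linjiameimei zhixingjiejie"

# Succeeds if word $1 appears in the whitespace-separated list $2.
in_list() {
  for _item in $2; do   # $2 left unquoted on purpose: split the list into words
    [ "$_item" = "$1" ] && return 0
  done
  return 1
}

# Succeeds if the table lists voice ID $1 as supported by model $2.
voice_supports_model() {
  case $2 in
    stepaudio-2.5-tts|step-tts-2)
      in_list "$1" "$TWO_MODEL_VOICES" || in_list "$1" "$ALL_MODEL_VOICES" ;;
    step-tts-mini)
      in_list "$1" "$ALL_MODEL_VOICES" ;;
    *)
      return 1 ;;   # model not covered by this table
  esac
}
```

For example, `voice_supports_model lively-girl step-tts-mini` fails because Lively Girl is listed only for stepaudio-2.5-tts and step-tts-2, while `voice_supports_model linjiajiejie step-tts-mini` succeeds.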

Voice Tags List

Voice tags support three categories: speaking style, emotion, and language. Emotion tags must be set in the voice_label.emotion field, while speaking-style tags must be set in the voice_label.style field.
stepaudio-2.5-tts does NOT support voice tags. Use the instruction parameter for emotion and style control instead.
No. | Tag Name | Tag Type
1 | Happy | Emotion
2 | Very Happy | Emotion
3 | Sad | Emotion
4 | Angry | Emotion
5 | Very Angry | Emotion
6 | Coquettish | Emotion
7 | Slow | Speaking Style
8 | Very Slow | Speaking Style
9 | Fast | Speaking Style
10 | Very Fast | Speaking Style
11 | Fearful | Emotion
12 | Surprised | Emotion
13 | Excited | Emotion
14 | Admiring | Emotion
15 | Confused | Emotion
16 | Cold | Delivery Style
17 | Embarrassed | Delivery Style
18 | Frustrated | Delivery Style
19 | Proud | Delivery Style
20 | Tender | Delivery Style
21 | Sweet | Delivery Style
22 | Outgoing | Delivery Style
23 | Serious | Delivery Style
24 | Arrogant | Delivery Style
25 | Elderly | Delivery Style
26 | Shouting | Delivery Style
27 | Sarcastic | Delivery Style
28 | Stuttering | Delivery Style
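The field rules stated before this list can be expressed as a small request builder. The following is a minimal sketch, and it rests on several assumptions. First, the body is assumed to nest the tag as {"voice_label":{"emotion":"..."}}, inferred from the field name voice_label.emotion. Second, the tag value is assumed to be the English tag name from the table, such as "Happy". Check the API reference for the exact accepted strings. The sketch also sets only one field per request and covers only the emotion and style fields named above. The helper build_labeled_payload is hypothetical.

```shell
#!/bin/sh
# Sketch: add one voice tag to a request body.
# ASSUMPTIONS: the tag is nested as voice_label.<field>, and the value is the
# English tag name from the table (e.g. "Happy"); verify both against the
# API reference. build_labeled_payload is a hypothetical helper.

# $1 model, $2 input text, $3 voice ID, $4 field (emotion|style), $5 tag name.
# Input text is assumed not to need JSON escaping.
build_labeled_payload() {
  if [ "$1" = "stepaudio-2.5-tts" ]; then
    # This model does not support voice tags at all.
    echo "stepaudio-2.5-tts does not support voice tags; use the instruction parameter" >&2
    return 1
  fi
  case $4 in
    emotion|style) ;;   # the two voice_label fields named in this section
    *) echo "unknown voice_label field: $4" >&2; return 1 ;;
  esac
  printf '{"model":"%s","input":"%s","voice":"%s","voice_label":{"%s":"%s"}}' \
    "$1" "$2" "$3" "$4" "$5"
}
```

For stepaudio-2.5-tts the sketch deliberately refuses to build a body, because the text above directs emotion and style control to the instruction parameter instead.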

Output Format

StepFun TTS models support audio output in wav, mp3, flac, opus, and pcm formats. The default format is mp3. You can choose the format that best suits your use case.
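Because mp3 is the default, any other format has to be requested explicitly. This section does not name the request field for it. The following sketch assumes the field is called response_format, as in OpenAI-compatible speech APIs, so confirm the name against the API reference before relying on it. The helper build_format_payload is hypothetical.

```shell
#!/bin/sh
# Sketch: request a specific output format.
# ASSUMPTION: the request field is named response_format (not stated in this
# section). build_format_payload is a hypothetical helper.

SUPPORTED_FORMATS="wav mp3 flac opus pcm"

# $1 model, $2 input text, $3 voice ID, $4 format (defaults to mp3).
# Input text is assumed not to need JSON escaping.
build_format_payload() {
  fmt=${4:-mp3}
  case " $SUPPORTED_FORMATS " in
    *" $fmt "*) ;;   # format is in the documented list
    *) echo "unsupported format: $fmt" >&2; return 1 ;;
  esac
  printf '{"model":"%s","input":"%s","voice":"%s","response_format":"%s"}' \
    "$1" "$2" "$3" "$fmt"
}
```

Pairing `--data "$(build_format_payload step-tts-2 "$TEXT" lively-girl wav)"` with `--output step.wav` keeps the file extension in line with the requested format. Raw pcm output has no container header, so most media players cannot open it directly.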

Output Languages

StepFun TTS models support generating audio in Chinese, English, mixed Chinese-English, and Japanese.

FAQ

Do I own the audio I generate?
Yes. You own the audio you create. However, we recommend informing your end users that the audio was generated by AI so they are aware of its nature.

How do I adjust the volume of the generated audio?
You can set the volume parameter when calling the generation API. Valid values range from 0.1 to 2.0, representing 10% to 200% volume.

How do I adjust the speaking rate of the generated audio?
You can set the speed parameter when calling the generation API. Valid values range from 0.5 to 2.0, representing half speed to double speed.
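Both parameters have bounded ranges, so a client can reject out-of-range values before making a request. The following is a minimal sketch that assumes speed and volume are top-level JSON fields next to model, input and voice. Shell arithmetic is integer-only, so the decimal comparison is delegated to awk. The helpers in_range and build_tuned_payload are hypothetical.

```shell
#!/bin/sh
# Sketch: validate speed (0.5 to 2.0) and volume (0.1 to 2.0) before building
# a request body. ASSUMPTION: both are top-level fields of the JSON body.
# in_range and build_tuned_payload are hypothetical helpers.

# Succeeds if $1 is a plain decimal number with $2 <= $1 <= $3.
in_range() {
  case $1 in
    ''|*[!0-9.]*|*.*.*|.*|*.|0[0-9]*) return 1 ;;  # reject empty, signs, letters, malformed numbers
  esac
  awk -v v="$1" -v lo="$2" -v hi="$3" 'BEGIN { exit !(v + 0 >= lo + 0 && v + 0 <= hi + 0) }'
}

# $1 model, $2 input text, $3 voice ID, $4 speed, $5 volume.
# Input text is assumed not to need JSON escaping.
build_tuned_payload() {
  in_range "$4" 0.5 2.0 || { echo "speed must be between 0.5 and 2.0" >&2; return 1; }
  in_range "$5" 0.1 2.0 || { echo "volume must be between 0.1 and 2.0" >&2; return 1; }
  printf '{"model":"%s","input":"%s","voice":"%s","speed":%s,"volume":%s}' \
    "$1" "$2" "$3" "$4" "$5"
}
```

For example, `build_tuned_payload step-tts-2 "$TEXT" lively-girl 1.25 0.8` produces a body at 1.25× speed and 80% volume, while a speed of 2.5 is rejected with a message on standard error.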