Recommended
2
Indexed models
3
Max context
256K
4 category views covering 3 public models
- Recommended models
- All models
- Text & reasoning
- Audio
Recommended
Recommended models
Step 3.7 Flash
Flagship multimodal reasoning
Max context
256K
StepFun’s flagship multimodal reasoning model. Building on step-3.5-flash’s high-throughput reasoning and tool calling, it adds native multimodal input — understanding images and videos directly, without an additional vision MCP or auxiliary model. Three reasoning effort levels (low / medium / high) make it a fast and dependable choice for agent, coding, and multimodal workloads.
step-3.5-flash
Flagship reasoning
Max context
256K
A flagship reasoning model built for agents. Its reasoning depth rivals leading closed-source models while also delivering ultra-fast responses and stable, reliable tool calling. On top of strong general reasoning, it excels at complex project planning and long-horizon task execution.
stepaudio-2.5-tts
Contextual TTS
Max context
≤10000 chars
The first model to integrate contextual understanding into the full speech generation pipeline. Supports Global Context + Inline Context dual-level control via natural language descriptions for precise emotion and style control. Ideal for audiobooks, drama dubbing, ad narration, and other high-expressiveness scenarios.