Recommended: 6
Indexed models: 19
Max context: 256K
5 category views covering 19 public models
- Recommended models
- All models
- Text & reasoning
- Vision
- Audio
Recommended models
step-3.5-flash
Flagship reasoning
Max context: 256K
A flagship reasoning model built for agents. Its reasoning depth rivals that of leading closed-source models, and it also delivers ultra-fast responses and reliable tool calling. Beyond strong general reasoning, it excels at complex project planning and long-horizon task execution.
step-3
Multimodal reasoning
Max context: 64K
Combines visual perception with complex reasoning for cross-modal analysis and knowledge-intensive tasks.
step-2-mini
High-speed text
Max context: 32K
An ultra-fast model built on MFA (Multi-matrix Factorization Attention) that delivers strong general-task and coding performance at lower cost.
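The context limits on the three cards above suggest a simple routing rule: send each request to the model with the smallest window that still fits it. The following is a minimal sketch of that idea, not an official selection method. It assumes "K" means 1,024 tokens (the page does not say) and that the prompt's token count is already known; `MAX_CONTEXT` and `pick_model` are hypothetical names and are not part of any SDK.

```python
# Context limits copied from the cards above.
# Assumption: "K" means 1,024 tokens; the page does not define it.
MAX_CONTEXT = {
    "step-3.5-flash": 256 * 1024,
    "step-3": 64 * 1024,
    "step-2-mini": 32 * 1024,
}


def pick_model(prompt_tokens: int, reserve_for_output: int = 1024) -> str:
    """Return the model with the smallest context window that fits the request.

    Hypothetical helper: it compares window sizes only and ignores
    modality, cost, and answer quality.
    """
    needed = prompt_tokens + reserve_for_output
    candidates = [
        (limit, name) for name, limit in MAX_CONTEXT.items() if limit >= needed
    ]
    if not candidates:
        raise ValueError(f"{needed} tokens exceeds every listed context window")
    # Tuples sort by limit first, so min() yields the tightest fit.
    return min(candidates)[1]


print(pick_model(8_000))    # fits the 32K window
print(pick_model(50_000))   # needs the 64K window
print(pick_model(200_000))  # needs the 256K window
```

In practice the choice would also weigh the capability notes on each card, for example whether the request includes images.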
Related entry points
step-1o-turbo-vision
Recommended vision
Max context: 32K
The recommended vision model for image and video understanding, with a lighter footprint and faster output.
step-tts-mini
Expressive TTS
Max context: ≤1000 chars
A TTS model focused on emotional expressiveness and controllable style, suited to multi-emotion speech output and voice cloning.
step-tts-vivid
High-fidelity TTS
Max context: ≤1000 chars
A speech-synthesis model optimized for highly human-like, realistic output, particularly in outbound-call scenarios.
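Because both TTS models list a limit of ≤1000 chars, longer scripts have to be split before synthesis. The sketch below shows one way to do that, preferring sentence boundaries and hard-wrapping only when a single sentence is too long. It assumes the limit counts Unicode characters as Python's `len()` does, which the page does not confirm; `split_for_tts` is a hypothetical helper, and the actual synthesis request is left out.

```python
import re

MAX_TTS_CHARS = 1000  # limit listed on the step-tts-mini and step-tts-vivid cards


def split_for_tts(text: str, limit: int = MAX_TTS_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters.

    Sentences are kept whole where possible and rejoined with a single space.
    """
    # Split after Latin sentence punctuation followed by whitespace,
    # or directly after CJK sentence punctuation (zero-width split).
    pattern = r"(?<=[.!?])\s+|(?<=[。！？])"
    sentences = [s.strip() for s in re.split(pattern, text) if s.strip()]

    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Hard-wrap a single sentence that is itself longer than the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


script = "Thank you for calling. Your order has shipped. " * 40
for i, chunk in enumerate(split_for_tts(script), start=1):
    print(i, len(chunk))
```

Each chunk can then be sent as a separate synthesis request and the resulting audio concatenated in order.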