> ## Documentation Index
> Fetch the complete documentation index at: https://platform.stepfun.ai/docs/llms.txt
> Use this file to discover all available pages before exploring further.

# All Reasoning Models

## Model overview

Our reasoning models are built for deep analytical work: logical reasoning, math, coding, and long-running agent tasks.

## Models

### Step 3.7 Flash

Recommended. Our flagship multimodal reasoning model. Building on the high-throughput reasoning and tool-calling capabilities of `step-3.5-flash`, it adds **native multimodal input**: it understands images and videos directly, without an additional vision MCP or auxiliary model. It is powered by a sparse MoE architecture with 198B total parameters and 11B activated parameters, and it offers three reasoning effort levels (low / medium / high). A fast and dependable model for agent, coding, and multimodal workloads. 256K context.

[View detailed documentation →](/en/guides/models/step-3.7-flash)

### Step 3.5 Flash

Text-only reasoning. Our flagship language reasoning model. It delivers top-tier reasoning quality and fast, reliable execution: it decomposes and plans complex tasks and reliably orchestrates tool calls. Suitable for logical reasoning, math, software engineering, deep research, and other complex workloads. 256K context.

## Context length

Context length is the amount of text a model can "look back" on and consider when generating a response. A longer context lets the model use more history, which improves coherence and accuracy. The limit covers input and output combined: the total number of tokens (not characters) in a request and its response cannot exceed the model's context window.

| Model          | Context length |
| -------------- | -------------- |
| Step 3.7 Flash | 256K           |
| Step 3.5 Flash | 256K           |

## Quickstart

- See recommended prompting and usage patterns for complex reasoning workloads.
- Switch existing OpenAI-compatible integrations to StepFun with minimal code changes.
- Store message history and pass context back to the model for continuous dialogue.
- Return machine-parseable JSON so model output can plug into application logic.
- Stream tokens to the UI as they are generated for a faster perceived response.
- Let the model invoke tools and external systems to complete real tasks.
- Reuse repeated context to reduce cost and latency across similar requests.
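
The context-length rule described earlier on this page (input and output tokens together must fit within the model's window) can be illustrated with a small budget check. This is a minimal sketch, not an official client: the helper names are hypothetical, the model ID `step-3.7-flash` is inferred from the documentation link rather than confirmed, and "256K" is assumed to mean 256 × 1024 tokens, which this page does not state exactly.

```python
# Assumption: "256K" is taken as 256 * 1024 tokens; the exact figure is not
# given on this page. The model IDs are illustrative keys for this sketch.
CONTEXT_WINDOW_TOKENS = {
    "step-3.7-flash": 256 * 1024,
    "step-3.5-flash": 256 * 1024,
}


def remaining_output_budget(model: str, input_tokens: int) -> int:
    """Return how many output tokens still fit after `input_tokens` of input."""
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    window = CONTEXT_WINDOW_TOKENS[model]
    # The window is shared by input and output, so output gets what is left.
    return max(window - input_tokens, 0)


def fits_in_context(model: str, input_tokens: int, max_output_tokens: int) -> bool:
    """Check that input plus requested output stays within the context window."""
    return input_tokens + max_output_tokens <= CONTEXT_WINDOW_TOKENS[model]


if __name__ == "__main__":
    print(remaining_output_budget("step-3.5-flash", 200_000))  # 62144
    print(fits_in_context("step-3.7-flash", 250_000, 16_000))  # False
```

A check like this could run before each request, so that long conversation histories are trimmed or summarized before they crowd out the space needed for the response. Token counts would have to come from a tokenizer or from the usage figures of earlier responses; character counts are not a substitute.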