Skip to main content

Documentation Index

Fetch the complete documentation index at: https://platform.stepfun.ai/docs/llms.txt

Use this file to discover all available pages before exploring further.

StepFunโ€™s flagship multimodal reasoning model. Powered by a 198B-parameter / 11B-activation sparse MoE architecture, with native support for image and video understanding.

Key facts

Model type

Sparse MoE architecture
198B total params / 11B activated params

Context length

256K tokens

Best for

High-throughput reasoning + native multimodal
Optimized for agent and coding workloads

Core capabilities

๐Ÿ‘๏ธ Native multimodal

Native support for image and video understanding. Drop a file straight into the chat โ€” no separate vision model required inside your Agent framework.

๐Ÿš€ High-throughput reasoning

Sparse MoE architecture delivers high throughput and low latency, ideal for real-time agent workflows and high-volume calls.

๐Ÿ› ๏ธ Tool calling

Reliable tools / tool_choice orchestration, supports multi-step task decomposition and plan execution.

๐Ÿง  Complex reasoning

Handles logical reasoning, math, software engineering, and deep research โ€” a dependable foundation for long-chain agent reasoning.

Reasoning effort

step-3.7-flash supports three reasoning effort levels โ€” pick one to match task complexity:
EffortBest for
lowSimple Q&A, summarization, rewriting, information extraction
mediumDefault. General reasoning and multi-step tasks
highComplex reasoning, math, planning, code analysis
The Chat Completions API uses reasoning_effort to control the effort level; the Messages API uses output_config.effort. See the Quickstart for full call examples.

Get started

Multimodal quickstart

Get started with images, video, local files, and reasoning-effort control.

Cookbook

Task templates for whiteboard-to-plan, chart-to-data, receipt-to-table, and more.

Mobile Agent

Connect to a real Android device via GELab-Zero and let the model plan operations from screenshots.

Chat Completion

POST /v1/chat/completions
OpenAI-compatible, with streaming and tool calling.

Pricing

ItemPrice (per million tokens)
Input (cache hit)$0.04
Input (cache miss)$0.20
Output$1.15

Framework support

step-3.7-flash plugs reliably into popular Coding and Agent tools, well-suited to code generation, file editing, and complex task coordination in the terminal, IDE, or Agent workflows. View Step Plan integration guide โ†’

Reasoning model guide

Recommended usage of reasoning models for complex tasks, tool calling, and long contexts.

Image understanding best practices

A deeper look at image understanding API params, the detail setting, and best practices.

Video understanding best practices

A deeper look at video understanding API params, file limits, and common pitfalls.