Overview - StepFun Documentation

StepFun’s flagship multimodal reasoning model. Powered by a 198B-parameter / 11B-activation sparse MoE architecture, with native support for image and video understanding.

Key facts

Model type

Sparse MoE architecture
198B total params / 11B activated params

Context length

256K tokens

Best for

High-throughput reasoning + native multimodal
Optimized for agent and coding workloads

Core capabilities

👁️ Native multimodal

Native support for image and video understanding. Drop a file straight into the chat — no separate vision model required inside your Agent framework.

🚀 High-throughput reasoning

Sparse MoE architecture delivers high throughput and low latency, ideal for real-time agent workflows and high-volume calls.

🛠️ Tool calling

Reliable tools / tool_choice orchestration, supports multi-step task decomposition and plan execution.

🧠 Complex reasoning

Handles logical reasoning, math, software engineering, and deep research — a dependable foundation for long-chain agent reasoning.

Reasoning effort

step-3.7-flash supports three reasoning effort levels — pick one to match task complexity:

Effort	Best for
`low`	Simple Q&A, summarization, rewriting, information extraction
`medium`	Default. General reasoning and multi-step tasks
`high`	Complex reasoning, math, planning, code analysis

The Chat Completions API uses reasoning_effort to control the effort level; the Messages API uses output_config.effort. See the Quickstart for full call examples.

Get started

Multimodal quickstart

Get started with images, video, local files, and reasoning-effort control.

Cookbook

Task templates for whiteboard-to-plan, chart-to-data, receipt-to-table, and more.

Mobile Agent

Connect to a real Android device via GELab-Zero and let the model plan operations from screenshots.

Chat Completion

POST /v1/chat/completions
OpenAI-compatible, with streaming and tool calling.

Pricing

Item	Price (per million tokens)
Input (cache hit)	$0.04
Input (cache miss)	$0.20
Output	$1.15

Framework support

step-3.7-flash plugs reliably into popular Coding and Agent tools, well-suited to code generation, file editing, and complex task coordination in the terminal, IDE, or Agent workflows. View Step Plan integration guide →

Reasoning model guide

Recommended usage of reasoning models for complex tasks, tool calling, and long contexts.

Image understanding best practices

A deeper look at image understanding API params, the detail setting, and best practices.

Video understanding best practices

A deeper look at video understanding API params, file limits, and common pitfalls.

​Key facts

Model type

Context length

Best for

​Core capabilities

👁️ Native multimodal

🚀 High-throughput reasoning

🛠️ Tool calling

🧠 Complex reasoning

​Reasoning effort

​Get started

Multimodal quickstart

Cookbook

Mobile Agent

Chat Completion

​Pricing

​Framework support

​Related reading

Reasoning model guide

Image understanding best practices

Video understanding best practices

Key facts

Core capabilities

Reasoning effort

Get started

Pricing

Framework support

Related reading