Step 3.5 Flash - StepFun Documentation

step-3.5-flash is StepFun’s flagship language reasoning model. It combines top-tier reasoning quality with fast execution: it decomposes and plans complex tasks and reliably orchestrates tool calls. It is suited to logical reasoning, math, software engineering, deep research, and other complex workloads, and is built on a sparse MoE architecture with 196B total parameters and 11B activated parameters.

Key facts

Model type

Sparse MoE architecture
196B total params / 11B activated params

Context length

256K tokens

Best for

High-throughput reasoning + tool calling
Optimized for agent and coding workloads

Core capabilities

🚀 High-throughput reasoning

Sparse MoE architecture delivers high throughput and low latency, ideal for real-time agent workflows and high-volume calls.

🛠️ Tool calling

Reliable orchestration through the tools and tool_choice request parameters, with support for multi-step task decomposition and plan execution.
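A minimal sketch of what a tools / tool_choice request might look like, assuming the endpoint accepts the standard OpenAI function-calling format. The `get_weather` tool, its schema, and the helper names `build_tool_request`, `parse_tool_arguments`, and `run_example` are hypothetical and exist only for illustration.

```python
import json

# Hypothetical tool definition, written in the OpenAI function-calling format.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def build_tool_request(question: str) -> dict:
    """Assemble keyword arguments for client.chat.completions.create()."""
    return {
        "model": "step-3.5-flash",
        "messages": [{"role": "user", "content": question}],
        "tools": [WEATHER_TOOL],
        "tool_choice": "auto",  # let the model decide whether to call the tool
    }


def parse_tool_arguments(raw_arguments: str) -> dict:
    """Tool-call arguments arrive as a JSON string; decode them into a dict."""
    return json.loads(raw_arguments)


def run_example() -> None:
    """Send the request; needs the openai package and STEP_API_KEY to be set."""
    import os
    from openai import OpenAI  # imported lazily so the helpers work without the SDK

    client = OpenAI(
        api_key=os.environ["STEP_API_KEY"],
        base_url="https://api.stepfun.ai/v1",
    )
    response = client.chat.completions.create(
        **build_tool_request("What is the weather in Shanghai?")
    )
    # tool_calls is None when the model answers directly instead of calling a tool.
    for call in response.choices[0].message.tool_calls or []:
        print(call.function.name, parse_tool_arguments(call.function.arguments))
```

Calling `run_example()` sends the request. In a real agent loop, each tool result would be appended to `messages` as a `tool` role message before asking the model to continue.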

🧠 Complex reasoning

Handles logical reasoning, math, software engineering, and deep research — a dependable foundation for long-chain agent reasoning.

Model variants

step-3.5-flash

Base version: general-purpose reasoning and tool calling, well suited to most agent and complex-task scenarios.

step-3.5-flash-2603

Agent-optimized version: tuned from step-3.5-flash for high-frequency agent scenarios. Better token efficiency and faster inference, with an optional low-reasoning mode that significantly reduces token consumption. Improved compatibility with coding workflows and agent frameworks. Supports the reasoning_effort field (low / high).
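To illustrate the low-reasoning mode, here is a hedged sketch that builds a request body for step-3.5-flash-2603 and posts it with the Python standard library. It assumes `reasoning_effort` is sent as a top-level field of the JSON body; the helper names `build_agent_request` and `send_request` are hypothetical.

```python
import json
import os
import urllib.request

API_URL = "https://api.stepfun.ai/v1/chat/completions"


def build_agent_request(prompt: str, effort: str = "low") -> dict:
    """Build a chat-completions body for the agent-optimized variant."""
    if effort not in ("low", "high"):
        raise ValueError("reasoning_effort must be 'low' or 'high'")
    return {
        "model": "step-3.5-flash-2603",
        "messages": [{"role": "user", "content": prompt}],
        # Assumption: the field sits at the top level of the request body.
        "reasoning_effort": effort,
    }


def send_request(body: dict) -> dict:
    """POST the body to the endpoint; requires STEP_API_KEY in the environment."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ['STEP_API_KEY']}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

For example, `send_request(build_agent_request("Rename this variable across the file.", effort="low"))` would favor lower token consumption, while `effort="high"` is the natural choice for harder multi-step problems.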

API endpoint

Chat Completion

POST /v1/chat/completions
OpenAI-compatible, with streaming and tool calling.
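As a sketch of how streaming might be consumed through the OpenAI SDK, assuming the endpoint emits chunks in the standard OpenAI streaming shape: the helper names `collect_stream_text` and `stream_reply` are hypothetical.

```python
from typing import Iterable


def collect_stream_text(chunks: Iterable) -> str:
    """Concatenate the text deltas from a stream of chat-completion chunks."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:  # some chunks (e.g. usage-only) carry no choices
            continue
        delta = chunk.choices[0].delta
        if delta.content:  # content is None on role-only or tool-call deltas
            parts.append(delta.content)
    return "".join(parts)


def stream_reply(prompt: str) -> str:
    """Stream a reply; needs the openai package and STEP_API_KEY to be set."""
    import os
    from openai import OpenAI  # imported lazily so the helper above works offline

    client = OpenAI(
        api_key=os.environ["STEP_API_KEY"],
        base_url="https://api.stepfun.ai/v1",
    )
    stream = client.chat.completions.create(
        model="step-3.5-flash",
        messages=[{"role": "user", "content": prompt}],
        stream=True,  # ask for incremental chunks instead of one response
    )
    return collect_stream_text(stream)
```

A real-time interface would usually print each delta as it arrives rather than joining them at the end; collecting them here keeps the sketch easy to check.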

Pricing

Item	Price (per million tokens)
Input (cache miss)	$0.10
Input (cache hit)	$0.02
Output	$0.30

See full pricing details →
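The per-million-token prices above translate into a simple cost estimate. The sketch below uses `Decimal` to avoid floating-point rounding; the function name `estimate_cost` is hypothetical, and it assumes you already know how your input tokens split between cache hits and misses.

```python
from decimal import Decimal

# Prices in USD per million tokens, copied from the pricing table.
PRICE_INPUT_MISS = Decimal("0.10")
PRICE_INPUT_HIT = Decimal("0.02")
PRICE_OUTPUT = Decimal("0.30")
TOKENS_PER_UNIT = Decimal(1_000_000)


def estimate_cost(input_miss_tokens: int, input_hit_tokens: int, output_tokens: int) -> Decimal:
    """Return the estimated cost in USD for the given token counts."""
    total = (
        input_miss_tokens * PRICE_INPUT_MISS
        + input_hit_tokens * PRICE_INPUT_HIT
        + output_tokens * PRICE_OUTPUT
    )
    return total / TOKENS_PER_UNIT
```

For instance, 800,000 cache-miss input tokens, 200,000 cache-hit input tokens, and 100,000 output tokens come to 0.08 + 0.004 + 0.03 = 0.114 USD.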

Quickstart

curl

curl https://api.stepfun.ai/v1/chat/completions \
  -H "Authorization: Bearer $STEP_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "step-3.5-flash",
    "messages": [
      {"role": "user", "content": "Hello, please introduce yourself."}
    ]
  }'

Python (OpenAI SDK)

import os

from openai import OpenAI

# Read the key from the environment, matching $STEP_API_KEY in the curl example.
client = OpenAI(
    api_key=os.environ["STEP_API_KEY"],
    base_url="https://api.stepfun.ai/v1",
)

response = client.chat.completions.create(
    model="step-3.5-flash",
    messages=[
        {"role": "user", "content": "Hello, please introduce yourself."}
    ],
)

print(response.choices[0].message.content)

Reasoning model guide

Recommended usage of reasoning models for complex tasks, tool calling, and long contexts.

Step 3.7 Flash — multimodal flagship

Building on Step 3.5 Flash with native image and video understanding.
