Step 3.5 Flash

step-3.5-flash is our flagship reasoning model, designed for high-complexity tasks requiring deep logic and fast execution. It features:
  • Mixture-of-Experts (MoE) Architecture: Combines a 196B-parameter knowledge base with sparse activation (roughly 11B active parameters per token) to deliver the logical depth of ultra-large models while keeping inference fast.
  • 256K Long Context: Maintains logical consistency when processing massive datasets or long documents, making it ideal for multi-stage reasoning and research workflows.
  • Native Agent Capabilities: Excels at tool call orchestration, multi-step problem decomposition, and long-context agent development, making it the preferred foundation for engineering and automation workloads.
  • Extreme Efficiency: Optimized for production throughput and cost-effective deployment without compromising on cutting-edge reasoning performance.

Chat Completion Example

The following code demonstrates how to use the step-3.5-flash model for logical reasoning.
import time
from openai import OpenAI

# Set your API Key and Base URL
BASE_URL = "https://api.stepfun.ai/v1"
STEP_API_KEY = "YOUR_STEPFUN_API_KEY"

# Select Model
COMPLETION_MODEL = "step-3.5-flash"

# User Prompt
user_prompt = "How many 'r's are in the word strawberry?"

client = OpenAI(api_key=STEP_API_KEY, base_url=BASE_URL)

time_start = time.time()

try:
    response = client.chat.completions.create(
        model=COMPLETION_MODEL,
        messages=[
            {"role": "user", "content": user_prompt}
        ],
        stream=True
    )
except Exception as e:
    print("Exception occurred when requesting API:", e)
    raise SystemExit(1)

print("Reasoning Process:")
try:
    for chunk in response:
        # Check for reasoning content
        if hasattr(chunk.choices[0].delta, 'reasoning') and chunk.choices[0].delta.reasoning:
            print(chunk.choices[0].delta.reasoning, end='', flush=True)
        # Check for standard content
        elif chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end='', flush=True)

except Exception as e:
    print("\\nError occurred while processing streaming results:", e)

time_end = time.time()
print(f"\\n\\nTotal generation time: {time_end - time_start:.2f} seconds")
For input parameter details, please refer to the Chat Completion Documentation

Obtaining Reasoning Content

When StepFun’s reasoning models handle complex problems, they include a reasoning field in the output that exposes the model’s thinking process. Developers can check for the presence of this field to retrieve that reasoning content.
if chunk.choices[0].delta.reasoning:
    reasoning = chunk.choices[0].delta.reasoning
    print("Model thinking process:", reasoning)
For non-streaming scenarios, you can directly extract the reasoning field to get the model’s thinking process.
msg = completion.choices[0].message.content
reasoning = completion.choices[0].message.reasoning
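Since the reasoning field may be absent on models that do not emit a thinking process, a defensive extraction helper is a useful pattern. The sketch below uses a stand-in object in place of a live API response so it runs without network access; the field names mirror the examples above.

```python
from types import SimpleNamespace

def extract_reasoning(completion):
    """Return (answer, reasoning) from a chat completion,
    tolerating models that omit the reasoning field."""
    message = completion.choices[0].message
    answer = message.content
    # getattr with a default avoids AttributeError when reasoning is missing
    reasoning = getattr(message, "reasoning", None)
    return answer, reasoning

# Stand-in for a non-streaming client.chat.completions.create(...) result
mock = SimpleNamespace(choices=[SimpleNamespace(message=SimpleNamespace(
    content="There are 3 'r's in strawberry.",
    reasoning="Spelling it out: s-t-r-a-w-b-e-r-r-y has r at positions 3, 8, 9."))])

answer, reasoning = extract_reasoning(mock)
print("Model thinking process:", reasoning)
print("Answer:", answer)
```

With a real response object from the client, the same helper applies unchanged; when `reasoning` is `None`, the model returned only final content.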

Notes

  • JSON Mode Limitation: The current version does not yet support JSON mode.
  • Error Handling and Logging: A Trace ID is added to model outputs. Please include this ID when reporting any issues with reasoning behavior.