Reasoning Model Development Guide
Step 3.5 Flash
step-3.5-flash is our flagship reasoning model, designed for high-complexity tasks requiring deep logic and fast execution. It features:
- Mixture-of-Experts (MoE) Architecture: Combines a 196B-parameter model with sparse activation (roughly 11B parameters activated per token) to deliver the logical depth of ultra-large models while keeping inference fast.
- 256K Long Context: Maintains logical consistency when processing massive datasets or long documents, making it ideal for multi-stage reasoning and research workflows.
- Native Agent Capabilities: Excels at tool call orchestration, multi-step problem decomposition, and long-context agent development, making it the preferred foundation for engineering and automation workloads.
- Extreme Efficiency: Optimized for production throughput and cost-effective deployment without compromising on cutting-edge reasoning performance.
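To illustrate the agent capabilities above, the sketch below shows a minimal local tool dispatcher: it takes a tool call shaped like the `message.tool_calls` entries of an OpenAI-compatible response and routes it to a Python function. The `get_weather` tool, its schema, and the sample payload are illustrative assumptions, not part of the StepFun API.

```python
import json

# Hypothetical local tool the model might call; the name and behavior
# are illustrative assumptions, not part of the StepFun API.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}

def dispatch_tool_call(tool_call: dict) -> dict:
    """Route one tool call (shaped like an entry of message.tool_calls)
    to a local function and build the 'tool' role message to send back."""
    name = tool_call["function"]["name"]
    args = json.loads(tool_call["function"]["arguments"])
    result = TOOLS[name](**args)
    return {
        "role": "tool",
        "tool_call_id": tool_call["id"],
        "content": result,
    }

# Example tool call shaped like an OpenAI-compatible response entry
call = {
    "id": "call_0",
    "function": {"name": "get_weather", "arguments": '{"city": "Shanghai"}'},
}
print(dispatch_tool_call(call)["content"])  # → Sunny in Shanghai
```

In a real agent loop, the returned `"tool"` message would be appended to `messages` and sent back to the model for the next reasoning step.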
Chat Completion Example
The following code demonstrates how to use the step-3.5-flash model for logical reasoning.
```python
import time

from openai import OpenAI

# Set your API Key and Base URL
BASE_URL = "https://api.stepfun.com/v1"
STEP_API_KEY = "YOUR_STEPFUN_API_KEY"

# Select the model
COMPLETION_MODEL = "step-3.5-flash"

# User prompt
user_prompt = "How many 'r's are in the word strawberry?"

client = OpenAI(api_key=STEP_API_KEY, base_url=BASE_URL)

time_start = time.time()
try:
    response = client.chat.completions.create(
        model=COMPLETION_MODEL,
        messages=[
            {"role": "user", "content": user_prompt}
        ],
        stream=True,
    )
except Exception as e:
    print("Exception occurred when requesting API:", e)
    exit(1)

print("Reasoning Process:")
try:
    for chunk in response:
        # Reasoning tokens arrive in the `reasoning` delta field
        if hasattr(chunk.choices[0].delta, 'reasoning') and chunk.choices[0].delta.reasoning:
            print(chunk.choices[0].delta.reasoning, end='', flush=True)
        # The final answer arrives in the standard `content` field
        elif chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end='', flush=True)
except Exception as e:
    print("\nError occurred while processing streaming results:", e)

time_end = time.time()
print(f"\n\nTotal generation time: {time_end - time_start:.2f} seconds")
```

For input parameter details, please refer to the Chat Completion Documentation.
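If you need the reasoning trace and the final answer as separate strings (for logging or display), you can accumulate the two delta fields into separate buffers. The sketch below simulates the stream with plain objects, since the field layout mirrors the chunks in the example above.

```python
from types import SimpleNamespace

def split_stream(chunks):
    """Accumulate streamed deltas into separate (reasoning, answer)
    buffers, mirroring the reasoning/content fields used above."""
    reasoning, answer = [], []
    for chunk in chunks:
        delta = chunk.choices[0].delta
        if getattr(delta, "reasoning", None):
            reasoning.append(delta.reasoning)
        elif getattr(delta, "content", None):
            answer.append(delta.content)
    return "".join(reasoning), "".join(answer)

# Simulated chunks shaped like the streaming response above
def fake_chunk(reasoning=None, content=None):
    delta = SimpleNamespace(reasoning=reasoning, content=content)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])

chunks = [fake_chunk(reasoning="Count the r's... "),
          fake_chunk(content="There are 3 'r's.")]
print(split_stream(chunks))
```

In production you would iterate over the real `response` object instead of the simulated chunks; the accumulation logic is unchanged.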
Obtaining Reasoning Content
When StepFun's reasoning models handle complex problems, they include a reasoning field in the output that exposes the model's thinking process. Developers can check for this field to retrieve the model's reasoning content.
```python
if chunk.choices[0].delta.reasoning:
    reasoning = chunk.choices[0].delta.reasoning
    print("Model thinking process:", reasoning)
```

For non-streaming scenarios, you can directly extract the reasoning field to get the model's thinking process.
```python
msg = completion.choices[0].message.content
reasoning = completion.choices[0].message.reasoning
```

Notes
- JSON Mode Limitation: The current version does not yet support JSON mode.
- Error Handling and Logging: A Trace ID is added to model outputs. Please include this ID when reporting any issues with reasoning behavior.
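Until JSON mode is available, a common workaround is to ask for JSON in the prompt and validate the reply yourself. The helper below is a sketch of that pattern: it strips a markdown code fence the model may wrap around the JSON, then parses with `json.loads`. The sample reply string is an illustrative assumption.

```python
import json

def parse_json_reply(text: str):
    """Best-effort extraction of a JSON object from a model reply.
    Strips a markdown code fence the model may add around the JSON."""
    cleaned = text.strip()
    if cleaned.startswith("```"):
        cleaned = cleaned.strip("`")
        # Drop an optional language tag like 'json' on the first line
        first_newline = cleaned.find("\n")
        if first_newline != -1 and cleaned[:first_newline].strip().isalpha():
            cleaned = cleaned[first_newline + 1:]
    return json.loads(cleaned)

reply = '```json\n{"answer": 3}\n```'
print(parse_json_reply(reply))  # → {'answer': 3}
```

Parsing failures (`json.JSONDecodeError`) can then be handled by retrying the request or re-prompting the model.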