Documentation Index
Fetch the complete documentation index at: https://platform.stepfun.ai/docs/llms.txt
Use this file to discover all available pages before exploring further.
step-3.7-flash’s multimodal understanding works well for mobile GUI Agent scenarios: the model reads screenshots, task descriptions, and history, decides the next action, and then the runtime executes that action on a real Android device.
Today we recommend GELab-Zero for running this end-to-end. GELab-Zero is a GUI Agent runtime for Android — it handles device connection, screenshot capture, multimodal model calls, action execution, and logging.
Mobile Agent involves a real device, ADB, model calls, and a local runtime — best treated as an advanced scenario. For your first run, get the Multimodal Quickstart working first to confirm your API key and model calls are healthy.
How it works
GELab-Zero strings model decisions and device actions into a complete loop:- You give the Agent a natural-language task.
- Your computer connects to the Android phone via ADB and captures a screenshot.
- The runtime sends the current screenshot, action history, and task description to the multimodal model.
- The model outputs the next action — e.g.
AWAKE,CLICK,TYPE,SLIDE. - The runtime executes the action on the phone.
- Every step’s screenshot, action, and model output is logged.
- After the task completes, you can replay the full run from the visualization UI by
Session ID.
Prerequisites
You need:- An Android phone with Developer Mode and USB Debugging enabled
- ADB / platform-tools
- Python 3.12+
- The GELab-Zero repo and its dependencies
- A valid Step API key
unauthorized, accept the USB debugging prompt on the phone.
Install GELab-Zero
Configure the model service
GELab-Zero reads model configuration frommodel_config.yaml and makes requests via the OpenAI-compatible interface. First, configure the Step API in model_config.yaml:
examples/run_single_task_state_compress.py, point the model provider at stepfun and use step-3.7-flash:
The GELab-Zero minimal-run guide recommends keeping
temperature at 1 — don’t change it to 0.1 or 0.5. For long-horizon tasks, enable state compression to prevent the history context from growing unbounded.Run a mobile task
Confirm the phone is still online:Session ID, per-step duration, current action, and total elapsed time. Task logs are written by default to:
running_log/server_log/os-copilot-local-eval-logs/tracesrunning_log/server_log/os-copilot-local-eval-logs/images
Review the run
GELab-Zero ships a local visualization UI for inspecting each step’s screenshot, model thinking, and action result:Session ID from the task terminal and paste it into the page’s input box to replay the entire run.
Reference
GELab-Zero minimal-run guide
Android device setup, State Compress entry points, recommended parameters, and visualization usage.