Step Image Edit 2 - StepFun Documentation

StepFun’s latest lightweight image model. A single model handles both text-to-image generation and image editing. With fewer than 6B parameters, it sets the performance benchmark for its size class and competes with open-source models in the 12B-20B range. Each editing task takes only 1-2 seconds, fast enough for real-time interactive image editing.

Key information

Item	Value
Parameters	Under 6B (lightweight generation and editing)
Prompt length	512 characters
Input image limit	4096x4096 (image editing scenarios)
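The limits above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the StepFun API: the helper names are hypothetical, it assumes the service counts prompt characters the way Python's `len()` does, and it reads 4096x4096 as a cap on each side of the input image.

```python
MAX_PROMPT_CHARS = 512   # "Prompt length" limit listed above
MAX_IMAGE_SIDE = 4096    # "Input image limit", read here as a per-side cap (assumption)


def check_prompt(prompt: str) -> None:
    """Raise ValueError if the prompt is longer than the documented limit."""
    # Assumption: characters are counted as Unicode code points, like len().
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError(
            f"prompt has {len(prompt)} characters; limit is {MAX_PROMPT_CHARS}"
        )


def check_image_size(height: int, width: int) -> None:
    """Raise ValueError if an input image exceeds 4096 pixels on either side."""
    if height <= 0 or width <= 0:
        raise ValueError("image dimensions must be positive")
    if height > MAX_IMAGE_SIDE or width > MAX_IMAGE_SIDE:
        raise ValueError(
            f"image is {height}x{width}; limit is {MAX_IMAGE_SIDE}x{MAX_IMAGE_SIDE}"
        )
```

Both helpers return `None` on success, so they can be called inline before building a request.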

Core capabilities

🏆 Benchmark performance at lightweight scale

Built to maximize performance within the under-6B parameter range. It shows exceptional editing capability and is currently the strongest image-editing model at this parameter scale.

🚀 High intelligence density and cross-tier superiority

The architecture is optimized for parameter efficiency. Despite its smaller footprint, it outperforms open-source models in the 12B-20B range. In general editing and reference-based editing, it matches top-tier closed-source Chinese models.

⚡ 1-2 second response, real-time interaction

Deep architectural optimization yields a large gain in inference speed: each editing task takes 1-2 seconds. This low latency removes a long-standing bottleneck for large models in real-time interactive image editing.

API endpoints

Text-to-image

POST /v1/images/generations
Generate an image from a prompt.

Image editing

POST /v1/images/edits
Modify an input image according to a prompt.
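As an illustration of the text-to-image endpoint, the sketch below prepares a JSON POST with Python's standard library. The function name `build_generation_request` is hypothetical, and the host `https://api.stepfun.ai` and model name `step-image-edit-2` are taken from the curl examples on this page. The request is only constructed here, not sent.

```python
import json
import urllib.request

BASE_URL = "https://api.stepfun.ai"  # host used by the curl examples on this page


def build_generation_request(api_key: str, prompt: str, **options) -> urllib.request.Request:
    """Prepare (but do not send) a POST to /v1/images/generations."""
    # Extra keyword arguments (for example steps=8) are merged into the JSON body.
    payload = {"model": "step-image-edit-2", "prompt": prompt, **options}
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/images/generations",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. The editing endpoint takes multipart form data rather than JSON, so it would need a different body encoding.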

Pricing

Item	Price
Text-to-image / Image editing	$0.003 / image
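For budgeting, the flat per-image price can be turned into a quick estimate. This is a sketch only: `estimate_cost` is a hypothetical helper, and it assumes every generated or edited image is billed at exactly the listed rate with no other charges.

```python
from decimal import Decimal

PRICE_PER_IMAGE_USD = Decimal("0.003")  # from the pricing table above


def estimate_cost(num_images: int) -> Decimal:
    """Estimated cost in USD for num_images generations or edits."""
    if num_images < 0:
        raise ValueError("num_images must be non-negative")
    # Decimal avoids the rounding noise that binary floats add to 0.003.
    return PRICE_PER_IMAGE_USD * num_images


print(estimate_cost(1000))  # prints 3.000
```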


Quickstart

Text-to-image (curl)

curl https://api.stepfun.ai/v1/images/generations \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $STEP_API_KEY" \
  -d '{
    "model": "step-image-edit-2",
    "prompt": "A serene alpine lake at sunset, mirror reflection, photorealistic",
    "response_format": "b64_json",
    "cfg_scale": 1.0,
    "steps": 8,
    "seed": 1,
    "text_mode": true
  }'

Supported sizes for text-to-image: 1024x1024, 768x1360, 896x1184, 1360x768, 1184x896 (format is height x width).
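Because the sizes are written as height x width, it is easy to swap the two numbers by accident. The sketch below parses a size string and picks the supported size closest to a desired aspect ratio. Both helper names are hypothetical, and the request field that carries the size is not shown in this section, so the code only works with the strings themselves.

```python
# Sizes listed above for text-to-image, written as "height x width".
SUPPORTED_SIZES = ("1024x1024", "768x1360", "896x1184", "1360x768", "1184x896")


def parse_size(size: str) -> tuple[int, int]:
    """Return (height, width) for a supported size string, else raise ValueError."""
    if size not in SUPPORTED_SIZES:
        raise ValueError(f"unsupported size {size!r}; choose one of {SUPPORTED_SIZES}")
    height, width = (int(part) for part in size.split("x"))
    return height, width


def closest_size(height: int, width: int) -> str:
    """Pick the supported size whose height/width ratio is nearest to the target."""
    target = height / width

    def ratio_gap(size: str) -> float:
        h, w = parse_size(size)
        return abs(h / w - target)

    return min(SUPPORTED_SIZES, key=ratio_gap)
```

For example, a 1080-high by 1920-wide target maps to `768x1360`, the widest landscape option in the list.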

Image editing (curl)

curl -X POST "https://api.stepfun.ai/v1/images/edits" \
  -H "Authorization: Bearer $STEP_API_KEY" \
  -F 'model=step-image-edit-2' \
  -F 'image=@input.webp' \
  -F 'prompt=Make the character ride a bicycle, holding a sign that says "Saudi Arabia"' \
  -F 'response_format=b64_json' \
  -F 'cfg_scale=1.0' \
  -F 'steps=8' \
  -F 'seed=1' \
  -F 'text_mode=true'

For image editing, the result image is returned at the same size as the input image.
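Both curl examples request `response_format=b64_json`, so the image comes back as base64 text and has to be decoded before it can be viewed. The sketch below shows only that decoding step. `save_b64_image` is a hypothetical helper, and because the JSON layout of the response is not shown in this section, it takes the base64 string directly rather than assuming where that string sits in the response.

```python
import base64
from pathlib import Path


def save_b64_image(b64_data: str, out_path: str) -> int:
    """Decode a base64 string, write the bytes to out_path, return the byte count."""
    raw = base64.b64decode(b64_data)
    Path(out_path).write_bytes(raw)
    return len(raw)
```

The output file's extension should match the actual image format returned by the service, which this section does not specify.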

text_mode is an optimization for text-rendering scenarios. It is off by default; enable it as needed. When cfg_scale = 1.0, negative_prompt is not passed to the underlying model.
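The two notes above can be mirrored on the client so that a request never carries a negative prompt that would be ignored. This is a minimal sketch with a hypothetical helper name, `build_options`. It uses only the parameter names that appear on this page (`cfg_scale`, `negative_prompt`, `text_mode`) and makes no claim about other defaults on the service side.

```python
from typing import Optional


def build_options(
    cfg_scale: float,
    negative_prompt: Optional[str] = None,
    text_mode: bool = False,
) -> dict:
    """Assemble optional request fields, following the notes above."""
    options = {"cfg_scale": cfg_scale}
    if text_mode:
        # text_mode is off by default, so it is only sent when enabled.
        options["text_mode"] = True
    if negative_prompt and cfg_scale != 1.0:
        # At cfg_scale == 1.0 the underlying model never receives negative_prompt,
        # so it is left out of the request instead of being silently ignored.
        options["negative_prompt"] = negative_prompt
    return options
```

A caller who wants the negative prompt to take effect would, per the note above, need a `cfg_scale` other than 1.0.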

Showcase

Text-to-image

Wide-angle landscape photography

Prompt: A breathtaking wide-angle landscape photograph. A crystal-clear alpine lake serves as a perfect mirror. Towering snow-capped mountains pierce an azure sky dotted with wisps of cloud. On the lakeside slopes, autumn aspens display splendid golden hues. The water reflects mountains, trees and sky with sharp clarity. Hyperrealistic, 8K resolution, National Geographic style, vivid colors, serene atmosphere.

Cinematic portrait

Prompt: High-angle cinematic portrait, a young girl standing at a vintage cobblestone street corner during sunset. Long brown hair lifted by the evening breeze, wearing a vintage white floral puff-sleeve dress. Visible details on facial features, hair strands and fabric texture. Subject-centered composition with softly blurred background of old buildings, wooden windows and warm-toned walls as ambiance. Golden-hour sunlight wraps her side profile and shoulders; hair tips and dress edges glow with a faint golden halo. Nostalgic, innocent, quiet summer-evening mood. 35mm film, Kodak Portra 400 tones, soft bloom, shallow depth of field, cinematic portrait, photorealistic.

Minimalist still life photography

Prompt: A minimalist still-life photograph. Centered: a deep-blue glossy glass vase holding a bouquet of vibrant golden tulips. The vase rests on a textured pale-white tabletop against a pure white wall. Strong afternoon sunlight streams in from the side, casting clear, elongated, artistic shadows of leaves and flowers on the wall. High contrast of light and shadow, pure colors, tranquil and warm atmosphere. High quality, 8K resolution.

Classical oil-painting still life

Prompt: A classical oil-painting still life, Dutch Golden Age style. Center: a transparent glass vase filled with colorful tulips, lilies and purple flowers (orange, pink, purple). Beside it, a white ceramic teapot and exquisite teacups. The dark wooden foreground table is draped with a flowing green-and-white striped silk cloth. Cut lemons, lemon peel, several cups of tea and scattered petals on the tabletop. Background: a deep textured wall, with a butterfly fluttering near the flowers. Soft, delicate light and shadow, rich detail, heavy oil-painting brushstrokes.

Image editing

Pose change / dialogue bubble

Prompt: Make the cat lie on its back showing its belly. Add a dialogue bubble next to it that says “I was wrong”.

Input	Result

Outfit replacement

Prompt: Change the man into a suit and shirt, and the woman into a beautiful Western wedding dress with a veil. Both face the camera directly.

Input	Result
