Model overview
Our text-to-image models generate high-quality, diverse images from text prompts or other inputs such as reference images. They suit art, design, game development, and similar work. We currently provide two model series, step-1x and step-2x.
Models
step-2x-large
Our new-generation image model, focused on text-to-image generation. Produces more realistic textures and stronger Chinese/English text rendering.
step-1x-edit
Specialized for image editing. Takes reference images plus a text instruction, interprets the intended change, and produces an edited image that matches the request.
step-1x-medium
A strong text-to-image generator with native Chinese support for better semantic understanding of Chinese prompts. Generates high-resolution, high-quality images with style-transfer capability.
Key terms
- Image resolution: Pixel width/height of the output. Higher resolution gives more detail but increases generation time and compute.
- Image style: Visual characteristics such as realistic, abstract, cartoon, etc.
- Prompt/description: The text or reference image describing what to generate. More precise descriptions yield outputs closer to expectations.
- Model parameters: The number of learned weights in a model. Larger models generally capture more detail and produce higher-quality results. The step-1x series includes a 2B-parameter model.
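To make the resolution trade-off above concrete, the sketch below computes pixel counts for the sizes listed under Usage limits. Pixel count is only a rough proxy for generation cost; this page does not state how time or compute actually scale. The helper names `pixel_count` and `relative_pixels` are hypothetical and used only for illustration.

```python
# Sizes copied from the "Usage limits" section of this page.
SIZES = ["256x256", "512x512", "768x768", "1024x1024", "1280x800", "800x1280"]


def pixel_count(size: str) -> int:
    """Return width * height for a 'WIDTHxHEIGHT' string."""
    width, height = (int(part) for part in size.split("x"))
    return width * height


def relative_pixels(size: str, baseline: str = "256x256") -> float:
    """Return how many times more pixels `size` has than `baseline`."""
    return pixel_count(size) / pixel_count(baseline)


if __name__ == "__main__":
    for size in SIZES:
        print(f"{size}: {pixel_count(size)} px, "
              f"{relative_pixels(size):.2f}x the 256x256 baseline")
```

Doubling both dimensions quadruples the pixel count, so 1024x1024 holds 16 times as many pixels as 256x256.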
Usage limits
- Supported input: Natural-language descriptions of desired content and style.
- Images per request: step-1x models allow at most one image per request.
- Resolution limits: Square: 256x256, 512x512, 768x768, 1024x1024. Rectangular (16:10 and 10:16): 1280x800, 800x1280.
- Generation time: Varies with prompt complexity and model throughput.
- Quality: Results depend on prompt specificity and training data; multiple attempts may be needed for the best output.
- Copyright and usage: You own generated images but must not use them for illegal purposes or rights violations. Models are evolving, so evaluate and adjust for your scenario.
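The limits above can be checked on the client before a request is sent. The following is a minimal sketch, not the official client: the function name `build_image_request` and the payload field names (`model`, `prompt`, `size`, `n`) are assumptions modeled on common image-generation APIs, and this page does not confirm them. The model names, sizes, and the one-image limit for step-1x models are taken from this page.

```python
# Values copied from this page.
SUPPORTED_SIZES = frozenset(
    {"256x256", "512x512", "768x768", "1024x1024", "1280x800", "800x1280"}
)
KNOWN_MODELS = frozenset({"step-2x-large", "step-1x-edit", "step-1x-medium"})


def build_image_request(model: str, prompt: str,
                        size: str = "1024x1024", n: int = 1) -> dict:
    """Validate inputs against the documented limits and return a payload.

    The payload keys are hypothetical; check the API reference for the
    real request schema before sending anything.
    """
    if model not in KNOWN_MODELS:
        raise ValueError(f"unknown model: {model!r}")
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if size not in SUPPORTED_SIZES:
        raise ValueError(
            f"unsupported size {size!r}; choose one of {sorted(SUPPORTED_SIZES)}"
        )
    if n < 1:
        raise ValueError("n must be at least 1")
    # This page documents the one-image limit for step-1x models only.
    if model.startswith("step-1x") and n > 1:
        raise ValueError("step-1x models allow at most one image per request")
    return {"model": model, "prompt": prompt, "size": size, "n": n}


if __name__ == "__main__":
    print(build_image_request("step-1x-medium", "a red fox in snow", "512x512"))
```

Rejecting an unsupported size or image count locally avoids a round trip that would fail anyway, and keeps the documented limits in one place if they change.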