Speech to text

Upload an audio file and get back the corresponding transcript.

Endpoint

POST https://api.stepfun.ai/v1/audio/transcriptions

Request parameters

model string required
Model name, fixed to step-asr.
response_format string required
Output format. Supported: json, text, srt, vtt.
file File required
Audio file. Supported formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm, aac, opus.
File size must be under 100 MB.
hotwords string optional
Hotword list as a JSON string that parses to an array, e.g., ["1","2","3","4","abc"].
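
The constraints above can be checked on the client before uploading. The following is a minimal sketch, not part of the official SDK: the helper names build_form_fields and check_audio_file are hypothetical, and reading "100 MB" as 100 * 1024 * 1024 bytes is an assumption.

```python
import json
import os

# Values copied from the parameter list above.
SUPPORTED_FORMATS = {"flac", "mp3", "mp4", "mpeg", "mpga", "m4a",
                     "ogg", "wav", "webm", "aac", "opus"}
RESPONSE_FORMATS = {"json", "text", "srt", "vtt"}
# Assumption: "100 MB" is interpreted as 100 * 1024 * 1024 bytes.
MAX_FILE_BYTES = 100 * 1024 * 1024


def build_form_fields(response_format, hotwords=None):
    """Return the non-file form fields as a dict of strings (hypothetical helper)."""
    if response_format not in RESPONSE_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format}")
    fields = {"model": "step-asr", "response_format": response_format}
    if hotwords:
        # hotwords is sent as a JSON string that parses to an array.
        fields["hotwords"] = json.dumps(list(hotwords), ensure_ascii=False)
    return fields


def check_audio_file(path):
    """Raise ValueError if the extension or size breaks the documented limits."""
    ext = os.path.splitext(path)[1].lstrip(".").lower()
    if ext not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported audio format: {ext or '(none)'}")
    if os.path.getsize(path) >= MAX_FILE_BYTES:
        raise ValueError("file must be under 100 MB")
```

Serializing hotwords with json.dumps rather than formatting the string by hand keeps quoting and non-ASCII terms intact.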

Response

Returns the transcription in the requested response_format.

json

text string
Recognized text.

{ "text": "Test recording" }

text

Test recording

srt

1
00:00:00,000 --> 00:00:03,240
Test recording

vtt

WEBVTT

00:00:00.000 --> 00:00:03.240
Test recording
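
For the srt and vtt formats, the response body is a subtitle document rather than JSON. The sketch below shows one way to turn such a body into timed cues. The function names to_ms and parse_cues are hypothetical, and the parser is deliberately small: it accepts either a comma or a period before the milliseconds and ignores any styling or cue settings.

```python
import re

# Matches HH:MM:SS,mmm (SRT) and HH:MM:SS.mmm (WebVTT).
_TIMESTAMP = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def to_ms(stamp):
    """Convert a subtitle timestamp to integer milliseconds."""
    match = _TIMESTAMP.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"bad timestamp: {stamp!r}")
    hours, minutes, seconds, millis = (int(g) for g in match.groups())
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis


def parse_cues(body):
    """Return a list of (start_ms, end_ms, text) tuples from an SRT or VTT body."""
    cues = []
    normalized = body.replace("\r\n", "\n").strip()
    # Cues are separated by blank lines; blocks without "-->" (such as the
    # WEBVTT header) are skipped.
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.splitlines()
        for index, line in enumerate(lines):
            if "-->" in line:
                start, end = line.split("-->", 1)
                end = end.split()[0]  # drop anything after the end timestamp
                text = "\n".join(lines[index + 1:])
                cues.append((to_ms(start), to_ms(end), text))
                break
    return cues
```

Applied to the srt sample above, parse_cues yields a single cue running from 0 ms to 3240 ms with the text "Test recording".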

Example

curl

curl -L 'https://api.stepfun.ai/v1/audio/transcriptions' \
-H "Authorization: Bearer $STEP_API_KEY" \
-F 'model="step-asr"' \
-F 'response_format="json"' \
-F 'file=@"sample.mp3"'
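
Python

The same request can be sketched in Python using only the standard library. This is an illustrative equivalent of the curl call, not an official client: encode_multipart and transcribe are hypothetical helpers, the file part is sent as application/octet-stream by assumption, and the API key is read from the STEP_API_KEY environment variable as in the curl example.

```python
import json
import os
import urllib.request
import uuid

API_URL = "https://api.stepfun.ai/v1/audio/transcriptions"


def encode_multipart(fields, file_field, filename, file_bytes):
    """Build a multipart/form-data body; return (body_bytes, content_type)."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        part = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        )
        parts.append(part.encode("utf-8"))
    header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    parts.append(header.encode("utf-8") + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(path, response_format="json"):
    """Upload an audio file and return the transcript as a string."""
    with open(path, "rb") as audio_file:
        audio = audio_file.read()
    body, content_type = encode_multipart(
        {"model": "step-asr", "response_format": response_format},
        "file",
        os.path.basename(path),
        audio,
    )
    request = urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {os.environ['STEP_API_KEY']}",
            "Content-Type": content_type,
        },
    )
    with urllib.request.urlopen(request) as response:
        payload = response.read().decode("utf-8")
    # Only the json format wraps the transcript in an object with a "text" field.
    return json.loads(payload)["text"] if response_format == "json" else payload
```

Calling transcribe("sample.mp3") mirrors the curl request above; passing response_format="srt" returns the raw subtitle body instead of the text field.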
