POST /serverless/v1/audio/transcriptions
Audio transcriptions
curl --request POST \
  --url https://api.friendli.ai/serverless/v1/audio/transcriptions \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: multipart/form-data' \
  --form file='@example-file' \
  --form model=openai/whisper-large-v3
{
  "text": "Hello, how are you?",
  "usage": {
    "type": "tokens",
    "input_tokens": 20,
    "output_tokens": 10,
    "total_tokens": 30,
    "input_audio_length_ms": 18000,
    "processed_audio_length_ms": 24000,
    "input_token_details": {
      "audio_tokens": 10,
      "text_tokens": 10
    }
  }
}
Given an audio file, the model transcribes it into text. See available models at this pricing table.

To make a successful request, you must provide a Friendli Token (e.g. flp_XXX) in the Bearer Token field. Refer to the authentication section on our introduction page to learn how to acquire this token, and visit here to generate one.

When streaming mode is used (i.e., the stream option is set to true), the response has MIME type text/event-stream; otherwise, the content type is application/json. You can view the schema of the streamed sequence of chunk objects in streaming mode here.
You can explore examples on the Friendli Serverless Endpoints playground and adjust settings with just a few clicks.
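In non-streaming mode, the application/json body can be consumed directly. A minimal sketch in Python, using the example response shown above as the payload (field names follow the response schema documented below):

```python
import json

# Sample non-streaming response body, copied from the example above.
raw = """
{
  "text": "Hello, how are you?",
  "usage": {
    "type": "tokens",
    "input_tokens": 20,
    "output_tokens": 10,
    "total_tokens": 30,
    "input_audio_length_ms": 18000,
    "processed_audio_length_ms": 24000,
    "input_token_details": {"audio_tokens": 10, "text_tokens": 10}
  }
}
"""

resp = json.loads(raw)
print(resp["text"])  # the transcribed text

usage = resp["usage"]
# total_tokens is the sum of input and output tokens.
assert usage["total_tokens"] == usage["input_tokens"] + usage["output_tokens"]
print(usage["input_audio_length_ms"] / 1000, "seconds of input audio")
```

Note that input_audio_length_ms reports the length of the audio you sent, while processed_audio_length_ms reports how much audio the server actually processed; the two may differ.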

Authorizations

Authorization
string
header
required

When using Friendli Suite API for inference requests, you need to provide a Friendli Token for authentication and authorization purposes.

For more detailed information, please refer here.

Headers

X-Friendli-Team
string | null

ID of the team to run requests as (optional).

Body

multipart/form-data
model
string
required

Code of the model to use. See the available model list.

Example:

"openai/whisper-large-v3"

file
file
required

The audio file object (not the file name) to transcribe, in a standard audio format such as mp3, wav, flac, or ogg.

chunking_strategy

Controls how the audio is cut into chunks. When set to "auto", the server first normalizes loudness and then uses voice activity detection (VAD) to choose chunk boundaries. A server_vad object can be provided to tune the VAD parameters manually. If unset, the audio is transcribed as a single block.

Allowed value: "auto"
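For illustration, the non-file form fields of a request that lets the server pick chunk boundaries could be assembled like this (a sketch; only the documented "auto" value is used, and the language hint is optional):

```python
# Form fields for a transcription request using server-side chunking.
# These accompany the audio file in the multipart/form-data body.
form = {
    "model": "openai/whisper-large-v3",
    "chunking_strategy": "auto",  # documented allowed value: "auto"
    "language": "en",             # optional ISO-639-1 hint (see below)
}

for key, value in form.items():
    print(f"--form {key}={value}")
```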
language
string | null

The language of the input audio. Supplying the input language in ISO-639-1 (e.g. en) format will improve accuracy and latency.

stream
boolean | null

Whether to stream the transcription result. When set to true, the transcription result will be streamed as server-sent events once generated.
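With stream set to true, the body arrives as server-sent events, i.e. lines of the form `data: {...}`. A minimal parsing sketch follows; the chunk payloads here are hypothetical (the actual chunk schema is linked above), and the `text` field name and the `[DONE]` sentinel are assumptions, not confirmed by this page:

```python
import json

# Hypothetical streamed body; real chunk objects follow the schema
# linked in the description above.
stream_body = (
    'data: {"text": "Hello, "}\n\n'
    'data: {"text": "how are you?"}\n\n'
    "data: [DONE]\n\n"
)

parts = []
for line in stream_body.splitlines():
    if not line.startswith("data: "):
        continue  # skip blank separator lines between events
    payload = line[len("data: "):]
    if payload == "[DONE]":  # common SSE termination sentinel (assumption)
        break
    chunk = json.loads(payload)
    parts.append(chunk["text"])

transcript = "".join(parts)
print(transcript)
```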

temperature
number | null

The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.

Response

Successfully transcribed the audio file.

text
string
required

The transcribed text.

usage
AudioTranscriptionUsage · object
required