Dedicated audio transcriptions

Audio transcriptions

curl --request POST \
  --url https://api.friendli.ai/dedicated/v1/audio/transcriptions \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: multipart/form-data' \
  --form file='@example-file' \
  --form 'model=(endpoint-id)'

{
  "text": "Hello, how are you?",
  "usage": {
    "type": "tokens",
    "prompt_tokens": 20,
    "completion_tokens": 10,
    "total_tokens": 30,
    "input_token_details": {
      "audio_tokens": 10,
      "text_tokens": 10
    }
  }
}

POST /dedicated/v1/audio/transcriptions

Given an audio file, the model transcribes it into text. Every request must include a Friendli Token (e.g. flp_XXX) as the Bearer token. Refer to the authentication section of the introduction page to learn how to obtain and generate a token.
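For readers calling the endpoint from code rather than curl, the request can be sketched with Python's standard library alone. This is a minimal sketch, not an official client: the helper names `build_transcription_request` and `transcribe`, the default filename, and the `application/octet-stream` part type are assumptions made for illustration. Only the URL, the Bearer header, and the `model` and `file` form fields come from this page.

```python
import json
import os
import urllib.request
import uuid

API_URL = "https://api.friendli.ai/dedicated/v1/audio/transcriptions"


def build_transcription_request(token, endpoint_id, audio_bytes, filename="audio.wav"):
    """Build (but do not send) a multipart/form-data POST request.

    Hypothetical helper: encodes the `model` and `file` form fields by hand.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{endpoint_id}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=head + audio_bytes + tail,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def transcribe(token, endpoint_id, path):
    """Send the audio file at `path` and return the decoded JSON response."""
    with open(path, "rb") as f:
        request = build_transcription_request(
            token, endpoint_id, f.read(), filename=os.path.basename(path)
        )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `transcribe("flp_XXX", "(endpoint-id)", "example-file")` would mirror the curl command above; the token and endpoint ID are placeholders.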

Authorizations

Authorization · string · header · required

When using the Friendli Suite API for inference requests, you must provide a Friendli Token for authentication and authorization.

For more detail, refer to the authentication section of the introduction page.

Headers

X-Friendli-Team · string | null

ID of the team to run requests as (optional).
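As a small illustration of the optional team header, the sketch below assembles the request headers and adds `X-Friendli-Team` only when a team ID is supplied. The function name `build_headers` and the sample team ID are hypothetical; the header names are the ones listed on this page.

```python
def build_headers(token, team_id=None):
    """Return request headers; the team header is included only if given."""
    headers = {"Authorization": f"Bearer {token}"}
    if team_id is not None:
        headers["X-Friendli-Team"] = team_id
    return headers
```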

Body

multipart/form-data

model · string · required

ID of the target endpoint. To send a request to a specific adapter, use the format "YOUR_ENDPOINT_ID:YOUR_ADAPTER_ROUTE". Otherwise, use "YOUR_ENDPOINT_ID" alone.
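The two accepted forms of the `model` value can be produced by one small helper. This is an illustrative sketch; `model_field` is a hypothetical name, and the only rule it encodes is the colon-separated format described above.

```python
def model_field(endpoint_id, adapter_route=None):
    """Return the `model` form value, with the adapter route if one is given."""
    if adapter_route:
        return f"{endpoint_id}:{adapter_route}"
    return endpoint_id
```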

Example:

"(endpoint-id)"

file · required

The audio file object (not the file name) to transcribe. Supported formats include mp3, wav, flac, ogg, and many other standard audio formats.

chunking_strategy

Controls how the audio is cut into chunks. When set to "auto", the server first normalizes loudness and then uses voice activity detection (VAD) to choose chunk boundaries. A server_vad object can be provided instead to tune the VAD parameters manually. If unset, the audio is transcribed as a single block.

Allowed value: "auto"

language · string | null

The language of the input audio. Supplying the input language in ISO 639-1 format (e.g. en) improves accuracy and latency.

temperature · number | null

The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
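The constraints on the optional body fields can be checked on the client before sending. The sketch below is an assumption-laden illustration: `optional_fields` is a hypothetical helper, the two-letter check is only a rough stand-in for ISO 639-1 validation, and it handles only the "auto" chunking value because the encoding of a server_vad object is not described on this page.

```python
def optional_fields(language=None, temperature=None, chunking_strategy=None):
    """Validate optional parameters and return them as string form fields."""
    fields = {}
    if language is not None:
        # Rough ISO 639-1 shape check: exactly two ASCII letters.
        if len(language) != 2 or not (language.isascii() and language.isalpha()):
            raise ValueError("language must be a two-letter ISO 639-1 code")
        fields["language"] = language.lower()
    if temperature is not None:
        # The documented range is 0 to 1 inclusive.
        if not 0 <= temperature <= 1:
            raise ValueError("temperature must be between 0 and 1")
        fields["temperature"] = str(temperature)
    if chunking_strategy is not None:
        # Only the documented "auto" value is covered by this sketch.
        if chunking_strategy != "auto":
            raise ValueError('this sketch only supports chunking_strategy="auto"')
        fields["chunking_strategy"] = "auto"
    return fields
```

Each returned entry would be sent as an additional form field alongside `model` and `file`.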

Response

Successfully transcribed the audio file.

text · string · required

The transcribed text.

usage · AudioTranscriptionUsage · object · required
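To show how the response fields might be consumed, the sketch below parses a response body into a small record. The `Transcription` dataclass and `parse_transcription` function are hypothetical names; the field names are taken from the sample response on this page, and no fields beyond that sample are assumed.

```python
import json
from dataclasses import dataclass


@dataclass
class Transcription:
    text: str
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int


def parse_transcription(payload):
    """Parse a JSON response body into a Transcription record."""
    data = json.loads(payload)
    usage = data["usage"]
    return Transcription(
        text=data["text"],
        prompt_tokens=usage["prompt_tokens"],
        completion_tokens=usage["completion_tokens"],
        total_tokens=usage["total_tokens"],
    )
```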
