Given an audio file, the model transcribes it into text.
When streaming mode is enabled (i.e. the stream option is set to true), the response has MIME type text/event-stream. Otherwise, the content type is application/json.
You can view the schema of the streamed sequence of chunk objects in streaming mode here.
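As a sketch of what consuming that stream could look like, here is a minimal server-sent-events parser. The `text` chunk field and the `[DONE]` end-of-stream sentinel are illustrative assumptions, not taken from the schema linked above:

```python
import json

def parse_sse(raw: str):
    """Split a text/event-stream body into the JSON chunk objects it carries.

    Server-sent events are separated by blank lines; each data line is
    prefixed with "data: ". The chunk schema itself is endpoint-specific.
    """
    chunks = []
    for event in raw.split("\n\n"):
        for line in event.splitlines():
            if line.startswith("data: "):
                payload = line[len("data: "):]
                if payload != "[DONE]":  # common end-of-stream sentinel (assumption)
                    chunks.append(json.loads(payload))
    return chunks

# Example: two events carrying partial transcription text (illustrative fields)
raw = 'data: {"text": "Hello"}\n\ndata: {"text": " world"}\n\ndata: [DONE]\n\n'
print("".join(c["text"] for c in parse_sse(raw)))
```

In practice the raw stream would come from the HTTP response body rather than a string; the parsing logic is the same.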
ID of the team to run requests as (optional).
Code of the model to use. See the list of available models.
"openai/whisper-large-v3"
The audio file object (not file name) to transcribe, in one of these formats: mp3, wav, flac, ogg, and many other standard audio formats.
Controls how the audio is split into chunks. When set to "auto", the server first normalizes loudness and then uses voice activity detection (VAD) to choose chunk boundaries. Alternatively, a server_vad object can be provided to tune the VAD parameters manually. If unset, the audio is transcribed as a single block.
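To illustrate the two forms this parameter can take, a small helper that assembles the request parameters (the helper itself and the exact fields inside the server_vad object are assumptions for illustration; only the "auto" string and the existence of a server_vad object come from this reference):

```python
def transcription_params(chunking_strategy=None, **extra):
    """Build the JSON-serializable parameter dict for a transcription request.

    chunking_strategy may be omitted (transcribe as a single block), the
    string "auto", or a dict such as {"type": "server_vad"} carrying
    endpoint-specific VAD tuning fields.
    """
    params = dict(extra)
    if chunking_strategy is not None:
        params["chunking_strategy"] = chunking_strategy
    return params

# Let the server pick boundaries via loudness normalization + VAD
auto = transcription_params(chunking_strategy="auto",
                            model="openai/whisper-large-v3")

# Tune VAD manually (field names inside the object are endpoint-specific)
manual = transcription_params(chunking_strategy={"type": "server_vad"},
                              model="openai/whisper-large-v3")
```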
"auto"Whether to stream the transcription result. When set to true, the transcription result will be streamed as server-sent events once generated.
The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
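Putting the parameters above together, a hedged sketch of assembling the multipart request (the endpoint URL, header, and form-field layout are assumptions about a typical OpenAI-compatible transcription endpoint, not confirmed by this reference):

```python
import io

API_URL = "https://example.com/v1/audio/transcriptions"  # placeholder endpoint

def build_request(audio_bytes: bytes, filename: str, model: str,
                  temperature: float = 0.0, stream: bool = False):
    """Assemble the pieces of a multipart/form-data transcription request.

    Returns (files, data) in the shape accepted by requests.post(...,
    files=files, data=data). The file is sent as a file object, not a
    file name, matching the parameter description above.
    """
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0 and 1")
    files = {"file": (filename, io.BytesIO(audio_bytes), "audio/mpeg")}
    data = {"model": model,
            "temperature": str(temperature),
            "stream": "true" if stream else "false"}
    return files, data

files, data = build_request(b"\x00" * 16, "clip.mp3",
                            "openai/whisper-large-v3", temperature=0.2)
# requests.post(API_URL, headers={"Authorization": "Bearer <token>"},
#               files=files, data=data)
```

Lower temperatures such as 0.2 keep the transcription deterministic; raising it toward 0.8 makes the output more varied.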