Audio and Speech: Converting Audio to Text
Guide to using Friendli’s Audio and Speech feature for audio analysis and transcription. Covers usage via Playground and API (URL & Base64 examples).
Friendli provides audio and speech features through Friendli Dedicated Endpoints, allowing you to convert audio files to text and perform various AI tasks. This guide explains how to use these features with examples for both the Playground and API interfaces.
You can find the full list of available models here.
ASR - /v1/audio/transcriptions
Our ASR (Automatic Speech Recognition) service is designed for efficient audio transcription.
By default, audio input is limited to 30 seconds. If you require support for longer audio inputs, please contact us.
API Usage Example
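Below is a minimal sketch of how a transcription request could be assembled in Python. The base URL, header, and form-field names here are assumptions modeled on OpenAI-compatible transcription APIs; consult the Friendli API reference for the authoritative endpoint URL and schema.

```python
import os

# Hypothetical base URL for a Friendli Dedicated Endpoint -- replace with
# the URL of your own deployed endpoint.
API_BASE = "https://api.friendli.ai/dedicated"

def build_transcription_request(filename: str, audio_bytes: bytes,
                                model: str = "openai/whisper-large-v3"):
    """Assemble the pieces of a multipart /v1/audio/transcriptions call.

    Returns (url, headers, data, files) suitable for requests.post().
    Field names ("model", "file") follow the common OpenAI-style schema
    and are assumptions, not confirmed Friendli parameters.
    """
    url = f"{API_BASE}/v1/audio/transcriptions"
    headers = {"Authorization": f"Bearer {os.environ.get('FRIENDLI_TOKEN', '')}"}
    data = {"model": model}
    # The audio itself goes in a multipart form field, here named "file".
    files = {"file": (filename, audio_bytes)}
    return url, headers, data, files
```

To send the request you would pass the pieces to an HTTP client, e.g. `requests.post(url, headers=headers, data=data, files=files)`, and read the transcribed text from the JSON response.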
Supported Models
We support a variety of powerful ASR models, including:
- openai/whisper-large-v3-turbo
- openai/whisper-large-v3
- openai/whisper-small
- …and many more.
Audio Modality - /v1/chat/completions
The audio modality endpoint allows you to combine audio and text inputs, enabling advanced AI tasks.
This endpoint is ideal for:
- Complex audio and text analysis
- Conversational AI
- Tasks requiring diverse inference, such as summarization, sentiment analysis, and question answering
By default, audio input is limited to 10 seconds. If you require support for longer audio inputs, please contact us.
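As a sketch of what a combined audio-and-text request could look like, the helpers below build a chat-completions body with the audio referenced either by URL or embedded inline as base64. The content-part schema (a `type: "audio_url"` part carrying a plain URL or a data URI) is an assumption based on common multi-modal chat APIs; check the Friendli API reference for the exact field names.

```python
import base64

def audio_part_from_url(url: str) -> dict:
    """Reference a remotely hosted audio file by URL."""
    return {"type": "audio_url", "audio_url": {"url": url}}

def audio_part_from_bytes(audio_bytes: bytes, mime: str = "audio/wav") -> dict:
    """Embed the audio inline as a base64 data URI."""
    b64 = base64.b64encode(audio_bytes).decode("ascii")
    return {"type": "audio_url", "audio_url": {"url": f"data:{mime};base64,{b64}"}}

def build_chat_body(prompt: str, audio_part: dict,
                    model: str = "fixie-ai/ultravox-v0_5-llama-3_1-8b") -> dict:
    """Build a /v1/chat/completions body mixing a text prompt with audio."""
    return {
        "model": model,
        "messages": [{
            "role": "user",
            # A multi-part user message: text instruction plus the audio.
            "content": [{"type": "text", "text": prompt}, audio_part],
        }],
    }
```

The resulting dictionary would be POSTed as JSON to the `/v1/chat/completions` endpoint with your Friendli token in the `Authorization` header, the same way as any other chat request.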
Supported Models
We offer a range of multi-modal models, including:
- fixie-ai/ultravox-v0_5-llama-3_3-70b
- fixie-ai/ultravox-v0_5-llama-3_1-8b
- Qwen/Qwen2-Audio-7B-Instruct
- openbmb/MiniCPM-o-2_6
- …and many more.
Supported Audio Formats
Our platform supports a wide range of audio formats compatible with the librosa library, ensuring broad compatibility for your applications. Supported formats include:
- MP3 (.mp3)
- WAV (.wav)
- FLAC (.flac)
- OGG (.ogg)
- And many other standard audio formats
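Because audio input length is capped by default (30 seconds for ASR, 10 seconds for the audio modality), it can be useful to check a clip's duration locally before uploading. The sketch below uses only the standard-library `wave` module for WAV files; for MP3, FLAC, or OGG you could use librosa's `librosa.get_duration` instead.

```python
import io
import wave

def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Return the duration of an in-memory WAV file in seconds."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        return wf.getnframes() / wf.getframerate()

def make_silent_wav(seconds: float, rate: int = 16000) -> bytes:
    """Generate a silent mono 16-bit WAV clip (handy for testing)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)        # mono
        wf.setsampwidth(2)        # 16-bit samples
        wf.setframerate(rate)
        wf.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()
```

For example, `wav_duration_seconds(make_silent_wav(2.0))` returns `2.0`, well within the default 30-second ASR limit.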