Use the comprehensive guides below to get started with each modality.
Quick Navigation
- Image Generation - Generate images from text prompts
- Vision (Image Understanding) - Analyze and understand images
- Video Understanding - Process and analyze video content
- Audio and Speech - Convert audio to text and analyze audio
Image Generation
Transform text prompts into high-quality visuals with Friendli’s image generation capabilities.Representative Models
We support various trending image generation models including:API Usage
guidance_scale
is required when using Friendli Container. For more detail, please refer to the Container API Reference.Vision (Image Understanding)
Analyze and understand images using Friendli’s vision capabilities.Representative Models
We support various trending vision models including:Supported Image Formats
Supports formats supported by the PIL library:- JPEG (.jpeg and .jpg)
- PNG (.png)
- AVIF (.avif)
API Usage
Video Understanding
Process and analyze video content with Friendli’s video understanding capabilities.Representative Models
We support various video understanding models including:Video Requirements
- Videos must be hosted at publicly accessible URLs
- HTTPS URLs are recommended for security
- Consider video file size and processing time implications
- Some models may have specific resolution or duration requirements
API Usage
By default, video fetching timeout is 30 seconds. To increase the timeout value, please contact us.Audio and Speech
Convert audio files to text and perform various AI tasks with Friendli’s audio capabilities.Representative Models
We support various trending audio models including:Supported Audio Formats
Our platform supports a wide range of audio formats compatible with the librosa library:- MP3 (.mp3)
- WAV (.wav)
- FLAC (.flac)
- OGG (.ogg)
- And many other standard audio formats