Multi‑modality

Friendli supports multimodal workflows across text, image, audio, and video.
Use the guides below to get started with each modality.

Image Generation - Generate images from text prompts
Vision (Image Understanding) - Analyze and understand images
Video Understanding - Process and analyze video content
Audio and Speech - Convert audio to text and analyze audio

Image Generation

Transform text prompts into high-quality visuals with Friendli’s image generation capabilities.

Representative Models

We support various trending image generation models including:

API Usage

curl -L -X POST "https://api.friendli.ai/dedicated/v1/images/generations" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $FRIENDLI_TOKEN" \
  --data-raw '{
    "model": "YOUR_ENDPOINT_ID",
    "prompt": "An orange Lamborghini driving down a hill road at night with a beautiful ocean view in the background.",
    "num_inference_steps": 10,
    "guidance_scale": 3.5
  }'

guidance_scale is required when using Friendli Container. For more detail, please refer to the Container API Reference.
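
The same request can also be assembled from Python. The following is a minimal sketch that uses only the standard library and mirrors the curl call above. `build_image_request` is a hypothetical helper name, not part of any Friendli SDK. The response format is not described here, so the request is built but not sent.

```python
import json
import os
import urllib.request

IMAGES_URL = "https://api.friendli.ai/dedicated/v1/images/generations"


def build_image_request(prompt, endpoint_id, token,
                        num_inference_steps=10, guidance_scale=3.5):
    """Build (but do not send) a POST request mirroring the curl example.

    Hypothetical helper: the defaults are copied from the curl example,
    not from an API specification.
    """
    payload = {
        "model": endpoint_id,
        "prompt": prompt,
        "num_inference_steps": num_inference_steps,
        "guidance_scale": guidance_scale,
    }
    return urllib.request.Request(
        IMAGES_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


request = build_image_request(
    prompt="An orange Lamborghini driving down a hill road at night.",
    endpoint_id="YOUR_ENDPOINT_ID",
    token=os.environ.get("FRIENDLI_TOKEN", ""),
)
# To send it, pass the request to urllib.request.urlopen(request).
```

Keeping request construction separate from sending makes it easy to inspect the payload before spending inference time on it.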

Vision (Image Understanding)

Analyze and understand images using Friendli’s vision capabilities.

Representative Models

We support various trending vision models including:

Qwen2.5-VL
InternVL3
See all vision models

Supported Image Formats

Friendli accepts image formats supported by the PIL library:

JPEG (.jpeg and .jpg)
PNG (.png)
AVIF (.avif)
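
A quick client-side check can catch an obviously unsupported file before a request is made. The sketch below is an assumption-laden convenience: `has_listed_image_extension` is a hypothetical helper, and it only checks the extensions listed above. A URL without a file extension may still serve a valid image, and other PIL-readable formats are not covered by this check.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Extensions taken from the list above; this is not an exhaustive
# list of what PIL can read.
LISTED_IMAGE_EXTENSIONS = {".jpeg", ".jpg", ".png", ".avif"}


def has_listed_image_extension(url):
    """Return True if the URL path ends in one of the listed extensions."""
    path = urlparse(url).path  # drops any query string or fragment
    return PurePosixPath(path).suffix.lower() in LISTED_IMAGE_EXTENSIONS
```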

API Usage

import os
from openai import OpenAI

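# Point the OpenAI SDK at the Friendli Dedicated Endpoints base URL
# and authenticate with the token stored in FRIENDLI_TOKEN.
client = OpenAI(
    base_url="https://api.friendli.ai/dedicated/v1",
    api_key=os.environ.get("FRIENDLI_TOKEN"),
)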

image_url = "https://upload.wikimedia.org/wikipedia/commons/9/9e/Ours_brun_parcanimalierpyrenees_1.jpg"

completion = client.chat.completions.create(
    model="YOUR_ENDPOINT_ID",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What kind of animal is shown in the image?",
                },
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        },
    ],
    stream=False
)

print(completion.choices[0].message.content)

Video Understanding

Process and analyze video content with Friendli’s video understanding capabilities.

Representative Models

We support various video understanding models including:

Qwen2.5-VL
See all video models

Video Requirements

Videos must be hosted at publicly accessible URLs
HTTPS URLs are recommended for security
Keep video files small enough to be fetched within the timeout (30 seconds by default); larger files also take longer to process
Some models may have specific resolution or duration requirements
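
The URL requirements above can be partly checked before a request is made. This is a minimal sketch with a hypothetical helper, `check_video_url`. It only inspects the URL string, so it cannot confirm that the video is actually publicly reachable or that it meets any model-specific limits.

```python
from urllib.parse import urlparse


def check_video_url(url):
    """Return (ok, message) for a candidate video URL.

    Only the URL's shape is checked; reachability is not verified.
    """
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return False, "URL must be an absolute http(s) URL"
    if parsed.scheme == "http":
        return True, "accepted, but HTTPS is recommended"
    return True, "ok"
```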

API Usage

By default, the video fetch timeout is 30 seconds. To increase the timeout, please contact us.

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.friendli.ai/dedicated/v1",
    api_key=os.environ.get("FRIENDLI_TOKEN"),
)

video_url = "http://commondatastorage.googleapis.com/gtv-videos-bucket/sample/ForBiggerFun.mp4"

completion = client.chat.completions.create(
    model="YOUR_ENDPOINT_ID",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What's in this video?",
                },
                {
                    "type": "video_url",
                    "video_url": {"url": video_url},
                },
            ],
        },
    ],
    temperature=0,
    max_tokens=100,
)

print(completion.choices[0].message.content)

Audio and Speech

Convert audio files to text and perform various AI tasks with Friendli’s audio capabilities.

Representative Models

We support various trending audio models including:

Whisper Large V3
Qwen2-Audio
Ultravox
See all audio models

Supported Audio Formats

Our platform supports a wide range of audio formats compatible with the librosa library:

MP3 (.mp3)
WAV (.wav)
FLAC (.flac)
OGG (.ogg)
And many other standard audio formats

API Usage

By default, audio input is limited to 30 seconds. To enable longer audio inputs, please contact us.

curl -X POST https://api.friendli.ai/dedicated/v1/audio/transcriptions \
  -H "Authorization: Bearer $FRIENDLI_TOKEN" \
  -H 'Content-Type: multipart/form-data' \
  -F file=@/path/to/audio/file.mp3 \
  -F model="YOUR_ENDPOINT_ID"
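
Because audio input is limited to 30 seconds by default, it can help to check a file's duration before uploading it. The sketch below uses Python's standard `wave` module, so it only handles uncompressed WAV files; MP3, FLAC, or OGG would need an audio library such as librosa. The helper names are hypothetical, and treating a clip of exactly 30 seconds as acceptable is an assumption.

```python
import wave

DEFAULT_MAX_AUDIO_SECONDS = 30  # default limit stated above


def wav_duration_seconds(path):
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def within_default_limit(path, limit=DEFAULT_MAX_AUDIO_SECONDS):
    """Return True if the WAV file is no longer than the limit.

    Assumption: a clip of exactly `limit` seconds counts as acceptable.
    """
    return wav_duration_seconds(path) <= limit
```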

API References

For detailed API specifications, refer to:

