Friendli supports multimodal workflows across text, image, audio, and video.
Use the comprehensive guides below to get started with each modality.

Image Generation

Transform text prompts into high-quality visuals with Friendli’s image generation capabilities.

Representative Models

We support various trending image generation models including:

API Usage

curl -L -X POST "https://api.friendli.ai/dedicated/v1/images/generations" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $FRIENDLI_TOKEN" \
  --data-raw '{
    "model": "YOUR_ENDPOINT_ID",
    "prompt": "An orange Lamborghini driving down a hill road at night with a beautiful ocean view in the background.",
    "num_inference_steps": 10,
    "guidance_scale": 3.5
  }'

Note: guidance_scale is required when using Friendli Container. For details, see the Container API Reference.
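
The same request can be sketched in Python using only the standard library. This is a minimal illustration that mirrors the curl call above; the helper names `build_image_request` and `generate_image` are hypothetical, and the response is returned as decoded JSON without assuming any particular fields.

```python
import json
import os
import urllib.request

# Endpoint URL taken from the curl example above.
IMAGES_URL = "https://api.friendli.ai/dedicated/v1/images/generations"


def build_image_request(endpoint_id, prompt, num_inference_steps=10, guidance_scale=3.5):
    """Build the JSON body used in the curl example (hypothetical helper)."""
    return {
        "model": endpoint_id,
        "prompt": prompt,
        "num_inference_steps": num_inference_steps,
        "guidance_scale": guidance_scale,
    }


def generate_image(payload, token=None):
    """POST the payload and return the decoded JSON response (hypothetical helper)."""
    token = token or os.environ["FRIENDLI_TOKEN"]
    request = urllib.request.Request(
        IMAGES_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Example usage (performs a network call, so it is left commented out):
# body = build_image_request("YOUR_ENDPOINT_ID", "An orange Lamborghini at night.")
# print(generate_image(body))
```

Keeping payload construction separate from the network call makes it easy to always include guidance_scale, which matters when targeting Friendli Container.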

Vision (Image Understanding)

Analyze and understand images using Friendli’s vision capabilities.

Representative Models

We support various trending vision models including:

Supported Image Formats

Friendli accepts image formats supported by the PIL library:
  • JPEG (.jpeg and .jpg)
  • PNG (.png)
  • AVIF (.avif)
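
A quick client-side check can catch unsupported files before a request is sent. The sketch below only inspects the file extension against the three formats listed above; the helper name `has_supported_image_extension` is hypothetical, and an extension match is a heuristic, not a guarantee that the server will accept the image.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Extensions taken from the list above.
SUPPORTED_IMAGE_EXTENSIONS = {".jpeg", ".jpg", ".png", ".avif"}


def has_supported_image_extension(url_or_path):
    """Return True if the URL or path ends with a listed image extension."""
    # urlparse strips any query string or fragment from the path component.
    path = urlparse(url_or_path).path
    return PurePosixPath(path).suffix.lower() in SUPPORTED_IMAGE_EXTENSIONS
```

For example, `has_supported_image_extension("photo.PNG")` returns `True`, while a `.gif` URL returns `False`.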

API Usage

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.friendli.ai/dedicated/v1",
    api_key=os.environ.get("FRIENDLI_TOKEN"),
)

image_url = "https://upload.wikimedia.org/wikipedia/commons/9/9e/Ours_brun_parcanimalierpyrenees_1.jpg"

completion = client.chat.completions.create(
    model="YOUR_ENDPOINT_ID",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What kind of animal is shown in the image?",
                },
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        },
    ],
    stream=False
)

print(completion.choices[0].message.content)

Video Understanding

Process and analyze video content with Friendli’s video understanding capabilities.

Representative Models

We support various video understanding models including:

Video Requirements

  • Videos must be hosted at publicly accessible URLs
  • HTTPS URLs are recommended for security
  • Consider video file size and processing time implications
  • Some models may have specific resolution or duration requirements
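
The URL requirements above can be checked before calling the API. This is a minimal sketch, assuming that only `http` and `https` URLs are usable and that loopback hosts are never publicly accessible; the helper name `check_video_url` is hypothetical, and it cannot verify that the video is actually reachable from Friendli's servers.

```python
from urllib.parse import urlparse

# Hosts that can never be reached from outside the local machine.
LOOPBACK_HOSTS = {"localhost", "127.0.0.1", "::1"}


def check_video_url(url):
    """Return a list of warnings for the URL; raise ValueError if it is unusable."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Not an HTTP(S) URL: {url!r}")
    if (parsed.hostname or "").lower() in LOOPBACK_HOSTS:
        raise ValueError(f"URL is not publicly accessible: {url!r}")

    warnings = []
    if parsed.scheme == "http":
        # HTTPS is recommended, so plain HTTP is flagged rather than rejected.
        warnings.append("URL uses plain HTTP; HTTPS is recommended.")
    return warnings
```

An empty list means the URL passed these basic checks; file size, resolution, and duration still depend on the model.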

API Usage

By default, the video fetch timeout is 30 seconds. To increase the timeout, please contact us.

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.friendli.ai/dedicated/v1",
    api_key=os.environ.get("FRIENDLI_TOKEN"),
)

video_url = "https://commondatastorage.googleapis.com/gtv-videos-bucket/sample/ForBiggerFun.mp4"

completion = client.chat.completions.create(
    model="YOUR_ENDPOINT_ID",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What's in this video?",
                },
                {
                    "type": "video_url",
                    "video_url": {"url": video_url},
                },
            ],
        },
    ],
    temperature=0,
    max_tokens=100,
)

print(completion.choices[0].message.content)

Audio and Speech

Convert audio files to text and perform various AI tasks with Friendli’s audio capabilities.

Representative Models

We support various trending audio models including:

Supported Audio Formats

Our platform supports a wide range of audio formats compatible with the librosa library:
  • MP3 (.mp3)
  • WAV (.wav)
  • FLAC (.flac)
  • OGG (.ogg)
  • And many other standard audio formats

API Usage

By default, audio input is limited to 30 seconds. To enable longer audio inputs, please contact us.

curl -X POST https://api.friendli.ai/dedicated/v1/audio/transcriptions \
  -H "Authorization: Bearer $FRIENDLI_TOKEN" \
  -H 'Content-Type: multipart/form-data' \
  -F file=@/path/to/audio/file.mp3 \
  -F model="YOUR_ENDPOINT_ID"
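
Because of the default 30-second limit, it can help to measure a clip before uploading it. The sketch below does this for WAV files using Python's standard `wave` module; the helper names are hypothetical, and other formats such as MP3 or FLAC would need a different library.

```python
import wave

# Default input limit stated above, in seconds.
MAX_AUDIO_SECONDS = 30.0


def wav_duration_seconds(path_or_file):
    """Return the duration of a WAV file (path or binary file object) in seconds."""
    with wave.open(path_or_file, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def fits_default_limit(path_or_file, limit=MAX_AUDIO_SECONDS):
    """Return True if the WAV clip is no longer than the limit."""
    return wav_duration_seconds(path_or_file) <= limit
```

Calling `fits_default_limit("/path/to/audio/file.wav")` before the request avoids uploading a clip that exceeds the default limit.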

API References

For detailed API specifications, refer to: