Friendli Dedicated Endpoints, Serverless Endpoints, and Container are OpenAI-compatible.
Existing applications can migrate with minimal effort while continuing to use the official OpenAI SDKs.

Specify the base URL and API key

Initialize the OpenAI client using Friendli’s base URL and your Friendli token (API key).
  • Serverless Endpoints: https://api.friendli.ai/serverless/v1
  • Dedicated Endpoints: https://api.friendli.ai/dedicated/v1
  • Container: your own container’s URL (e.g., http://HOST:PORT/v1)
Get your Friendli token in Friendli Suite → Settings → Tokens, and expose it to your application as the FRIENDLI_TOKEN environment variable.
from openai import OpenAI
import os

client = OpenAI(
    api_key=os.getenv("FRIENDLI_TOKEN"),
    base_url="https://api.friendli.ai/serverless/v1",
)
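
Only the base_url changes between products. As a sketch, the Dedicated Endpoints URL comes from the list above, while the Container address (http://localhost:8000/v1 here) is a placeholder you must replace with your own host and port:
# Dedicated Endpoints: same token, different base URL.
client = OpenAI(
    api_key=os.getenv("FRIENDLI_TOKEN"),
    base_url="https://api.friendli.ai/dedicated/v1",
)

# Container: point at wherever your container is serving.
# http://localhost:8000/v1 is a placeholder, not a fixed default.
client = OpenAI(
    api_key=os.getenv("FRIENDLI_TOKEN"),
    base_url="http://localhost:8000/v1",
)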

Usage

Choose any model available on Friendli Serverless Endpoints, Dedicated Endpoints, or Container.

Completions API

Generate text completions using a simple prompt-based approach.
from openai import OpenAI
import os

client = OpenAI(
    api_key=os.getenv("FRIENDLI_TOKEN"),
    base_url="https://api.friendli.ai/serverless/v1",
)

completion = client.completions.create(
    model="meta-llama-3.3-70b-instruct",
    prompt="Tell me a funny joke about programming.",
    max_tokens=100,   # cap the length of the generated text
    temperature=0.7,  # higher values yield more varied output
)
print(completion.choices[0].text)
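
The response also carries OpenAI-standard token accounting in completion.usage, assuming the endpoint populates that field:
print(completion.usage)  # prompt_tokens, completion_tokens, total_tokens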

Chat Completions API

Generate chat completions using a conversational message-based approach.
from openai import OpenAI
import os

client = OpenAI(
    api_key=os.getenv("FRIENDLI_TOKEN"),
    base_url="https://api.friendli.ai/serverless/v1",
)

completion = client.chat.completions.create(
    model="meta-llama-3.3-70b-instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me a funny joke."},
    ],
    stream=False,  # return the full response at once (see Streaming Mode below)
)
print(completion.choices[0].message.content)
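
Because the API is message-based, a multi-turn conversation is just a growing messages list: append the assistant’s reply, then the next user turn, and send the whole list again. A minimal sketch (the follow-up question is illustrative):
from openai import OpenAI
import os

client = OpenAI(
    api_key=os.getenv("FRIENDLI_TOKEN"),
    base_url="https://api.friendli.ai/serverless/v1",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Tell me a funny joke."},
]
first = client.chat.completions.create(
    model="meta-llama-3.3-70b-instruct",
    messages=messages,
)

# Feed the assistant's reply back in, followed by the next user turn.
messages.append({"role": "assistant", "content": first.choices[0].message.content})
messages.append({"role": "user", "content": "Now explain the joke."})

second = client.chat.completions.create(
    model="meta-llama-3.3-70b-instruct",
    messages=messages,
)
print(second.choices[0].message.content)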

Streaming Mode

Receive responses in real time, enabling a better user experience for long responses.
from openai import OpenAI
import os

client = OpenAI(
    api_key=os.getenv("FRIENDLI_TOKEN"),
    base_url="https://api.friendli.ai/serverless/v1",
)

stream = client.chat.completions.create(
    model="meta-llama-3.3-70b-instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me a funny joke."},
    ],
    stream=True,
)

for chunk in stream:
    # Each chunk carries an incremental delta; content can be None on the final chunk.
    print(chunk.choices[0].delta.content or "", end="", flush=True)
print()  # terminate the line once the stream is exhausted
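
The official SDK’s async client works the same way with Friendli’s base URL. A minimal sketch using AsyncOpenAI (same model and endpoint as above):
from openai import AsyncOpenAI
import asyncio
import os

client = AsyncOpenAI(
    api_key=os.getenv("FRIENDLI_TOKEN"),
    base_url="https://api.friendli.ai/serverless/v1",
)

async def main() -> None:
    stream = await client.chat.completions.create(
        model="meta-llama-3.3-70b-instruct",
        messages=[{"role": "user", "content": "Tell me a funny joke."}],
        stream=True,
    )
    async for chunk in stream:
        print(chunk.choices[0].delta.content or "", end="", flush=True)
    print()

asyncio.run(main())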