Documentation Index
Fetch the complete documentation index at: https://friendli.ai/docs/llms.txt
Use this file to discover all available pages before exploring further.
Friendli Dedicated Endpoints, Serverless Endpoints, and Container are OpenAI-compatible.
Existing applications can migrate with minimal effort while continuing to use the official OpenAI SDKs.
Specify the base URL and API Key
Initialize the OpenAI client using Friendli’s base URL and your Personal API Key.
- Serverless Endpoints: https://api.friendli.ai/serverless/v1
- Dedicated Endpoints: https://api.friendli.ai/dedicated/v1
- Container: your own container's URL (e.g., http://HOST:PORT/v1)
Get your Personal API Key in Friendli Suite > Personal Settings > API Keys.
from openai import OpenAI
import os

client = OpenAI(
    api_key=os.getenv("API_KEY"),
    base_url="https://api.friendli.ai/serverless/v1",
)
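The same initialization works for the other endpoint types; only the base URL changes. A minimal sketch, reusing the imports above and the base URLs listed earlier (HOST:PORT is a placeholder for your own container's address):

# Dedicated Endpoints: same client, pointed at the dedicated base URL.
dedicated_client = OpenAI(
    api_key=os.getenv("API_KEY"),
    base_url="https://api.friendli.ai/dedicated/v1",
)

# Container: same client, pointed at your own container's URL.
container_client = OpenAI(
    api_key=os.getenv("API_KEY"),
    base_url="http://HOST:PORT/v1",
)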
Usage
Choose any model available on Friendli Serverless Endpoints, Dedicated Endpoints, or Container.
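If you are unsure which model IDs your endpoint exposes, the OpenAI SDK's model-listing call can enumerate them. This is a sketch that reuses the client initialized above and assumes the OpenAI-compatible model-listing route is available on your endpoint type:

# List the model IDs exposed by the endpoint (assumes the /models route is served).
for model in client.models.list():
    print(model.id)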
Completions API
Generate text completions using a simple prompt-based approach.
from openai import OpenAI
import os

client = OpenAI(
    api_key=os.getenv("API_KEY"),
    base_url="https://api.friendli.ai/serverless/v1",
)

completion = client.completions.create(
    model="meta-llama-3.3-70b-instruct",
    prompt="Tell me a funny joke about programming.",
    max_tokens=100,
    temperature=0.7,
)

print(completion.choices[0].text)
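The response follows the OpenAI completion shape, so token accounting is also available. A small follow-up sketch continuing from the example above, assuming the usage field is populated by the endpoint:

# Inspect the token counts reported with the completion (OpenAI-style usage object).
if completion.usage is not None:
    print("prompt tokens:", completion.usage.prompt_tokens)
    print("completion tokens:", completion.usage.completion_tokens)
    print("total tokens:", completion.usage.total_tokens)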
Chat Completions API
Generate chat completions using a conversational, message-based approach.
from openai import OpenAI
import os

client = OpenAI(
    api_key=os.getenv("API_KEY"),
    base_url="https://api.friendli.ai/serverless/v1",
)

completion = client.chat.completions.create(
    model="meta-llama-3.3-70b-instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me a funny joke."},
    ],
    stream=False,
)

print(completion.choices[0].message.content)
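Because the API is message-based, a multi-turn conversation is simply the same call with the running message history appended. A minimal sketch continuing from the example above (the follow-up question is only illustrative):

# Append the assistant's reply to the history, then ask a follow-up question.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Tell me a funny joke."},
    {"role": "assistant", "content": completion.choices[0].message.content},
    {"role": "user", "content": "Explain why that joke is funny."},
]

follow_up = client.chat.completions.create(
    model="meta-llama-3.3-70b-instruct",
    messages=messages,
)

print(follow_up.choices[0].message.content)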
Streaming mode
Receive responses in real time as they are generated, which improves the user experience for long responses.
from openai import OpenAI
import os

client = OpenAI(
    api_key=os.getenv("API_KEY"),
    base_url="https://api.friendli.ai/serverless/v1",
)

stream = client.chat.completions.create(
    model="meta-llama-3.3-70b-instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me a funny joke."},
    ],
    stream=True,
)

for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="", flush=True)
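If you also need the full reply after streaming it to the user (for example, to append it to the conversation history), accumulate the deltas as you print them. A minimal sketch reusing the client defined above:

# Stream the reply while collecting the deltas into the complete text.
full_reply = []
stream = client.chat.completions.create(
    model="meta-llama-3.3-70b-instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me a funny joke."},
    ],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content or ""
    full_reply.append(delta)
    print(delta, end="", flush=True)

print()  # end the streamed line
assistant_reply = "".join(full_reply)  # the complete assistant message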