Friendli Serverless Endpoints is compatible with the OpenAI API standard, so you can use it with the OpenAI Python and Node.js API libraries.

Friendli Dedicated Endpoints and Friendli Container are also OpenAI API compatible.

In this guide, you will learn how to:

  • Send inference requests to Friendli Serverless Endpoints in Python and Node.js.
  • Use chat models supported by Friendli Endpoints.
  • Generate streaming chat responses.

Supported Models

  • meta-llama-3.1-8b-instruct
  • meta-llama-3.1-70b-instruct

You can find more information about each text generation model here. To follow this quick tutorial, log in to the Friendli Suite and create a Friendli Token. This tutorial uses the Llama 3.1 70B Instruct model (meta-llama-3.1-70b-instruct) as its example.

Quick Guide

If your application already uses OpenAI, you can switch it to Friendli Serverless Endpoints by changing three settings:

  • API key: use your Friendli Token, which you can create here.
  • Model: use the model ID of the text generation model you chose. You can find it by clicking the ‘More info’ icon, or use one of the IDs listed in the Supported Models section above.
  • Base URL: set it to https://api.friendli.ai/serverless/v1.

With these three changes, you are all set!
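The three settings above can be gathered in one place. The sketch below is a hypothetical helper (`friendli_settings` and `EndpointSettings` are illustrative names, not part of any SDK) that reads `FRIENDLI_TOKEN` and fails fast when it is missing. This can matter because `os.getenv` returns `None` for an unset variable, and the OpenAI Python client may then fall back to another credential, such as `OPENAI_API_KEY`, instead of reporting the missing Friendli Token.

```python
import os
from dataclasses import dataclass

# Base URL for Friendli Serverless Endpoints, as given in this guide.
FRIENDLI_BASE_URL = "https://api.friendli.ai/serverless/v1"


@dataclass(frozen=True)
class EndpointSettings:
    """Hypothetical container for the three values that change when leaving OpenAI."""
    api_key: str
    model: str
    base_url: str


def friendli_settings(model: str = "meta-llama-3.1-70b-instruct") -> EndpointSettings:
    """Hypothetical helper: build Friendli settings, failing fast if FRIENDLI_TOKEN is unset.

    It only reads the environment; it does not contact the API.
    """
    token = os.getenv("FRIENDLI_TOKEN")
    if not token:
        # Without this check, api_key=None could let the OpenAI client
        # pick up a different credential (for example, OPENAI_API_KEY).
        raise RuntimeError("Set the FRIENDLI_TOKEN environment variable first.")
    return EndpointSettings(api_key=token, model=model, base_url=FRIENDLI_BASE_URL)
```

With `s = friendli_settings()`, you could then create the client with `openai.OpenAI(api_key=s.api_key, base_url=s.base_url)` and pass `model=s.model` to `client.chat.completions.create`.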

Python

This example demonstrates how you can use the OpenAI Python SDK to generate a response.

Default Example Code

import openai
import os

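# Point the OpenAI client at Friendli Serverless Endpoints,
# using your Friendli Token (read from the environment) as the API key.
client = openai.OpenAI(
    api_key=os.getenv("FRIENDLI_TOKEN"),
    base_url="https://api.friendli.ai/serverless/v1",
)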

chat_completion = client.chat.completions.create(
    model="meta-llama-3.1-70b-instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me a funny joke."},
    ],
    stream=False,
)
print(chat_completion.choices[0].message.content)
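Because the endpoint follows the OpenAI API standard, the SDK call above amounts, roughly, to a single HTTP POST to `{base_url}/chat/completions`, with the token sent as a Bearer credential. The standard-library sketch below builds that request so you can inspect it. `build_chat_request` and `send_chat_request` are hypothetical helpers, and the extra headers and retry logic the SDK adds are not reproduced.

```python
import json
import urllib.request

# Base URL for Friendli Serverless Endpoints, as given in this guide.
BASE_URL = "https://api.friendli.ai/serverless/v1"


def build_chat_request(token: str, model: str, messages: list,
                       stream: bool = False) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completions request."""
    body = {"model": model, "messages": messages, "stream": stream}
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # Friendli Token as the API key
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_chat_request(req: urllib.request.Request) -> str:
    """Send a non-streaming request and return the first choice's text."""
    with urllib.request.urlopen(req) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]
```

For example, `send_chat_request(build_chat_request(os.environ["FRIENDLI_TOKEN"], "meta-llama-3.1-70b-instruct", [{"role": "user", "content": "Tell me a funny joke."}]))` performs the real network call. The SDK remains the simpler choice; this form is mainly useful for debugging or where installing `openai` is not an option.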

Streaming Example Code

import openai
import os

client = openai.OpenAI(
    api_key=os.getenv("FRIENDLI_TOKEN"),
    base_url="https://api.friendli.ai/serverless/v1",
)

stream = client.chat.completions.create(
    model="meta-llama-3.1-70b-instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me a funny joke."},
    ],
    stream=True,
)

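# Each chunk carries an incremental delta; its content can be None
# (for example, in the final chunk), so fall back to an empty string.
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="", flush=True)
print()  # end the output with a newline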
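With `stream=True`, the reply arrives as server-sent events rather than as one JSON document. Assuming Friendli uses the same wire format as the OpenAI API (each event is a `data:` line holding a JSON chunk, and the stream ends with `data: [DONE]`), the sketch below mirrors what the SDK loop above does: it takes `choices[0].delta.content` from each chunk and skips chunks that carry no text. `iter_delta_text` and `collect_stream` are hypothetical names.

```python
import json
from typing import Iterable, Iterator


def iter_delta_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style server-sent event lines (assumed format)."""
    for raw in lines:
        line = raw.strip()
        # Skip blank separator lines and anything that is not a data field.
        if not line.startswith("data:"):
            continue
        data = line[len("data:"):].strip()
        # The stream is assumed to end with a literal [DONE] sentinel.
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        choices = chunk.get("choices") or []
        if not choices:
            continue  # e.g. a chunk carrying only usage statistics
        # Role-only and final chunks have no content, like `delta.content or ""`.
        text = (choices[0].get("delta") or {}).get("content")
        if text:
            yield text


def collect_stream(lines: Iterable[str]) -> str:
    """Join all fragments into the full response text."""
    return "".join(iter_delta_text(lines))
```

The SDK already performs this parsing, so the sketch is mainly useful for inspecting raw responses; `collect_stream` also shows how you might keep the complete reply after printing it piece by piece.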

Node.js

This example demonstrates how you can use the OpenAI Node.js SDK to generate a response.

Default Example Code

const OpenAI = require("openai");

const openai = new OpenAI({
  apiKey: process.env.FRIENDLI_TOKEN,
  baseURL: "https://api.friendli.ai/serverless/v1",
});

async function getChatCompletion() {
  try {
    const chatCompletion = await openai.chat.completions.create({
      messages: [
        { role: "system", content: "You are a helpful assistant." },
        { role: "user", content: "Tell me a funny joke." },
      ],
      model: "meta-llama-3.1-70b-instruct",
      stream: false,
    });
    process.stdout.write(chatCompletion.choices[0].message.content);
  } catch (error) {
    console.error("Error:", error);
  }
}
getChatCompletion();

Streaming Example Code

const OpenAI = require("openai");

const openai = new OpenAI({
  apiKey: process.env.FRIENDLI_TOKEN,
  baseURL: "https://api.friendli.ai/serverless/v1",
});

async function getChatCompletionStream() {
  try {
    const stream = await openai.chat.completions.create({
      messages: [
        { role: "system", content: "You are a helpful assistant." },
        { role: "user", content: "Tell me a funny joke." },
      ],
      model: "meta-llama-3.1-70b-instruct",
      stream: true,
    });
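    // Print each incremental piece of text as it arrives; some chunks
    // carry no content, so fall back to an empty string.
    for await (const chunk of stream) {
      process.stdout.write(chunk.choices[0].delta?.content || "");
    }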
  } catch (error) {
    console.error("Error:", error);
  }
}
getChatCompletionStream();

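Results

The examples print a response similar to the following. Model output varies between runs, so your joke will likely differ: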

Here's one:

Why couldn't the bicycle stand up by itself?

(wait for it...)

Because it was two-tired!

Hope that brought a smile to your face!