# friendli api chat-completions create Source: https://friendli.ai/docs/cli/api/chat-completions/create Create chat completions using the Friendli API. Customize your requests with various options like model selection, message input, token limits, and more to generate tailored results. ## Usage ```bash friendli api chat-completions create [OPTIONS] ``` ## Summary Creates chat completions. ## Options | Option | Type | Summary | Default | Required | | --- | --- | --- | --- | --- | | **`--message`**, **`-g`** | TEXT | A message in `ROLE CONTENT` format. Repeat this option to add multiple messages. | - | ✅ | | **`--model`**, **`-m`** | TEXT | The model to use for chat completions. See [here](/guides/serverless_endpoints/pricing) for more about available models and pricing. | - | ✅ | | `--n`, `-n` | INTEGER RANGE | The number of results to generate. | None | ❌ | | `--max-tokens`, `-M` | INTEGER RANGE | The maximum number of tokens to generate. | None | ❌ | | `--stop`, `-S` | TEXT | When one of the stop phrases appears in the generation result, the API will stop generation. The stop phrases are excluded from the result. Repeat this option to use multiple stop phrases. | None | ❌ | | `--temperature`, `-T` | FLOAT RANGE | Sampling temperature. Non-zero positive numbers are allowed. | None | ❌ | | `--top-p`, `-P` | FLOAT RANGE | Tokens comprising the top top\_p probability mass are kept for sampling. | None | ❌ | | `--frequency-penalty`, `-fp` | FLOAT RANGE | Positive values penalize tokens that have been sampled, taking into account their frequency in the preceding text. This penalization diminishes the model's tendency to reproduce identical lines verbatim. | None | ❌ | | `--presence-penalty`, `-pp` | FLOAT RANGE | Positive values penalize tokens that have been sampled at least once in the existing text. | None | ❌ | | `--stream`, `-s` | BOOLEAN | Whether to stream the generation result. | False | ❌ | | `--token`, `-t` | TEXT | Friendli Token for auth. | None | ❌ | | `--team-id` | TEXT | ID of team to run as. | None | ❌ | # friendli api completions create Source: https://friendli.ai/docs/cli/api/completions/create Create text completions using the Friendli API. Customize your completions with various options like prompts, model selection, token limits, and more to create precise, tailored outputs. ## Usage ```bash friendli api completions create [OPTIONS] ``` ## Summary Creates text completions. ## Options | Option | Type | Summary | Default | Required | | --- | --- | --- | --- | --- | | **`--prompt`**, **`-p`** | TEXT | The input text to generate a completion for. | - | ✅ | | **`--model`**, **`-m`** | TEXT | The model to use for completions. See [here](/guides/serverless_endpoints/pricing) for more about available models and pricing. | - | ✅ | | `--n`, `-n` | INTEGER RANGE | The number of results to generate. | None | ❌ | | `--max-tokens`, `-M` | INTEGER RANGE | The maximum number of tokens to generate. | None | ❌ | | `--stop`, `-S` | TEXT | When one of the stop phrases appears in the generation result, the API will stop generation. The stop phrases are excluded from the result. Repeat this option to use multiple stop phrases. | None | ❌ | | `--temperature`, `-T` | FLOAT RANGE | Sampling temperature. Non-zero positive numbers are allowed. | None | ❌ | | `--top-p`, `-P` | FLOAT RANGE | Tokens comprising the top top\_p probability mass are kept for sampling. | None | ❌ | | `--frequency-penalty`, `-fp` | FLOAT RANGE | Positive values penalize tokens that have been sampled, taking into account their frequency in the preceding text. This penalization diminishes the model's tendency to reproduce identical lines verbatim. | None | ❌ | | `--presence-penalty`, `-pp` | FLOAT RANGE | Positive values penalize tokens that have been sampled at least once in the existing text. | None | ❌ | | `--stream`, `-s` | BOOLEAN | Whether to stream the generation result. | False | ❌ | | `--token`, `-t` | TEXT | Friendli Token for auth. | None | ❌ | | `--team` | TEXT | ID of team to run as. | None | ❌ |
# friendli endpoint create Source: https://friendli.ai/docs/cli/endpoint/create Create and deploy new endpoints with the Friendli API. Customize with model selection, GPU configuration, and more to efficiently serve your machine learning models. ## Usage ```bash friendli endpoint create [OPTIONS] ``` ## Summary Creates a new endpoint by deploying a model. ## Options | Option | Type | Summary | Default | Required | | --- | --- | --- | --- | --- | | **`--name`**, **`-n`** | TEXT | The name of the endpoint to create. | - | ✅ | | **`--model`**, **`-m`** | TEXT | The name of the Hugging Face model to deploy. | - | ✅ | | **`--gpu-type`**, **`-gt`** | TEXT | GPU type to serve the deployed model. | - | ✅ | | **`--gpu-count`**, **`-gc`** | INTEGER | The number of GPUs to serve the deployed model. | - | ✅ | # friendli endpoint get Source: https://friendli.ai/docs/cli/endpoint/get Get detailed information about a specific endpoint using the Friendli API. ## Usage ```bash friendli endpoint get ENDPOINT_ID ``` ## Summary Get detailed info of an endpoint. ## Arguments | Argument | Type | Summary | Default | Required | | --- | --- | --- | --- | --- | | **`endpoint_id`** | TEXT | ID of an endpoint to get. | - | ✅ | # friendli endpoint list Source: https://friendli.ai/docs/cli/endpoint/list View all your deployed endpoints with the Friendli API. Easily list endpoints for efficient model management. ## Usage ```bash friendli endpoint list ``` ## Summary List endpoints. # friendli endpoint terminate Source: https://friendli.ai/docs/cli/endpoint/terminate Terminate a running endpoint with the Friendli API using the endpoint ID. Easily manage and stop your deployed models when needed. ## Usage ```bash friendli endpoint terminate ENDPOINT_ID ``` ## Summary Terminate a running endpoint. ## Arguments | Argument | Type | Summary | Default | Required | | --- | --- | --- | --- | --- | | **`endpoint_id`** | TEXT | ID of an endpoint to terminate. | - | ✅ | # Installation Source: https://friendli.ai/docs/cli/installation Install the friendli-client package to access advanced features for AI integration. Supports Python 3.8+, with options for machine learning libraries and Hugging Face checkpoint conversion. You can simply install the `friendli-client` package using `pip`.
```bash pip install friendli-client ``` `friendli-client` requires **python>=3.8**. We recommend using the most up-to-date package. You can check the release history at [PyPI](https://pypi.org/project/friendli-client/#history) and [GitHub](https://github.com/friendliai/friendli-client/releases). You can update the package with: ```bash pip install friendli-client -U ``` If you have a Hugging Face checkpoint and want to convert it to a Friendli-compatible format or apply quantization, you need to install the package with the necessary machine learning library (`mllib`) dependencies. In this case, install the package with the following command: ```sh pip install "friendli-client[mllib]" ``` # friendli login Source: https://friendli.ai/docs/cli/login Sign in to Friendli using the command line interface. ## Usage ```bash friendli login [OPTIONS] ``` ## Summary Sign in to Friendli. ## Options | Option | Type | Summary | Default | Required | | ------- | ------- | -------------- | ------- | -------- | | `--sso` | BOOLEAN | Use SSO login. | False | ❌ | # friendli logout Source: https://friendli.ai/docs/cli/logout Sign out of Friendli using the command line interface. ## Usage ```bash friendli logout ``` ## Summary Sign out. # friendli model convert Source: https://friendli.ai/docs/cli/model/convert Convert Hugging Face model checkpoints to Friendli format for deployment. Includes options for quantization, data type selection, and model optimization using the Friendli API. This command is deprecated and will be removed in future releases. Use the newly created [**friendli-model-optimizer**](https://github.com/friendliai/friendli-model-optimizer) tool instead. ## Usage ```bash friendli model convert [OPTIONS] ``` ## Summary Convert a Hugging Face model checkpoint to Friendli format. When a checkpoint is in the Hugging Face format, it cannot be directly served. It requires conversion to the Friendli format for serving. The conversion process involves copying the original checkpoint and transforming it into a checkpoint in the Friendli format (\*.h5). The `friendli model convert` command is available only when the package is installed with `pip install "friendli-client[mllib]"`. ### Apply quantization If you want to quantize the model along with the conversion, the `--quantize` option should be provided. You can customize the quantization configuration by describing it in a YAML file and providing the path to the file with the `--quant-config-file` option. When the `--quantize` option is used without providing `--quant-config-file`, the following configuration is used by default. ```yaml # Default quantization configuration mode: awq device: cuda:0 seed: 42 offload: true calibration_dataset: path_or_name: lambada format: json split: validation lookup_column_name: text num_samples: 128 max_length: 512 awq_args: quant_bit: 4 quant_group_size: 64 ``` * **`mode`**: Quantization scheme to apply. Defaults to "awq". * **`device`**: Device to run the quantization process. Defaults to "cuda:0". * **`seed`**: Random seed. Defaults to 42. * **`offload`**: When enabled, this option significantly reduces GPU memory usage by offloading model layers onto CPU RAM. Defaults to true. * **`calibration_dataset`** * **`path_or_name`**: Path or name of the dataset. Datasets from either the Hugging Face Datasets Hub or local file system can be used. Defaults to "lambada". * **`format`**: Format of datasets. Defaults to "json". * **`split`**: Which split of the data to load. Defaults to "validation".
* **`lookup_column_name`**: The name of a column in the dataset to be used as calibration inputs. Defaults to "text". * **`num_samples`**: The number of dataset samples to use for calibration. Note that the dataset will be shuffled before sampling. Defaults to 128. * **`max_length`**: The maximum length of a calibration input sequence. Defaults to 512. * **`awq_args`** (Fill in this field only for "awq" mode) * **`quant_bit`**: Bit width of integers to represent weights. Possible values are `4` or `8`. Defaults to 4. * **`quant_group_size`**: Group size of quantized matrices. 64 is the only supported value at this time. Defaults to 64. If you encounter OOM issues when running with AWQ, try enabling the `offload` option. If you set `percentile` in the quant config file to 100, the quantization range will be determined by the maximum absolute values of the activation tensors. Currently, [AWQ](https://arxiv.org/abs/2306.00978) is the only supported quantization scheme. AWQ is supported only for models with one of the following architectures: * `GPTNeoXForCausalLM` * `GPTJForCausalLM` * `LlamaForCausalLM` * `MPTForCausalLM` ## Options | Option | Type | Summary | Default | Required | | --- | --- | --- | --- | --- | | **`--model-name-or-path`**, **`-m`** | TEXT | Hugging Face pretrained model name or path to the saved model checkpoint. | - | ✅ | | **`--output-dir`**, **`-o`** | TEXT | Directory path to save the converted checkpoint and related configuration files. Three files will be created in the directory: `model.h5`, `tokenizer.json`, and `attr.yaml`. The `model.h5` or `model.safetensors` is the converted checkpoint and can be renamed using the `--output-model-filename` option. The `tokenizer.json` is the Friendli-compatible tokenizer file, which should be uploaded along with the checkpoint file to tokenize the model input and output. The `attr.yaml` is the checkpoint attribute file, to be used when uploading the converted model to Friendli. You can designate the file name using the `--output-attr-filename` option. | - | ✅ | | **`--data-type`**, **`-dt`** | CHOICE: \[bf16, fp16, fp32, int8, int4] | The data type of the converted checkpoint. | - | ✅ | | `--cache-dir` | TEXT | Directory for downloading checkpoint. | None | ❌ | | `--dry-run` | BOOLEAN | Only check conversion availability. | False | ❌ | | `--output-model-filename` | TEXT | Name of the converted checkpoint file. The default file name is `model.h5` when `--output-ckpt-file-type` is `hdf5` or `model.safetensors` when `--output-ckpt-file-type` is `safetensors`. | None | ❌ | | `--output-ckpt-file-type` | CHOICE: \[hdf5, safetensors] | File format of the converted checkpoint file. | hdf5 | ❌ | | `--output-attr-filename` | TEXT | Name of the checkpoint attribute file.
| attr.yaml | ❌ | | `--quantize` | BOOLEAN | Quantize the model before conversion | False | ❌ | | `--quant-config-file` | FILENAME | Path to the quantization configuration file. | None | ❌ | # friendli model list Source: https://friendli.ai/docs/cli/model/list View all available models with the Friendli API. Easily list models to streamline your deployment and optimization processes. ## Usage ```bash friendli model list ``` ## Summary List models. # friendli project list Source: https://friendli.ai/docs/cli/project/list List all accessible projects with the Friendli API. Easily manage your available projects for efficient workflow management. ## Usage ```bash friendli project list ``` ## Summary List all accessible projects. # friendli project switch Source: https://friendli.ai/docs/cli/project/switch Switch between project contexts using the Friendli API. Quickly change the active project by providing the project ID for smooth workflow management. ## Usage ```bash friendli project switch PROJECT_ID ``` ## Summary Switch current project context to run as. ## Arguments | Argument | Type | Summary | Default | Required | | ---------------- | ---- | ------------------------ | ------- | -------- | | **`project_id`** | TEXT | ID of project to switch. | - | ✅ | # friendli team list Source: https://friendli.ai/docs/cli/team/list View all available teams with the Friendli API. Easily list teams for project organization. ## Usage ```bash friendli team list ``` ## Summary List teams. # friendli team switch Source: https://friendli.ai/docs/cli/team/switch Switch between team contexts using the Friendli API. Quickly change the active team by providing the team ID for efficient collaboration and management. ## Usage ```bash friendli team switch TEAM_ID ``` ## Summary Switch current team context to run as. ## Arguments | Argument | Type | Summary | Default | Required | | ------------- | ---- | --------------------- | ------- | -------- | | **`team_id`** | TEXT | ID of team to switch. | - | ✅ | # friendli version Source: https://friendli.ai/docs/cli/version Check the installed package version of Friendli using the command line interface. ## Usage ```bash friendli version ``` ## Summary Check the installed package version. # friendli whoami Source: https://friendli.ai/docs/cli/whoami Show my user information of Friendli using the command line interface. ## Usage ```bash friendli whoami ``` ## Summary Show my user info. # CUDA Compatibility Source: https://friendli.ai/docs/guides/container/cuda_compatibility The Friendli Engine supports CUDA-enabled NVIDIA GPUs, which means it relies on a specific version of CUDA and necessitates proper CUDA compute compatibilities. The Friendli Engine supports CUDA-enabled NVIDIA GPUs, which means it relies on a specific version of CUDA and necessitates proper [CUDA compute compatibilities](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#compute-capability). To utilize the Friendli Container effectively, ensure that you have the appropriate NVIDIA GPUs and an NVIDIA driver in place. Currently, we publicly offer a single Friendli Container image (`registry.friendli.ai/trial:latest`) equipped with CUDA 12.4, targeting CUDA compute compatibility versions `8.0`, `8.6`, `8.9`, and `9.0`. To make the right choices regarding GPUs and driver versions, consult the [required driver versions](https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html#id4) and [GPUs](https://developer.nvidia.com/cuda-gpus) for the CUDA toolkit and compute compatibility. 
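To check whether a machine meets these requirements before pulling the image, you can query the installed driver version and each GPU's compute capability. This is a minimal sketch; the `compute_cap` query field assumes a reasonably recent `nvidia-smi` release, so on older drivers you may need to look up the capability on the NVIDIA GPU list linked above instead.

```sh
# Print each GPU's name and compute capability, plus the installed driver version.
# Compare the compute capability against the versions targeted by the trial image
# (8.0, 8.6, 8.9, 9.0) and the driver version against the CUDA 12.4 requirements.
nvidia-smi --query-gpu=name,compute_cap,driver_version --format=csv
```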
# Inference with gRPC Source: https://friendli.ai/docs/guides/container/inference_with_grpc Run gRPC inference server with Friendli Container and interact with it through friendli-client SDK. This guide will walk you through how to run gRPC inference server with Friendli Container and interact with it through `friendli-client` SDK. ## Prerequisites Install `friendli-client` to use gRPC client SDK: ```sh pip install friendli-client ``` Ensure you have the `friendli-client` SDK version `1.4.1` or higher installed. ## Starting the Friendli Container with gRPC Running the Friendli Container with a gRPC server for completions is available by adding the `--grpc true` option to the command argument. This supports response-streaming gRPC, and you can send requests using our `friendli-client` SDK. To start the Friendli Container with gRPC support, use the following command: ```sh export FRIENDLI_CONTAINER_SECRET="YOUR_FRIENDLI_CONTAINER_SECRET_flc_XXX" # e.g. Running `NousResearch/Hermes-3-Llama-3.1-8B` on GPU 0 with a trial image. docker run --gpus '"device=0"' -p 8000:8000 \ -e FRIENDLI_CONTAINER_SECRET=$FRIENDLI_CONTAINER_SECRET \ -v ~/.cache/huggingface:/root/.cache/huggingface \ registry.friendli.ai/trial:latest \ --hf-model-name NousResearch/Hermes-3-Llama-3.1-8B \ --grpc true ``` You can change the port of the server with `--web-server-port` argument. ## Sending Requests with the Client SDK Here is how to use the `friendli-client` SDK to interact with the gRPC server. This example assumes that the gRPC server is running on `0.0.0.0:8000`. ```python Default from friendli import Friendli client = Friendli(base_url="0.0.0.0:8000", use_grpc=True) stream = client.completions.create( prompt="Explain what gRPC is.", stream=True, # Should be True top_k=1, ) for chunk in stream: print(chunk.text, end="", flush=True) ``` ```python Async # For asynchronous operations, use the following code snippet: import asyncio from friendli import AsyncFriendli client = AsyncFriendli(base_url="0.0.0.0:8000", use_grpc=True) async def run(): stream = await client.completions.create( prompt="Explain what gRPC is.", stream=True, # Should be True top_k=1, ) async for chunk in stream: print(chunk.text, end="", flush=True) asyncio.run(run()) ``` ## Properly Closing the Client By default, the library closes underlying HTTP and gRPC connections when the `client` is garbage-collected. You can manually close the `Friendli` or `AsyncFriendli` client using the `.close()` method or utilize a context manager to ensure proper closure when exiting a `with` block. ```python Default from friendli import Friendli client = Friendli(base_url="0.0.0.0:8000", use_grpc=True) with client: stream = client.completions.create( prompt="Explain what gRPC is.", stream=True, # Should be True top_k=1, min_tokens=10, ) for chunk in stream: print(chunk.text, end="", flush=True) ``` ```python Async import asyncio from friendli import AsyncFriendli client = AsyncFriendli(base_url="0.0.0.0:8000", use_grpc=True) async def run(): async with client: stream = await client.completions.create( prompt="Explain what gRPC is.", stream=True, # Should be True top_k=1, ) async for chunk in stream: print(chunk.text, end="", flush=True) asyncio.run(run()) ``` # Introducing Friendli Container Source: https://friendli.ai/docs/guides/container/introduction While Friendli Serverless Endpoints and Dedicated Endpoints offer convenient cloud-based solutions, some users crave even more control and flexibility. For those pioneers, Friendli Container is the answer. 
While Friendli Serverless Endpoints and Dedicated Endpoints offer convenient cloud-based solutions, some users crave even more control and flexibility. For those pioneers, Friendli Container is the answer. ## What is Friendli Container? Unmatched Control: Friendli Container provides the Friendli Engine, our cutting-edge serving technology, as a Docker container. This means you can: * **Run your own data center or cluster**: Deploy the container on your existing GPU machines, giving you complete control over your infrastructure and data security. * **Choose your own cloud provider**: If you prefer the cloud, you can still leverage your preferred cloud provider and GPUs. * **Customize your environment**: Fine-tune the container configuration to perfectly match your specific needs and workflows. Greater Responsibility, Greater Customization: With Friendli Container, you handle the cluster management, fault tolerance, and scaling. This responsibility comes with these potential benefits: * **Controlled environment**: Keep your data within your own environment, ideal for sensitive applications or meeting compliance requirements. * **Unmatched flexibility**: Tailor your infrastructure and workflows to your specific needs, pushing the boundaries of AI innovation. * **Cost saving opportunities**: Manage your resources on your GPU machines, potentially leading to cost savings compared to cloud-based solutions. Ideal for: * **Data-sensitive users**: Securely run your models within your own infrastructure. * **Control enthusiasts**: Take full control over your AI environment and workflows. * **Existing cluster owners**: Utilize your existing GPU resources for cost-effective generative AI serving. ## Getting Started with Friendli Container: 1. **Generate Your User Token**: Visit the Friendli Container page through the [Friendli Suite](https://suite.friendli.ai) website and generate your unique token. 2. **Login with Docker Client**: Use your token to authenticate with the Docker client on your machine. 3. **Pull the Friendli Container Image**: Run the docker pull command with the provided image name. 4. [**Launch the Friendli Container**](/guides/container/running_friendli_container): Run the docker run command with the desired configuration and credentials. 5. **Expose Your Model**: The container will expose the model for inference. 6. [**Send Inference Requests**](/guides/container/running_friendli_container#sending-inference-requests): Use tools like curl or Python's requests library to send input prompts or data to the container. Take generative AI to the next level with unmatched control, security, and flexibility through Friendli Container. Start your journey today and elevate your AI endeavors on your own terms! # Observability for Friendli Container Source: https://friendli.ai/docs/guides/container/monitoring Observability is an integral part of DevOps. To support this, Friendli Container exports internal metrics in a Prometheus text format. Observability is an integral part of DevOps. To support this, Friendli Container exports internal metrics in a [Prometheus](https://prometheus.io) text format. By default, metrics are served at `http://localhost:8281/metrics`. You can configure the port number using the command line option `--metrics-port`. ## Supported Metrics ### Counters Counters are cumulative metrics whose values monotonically increase. 
They are often used in combination with the Prometheus function [rate()](https://prometheus.io/docs/prometheus/latest/querying/functions/#rate) for calculating the throughput. | Metric Name | Description | | --- | --- | | friendli\_requests\_total | Cumulative number of requests received | | friendli\_responses\_total | Cumulative number of responses sent | | friendli\_items\_total | Cumulative number of items requested | | friendli\_failure\_by\_cancel | Cumulative number of failed requests due to cancellation | | friendli\_failure\_by\_timeout | Cumulative number of failed requests due to timeout | | friendli\_failure\_by\_nan\_error | Cumulative number of failed requests due to NaN error | | friendli\_failure\_by\_reject | Cumulative number of failed requests due to rejection | One inference request may generate multiple results with the `n` field in the request body. Upon receiving such a request, `friendli_requests_total` is increased by 1 and `friendli_items_total` is increased by `n`. ### Gauges Gauges are numerical values that can go up and down to represent the current value. | Metric Name | Description | | --- | --- | | friendli\_current\_requests | Current number of requests in the engine (either assigned or waiting) | | friendli\_current\_items | Current number of items in the engine (either assigned or waiting) | | friendli\_current\_assigned\_items | Current number of items actively processed by the engine | | friendli\_current\_waiting\_items | Current number of items waiting in the internal queue | ### Histograms [Histograms](https://prometheus.io/docs/practices/histograms) are used to track the distribution of variables over time.
| Histogram | Metric Name | Description |
| --- | --- | --- |
| Friendli TCache hit ratio (0 ≤ value ≤ 1) | friendli\_tcache\_hit\_ratio\_bucket | Bucketized number of histogram samples for TCache hit ratio, with `le` label |
| | friendli\_tcache\_hit\_ratio\_count | Total number of histogram samples for TCache hit ratio |
| | friendli\_tcache\_hit\_ratio\_sum | Sum of histogram sample values for TCache hit ratio |
| The length of input tokens (Experimental metric) | friendli\_input\_lengths\_bucket | Bucketized number of histogram samples for length of input tokens, with `le` label |
| | friendli\_input\_lengths\_count | Total number of histogram samples for length of input tokens |
| | friendli\_input\_lengths\_sum | Sum of histogram sample values for length of input tokens |
| The length of output tokens (Experimental metric) | friendli\_output\_lengths\_bucket | Bucketized number of histogram samples for length of output tokens, with `le` label |
| | friendli\_output\_lengths\_count | Total number of histogram samples for length of output tokens |
| | friendli\_output\_lengths\_sum | Sum of histogram sample values for length of output tokens |
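Once a Prometheus server is scraping the container, these bucketized series can be turned into percentiles with `histogram_quantile()`. A minimal sketch, assuming a Prometheus server reachable at `localhost:9090` that already scrapes the metrics endpoint:

```sh
# Ask Prometheus for the p90 of input lengths over the last 5 minutes,
# computed from the bucketized histogram series exported by the engine.
curl -s 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=histogram_quantile(0.9, rate(friendli_input_lengths_bucket[5m]))'
```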
For visualizing histograms using Grafana, [How to visualize Prometheus histograms in Grafana](https://grafana.com/blog/2020/06/23/how-to-visualize-prometheus-histograms-in-grafana) provides useful tips. ### Quantiles Quantiles are used to show the current p50 (median), p90, and p99 percentiles of variables.
| Quantile | Metric Name | Description |
| --- | --- | --- |
| Request completion latency (in nanoseconds) | friendli\_requests\_latencies | Percentile value for request completion latency (`quantile` label is either 0.5, 0.9, or 0.99) |
| | friendli\_requests\_latencies\_count | Total number of samples for request completion latency |
| | friendli\_requests\_latencies\_sum | Sum of sample values for request completion latency |
| Time to first token (TTFT) (in nanoseconds) | friendli\_requests\_ttft | Percentile value for time to first token (TTFT) (`quantile` label is either 0.5, 0.9, or 0.99) |
| | friendli\_requests\_ttft\_count | Total number of samples for time to first token (TTFT) |
| | friendli\_requests\_ttft\_sum | Sum of sample values for time to first token (TTFT) |
| Request queueing delay (in nanoseconds) | friendli\_requests\_queueing\_delays | Percentile value for queueing delay (`quantile` label is either 0.5, 0.9, or 0.99) |
| | friendli\_requests\_queueing\_delays\_count | Total number of samples for queueing delay |
| | friendli\_requests\_queueing\_delays\_sum | Sum of sample values for queueing delay |
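If you just want to eyeball these values without setting up Prometheus, you can read the raw exposition text directly from the metrics endpoint. A quick sketch, assuming the default metrics port of 8281 (adjust if you launched the container with `--metrics-port`):

```sh
# Dump the request latency quantiles currently reported by the engine.
curl -s http://localhost:8281/metrics | grep friendli_requests_latencies
```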
### Info The following information metric always has a value of 1. The metric labels contain useful information in text. | Metric Name | Label | Description | | --- | --- | --- | | friendli\_engine\_version | `version` | Engine version | ## Grafana Dashboard Template ![Grafana Dashboard](https://mintlify.s3.us-west-1.amazonaws.com/friendliai/static/images/guides/container/grafana_template_dashboard_example.png) You can import [the dashboard templates](https://github.com/friendliai/container-resource/tree/main/grafana) to your Grafana instance. The Grafana instance must be connected to a Prometheus instance (or a Prometheus-compatible data source) which is configured to scrape metrics from Friendli Container processes. The dashboard template works with Grafana v8.0.0 or later versions. We recommend using Grafana v10.0.0 or later for the best experience. # Optimizing Inference with Policy Search Source: https://friendli.ai/docs/guides/container/optimizing_inference_with_policy_search For specialized cases like MoE or quantized models, optimizing the execution policy in Friendli Engine can boost inference performance by 1.5x to 2x, improving throughput and reducing latency. ## Introduction For specialized cases, like **serving MoE models (e.g., Mixtral)** or **quantized models**, inference performance can be further optimized through an execution policy search. This step can be skipped, but it is required to get the best speed out of the Friendli Engine. When the Friendli Engine runs with the optimal policy, performance (throughput and latency) can improve by 1.5x to 2x. Therefore, we recommend skipping policy search for simple model testing, and performing policy search for cost analysis or latency analysis in a production service. Policy search is effective only when serving (1) MoE models or (2) AWQ, FP8, or INT8 quantized models. Otherwise, it has no effect. ## Running Policy Search You can run policy search by adding the following options to the launch command of Friendli Container. | Options | Type | Summary | Default | | --- | --- | --- | --- | | `--algo-policy-dir` | TEXT | Path to the directory to save the searched optimal policy file. The default value is the current working directory. | current working dir | | `--search-policy` | BOOLEAN | Runs policy search to find the best Friendli execution policy for the given configuration such as model type, GPU, NVIDIA driver version, quantization scheme, etc. | false | | `--terminate-after-search` | BOOLEAN | Terminates the engine container after policy search.
| false | ### Example: `FriendliAI/Llama-3.1-8B-Instruct-fp8` For example, you can start the policy search for the [FriendliAI/Llama-3.1-8B-Instruct-fp8](https://huggingface.co/FriendliAI/Llama-3.1-8B-Instruct-fp8) model as follows: ```sh export HF_MODEL_NAME="FriendliAI/Llama-3.1-8B-Instruct-fp8" export FRIENDLI_CONTAINER_SECRET="YOUR CONTAINER SECRET" export FRIENDLI_CONTAINER_IMAGE="registry.friendli.ai/trial" export GPU_ENUMERATION='"device=0"' export POLICY_DIR=$PWD/policy mkdir -p $POLICY_DIR docker run \ --gpus $GPU_ENUMERATION \ -p 8000:8000 \ -v ~/.cache/huggingface:/root/.cache/huggingface \ -v $POLICY_DIR:/policy \ -e FRIENDLI_CONTAINER_SECRET=$FRIENDLI_CONTAINER_SECRET \ $FRIENDLI_CONTAINER_IMAGE \ --hf-model-name $HF_MODEL_NAME \ --algo-policy-dir /policy \ --search-policy true ``` ### Example: `mistralai/Mixtral-8x7B-Instruct-v0.1` (TP=4) ```sh export HF_MODEL_NAME="mistralai/Mixtral-8x7B-Instruct-v0.1" export FRIENDLI_CONTAINER_SECRET="YOUR CONTAINER SECRET" export FRIENDLI_CONTAINER_IMAGE="registry.friendli.ai/trial" export GPU_ENUMERATION='"device=0,1,2,3"' export POLICY_DIR=$PWD/policy mkdir -p $POLICY_DIR docker run -p 8000:8000 \ --ipc=host --gpus $GPU_ENUMERATION \ -v ~/.cache/huggingface:/root/.cache/huggingface \ -v $POLICY_DIR:/policy \ -e FRIENDLI_CONTAINER_SECRET=$FRIENDLI_CONTAINER_SECRET \ $FRIENDLI_CONTAINER_IMAGE \ --hf-model-name $HF_MODEL_NAME \ --num-devices 4 \ --algo-policy-dir /policy \ --search-policy true ``` Once the policy search is complete, a policy file will be created in `$POLICY_DIR`. If the policy file already exists, the engine will search only the necessary spaces and update the policy file accordingly. After the policy search, the engine starts serving the endpoint using the policy file. It takes up to several minutes to find the optimal policy for the Llama 2 13B model with an NVIDIA A100 80GB GPU. The estimated time and remaining time will be displayed in stderr when you run the policy search. ## Running Policy Search Without Starting Serving Endpoint To search for the best policy without starting the serving endpoint, launch the engine with the Friendli Container command and include the `--terminate-after-search true` option. ### Example: `FriendliAI/Llama-3.1-8B-Instruct-fp8` ```sh docker run \ --gpus $GPU_ENUMERATION \ -p 8000:8000 \ -v ~/.cache/huggingface:/root/.cache/huggingface \ -v $POLICY_DIR:/policy \ -e FRIENDLI_CONTAINER_SECRET=$FRIENDLI_CONTAINER_SECRET \ $FRIENDLI_CONTAINER_IMAGE \ --hf-model-name FriendliAI/Llama-3.1-8B-Instruct-fp8 \ --algo-policy-dir /policy --search-policy true --terminate-after-search true ``` ### Example: `mistralai/Mixtral-8x7B-Instruct-v0.1` (TP=4) ```sh docker run -p 8000:8000 \ --ipc=host --gpus $GPU_ENUMERATION \ -v ~/.cache/huggingface:/root/.cache/huggingface \ -v $POLICY_DIR:/policy \ -e FRIENDLI_CONTAINER_SECRET=$FRIENDLI_CONTAINER_SECRET \ $FRIENDLI_CONTAINER_IMAGE \ --hf-model-name mistralai/Mixtral-8x7B-Instruct-v0.1 \ --num-devices 4 \ --algo-policy-dir /policy \ --search-policy true --terminate-after-search true ``` ## FAQ: When to Run Policy Search Again? The execution policy depends on the following factors: * Model * GPU * GPU count and parallelism degree (the values of the `--num-devices` and `--num-workers` options) * NVIDIA Driver major version * Friendli Container version You should run policy search again when any of these change in your serving setup.
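Two of these factors are easy to check from the host before deciding whether to reuse an existing policy file. A small sketch, assuming the environment variables from the examples above are still set; the `--version` launch option is documented in the Running Friendli Container guide, and depending on the image it may still require the container secret to be set.

```sh
# NVIDIA driver version on the host (its major version is one of the policy factors).
nvidia-smi --query-gpu=driver_version --format=csv,noheader

# Friendli Container version of the image you are about to run.
docker run --rm -e FRIENDLI_CONTAINER_SECRET=$FRIENDLI_CONTAINER_SECRET \
  $FRIENDLI_CONTAINER_IMAGE --version
```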
# QuickStart: Friendli Container Trial Source: https://friendli.ai/docs/guides/container/quickstart Learn how to get started with Friendli Container in this step-by-step guide. Activate your free trial, get access to the container registry, prepare your container secret, run your Friendli Container, and monitor it using Grafana. ## Introduction [Friendli Container](https://friendli.ai/products/container) enables you to efficiently deploy LLMs of your choice on your infrastructure. With Friendli Container, you can perform high-speed LLM inferencing in a secure and private environment. This tutorial will guide you through the process of running a Friendli Container for your LLM. ## Prerequisites * **Hardware Requirements**: Friendli Container currently only targets x86\_64 architecture and supports NVIDIA GPUs, so please prepare proper GPUs and a compatible driver by referring to [our required CUDA compatibility guide](/guides/container/cuda_compatibility). * **Software Requirements**: Your machine should be able to run containers with the [NVIDIA container toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/index.html). In this tutorial, we will use Docker as the container runtime and make use of [Docker Compose](https://docs.docker.com/compose). * **Model Compatibility**: If your model is in a [safetensors](https://huggingface.co/docs/safetensors/index) format, which is compatible with [Hugging Face transformers](https://huggingface.co/docs/transformers), you can serve the model directly with the Friendli Container. Please check our [Model library](https://friendli.ai/models) for a non-exhaustive list of supported models. This tutorial assumes that your model of choice is uploaded to [Hugging Face](https://huggingface.co) and you have access to it. If the model is gated or private, you need to prepare a [Hugging Face Access Token](https://huggingface.co/settings/tokens). ## Getting Access to Friendli Container ### Activate your Free Trial 1. Sign up for [Friendli Suite](https://suite.friendli.ai). 2. In the 'Friendli Container' section, click the 'Start free trial' button. Now you can use Friendli Container free of charge during the trial period. ### Get Access to the Container Registry Friendli Token is a user credential that is required for logging into our container registry. 1. Go to [Personal settings > Tokens](https://suite.friendli.ai/default-team/settings/tokens) and click 'Create token'. 2. Save the token you just created. ### Prepare your Container Secret Container secret is a secret code that is used to activate Friendli Container. You should pass the container secret as an environment variable to run the container image. 1. Go to [Container > Container Secrets](https://suite.friendli.ai/default-team/container/secrets) and click 'Create secret'. 2. Save the secret you just created. **🔑 Secret Rotation** You can rotate the container secret for security reasons. If you rotate the container secret, a new secret will be created and the previous secret will be automatically revoked in **30** minutes. ## Running Friendli Container ### Pull the Friendli Container Image 1. Log in to the container registry using the email address for your Friendli Suite account and the Friendli Token. ```sh export FRIENDLI_EMAIL="YOUR ACCOUNT EMAIL ADDRESS" export FRIENDLI_TOKEN="YOUR FRIENDLI TOKEN" docker login registry.friendli.ai -u $FRIENDLI_EMAIL -p $FRIENDLI_TOKEN ``` 2. Pull the image.
```sh docker pull registry.friendli.ai/trial ``` ### Run Friendli Container with a HuggingFace Model 1. Clone our [container resource](https://github.com/friendliai/container-resource) git repository. ```sh git clone https://github.com/friendliai/container-resource cd container-resource/quickstart/docker-compose ``` 2. Set up environment variables. ```sh export HF_MODEL_NAME="<...>" # Hugging Face model name (e.g., "meta-llama/Meta-Llama-3-8B-Instruct") export FRIENDLI_CONTAINER_SECRET="<...>" # Friendli container secret ``` If your model is a private or gated one, you also need to provide [HuggingFace Access Token](https://huggingface.co/settings/tokens). ```sh export HF_TOKEN="<...>" # HuggingFace Access Token ``` 3. Launch the Friendli Container. ```sh docker compose up -d ``` By default, the container will listen for inference requests at TCP port 8000 and a Grafana service will be available at TCP port 3000. You can change the designated ports using the following environment variables. For example, if you want to use TCP port 8001 and port 3001 for Grafana, execute the command below. ```sh export FRIENDLI_PORT="8001" export FRIENDLI_GRAFANA_PORT="3001" ``` Even though the machine has multiple GPUs, the container will make use of only one GPU, specifically the first GPU (`device_ids: ['0']`). You can edit `docker-compose.yaml` to change what GPU device the container will use. The downloaded HuggingFace model will be cached in the `$HOME/.cache/huggingface` directory. You may want to clean up this directory after completing this tutorial. ### Send Inference Requests You can now send inference requests to the running container. For information on all parameters that can be used in an inference request, please refer to [this document](/openapi). ```sh Chat Completion curl -X POST http://0.0.0.0:8000/v1/chat/completions \ -H "Content-Type: application/json" \ -d '{ "messages": [ {"role": "user", "content": "What makes a good leader?"} ], "max_tokens": 30 }' ``` ```sh Completion curl -X POST http://0.0.0.0:8000/v1/completions \ -H "Content-Type: application/json" \ -d '{ "prompt": "What makes a good leader?", "max_tokens": 30 }' ``` ```sh Tokenization curl -X POST http://0.0.0.0:8000/v1/tokenize \ -H "Content-Type: application/json" \ -d '{ "prompt": "What is generative AI?" }' ``` ```sh Detokenization curl -X POST http://0.0.0.0:8000/v1/detokenize \ -H "Content-Type: application/json" \ -d '{ "tokens": [ 128000, 3923, 374, 1803, 1413, 15592, 30 ] }' ``` Chat completion requests work only if the model's tokenizer config contains a `chat_template`. ### Monitor using Grafana Using your browser, open [http://0.0.0.0:3000/d/friendli-engine](http://0.0.0.0:3000/d/friendli-engine), and login with username `admin` and password `admin`. You can now access the dashboards showing useful engine metrics. ![Grafana Dashboard](https://mintlify.s3.us-west-1.amazonaws.com/friendliai/static/images/guides/container/grafana_template_dashboard_example.png) If you cannot open a browser directly in the GPU machine where the Friendli Container is running, you can use SSH to forward requests from the browser running on your PC to the GPU machine. ```sh # Change these variables to match your environment. LOCAL_GRAFANA_PORT=3000 # The number of the port in your PC. FRIENDLI_GRAFANA_PORT=3000 # The number of the port in the GPU machine. ssh "$GPU_MACHINE_ADDRESS" -L "$LOCAL_GRAFANA_PORT:0.0.0.0:$FRIENDLI_GRAFANA_PORT" ``` where `$GPU_MACHINE_ADDRESS` shall be replaced with the address of the GPU machine. 
You may also want to use `-l login_name` or `-p port` options to connect to the GPU machine using SSH. Then using your browser on the PC, open `http://0.0.0.0:$LOCAL_GRAFANA_PORT/d/friendli-engine`. ## Going Further Congratulations! You can now serve your LLM of choice using your hardware, with the power of the most efficient LLM serving engine on the planet. The following topics will help you go further through your AI endeavors. * **Multi-GPU Serving**: Although this tutorial is limited to using only one GPU, Friendli Container supports tensor parallelism and pipeline parallelism for multi-GPU inference. Check [Multi-GPU Serving](/guides/container/running_friendli_container#multi-gpu-serving) for more information. * **Serving Multi-LoRA Models**: You can deploy multiple customized LLMs without additional GPU resources. See [Serving Multi-LoRA Models](/guides/container/serving_multi_lora_models) to learn how to launch the container with your adapters. * **Serving Quantized Models**: Running quantized models requires an additional step of [execution policy search](/guides/container/optimizing_inference_with_policy_search). See [Serving Quantized Models](/guides/container/serving_quantized_models) to learn how to create an inference endpoint for quantized models. * **Serving MoE Models**: Running MoE (Mixture of Experts) models requires an additional step of [execution policy search](/guides/container/optimizing_inference_with_policy_search). See [Serving MoE Models](/guides/container/serving_moe_models) to learn how to create an inference endpoint for MoE models. If you are stuck or need help going through this tutorial, please ask for support by sending an email to [Support](mailto:support@friendli.ai). # Running Friendli Container Source: https://friendli.ai/docs/guides/container/running_friendli_container Friendli Container enables you to effortlessly deploy your generative AI model on your own machine. This tutorial will guide you through the process of running a Friendli Container. ## Introduction Friendli Container enables you to effortlessly deploy your generative AI model on your own machine. This tutorial will guide you through the process of running a Friendli Container. The current version of Friendli Container supports most of major generative language models. ## Prerequisites * Before you begin, make sure you have signed up for [Friendli Suite](https://suite.friendli.ai). **You can use Friendli Container free of charge for 60 days.** * Friendli Container currently only supports NVIDIA GPUs, so please prepare proper GPUs and a compatible driver by referring to [our required CUDA compatibility guide](/guides/container/cuda_compatibility). * Prepare a Friendli Token following [this guide](#preparing-friendli-token). * Prepare a Friendli Container Secret following [this guide](#preparing-container-secret). ### Preparing Friendli Token Friendli Token is the user credentials for logging into our container registry. 1. Sign in [Friendli Suite](https://suite.friendli.ai). 2. Go to **[Personal settings > Tokens](https://suite.friendli.ai/default-team/settings/tokens)** and click **'Create new token'**. 3. Save your created token value and export it as `FRIENDLI_TOKEN`. ### Preparing Container Secret Container secret is a secret code that is used to activate Friendli Container. You should pass the container secret as an environment variable to run the container image. 1. Sign in [Friendli Suite](https://suite.friendli.ai). 2. 
Go to **[Container > Container Secrets](https://suite.friendli.ai/default-team/container/secrets)** and click **'Create secret'**. 3. Save your created secret value and export it as `FRIENDLI_CONTAINER_SECRET`. **🔑 Secret Rotation** You can rotate the container secret for security reasons. If you rotate the container secret, a new secret will be created and the previous secret will be revoked automatically in 30 minutes. ## Pulling Friendli Container Image Log in to the Docker client using the Friendli Token created as outlined in [Preparing Friendli Token](#preparing-friendli-token). ```sh export FRIENDLI_EMAIL="YOUR ACCOUNT EMAIL ADDRESS" export FRIENDLI_TOKEN="YOUR FRIENDLI TOKEN" docker login registry.friendli.ai -u $FRIENDLI_EMAIL -p $FRIENDLI_TOKEN ``` ```sh docker pull registry.friendli.ai/trial:latest ``` **💰 60-Days Free Trial** During the 60-days free trial period, you can use `registry.friendli.ai/trial` image only. ## Running Friendli Container with Hugging Face Models If your model is in a [`safetensors`](https://huggingface.co/docs/safetensors/index) format, which is compatible with [Hugging Face transformers](https://huggingface.co/docs/transformers), you can serve the model directly with Friendli Container. The current version of Friendli Container supports direct loading of `safetensors` checkpoints for the following models (and corresponding Hugging Face transformers classes): * FLUX * Arctic (`ArcticForCausalLM`) * Baichuan (`BaichuanForCausalLM`) * Blenderbot (`BlenderbotForConditionalGeneration`) * BLOOM (`BloomForCausalLM`) * Cohere (`CohereForCausalLM`) * DBRX (`DbrxForCausalLM`) * DeepSeek (`DeepseekForCausalLM`) * DeepSeek (`DeepseekV2ForCausalLM`) * DeepSeek (`DeepseekV3ForCausalLM`) * EXAONE (`ExaoneForCausalLM`) * Falcon (`FalconForCausalLM`) * Gemma2 (`Gemma2ForCausalLM`) * Gemma (`GemmaForCausalLM`) * GPT2 (`GPT2LMHeadModel`) * GPT-J (`GPTJForCausalLM`) * GPT-NeoX (`GPTNeoXForCausalLM`) * Grok-1 (`Grok1ForCausalLM`) * Llama (`LlamaForCausalLM`) * Mistral (`MistralForCausalLM`) * Mixtral (`MixtralForCausalLM`) * Llama (`MllamaForConditionalGeneration`) * MPT (`MPTForCausalLM`) * MT5 (`MT5ForConditionalGeneration`) * OPT (`OPTForcausalLM`) * Phi3 (`Phi3ForCausalLM`) * Phi (`PhiForCausalLM`) * Phi MoE (`PhiMoEForCausalLM`, `PhimoeForCausalLM`) * Qwen2 (`Qwen2ForCausalLM`) * Qwen2-VL 72B Instruct (`Qwen2VLForConditionalGeneration`) * Solar (`SolarForCausalLM`) * StarCoder2 (`Starcoder2ForCausalLM`) * T5 (`T5ForConditionalGeneration`) If your model does not belong to one of the above model types, please [contact us](https://friendliai.canny.io/supported-model) for support. Here are the instructions to run Friendli Container to serve a Hugging Face model: ```sh # Fill the values of following variables. export HF_MODEL_NAME="" # Hugging Face model name (e.g., "meta-llama/Meta-Llama-3-8B-Instruct") export FRIENDLI_CONTAINER_SECRET="" # Friendli container secret docker run --gpus '"device=0"' -p 8000:8000 \ -e FRIENDLI_CONTAINER_SECRET=$FRIENDLI_CONTAINER_SECRET \ -v ~/.cache/huggingface:/root/.cache/huggingface \ registry.friendli.ai/trial \ --hf-model-name $HF_MODEL_NAME ``` The `[LAUNCH_OPTIONS]` should be replaced with [Launch Options for Friendli Container](#launch-options). By running the above command, you will have a running Docker container that exports an HTTP endpoint for handling inference requests. ### Multi-GPU Serving Friendli Container supports ***tensor parallelism*** and ***pipeline parallelism*** for multi-GPU inference. 
#### Tensor Parallelism Tensor parallelism is employed when serving large models that exceed the memory capacity of a single GPU, by distributing parts of the model's weights across multiple GPUs. To leverage tensor parallelism with the Friendli Container: 1. Specify multiple GPUs for `$GPU_ENUMERATION` (e.g., '"device=0,1,2,3"'). 2. Use `--num-devices` (or `-d`) option to specify the tensor parallelism degree (e.g., `--num-devices 4`). #### Pipeline Parallelism Pipeline parallelism splits a model into multiple segments to be processed across different GPU, enabling the deployment of larger models that would not otherwise fit on a single GPU. To exploit pipeline parallelism with the Friendli Container: 1. Specify multiple GPUs for `$GPU_ENUMERATION` (e.g., '"device=0,1,2,3"'). 2. Use `--num-workers` (or `-n`) option to specify the pipeline parallelism degree (e.g., `--num-workers 4`). **🆚 Choosing between Tensor Parallelism and Pipeline Parallelism** When deploying models with the Friendli Container, you have the flexibility to combine tensor parallelism and pipeline parallelism. We recommend exploring a balance between the two, based on their distinct characteristics. While tensor parallelism involves "expensive" ***all-reduce*** operations to aggregate partial results across all devices, pipeline parallelism relies on "cheaper" ***peer-to-peer*** communication. Thus, in limited network setup, such as PCIe networks, leveraging pipeline parallelism is preferable. Conversely, in rich network setup like NVLink, tensor parallelism is recommended due to its superior parallel computation efficiency. ### Advanced: Serving Quantized Models Running quantized models requires an additional step to search execution policy. See [Serving Quantized Models](/guides/container/serving_quantized_models) to learn how to create an inference endpoint for the quantized model. ### Advanced: Serving MoE Models Running MoE (Mixture of Experts) models requires an additional step to search execution policy. See [Serving MoE Models](/guides/container/serving_moe_models) to learn how to create an inference endpoint for the MoE model. ### Examples This is an example running [Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) with a single GPU. ```sh export FRIENDLI_CONTAINER_SECRET="" # Friendli container secret (leave it if it's already set in your environment) export HF_TOKEN="" # Access token from HuggingFace (see the caution below) docker run -p 8000:8000 --gpus '"device=0"' \ -e HF_TOKEN=$HF_TOKEN \ -e FRIENDLI_CONTAINER_SECRET=$FRIENDLI_CONTAINER_SECRET \ -v ~/.cache/huggingface:/root/.cache/huggingface \ registry.friendli.ai/trial \ --hf-model-name meta-llama/Llama-3.1-8B-Instruct ``` Since downloading `meta-llama/Llama-3.1-8B-Instruct` is allowed only for authorized users, you need to provide your [Hugging Face User Access Token](https://huggingface.co/docs/huggingface_hub/package_reference/environment_variables#hftoken) through `HF_TOKEN` environment variable. It works the same for all private repositories. This is an example running [Llama-3.1-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct) with a multi-GPU setup. 
```sh {5, 11} export FRIENDLI_CONTAINER_SECRET="" # Friendli container secret (leave it if it's already set in your environment) export HF_TOKEN="" # Access token from HuggingFace (see the caution below) docker run -p 8000:8000 \ --ipc=host --gpus '"device=0,1"' \ -e HF_TOKEN=$HF_TOKEN \ -e FRIENDLI_CONTAINER_SECRET=$FRIENDLI_CONTAINER_SECRET \ -v ~/.cache/huggingface:/root/.cache/huggingface \ registry.friendli.ai/trial \ --hf-model-name meta-llama/Llama-3.1-70B-Instruct \ --num-devices 2 ``` Since downloading `meta-llama/Llama-3.1-70B-Instruct` is allowed only for authorized users, you need to provide your [Hugging Face User Access Token](https://huggingface.co/docs/huggingface_hub/package_reference/environment_variables#hftoken) through `HF_TOKEN` environment variable. It works the same for all private repositories. ## Sending Inference Requests We can now send inference requests to the running Friendli Container. For information on all parameters that can be used in an inference request, please refer to [this document](/openapi/serverless/chat-completions). ### Examples ```sh cURL curl -X POST http://0.0.0.0:8000/v1/chat/completions \ -H "Content-Type: application/json" \ -d '{ "messages": [ {"role": "user", "content": "What makes a good leader?"} ], "max_tokens": 30, "stream": true }' ``` ```python Python SDK # pip install friendli-client from friendli import Friendli client = Friendli(base_url="http://0.0.0.0:8000") stream = client.chat.completions.create( messages=[{"role": "user", "content": "Python is a popular"}], max_tokens=30, stream=True, ) for chunk in stream: print(chunk.text, end="", flush=True) ``` ## Options for Running Friendli Container ### General Options | Options | Type | Summary | Default | Required | | ----------- | ---- | -------------------------------------- | ------- | -------- | | `--version` | - | Print Friendli Container version. | - | ❌ | | `--help` | - | Print Friendli Container help message. | - | ❌ | ### Launch Options | Options | Type | Summary | Default | Required | | --------------------------------- | --------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------- | -------- | | `--web-server-port` | INT | Web server port. | 8000 | ❌ | | `--metrics-port` | INT | Prometheus metrics export port. | 8281 | ❌ | | `--hf-model-name` | TEXT | Model name hosted on the Hugging Face Models Hub or a path to a local directory containing a model. When a model name is provided, Friendli Container first checks if the model is already cached at \~/.cache/huggingface/hub and uses it if available. If not, it will download the model from the Hugging Face Models Hub before creating the inference endpoint. When a local path is provided, it will load the model from the location without downloading. This option is only available for models in a safetensors format. | - | ❌ | | `--tokenizer-file-path` | TEXT | Absolute path of tokenizer file. This option is not needed when `tokenizer.json` is located under the path specified at `--ckpt-path`. 
| - | ❌ | | `--tokenizer-add-special-tokens` | BOOLEAN | Whether or not to add special tokens in tokenization. Equivalent to Hugging Face Tokenizer's `add_special_tokens` argument. The default value is **false** for versions \< v1.6.0. | `true` | ❌ | | `--tokenizer-skip-special-tokens` | BOOLEAN | Whether or not to remove special tokens in detokenization. Equivalent to Hugging Face Tokenizer's `skip_special_tokens` argument. | `true` | ❌ | | `--dtype` | CHOICE: \[bf16, fp16, fp32] | Data type of weights and activations. Choose one of `bf16`, `fp16`, or `fp32`. This argument applies to non-quantized weights and activations. If not specified, Friendli Container follows the value of `torch_dtype` in the `config.json` file or assumes fp16. | fp16 | ❌ | | `--bad-stop-file-path` | TEXT | JSON file path that contains stop sequences or bad words/tokens. | - | ❌ | | `--num-request-threads` | INT | Thread pool size for handling HTTP requests. | 4 | ❌ | | `--timeout-microseconds` | INT | Server-side timeout for client requests, in microseconds. | 0 (no timeout) | ❌ | | `--ignore-nan-error` | BOOLEAN | If set to True, ignore NaN error. Otherwise, respond with a 400 status code if NaN values are detected while processing a request. | - | ❌ | | `--max-batch-size` | INT | Max number of sequences that can be processed in a batch. | 384 | ❌ | | `--num-devices`, `-d` | INT | Number of devices to use (i.e., tensor parallelism degree). | 1 | ❌ | | `--num-workers`, `-n` | INT | Number of workers to use in a pipeline (i.e., pipeline parallelism degree). | 1 | ❌ | | `--search-policy` | BOOLEAN | Searches for the best engine policy for the given combination of model, hardware, and parallelism degree. Learn more about policy search at [Optimizing Inference with Policy Search](/guides/container/optimizing_inference_with_policy_search). | false | ❌ | | `--terminate-after-search` | BOOLEAN | Terminates the engine container after the policy search. | false | ❌ | | `--algo-policy-dir` | TEXT | Path to directory containing the policy file. The default value is the current working directory. Learn more about policy search at [Optimizing Inference with Policy Search](/guides/container/optimizing_inference_with_policy_search). | current working dir | ❌ | | `--adapter-model` | TEXT | Add an adapter model with its adapter name and path in the form `<name>:<path>`. The path can be a name from the Hugging Face model hub. | - | ❌ | ### Model Specific Options #### T5 | Options | Type | Summary | Default | Required | | --- | --- | --- | --- | --- | | `--max-input-length` | INT | Maximum input length. | - | ✅ | | `--max-output-length` | INT | Maximum output length. | - | ✅ | # Running Friendli Container on SageMaker Source: https://friendli.ai/docs/guides/container/running_friendli_container_on_sagemaker Create a real-time inference endpoint in Amazon SageMaker with a Friendli Container backend. By utilizing Friendli Container in your SageMaker pipeline, you'll benefit from the Friendli Engine's speed and resource efficiency. ## Introduction This guide will walk you through creating a [real-time inference endpoint in Amazon SageMaker](https://docs.aws.amazon.com/sagemaker/latest/dg/realtime-endpoints.html) with a Friendli Container backend. By utilizing Friendli Container in your SageMaker pipeline, you'll benefit from the Friendli Engine's speed and resource efficiency. We'll explore how to create inference endpoints using both the AWS Console and the boto3 Python SDK.
## General Workflow ![Lora Serving](https://mintlify.s3.us-west-1.amazonaws.com/friendliai/static/images/guides/container/sagemaker_workflow.png) 1. **Create a Model**: Within SageMaker Inference, define a new model by specifying the model artifacts in your S3 bucket and the Friendli container image from ECR. 2. **Configure the Endpoint**: Create a SageMaker Inference endpoint configuration by selecting the instance type and the number of instances required. 3. **Create the Endpoint**: Utilize the configured settings to launch a SageMaker Inference endpoint. 4. **Invoke the Endpoint**: Once deployed, send requests to your endpoint to receive inference responses. ## Prerequisite Before beginning, you need to push the Friendli Container image to an ECR repository on AWS. First, prepare the Friendli Container image by following the instructions in [**Pulling Friendli Container Image**](/guides/container/running_friendli_container/#pulling-friendli-container-image). Then, tag and push the image to the Amazon ECR repository as guided in [**Pushing a Docker image to an Amazon ECR private repository**](https://docs.aws.amazon.com/AmazonECR/latest/userguide/docker-push-ecr-image.html). ## Using the AWS Console Let's delve into the step-by-step instructions for creating an inference endpoint using the AWS Console. ### Step 1: Creating a Model You can start creating a model by clicking on the **Create model** button under **SageMaker > Inference > Models**. Then, configure the model with the following fields: * **Model settings**: * **Model name**: A model name. * **IAM role**: An IAM role that includes the `AmazonSageMakerFullAccess` policy. * **Container definition 1**: * **Container input option**: Select the "Provide model artifacts and inference image location". * **Model Compression Type**: * To use a model in the S3 bucket: * When the model is compressed, select "CompressedModel". * Otherwise, select "UncompressedModel". * When using a model from the Hugging Face hub, any option would work fine. * **Location of inference code image**: Specify the ARN of the ECR repo for the Friendli Container. * **Location of model artifacts** (optional): * To use a model in the S3 bucket: Specify the S3 URI where your model is stored. Ensure the file structure matches the directory format compatible with the `--hf-model-name` option of the Friendli Container. * When using a model from the Hugging Face hub, you can leave this field empty. * **Environment variables**: * Always required: * `FRIENDLI_CONTAINER_SECRET`: Your Friendli Container Secret. Refer to [**Preparing Container Secret**](/guides/container/running_friendli_container/#preparing-container-secret) to learn how to get the container secret. * `SAGEMAKER_MODE`: This should be set to `True`. * `SAGEMAKER_NUM_DEVICES`: Number of devices to use for tensor parallelism degree. * Required when using a model in the S3 bucket: * `SAGEMAKER_USE_S3`: This should be set to `True`. * Required when using a model from the Hugging Face hub: * `SAGEMAKER_HF_MODEL_NAME`: The Hugging Face model name (e.g., `mistralai/Mistral-7B-Instruct-v0.2`) * For private or gated model repos: * `HF_TOKEN`: The Hugging Face secret access token. ### Step 2: Creating an Endpoint Configuration You can start by clicking on the **Create endpoint configuration** button under **SageMaker > Inference > Endpoint configurations**. * **Endpoint configuration**: * **Endpoint configuration name**: The name of this endpoint configuration. 
* **Type of endpoint**: For real-time inference, select "Provisioned". * **Variants**: * To create a "Production" variant, click "Create production variant". * Select the model that you have created in [**Step 1**](#step-1-creating-a-model). * Configure the instance type and count by clicking on "Edit" in the Actions column. * Create the endpoint configuration by clicking "Create endpoint configuration". ### Step 3: Creating SageMaker Inference Endpoint You can start by clicking the **Create endpoint** button under **SageMaker > Inference > Endpoints**. * Select "Use an existing endpoint configuration". * Select the endpoint configuration created in [**Step 2**](#step-2-creating-an-endpoint-configuration). * Finish by clicking on the "Create endpoint" button. ### Step 4: Invoking Endpoint When the endpoint status becomes "In Service", you can invoke the endpoint with the following script, after filling in the endpoint name and the region name: ```python import boto3 import json endpoint_name = "FILL OUT ENDPOINT NAME" region_name = "FILL OUT AWS REGION" sagemaker_runtime = boto3.client("sagemaker-runtime", region_name=region_name) prompt = "Story title: 3 llamas go for a walk\nSummary: The 3 llamas crossed a bridge and something unexpected happened\n\nOnce upon a time" payload = { "prompt": prompt, "max_tokens": 512, "temperature": 0.8, } response = sagemaker_runtime.invoke_endpoint( EndpointName=endpoint_name, Body=json.dumps(payload), ContentType="application/json", ) print(response['Body'].read().decode('utf-8')) ``` ## Using the boto3 SDK Next, let's discover the process for creating a SageMaker endpoint using the boto3 Python SDK. You can achieve this by using the code snippet below. Be sure to fill in the custom fields, customized for your specific use case: ```python import boto3 from sagemaker import get_execution_role sm_client = boto3.client(service_name='sagemaker') runtime_sm_client = boto3.client(service_name='sagemaker-runtime') account_id = boto3.client('sts').get_caller_identity()['Account'] region = boto3.Session().region_name role = get_execution_role() endpoint_name="FILL OUT ENDPOINT NAME" model_name="FILL OUT MODEL NAME" container = "FILL OUT ECR IMAGE NAME" # .dkr.ecr..amazonaws.com/IMAGE instance_type = "ml.g5.12xlarge" # instance type container = { 'Image': container, 'Environment': { "HF_TOKEN": "", "FRIENDLI_CONTAINER_SECRET": "", "SAGEMAKER_HF_MODEL_NAME": "", # e.g) meta-llama/Meta-Llama-3-8B "SAGEMAKER_MODE": "True", # Should be true "SAGEMAKER_NUM_DEVICES": "4", # Number of GPUs in `instance_type` } } endpoint_config_name = 'FILL OUT ENDPOINT CONFIG NAME' # Create a model create_model_response = sm_client.create_model( ModelName=model_name, ExecutionRoleArn=role, Containers=[container], ) # Create an endpoint configuration create_endpoint_config_response = sm_client.create_endpoint_config( EndpointConfigName=endpoint_config_name, ProductionVariants=[ { 'InstanceType': instance_type, 'InitialInstanceCount': 1, 'InitialVariantWeight': 1, 'ModelName': model_name, 'VariantName': 'AllTraffic', }, ], ) endpoint_name = "FILL OUT ENDPOINT NAME" # Create an endpoint sm_client.create_endpoint( EndpointName=endpoint_name, EndpointConfigName=endpoint_config_name, ) sm_client.describe_endpoint(EndpointName=endpoint_name) ``` You can invoke this endpoint by following [**Step 4**](#step-4-invoking-endpoint). 
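Endpoint creation is asynchronous, so the new endpoint stays in the `Creating` status for a few minutes. If you would rather block until it is ready instead of polling `describe_endpoint` yourself, here is a minimal sketch using boto3's built-in SageMaker waiter (the endpoint name is a placeholder, as above):

```python
import boto3

sm_client = boto3.client("sagemaker")
endpoint_name = "FILL OUT ENDPOINT NAME"  # same name used when creating the endpoint

# Block until the endpoint reaches the "InService" status (raises on failure or timeout).
waiter = sm_client.get_waiter("endpoint_in_service")
waiter.wait(EndpointName=endpoint_name)

print(sm_client.describe_endpoint(EndpointName=endpoint_name)["EndpointStatus"])
```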
By following these guides, you'll be able to seamlessly deploy your models using Friendli Container on SageMaker endpoints and leverage their capabilities for real-time inference. # Serving MoE Models Source: https://friendli.ai/docs/guides/container/serving_moe_models Explore the steps to serve Mixture of Experts (MoE) models such as Mixtral 8x7B using Friendli Container. ## Introduction This guide explores the steps to serve Mixture of Experts (MoE) models such as Mixtral 8x7B using Friendli Container. ## Search Optimal Policy and Running Friendli Container To serve MoE models efficiently, you need to run a policy search to explore the optimal execution policy. Learn how to run the policy search at [Running Policy Search](/guides/container/optimizing_inference_with_policy_search#running-policy-search). When the optimal policy is found, it is compiled into a policy file, which can be used for creating serving endpoints, and the engine serves the endpoint using that optimal policy. # Serving Multi-LoRA Models Source: https://friendli.ai/docs/guides/container/serving_multi_lora_models The Friendli Engine introduces an innovative approach to this challenge through Multi-LoRA (Low-Rank Adaptation) serving, a method that allows for the simultaneous serving of multiple LLMs, optimized for specific tasks without the need for extensive retraining. ## Introduction In a world where the demand for highly specialized AI capabilities is surging, the ability to deploy multiple customized large language models (LLMs) without additional GPU resources represents a significant leap forward. The Friendli Engine introduces an innovative approach to this challenge through Multi-LoRA (Low-Rank Adaptation) serving, a method that allows for the simultaneous serving of multiple LLMs, optimized for specific tasks without the need for extensive retraining. This advancement opens new avenues for AI efficiency and adaptability, promising to revolutionize the deployment of AI solutions on constrained hardware. This article provides an overview of efficiently serving Multi-LoRA models with the Friendli Engine. ![Lora Serving](https://mintlify.s3.us-west-1.amazonaws.com/friendliai/static/images/guides/container/lora.png) ## Prerequisite `huggingface-cli` should be installed in your local environment. ```sh pip install "huggingface_hub[cli]" ``` ## Downloading Adapter Checkpoints Each adapter model that you want to serve must be downloaded to your local storage. ```sh # Hugging Face model name of the adapters export ADAPTER_MODEL1="" export ADAPTER_MODEL2="" export ADAPTER_MODEL3="" export ADAPTER_DIR=/tmp/adapter huggingface-cli download $ADAPTER_MODEL1 \ --include "adapter_model.safetensors" "adapter_config.json" \ --local-dir $ADAPTER_DIR/model1 huggingface-cli download $ADAPTER_MODEL2 \ --include "adapter_model.safetensors" "adapter_config.json" \ --local-dir $ADAPTER_DIR/model2 huggingface-cli download $ADAPTER_MODEL3 \ --include "adapter_model.safetensors" "adapter_config.json" \ --local-dir $ADAPTER_DIR/model3 ... ``` This will result in a directory structure like: ``` /tmp/adapter/model1 - adapter_model.safetensors - adapter_config.json /tmp/adapter/model2 - adapter_model.safetensors - adapter_config.json /tmp/adapter/model3 - adapter_model.safetensors - adapter_config.json ``` If an adapter's Hugging Face repo does not contain an `adapter_model.safetensors` checkpoint file, you have to manually convert `adapter_model.bin` into `adapter_model.safetensors`.
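Here is a minimal sketch of that manual conversion (assuming `torch` and `safetensors` are installed, and using the adapter directory from the example above):

```python
import torch
from safetensors.torch import save_file

# Load the legacy PyTorch checkpoint and re-save it in safetensors format.
state_dict = torch.load("/tmp/adapter/model1/adapter_model.bin", map_location="cpu")
state_dict = {name: tensor.contiguous() for name, tensor in state_dict.items()}
save_file(state_dict, "/tmp/adapter/model1/adapter_model.safetensors")
```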
You can also use the [official app](https://huggingface.co/spaces/safetensors/convert) or the [python script](https://github.com/huggingface/safetensors/tree/main/bindings/python) for the conversion. ## Launch Friendli Engine in Container Once you have prepared the adapter model checkpoints, you can serve the Multi-LoRA model with Friendli Container. In addition to the command for running the base model, you have to add the `--adapter-model` argument. * `--adapter-model`: Adds an adapter model with an adapter name and path. The path can also be a model name from the Hugging Face Hub. ```sh # Fill in the values of the following variables. export HF_BASE_MODEL_NAME="" # Hugging Face base model name (e.g., "meta-llama/Llama-2-7b-chat-hf") export FRIENDLI_CONTAINER_SECRET="" # Friendli container secret export FRIENDLI_CONTAINER_IMAGE="" # Friendli container image (e.g., "registry.friendli.ai/trial") export GPU_ENUMERATION="" # GPUs (e.g., '"device=0,1"') export ADAPTER_NAME="" # The adapter's name (a user-defined alias). export ADAPTER_DIR=/tmp/adapter docker run \ --gpus $GPU_ENUMERATION \ -p 8000:8000 \ -v $ADAPTER_DIR:/adapter \ -e FRIENDLI_CONTAINER_SECRET=$FRIENDLI_CONTAINER_SECRET \ $FRIENDLI_CONTAINER_IMAGE \ --hf-model-name $HF_BASE_MODEL_NAME \ --adapter-model $ADAPTER_NAME:/adapter/model1 \ [LAUNCH_OPTIONS] ``` You can find available options for `[LAUNCH_OPTIONS]` at [Running Friendli Container: Launch Options](/guides/container/running_friendli_container#launch-options). If you want to launch with multiple adapters, you can pass `--adapter-model` a comma-separated string (e.g., `--adapter-model "adapter_name_0:/adapter/model1,adapter_name_1:/adapter/model2"`). If a `tokenizer_config.json` file is present in an adapter checkpoint path, the engine uses the chat template defined in that file for the adapter. ### Example: Llama 2 7B Chat + LoRA Adapter This is an example that runs [`meta-llama/Llama-2-7b-chat-hf`](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) with the [`FinGPT/fingpt-forecaster_dow30_llama2-7b_lora`](https://huggingface.co/FinGPT/fingpt-forecaster_dow30_llama2-7b_lora) adapter model. ```sh export ADAPTER_DIR=/tmp/adapter huggingface-cli download FinGPT/fingpt-forecaster_dow30_llama2-7b_lora \ --include "adapter_model.safetensors" "adapter_config.json" \ --local-dir $ADAPTER_DIR/model1 docker run \ --gpus '"device=0"' \ -p 8000:8000 \ -v $ADAPTER_DIR:/adapter \ -e FRIENDLI_CONTAINER_SECRET="YOUR CONTAINER SECRET" \ registry.friendli.ai/trial \ --hf-model-name meta-llama/Llama-2-7b-chat-hf \ --adapter-model adapter-model-name:/adapter/model1 ``` ## Sending Request to Specific Adapter You can generate an inference result from a specific adapter model by specifying `model` in the body of an inference request. For example, assuming you set the launch option `--adapter-model` to `<adapter-name>:<adapter-path>`, you can send a request to the adapter model as follows. ```sh curl -X POST http://0.0.0.0:8000/v1/completions \ -H "Content-Type: application/json" \ -d '{ "model": "adapter-model-name", "prompt": "Python is a language", "max_tokens": 30 }' ``` ## Sending Request to the Base Model If you omit the `model` field in your request, the base model will be used to generate the response. You can send a request to the base model as shown below. ```sh curl -X POST http://0.0.0.0:8000/v1/completions \ -H "Content-Type: application/json" \ -d '{ "prompt": "Python is a language", "max_tokens": 30 }' ``` ## Limitations We only support models compatible with [`peft`](https://github.com/huggingface/peft).
The base model checkpoint and adapter model checkpoints should have the same data type. When serving multiple adapters simultaneously, every adapter model should have the same target modules. In Hugging Face, the target modules are listed in `adapter_config.json`. # Serving Quantized Models Source: https://friendli.ai/docs/guides/container/serving_quantized_models Tutorial for serving quantized models with the Friendli Engine. The Friendli Engine supports FP8, INT8, and AWQ model checkpoints. ## Introduction Quantization is a technique that reduces the precision of a generative AI model's parameters, optimizing memory usage and inference speed while maintaining acceptable accuracy. This tutorial will walk you through the process of serving quantized models with Friendli Container. ## Off-the-Shelf Model Checkpoints from Hugging Face Hub To use model checkpoints that are already quantized and available on Hugging Face Hub, check the following options: * Checkpoints quantized with [friendli-model-optimizer](https://github.com/friendliai/friendli-model-optimizer) * [Quantized model checkpoints by FriendliAI](https://huggingface.co/FriendliAI) * A subset of models quantized with: * [`AutoAWQ`](https://github.com/casper-hansen/AutoAWQ) * [`AutoFP8`](https://github.com/neuralmagic/AutoFP8) * [`llm-compressor`](https://github.com/vllm-project/llm-compressor) For details on how to use these models, go directly to [Serving Quantized Models](#serving-quantized-models). ## Quantizing Your Own Models (FP8/INT8) To quantize your own models with FP8 or INT8, follow these steps: 1. **Install the `friendli-model-optimizer` package** This tool provides model quantization for efficient generative AI serving with the Friendli Engine. Install it using the following command: ```sh pip install "friendli-model-optimizer" ``` 2. **Prepare the Original Model** Ensure you have the original model checkpoint that can be loaded using Hugging Face's [`transformers`](https://github.com/huggingface/transformers) library. 3. **Quantize the Model with Friendli Model Optimizer (FMO)** You can simply run quantization with the command below: ```sh export MODEL_NAME_OR_PATH="" # Hugging Face pretrained model name or directory path of the original model checkpoint. export OUTPUT_DIR="" # Directory path to save the quantized checkpoint and related configurations. export QUANTIZATION_SCHEME="" # Quantization technique to apply. You can use fp8 or int8. export DEVICE="" # Device to run the quantization process. Defaults to "cuda:0". fmo quantize \ --model-name-or-path $MODEL_NAME_OR_PATH \ --output-dir $OUTPUT_DIR \ --mode $QUANTIZATION_SCHEME \ --device $DEVICE ``` When the model checkpoint is successfully quantized, the following files will be created at `$OUTPUT_DIR`. * `config.json` * `model.safetensors` * `special_tokens_map.json` * `tokenizer_config.json` * `tokenizer.json` If the size of the model exceeds **10GB**, multiple sharded checkpoints are generated as follows instead of a single `model.safetensors`. * `model-00001-of-00005.safetensors` * `model-00002-of-00005.safetensors` * `model-00003-of-00005.safetensors` * `model-00004-of-00005.safetensors` * `model-00005-of-00005.safetensors` For more information about FMO, check out [this documentation](https://github.com/friendliai/friendli-model-optimizer). ## Serving Quantized Models ### Search Optimal Policy To serve quantized models efficiently, you need to run a policy search to explore the optimal execution policy.
Learn how to run the policy search at [Running Policy Search](/guides/container/optimizing_inference_with_policy_search#running-policy-search). ### Serving FP8 Models Once you have prepared the quantized model checkpoint, you are ready to create a serving endpoint. ```sh # Fill in the values of the following variables. export HF_MODEL_NAME="" # Quantized model name in Hugging Face Hub or directory path of the quantized model checkpoint. export FRIENDLI_CONTAINER_SECRET="" # Friendli container secret export FRIENDLI_CONTAINER_IMAGE="" # Friendli container image (e.g., "registry.friendli.ai/trial") export GPU_ENUMERATION="" # GPUs (e.g., '"device=0,1"') export POLICY_DIR=$PWD/policy mkdir -p $POLICY_DIR docker run \ --gpus $GPU_ENUMERATION \ -p 8000:8000 \ -v ~/.cache/huggingface:/root/.cache/huggingface \ -v $POLICY_DIR:/policy \ -e FRIENDLI_CONTAINER_SECRET=$FRIENDLI_CONTAINER_SECRET \ $FRIENDLI_CONTAINER_IMAGE \ --hf-model-name $HF_MODEL_NAME \ --algo-policy-dir /policy \ --search-policy true ``` ### Example: `FriendliAI/Llama-3.1-8B-Instruct-fp8` FP8 model serving is only supported by NVIDIA **Ada**, **Hopper**, and **Blackwell** GPU architectures. ```sh # Fill in the values of the following variables. export FRIENDLI_CONTAINER_SECRET="" # Friendli container secret export FRIENDLI_CONTAINER_IMAGE="" # Friendli container image (e.g., "registry.friendli.ai/trial") export GPU_ENUMERATION="" # GPUs (e.g., '"device=0,1"') export POLICY_DIR=$PWD/policy # Make sure the policy search runs against this directory docker run \ --gpus $GPU_ENUMERATION \ -p 8000:8000 \ -v ~/.cache/huggingface:/root/.cache/huggingface \ -v $POLICY_DIR:/policy \ -e FRIENDLI_CONTAINER_SECRET=$FRIENDLI_CONTAINER_SECRET \ $FRIENDLI_CONTAINER_IMAGE \ --hf-model-name FriendliAI/Llama-3.1-8B-Instruct-fp8 --algo-policy-dir /policy --search-policy true ``` # Endpoints Source: https://friendli.ai/docs/guides/dedicated_endpoints/endpoints Endpoints are the actual deployments of your models on your specified GPU resource. export const RoundedBorderBox = ({children, caption}) =>
<div>
  {children}
  {caption && <div>
    {caption}
  </div>}
</div>
; ## What are Endpoints? Endpoints are the actual deployments of your models on a dedicated GPU resource. They provide a stable and efficient interface to serve your models in real-world applications, ensuring high availability and optimized performance. With endpoints, you can manage model versions, scale resources, and seamlessly integrate your model into production environments. ### Key Capabilities of Endpoints: * **Efficient Model Serving**: Deploy models on powerful GPU instances optimized for your use case. * **Flexibility with Multi-LoRA Models**: Serve multiple fine-tuned adapters alongside base models. * **Autoscaling**: Automatically adjust resources to handle varying workloads, ensuring optimal performance and cost efficiency. * **Monitoring and Management**: Check endpoint health, adjust configurations, and view logs directly from the platform. * **Interactive Testing**: Use the integrated playground to test your models before integrating them into applications. * **API Integration**: Access your models via robust OpenAI-compatible APIs, enabling easy integration into any system. ## Creating Endpoints You can create your endpoint by specifying the name, the model, and the instance configuration, consisting of your desired GPU specification. Endpoint Create ## Intelligent Autoscaling Autoscaling Config Our autoscaling system automatically adjusts computational resources based on your traffic patterns, helping you optimize both performance and costs. ### How Autoscaling Works * **Minimum Replicas**: * When set to 0, the endpoint enters sleeping status during periods of inactivity, helping to minimize costs * When set to a value greater than 0, the endpoint maintains at least that number of active replicas at all times * **Maximum Replicas**: Defines the upper limit of replicas that can be created to handle increased traffic load * **Cooldown Period**: The time delay before scaling down an active replica. This ensures the system doesn't prematurely reduce capacity during temporary drops in traffic. ### Benefits of Autoscaling * **Cost Optimization**: Pay only for the resources you need by automatically scaling to zero during idle periods * **Performance Management**: Handle traffic spikes efficiently by automatically adding replicas * **Resource Efficiency**: Maintain optimal resource utilization across varying workload patterns ## Serving Multi-LoRA Models You can serve Multi-LoRA models using Friendli Dedicated Endpoints. For an overview of Multi-LoRA models, refer to our [document on serving Multi-LoRA models with Friendli Container](/guides/container/serving_multi_lora_models). In Friendli Dedicated Endpoints, Multi-LoRA model is supported only in Enterprise plan. For pricing and availability, [Contact sales](https://friendli.ai/contact). ## Checking Endpoint Status After creating the Endpoint, you can view its health status and Endpoint URL on the Endpoint's details page. Endpoint Detail The cost of using dedicated endpoints accumulates from the `INITIALIZING` status. Specifically, charges begin after the `Initializing GPU` phase, where the endpoint waits to acquire the GPU. The endpoint then downloads and loads the model onto the GPU, which usually takes less than a minute. ## Using Playgrounds To test the deployed model via the web, we provide a playground interface where you can interact with the model using a user-friendly chat interface. Simply enter your query, adjust your settings, and generate your responses! 
Endpoint Playground Send inference queries to your model through our [API](/openapi) at the given endpoint address, accessible on the endpoint information tab. {/* TODO: add image for sending APIs */} # Frequently Asked Questions and Troubleshooting Source: https://friendli.ai/docs/guides/dedicated_endpoints/faq While following our tutorials, you might have had questions regarding the details of the requirements and specifications. We have listed the frequently asked questions in this separate document. export const RoundedBorderBox = ({children, caption}) =>
<div>
  {children}
  {caption && <div>
    {caption}
  </div>}
</div>
; While following our tutorials, you might have had questions regarding the details of the requirements and specifications. We have listed the frequently asked questions in this separate document. Please refer to the relevant information below: ## Format Requirements ### General requirements for a model * A model should be in safetensors format. * The model should NOT be nested inside another directory. * Including other arbitrary files (that are not in the list) is totally fine. However, those files will not be downloaded or used. | Required | Filename | Description | | -------- | ------------------------- | -------------------------------------------------------------------------------------------------------------------- | | Yes | *safetensors* | Model weight, e.g. model.safetensors. Use model.safetensors.index.json for split safetensors files | | Yes | config.json | Model config that includes the architecture. ([Supported Models on Friendli](https://friendli.ai/models)) | | Yes | tokenizer.json | Tokenizer for the model | | No | tokenizer\_config.json | Tokenizer config. This should be present & have a `chat_template` field for the Friendli Engine to provide chat APIs | | No | special\_tokens\_map.json | | ### General requirements for a dataset * Read our documentation on the [fine-tuning dataset format](/guides/dedicated_endpoints/fine-tuning#dataset-format) for information on the dataset requirements. ## 3rd-party account integration Personal settings ### How to integrate a Hugging Face account * [Log in to Hugging Face, then navigate to user settings → access tokens → User Access Tokens. Acquire a token.](https://huggingface.co/settings/tokens) * You may use a fine-grained token. In this case, please make sure the token has view permission for the repository you'd like to use. * [Integrate the key in Friendli Suite → Personal settings → Account → Integrations](https://suite.friendli.ai/default-team/settings/account) If you revoke / invalidate the key, you will have to update the key in order not to disrupt ongoing deployments, or to launch a new inference deployment / fine-tuning job. ### How to integrate a W\&B account * [Log in to W\&B, then navigate to user settings → danger zone → API keys.
Acquire a token.](https://wandb.ai/settings#api) * [Integrate the key in Friendli Suite → Personal settings → Account → Integrations](https://suite.friendli.ai/default-team/settings/account) If you revoke / invalidate the key, you will have to update the key in order to not disrupt ongoing deployments, or to launch a new inference deployment / fine-tuning job #### Extra: How to upload a safetensors format model to W\&B using W\&B CLI * Install the cli and log in using the API key → [Command Line Interface | Weights & Biases Documentation](https://docs.wandb.ai/ref/cli) * Upload the model as an W\&B artifact using the command below ``` wandb artifact put -n project/artifact_id --type model /path/to/dir ``` * With all this, the W\&B artifact will look like this: ![W\&B artifact](https://mintlify.s3.us-west-1.amazonaws.com/friendliai/static/images/guides/dedicated_endpoints/faq/wandb_artifact.png) ## Using 3rd-party model ### How to use a W\&B artifact as a model ![W\&B artifact as a model](https://mintlify.s3.us-west-1.amazonaws.com/friendliai/static/images/guides/dedicated_endpoints/faq/wandb_model.png) * Use the full name of the artifact * The *artifact name* must be in the format of: `org/project/artifact_id:version` ### How to use a Hugging Face repository as a model ![HF artifact as a model](https://mintlify.s3.us-west-1.amazonaws.com/friendliai/static/images/guides/dedicated_endpoints/faq/hf_model.png) * Use the repository id of the model. You may select the entry from the list of autocompleted model repositories. * You may choose specific branch, or manually enter a commit hash. ## Using W\&B with Dedicated Fine-tuning * When launching a fine-tuning job, you can designate a W\&B project that the metrics will be exported to. If you provide a W\&B project name that already exists, your job will be added to that project. Otherwise, a new W\&B project will be automatically created in your integrated W\&B account. If the project name is not provided, it defaults to "friendliai". ![W\&B project](https://mintlify.s3.us-west-1.amazonaws.com/friendliai/static/images/guides/dedicated_endpoints/faq/wandb_project.png) * As the training starts, you will be able to see a new "Run" in the project you chose. ![W\&B Run](https://mintlify.s3.us-west-1.amazonaws.com/friendliai/static/images/guides/dedicated_endpoints/faq/wandb_run.png) * By clicking the project, you can easily track & monitor the status of the training job. ![W\&B Log](https://mintlify.s3.us-west-1.amazonaws.com/friendliai/static/images/guides/dedicated_endpoints/faq/wandb_log.png) If new runs are not displayed in your project, please check that the default team is set correctly on [W\&B user settings](https://wandb.ai/settings). ![W\&B Default team](https://mintlify.s3.us-west-1.amazonaws.com/friendliai/static/images/guides/dedicated_endpoints/faq/wandb_default_team.png) ## Troubleshooting ### Can't access the artifact ![Troubleshooting - can't access](https://mintlify.s3.us-west-1.amazonaws.com/friendliai/static/images/guides/dedicated_endpoints/faq/troubleshooting_cant_access.png) * The artifact might be nonexistent, or hidden so that you cannot access it. ### You don't have access to this gated model ![Troubleshooting - no access](https://mintlify.s3.us-west-1.amazonaws.com/friendliai/static/images/guides/dedicated_endpoints/faq/troubleshooting_no_access.png) * The repository is gated. Please follow the steps and gain approval from the owner using Hugging Face Hub. 
### The repository / artifact is invalid ![Troubleshooting - invalid repo](https://mintlify.s3.us-west-1.amazonaws.com/friendliai/static/images/guides/dedicated_endpoints/faq/troubleshooting_invalid_repo.png) ![Troubleshooting - invalid artifact](https://mintlify.s3.us-west-1.amazonaws.com/friendliai/static/images/guides/dedicated_endpoints/faq/troubleshooting_invalid_artifact.png) * The model does not meet the requirements. Please check if the model follows a correct safetensors format. ### The architecture is not supported ![Troubleshooting - unsupported](https://mintlify.s3.us-west-1.amazonaws.com/friendliai/static/images/guides/dedicated_endpoints/faq/troubleshooting_unsupported.png) * The model architecture is not supported. Please refer to [Supported Models on Friendli](https://friendli.ai/models). # Fine-tuning Source: https://friendli.ai/docs/guides/dedicated_endpoints/fine-tuning Effortlessly fine-tune your model with Friendli Dedicated Endpoints, which leverages the Parameter-Efficient Fine-Tuning (PEFT) method to reduce training costs while preserving model quality, similar to full-parameter fine-tuning. export const RoundedBorderBox = ({children, caption}) =>
<div>
  {children}
  {caption && <div>
    {caption}
  </div>}
</div>
; ### In order to fine-tune large generic models for your specific purpose, you may fine-tune models on Friendli Dedicated Endpoints. Effortlessly fine-tune your model with [Friendli Dedicated Endpoints](https://friendli.ai/products/dedicated-endpoints), which leverages the Parameter-Efficient Fine-Tuning (PEFT) method to reduce training costs while preserving model quality, similar to full-parameter fine-tuning. This can make your model become an expert on a specific topic, and prevent hallucinations from your model. ## Table of Contents 1. **[How to Select Your Base Model](#how-to-select-your-base-model)** 2. **[How to Upload Your Dataset](#how-to-upload-your-dataset)** 3. **[How to Create Your Fine-tuning Job](#how-to-create-your-fine-tuning-job)** 4. **[How to Monitor Progress](#how-to-monitor-progress)** 5. **[How to Deploy the Fine-tuned Model](#how-to-deploy-the-fine-tuned-model)** 6. **[Resources](#resources)** By the end of this guide, you will understand how you can effectively fine-tune your generative AI models by using Friendli Dedicated Endpoints. ## How to Select Your Base Model Through our (1) Hugging Face Integration and (2) Weights & Biases (W\&B) Integration, you can select the base model to fine-tune. Explore and find open-source models that are supported on Friendli Dedicated Endpoints [here](https://friendli.ai/models). For guidance on the necessary format and file requirements, especially when using your own models, review the FAQ section on [general requirements for a model](/guides/dedicated_endpoints/faq#general-requirements-for-a-model). * **Hugging Face Model** ![Hugging Face Model](https://mintlify.s3.us-west-1.amazonaws.com/friendliai/static/images/guides/dedicated_endpoints/finetuning/hf_model.png) * **Weights & Biases Model** ![Weights & Biases Model](https://mintlify.s3.us-west-1.amazonaws.com/friendliai/static/images/guides/dedicated_endpoints/finetuning/wandb_model.png) ### Hugging Face Integration Integrate your [Hugging Face account](https://huggingface.co) to access your private repo or a gated repo. Go to [**Personal settings > Account > Hugging Face integration**](https://suite.friendli.ai/default-team/settings/account) and save your [Hugging Face access token](https://huggingface.co/docs/hub/security-tokens). This access token will be used upon creating your fine-tuning jobs. Check our FAQ section on [using a Hugging Face repository as a model](/guides/dedicated_endpoints/faq#how-to-use-a-hugging-face-repository-as-a-model) and [integrating a Hugging Face account](/guides/dedicated_endpoints/faq#how-to-integrate-a-hugging-face-account) for more detailed integration information. ### Weights & Biases (W\&B) Integration Integrate your [Weights & Biases account](https://wandb.ai/site) to access your model artifact. Go to [**Personal settings > Account > Weights & Biases integration**](https://suite.friendli.ai/default-team/settings/account) and save your Weights & Biases API key, which you can obtain [here](https://wandb.ai/settings#api). This API key will be used upon creating your fine-tuning jobs. Check our FAQ section on [using a W\&B artifact as a model](/guides/dedicated_endpoints/faq#how-to-use-a-w-and-b-artifact-as-a-model) and [integrating a W\&B account](/guides/dedicated_endpoints/faq#how-to-integrate-a-w-and-b-account) for more detailed integration information. ## How to Upload Your Dataset Navigate to the 'Datasets' section within your dedicated endpoints project page to upload your fine-tuning dataset. 
Enter the dataset name, then either drag and drop your .jsonl training and validation files or browse for them on your computer. If your files meet the required criteria, the blue 'Upload' button will be activated, allowing you to complete the process. ![Upload Dataset](https://mintlify.s3.us-west-1.amazonaws.com/friendliai/static/images/guides/dedicated_endpoints/finetuning/upload_dataset.png) ![Uploaded Dataset](https://mintlify.s3.us-west-1.amazonaws.com/friendliai/static/images/guides/dedicated_endpoints/finetuning/uploaded_dataset.png) ### Dataset Format The dataset used for fine-tuning should satisfy the following conditions: 1. The dataset must contain a column named **"messages"**, which will be used for fine-tuning. 2. Each row in the "messages" column should be compatible with the chat template of the base model. For example, [`tokenizer_config.json`](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2/blob/41b61a33a2483885c981aa79e0df6b32407ed873/tokenizer_config.json#L42) of `mistralai/Mistral-7B-Instruct-v0.2` is a template that repeats the messages of a user and an assistant. Concretely, each row in the "messages" field should follow a format like: `[{"role": "user", "content": "The 1st user's message"}, {"role": "assistant", "content": "The 1st assistant's message"}]`. In this case, `HuggingFaceH4/ultrachat_200k` is a dataset that is compatible with the chat template. You can access our example dataset ['FriendliAI/gsm8k' on Hugging Face](https://huggingface.co/datasets/FriendliAI/gsm8k) and explore some of our quantized generative AI models on [our Hugging Face page](https://huggingface.co/FriendliAI). ## How to Create Your Fine-tuning Job Navigate to the 'Fine-tuning' section within your dedicated endpoints project page to launch and view your fine-tuning jobs. You can view the training progress in a job's detail page by clicking on the fine-tuning job. To create a new fine-tuning job, follow these steps: 1. Go to your project and click on the **Fine-tuning** tab. 2. Click **New job**. 3. Fill out the job configuration based on the following field descriptions: * **Job name**: Name of fine-tuning job to create. * **Model**: Hugging Face Models repository or Weights & Biases model artifact name. * **Dataset**: Your uploaded fine-tuning dataset. * **Weights & Biases (W\&B)**: Optional for W\&B integration. * **W\&B project**: Your W\&B project name. * **Hyperparameters**: Fine-tuning Hyperparameters. * **`Learning rate`**: Initial learning rate for AdamW optimizer. * **`Batch size`**: Total training batch size. * **Total number of training**: Configure the number of training cycles with either `Number of training epochs` or `Training steps`. * **`Number of training epochs`**: Total number of training epochs. * **`Training steps`**: Total number of training steps. * **`Evaluation steps`**: Number of steps between model evaluation using the validation dataset. * **`LoRA rank`**: The rank of the LoRA parameters (optional). * **`LoRA alpha`**: Scaling factor that determines the influence of the low-rank matrices during fine-tuning (optional). * **`LoRA dropout`**: Dropout rate applied during fine-tuning (optional). 4. Click the **Create** button to create a job with the input configuration. ## How to Monitor Progress After launching the fine-tuning job, you can monitor the job overview, including progress information and fine-tuning configuration. If you have integrated your Weights & Biases (W\&B) account, you can also monitor the training status in your W\&B project. 
Read our FAQ section on [using W\&B with dedicated fine-tuning](/guides/dedicated_endpoints/faq#using-w-and-b-with-dedicated-fine-tuning) to learn more about monitoring you fine-tuning jobs on their platform. ## How to Deploy the Fine-tuned Model Once the fine-tuning process is complete, you can immediately deploy the model by clicking the 'Deploy' button in the top right corner. The name of the fine-tuned LoRA adapter will be the same as your fine-tuning job name. Fine-tuning Done The steps to deploy the fine-tuned model are equivalent to how you would deploy a custom model on Friendli Dedicated Endpoints. For further information, refer to the [Endpoints documentation](/guides/dedicated_endpoints/endpoints) for more detailed information on launching a model. ## Resources * [Supported open-source models](https://friendli.ai/models) * ['FriendliAI/gsm8k' on Hugging Face](https://huggingface.co/datasets/FriendliAI/gsm8k) * [FAQ on general requirements for a model](/guides/dedicated_endpoints/faq#general-requirements-for-a-model) * [FAQ on using a Hugging Face repository as a model](/guides/dedicated_endpoints/faq#how-to-use-a-hugging-face-repository-as-a-model) * [FAQ on integrating a Hugging Face account](/guides/dedicated_endpoints/faq#how-to-integrate-a-hugging-face-account) * [FAQ on using a W\&B artifact as a model](/guides/dedicated_endpoints/faq#how-to-use-a-w-and-b-artifact-as-a-model) * [FAQ on integrating a W\&B account](/guides/dedicated_endpoints/faq#how-to-integrate-a-w-and-b-account) * [FAQ on using W\&B with dedicated fine-tuning](/guides/dedicated_endpoints/faq#using-w-and-b-with-dedicated-fine-tuning) * [Endpoints documentation on model deployment](/guides/dedicated_endpoints/endpoints) # Deploy with Hugging Face Models Source: https://friendli.ai/docs/guides/dedicated_endpoints/huggingface_tutorial Hands-on tutorial for launching and deploying LLMs using Friendli Dedicated Endpoints with Hugging Face models. export const RoundedBorderBox = ({children, caption}) =>
<div>
  {children}
  {caption && <div>
    {caption}
  </div>}
</div>
; #### Hands-on Tutorial Deploying `meta-llama-3-8b-instruct` LLM from Hugging Face using Friendli Dedicated Endpoints ## Introduction With Friendli Dedicated Endpoints, you can easily spin up scalable, secure, and highly available inference deployments, without the need for extensive infrastructure expertise or significant capital expenditures. This tutorial is designed to guide you through the process of launching and deploying LLMs using Friendli Dedicated Endpoints. Through a series of step-by-step instructions and hands-on examples, you'll learn how to: * Select and deploy pre-trained LLMs from Hugging Face repositories * Deploy and manage your models using the Friendli Engine * Monitor and optimize your inference deployments By the end of this tutorial, you'll be equipped with the knowledge and skills necessary to unlock the full potential of LLMs in your applications, products, and services. So, let's get started and explore the possibilities of Friendli Dedicated Endpoints! ## Prerequisites: * A Friendli Suite account with access to [Friendli Dedicated Endpoints](https://suite.friendli.ai) * A Hugging Face account with access to the [meta-llama-3-8b-instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) model ## Step 1: Create a new endpoint 1. Log in to your Friendli Suite account and navigate to the Friendli Dedicated Endpoints dashboard. 2. If not done already, start the free trial for Dedicated Endpoints. 3. Create a new project, then click on the "New Endpoint" button. 4. Fill in the basic information: * Endpoint name: Choose a unique name for your endpoint (e.g., "My New Endpoint"). 5. Select the model: Hugging Face Model Search * Model Repository: Select "Hugging Face" as the model provider. * Model ID: Enter "meta-llama/Meta-Llama-3-8B-Instruct" as the model id. As the search bar loads the list, click on the top result that exactly matches the repository id. By default, the model pulls the latest commit on the default branch of the model. You may manually select a specific branch / tag / commit instead. If you're using your own model, check [Format Requirements](/guides/dedicated_endpoints/faq#format-requirements) for requirements. 6. Select the instance: Select instance * Instance configuration: Choose a suitable instance type based on your performance requirements. We suggest 1x A100 80G for most models. In some cases where the model's size is big, some options may be restricted as they are guaranteed to not run due to insufficient VRAM. Low Memory Warning 7. Edit the configurations: Autoscaling Config
Engine Config * Autoscaling: By default, the autoscaling ranges from 0 to 2 replicas. This means that the deployment will sleep when it's not being used, which reduces cost. * Advanced configuration: Some LLM options including the batch size and token configurations are mutable. For this tutorial, we'll leave it as-is. 8. Click "Create" to create a new endpoint. ## Step 2: Test the endpoint 1. Wait for the deployment to be created and initialized. This may take a few minutes. You may check the status by the indicator under the endpoint's name. Initializing Endpoint 2. In the "Playground" section, you may enter a sample input prompt (e.g., "What is the capital of France?"). 3. Click on the right arrow button to send the inference request. Playground 4. If you are an enterprise user, you can use the "Metrics" and "Logs" section to monitor the endpoint. Metrics
Logs ## Step 3: Send requests by using cURL or Python 1. As instructed in our [API docs](/openapi/serverless/chat-completions), you can send instructions with the following code: ```sh cURL curl -X POST https://api.friendli.ai/dedicated/v1/completions \ -H "Content-Type: application/json" \ -H "Authorization: Bearer $FRIENDLI_TOKEN" \ -d '{ "model": "$ENDPOINT_ID", "prompt": "What is the capital of France?", "max_tokens": 200, "top_k": 1 }' ``` ```python Python import requests import json import os url = 'https://api.friendli.ai/dedicated/v1/completions' payload = json.dumps({ "model": f"{os.environ['ENDPOINT_ID']}", "max_tokens": 200, "top_k": 1, "prompt": "What is the capital of France?" }) headers = { "Content-Type": "application/json", "Accept": "application/json", "Authorization": f"Bearer {os.environ['FRIENDLI_TOKEN']}" } response = requests.request("POST", url, headers=headers, data=payload) print(response.text) ``` ## Step 4: Update the endpoint 1. You can update the model and change almost everything by clicking the update button. # Introducing Friendli Dedicated Endpoints Source: https://friendli.ai/docs/guides/dedicated_endpoints/introduction Friendli Dedicated Endpoints gives you the reins to explore the full potential of your custom generative AI models on the hardware of your choice, whether you're crafting innovative eloquent texts, generating stunning images, or even more. Friendli Dedicated Endpoints (previously known as **PeriFlow Cloud**) gives you the reins to explore the full potential of your custom generative AI models on the hardware of your choice, whether you're crafting innovative eloquent texts, generating stunning images, or even more. ## What are Friendli Dedicated Endpoints? Don't be limited to pre-trained models. Friendli Dedicated Endpoints lets you take center stage: * **Seamless Serving, Powered by the Friendli Engine**: Experience the magic of the Friendli Engine, our patented GPU-optimized serving technology. Sit back and watch as your models come to life with automatically optimized performances, orchestrated seamlessly by Friendli Dedicated Endpoints. * **Choose or Upload Your Model**: Use your own custom models that are tailored to your specific needs and purposes. Otherwise, simply choose from the open-source models available on [HuggingFace](https://huggingface.co). Text generation, image creation, code synthesis – the possibilities are limitless. * **Control Your Instance**: Select the perfect GPU for your model. The GPU resources are dedicated entirely to your generative AI models. No sharing is required. * **Per-second Billing, Worry-free Optimization**: Focus on your creative pursuits, not cost management. Pay only for the seconds your model runs, eliminating the burden of manual optimization. Let Friendli Dedicated Endpoints handle the heavy lifting. * **Proven Reliability for Real-World Success**: Trusted by leading companies, Friendli Dedicated Endpoints delivers robust performance for even the most demanding workloads. ## Getting Started with Friendli Dedicated Endpoints: Ready to step up your generative AI game? Getting started is as simple as: 1. **Sign Up for a Free Account**: Experience the power of Friendli Dedicated Endpoints risk-free. 2. **Choose or Upload Your Model**: Harness your own custom-trained creation or simply select an open-source model. 3. **Launch Your GPU Instance**: Select the perfect GPU for your model. 4. **Get Your Endpoint Address**: Your gateway to unleashing your model's magic. 5. 
**Fine-tune Your Model**: Optionally, you can fine-tune your generic model for your specific needs. 6. **Send Your Input**: Prompt your model, send your queries, and let your creativity flow. 7. **Witness the Magic**: Sit back and marvel as your custom model delivers stunningly fast outputs, tailored to your specific needs. Friendli Dedicated Endpoints is more than just an AI serving platform - it's a launchpad for your creative ambitions. Dive into the website ([https://friendli.ai](https://friendli.ai)) and blog ([https://friendli.ai/blog](https://friendli.ai/blog)) to discover deeper insights, use cases, and customer testimonials. In our documentations, you can find how you can (1) manage your [projects](/guides/dedicated_endpoints/projects) and (2) [models](/guides/dedicated_endpoints/models), and (3) make them come to life on your [endpoints](/guides/dedicated_endpoints/endpoints), as well as to (4) [fine-tune](/guides/dedicated_endpoints/fine-tuning) them for your specific purposes. To quickly have a look at our service, take a look at our [quickstart](/guides/dedicated_endpoints/quickstart) document. Reserve your own GPUs for your model! It's time to run your own models cost-efficiently with Friendli Dedicated Endpoints! ## Additional Resources: * FriendliAI website: [https://friendli.ai](https://friendli.ai) * FriendliAI blog: [https://friendli.ai/blog](https://friendli.ai/blog) # Models Source: https://friendli.ai/docs/guides/dedicated_endpoints/models Within your Friendli Dedicated Endpoints projects you can prepare and manage the models that you wish to deploy. You may upload your models within your project to deploy them directly on your endpoints. Alternatively, you may manage them on the HuggingFace repository or Weights & Biases artifacts, as our endpoints can load models from your project, HuggingFace repositories, and Weights & Biases artifacts. ### Within your project, you can prepare and manage the models that you wish to deploy. You may upload your models within your project to deploy them directly on your endpoints. Alternatively, you may manage them on the HuggingFace repository or Weights & Biases artifacts, as our endpoints can load models from your project, HuggingFace repositories, and Weights & Biases artifacts. * At the moment, we support loading models from your uploaded model, HuggingFace repositories, and Weights & Biases artifacts. ![HuggingFace](https://mintlify.s3.us-west-1.amazonaws.com/friendliai/static/images/guides/dedicated_endpoints/hugging_face.png) # Pricing Source: https://friendli.ai/docs/guides/dedicated_endpoints/pricing Friendli Dedicated Endpoints pricing detail page. Friendli Dedicated Endpoints offer pricing with flexible monthly billing based on actual usage. | Endpoint | GPU Type | Basic | Enterprise | | -------- | --------- | ------------ | ------------- | | | H100 80GB | \$5.6 / hour | Contact sales | | | A100 80GB | \$2.9 / hour | Contact sales | | Fine-tuning | Model | Basic | Enterprise | | ----------- | ------------------ | ------------------ | ------------- | | | Models ≤ 16B | \$0.50 / 1M tokens | Contact sales | | | Models 16.1B - 72B | \$3.00 / 1M tokens | Contact sales | Contact sales for a discounted custom pricing plan for your enterprise. For more information on pricing and feature comparisons between basic and enterprise plans, please visit our [pricing page](https://friendli.ai/pricing/dedicated-endpoints). 
# Projects Source: https://friendli.ai/docs/guides/dedicated_endpoints/projects Friendli Dedicated Endpoints projects are a basic working unit for your team. ### Projects are a basic working unit for your team. You can freely add and remove members to control access to your project. * You can view your list of projects. ![Project List](https://mintlify.s3.us-west-1.amazonaws.com/friendliai/static/images/guides/dedicated_endpoints/projects_list.png) * For project settings, you can view your project ID and manage the members who have access to your project. ![Project Settings](https://mintlify.s3.us-west-1.amazonaws.com/friendliai/static/images/guides/dedicated_endpoints/project_settings.png) * To add a member to your project, simply enter their names or emails and hit the add button. ![Project AddMember](https://mintlify.s3.us-west-1.amazonaws.com/friendliai/static/images/guides/dedicated_endpoints/project_settings_addmember.png) In order for a user to have access to the project, they should have been granted access to Friendli Dedicated Endpoints by the team administrators from the team settings. ![Team Members](https://mintlify.s3.us-west-1.amazonaws.com/friendliai/static/images/guides/dedicated_endpoints/team_members.png) # QuickStart: Friendli Dedicated Endpoints Source: https://friendli.ai/docs/guides/dedicated_endpoints/quickstart Learn how to get started with Friendli Dedicated Endpoints in this step-by-step guide. Create an account, select your project, choose a model you wish to serve, deploy your endpoint, and seamlessly generate text, code, and more with ease. export const RoundedBorderBox = ({children, caption}) =>
<div>
  {children}
  {caption && <div>
    {caption}
  </div>}
</div>
; ## 1. Log In or Sign Up * If you have an account, log in using your preferred SSO or email/password combination. * If you're new to FriendliAI, create an account for free. ![Login](https://mintlify.s3.us-west-1.amazonaws.com/friendliai/static/images/guides/serverless_endpoints/login.png) ## 2. Access Friendli Dedicated Endpoints * On your dashboard, find the "Friendli Dedicated Endpoints" section. * If unauthorized, ask your team admin to provide access to the Friendli Dedicated Endpoints at the team settings. ![Dashboard Unauthorized](https://mintlify.s3.us-west-1.amazonaws.com/friendliai/static/images/guides/dedicated_endpoints/dashboard_unauthorized.png) ![Team Members](https://mintlify.s3.us-west-1.amazonaws.com/friendliai/static/images/guides/dedicated_endpoints/team_members.png) ![Dashboard Authorized](https://mintlify.s3.us-west-1.amazonaws.com/friendliai/static/images/guides/dedicated_endpoints/dashboard_authorized.png) ## 3. Select Your Project * Either create a new project, or choose from your existing projects for your workload. ![Project List](https://mintlify.s3.us-west-1.amazonaws.com/friendliai/static/images/guides/dedicated_endpoints/projects_list.png) ## 4. Prepare Your Model * Choose a model that you wish to serve from HuggingFace, Weights & Biases, or upload your custom model on our cloud. ![HuggingFace](https://mintlify.s3.us-west-1.amazonaws.com/friendliai/static/images/guides/dedicated_endpoints/hugging_face.png) ## 5. Deploy Your Endpoint * Deploy your endpoint, using the model of your choice prepared from step 3, and the instance equipped with your desired GPU specification. * You can also configure your replicas and the max-batch-size for your endpoint. Endpoint Create
Endpoint Detail ## 6. Generate Responses * You can generate your responses in two ways: playground and endpoint address. * Try out and test generating responses on your custom model using a chatGPT-like interface at the playground tab. Endpoint Playground * For general usages, send queries to your model through our [API](/openapi) at the given endpoint address, accessible on the endpoint information tab. ### Generating Responses Through the Endpoint URL Refer to [this guide](/guides/personal_access_tokens) for general instructions on Friendli Token. ```sh cURL # Send inference request to a running Friendli Dedicated Endpoints using a `curl` command. curl -X POST https://api.friendli.ai/dedicated/v1/completions \ -H "Content-Type: application/json" \ -H "Authorization: Bearer $FRIENDLI_TOKEN" \ -d '{ "model": "$ENDPOINT_ID", "prompt": "Python is a popular", "min_tokens": 20, "max_tokens": 30, "top_k": 32, "top_p": 0.8, "n": 3, "no_repeat_ngram": 3, "ngram_repetition_penalty": 1.75 }' ``` ```python Python SDK # pip install friendli-client # Send inference request to a Friendli Dedicated Endpoints using Python SDK. import os from friendli import Friendli client = Friendli( base_url="https://api.friendli.ai/dedicated", token=os.getenv("FRIENDLI_TOKEN"), endpoint_id="ENDPOINT_ID", ) chat_completion = client.chat.completions.create( messages=[ { "role": "user", "content": "Tell me how to make a delicious pancake" } ], stream=False, ) print(chat_completion.choices[0].message.content) ``` {/* TODO: add image for sending APIs */} For a more detailed tutorial for your usage, please refer to our tutorial for using [HuggingFace models](/guides/dedicated_endpoints/huggingface_tutorial) and [W\&B models](/guides/dedicated_endpoints/wandb_tutorial). # Deploy with W&B Models Source: https://friendli.ai/docs/guides/dedicated_endpoints/wandb_tutorial Hands-on tutorial for launching and deploying LLMs using Friendli Dedicated Endpoints with Weights & Biases artifacts. export const RoundedBorderBox = ({children, caption}) =>
<div>
  {children}
  {caption && <div>
    {caption}
  </div>}
</div>
; #### Hands-on Tutorial Deploying `meta-llama-3-8b-instruct` LLM from W\&B using Friendli Dedicated Endpoints ## Introduction With Friendli Dedicated Endpoints, you can easily spin up scalable, secure, and highly available inference deployments, without the need for infrastructure expertise or significant capital expenditures. This tutorial is designed to guide you through the process of launching and deploying LLMs using Friendli Dedicated Endpoints. Through a series of step-by-step instructions and hands-on examples, you'll learn how to: * Select and deploy pre-trained LLMs from W\&B artifacts * Deploy and manage your models using the Friendli Engine * Monitor and optimize your inference deployments By the end of this tutorial, you'll be equipped with the knowledge and skills necessary to unlock the full potential of LLMs in your applications, products, and services. So, let's get started and explore the possibilities of Friendli Dedicated Endpoints! ## Prerequisites: * A Friendli Suite account with access to [Friendli Dedicated Endpoints](https://suite.friendli.ai) * A W\&B account with an api key (as an access token) ## Step 1: Create a new endpoint 1. Log in to your Friendli Suite account and navigate to the Friendli Dedicated Endpoints dashboard. 2. If not done already, start the free trial for Dedicated Endpoints. 3. Create a new project, then click on the "New Endpoint" button. 4. [Integrate your W\&B account with an api key.](https://wandb.ai/settings#api) 5. Fill in the basic information: * Endpoint name: Choose a unique name for your endpoint (e.g., "My New Endpoint"). 6. Select the model: W&B Model Select * Model Repository: Select "Weights & Biases" as the model provider. * Model ID: Enter `friendliai/model-registry/Meta-Llama-3-8B-Instruct:v0` as the model id. If you're using your own model, check [Format Requirements](/guides/dedicated_endpoints/faq#format-requirements) for requirements. 7. Select the instance: Select instance * Instance configuration: Choose a suitable instance type based on your performance requirements. We suggest 1x A100 80G for most models. In some cases where the model's size is big, some options may be restricted as they are guaranteed to not run due to insufficient VRAM. Low Memory Warning 8. Edit the configurations: Autoscaling Config
Engine Config

* Autoscaling: By default, autoscaling ranges from 0 to 2 replicas. This means that the deployment will sleep when it's not being used, which reduces cost.
* Advanced configuration: Some LLM options, including the maximum processing batch size and token configurations, can be updated. For this tutorial, we'll leave them as-is.

9. Click "Create" to create a new endpoint.

## Step 2: Test the endpoint

1. Wait for the deployment to be created and initialized. This may take a few minutes. You can check the status with the indicator under the endpoint's name. Initializing Endpoint
2. In the "Playground" section, you may enter a sample input prompt (e.g., "What is the capital of France?").
3. Click on the right arrow button to send the inference request. Playground
4. If you are an enterprise user, you can use the "Metrics" and "Logs" sections to monitor the endpoint. Metrics
Logs

## Step 3: Send requests using cURL or Python

1. As described in our [API docs](/openapi/serverless/chat-completions), you can send requests with the following code:

```sh cURL
curl -X POST https://api.friendli.ai/dedicated/v1/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $FRIENDLI_TOKEN" \
  --data-raw '{
    "model": "$ENDPOINT_ID",
    "prompt": "What is the capital of France?",
    "max_tokens": 200,
    "top_k": 1
  }'
```

```python Python
import requests
import json
import os

url = 'https://api.friendli.ai/dedicated/v1/completions'

payload = json.dumps({
    "model": os.environ["ENDPOINT_ID"],
    "prompt": "What is the capital of France?",
    "max_tokens": 200,
    "top_k": 1
})
headers = {
    "Content-Type": "application/json",
    "Accept": "application/json",
    "Authorization": f"Bearer {os.environ['FRIENDLI_TOKEN']}"
}

response = requests.request("POST", url, headers=headers, data=payload)
print(response.text)
```

## Step 4: Update the endpoint

1. You can update the model and change most other settings by clicking the update button.

# Image Generation Models

Source: https://friendli.ai/docs/guides/image-generation

Dive into the characteristics of popular Image Generation Models available on Friendli Dedicated Endpoints.

## Visualizing Ideas with Friendli: A Guide to Image Generation

Friendli provides powerful Image Generation capabilities, allowing users to transform text prompts into high-quality visuals with ease. This guide explores how to generate images using Friendli Dedicated Endpoints, including code examples to help you make the most of these powerful tools.

## Model Supports

We support the **Flux Dev** and **Flux Schnell** models. Their fine-tuned and quantized variants are also supported, and adapters are available as well. For a detailed list of models, refer to the models page on our website.

* [Flux Dev](https://friendli.ai/models?baseModel=black-forest-labs/FLUX.1-dev)
* [Flux Schnell](https://friendli.ai/models?baseModel=black-forest-labs/FLUX.1-schnell)

## API Usage

For full API specifications, refer to:

* [Dedicated API Reference](/openapi/dedicated/image-generations)
* [Container API Reference](/openapi/container/image-generations)

## Examples

```python Python
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.friendli.ai/dedicated/v1",
    api_key=os.environ.get("FRIENDLI_TOKEN"),
)

images = client.images.generate(
    # Replace YOUR_ENDPOINT_ID with the ID of your endpoint, e.g. "zbimjgovmlcb"
    model="YOUR_ENDPOINT_ID",
    prompt="An orange Lamborghini driving down a hill road at night with a beautiful ocean view in the background.",
    extra_body={
        "num_inference_steps": 10,
        "guidance_scale": 3.5
    }
)

print(images.data[0].url)
```

```sh cURL
curl -L -X POST "https://api.friendli.ai/dedicated/v1/images/generations" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $FRIENDLI_TOKEN" \
  --data-raw '{
    "model": "YOUR_ENDPOINT_ID",
    "prompt": "An orange Lamborghini driving down a hill road at night with a beautiful ocean view in the background.",
    "num_inference_steps": 10,
    "guidance_scale": 3.5
  }'
```

`guidance_scale` is required when using Friendli Container. For more detail, please refer to the [Container API Reference](/openapi/container/image-generations).
# Unleash the Power of Generative AI with Friendli Suite: Your End-to-End Solution

Source: https://friendli.ai/docs/guides/introduction

Friendli Suite empowers you to explore generative AI with three solutions: Serverless Endpoints for quick access to open-source models, Dedicated Endpoints for deploying custom models on dedicated GPUs, and Containers for secure, on-premise control. Powered by the optimized Friendli Engine, each option ensures fast, cost-efficient AI serving for text, code, and image generation.

export const ServerlessIcon = () => { return ; };

export const ContainerIcon = () => { return ; };

export const DedicatedIcon = () => { return ; };

Welcome to the exciting world of generative AI, where words dance into text, code sparks creation, and images bloom from the imagination. Friendli Suite empowers you to tap into this potential with three distinct offerings, catering to your specific needs and technical expertise. Whether you're a seasoned developer or a curious newcomer, Friendli Suite provides the perfect platform to bring your AI-powered visions to life.

## What is Generative AI Serving?

Before diving into Friendli Suite, let's get familiar with the magic behind the curtain. Generative AI models, including large language models (LLMs), learn from massive datasets of text and code, mimicking human creativity and knowledge. However, utilizing these models in real-world applications requires generative AI serving. Inference serving acts as the bridge between the model and your desired outputs, efficiently processing your prompts and queries to generate text, code, images, and more.

Efficient inference serving is not easy to achieve. It requires actively optimizing many aspects of the system so that user requests can be handled efficiently with a limited amount of resources. Without these optimizations, inference serving can suffer from extremely high latency or wasteful use of many expensive GPUs. To take these optimization hassles off your hands, the Friendli Engine steps in to enable fast and cost-efficient inference serving for your generative AI models.

## Friendli Suite: Your Flexible Gateway to Generative AI Mastery

Now, let's meet the three members of Friendli Suite, each unlocking different doors to AI innovation:

### 1. [Friendli Dedicated Endpoints](/guides/dedicated_endpoints/introduction): Power and Customization at Your Fingertips

Ready to take the reins and unleash the full potential of your own models? Friendli Dedicated Endpoints is for you. This service provides dedicated GPU resources, letting you upload and run your custom generative AI models. Reserve the exact GPU you need and enjoy fine-grained control over your model settings. Pay-per-second billing makes it perfect for regular or resource-intensive workloads.

### 2. [Friendli Container](/guides/container/introduction): On-Premise Control for the AI Purist

Do you prefer the comfort and security of your own data center? Friendli Container is the solution. We provide the Friendli Engine within Docker containers that can be installed on your on-premise GPUs so your data stays within your own secure cluster. This option offers maximum control and security, ideal for advanced users or those with specific data privacy requirements.

### 3. [Friendli Serverless Endpoints](/guides/serverless_endpoints/introduction): Your Quickest Path to Creativity

Imagine a playground for your AI dreams.
Friendli Serverless Endpoints is just that - a simple, click-and-play interface that lets you access popular general-purpose open-source models like Llama 3.1 without any heavy lifting. Choose your model, enter your prompt, and marvel at the generated text, or code outputs. With pay-per-token billing, this is ideal for exploration and experimentation. You can think of it as an AI sampler to try out the abilities of general-purpose AI models. ## [The Friendli Engine](https://friendli.ai/solutions/engine): The Powerhouse Behind the Suite At the heart of each Friendli Suite offering lies the Friendli Engine, a patented GPU-optimized serving engine. This technological marvel is what enables Friendli Suite's superior performance and cost-effectiveness, featuring innovations like continuous batching (iteration batching) that significantly improve resource utilization compared to traditional LLM serving solutions. ## Which Friendli solution is Right for You? Friendli Suite provides flexibility to match your needs: * Level up with your own models: Opt for [Friendli Dedicated Endpoints](/guides/dedicated_endpoints/introduction) for customized models on autopilot. * Embrace on-premise control: Utilize [Friendli Container](/guides/container/introduction) for maximum control and efficiency on your GPUs. * Start quick and simple: Choose [Friendli Serverless Endpoints](/guides/serverless_endpoints/introduction) for exploration and quick projects. No matter your skill level or preferences, Friendli Suite has the perfect option to empower your generative AI journey. Dive in, explore, and unleash the endless possibilities of AI creativity! Remember to explore the resources at [https://friendli.ai/blog](https://friendli.ai/blog) for deeper insights into generative AI and Friendli Suite capabilities. ## Popular Guides Check out popular how to guides and dive into the Friendli Suite. } href="/guides/dedicated_endpoints/quickstart"> Deploy your models with Friendli Dedicated Endpoints, and enjoy the flexibility of customizing your own models. Use the Friendli Engine to generate images, text, and more with extraordinary speed and efficiency. } href="/guides/container/quickstart"> Opt for maximum control with Friendli Container, offering the Friendli Engine in Docker containers installable on your on-premise GPUs, ensuring your data remains within your cluster. } href="/guides/serverless_endpoints/quickstart"> Only a few clicks are required for you to access general-purpose open-source models like Llama 3.1. Enjoy the power of generative AI without any hassle at a blazing speed. # Friendli Documentation Source: https://friendli.ai/docs/guides/overview Get started with FriendliAI products and explore APIs. export const ToolIcon = () => { return ; }; export const ChatIcon = () => { return ; }; export const ServerlessIcon = () => { return ; }; export const ContainerIcon = () => { return ; }; export const DedicatedIcon = () => { return ; }; ## QuickStarts } href="/guides/dedicated_endpoints/quickstart"> Deploy your models with Friendli Dedicated Endpoints, and enjoy the flexibility of customizing your own models. Use the Friendli Engine to generate images, text, and more with extraordinary speed and efficiency. } href="/guides/container/quickstart"> Opt for maximum control with Friendli Container, offering the Friendli Engine in Docker containers installable on your on-premise GPUs, ensuring your data remains within your cluster. 
} href="/guides/serverless_endpoints/quickstart"> Only a few clicks are required for you to access general-purpose open-source models like Llama 3.1. Enjoy the power of generative AI without any hassle at a blazing speed. ## SDK Friendli offers tools for developers to easily integrate AI into various applications. Our solutions support popular frameworks, enabling AI integration from simple chatbots to complex systems. ## Explore APIs } href="/openapi/serverless/chat-completions"> Discover how to generate text through interactive conversations. } href="/openapi/serverless/tool-assisted-chat-completions"> Learn how to enhance responses with tool assisted chat completions using built-in tools. # Personal Access Tokens Source: https://friendli.ai/docs/guides/personal_access_tokens Learn how to manage credentials in Friendli Suite, including using personal access tokens for authentication and authorization. export const RoundedBorderBox = ({children, caption}) =>
{children} {caption && {caption}}
; Effective management of credentials is crucial when using Friendli Suite and its endpoints for authentication and authorization purposes. This guide outlines when the credentials are required and provides instructions on how to manage them. A Friendli Token serves as an alternative method of authorization to signing in with an email and a password. You can generate a new Friendli Token through the [Friendli Suite](https://suite.friendli.ai), at your **"Personal settings"** page. 1. Go to the [Friendli Suite](https://suite.friendli.ai) and sign in with your account. 2. Click the profile icon at the top-right corner of the page. 3. Click **"Personal settings"** menu. Personal settings 4. Go to the **"Tokens"** tab on the navigation bar. 5. Create a new Friendli Token by clicking the **"Create token"** button. 6. Copy the token and save it in a safe place. You will not be able to see this token again once the page is refreshed. Tokens # Advanced Applications on Friendli Serverless Endpoints (Coming Soon!) Source: https://friendli.ai/docs/guides/serverless_endpoints/applications Stay tuned for detailed guides on how to perform tasks like Retrieval-Augmented Generation (RAG), Conditional Image Generation, Fine-tuning Custom Models. Friendli Serverless Endpoints empowers you to unleash the full potential of generative AI models with ease. While we've already covered some exciting applications through text and image generation, we're eager to offer even more possibilities for users like you! This document serves as a preview for upcoming content showcasing advanced applications of Friendli Serverless Endpoints. Stay tuned for detailed guides on how to perform tasks like: * **Retrieval-Augmented Generation (RAG)**: Combine the power of search and generation to create highly relevant and informative text outputs based on real-world data. * **Conditional Image Generation**: Fine-tune your image creations by using specific conditions or attributes as additional prompts, pushing the boundaries of creative control. * **Fine-tuning Custom Models**: Tailor existing models to your specific needs and data for a truly personalized generative AI experience. This is just a glimpse of the advanced applications on the horizon! We're actively working on bringing you comprehensive guides that explain the process, settings, and potential benefits of each approach. In the meantime, feel free to explore the current capabilities of Friendli Serverless Endpoints with text generation. Experiment with different models, settings, and prompts to discover the vast creative and informative potential at your fingertips. We're committed to evolving Friendli Serverless Endpoints into a one-stop platform for all your generative AI needs. Stay tuned for updates and get ready to dive into the world of advanced applications soon! #### For any questions or feedback regarding these upcoming features, please don't hesitate to [reach out to us](https://friendli.ai/contact)! We appreciate your understanding and continuous support as we push the boundaries of generative AI accessibility. # Function Calling Source: https://friendli.ai/docs/guides/serverless_endpoints/function-calling Learn how to do OpenAI compatible function calling on Friendli Serverless Endpoints. Function calling is a powerful feature that connects large language models (LLMs) with external systems to maximize the model’s utility. 
It goes beyond simply relying on model’s learned knowledge and provides the possibility of utilizing real-time data and performing complex tasks. Function calling ## Simple Example In the example below, which consists of 1 to 5 steps, we define a `get_weather` function that retrieves weather information, ask a question that prompts the model to use the tool, and execute the tool to execute the final response. Open In Colab Define a function that the model can call (`get_weather`) with a JSON Schema.\ The function requires the following parameters: * `location`: The location to look up weather information for. * `date`: The date to look up weather information for. This definition is included in the `tools` array and passed to the model. ```python tools = [ { "type": "function", "function": { "name": "get_weather", "parameters": { "type": "object", "properties": { "location": {"type": "string"}, "date": {"type": "string", "format": "date"} }, }, }, } ] ``` When a user asks a question, this request is passed to the model as a `messages` array.\ For example, the request "What's the weather like in Paris today?" would be passed as: ```python from datetime import datetime today = datetime.now() messages = [ {"role": "system", "content": f"You are a helpful assistant. today is {today}."}, {"role": "user", "content": "What's the weather like in Paris today?"} ] ``` Call the model using the `tools` and `messages` defined above. ```python {13-14} from openai import OpenAI import os token = os.getenv("FRIENDLI_TOKEN") or "" client = OpenAI( base_url = "https://api.friendli.ai/serverless/v1", api_key = token ) completion = client.chat.completions.create( model="meta-llama-3.1-8b-instruct", messages=messages, tools=tools, ) print(completion.choices[0].message.tool_calls) ``` The API caller runs the tool based on the function call information of the model.\ For example, the `get_weather` function is executed as follows: ```python import json import random def get_weather(location: str, date: str): temperature = random.randint(60, 80) return {"temperature": temperature, "forecast": "sunny"} tool_call = completion.choices[0].message.tool_calls[0] tool_response = locals()[tool_call.function.name](**json.loads(tool_call.function.arguments)) print(tool_response) ``` ```python Result: {'temperature': 65, 'forecast': 'sunny'} ``` Add the tool's response to the `messages` array and pass it back to the model. 1. Append tool call information 2. Append the tool's execution result This ensures the model has all the necessary information to generate a response. ```python model_response = completion.choices[0].message # Append the response from the model messages.append( { "role": model_response.role, "tool_calls": [ tool_call.model_dump() for tool_call in model_response.tool_calls ] } ) # Append the response from the tool messages.append( { "role": "tool", "content": json.dumps(tool_response), "tool_call_id": tool_call.id } ) print(json.dumps(messages, indent=2)) ``` The model generates the final response based on the tool's output: ```python next_completion = client.chat.completions.create( model="meta-llama-3.1-8b-instruct", messages=messages, tools=tools ) print(next_completion.choices[0].message.content) ``` ```text Final output: According to the forecast, it's going to be a sunny day in Paris with a temperature of 65 degrees. ``` ## Parameters To use function calling, modify the `tool_choice`, `tools`, and `parallel_tool_calls` parameters. 
| Parameter | Description | default | | --------------------- | ---------------------------------------------------------------------------------------------------------------- | ------- | | `tool_choice` | Specifies how the model should choose tools. Has four options: "none", "auto", "required", or named tool choice. | `auto` | | `tools` | The list of tool objects that define the functions the model can call. | - | | `parallel_tool_calls` | Boolean value (`True` or `False`) specifying whether to make tool calls in parallel. | `True` | ### `tool_choice` options The model will automatically choose whether to call a function and which function to call by default.\ However, you can use the `tool_choice` parameter to tell the model to use a function. * `none`: Disables the use of tools. * `auto`: Enables the model to decide whether to use tools and which ones to use. * `required`: Forces the model to use a tool, but the model chooses which one. * Named tool choice: Forces the model to use a specific tool. It must be in the following format: ```json { "type": "function", "function": { "name": "get_current_weather" // The function name you want to specify } } ``` ## Supported models * `deepseek-r1` * `meta-llama-3.3-70b-instruct` * `meta-llama-3.1-8b-instruct` ## References Building an AI Agent for Google Calendar ([Part 1](https://friendli.ai/blog/ai-agent-google-calendar) / [Part 2](https://friendli.ai/blog/calendar-agent-vercel))\ Friendli Tools Blog Series ([Part 1](https://friendli.ai/blog/llm-function-calling) / [Part 2](https://friendli.ai/blog/ai-agents-function-calling) / [Part 3](https://friendli.ai/blog/friendli-tools-llama3-outperforms-gpt4o)) # Integrations Source: https://friendli.ai/docs/guides/serverless_endpoints/integrations Friendli integrates with LangChain, LiteLLM, LlamaIndex, and MongoDB to streamline GenAI application deployment. LangChain and LlamaIndex enable tool calling AI agents and Retrieval-Augmented Generation (RAG), while MongoDB provides memory via vector databases, and LiteLLM boosts performance through load balancing. [Friendli](/guides/introduction) integrates with LangChain, LiteLLM, LlamaIndex, and MongoDB to streamline the deployment of compound GenAI applications. The integration of LangChain and LlamaIndex facilitates tool calling AI agents or Retrieval-Augmented Generation (RAG). MongoDB supports these agentic systems by providing memory with vector databases, while LiteLLM enhances performance through load balancing and evaluation. Get a quick overview of [Friendli Serverless Endpoints'](/guides/serverless_endpoints/introduction) integrations and learn more through the linked resources. ## LangChain [LangChain](https://python.langchain.com/v0.2/docs/introduction) is a framework for developing applications powered by large language models (LLMs). Utilize [Friendli Serverless Endpoints](/guides/serverless_endpoints/quickstart) for LLM inferencing in LangChain by preparing a [Friendli Token](/guides/personal_access_tokens). To install the required packages, run: ``` pip install langchain langchain-community friendli-client ``` Here's a streaming chat sample code to get started with LangChain and FriendliAI: ```python from langchain_community.chat_models.friendli import ChatFriendli llm = ChatFriendli(model="meta-llama-3.3-70b-instruct") for chunk in llm.stream("Tell me a funny joke."): print(chunk.content, end="", flush=True) ``` Output: ``` Here's one: Why couldn't the bicycle stand up by itself? (Wait for it...) Because it was two-tired! 
Hope that brought a smile to your face! ``` #### Resources * [FriendliAI Blog Post on Building RAG Chatbots with Friendli, MongoDB Atlas, and LangChain](https://friendli.ai/blog/rag-chatbot-friendli-mongodb-atlas-langchain) * [FriendliAI Blog Post on Example RAG Application with Friendli and LangChain](https://friendli.ai/blog/chatdocs-rag-friendli-langchain) * [FriendliAI Blog Post on LangChain Integration with Friendli Dedicated Endpoints](https://friendli.ai/blog/langchain-integration-friendli-engine) * [LangChain's Documentation on Friendli](https://python.langchain.com/v0.1/docs/integrations/llms/friendli) ## MongoDB [MongoDB Atlas](https://www.mongodb.com/docs/atlas/getting-started) is a developer data platform offering vector stores and searches for compound GenAI applications, compatible through both LangChain and LlamaIndex. Utilize [Friendli Serverless Endpoints](/guides/serverless_endpoints/quickstart) for LLM inferencing in MongoDB by preparing a [Friendli Token](/guides/personal_access_tokens). To install the required packages, run: ``` pip install pymongo friendli-client langchain langchain-mongodb langchain-community pypdf langchain-openai tiktoken ``` Here's a RAG sample code to get started with MongoDB and FriendliAI using LangChain: ```python # Note: You can find detailed explanation on this code in the blog post below. from pymongo import MongoClient from langchain_mongodb.vectorstores import MongoDBAtlasVectorSearch from langchain_community.chat_models.friendli import ChatFriendli from langchain_community.document_loaders import PyPDFLoader from langchain_openai import OpenAIEmbeddings from langchain_text_splitters import RecursiveCharacterTextSplitter from langchain_core.output_parsers import StrOutputParser from langchain_core.prompts import PromptTemplate from langchain_core.runnables import RunnablePassthrough # Fill in your Cluster URI here. MONGODB_ATLAS_CLUSTER_URI = "{YOUR CLUSTER URI}" client = MongoClient(MONGODB_ATLAS_CLUSTER_URI) # Fill in your DB information here. DB_NAME = "{YOUR DB NAME}" COLLECTION_NAME = "{YOUR COLLECTION NAME}" ATLAS_VECTOR_SEARCH_INDEX_NAME = "{YOUR INDEX NAME}" MONGODB_COLLECTION = client[DB_NAME][COLLECTION_NAME] # Fill in your PDF link here. loader = PyPDFLoader("{YOUR PDF DOCUMENT LINK}") data = loader.load() text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=150) docs = text_splitter.split_documents(data) vector_store = MongoDBAtlasVectorSearch.from_documents( documents=docs, embedding=OpenAIEmbeddings(disallowed_special=()), collection=MONGODB_COLLECTION, index_name=ATLAS_VECTOR_SEARCH_INDEX_NAME, ) retriever = vector_store.as_retriever() llm = ChatFriendli(model="meta-llama-3.3-70b-instruct") prompt = PromptTemplate.from_template( """ Use the following pieces of context to answer the question. {context} Question: {question} Helpful Answer: """ ) def format_docs(docs): return "\n\n".join(doc.page_content for doc in docs) rag_chain = ( {"context": retriever | format_docs, "question": RunnablePassthrough()} | prompt | llm | StrOutputParser() ) # Input your user query here. 
rag_chain.invoke("{Sample Query Texts}") ``` #### Resources * [FriendliAI Blog Post on Building RAG Chatbots with Friendli, MongoDB Atlas, and LangChain](https://friendli.ai/blog/rag-chatbot-friendli-mongodb-atlas-langchain) * [FriendliAI Blog Post on RAG with FriendliAI and MongoDB](https://friendli.ai/blog/rag-mongodb-friendli) * [MongoDB's Partner Ecosystem Page on FriendliAI](https://cloud.mongodb.com/ecosystem/friendliai) ## LlamaIndex [LlamaIndex](https://docs.llamaindex.ai/en/stable) is a data framework designed to connect LLMs to custom data sources. Utilize [Friendli Serverless Endpoints](/guides/serverless_endpoints/quickstart) for LLM inferencing in LlamaIndex by preparing a [Friendli Token](/guides/personal_access_tokens). Additionally, an [OpenAI API key](https://platform.openai.com/docs/api-reference/authentication) is required to access the [OpenAI embedding API](https://platform.openai.com/docs/api-reference/embeddings). To install the required packages, run: ``` pip install llama-index-llms-friendli llama-index ``` Here's a RAG streaming chat sample code to get started with LlamaIndex and FriendliAI: ```python from llama_index.llms.friendli import Friendli from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex Settings.llm = Friendli() # Assuming a directory named 'data_folder' stores your pdf file. documents = SimpleDirectoryReader('data_folder').load_data() index = VectorStoreIndex.from_documents(documents) query_engine = index.as_query_engine(streaming=True) # Input your user query here. response = query_engine.query("{Sample Query Texts}") response.print_response_stream() ``` #### Resources * [FriendliAI Blog Post on Building RAG Applications with Friendli and LlamaIndex](https://friendli.ai/blog/llamaindex-rag-app-friendli-engine) * [Google Colab Notebook on Two-Stage Retrieval with LlamaIndex Friendli Integration](https://colab.research.google.com/drive/1_-1aITFQh0UUbRzaRM8FRid_wZHrfIjX?usp=sharing) * [LlamaIndex's Documentation on Friendli](https://docs.llamaindex.ai/en/stable/examples/llm/friendli) ## LiteLLM [LiteLLM](https://docs.litellm.ai/docs) is a versatile platform offering access to 100+ LLMs in the [OpenAI API format](https://platform.openai.com/docs/api-reference/chat/create). Utilize [Friendli Serverless Endpoints](/guides/serverless_endpoints/quickstart) for LLM inferencing in LiteLLM by preparing a [Friendli Token](/guides/personal_access_tokens). To install the required package, run: ``` pip install litellm ``` Here's a streaming chat sample code to get started with LiteLLM and FriendliAI: ```python from litellm import completion response = completion( # Simply change the model ID to use different LLM inference models & engines. model="friendliai/meta-llama-3-70b-instruct", messages=[ {"role": "user", "content": "Hello from LiteLLM"} ], stream=True, ) for chunk in response: print(chunk.choices[0].delta.content, end="", flush=True) ``` Output: ``` Hello from an AI! It's great to meet you, LiteLLM! How's your day going so far? 
``` #### Resources * [FriendliAI Blog Post on LiteLLM Friendli Integration using LiteLLM's Budget Manager](https://friendli.ai/blog/litellm-friendli-integration) * [LiteLLM's Supported Models & Providers Documentation Page on FriendliAI](https://docs.litellm.ai/docs/providers/friendliai) # Introducing Friendli Serverless Endpoints Source: https://friendli.ai/docs/guides/serverless_endpoints/introduction Guide for Friendli Serverless Endpoints, allowing you to seamlessly integrate state-of-the-art AI models into your workflows, regardless of your technical expertise. {/* Welcome to the exciting world of generative AI, where words dance into text, code sparks creation, and images bloom from the imagination. FriendliAI makes this world readily accessible with Friendli Serverless Endpoints, a revolutionary service that puts the power of cutting-edge generative models right at your fingertips. */} This tutorial will guide you through Friendli Serverless Endpoints, allowing you to seamlessly integrate state-of-the-art AI models into your workflows, regardless of your technical expertise. Whether you're a seasoned developer or a curious newcomer, get ready to unlock the limitless potential of generative AI! ## What are Friendli Serverless Endpoints? Imagine there is a powerful racecar (a generative AI model) that needs much maintenance and tuning (infrastructure and technical know-how). Friendli Serverless Endpoints is like a rental service, taking care of the hassle so you can just drive! It provides a simple, serverless interface that connects you to Friendli Engine, a high-performance, cost-effective inference serving engine optimized for generative AI models. With Friendli Serverless Endpoints, you can: * **Access popular open-source models**: Get started with pre-loaded models like Llama 3.1. No need to worry about downloading or optimizing them. * **Build your own workflows**: Integrate these models into your applications with just a few lines of code. Generate creative text formats, code, musical pieces, email, letters, etc. and create stunning images with ease. * **Pay per token, not per GPU**: Unlike traditional solutions that require whole GPU instances, Friendli Serverless Endpoints bills you only for the resources your models actually use. This translates to significant cost savings and efficient resource utilization. * **Focus on what matters**: Forget about infrastructure setup and GPU optimization. Friendli Serverless Endpoints handles the heavy lifting, freeing you to focus on your creative vision and application development. ## Getting Started with Friendli Serverless Endpoints: 1. **Sign up for a free account**: Visit [Friendli Suite](https://suite.friendli.ai) and create your Friendli Suite account. 2. **Choose your model**: Select the pre-loaded model you want to experiment with, such as Llama 3.1 for text generation. 3. **Connect to the endpoint**: Friendli Serverless Endpoints provides simple API documentation for a variety of programming languages. Follow the instructions to integrate the endpoint into your code. 4. **Send your input**: Supply the model with your input text, code, or image prompt. 5. **Witness the magic**: Friendli Serverless Endpoints will utilize Friendli Engine to process your input and generate the desired output, be it text, code, or an image. You can then integrate the generated results into your application or simply marvel at the AI's creativity! 
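To make steps 3 to 5 concrete, here is a minimal sketch that connects to Friendli Serverless Endpoints through the OpenAI-compatible API and sends a prompt. The model ID is only an example, and the `FRIENDLI_TOKEN` environment variable is assumed to hold your Friendli Token:

```python
# Minimal sketch of steps 3-5: choose a model, connect, and send your input.
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("FRIENDLI_TOKEN"),
    base_url="https://api.friendli.ai/serverless/v1",
)

completion = client.chat.completions.create(
    model="meta-llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": "Write a haiku about serverless AI."}],
)
print(completion.choices[0].message.content)
```

The OpenAI Compatibility guide below covers streaming responses and other SDKs in more detail.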
## Beyond the Basics:

As you gain confidence, Friendli Serverless Endpoints offers even more:

* **Granular control**: Optimize resource usage at the per-token or per-step level for each model, ensuring efficient resource allocation for your specific needs.
  {/* - **Customization**: Build your own custom generative models and seamlessly integrate them into your workflows using Friendli Serverless Endpoints. */}
* **Scalability**: As your needs grow, easily scale your resources without worrying about complex infrastructure management.

Friendli Serverless Endpoints is the perfect springboard for your generative AI journey. Whether you're an experienced developer seeking to integrate AI into your projects or a curious explorer yearning to unleash your creative potential, FriendliAI provides the tools and resources you need to succeed. So, start your engines, take the wheel, and explore the vast possibilities of generative AI with Friendli Serverless Endpoints!

## Additional Resources:

* FriendliAI website: [https://friendli.ai](https://friendli.ai)
* FriendliAI blog: [https://friendli.ai/blog](https://friendli.ai/blog)

# OpenAI Compatibility

Source: https://friendli.ai/docs/guides/serverless_endpoints/openai-compatibility

Friendli Serverless Endpoints is compatible with the OpenAI API standard through the Python API Libraries and the Node API Libraries. Friendli Dedicated Endpoints and Friendli Container are also OpenAI API compatible.

Friendli Serverless Endpoints is compatible with the [OpenAI API standard](https://platform.openai.com/docs/api-reference/chat) through the [Python API Libraries](https://pypi.org/project/openai) and the [Node API Libraries](https://www.npmjs.com/package/openai). [Friendli Dedicated Endpoints](https://friendli.ai/products/dedicated-endpoints) and [Friendli Container](https://friendli.ai/products/container) are also OpenAI API compatible.

Through this guide, you will learn how to:

* Send inference requests to Friendli Serverless Endpoints in Python and Node.js.
* Use chat models supported by Friendli Endpoints.
* Generate streaming chat responses.

## Model Supports

* `deepseek-r1`
* `meta-llama-3.3-70b-instruct`
* `meta-llama-3.1-8b-instruct`
* [and more!](https://friendli.ai/models) You can find more information about each text generation model [here](https://friendli.ai/models). Log in to the [Friendli Suite](https://suite.friendli.ai/login) to create your Friendli Token for this quick tutorial. We will use the *Llama 3.3 70B Instruct* model as an example in this tutorial. ## Quick Guide If you want to integrate Friendli Serverless Endpoints to your application that had been using OpenAI, you can simply switch the following components: **API key**, **model**, and the **base url**. The **API key** is equivalent to your Friendli Token, which you can create [here](https://suite.friendli.ai/default-team/settings/tokens). After choosing your generative text model, you can find the **model id** by pressing the 'More info' icon, or by using the ids listed in the Model Supports section above. Last but not least, change the **base url** to [https://api.friendli.ai/serverless/v1](https://api.friendli.ai/serverless/v1) and you are all set! ## Python This example demonstrates how you can use the OpenAI Python SDK to generate a response. #### Default Example Code ```python import openai import os client = openai.OpenAI( api_key=os.getenv("FRIENDLI_TOKEN"), base_url="https://api.friendli.ai/serverless/v1", ) chat_completion = client.chat.completions.create( model="meta-llama-3.3-70b-instruct", messages=[ {"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Tell me a funny joke."}, ], stream=False, ) print(chat_completion.choices[0].message.content) ``` #### Streaming Example Code ```python import openai import os client = openai.OpenAI( api_key=os.getenv("FRIENDLI_TOKEN"), base_url="https://api.friendli.ai/serverless/v1", ) stream = client.chat.completions.create( model="meta-llama-3.3-70b-instruct", messages=[ {"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Tell me a funny joke."}, ], stream=True, ) for chunk in stream: print(chunk.choices[0].delta.content or "", end="", flush=True) ``` ## Node.js This example demonstrates how you can use the OpenAI Node.js SDK to generate a response. #### Default Example Code ```javascript const OpenAI = require("openai"); const openai = new OpenAI({ apiKey: process.env.FRIENDLI_TOKEN, baseURL: "https://api.friendli.ai/serverless/v1", }); async function getChatCompletion() { try { const chatCompletion = await openai.chat.completions.create({ messages: [ { role: "system", content: "You are a helpful assistant." }, { role: "user", content: "Tell me a funny joke." }, ], model: "meta-llama-3.3-70b-instruct", stream: false, }); process.stdout.write(chatCompletion.choices[0].message.content); } catch (error) { console.error("Error:", error); } } getChatCompletion(); ``` #### Streaming Example Code ```javascript const OpenAI = require("openai"); const openai = new OpenAI({ apiKey: process.env.FRIENDLI_TOKEN, baseURL: "https://api.friendli.ai/serverless/v1", }); async function getChatCompletionStream() { try { const stream = await openai.chat.completions.create({ messages: [ { role: "system", content: "You are a helpful assistant." }, { role: "user", content: "Tell me a funny joke." }, ], model: "meta-llama-3.3-70b-instruct", stream: true, }); for await (const chunk of stream) { process.stdout.write(chunk.choices[0].delta?.content || ""); } } catch (error) { console.error("Error:", error); } } getChatCompletionStream(); ``` ## Results ``` Here's one: Why couldn't the bicycle stand up by itself? (wait for it...) 
Because it was two-tired!

Hope that brought a smile to your face!
```

# Pricing

Source: https://friendli.ai/docs/guides/serverless_endpoints/pricing

Friendli Serverless Endpoints offer a range of models tailored to various tasks.

Friendli Serverless Endpoints offer a range of models tailored to various tasks.

## Text Generation Models

Text generation models provide users with completions and chat completions APIs, with pricing determined on a per-token basis. The following table outlines the pricing details for different text generation models:

| Model Code                  | Price per 1M Tokens    |
| --------------------------- | ---------------------- |
| deepseek-r1                 | Input \$3 · Output \$7 |
| meta-llama-3.3-70b-instruct | \$0.6                  |
| meta-llama-3.1-8b-instruct  | \$0.1                  |

The term "token" refers to an individual unit processed by the model.

# QuickStart: Friendli Serverless Endpoints

Source: https://friendli.ai/docs/guides/serverless_endpoints/quickstart

Learn how to get started with Friendli Serverless Endpoints in this step-by-step guide. Create an account, choose from powerful AI models like Llama 3.1, and seamlessly generate text, code, and more with ease.

export const RoundedBorderBox = ({children, caption}) =>
{children} {caption && {caption}}
;

## 1. Log In or Sign Up

* If you have an account, log in using your preferred SSO or email/password combination.
* If you're new to FriendliAI, create an account for free.

![Login](https://mintlify.s3.us-west-1.amazonaws.com/friendliai/static/images/guides/serverless_endpoints/login.png)

## 2. Access Friendli Serverless Endpoints

* On your dashboard, find the "Friendli Serverless Endpoints" section.
* Click the "Go to playground" button to start generating text.

![Suite Dashboard](https://mintlify.s3.us-west-1.amazonaws.com/friendliai/static/images/guides/serverless_endpoints/dashboard.png)

## 3. Select a Model

* Browse the available generative models.
* Choose the model that best aligns with your desired use case.
* First-time users receive a free trial to explore Friendli Serverless Endpoints without any financial commitment.

Model List

## 4. Generate Responses

1. Enter Your Query:
   * Type in your prompt or question.
   * Alternatively, select from the provided example queries to try out different scenarios.

Chat Prompt

2. Adjust Settings:
   * Refer to the [Text Generation](/guides/serverless_endpoints/text-generation) docs for more details on the settings available for text generation models (these settings also map to API request parameters; see the example after the code samples below).
3. Generate Your Response:
   * Click the "Generate" button to start the generation process.
   * The model will process your query and produce the corresponding text output. That's it!

Chat Settings

### Generating Responses Through the Endpoint URL

If you wish to send your requests through the endpoint URL, you can find it by clicking the 'More Info' button at the top-right corner of the page. Refer to [this guide](/guides/personal_access_tokens) for general instructions on the Friendli Token.

Endpoint URL
```sh cURL
# Send an inference request to a running Friendli Serverless Endpoint using a `curl` command.
curl -X POST https://api.friendli.ai/serverless/v1/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $FRIENDLI_TOKEN" \
  -d '{
    "model": "meta-llama-3.1-8b-instruct",
    "prompt": "Python is a popular",
    "min_tokens": 20,
    "max_tokens": 30,
    "top_k": 32,
    "top_p": 0.8,
    "n": 3,
    "no_repeat_ngram": 3,
    "ngram_repetition_penalty": 1.75
  }'
```

```python Python SDK
# pip install friendli-client
# Send an inference request to Friendli Serverless Endpoints using the Python SDK.
import os
from friendli import Friendli

client = Friendli(token=os.getenv("FRIENDLI_TOKEN"))

chat_completion = client.chat.completions.create(
    model="meta-llama-3.3-70b-instruct",
    messages=[
        {
            "role": "user",
            "content": "Tell me how to make a delicious pancake"
        }
    ],
    stream=False,
)
print(chat_completion.choices[0].message.content)
```
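If you prefer to set the playground's generation settings programmatically, they correspond to request parameters of the same names. The sketch below is a minimal example; it assumes `FRIENDLI_TOKEN` is set and that the Python SDK forwards these sampling parameters (`max_tokens`, `temperature`, `top_p`) to the API just like the HTTP interface does:

```python
# Sketch: playground settings expressed as request parameters.
# Assumes FRIENDLI_TOKEN is set and that the SDK accepts the same sampling
# parameters as the HTTP API fields of the same name.
import os

from friendli import Friendli

client = Friendli(token=os.getenv("FRIENDLI_TOKEN"))

chat_completion = client.chat.completions.create(
    model="meta-llama-3.1-8b-instruct",
    messages=[
        {"role": "user", "content": "Tell me how to make a delicious pancake"}
    ],
    max_tokens=256,   # cap on the length of the generated answer
    temperature=0.7,  # higher values give more creative output
    top_p=0.9,        # nucleus sampling threshold
)
print(chat_completion.choices[0].message.content)
```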
## Additional Tips Check out the [Text Generation](/guides/serverless_endpoints/text-generation) docs for more details. **Ready to unlock the creativity of generative AI? Get started with Friendli Serverless Endpoints today!** # Rate Limits Source: https://friendli.ai/docs/guides/serverless_endpoints/rate_limit Understand the rate limits for Friendli Serverless Endpoints, including Requests per Minute (RPM) and Tokens per Minute (TPM), to ensure efficient usage of resources and balanced performance when interacting with AI models. When interacting with Friendli Serverless Endpoints, it's important to be aware of the rate limits imposed on requests. These limits are in place to regulate the number of requests made within a specified timeframe, ensuring a balanced and efficient use of resources. The rate limits are quantified using two metrics: * **RPM (Requests per Minute):** This measures the maximum number of requests allowed per minute. * **TPM (Tokens per Minute):** TPM represents the maximum estimated tokens processed per minute, providing insight into the computational load. {/* **SPM (Steps per Minute):** SPM signifies the maximum number of inference steps permitted within a minute. */} **RPM** is used for all types of generation models, while **TPM** is used only for text generation models. The information related to the rate limits is included in the response headers as follows: * In all responses * `X-RateLimit-Limit-Requests` * `X-RateLimit-Remaining-Requests` * `X-RateLimit-Reset-Requests` * In text generation responses * `X-RateLimit-Limit-Tokens` * `X-RateLimit-Remaining-Tokens` * `X-RateLimit-Reset-Tokens` {/* In image generation responses - `X-RateLimit-Limit-Steps` - `X-RateLimit-Remaining-Steps` - `X-RateLimit-Reset-Steps` */} The specific rate limits applied depend on the user's subscription plan, with higher-tier plans enjoying fewer restrictions. The following table illustrates the rate limits corresponding to each plan: | Plan | RPM | TPM | | ---------- | -------- | -------- | | Trial | 10 | 50K | | Basic | 10K | 100K | | Enterprise | No limit | No limit | The metrics are measured **per team across all models**. # Structured Outputs Source: https://friendli.ai/docs/guides/serverless_endpoints/structured-outputs Generate structured outputs using FriendliAI's Structured Outputs feature. Large language models (LLMs) excel at creative text generation, but we often face a case where we need LLM outputs to be more structured. This is where our exciting new "structured output" feature comes in. Structured Outputs is also available in [Friendli Dedicated Endpoints](https://friendli.ai/products/dedicated-endpoints) and [Friendli Container](https://friendli.ai/products/container). For more advanced use cases of our Structured Outputs feature, check out our detailed blog post on [Structured Output for LLM Agents](https://friendli.ai/blog/structured-output-llm-agents). ## Structured response modes | Type | Description | Name at OpenAI | | ------------- | ------------------------------------------------------------------- | --------------------------------------------------------------------------------------------- | | `json_schema` | The model returns a JSON object that conforms to the given schema. | [Structured Outputs](https://platform.openai.com/docs/guides/structured-outputs#introduction) | | `json_object` | The model can return any JSON object. 
| [JSON mode](https://platform.openai.com/docs/guides/structured-outputs#json-mode) | | `regex` | The model returns a string that conforms to the given regex schema. | N/A | ## How to use This guide provides a step-by-step example of how to create a structured output response in JSON form.\ In this example, we will use Python and the `pydantic` library to define a schema for the output. Define a schema that contains information about a dish. ```python from pydantic import BaseModel class Result(BaseModel): dish: str cuisine: str calories: int ``` Call structured output and use schema to structure the response. ```python {17-22} import os from openai import OpenAI client = OpenAI( base_url="https://api.friendli.ai/serverless/v1", api_key=os.getenv("FRIENDLI_TOKEN"), ) completion = client.chat.completions.create( model="meta-llama-3.1-8b-instruct", messages=[ { "role": "user", "content": "Suggest a popular Italian dish in JSON format.", }, ], response_format={ "type": "json_schema", "json_schema": { "schema": Result.model_json_schema(), } } ) ``` You can use the output in the following way. ```python response = completion.choices[0].message.content print(response) ``` The code output result is as follows. ```json Result: { "dish": "Spaghetti Bolognese", "cuisine": "Italian", "calories": 540 } ``` This example demonstrates how to generate an arbitrary JSON object response without a predefined schema. ```python {15} import os from openai import OpenAI client = OpenAI( base_url="https://api.friendli.ai/serverless/v1", api_key=os.getenv("FRIENDLI_TOKEN"), ) completion = client.chat.completions.create( model="meta-llama-3.1-8b-instruct", messages=[ {"role": "system", "content": "You MUST answer with JSON."}, {"role": "user", "content": "Generate a lasagna recipe. (very short)"}, ], response_format={"type": "json_object"}, ) print(completion.choices[0].message.content) ``` This example shows how to generate output that matches a specific regular expression pattern. ```python {17-18} import os from openai import OpenAI client = OpenAI( base_url="https://api.friendli.ai/serverless/v1", api_key=os.getenv("FRIENDLI_TOKEN"), ) completion = client.chat.completions.create( model="meta-llama-3.1-8b-instruct", messages=[ { "role": "user", "content": "조선 왕조의 첫번째 왕은 누구입니까 (Who is the first king of the Joseon Dynasty)?", }, ], # Korean characters and numbers are allowed in the response. response_format={"type": "regex", "schema": "[\n ,.?!0-9\uac00-\ud7af]*"}, ) print(completion.choices[0].message.content) ``` ## Supported JSON schemas We ensure super-fast schema-guided generation by disabling JSON schema features that cause computation inefficiencies. We support **all seven standard JSON schema types** (`null`, `boolean`, `number`, `integer`, `string`, `object`, `array`), and **the supported JSON schema keywords are listed below**. Using unsupported or unexpected JSON schema keywords may result in them being ignored, triggering an error, or causing undefined behavior. ### Type-specific keywords * `integer` * `exclusiveMinimum`, `exclusiveMaximum`, `minimum`, `maximum` (Note: these are not supported in `number`) * `string` * `pattern` * `format` * Supported values: `uuid`, `date-time`, `date`, `time` * `object` * `properties` * `additionalProperties` is ignored, and is always set to `False`. * `required`: We support both required and optional properties, but have these limitations: * The sequence of the properties is fixed. * The first property should be `required`. 
If it is not, the first required property is moved to the front.
* `array`
  * `items`
  * `minItems`: We support only `0` or `1` for `minItems`.

### Constant values and enumerated values

`const` and `enum` only support constant values of null, boolean, number, and string.

### Schema composition

We support only `anyOf` for [schema composition](https://json-schema.org/understanding-json-schema/reference/combining).

### Referencing subschemas

We only support referencing (`$ref`) to "internal" subschemas. These subschemas must be defined within `$defs`, and the value of `$ref` must be a valid URI pointing to a subschema. Please refer [here](https://json-schema.org/understanding-json-schema/structuring#defs) for more details.

### Annotation

JSON schema annotations such as `title`, `$comments` or `description` are accepted but ignored.

# Text Generation Models

Source: https://friendli.ai/docs/guides/serverless_endpoints/text-generation

Dive into the characteristics of popular Text Generation Models (TGMs) available on Friendli Serverless Endpoints.

## Unleashing the Power of Language with Friendli Serverless Endpoints

Welcome to the captivating world of Text Generation Models (TGMs)! These AI models learn from massive datasets of text and code, mimicking human language patterns to generate creative and informative outputs. Friendli Serverless Endpoints empowers you to harness the potential of several cutting-edge TGMs through its convenient interface, letting you unlock the magic of words with ease. This guide dives into the characteristics of the popular TGMs available on Friendli Serverless Endpoints:

## Model Supports

* `deepseek-r1`
* `meta-llama-3.3-70b-instruct`
* `meta-llama-3.1-8b-instruct`

Please note that the pricing for each model can be found in the [pricing section](/guides/serverless_endpoints/pricing).

## Llama 3.3 70B Instruct

* **Focus**: Engaging dialogues and interactive experiences.
* **Strengths**:
  * Natural language understanding and human-like response generation in conversational settings.
  * Maintains coherence and context throughout dialogues, fostering seamless interactions.
  * Can adapt to different conversation styles and tones.
* **Example Use Cases**:
  * Building customer service chatbots that understand natural language and offer personalized support.
  * Creating interactive storytelling experiences and AI companions.
  * Developing game AI characters with engaging back-and-forth conversations.

### Examples

When you install `friendli-client`, you can generate chat responses with the Python SDK. You must set the `FRIENDLI_TOKEN` environment variable before initializing the client instance with `client = Friendli()`.
Alternatively, you can provide the value of your Friendli Token as the `token` argument when creating the client, like so:

```python
from friendli import Friendli

client = Friendli(token="YOUR FRIENDLI TOKEN")
```

```python Default
from friendli import Friendli

client = Friendli()

chat_completion = client.chat.completions.create(
    model="meta-llama-3.3-70b-instruct",
    messages=[
        {
            "role": "user",
            "content": "Tell me how to make a delicious pancake"
        }
    ],
    stream=False,
)
print(chat_completion.choices[0].message.content)
```

```python Streaming
from friendli import Friendli

client = Friendli()

stream = client.chat.completions.create(
    model="meta-llama-3.3-70b-instruct",
    messages=[
        {
            "role": "user",
            "content": "Tell me how to make a delicious pancake"
        }
    ],
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")
```

```python Async
import asyncio
from friendli import AsyncFriendli

client = AsyncFriendli()

async def main() -> None:
    chat_completion = await client.chat.completions.create(
        model="meta-llama-3.3-70b-instruct",
        messages=[
            {
                "role": "user",
                "content": "Tell me how to make a delicious pancake"
            }
        ],
        stream=False,
    )
    print(chat_completion.choices[0].message.content)

asyncio.run(main())
```

```python Streaming (Async)
import asyncio
from friendli import AsyncFriendli

client = AsyncFriendli()

async def main() -> None:
    stream = await client.chat.completions.create(
        model="meta-llama-3.3-70b-instruct",
        messages=[
            {
                "role": "user",
                "content": "Tell me how to make a delicious pancake"
            }
        ],
        stream=True,
    )
    async for chunk in stream:
        print(chunk.choices[0].delta.content or "", end="")

asyncio.run(main())
```

## Beyond the Models: Generation Settings

Friendli Serverless Endpoints unlocks further customization through various generation settings, allowing you to fine-tune your Text Generation Model (TGM) outputs:

* **max\_tokens**: This defines the maximum number of tokens your TGM generates. Lower values produce concise outputs, while higher values allow for longer narratives.
* **temperature**: Think of temperature as a creativity knob. Higher values promote more imaginative and surprising outputs, while lower values favor safe and predictable responses.
* **top\_p**: This parameter governs the diversity of your output. Lower values focus on the most likely continuation, while higher values encourage exploration of less probable but potentially interesting options.

## Unleashing the Full Potential:

Friendli Serverless Endpoints removes the technical hurdles, letting you focus on exploring the magic of TGMs. Start experimenting with different models and settings, tailoring the outputs to your unique vision. Remember, practice makes perfect – the more you interact with these models, the more you'll understand their strengths and discover the incredible possibilities they hold.

#### Ready to embark on your text generation journey?

Friendli Serverless Endpoints is your gateway to a world of boundless creativity and innovative applications. Sign up today and let the words flow!

# Tool Assisted API (Beta)

Source: https://friendli.ai/docs/guides/serverless_endpoints/tool-assisted-api

Tool Assisted API enhances a model's capabilities by integrating tools that extend its functionality beyond simple conversational interactions. By using this API, the model becomes more dynamic, providing more comprehensive and actionable responses. Currently, Friendli Serverless Endpoints supports a variety of built-in tools specifically designed for Chat Completion tasks.
export const ToolIcon = () => { return ; }; export const ChatIcon = () => { return ; }; export const RoundedBorderBox = ({children, caption}) =>
{children} {caption && {caption}}
; ## What is Tool Assisted API? **Tool Assisted API** enhances a model's capabilities by integrating **tools** that extend its functionality beyond simple conversational interactions. By using this API, the model becomes more dynamic, providing more comprehensive and actionable responses. Currently, **[Friendli Serverless Endpoints](/guides/serverless_endpoints/introduction)** supports a variety of built-in tools specifically designed for **Chat Completion** tasks. *** ### What is Chat Completion? **[Chat completion](/openapi/serverless/chat-completions)** refers to a model's ability to generate responses in a conversation. Given a sequence of messages or conversation turns, the model processes the input and generates a response based on its internal knowledge and training data. * **Example**: * **User**: "What is the capital of France?" * **Model**: "The capital of France is Paris." However, chat completion has its limitations—it is restricted to the knowledge the model has learned during its training and cannot access real-time or external data. *** ### Is Chat Completion Different from Tool Assisted Chat Completion? Yes, **[Tool Assisted Chat Completion](/openapi/serverless/tool-assisted-chat-completions)** goes beyond basic chat completion by integrating external tools to enhance the conversation. This allows the model to access real-time data, perform specific tasks, and interact with external systems in ways that chat completion alone cannot achieve. * **Example**: * **User**: "What is the weather today?" * **Model without Tool Access**: Relies on pre-learned information, potentially giving outdated or generalized answers. * **Model with Tool Access**: Calls a weather API to retrieve live data and responds: "The weather today in New York is 72°F with clear skies." With tool access, the model provides a more accurate and up-to-date response. Additionally, some tasks—such as file processing or complex calculations—cannot be performed by the model alone but can be handled with the help of tools. * **Example**: * **User**: "Can you extract the text from this document?" (provides a file) * **Model without Tool Access**: "I cannot extract data from files directly." * **Model with Tool Access**: Extracts the text from the provided file and responds: "Using the `file:text` tool, I've extracted the following text: \[Text from the file]." When no tools are specified, the model will respond using only its internal knowledge. *** ### Benefits of Tool Assisted Chat Completion Tool Assisted Chat Completion offers several advantages over basic chat completion: * **Real-Time Data Access**: The model can pull live information. * **Extended Capabilities**: The model can perform complex tasks like running calculations, executing code, extracting text from files, and interacting with databases and APIs. *** ### Comparison: Chat Completion vs. Tool Assisted Chat Completion
***

### Comparison: Chat Completion vs. Tool Assisted Chat Completion

| Feature           | **Chat Completion**                               | **Tool Assisted Chat Completion**                                     |
| ----------------- | ------------------------------------------------- | ---------------------------------------------------------------------- |
| **Response Type** | Based on internal knowledge                       | Uses external tools for enhanced, real-time responses                  |
| **Capabilities**  | Limited to pre-learned knowledge                  | Can interact with tools for data retrieval and task execution          |
| **Example**       | "What is the weather today?" (general knowledge)  | "What is the weather today?" (live API result)                         |
| **Use Cases**     | General conversation and Q\&A                     | Complex tasks like real-time updates, data analysis, file processing   |

***

## Built-In Tools

Tool Assisted API automatically selects the best tool to perform an action based on user input when a specific tool is enabled. These tools can handle various operations, such as calculations, statistical analysis, web search, file content extraction, and code execution.

Below is a more detailed description of the available tools in Tool Assisted API and when they are typically used:

### `math:calculator`

**Description:** Performs basic arithmetic operations like addition, subtraction, multiplication, and division, as well as more complex calculations like square roots or exponents. It is useful for any task requiring mathematical computation.

**When Used:** Automatically called when mathematical expressions or calculations are required. Whether you're solving equations, calculating percentages, or handling financial calculations, this tool performs the task for you.

### `math:statistics`

**Description:** Performs statistical analysis, including calculating mean, median, mode, standard deviation, and correlations. It is tailored for situations where you need to analyze or interpret numeric datasets to understand trends or patterns.

**When Used:** Automatically called when analyzing numeric data or generating insights from datasets, like summarizing survey results or calculating probabilities.

### `math:calendar`

**Description:** Handles date-related data, such as calculating date differences or finding specific days in the past or future. It is effective in managing and manipulating calendar-based information.

**When Used:** Automatically called when operations involving dates or time spans are required, such as figuring out how many days are left until an event, determining the day of the week for a specific date, or calculating the duration between two dates.

### `web:search`

**Description:** Retrieves information from the web based on search queries. It fetches information based on keywords and helps gather knowledge or insights from online sources.

**When Used:** Automatically called when you ask questions or seek information that requires external research or the latest data from the web. Whether it is looking up definitions, recent news, or general web searches, this tool handles such tasks effectively.

### `web:url`

**Description:** Extracts specific data from a given website. You can provide a URL, and the tool will fetch the relevant content, including text, metadata, or other embedded information, from that web page.

**When Used:** Automatically called when extracting content from a provided URL, such as fetching text from articles or blog posts.

### `code:python-interpreter`

**Description:** Executes Python code directly within the platform for custom scripts, data processing, or automation.
You can run Python scripts, test snippets of code, or automate tasks through coding logic.

**When Used:** Automatically called when tasks involve writing or running Python scripts, such as custom data manipulations or logic-based automation.

### `file:text`

**Description:** Reads and extracts text from files, supporting only `.txt` and `.pdf` formats. To use this tool, you must provide the file IDs. (For now, only one file is supported.) After uploading a file in the playground, you can copy the file ID by clicking on the files icon in the left sidebar and selecting the option from the dropdown menu next to the uploaded file.

**When Used:** Automatically called when text extraction from a file is requested, such as pulling content from documents or reports.

## Conclusion

* **Chat Completion**: Best for general conversations that rely on the model's pre-existing knowledge.
* **Tool Assisted Chat Completion**: Ideal for real-time, dynamic tasks and more advanced interactions, leveraging external tools to enhance functionality.

***

## Explore APIs

To get started with Tool Assisted Chat Completion, follow this tutorial: **[Tool calling with Serverless Endpoints](/guides/tutorials/tool-calling-with-serverless-endpoints)**.

For more details, check out the API reference documentation below:

* [Chat completions](/openapi/serverless/chat-completions): Discover how to generate text through interactive conversations.
* [Tool assisted chat completions](/openapi/serverless/tool-assisted-chat-completions): Learn how to enhance responses with tool assisted chat completions using built-in tools.

# Build an agent with Gradio

Source: https://friendli.ai/docs/guides/tutorials/build-an-agent-with-gradio

Build and deploy smart AI agents with Friendli Serverless Endpoints and Gradio in under 50 lines.

## Goals

* Build your own AI agent using [**Friendli Serverless Endpoints**](https://friendli.ai/products/serverless-endpoints) and [**Gradio**](https://www.gradio.app) in under 50 lines of code 🤖
* Use tool calling to make your agent even smarter 🤩
* Share your AI agent with the world and gather feedback 🌎

> [**Gradio**](https://www.gradio.app) is the fastest way to demo your model with a friendly web interface.

## Getting Started

1. Head to [**https://suite.friendli.ai**](https://suite.friendli.ai/get-started/serverless-endpoints), and create an account.
2. Grab a [FRIENDLI\_TOKEN](https://suite.friendli.ai/default-team/settings/tokens) to use Friendli Serverless Endpoints within an agent.

## 🚀 Step 1. Prerequisite

Install dependencies.

```
pip install openai gradio
```

## 🚀 Step 2. Launch your agent

Build your own AI agent using **Friendli Serverless Endpoints** and **Gradio**.

* Gradio provides a `ChatInterface` that implements a chatbot UI running the `chat_function`.
* More information about the *chat\_function(message, history)*:

  > *The input function should accept two parameters: a string input message and list of two-element lists of the form \[\[user\_message, bot\_message], ...] representing the chat history, and return a string response.*
* Implement the `chat_function` using Friendli Serverless Endpoints.
  * Here, we used the `meta-llama-3.3-70b-instruct` model.
  * Feel free to explore other available models [here](/guides/serverless_endpoints/text-generation#model-supports).

```python
from openai import OpenAI
import gradio as gr

friendli_client = OpenAI(
    base_url="https://api.friendli.ai/serverless/v1",
    api_key="YOUR FRIENDLI TOKEN"
)

def chat_function(message, history):
    messages = []
    for user, chatbot in history:
        messages.append({"role": "user", "content": user})
        messages.append({"role": "assistant", "content": chatbot})
    messages.append({"role": "user", "content": message})

    stream = friendli_client.chat.completions.create(
        model="meta-llama-3.3-70b-instruct",
        messages=messages,
        stream=True
    )
    res = ""
    for chunk in stream:
        res += chunk.choices[0].delta.content or ""
        yield res

css = """
.gradio-container { max-width: 800px !important; margin-top: 100px !important; }
.pending { display: none !important; }
.sm { box-shadow: None !important; }
#component-2 { height: 400px !important; }
"""

with gr.Blocks(theme=gr.themes.Soft(), css=css) as friendli_agent:
    gr.ChatInterface(chat_function)

friendli_agent.launch()
```

## 🚀 Step 3. Tool Calling (Advanced)

Use tool calling to make your agent even smarter! As an example, we will show you how to make your agent search the web before answering.

* Change the `base_url` to `https://api.friendli.ai/serverless/tools/v1`
* Add the `tools` parameter when calling the chat completion API

```python
from openai import OpenAI
import gradio as gr

friendli_client = OpenAI(
    base_url="https://api.friendli.ai/serverless/tools/v1",
    api_key="YOUR FRIENDLI TOKEN"
)

def chat_function(message, history):
    messages = []
    for user, chatbot in history:
        messages.append({"role": "user", "content": user})
        messages.append({"role": "assistant", "content": chatbot})
    messages.append({"role": "user", "content": message})

    stream = friendli_client.chat.completions.create(
        model="meta-llama-3.3-70b-instruct",
        messages=messages,
        stream=True,
        tools=[{"type": "web:search"}],
    )
    res = ""
    for chunk in stream:
        if chunk.choices is None:
            yield "Waiting for tool response..."
        else:
            res += chunk.choices[0].delta.content or ""
            yield res

css = """
.gradio-container { max-width: 800px !important; margin-top: 100px !important; }
.pending { display: none !important; }
.sm { box-shadow: None !important; }
#component-2 { height: 400px !important; }
"""

with gr.Blocks(theme=gr.themes.Soft(), css=css) as agent:
    gr.ChatInterface(chat_function)

agent.launch()
```

Here is the list of available built-in tools (beta). Feel free to build your agent using the tools below.

* `math:calculator` (tool for calculating arithmetic operations)
* `math:statistics` (tool for analyzing statistic data)
* `math:calendar` (tool for handling date-related data)
* `web:search` (tool for retrieving data through the web search)
* `web:url` (tool for extracting data from a given website)
* `code:python-interpreter` (tool for writing and executing python code)
* `file:text` (tool for extracting text data from a given file)

## 🚀 Step 4. Deploy your agent

For a temporary deployment, change the last line of the code.

```python
agent.launch(share=True)
```

For a permanent deployment, you can use [Hugging Face Spaces](https://huggingface.co/spaces)!

# Build an agent with LangChain

Source: https://friendli.ai/docs/guides/tutorials/build-an-agent-with-langchain

Build an AI agent with LangChain and Friendli Serverless Endpoints, integrating tool calling for dynamic and efficient responses.

## Introduction

This article walks you through creating an agent using LangChain and Friendli Serverless Endpoints.
## Setup

```bash
pip install -qU langchain-openai langchain-community langchain wikipedia
```

Get your Friendli Token from [https://suite.friendli.ai/](https://suite.friendli.ai/) to authenticate your requests.

```python
import getpass
import os

if not os.environ.get("FRIENDLI_TOKEN"):
    os.environ["FRIENDLI_TOKEN"] = getpass.getpass("Enter your Friendli Token: ")
```

## Instantiation

```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="meta-llama-3.1-8b-instruct",
    base_url="https://api.friendli.ai/serverless/v1",
    api_key=os.environ["FRIENDLI_TOKEN"],
)
```

## Create Agent with LangChain

### Step 1. Create Tool

```python
from langchain_community.tools import WikipediaQueryRun
from langchain_community.utilities import WikipediaAPIWrapper

api_wrapper = WikipediaAPIWrapper(top_k_results=1, doc_content_chars_max=100)
wiki = WikipediaQueryRun(api_wrapper=api_wrapper)

tools = [wiki]
```

### Step 2. Create Prompt

```python
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a helpful assistant"),
        MessagesPlaceholder("chat_history"),
        ("user", "{input}"),
        ("placeholder", "{agent_scratchpad}"),
    ]
)
prompt.messages
```

### Step 3. Create Agent

```python
from langchain.agents import AgentExecutor
from langchain.agents import create_tool_calling_agent

agent = create_tool_calling_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
```

### Step 4. Run the Agent

```python
chat_history = []

while True:
    user_input = input("Enter your message: ")
    result = agent_executor.invoke(
        {"input": user_input, "chat_history": chat_history},
    )
    chat_history.append({"role": "user", "content": user_input})
    chat_history.append({"role": "assistant", "content": result["output"]})
```

When you run the code, it waits for your input. After you enter a message, it processes the request and prints the result. When you ask about a topic covered on Wikipedia, it automatically calls the wikipedia tool and outputs the result.

```text final result
Enter your Friendli Token: ··········
Enter your message: hello

> Entering new AgentExecutor chain...
Hello, it's nice to meet you. I'm here to help with any questions or topics you'd like to discuss. Is there something in particular you'd like to talk about, or do you need assistance with something?

> Finished chain.
Enter your message: What does the Linux kernel do?

> Entering new AgentExecutor chain...
Invoking: `wikipedia` with `{'query': 'Linux kernel'}`
responded: The Linux kernel is the core component of the Linux operating system. It acts as a bridge between the computer hardware and the user space applications. The kernel manages the system's hardware resources, such as memory, CPU, and I/O devices. It provides a set of interfaces and APIs that allow user space applications to interact with the hardware.

Page: Linux kernel
Summary: The Linux kernel is a free and open source,: 4  UNIX-like kernel that is
The Linux kernel is a free and open source, UNIX-like kernel that is responsible for managing the system's hardware resources, such as memory, CPU, and I/O devices. It provides a set of interfaces and APIs that allow user space applications to interact with the hardware. The kernel is the core component of the Linux operating system, and it plays a crucial role in ensuring the stability and security of the system.

> Finished chain.
Enter your message: ``` ## Full Example Code ```python import getpass import os from langchain_openai import ChatOpenAI from langchain_community.tools import WikipediaQueryRun from langchain_community.utilities import WikipediaAPIWrapper from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder from langchain.agents import AgentExecutor from langchain.agents import create_tool_calling_agent if not os.environ.get("FRIENDLI_TOKEN"): os.environ["FRIENDLI_TOKEN"] = getpass.getpass("Enter your Friendli Token: ") llm = ChatOpenAI( model="meta-llama-3.1-8b-instruct", base_url="https://api.friendli.ai/serverless/v1", api_key=os.environ["FRIENDLI_TOKEN"], ) api_wrapper = WikipediaAPIWrapper(top_k_results=1, doc_content_chars_max=100) wiki = WikipediaQueryRun(api_wrapper=api_wrapper) tools = [wiki] # Get the prompt to use - you can modify this! prompt = ChatPromptTemplate.from_messages( [ ("system", "You are a helpful assistant"), MessagesPlaceholder("chat_history"), ("user", "{input}"), ("placeholder", "{agent_scratchpad}"), ] ) agent = create_tool_calling_agent(llm, tools, prompt) agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True) chat_history = [] while True: user_input = input("Enter your message: ") result = agent_executor.invoke( {"input": user_input, "chat_history": chat_history}, ) chat_history.append({"role": "user", "content": user_input}) chat_history.append({"role": "assistant", "content": result["output"]}) ``` # Chat docs with LangChain Source: https://friendli.ai/docs/guides/tutorials/chat-docs-with-langchain You can view the content [here](https://friendli.ai/blog/chatdocs-rag-friendli-langchain). # Chat docs with MongoDB Source: https://friendli.ai/docs/guides/tutorials/chat-docs-with-mongodb You can view the content [here](https://friendli.ai/blog/rag-chatbot-friendli-mongodb-atlas-langchain). # Go Playground with Next.js Source: https://friendli.ai/docs/guides/tutorials/go-playground-with-nextjs You can view the content [here](https://friendli.ai/blog/vercel-ai-sdk-playground-tutorial). # RAG app with LlamaIndex Source: https://friendli.ai/docs/guides/tutorials/rag-app-with-llamaindex You can view the content [here](https://friendli.ai/blog/llamaindex-rag-app-friendli-engine). # Tool calling with Serverless Endpoints Source: https://friendli.ai/docs/guides/tutorials/tool-calling-with-serverless-endpoints Build AI agents with Friendli Serverless Endpoints using tool calling for dynamic, real-time interactions with LLMs. ## Goals * Use tool calling to build your own AI agent with [**Friendli Serverless Endpoints**](https://friendli.ai/products/serverless-endpoints) * Check out the examples below to see how you can interact with state-of-the-art language models while letting them search the web, run Python code, etc. * Feel free to make your own custom tools! ## Getting Started 1. Head to [**https://suite.friendli.ai**](https://suite.friendli.ai/get-started/serverless-endpoints), and create an account. 2. Grab a [FRIENDLI\_TOKEN](https://suite.friendli.ai/default-team/settings/tokens) to use Friendli Serverless Endpoints within an agent. ## 🚀 Step 1. Playground UI Experience tool calling on the Playground 1. On your dashboard, click the "Go to Playground" button of **Friendli Serverless Endpoints** 2. Choose a model that best aligns with your desired use case. 3. Click a `web:search` tool calling example and see the response. 😀 ## 🚀 Step 2. Tool Calling Search interesting information using the `web:search` tool. 
This time, let's try it by writing Python code.

1. Turn on the `web:search` tool on the playground.
2. Ask something interesting!

```
Find information on the popular movies currently showing in theaters and provide their ratings.
```

3. Click the "View code" button to use the tool calling in Python/JavaScript.
4. Copy and paste the code into your IDE.
5. Click [**here**](https://suite.friendli.ai/default-team/settings/tokens) to generate a Friendli Token.
6. Fill in the token value in the copied code and run it.

## 🚀 Step 3. Multiple tool calling

Use multiple tools at once to calculate "how long it will take you to buy a house in the San Francisco Bay Area based on your annual salary".

Here is the list of available built-in tools (beta).

* `math:calculator` (tool for calculating arithmetic operations)
* `math:statistics` (tool for analyzing statistic data)
* `math:calendar` (tool for handling date-related data)
* `web:search` (tool for retrieving data through the web search)
* `web:url` (tool for extracting data from a given website)
* `code:python-interpreter` (tool for writing and executing python code)
* `file:text` (tool for extracting text data from a given file)

### Example Answer sheet

```
Prompt:
My annual salary is $100k. How long will it take to buy a house in the San Francisco Bay Area?
(`web:search` & `math:calculator` used)

Answer:
Based on the web search results, the median price of an existing single-family home in the Bay Area is around $1.2 million.
Using a calculator to calculate how long it would take to buy a house in the San Francisco Bay Area with an annual salary of $100,000, we get:

$1,200,000 (house price) / $100,000 (annual salary) = 12 years

So, it would take approximately 12 years to buy a house in the San Francisco Bay Area with an annual salary of $100,000, assuming you save your entire salary each year and don't consider other factors like interest rates, taxes, and living expenses.
```
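
A minimal sketch of sending such a request from code, assuming multiple built-in tools can be enabled in the same `tools` list used in the earlier Gradio example (the token is a placeholder):

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.friendli.ai/serverless/tools/v1",
    api_key="YOUR FRIENDLI TOKEN",  # placeholder
)

completion = client.chat.completions.create(
    model="meta-llama-3.3-70b-instruct",
    messages=[
        {
            "role": "user",
            "content": "My annual salary is $100k. How long will it take to buy a house in the San Francisco Bay Area?",
        }
    ],
    # Assumption: several built-in tools may be enabled in one request.
    tools=[{"type": "web:search"}, {"type": "math:calculator"}],
)
print(completion.choices[0].message.content)
```
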
## 🚀 Step 4. Build a custom tool

Build your own creative tool. We will show you how to make a custom tool that retrieves temperature information. (The completed code snippet is provided at the bottom.)

1. **Define a function to use as a custom tool**

```python
def get_temperature(location: str) -> int:
    """Mock function that returns the city temperature"""
    if "new york" in location.lower():
        return 45
    if "san francisco" in location.lower():
        return 72
    return 30
```

2. **Send a function calling inference request**

   1. Add the user's input as a `user` role message.
   2. The information about the custom function (e.g., `get_temperature`) goes into the `tools` option. The function's parameters are described in JSON schema.
   3. The response includes the `arguments` field, which contains the values extracted from the user's input that can be used as parameters of the custom function.

```python
import os

from friendli import Friendli

token = os.environ.get("FRIENDLI_TOKEN") or "YOUR_FRIENDLI_TOKEN"
client = Friendli(token=token)

user_prompt = "I live in New York. What should I wear for today's weather?"

messages = [
    {
        "role": "user",
        "content": user_prompt,
    },
]

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_temperature",
            "description": "Get the temperature information in a given location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The name of current location e.g., New York",
                    },
                },
            },
        },
    },
]

chat = client.chat.completions.create(
    model="meta-llama-3.3-70b-instruct",
    messages=messages,
    tools=tools,
    temperature=0,
    frequency_penalty=1,
)
print(chat)
```

3. **Generate the final response using the tool calling results**

   1. Add the `tool_calls` response as an `assistant` role message.
   2. Add the result obtained by calling the `get_temperature` function as a `tool` message, then call the Chat API again.

```python
import json

func_kwargs = json.loads(chat.choices[0].message.tool_calls[0].function.arguments)
temperature_info = get_temperature(**func_kwargs)

messages.append(
    {
        "role": "assistant",
        "tool_calls": [
            tool_call.model_dump()
            for tool_call in chat.choices[0].message.tool_calls
        ]
    }
)
messages.append(
    {
        "role": "tool",
        "content": str(temperature_info),
        "tool_call_id": chat.choices[0].message.tool_calls[0].id
    }
)

chat_w_info = client.chat.completions.create(
    model="meta-llama-3.3-70b-instruct",
    tools=tools,
    messages=messages,
)

for choice in chat_w_info.choices:
    print(choice.message.content)
```

* **Complete Code Snippet**

```python
from friendli import Friendli
import json
import os

token = os.environ.get("FRIENDLI_TOKEN") or "YOUR_FRIENDLI_TOKEN"
client = Friendli(token=token)

user_prompt = "I live in New York. What should I wear for today's weather?"

messages = [
    {
        "role": "user",
        "content": user_prompt,
    },
]

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_temperature",
            "description": "Get the temperature information in a given location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The name of current location e.g., New York",
                    },
                },
            },
        },
    },
]

chat = client.chat.completions.create(
    model="meta-llama-3.3-70b-instruct",
    messages=messages,
    tools=tools,
    temperature=0,
    frequency_penalty=1,
)

def get_temperature(location: str) -> int:
    """Mock function that returns the city temperature"""
    if "new york" in location.lower():
        return 45
    if "san francisco" in location.lower():
        return 72
    return 30

func_kwargs = json.loads(chat.choices[0].message.tool_calls[0].function.arguments)
temperature_info = get_temperature(**func_kwargs)

messages.append(
    {
        "role": "assistant",
        "tool_calls": [
            tool_call.model_dump()
            for tool_call in chat.choices[0].message.tool_calls
        ]
    }
)
messages.append(
    {
        "role": "tool",
        "content": str(temperature_info),
        "tool_call_id": chat.choices[0].message.tool_calls[0].id
    }
)

chat_w_info = client.chat.completions.create(
    model="meta-llama-3.3-70b-instruct",
    tools=tools,
    messages=messages,
)

for choice in chat_w_info.choices:
    print(choice.message.content)
```

## 🎉 Congratulations!

Following the above instructions, we've experienced the whole process of defining and using a custom tool to generate an accurate and rich answer from LLM models!

Brainstorm creative ideas for your agent by reading our blog articles!
* [**Building an AI Agent for Google Calendar**](https://friendli.ai/blog/ai-agent-google-calendar) * [**Hassle-free LLM Fine-tuning with FriendliAI and Weights & Biases**](https://friendli.ai/blog/llm-fine-tuning-friendliai-wandb) * [**Building AI Agents Using Function Calling with LLMs**](https://friendli.ai/blog/ai-agents-function-calling) * [**Function Calling: Connecting LLMs with Functions and APIs**](https://friendli.ai/blog/llm-function-calling) # Vision Source: https://friendli.ai/docs/guides/vision Guide to using Friendli's Vision feature for image analysis. Covers usage via Playground and API (URL & Base64 examples). The Vision feature is available when the model supports vision capabilities. Friendli is equipped with a new Vision feature that can understand and analyze images, opening up exciting possibilities for multimodal interactions. This guide explains how to work with images in Friendli, including best practices and code examples. ### How to Use Vision Utilize Friendli's Vision features through the following: * Select and test a vision model at [friendli.ai/playground](https://friendli.ai/playground). * Use the API to process images and receive the model's responses, referring to the methods described in this document. ### Using the API ```python URL-based image {22} import os from openai import OpenAI client = OpenAI( base_url="https://api.friendli.ai/dedicated/v1", api_key=os.environ.get("FRIENDLI_TOKEN"), ) image_url = "https://upload.wikimedia.org/wikipedia/commons/9/9e/Ours_brun_parcanimalierpyrenees_1.jpg" completion = client.chat.completions.create( # Replace YOUR_ENDPOINT_ID with the ID of your endpoint, e.g. "zbimjgovmlcb" model="YOUR_ENDPOINT_ID", messages=[ { "role": "user", "content": [ { "type": "text", "text": "What kind of animal is shown in the image?", }, {"type": "image_url", "image_url": {"url": image_url}}, ], }, ], ) print(completion.choices[0].message.content) ``` ```python Base64-encoded image {28-30} import base64, requests, os from openai import OpenAI client = OpenAI( base_url="https://api.friendli.ai/dedicated/v1", api_key=os.environ.get("FRIENDLI_TOKEN"), ) image_url = "https://upload.wikimedia.org/wikipedia/commons/9/9e/Ours_brun_parcanimalierpyrenees_1.jpg" image_media_type = "image/jpg" image_base64 = base64.standard_b64encode(requests.get(image_url).content).decode( "utf-8" ) completion = client.chat.completions.create( # Replace YOUR_ENDPOINT_ID with the ID of your endpoint, e.g. "zbimjgovmlcb" model="YOUR_ENDPOINT_ID", messages=[ { "role": "user", "content": [ { "type": "text", "text": "What kind of animal is shown in the image?", }, { "type": "image_url", "image_url": { "url": f"data:{image_media_type};base64,{image_base64}" }, }, ], }, ], ) print(completion.choices[0].message.content) ``` # Container chat completions Source: https://friendli.ai/docs/openapi/container/chat-completions post /v1/chat/completions Given a list of messages forming a conversation, the model generates a response. When streaming mode is used (i.e., `stream` option is set to `true`), the response is in MIME type `text/event-stream`. Otherwise, the content type is `application/json`. You can view the schema of the streamed sequence of chunk objects in streaming mode [here](/openapi/container/chat-completions-chunk-object). # Container chat completions chunk object Source: https://friendli.ai/docs/openapi/container/chat-completions-chunk-object Represents a streamed chunk of a chat completions response returned by model, based on the provided input. 
```json Response data: { "id": "chatcmpl-4b71d12c86d94e719c7e3984a7bb7941", "object": "chat.completion.chunk", "choices": [ { "index": 0, "delta": { "role": "assistant", "content": "This" }, "finish_reason": null, "logprobs": null } ], "usage": null, "created": 1726294381 } data: { "id": "chatcmpl-4b71d12c86d94e719c7e3984a7bb7941", "object": "chat.completion.chunk", "choices": [ { "index": 0, "delta": { "content": " is" }, "finish_reason": null, "logprobs": null } ], "usage": null, "created": 1726294381 } ... data: { "id": "chatcmpl-4b71d12c86d94e719c7e3984a7bb7941", "object": "chat.completion.chunk", "choices": [ { "index": 0, "delta": {}, "finish_reason": "stop", "logprobs": null } ], "usage": null, "created": 1726294383 } data: { "id": "chatcmpl-4b71d12c86d94e719c7e3984a7bb7941", "object": "chat.completion.chunk", "choices": [], "usage": { "prompt_tokens": 8, "completion_tokens": 4, "total_tokens": 12 }, "created": 1726294402 } data: [DONE] ``` ```json With tools data: { "id": "chatcmpl-4b71d12c86d94e719c7e3984a7bb7941", "object": "chat.completion.chunk", "choices": [ { "index": 0, "delta": { "role": "assistant", "content": "This" }, "finish_reason": null, "logprobs": null } ], "usage": null, "created": 1726294442 } ... data: { "id": "chatcmpl-4b71d12c86d94e719c7e3984a7bb7941", "object": "chat.completion.chunk", "choices": [ { "index": 0, "delta": { "role": "assistant", "tool_calls": [ { "index": 0, "id": "call_TARbemDG9CFdwuoaQBTRXiYK", "type": "function", "function": { "name": "func", "arguments": "{\"" } } ] }, "finish_reason": null, "logprobs": null } ], "usage": null, "created": 1726294442 } data: { "id": "chatcmpl-4b71d12c86d94e719c7e3984a7bb7941", "object": "chat.completion.chunk", "choices": [ { "index": 0, "delta": { "role": "assistant", "tool_calls": [ { "index": 0, "type": "function", "function": { "arguments": "arg" } } ] }, "finish_reason": null, "logprobs": null } ], "usage": null, "created": 1726294442 } ... data: { "id": "chatcmpl-4b71d12c86d94e719c7e3984a7bb7941", "object": "chat.completion.chunk", "choices": [ { "index": 0, "delta": { "role": "assistant", "tool_calls": [ { "index": 0, "type": "function", "function": { "arguments": "}" } } ] }, "finish_reason": null, "logprobs": null } ], "usage": null, "created": 1726294442 } data: { "id": "chatcmpl-4b71d12c86d94e719c7e3984a7bb7941", "object": "chat.completion.chunk", "choices": [ { "index": 0, "delta": {}, "finish_reason": "tool_calls", "logprobs": null } ], "usage": null, "created": 1726294442 } data: { "id": "chatcmpl-4b71d12c86d94e719c7e3984a7bb7941", "object": "chat.completion.chunk", "choices": [], "usage": { "prompt_tokens": 468, "completion_tokens": 59, "total_tokens": 527 }, "created": 1726294443 } data: [DONE] ``` A unique ID of the chat completion. The object type, which is always set to `chat.completion.chunk`. The model to generate the completion. The index of the choice in the list of generated choices. Role of the generated message author, in this case `assistant`. The contents of the assistant message. The index of tool call being generated. The ID of the tool call. The type of the tool, which is always set to `function`. The name of the function to call. The arguments for calling the function, generated by the model in JSON format. Ensure to validate these arguments in your code before invoking the function since the model may not always produce valid JSON. Termination condition of the generation. 
`stop` means the API returned the full chat completions generated by the model without running into any limits. `length` means the generation exceeded `max_tokens` or the conversation exceeded the max context length. `tool_calls` means the API has generated tool calls. Available options: `stop`, `length`, `tool_calls` Log probability information for the choice. A list of message content tokens with log probability information. The token. The log probability of this token. A list of integers representing the UTF-8 bytes representation of the token. Useful in instances where characters are represented by multiple tokens and their byte representations must be combined to generate the correct text representation. Can be `null` if there is no bytes representation for the token. List of the most likely tokens and their log probability, at this token position. The token. The log probability of this token. A list of integers representing the UTF-8 bytes representation of the token. Useful in instances where characters are represented by multiple tokens and their byte representations must be combined to generate the correct text representation. Can be `null` if there is no bytes representation for the token. Number of tokens in the prompt. Number of tokens in the generated chat completions. Total number of tokens used in the request (`prompt_tokens` + `completion_tokens`). The Unix timestamp (in seconds) for when the token sampled. # Container completions Source: https://friendli.ai/docs/openapi/container/completions post /v1/completions Generate text based on the given text prompt. When streaming mode is used (i.e., `stream` option is set to `true`), the response is in MIME type `text/event-stream`. Otherwise, the content type is `application/json`. You can view the schema of the streamed sequence of chunk objects in streaming mode [here](/openapi/container/completions-chunk-object). # Container completions chunk object Source: https://friendli.ai/docs/openapi/container/completions-chunk-object Represents a streamed chunk of a completions response returned by model, based on the provided input. ```json Response data: { "id": "cmpl-26a1e10db8544bc3adb488d2d205288b", "object": "text_completion", "choices": [ { "index": 0, "text": " such", "token": 1778, "finish_reason": null, "logprobs": null } ], "created": 1733382157 } data: { "id": "cmpl-26a1e10db8544bc3adb488d2d205288b", "object": "text_completion", "choices": [ { "index": 0, "text": " as", "token": 439, "finish_reason": null, "logprobs": null } ], "created": 1733382157 } ... data: { "id": "cmpl-26a1e10db8544bc3adb488d2d205288b", "object": "text_completion", "choices": [ { "index": 0, "text": "", "finish_reason": "length", "logprobs": null } ], "created": 1733382157 } data: { "id": "cmpl-26a1e10db8544bc3adb488d2d205288b", "object": "text_completion", "choices": [], "usage": { "prompt_tokens": 5, "completion_tokens": 10, "total_tokens": 15 }, "created": 1733382157 } data: [DONE] ``` A unique ID of the completion. The object type, which is always set to `text_completion`. The model to generate the completion. The index of the choice in the list of generated choices. The text. The token. Termination condition of the generation. `stop` means the API returned the full completions generated by the model without running into any limits. `length` means the generation exceeded `max_tokens` or the conversation exceeded the max context length. Available options: `stop`, `length` Log probability information for the choice. 
The starting character position of each token in the generated text, useful for mapping tokens back to their exact location for detailed analysis. The log probabilities of each generated token, indicating the model's confidence in selecting each token. A list of individual tokens generated in the completion, representing segments of text such as words or pieces of words. A list of dictionaries, where each dictionary represents the top alternative tokens considered by the model at a specific position in the generated text, along with their log probabilities. The number of items in each dictionary matches the value of `logprobs`. Number of tokens in the prompt. Number of tokens in the generated completions. Total number of tokens used in the request (`prompt_tokens` + `completion_tokens`). The Unix timestamp (in seconds) for when the token sampled. # Container detokenization Source: https://friendli.ai/docs/openapi/container/detokenization post /v1/detokenize By giving a list of tokens, generate a detokenized output text string. # Container image generations (Beta) Source: https://friendli.ai/docs/openapi/container/image-generations post /v1/images/generations Given a description, the model generates image. # Container overview Source: https://friendli.ai/docs/openapi/container/overview OpenAPI reference of Friendli Container API. ### Inference Discover how to generate text through interactive conversations. Learn how to generate text. Explore the process of breaking down text into smaller tokens for machine processing. Learn how to reconstruct tokenized text back into its original, human-readable form. # Container tokenization Source: https://friendli.ai/docs/openapi/container/tokenization post /v1/tokenize By giving a text input, generate a tokenized output of token IDs. # Dedicated chat completions Source: https://friendli.ai/docs/openapi/dedicated/chat-completions post /dedicated/v1/chat/completions Given a list of messages forming a conversation, the model generates a response. To successfully run an inference request, it is mandatory to enter a **Friendli Token** (e.g. flp\_XXX) value in the **Bearer Token** field. Refer to the [authentication section](/openapi/introduction#authentication) on our introduction page to learn how to acquire this variable and [visit here](https://suite.friendli.ai/default-team/settings/tokens) to generate your token. When streaming mode is used (i.e., `stream` option is set to `true`), the response is in MIME type `text/event-stream`. Otherwise, the content type is `application/json`. You can view the schema of the streamed sequence of chunk objects in streaming mode [here](/openapi/dedicated/chat-completions-chunk-object). # Dedicated chat completions chunk object Source: https://friendli.ai/docs/openapi/dedicated/chat-completions-chunk-object Represents a streamed chunk of a chat completions response returned by model, based on the provided input. ```json Response data: { "id": "chatcmpl-4b71d12c86d94e719c7e3984a7bb7941", "model": "(endpoint-id)", "object": "chat.completion.chunk", "choices": [ { "index": 0, "delta": { "role": "assistant", "content": "This" }, "finish_reason": null, "logprobs": null } ], "usage": null, "created": 1726294381 } data: { "id": "chatcmpl-4b71d12c86d94e719c7e3984a7bb7941", "model": "(endpoint-id)", "object": "chat.completion.chunk", "choices": [ { "index": 0, "delta": { "content": " is" }, "finish_reason": null, "logprobs": null } ], "usage": null, "created": 1726294381 } ... 
data: { "id": "chatcmpl-4b71d12c86d94e719c7e3984a7bb7941", "model": "(endpoint-id)", "object": "chat.completion.chunk", "choices": [ { "index": 0, "delta": {}, "finish_reason": "stop", "logprobs": null } ], "usage": null, "created": 1726294383 } data: { "id": "chatcmpl-4b71d12c86d94e719c7e3984a7bb7941", "model": "(endpoint-id)", "object": "chat.completion.chunk", "choices": [], "usage": { "prompt_tokens": 8, "completion_tokens": 4, "total_tokens": 12 }, "created": 1726294402 } data: [DONE] ``` ```json With tools data: { "id": "chatcmpl-4b71d12c86d94e719c7e3984a7bb7941", "model": "(endpoint-id)", "object": "chat.completion.chunk", "choices": [ { "index": 0, "delta": { "role": "assistant", "content": "This" }, "finish_reason": null, "logprobs": null } ], "usage": null, "created": 1726294442 } ... data: { "id": "chatcmpl-4b71d12c86d94e719c7e3984a7bb7941", "model": "(endpoint-id)", "object": "chat.completion.chunk", "choices": [ { "index": 0, "delta": { "role": "assistant", "tool_calls": [ { "index": 0, "id": "call_TARbemDG9CFdwuoaQBTRXiYK", "type": "function", "function": { "name": "func", "arguments": "{\"" } } ] }, "finish_reason": null, "logprobs": null } ], "usage": null, "created": 1726294442 } data: { "id": "chatcmpl-4b71d12c86d94e719c7e3984a7bb7941", "model": "(endpoint-id)", "object": "chat.completion.chunk", "choices": [ { "index": 0, "delta": { "role": "assistant", "tool_calls": [ { "index": 0, "type": "function", "function": { "arguments": "arg" } } ] }, "finish_reason": null, "logprobs": null } ], "usage": null, "created": 1726294442 } ... data: { "id": "chatcmpl-4b71d12c86d94e719c7e3984a7bb7941", "model": "(endpoint-id)", "object": "chat.completion.chunk", "choices": [ { "index": 0, "delta": { "role": "assistant", "tool_calls": [ { "index": 0, "type": "function", "function": { "arguments": "}" } } ] }, "finish_reason": null, "logprobs": null } ], "usage": null, "created": 1726294442 } data: { "id": "chatcmpl-4b71d12c86d94e719c7e3984a7bb7941", "model": "(endpoint-id)", "object": "chat.completion.chunk", "choices": [ { "index": 0, "delta": {}, "finish_reason": "tool_calls", "logprobs": null } ], "usage": null, "created": 1726294442 } data: { "id": "chatcmpl-4b71d12c86d94e719c7e3984a7bb7941", "model": "(endpoint-id)", "object": "chat.completion.chunk", "choices": [], "usage": { "prompt_tokens": 468, "completion_tokens": 59, "total_tokens": 527 }, "created": 1726294443 } data: [DONE] ``` A unique ID of the chat completion. The object type, which is always set to `chat.completion.chunk`. The model to generate the completion. For dedicated endpoints, it returns the endpoint id. The index of the choice in the list of generated choices. Role of the generated message author, in this case `assistant`. The contents of the assistant message. The index of tool call being generated. The ID of the tool call. The type of the tool, which is always set to `function`. The name of the function to call. The arguments for calling the function, generated by the model in JSON format. Ensure to validate these arguments in your code before invoking the function since the model may not always produce valid JSON. Termination condition of the generation. `stop` means the API returned the full chat completions generated by the model without running into any limits. `length` means the generation exceeded `max_tokens` or the conversation exceeded the max context length. `tool_calls` means the API has generated tool calls. 
Available options: `stop`, `length`, `tool_calls` Log probability information for the choice. A list of message content tokens with log probability information. The token. The log probability of this token. A list of integers representing the UTF-8 bytes representation of the token. Useful in instances where characters are represented by multiple tokens and their byte representations must be combined to generate the correct text representation. Can be `null` if there is no bytes representation for the token. List of the most likely tokens and their log probability, at this token position. The token. The log probability of this token. A list of integers representing the UTF-8 bytes representation of the token. Useful in instances where characters are represented by multiple tokens and their byte representations must be combined to generate the correct text representation. Can be `null` if there is no bytes representation for the token. Number of tokens in the prompt. Number of tokens in the generated chat completions. Total number of tokens used in the request (`prompt_tokens` + `completion_tokens`). The Unix timestamp (in seconds) for when the token sampled. # Dedicated completions Source: https://friendli.ai/docs/openapi/dedicated/completions post /dedicated/v1/completions Generate text based on the given text prompt. To successfully run an inference request, it is mandatory to enter a **Friendli Token** (e.g. flp\_XXX) value in the **Bearer Token** field. Refer to the [authentication section](/openapi/introduction#authentication) on our introduction page to learn how to acquire this variable and [visit here](https://suite.friendli.ai/default-team/settings/tokens) to generate your token. When streaming mode is used (i.e., `stream` option is set to `true`), the response is in MIME type `text/event-stream`. Otherwise, the content type is `application/json`. You can view the schema of the streamed sequence of chunk objects in streaming mode [here](/openapi/dedicated/completions-chunk-object). # Dedicated completions chunk object Source: https://friendli.ai/docs/openapi/dedicated/completions-chunk-object Represents a streamed chunk of a completions response returned by model, based on the provided input. ```json Response data: { "id": "cmpl-26a1e10db8544bc3adb488d2d205288b", "model": "(endpoint-id)", "object": "text_completion", "choices": [ { "index": 0, "text": " such", "token": 1778, "finish_reason": null, "logprobs": null } ], "created": 1733382157 } data: { "id": "cmpl-26a1e10db8544bc3adb488d2d205288b", "model": "(endpoint-id)", "object": "text_completion", "choices": [ { "index": 0, "text": " as", "token": 439, "finish_reason": null, "logprobs": null } ], "created": 1733382157 } ... data: { "id": "cmpl-26a1e10db8544bc3adb488d2d205288b", "model": "(endpoint-id)", "object": "text_completion", "choices": [ { "index": 0, "text": "", "finish_reason": "length", "logprobs": null } ], "created": 1733382157 } data: { "id": "cmpl-26a1e10db8544bc3adb488d2d205288b", "model": "(endpoint-id)", "object": "text_completion", "choices": [], "usage": { "prompt_tokens": 5, "completion_tokens": 10, "total_tokens": 15 }, "created": 1733382157 } data: [DONE] ``` A unique ID of the completion. The object type, which is always set to `text_completion`. The model to generate the completion. For dedicated endpoints, it returns the endpoint id. The index of the choice in the list of generated choices. The text. The token. Termination condition of the generation. 
`stop` means the API returned the full completions generated by the model without running into any limits. `length` means the generation exceeded `max_tokens` or the conversation exceeded the max context length. Available options: `stop`, `length` Log probability information for the choice. The starting character position of each token in the generated text, useful for mapping tokens back to their exact location for detailed analysis. The log probabilities of each generated token, indicating the model's confidence in selecting each token. A list of individual tokens generated in the completion, representing segments of text such as words or pieces of words. A list of dictionaries, where each dictionary represents the top alternative tokens considered by the model at a specific position in the generated text, along with their log probabilities. The number of items in each dictionary matches the value of `logprobs`. Number of tokens in the prompt. Number of tokens in the generated completions. Total number of tokens used in the request (`prompt_tokens` + `completion_tokens`). The Unix timestamp (in seconds) for when the token sampled. # Dedicated detokenization Source: https://friendli.ai/docs/openapi/dedicated/detokenization post /dedicated/v1/detokenize By giving a list of tokens, generate a detokenized output text string. To successfully run an inference request, it is mandatory to enter a **Friendli Token** (e.g. flp\_XXX) value in the **Bearer Token** field. Refer to the [authentication section](/openapi/introduction#authentication) on our introduction page to learn how to acquire this variable and [visit here](https://suite.friendli.ai/default-team/settings/tokens) to generate your token. # Dedicated create endpoint from W&B artifact Source: https://friendli.ai/docs/openapi/dedicated/endpoint-wandb-artifact-create post /dedicated/v1/endpoint/wandb-artifact-create Create an endpoint from Weights & Biases artifact. To successfully run an inference request, it is mandatory to enter a **Friendli Token** (e.g. flp\_XXX) value in the **Bearer Token** field. Refer to the [authentication section](/openapi/introduction#authentication) on our introduction page to learn how to acquire this variable and [visit here](https://suite.friendli.ai/default-team/settings/tokens) to generate your token. # Dedicated image generations (Beta) Source: https://friendli.ai/docs/openapi/dedicated/image-generations post /dedicated/v1/images/generations Given a description, the model generates image(s). To successfully run an inference request, it is mandatory to enter a **Friendli Token** (e.g. flp\_XXX) value in the **Bearer Token** field. Refer to the [authentication section](/openapi/introduction#authentication) on our introduction page to learn how to acquire this variable and [visit here](https://suite.friendli.ai/default-team/settings/tokens) to generate your token. When streaming mode is used (i.e., `stream` option is set to `true`), the response is in MIME type `text/event-stream`. Otherwise, the content type is `application/json`. You can view the schema of the streamed sequence of chunk objects in streaming mode [here](/openapi/dedicated/completions-chunk-object). This API is currently in **Beta**. While we strive to provide a stable and reliable experience, this feature is still under active development. As a result, you may encounter unexpected behavior or limitations. We encourage you to provide feedback to help us improve the feature before its official release. 
* [Feature request & feedback](https://friendliai.canny.io) * { e.preventDefault(); window.Intercom('showNewMessage'); }}>Contact support # Dedicated overview Source: https://friendli.ai/docs/openapi/dedicated/overview OpenAPI reference of Friendli Dedicated Endpoints API. ### Inference Discover how to generate text through interactive conversations. Learn how to generate text. Explore the process of breaking down text into smaller tokens for machine processing. Learn how to reconstruct tokenized text back into its original, human-readable form. Learn how to generate images. ### Endpoint Create an endpoint from Weights & Biases artifact. # Dedicated tokenization Source: https://friendli.ai/docs/openapi/dedicated/tokenization post /dedicated/v1/tokenize By giving a text input, generate a tokenized output of token IDs. To successfully run an inference request, it is mandatory to enter a **Friendli Token** (e.g. flp\_XXX) value in the **Bearer Token** field. Refer to the [authentication section](/openapi/introduction#authentication) on our introduction page to learn how to acquire this variable and [visit here](https://suite.friendli.ai/default-team/settings/tokens) to generate your token. # Friendli Suite API Reference Source: https://friendli.ai/docs/openapi/introduction OpenAPI reference of Friendli Suite API. You can interact with the API through HTTP requests from any language. export const RoundedBorderBox = ({children, caption}) =>
## Authentication

When using Friendli Suite API for inference requests, you need to provide a **Friendli Token** for authentication and authorization purposes. A Friendli Token serves as an alternative method of authorization to signing in with an email and a password. You can generate a new Friendli Token through the [Friendli Suite](https://suite.friendli.ai), at your **"Personal settings"** page, by following the steps below.

1. Go to the [Friendli Suite](https://suite.friendli.ai) and sign in with your account.
2. Click the profile icon at the top-right corner of the page.
3. Click the **"Personal settings"** menu.
4. Go to the **"Tokens"** tab on the navigation bar.
5. Create a new Friendli Token by clicking the **"Create token"** button.
6. Copy the token and save it in a safe place. You will not be able to see this token again once the page is refreshed.

# Serverless chat completions

Source: https://friendli.ai/docs/openapi/serverless/chat-completions

post /serverless/v1/chat/completions

Given a list of messages forming a conversation, the model generates a response. See available models at [this pricing table](/guides/serverless_endpoints/pricing#text-generation-models).

To successfully run an inference request, it is mandatory to enter a **Friendli Token** (e.g. flp\_XXX) value in the **Bearer Token** field. Refer to the [authentication section](/openapi/introduction#authentication) on our introduction page to learn how to acquire this variable and [visit here](https://suite.friendli.ai/default-team/settings/tokens) to generate your token.

When streaming mode is used (i.e., `stream` option is set to `true`), the response is in MIME type `text/event-stream`. Otherwise, the content type is `application/json`. You can view the schema of the streamed sequence of chunk objects in streaming mode [here](/openapi/serverless/chat-completions-chunk-object).

You can explore examples on the [Friendli Serverless Endpoints](https://suite.friendli.ai/get-started/serverless-endpoints) playground and adjust settings with just a few clicks.

# Serverless chat completions chunk object

Source: https://friendli.ai/docs/openapi/serverless/chat-completions-chunk-object

Represents a streamed chunk of a chat completions response returned by model, based on the provided input.

```json Response data: { "id": "chatcmpl-4b71d12c86d94e719c7e3984a7bb7941", "model": "meta-llama-3.1-8b-instruct", "object": "chat.completion.chunk", "choices": [ { "index": 0, "delta": { "role": "assistant", "content": "This" }, "finish_reason": null, "logprobs": null } ], "usage": null, "created": 1726294381 } data: { "id": "chatcmpl-4b71d12c86d94e719c7e3984a7bb7941", "model": "meta-llama-3.1-8b-instruct", "object": "chat.completion.chunk", "choices": [ { "index": 0, "delta": { "content": " is" }, "finish_reason": null, "logprobs": null } ], "usage": null, "created": 1726294381 } ...
data: { "id": "chatcmpl-4b71d12c86d94e719c7e3984a7bb7941", "model": "meta-llama-3.1-8b-instruct", "object": "chat.completion.chunk", "choices": [ { "index": 0, "delta": {}, "finish_reason": "stop", "logprobs": null } ], "usage": null, "created": 1726294383 } data: { "id": "chatcmpl-4b71d12c86d94e719c7e3984a7bb7941", "model": "meta-llama-3.1-8b-instruct", "object": "chat.completion.chunk", "choices": [], "usage": { "prompt_tokens": 8, "completion_tokens": 4, "total_tokens": 12 }, "created": 1726294402 } data: [DONE] ``` ```json With tools data: { "id": "chatcmpl-4b71d12c86d94e719c7e3984a7bb7941", "model": "meta-llama-3.1-8b-instruct", "object": "chat.completion.chunk", "choices": [ { "index": 0, "delta": { "role": "assistant", "content": "This" }, "finish_reason": null, "logprobs": null } ], "usage": null, "created": 1726294442 } ... data: { "id": "chatcmpl-4b71d12c86d94e719c7e3984a7bb7941", "model": "meta-llama-3.1-8b-instruct", "object": "chat.completion.chunk", "choices": [ { "index": 0, "delta": { "role": "assistant", "tool_calls": [ { "index": 0, "id": "call_TARbemDG9CFdwuoaQBTRXiYK", "type": "function", "function": { "name": "func", "arguments": "{\"" } } ] }, "finish_reason": null, "logprobs": null } ], "usage": null, "created": 1726294442 } data: { "id": "chatcmpl-4b71d12c86d94e719c7e3984a7bb7941", "model": "meta-llama-3.1-8b-instruct", "object": "chat.completion.chunk", "choices": [ { "index": 0, "delta": { "role": "assistant", "tool_calls": [ { "index": 0, "type": "function", "function": { "arguments": "arg" } } ] }, "finish_reason": null, "logprobs": null } ], "usage": null, "created": 1726294442 } ... data: { "id": "chatcmpl-4b71d12c86d94e719c7e3984a7bb7941", "model": "meta-llama-3.1-8b-instruct", "object": "chat.completion.chunk", "choices": [ { "index": 0, "delta": { "role": "assistant", "tool_calls": [ { "index": 0, "type": "function", "function": { "arguments": "}" } } ] }, "finish_reason": null, "logprobs": null } ], "usage": null, "created": 1726294442 } data: { "id": "chatcmpl-4b71d12c86d94e719c7e3984a7bb7941", "model": "meta-llama-3.1-8b-instruct", "object": "chat.completion.chunk", "choices": [ { "index": 0, "delta": {}, "finish_reason": "tool_calls", "logprobs": null } ], "usage": null, "created": 1726294442 } data: { "id": "chatcmpl-4b71d12c86d94e719c7e3984a7bb7941", "model": "meta-llama-3.1-8b-instruct", "object": "chat.completion.chunk", "choices": [], "usage": { "prompt_tokens": 468, "completion_tokens": 59, "total_tokens": 527 }, "created": 1726294443 } data: [DONE] ``` A unique ID of the chat completion. The object type, which is always set to `chat.completion.chunk`. The model to generate the completion. The index of the choice in the list of generated choices. Role of the generated message author, in this case `assistant`. The contents of the assistant message. The index of tool call being generated. The ID of the tool call. The type of the tool, which is always set to `function`. The name of the function to call. The arguments for calling the function, generated by the model in JSON format. Ensure to validate these arguments in your code before invoking the function since the model may not always produce valid JSON. Termination condition of the generation. `stop` means the API returned the full chat completions generated by the model without running into any limits. `length` means the generation exceeded `max_tokens` or the conversation exceeded the max context length. `tool_calls` means the API has generated tool calls. 
Available options: `stop`, `length`, `tool_calls` Log probability information for the choice. A list of message content tokens with log probability information. The token. The log probability of this token. A list of integers representing the UTF-8 bytes representation of the token. Useful in instances where characters are represented by multiple tokens and their byte representations must be combined to generate the correct text representation. Can be `null` if there is no bytes representation for the token. List of the most likely tokens and their log probability, at this token position. The token. The log probability of this token. A list of integers representing the UTF-8 bytes representation of the token. Useful in instances where characters are represented by multiple tokens and their byte representations must be combined to generate the correct text representation. Can be `null` if there is no bytes representation for the token. Number of tokens in the prompt. Number of tokens in the generated chat completions. Total number of tokens used in the request (`prompt_tokens` + `completion_tokens`). The Unix timestamp (in seconds) for when the token sampled. # Serverless completions Source: https://friendli.ai/docs/openapi/serverless/completions post /serverless/v1/completions Generate text based on the given text prompt. See available models at [this pricing table](/guides/serverless_endpoints/pricing#text-generation-models). To successfully run an inference request, it is mandatory to enter a **Friendli Token** (e.g. flp\_XXX) value in the **Bearer Token** field. Refer to the [authentication section](/openapi/introduction#authentication) on our introduction page to learn how to acquire this variable and [visit here](https://suite.friendli.ai/default-team/settings/tokens) to generate your token. When streaming mode is used (i.e., `stream` option is set to `true`), the response is in MIME type `text/event-stream`. Otherwise, the content type is `application/json`. You can view the schema of the streamed sequence of chunk objects in streaming mode [here](/openapi/serverless/completions-chunk-object). # Serverless completions chunk object Source: https://friendli.ai/docs/openapi/serverless/completions-chunk-object Represents a streamed chunk of a completions response returned by model, based on the provided input. ```json Response data: { "id": "cmpl-26a1e10db8544bc3adb488d2d205288b", "model": "meta-llama-3.1-8b-instruct", "object": "text_completion", "choices": [ { "index": 0, "text": " such", "token": 1778, "finish_reason": null, "logprobs": null } ], "created": 1733382157 } data: { "id": "cmpl-26a1e10db8544bc3adb488d2d205288b", "model": "meta-llama-3.1-8b-instruct", "object": "text_completion", "choices": [ { "index": 0, "text": " as", "token": 439, "finish_reason": null, "logprobs": null } ], "created": 1733382157 } ... data: { "id": "cmpl-26a1e10db8544bc3adb488d2d205288b", "model": "meta-llama-3.1-8b-instruct", "object": "text_completion", "choices": [ { "index": 0, "text": "", "finish_reason": "length", "logprobs": null } ], "created": 1733382157 } data: { "id": "cmpl-26a1e10db8544bc3adb488d2d205288b", "model": "meta-llama-3.1-8b-instruct", "object": "text_completion", "choices": [], "usage": { "prompt_tokens": 5, "completion_tokens": 10, "total_tokens": 15 }, "created": 1733382157 } data: [DONE] ``` A unique ID of the completion. The object type, which is always set to `text_completion`. The model to generate the completion. The index of the choice in the list of generated choices. 
The text. The token. Termination condition of the generation. `stop` means the API returned the full completions generated by the model without running into any limits. `length` means the generation exceeded `max_tokens` or the conversation exceeded the max context length. Available options: `stop`, `length` Log probability information for the choice. The starting character position of each token in the generated text, useful for mapping tokens back to their exact location for detailed analysis. The log probabilities of each generated token, indicating the model's confidence in selecting each token. A list of individual tokens generated in the completion, representing segments of text such as words or pieces of words. A list of dictionaries, where each dictionary represents the top alternative tokens considered by the model at a specific position in the generated text, along with their log probabilities. The number of items in each dictionary matches the value of `logprobs`. Number of tokens in the prompt. Number of tokens in the generated completions. Total number of tokens used in the request (`prompt_tokens` + `completion_tokens`). The Unix timestamp (in seconds) for when the token was sampled. # Serverless detokenization Source: https://friendli.ai/docs/openapi/serverless/detokenization post /serverless/v1/detokenize Generate a detokenized output text string from a given list of tokens. To successfully run an inference request, it is mandatory to enter a **Friendli Token** (e.g. flp\_XXX) value in the **Bearer Token** field. Refer to the [authentication section](/openapi/introduction#authentication) on our introduction page to learn how to acquire this variable and [visit here](https://suite.friendli.ai/default-team/settings/tokens) to generate your token. # Serverless overview Source: https://friendli.ai/docs/openapi/serverless/overview OpenAPI reference of the Friendli Serverless Endpoints API. ### Inference Discover how to generate text through interactive conversations. Learn how to enhance responses with tool assisted chat completions using built-in tools. Learn how to generate text. Explore the process of breaking down text into smaller tokens for machine processing. Learn how to reconstruct tokenized text back into its original, human-readable form. # Serverless tokenization Source: https://friendli.ai/docs/openapi/serverless/tokenization post /serverless/v1/tokenize Generate a tokenized output of token IDs from a given text input. To successfully run an inference request, it is mandatory to enter a **Friendli Token** (e.g. flp\_XXX) value in the **Bearer Token** field. Refer to the [authentication section](/openapi/introduction#authentication) on our introduction page to learn how to acquire this variable and [visit here](https://suite.friendli.ai/default-team/settings/tokens) to generate your token. # Serverless tool assisted chat completions (Beta) Source: https://friendli.ai/docs/openapi/serverless/tool-assisted-chat-completions post /serverless/tools/v1/chat/completions Given a list of messages forming a conversation, the model generates a response. Additionally, the model can utilize built-in tools for tool calls, enhancing its capability to provide more comprehensive and actionable responses. See available models at [this pricing table](/guides/serverless_endpoints/pricing#text-generation-models). To successfully run an inference request, it is mandatory to enter a **Friendli Token** (e.g. flp\_XXX) value in the **Bearer Token** field.
Refer to the [authentication section](/openapi/introduction#authentication) on our introduction page to learn how to acquire this variable and [visit here](https://suite.friendli.ai/default-team/settings/tokens) to generate your token. When streaming mode is used (i.e., `stream` option is set to `true`), the response is in MIME type `text/event-stream`. Otherwise, the content type is `application/json`. You can view the schema of the streamed sequence of chunk objects in streaming mode [here](/openapi/serverless/tool-assisted-chat-completions-chunk-object). You can explore examples on the [Friendli Serverless Endpoints](https://suite.friendli.ai/get-started/serverless-endpoints) playground and adjust settings with just a few clicks. Tool assisted chat completions does not fully support parallel tool calls now. This API is currently in **Beta**. While we strive to provide a stable and reliable experience, this feature is still under active development. As a result, you may encounter unexpected behavior or limitations. We encourage you to provide feedback to help us improve the feature before its official release. * [Feature request & feedback](https://friendliai.canny.io) * { e.preventDefault(); window.Intercom('showNewMessage'); }}>Contact support # Serverless tool assisted chat completions chunk object (Beta) Source: https://friendli.ai/docs/openapi/serverless/tool-assisted-chat-completions-chunk-object Represents a streamed chunk of a tool assisted chat completions response returned by model, based on the provided input. This API is currently in **Beta**. While we strive to provide a stable and reliable experience, this feature is still under active development. As a result, you may encounter unexpected behavior or limitations. We encourage you to provide feedback to help us improve the feature before its official release. * [Feature request & feedback](https://friendliai.canny.io) * { e.preventDefault(); window.Intercom('showNewMessage'); }}>Contact support ```json Response event: tool_status data: { "tool_call_id": "call_3QrfStXSU6fGdOGPcETocIAq", "name": "math:calculator", "status": "STARTED", "parameters": [{ "name": "expression", "value": "150 * 1.60934" }], "result": null, "files": null, "message": null, "error": null, "usage": null, "timestamp": 1726277121 } event: tool_status data: { "tool_call_id": "call_3QrfStXSU6fGdOGPcETocIAq", "name": "math:calculator", "status": "ENDED", "parameters": [{ "name": "expression", "value": "150 * 1.60934" }], "result": "\"{\\\"result\\\": \\\"150 * 1.60934=241.401000000000\\\"}\"", "files": null, "message": null, "error": null, "usage": null, "timestamp": 1726277121 } data: { "id": "chatcmpl-4b71d12c86d94e719c7e3984a7bb7941", "model": "meta-llama-3.1-8b-instruct", "object": "chat.completion.chunk", "choices": [ { "index": 0, "delta": { "role": "assistant", "content": "To" }, "finish_reason": null, "logprobs": null } ], "created": 1726277121 } ... data: { "id": "chatcmpl-4b71d12c86d94e719c7e3984a7bb7941", "model": "meta-llama-3.1-8b-instruct", "object": "chat.completion.chunk", "choices": [ { "index": 0, "delta": { "role": "assistant", "content": "." 
}, "finish_reason": null, "logprobs": null } ], "created": 1726277121 } data: { "id": "chatcmpl-4b71d12c86d94e719c7e3984a7bb7941", "model": "meta-llama-3.1-8b-instruct", "object": "chat.completion.chunk", "choices": [ { "index": 0, "delta": {}, "finish_reason": "stop", "logprobs": null } ], "created": 1726277121 } data: [DONE] ``` ```json Multiple tools event: tool_status data: { "tool_call_id": "call_5X9KQ52bV3CUigqHWleTzD9A", "name": "code:python-interpreter", "status": "STARTED", "parameters": [{ "name": "code", "value": "def is_prime(n): ... \n" }], "result": null, "files": null, "message": null, "error": null, "usage": null, "timestamp": 1726277008 } event: tool_status data: { "tool_call_id": "call_5X9KQ52bV3CUigqHWleTzD9A", "name": "code:python-interpreter", "status": "ENDED", "parameters": [{ "name": "code", "value": "def is_prime(n): ... \n" }], "result": "\"[2, 3, 5, 7, 11, 13, 17]\\n\"", "files": [], "message": null, "error": null, "usage": null, "timestamp": 1726277011 } data: { "id": "chatcmpl-4b71d12c86d94e719c7e3984a7bb7941", "model": "meta-llama-3.1-8b-instruct", "object": "chat.completion.chunk", "choices": [ { "index": 0, "delta": { "role": "assistant", "content": "Now" }, "finish_reason": null, "logprobs": null } ], "created": 1726277011 } ... event: tool_status data: { "tool_call_id": "call_FgfZYpRoDdPtz3QwLrLZIhdP", "name": "math:calculator", "status": "STARTED", "parameters": [{ "name": "expression", "value": "2 * 3 * 5 * 7 * 11 * 13 * 17" }], "result": null, "files": null, "message": null, "error": null, "usage": null, "timestamp": 1726277012 } event: tool_status data: { "tool_call_id": "call_FgfZYpRoDdPtz3QwLrLZIhdP", "name": "math:calculator", "status": "ENDED", "parameters": [{ "name": "expression", "value": "2 * 3 * 5 * 7 * 11 * 13 * 17" }], "result": "\"{\\\"result\\\": \\\"2 * 3 * 5 * 7 * 11 * 13 * 17=510510\\\"}\"", "files": null, "message": null, "error": null, "usage": null, "timestamp": 1726277016 } data: { "id": "chatcmpl-4b71d12c86d94e719c7e3984a7bb7941", "model": "meta-llama-3.1-8b-instruct", "object": "chat.completion.chunk", "choices": [ { "index": 0, "delta": { "role": "assistant", "content": "The" }, "finish_reason": null, "logprobs": null } ], "created": 1726277016 } ... data: { "id": "chatcmpl-4b71d12c86d94e719c7e3984a7bb7941", "model": "meta-llama-3.1-8b-instruct", "object": "chat.completion.chunk", "choices": [ { "index": 0, "delta": { "role": "assistant", "content": "." }, "finish_reason": null, "logprobs": null } ], "created": 1726277016 } data: { "id": "chatcmpl-4b71d12c86d94e719c7e3984a7bb7941", "model": "meta-llama-3.1-8b-instruct", "object": "chat.completion.chunk", "choices": [ { "index": 0, "delta": {}, "finish_reason": "stop", "logprobs": null } ], "created": 1726277016 } data: [DONE] ``` ```json With custom tool event: tool_status data: { "tool_call_id": "call_iryDFgBCcNoc2ICXuuyZqQUe", "name": "web:search", "status": "STARTED", "parameters": [{ "name": "query", "value": "tallest buildings in the world" }], "result": null, "files": null, "message": null, "error": null, "usage": null, "timestamp": 1726294660 } event: tool_status data: { "tool_call_id": "call_iryDFgBCcNoc2ICXuuyZqQUe", "name": "web:search", "status": "UPDATING", "parameters": [{ "name": "query", "value": "tallest buildings in the world" }], "result": "https://en.wikipedia.org/wiki/List_of_tallest_buildings", "files": null, "message": null, "error": null, "usage": null, "timestamp": 1726294666 } ... 
event: tool_status data: { "tool_call_id": "call_iryDFgBCcNoc2ICXuuyZqQUe", "name": "web:search", "status": "ENDED", "parameters": [{ "name": "query", "value": "tallest buildings in the world" }], "result": "['https://en.wikipedia.org/wiki/List_of_tallest_buildings', ...]", "files": null, "message": null, "error": null, "usage": null, "timestamp": 1726294671 } data: { "id": "chatcmpl-4b71d12c86d94e719c7e3984a7bb7941", "model": "meta-llama-3.1-8b-instruct", "object": "chat.completion.chunk", "choices": [ { "index": 0, "delta": { "role": "assistant", "content": "The" }, "finish_reason": null, "logprobs": null } ], "created": 1726294672 } ... data: { "id": "chatcmpl-4b71d12c86d94e719c7e3984a7bb7941", "model": "meta-llama-3.1-8b-instruct", "object": "chat.completion.chunk", "choices": [ { "index": 0, "delta": { "role": "assistant", "tool_calls": [ { "index": 0, "id": "call_yuvrTUk4O2Uh7Hns5ieUcu1S", "type": "function", "function": { "name": "func", "arguments": "{\"" }, } ] }, "finish_reason": null, "logprobs": null } ], "created": 1726294673 } data: { "id": "chatcmpl-4b71d12c86d94e719c7e3984a7bb7941", "model": "meta-llama-3.1-8b-instruct", "object": "chat.completion.chunk", "choices": [ { "index": 0, "delta": { "role": "assistant", "tool_calls": [ { "index": 0, "type": "function", "function": { "arguments": "arg" } } ] }, "finish_reason": null, "logprobs": null } ], "created": 1726294673 } ... data: { "id": "chatcmpl-4b71d12c86d94e719c7e3984a7bb7941", "model": "meta-llama-3.1-8b-instruct", "object": "chat.completion.chunk", "choices": [ { "index": 0, "delta": { "role": "assistant", "tool_calls": [ { "index": 0, "type": "function", "function": { "arguments": "}" } } ] }, "finish_reason": null, "logprobs": null } ], "created": 1726294673 } data: { "id": "chatcmpl-4b71d12c86d94e719c7e3984a7bb7941", "model": "meta-llama-3.1-8b-instruct", "object": "chat.completion.chunk", "choices": [ { "index": 0, "delta": {}, "finish_reason": "tool_calls", "logprobs": null } ], "created": 1726294673 } data: [DONE] ``` A unique ID of the chat completion. The object type, which is always set to `chat.completion.chunk`. The model to generate the completion. The index of the choice in the list of generated choices. Role of the generated message author, in this case `assistant`. The contents of the assistant message. The index of tool call being generated. The ID of the tool call. The type of the tool, which is always set to `function`. The name of the function to call. The arguments for calling the function, generated by the model in JSON format. Ensure to validate these arguments in your code before invoking the function since the model may not always produce valid JSON. Termination condition of the generation. `stop` means the API returned the full chat completions generated by the model without running into any limits. `length` means the generation exceeded `max_tokens` or the conversation exceeded the max context length. `tool_calls` means the API has generated tool calls. Available options: `stop`, `length`, `tool_calls` Log probability information for the choice. A list of message content tokens with log probability information. The token. The log probability of this token. A list of integers representing the UTF-8 bytes representation of the token. Useful in instances where characters are represented by multiple tokens and their byte representations must be combined to generate the correct text representation. Can be `null` if there is no bytes representation for the token. 
List of the most likely tokens and their log probability, at this token position. The token. The log probability of this token. A list of integers representing the UTF-8 bytes representation of the token. Useful in instances where characters are represented by multiple tokens and their byte representations must be combined to generate the correct text representation. Can be `null` if there is no bytes representation for the token. Number of tokens in the prompt. Number of tokens in the generated chat completions. Total number of tokens used in the request (`prompt_tokens` + `completion_tokens`). The Unix timestamp (in seconds) for when the token was sampled. ### `event: tool_status` chunk object `event: tool_status` tracks the execution progress of built-in tools, such as calculator or web search functions. It provides real-time updates on their status and results. The ID of the tool call. The name of the built-in tool. Available options: `math:calculator`, `math:statistics`, `math:calendar`, `web:search`, `web:url`, `code:python-interpreter`, `file:text` Indicates the current execution status of the tool. Available options: `STARTED`, `UPDATING`, `ENDED`, `ERRORED` The name of the tool's function parameter. The value of the tool's function parameter. The output from the tool's execution. The name of the file generated by the tool's execution. URL of the file generated by the tool's execution. Message generated by the tool's execution. The type of error encountered during the tool's execution. The error message. The Unix timestamp (in seconds) for when the event occurred. # LangChain Node.js SDK Source: https://friendli.ai/docs/sdk/integrations/langchain/nodejs Utilize the LangChain Node.js SDK with FriendliAI for seamless integration and enhanced tool calling capabilities in your applications. You can use [**LangChain Node.js SDK**](https://github.com/langchain-ai/langchainjs) to interact with FriendliAI. This makes migration of existing applications already using LangChain particularly easy. ## How to use Before you start, ensure you've already obtained the `FRIENDLI_TOKEN` from the [Friendli Suite](https://suite.friendli.ai/default-team/settings/tokens). Our products are entirely compatible with OpenAI, so we use the `@langchain/openai` package by referring to the FriendliAI `baseURL`. ```bash npm npm i @langchain/core @langchain/openai ``` ```bash yarn yarn add @langchain/core @langchain/openai ``` ```bash pnpm pnpm add @langchain/core @langchain/openai ``` ### Instantiation Now we can instantiate our model object and generate chat completions. We provide usage examples for each type of endpoint.
Choose the one that best suits your needs: ```js Serverless Endpoints import { ChatOpenAI } from "@langchain/openai"; const model = new ChatOpenAI({ model: "meta-llama-3.1-8b-instruct", apiKey: process.env.FRIENDLI_TOKEN, configuration: { baseURL: "https://api.friendli.ai/serverless/v1", }, }); ``` ```js Dedicated Endpoints import { ChatOpenAI } from "@langchain/openai"; const model = new ChatOpenAI({ model: "YOUR_ENDPOINT_ID", apiKey: process.env.FRIENDLI_TOKEN, configuration: { baseURL: "https://api.friendli.ai/dedicated/v1", }, }); ``` ```js Fine-tuned Dedicated Endpoints import { ChatOpenAI } from "@langchain/openai"; const model = new ChatOpenAI({ model: "YOUR_ENDPOINT_ID:YOUR_ADAPTER_ROUTE", apiKey: process.env.FRIENDLI_TOKEN, configuration: { baseURL: "https://api.friendli.ai/dedicated/v1", }, }); ``` ### Runnable interface We support both synchronous and asynchronous runnable methods to generate a response. {/* #### Synchronous methods: #### Asynchronous methods: TODO: Add more examples */} ```js import { HumanMessage, SystemMessage } from "@langchain/core/messages"; const messages = [ new SystemMessage("Translate the following from English into Italian"), new HumanMessage("hi!"), ]; const result = await model.invoke(messages); console.log(result); ``` ### Chaining We can chain our model with a prompt template. Prompt templates convert raw user input to better input to the LLM. ```javascript import { ChatPromptTemplate } from "@langchain/core/prompts"; const prompt = ChatPromptTemplate.fromMessages([ ["system", "You are a world class technical documentation writer."], ["user", "{input}"], ]); const chain = prompt.pipe(model); console.log( await chain.invoke({ input: "how can langsmith help with testing?" }) ); ``` To get the string value instead of the message, we can add an output parser to the chain. ```javascript import { StringOutputParser } from "@langchain/core/output_parsers"; const outputParser = new StringOutputParser(); const chain = prompt.pipe(model).pipe(outputParser); console.log( await chain.invoke({ input: "how can langsmith help with testing?" }) ); ``` ### Tool calling Describe tools and their parameters, and let the model return a tool to invoke with the input arguments. Tool calling is extremely useful for enhancing the model's capability to provide more comprehensive and actionable responses. #### Define tools to use We can define tools with Zod schemas and use them to generate tool calls. ```bash npm npm i zod ``` ```bash yarn yarn add zod ``` ```bash pnpm pnpm add zod ``` ```js import { tool } from "@langchain/core/tools"; import { z } from "zod"; /** * Note that the descriptions here are crucial, as they will be passed along * to the model along with the class name. 
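 * The model relies on these descriptions (together with the parameter names defined below) to decide which tool to call and what arguments to pass.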
*/ const calculatorSchema = z.object({ operation: z .enum(["add", "subtract", "multiply", "divide"]) .describe("The type of operation to execute."), number1: z.number().describe("The first number to operate on."), number2: z.number().describe("The second number to operate on."), }); const calculatorTool = tool( async ({ operation, number1, number2 }) => { // Functions must return strings if (operation === "add") { return `${number1 + number2}`; } else if (operation === "subtract") { return `${number1 - number2}`; } else if (operation === "multiply") { return `${number1 * number2}`; } else if (operation === "divide") { return `${number1 / number2}`; } else { throw new Error("Invalid operation."); } }, { name: "calculator", description: "Can perform mathematical operations.", schema: calculatorSchema, } ); console.log( await calculatorTool.invoke({ operation: "add", number1: 3, number2: 4 }) ); ``` #### Bind tools to the model Now models can generate a tool calling response. ```js const modelWithTools = model.bindTools([calculatorTool]); const messages = [new HumanMessage("What is 3 * 12? Also, what is 11 + 49?")]; const aiMessage = await modelWithTools.invoke(messages); console.log(aiMessage); ``` #### Generate a tool assisted message Use the tool call results to generate a message. ```js messages.push(aiMessage); const toolsByName = { calculator: calculatorTool, }; for (const toolCall of aiMessage.tool_calls) { const selectedTool = toolsByName[toolCall.name]; const toolMessage = await selectedTool.invoke(toolCall); messages.push(toolMessage); } console.log(await modelWithTools.invoke(messages)); ``` For more information on how to use tools, check out the [LangChain documentation](https://js.langchain.com/v0.2/docs/how_to/#tools). # LangChain Python SDK Source: https://friendli.ai/docs/sdk/integrations/langchain/python Utilize the LangChain Python SDK with FriendliAI for easy integration and advanced tool calling in your applications. You can use [**LangChain Python SDK**](https://github.com/langchain-ai/langchain) to interact with FriendliAI. This makes migration of existing applications already using LangChain particularly easy. ## How to use Before you start, ensure you've already obtained the `FRIENDLI_TOKEN` from the [Friendli Suite](https://suite.friendli.ai/default-team/settings/tokens). Our products are entirely compatible with OpenAI, so we use the `langchain-openai` package by referring to the FriendliAI `baseURL`. ```bash pip install -qU langchain-openai langchain ``` ### Instantiation Now we can instantiate our model object and generate chat completions. We provide usage examples for each type of endpoint. Choose the one that best suits your needs: ```python Serverless Endpoints from langchain_openai import ChatOpenAI llm = ChatOpenAI( model="meta-llama-3.1-8b-instruct", base_url="https://api.friendli.ai/serverless/v1", api_key=os.environ["FRIENDLI_TOKEN"], ) ``` ```python Dedicated Endpoints from langchain_openai import ChatOpenAI llm = ChatOpenAI( model="YOUR_ENDPOINT_ID", base_url="https://api.friendli.ai/dedicated/v1", api_key=os.environ["FRIENDLI_TOKEN"], ) ``` ```python Fine-tuned Dedicated Endpoints from langchain_openai import ChatOpenAI llm = ChatOpenAI( model="YOUR_ENDPOINT_ID:YOUR_ADAPTER_ROUTE", base_url="https://api.friendli.ai/dedicated/v1", api_key=os.environ["FRIENDLI_TOKEN"], ) ``` ### Runnable interface We support both synchronous and asynchronous runnable methods to generate a response. 
#### Synchronous methods: ```python invoke result = llm.invoke("Tell me a joke.") print(result.content) ``` ```python stream for chunk in llm.stream("Tell me a joke."): print(chunk.content, end="", flush=True) ``` ```python batch for r in llm.batch(["Tell me a joke.", "Tell me a useless fact."]): print(r.content, "\n\n") ``` #### Asynchronous methods: ```python ainvoke result = await llm.ainvoke("Tell me a joke.") print(result.content) ``` ```python astream async for chunk in llm.astream("Tell me a joke."): print(chunk.content, end="", flush=True) ``` ```python abatch for r in await llm.abatch(["Tell me a joke.", "Tell me a useless fact."]): print(r.content, "\n\n") ``` ### Chaining We can [chain](https://python.langchain.com/v0.2/docs/how_to/sequence) our model with a prompt template. Prompt templates convert raw user input to better input to the LLM. ```python from langchain_core.prompts import ChatPromptTemplate prompt = ChatPromptTemplate.from_messages([ ("system", "You are a world class technical documentation writer."), ("user", "{input}") ]) chain = prompt | llm print(chain.invoke({"input": "how can langsmith help with testing?"})) ``` To get the string value instead of the message, we can add an output parser to the chain. ```python from langchain_core.output_parsers import StrOutputParser output_parser = StrOutputParser() chain = prompt | llm | output_parser print(chain.invoke({"input": "how can langsmith help with testing?"})) ``` ### Tool calling Describe tools and their parameters, and let the model return a tool to invoke with the input arguments. Tool calling is extremely useful for enhancing the model's capability to provide more comprehensive and actionable responses. #### Define tools to use The `@tool` decorator is used to define a tool. If you set `parse_docstring=True`, the tool will parse the docstring to extract the information of arguments. ```python Default from langchain_core.tools import tool @tool def add(a: int, b: int) -> int: """Adds a and b.""" return a + b @tool def multiply(a: int, b: int) -> int: """Multiplies a and b.""" return a * b tools = [add, multiply] ``` ```python Parse Docstring from langchain_core.tools import tool @tool(parse_docstring=True) def add(a: int, b: int) -> int: """Adds a and b. Args: a: The first integer. b: The second integer. """ return a + b @tool(parse_docstring=True) def multiply(a: int, b: int) -> int: """Multiplies a and b. Args: a: The first integer. b: The second integer. """ return a * b tools = [add, multiply] ``` #### Bind tools to the model Now models can generate a tool calling response. ```python from langchain_openai import ChatOpenAI llm = ChatOpenAI( model="meta-llama-3.1-8b-instruct", base_url="https://api.friendli.ai/serverless/v1", api_key=os.environ["FRIENDLI_TOKEN"], ) llm_with_tools = llm.bind_tools(tools) query = "What is 3 * 12? Also, what is 11 + 49?" print(llm_with_tools.invoke(query).tool_calls) ``` #### Generate a tool assisted message Use the tool call results to generate a message. 
```python from langchain_core.messages import HumanMessage, ToolMessage messages = [HumanMessage(query)] ai_msg = llm_with_tools.invoke(messages) messages.append(ai_msg) for tool_call in ai_msg.tool_calls: selected_tool = {"add": add, "multiply": multiply}[tool_call["name"].lower()] tool_output = selected_tool.invoke(tool_call["args"]) messages.append(ToolMessage(tool_output, tool_call_id=tool_call["id"])) print(llm_with_tools.invoke(messages)) ``` For more information on how to use tools, check out the [LangChain documentation](https://python.langchain.com/v0.2/docs/how_to/#tools). # LiteLLM Source: https://friendli.ai/docs/sdk/integrations/litellm LiteLLM SDK supports all FriendliAI models, offering easy integration with serverless, dedicated, and fine-tuned endpoints. You can use [**LiteLLM**](https://github.com/BerriAI/litellm) to interact with FriendliAI. This makes migration of existing applications already using LiteLLM particularly easy. ## How to use Before you start, ensure you've already obtained the `FRIENDLI_TOKEN` from the [Friendli Suite](https://suite.friendli.ai/default-team/settings/tokens). Add `friendliai` prefix to your endpoint name for the `model` parameter. ### Chat completion We provide usage examples for each type of endpoint. Choose the one that best suits your needs. You can specify one of the [available models](/guides/serverless_endpoints/text-generation#model-supports) for the serverless endpoints. ```python Serverless Endpoints from litellm import completion import os os.environ['FRIENDLI_TOKEN'] = "YOUR_FREIENDLI_TOKEN" response = completion( model="friendliai/mixtral-8x7b-instruct-v0-1", messages=[ {"role": "user", "content": "hello from litellm"} ], ) print(response) ``` ```python Dedicated Endpoints from litellm import completion import os os.environ['FRIENDLI_TOKEN'] = "YOUR_FREIENDLI_TOKEN" os.environ['FRIENDLI_API_BASE'] = "https://api.friendli.ai/dedicated/v1" response = completion( model="friendliai/YOUR_ENDPOINT_ID", messages=[ {"role": "user", "content": "hello from litellm"} ], ) print(response) ``` ```python Fine-tuned Dedicated Endpoints from litellm import completion import os os.environ['FRIENDLI_TOKEN'] = "YOUR_FREIENDLI_TOKEN" os.environ['FRIENDLI_API_BASE'] = "https://api.friendli.ai/dedicated/v1" response = completion( model="friendliai/YOUR_ENDPOINT_ID:YOUR_ADAPTER_ROUTE", messages=[ {"role": "user", "content": "hello from litellm"} ], ) print(response) ``` ### Chat completion - Streaming ```python Serverless Endpoints from litellm import completion import os os.environ['FRIENDLI_TOKEN'] = "YOUR_FREIENDLI_TOKEN" response = completion( model="friendliai/mixtral-8x7b-instruct-v0-1", messages=[ {"role": "user", "content": "hello from litellm"} ], stream=True ) for chunk in response: print(chunk) ``` ```python Dedicated Endpoints from litellm import completion import os os.environ['FRIENDLI_TOKEN'] = "YOUR_FREIENDLI_TOKEN" os.environ['FRIENDLI_API_BASE'] = "https://api.friendli.ai/dedicated/v1" response = completion( model="friendliai/YOUR_ENDPOINT_ID", messages=[ {"role": "user", "content": "hello from litellm"} ], stream=True ) for chunk in response: print(chunk) ``` ```python Fine-tuned Dedicated Endpoints from litellm import completion import os os.environ['FRIENDLI_TOKEN'] = "YOUR_FREIENDLI_TOKEN" os.environ['FRIENDLI_API_BASE'] = "https://api.friendli.ai/dedicated/v1" response = completion( model="friendliai/YOUR_ENDPOINT_ID:YOUR_ADAPTER_ROUTE", messages=[ {"role": "user", "content": "hello from litellm"} ], stream=True ) for chunk in 
response: print(chunk) ``` # LlamaIndex Source: https://friendli.ai/docs/sdk/integrations/llamaindex Easily integrate large language models with the LlamaIndex SDK, featuring FriendliAI for seamless interaction. {/* Open In Colab */} You can use [**LlamaIndex**](https://github.com/run-llama/llama_index) to interact with FriendliAI. This makes migration of existing applications already using LlamaIndex particularly easy. ## How to use Before you start, ensure you've already obtained the `FRIENDLI_TOKEN` from the [Friendli Suite](https://suite.friendli.ai/default-team/settings/tokens). ```python pip install llama-index llama-index-llms-friendli ``` ### Instantiation Now we can instantiate our model object and generate chat completions. The default model (i.e. `mixtral-8x7b-instruct-v0-1`) will be used if no model is specified. ```python from llama_index.llms.friendli import Friendli import os os.environ['FRIENDLI_TOKEN'] = "YOUR_FRIENDLI_TOKEN" llm = Friendli(model="meta-llama-3.3-70b-instruct") ``` ### Chat completion Generate a response from a given conversation. ```python Default from llama_index.core.llms import ChatMessage, MessageRole message = ChatMessage(role=MessageRole.USER, content="Tell me a joke.") resp = llm.chat([message]) print(resp) ``` ```python Streaming from llama_index.core.llms import ChatMessage, MessageRole message = ChatMessage(role=MessageRole.USER, content="Tell me a joke.") resp = llm.stream_chat([message]) for r in resp: print(r.delta, end="") ``` ```python Async from llama_index.core.llms import ChatMessage, MessageRole message = ChatMessage(role=MessageRole.USER, content="Tell me a joke.") resp = await llm.achat([message]) print(resp) ``` ```python Async Streaming from llama_index.core.llms import ChatMessage, MessageRole message = ChatMessage(role=MessageRole.USER, content="Tell me a joke.") resp = await llm.astream_chat([message]) async for r in resp: print(r.delta, end="") ``` ### Completion Generate a response from a given prompt. ```python Default prompt = "Draft a cover letter for a role in software engineering." resp = llm.complete(prompt) print(resp) ``` ```python Streaming prompt = "Draft a cover letter for a role in software engineering." resp = llm.stream_complete(prompt) for r in resp: print(r.delta, end="") ``` ```python Async prompt = "Draft a cover letter for a role in software engineering." resp = await llm.acomplete(prompt) print(resp) ``` ```python Async Streaming prompt = "Draft a cover letter for a role in software engineering." resp = await llm.astream_complete(prompt) async for r in resp: print(r.delta, end="") ``` # OpenAI Node.js SDK Source: https://friendli.ai/docs/sdk/integrations/openai/nodejs Easily integrate FriendliAI with the OpenAI Node.js SDK. You can use [**OpenAI Node.js SDK**](https://github.com/openai/openai-node) to interact with FriendliAI. This makes migration of existing applications already using OpenAI particularly easy. ## How to use Before you start, ensure the `baseURL` and `apiKey` refer to FriendliAI. Since our products are entirely compatible with OpenAI SDK, now you are good to follow the examples below. Choose one of the [available models](/guides/serverless_endpoints/text-generation#model-supports) for `model` parameter. ```bash npm npm i openai ``` ```bash yarn yarn add openai ``` ```bash pnpm pnpm add openai ``` ### Chat Completion Chat completion API that generates a response from a given conversation. We provide multiple usage examples. 
Try to find the best one that aligns with your needs: ```ts Default import OpenAI from "openai"; const client = new OpenAI({ baseURL: "https://api.friendli.ai/serverless/v1", apiKey: process.env.FRIENDLI_TOKEN, }); async function main() { const completion = await client.chat.completions.create({ model: "meta-llama-3.1-8b-instruct", messages: [ { role: "system", content: "You are a helpful assistant." }, { role: "user", content: "Hello!" }, ], }); console.log(completion.choices[0]); } main(); ``` ```ts Streaming import OpenAI from "openai"; const client = new OpenAI({ baseURL: "https://api.friendli.ai/serverless/v1", apiKey: process.env.FRIENDLI_TOKEN, }); async function main() { const completion = await client.chat.completions.create({ model: "meta-llama-3.1-8b-instruct", messages: [ { role: "system", content: "You are a helpful assistant." }, { role: "user", content: "Hello!" }, ], stream: true, }); for await (const chunk of completion) { console.log(chunk.choices[0].delta.content); } } main(); ``` ```ts Functions import OpenAI from "openai"; const client = new OpenAI({ baseURL: "https://api.friendli.ai/serverless/v1", apiKey: process.env.FRIENDLI_TOKEN, }); async function main() { const messages = [ { role: "user", content: "What's the weather like in Boston today?" }, ]; const tools = [ { type: "function", function: { name: "get_current_weather", description: "Get the current weather in a given location", parameters: { type: "object", properties: { location: { type: "string", description: "The city and state, e.g. San Francisco, CA", }, unit: { type: "string", enum: ["celsius", "fahrenheit"] }, }, required: ["location"], }, }, }, ]; const completion = await client.chat.completions.create({ model: "meta-llama-3.1-8b-instruct", messages: messages, tools: tools, tool_choice: "auto", }); console.log(completion); } main(); ``` ```ts Logprobs import OpenAI from "openai"; const client = new OpenAI({ baseURL: "https://api.friendli.ai/serverless/v1", apiKey: process.env.FRIENDLI_TOKEN, }); async function main() { const completion = await client.chat.completions.create({ model: "meta-llama-3.1-8b-instruct", messages: [{ role: "user", content: "Hello!" }], logprobs: true, top_logprobs: 2, }); console.log(completion.choices[0].message); console.log(completion.choices[0].logprobs); } main(); ``` ### Tool assisted chat completion This feature is in Beta and available only on the **Serverless Endpoints**. Using tool assisted chat completion API, models can utilize built-in tools prepared for tool calls, enhancing its capability to provide more comprehensive and actionable responses. Available tools are listed [here](/guides/serverless_endpoints/tool-assisted-api#built-in-tools). 
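The tool assisted chat completions route documented in the OpenAPI reference above is `/serverless/tools/v1/chat/completions`, so the client can be pointed at that base URL (the OpenAI Python SDK examples in this document use the same path). A minimal sketch of the client setup:

```ts
import OpenAI from "openai";

// Tool assisted chat completions are served from the dedicated "tools" path.
const client = new OpenAI({
  baseURL: "https://api.friendli.ai/serverless/tools/v1",
  apiKey: process.env.FRIENDLI_TOKEN,
});
```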
```ts Basic import OpenAI from "openai"; const client = new OpenAI({ baseURL: "https://api.friendli.ai/serverless/v1", apiKey: process.env.FRIENDLI_TOKEN, }); async function main() { const messages = [ { role: "user", content: "What is the current average home price in New York City, and if I put 15% down, how much will my mortgage be?", }, ]; const tools = [{ type: "code:python-interpreter" }, { type: "web:search" }]; const completion = await client.chat.completions.create({ model: "meta-llama-3.1-8b-instruct", messages: messages, tools: tools, tool_choice: "auto", stream: true, }); for await (const chunk of completion) { if (chunk.choices === undefined) { console.log(`event: ${chunk.event}, data: ${JSON.stringify(chunk.data)}`); } else { console.log(chunk.choices[0].delta.content); } } } main(); ``` ```ts Advanced (REPL) import OpenAI from "openai"; import * as readline from "node:readline/promises"; const client = new OpenAI({ baseURL: "https://api.friendli.ai/serverless/v1", apiKey: process.env.FRIENDLI_TOKEN, }); const terminal = readline.createInterface({ input: process.stdin, output: process.stdout, }); async function chatbot(input) { const stream = await client.chat.completions.create({ model: "meta-llama-3.1-8b-instruct", messages: [{ role: "user", content: input }], tools: [ { type: "web:url" }, { type: "code:python-interpreter" }, { type: "math:calculator" }, { type: "web:search" }, ], tool_choice: "auto", stream: true, }); for await (const chunk of stream) { if (chunk.choices === undefined) { if (chunk.event === "tool_status") { if (chunk.data.result !== "") { switch (chunk.data.status) { case "STARTED": terminal.write( `⚒️ TOOL CALL: ${chunk.data.name}(${JSON.stringify( chunk.data.parameters )})` ); break; case "ENDED": terminal.write(`🔧 TOOL RESULT: ${chunk.data.result}`); break; case "ERRORED": terminal.write(`🔧 TOOL ERROR: ${chunk.data.error}`); break; case "UPDATING": terminal.write(`🔧 TOOL UPDATE: ${chunk.data.result}`); break; default: terminal.write(`Unknown tool status: ${chunk.data}`); } } terminal.write("\n"); } else { terminal.write("Unknown event", chunk); } } else { terminal.write(chunk.choices[0]?.delta?.content || ""); } } terminal.write("\n"); } while (true) { const input = await terminal.question("You: "); terminal.write(" "); await chatbot(input); } ``` # OpenAI Python SDK Source: https://friendli.ai/docs/sdk/integrations/openai/python Integrate FriendliAI with OpenAI Python SDK for chat, streaming, and more. You can use [**OpenAI Python SDK**](https://github.com/openai/openai-python) to interact with FriendliAI. This makes migration of existing applications already using OpenAI particularly easy. ## How to use Before you start, ensure the `base_url` and `api_key` refer to FriendliAI. Since our products are entirely compatible with OpenAI SDK, now you are good to follow the examples below. Choose one of the [available models](/guides/serverless_endpoints/text-generation#model-supports) for `model` parameter. ```bash pip install -qU openai ``` ### Chat Completion Chat completion API that generates a response from a given conversation. We provide multiple usage examples. Try to find the best one that aligns with your needs. 
```python Default from openai import OpenAI import os client = OpenAI( base_url="https://api.friendli.ai/serverless/v1", api_key=os.environ.get("FRIENDLI_TOKEN") ) completion = client.chat.completions.create( model="meta-llama-3.1-8b-instruct", messages=[ {"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Hello!"} ] ) print(completion.choices[0].message) ``` ```python Streaming from openai import OpenAI import os client = OpenAI( base_url="https://api.friendli.ai/serverless/v1", api_key=os.environ.get("FRIENDLI_TOKEN") ) completion = client.chat.completions.create( model="meta-llama-3.1-8b-instruct", messages=[ {"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Hello!"} ], stream=True ) for chunk in completion: print(chunk.choices[0].delta) ``` ```python Functions from openai import OpenAI import os client = OpenAI( base_url="https://api.friendli.ai/serverless/v1", api_key=os.environ.get("FRIENDLI_TOKEN") ) tools = [ { "type": "function", "function": { "name": "get_current_weather", "description": "Get the current weather in a given location", "parameters": { "type": "object", "properties": { "location": { "type": "string", "description": "The city and state, e.g. San Francisco, CA", }, "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}, }, "required": ["location"], }, } } ] completion = client.chat.completions.create( model="meta-llama-3.1-8b-instruct", messages=[ {"role": "user", "content": "What's the weather like in Boston today?"} ], tools=tools, tool_choice="auto" ) print(completion) ``` ```python Logprobs from openai import OpenAI import os client = OpenAI( base_url="https://api.friendli.ai/serverless/v1", api_key=os.environ.get("FRIENDLI_TOKEN") ) completion = client.chat.completions.create( model="meta-llama-3.1-8b-instruct", messages=[ {"role": "user", "content": "Hello!"} ], logprobs=True, top_logprobs=2 ) print(completion.choices[0].message) print(completion.choices[0].logprobs) ``` ### Tool assisted chat completion This feature is in Beta and available only on the **Serverless Endpoints**. Using tool assisted chat completion API, models can utilize built-in tools prepared for tool calls, enhancing its capability to provide more comprehensive and actionable responses. Available tools are listed [here](/guides/serverless_endpoints/tool-assisted-api#built-in-tools). 
```python Basic from openai import OpenAI import os client = OpenAI( base_url="https://api.friendli.ai/serverless/tools/v1", api_key=os.environ.get("FRIENDLI_TOKEN") ) stream = client.chat.completions.create( model="meta-llama-3.1-8b-instruct", messages=[{"role": "user", "content": "What is the current average home price in New York City, and if I put 15% down, how much will my mortgage be?"}], tools=[ {"type": "web:search"}, {"type": "math:calculator"}, ], stream=True, ) for chunk in stream: if chunk.choices is None: print(f"{chunk.event=}, {chunk.data=}") elif chunk.choices[0].delta.content is not None: print(chunk.choices[0].delta.content, end="") ``` ```python Advanced (REPL) from openai import OpenAI import os client = OpenAI( base_url="https://api.friendli.ai/serverless/tools/v1", api_key=os.environ.get("FRIENDLI_TOKEN") ) class bcolors: OKBLUE = '\033[94m' OKCYAN = '\033[96m' FAIL = '\033[91m' WHITE = '\033[97m' def print_response(response): print(f"{bcolors.OKCYAN}{response}", end='') def print_tool_call(data): print(f"\n{bcolors.OKBLUE}⚒️ TOOL CALL: { data['name']}({data['parameters']})") def print_tool_result(data): print(f"{bcolors.OKBLUE}🔧 TOOL RESULT: {data['result']}") def print_tool_error(data): print(f"{bcolors.FAIL}🔧 TOOL ERROR: {data['error']}", end='') def print_tool_update(data): print(f"{bcolors.OKBLUE}🔧 TOOL UPDATE: {data['result']}") def chatbot(prompt): stream = client.chat.completions.create( model="meta-llama-3.1-8b-instruct", messages=[{"role": "user", "content": prompt}], stream=True, tools=[ {"type": "web:url"}, {"type": "code:python-interpreter"}, {"type": "math:calculator"}, {"type": "web:search"} ] ) for chunk in stream: if chunk.choices is None: if chunk.event == "tool_status": match chunk.data: case {"status": "STARTED"}: print_tool_call(chunk.data) case {"status": "ENDED"}: print_tool_result(chunk.data) case {"status": "ERRORED"}: print_tool_error(chunk.data) case {"status": "UPDATING"}: print_tool_update(chunk.data) elif chunk.choices[0].delta.content is not None: print_response(chunk.choices[0].delta.content) print("\n") print("Welcome to the Tool Inference!") print("To exit, enter 'q'.") while True: user_input = input(f"{bcolors.WHITE}You: ") if user_input.lower() == 'q': break chatbot(user_input) ``` # Friendli Integrations Source: https://friendli.ai/docs/sdk/integrations/overview Effortlessly integrate FriendliAI models into your projects with support for popular SDKs and frameworks. ## Effortless AI integration with popular SDKs Friendli is committed to providing developers with flexible and powerful tools to integrate our AI models into their projects. We support a variety of popular SDKs and frameworks, making it easy to incorporate Friendli's capabilities into existing workflows and applications. Our integration options include LiteLLM for unified LLM interactions, Vercel AI SDK for seamless web application development, LangChain for building complex AI-driven applications, and an OpenAI-compatible API for those familiar with OpenAI's interface. These integrations enable developers to leverage Friendli's AI models across a wide range of use cases, from simple chat applications to sophisticated AI systems, all while maintaining ease of use and compatibility with existing tools and practices. 
openai openai openai openai langchain langchain weaviate weaviate vercel vercel llamaindex litellm litellm # Vercel AI SDK Source: https://friendli.ai/docs/sdk/integrations/vercel-ai-sdk Easily integrate FriendliAI models with the Vercel AI SDK, supporting serverless, dedicated, and fine-tuned endpoints. You can use [**Vercel AI SDK**](https://sdk.vercel.ai) to interact with FriendliAI. This makes migration of existing applications already using Vercel AI SDK particularly easy. ## How to use Before you start, ensure you've already obtained the `FRIENDLI_TOKEN` from the [Friendli Suite](https://suite.friendli.ai/default-team/settings/tokens). ```bash npm npm i ai @friendliai/ai-provider ``` ```bash yarn yarn add ai @friendliai/ai-provider ``` ```bash pnpm pnpm add ai @friendliai/ai-provider ``` ### Instantiation Instantiate your models using a Friendli provider instance. We provide usage examples for each type of endpoint. Choose the one that best suits your needs: ```ts Serverless Endpoints {4,7-9} import { friendli } from '@friendliai/ai-provider'; // Automatically select serverless endpoints const model = friendli("meta-llama-3.3-70b-instruct"); // Or specify a specific serverless endpoint const model = friendli("meta-llama-3.3-70b-instruct", { endpoint: "serverless", }); ``` ```ts Dedicated Endpoints {4,7-9} import { friendli } from '@friendliai/ai-provider'; // Replace YOUR_ENDPOINT_ID with the ID of your endpoint, e.g. "zbimjgovmlcb" const model = friendli("YOUR_ENDPOINT_ID"); // Specify a dedicated endpoint instead of auto-selecting const model = friendli("YOUR_ENDPOINT_ID", { endpoint: "dedicated", }); ``` ```ts Friendli Container {9} import { createFriendli } from "@friendliai/ai-provider"; const friendli = createFriendli({ // Update with the URL where your container is running. baseURL: "http://localhost:8000/v1", }); // Containers do not require a model id. const model = friendli(""); ``` ### Example: Generating text Generate a response with the `generateText` function: ```ts import { friendli } from "@friendliai/ai-provider"; import { generateText } from "ai"; const { text } = await generateText({ model: friendli("meta-llama-3.3-70b-instruct"), prompt: "Write a vegetarian lasagna recipe for 4 people.", }); console.log(text); ``` ### Example: Using Enforcing Patterns (Regex) Specify a specific pattern (e.g., CSV), character sets, or specific language characters (e.g., Korean Hangul characters) for your LLM's output. ```ts {6} import { friendli } from "@friendliai/ai-provider"; import { generateText } from "ai"; const { text } = await generateText({ model: friendli("meta-llama-3.3-70b-instruct", { regex: new RegExp("[\n ,.?!0-9\uac00-\ud7af]*"), }), prompt: "Who is the first king of the Joseon Dynasty?", }); console.log(text); ``` ### Example: Using built-in tools This feature is in Beta and available only on the **Serverless Endpoints**. Using tool assisted chat completion API, models can utilize built-in tools prepared for tool calls, enhancing its capability to provide more comprehensive and actionable responses. Available tools are listed [here](/guides/serverless_endpoints/tool-assisted-api#built-in-tools). 
```ts {6-9} import { friendli } from "@friendliai/ai-provider"; import { streamText } from "ai"; const result = await streamText({ model: friendli("meta-llama-3.3-70b-instruct", { tools: [ {"type": "web:search"}, {"type": "math:calculator"}, ], }), prompt: "Find the current USD to CAD exchange rate and calculate how much $5,000 USD would be in Canadian dollars.", }); for await (const textPart of result.textStream) { console.log(textPart); } ``` ## OpenAI Compatibility You can also use `@ai-sdk/openai` as the APIs are OpenAI-compatible. ```ts import { createOpenAI } from '@ai-sdk/openai'; const friendli = createOpenAI({ baseURL: 'https://api.friendli.ai/serverless/v1', apiKey: process.env.FRIENDLI_TOKEN, }); ``` If you are using dedicated endpoints ```ts import { createOpenAI } from '@ai-sdk/openai'; const friendli = createOpenAI({ baseURL: 'https://api.friendli.ai/dedicated/v1', apiKey: process.env.FRIENDLI_TOKEN, }); ``` ## Further resources * [Implementing a simple streaming chat with Next.js](https://sdk.vercel.ai/examples/next-app/basics/streaming-text-generation) * [Build a Next.js app with the Vercel AI SDK](https://sdk.vercel.ai/docs/getting-started/nextjs-app-router) * [Explore the Vercel AI SDK Core Reference](https://sdk.vercel.ai/docs/ai-sdk-core/overview) # FriendliAI + Weaviate (Node.js) Source: https://friendli.ai/docs/sdk/integrations/weaviate/nodejs Utilize the Weaviate to build applications with less hallucination open-source vector database. Integration with [**Weaviate**](https://github.com/weaviate/weaviate) enables performing Retrieval Augmented Generation (RAG) directly within the Weaviate database. This combines the power of [**Friendli Engine**](https://friendli.ai/solutions/engine) and Weaviate's efficient storage and fast retrieval capabilities to generate personalized and context-aware responses. ## How to use Before you start, ensure you've already obtained the `FRIENDLI_TOKEN` from the [Friendli Suite](https://suite.friendli.ai/default-team/settings/tokens). Also, set up your Weaviate instance following this [guide](https://weaviate.io/developers/weaviate/starter-guides/which-weaviate). Your Weaviate instance must be configured with the FriendliAI generative AI integration (`generative-friendliai`) module. ```bash npm npm i weaviate-client ``` ```bash yarn yarn add weaviate-client ``` ```bash pnpm pnpm add weaviate-client ``` ### Instantiation Now we can instantiate a [Weaviate collection](https://weaviate.io/developers/weaviate/manage-data/collections) using our model. We provide usage examples for each type of endpoint. Choose the one that best suits your needs. You can specify one of the [available models](/guides/serverless_endpoints/text-generation#model-supports) for the serverless endpoints. The default model (i.e. `meta-llama-3.3-70b-instruct`) will be used if no model is specified. ```ts Serverless Endpoints import weaviate from 'weaviate-client' const client = await weaviate.connectToWeaviateCloud( 'WEAVIATE_INSTANCE_URL', // your Weaviate instance URL { authCredentials: new weaviate.ApiKey('WEAVIATE_INSTANCE_APIKEY'), headers: { 'X-Friendli-Api-Key': process.env.FRIENDLI_TOKEN, } } ) await client.collections.create({ name: 'DemoCollection', generative: weaviate.configure.generative.friendliai({ model: 'meta-llama-3.3-70b-instruct' }), // Additional parameters ... 
}); client.close() ``` ```ts Dedicated Endpoints import weaviate from 'weaviate-client' const client = await weaviate.connectToWeaviateCloud( 'WEAVIATE_INSTANCE_URL', // your Weaviate instance URL { authCredentials: new weaviate.ApiKey('WEAVIATE_INSTANCE_APIKEY'), headers: { 'X-Friendli-Api-Key': process.env.FRIENDLI_TOKEN, "X-Friendli-Baseurl": "https://api.friendli.ai/dedicated", } } ) await client.collections.create({ name: 'DemoCollection', generative: weaviate.configure.generative.friendliai({ model: 'YOUR_ENDPOINT_ID' }), // Additional parameters ... }); client.close() ``` ```ts Fine-tuned Dedicated Endpoints import weaviate from 'weaviate-client' const client = await weaviate.connectToWeaviateCloud( 'WEAVIATE_INSTANCE_URL', // your Weaviate instance URL { authCredentials: new weaviate.ApiKey('WEAVIATE_INSTANCE_APIKEY'), headers: { 'X-Friendli-Api-Key': process.env.FRIENDLI_TOKEN, "X-Friendli-Baseurl": "https://api.friendli.ai/dedicated", } } ) await client.collections.create({ name: 'DemoCollection', generative: weaviate.configure.generative.friendliai({ model: 'YOUR_ENDPOINT_ID:YOUR_ADAPTER_ROUTE' }), // Additional parameters ... }); client.close() ``` #### Configurable parameters Configure the following generative parameters to customize the model behavior. ```ts await client.collections.create({ name: 'DemoCollection', generative: weaviate.configure.generative.friendliai({ model: 'meta-llama-3.3-70b-instruct', maxTokens: 500, temperature: 0.7, }), // Additional parameters ... }); ``` ### Retrieval Augmented Generation After configuring Weaviate, perform RAG operations, either with the single prompt or grouped task method. #### Single prompt To generate text for each object in the search results, use the single prompt method. The example below generates outputs for each of the n search results, where n is specified by the limit parameter. When creating a single prompt query, use braces `{}` to interpolate the object properties you want Weaviate to pass on to the language model. For example, to pass on the object's title property, include `{title}` in the query. ```ts let myCollection = client.collections.get('DemoCollection'); const singlePromptResults = await myCollection.generate.nearText( ['A holiday film'], { singlePrompt: `Translate this into French: {title}`, }, { limit: 2, } ); for (const obj of singlePromptResults.objects) { console.log(obj.properties['title']); console.log(`Generated output: ${obj.generated}`); // Note that the generated output is per object } ``` #### Grouped task To generate one text for the entire set of search results, use the grouped task method. In other words, when you have n search results, the generative model generates one output for the entire group. ```ts let myCollection = client.collections.get('DemoCollection'); const groupedTaskResults = await myCollection.generate.nearText( ['A holiday film'], { groupedTask: `Write a fun tweet to promote readers to check out these films.`, }, { limit: 2, } ); console.log(`Generated output: ${groupedTaskResults.generated}`); // Note that the generated output is per query for (const obj of groupedTaskResults.objects) { console.log(obj.properties['title']); } ``` ### Further resources Once the integrations are configured at the collection, the data management and search operations in Weaviate work identically to any other collection. See the following model-agnostic examples: * [How-to manage data guides show how to perform data operations](https://weaviate.io/developers/weaviate/manage-data/create). 
* [How-to search guides show how to perform search operations](https://weaviate.io/developers/weaviate/search/basics). # FriendliAI + Weaviate (Python) Source: https://friendli.ai/docs/sdk/integrations/weaviate/python Utilize the Weaviate to build applications with less hallucination open-source vector database. Integration with [**Weaviate**](https://github.com/weaviate/weaviate) enables performing Retrieval Augmented Generation (RAG) directly within the Weaviate database. This combines the power of [**Friendli Engine**](https://friendli.ai/solutions/engine) and Weaviate's efficient storage and fast retrieval capabilities to generate personalized and context-aware responses. ## How to use Before you start, ensure you've already obtained the `FRIENDLI_TOKEN` from the [Friendli Suite](https://suite.friendli.ai/default-team/settings/tokens). Also, set up your Weaviate instance following this [guide](https://weaviate.io/developers/weaviate/starter-guides/which-weaviate). Your Weaviate instance must be configured with the FriendliAI generative AI integration (`generative-friendliai`) module. ```bash pip install -qU weaviate-client ``` ### Instantiation Now we can instantiate a [Weaviate collection](https://weaviate.io/developers/weaviate/manage-data/collections) using our model. We provide usage examples for each type of endpoint. Choose the one that best suits your needs. You can specify one of the [available models](/guides/serverless_endpoints/text-generation#model-supports) for the serverless endpoints. The default model (i.e. `meta-llama-3.3-70b-instruct`) will be used if no model is specified. ```python Serverless Endpoints import weaviate from weaviate.classes.init import Auth from weaviate.classes.config import Configure import os headers = { "X-Friendli-Api-Key": os.getenv("FRIENDLI_TOKEN"), } client = weaviate.connect_to_weaviate_cloud( cluster_url=weaviate_url, # `weaviate_url`: your Weaviate URL auth_credentials=Auth.api_key(weaviate_key), # `weaviate_key`: your Weaviate API key headers=headers ) client.collections.create( "DemoCollection", generative_config=Configure.Generative.friendliai( model = "meta-llama-3.3-70b-instruct", ) # Additional parameters not shown ) client.close() ``` ```python Dedicated Endpoints import weaviate from weaviate.classes.init import Auth from weaviate.classes.config import Configure import os headers = { "X-Friendli-Api-Key": os.getenv("FRIENDLI_TOKEN"), "X-Friendli-Baseurl": "https://api.friendli.ai/dedicated", } client = weaviate.connect_to_weaviate_cloud( cluster_url=weaviate_url, # `weaviate_url`: your Weaviate URL auth_credentials=Auth.api_key(weaviate_key), # `weaviate_key`: your Weaviate API key headers=headers ) client.collections.create( "DemoCollection", generative_config=Configure.Generative.friendliai( model = "YOUR_ENDPOINT_ID", ) # Additional parameters not shown ) client.close() ``` ```python Fine-tuned Dedicated Endpoints import weaviate from weaviate.classes.init import Auth from weaviate.classes.config import Configure import os headers = { "X-Friendli-Api-Key": os.getenv("FRIENDLI_TOKEN"), "X-Friendli-Baseurl": "https://api.friendli.ai/dedicated", } client = weaviate.connect_to_weaviate_cloud( cluster_url=weaviate_url, # `weaviate_url`: your Weaviate URL auth_credentials=Auth.api_key(weaviate_key), # `weaviate_key`: your Weaviate API key headers=headers ) client.collections.create( "DemoCollection", generative_config=Configure.Generative.friendliai( model = "YOUR_ENDPOINT_ID:YOUR_ADAPTER_ROUTE", ) # Additional parameters not shown ) 
client.close() ``` #### Configurable parameters Configure the following generative parameters to customize the model behavior. ```python from weaviate.classes.config import Configure client.collections.create( "DemoCollection", generative_config=Configure.Generative.friendliai( # These parameters are optional model = "meta-llama-3.3-70b-instruct", max_tokens = 500, temperature = 0.7, ) ) ``` ### Retrieval Augmented Generation After configuring Weaviate, perform RAG operations, either with the single prompt or grouped task method. #### Single prompt To generate text for each object in the search results, use the single prompt method. The example below generates outputs for each of the n search results, where n is specified by the limit parameter. When creating a single prompt query, use braces `{}` to interpolate the object properties you want Weaviate to pass on to the language model. For example, to pass on the object's title property, include `{title}` in the query. ```python collection = client.collections.get("DemoCollection") response = collection.generate.near_text( query="A holiday film", # The model provider integration will automatically vectorize the query single_prompt="Translate this into French: {title}", limit=2 ) for obj in response.objects: print(obj.properties["title"]) print(f"Generated output: {obj.generated}") # Note that the generated output is per object ``` #### Grouped task To generate one text for the entire set of search results, use the grouped task method. In other words, when you have n search results, the generative model generates one output for the entire group. ```python collection = client.collections.get("DemoCollection") response = collection.generate.near_text( query="A holiday film", # The model provider integration will automatically vectorize the query grouped_task="Write a fun tweet to promote readers to check out these films.", limit=2 ) print(f"Generated output: {response.generated}") # Note that the generated output is per query for obj in response.objects: print(obj.properties["title"]) ``` ### Further resources Once the integrations are configured at the collection, the data management and search operations in Weaviate work identically to any other collection. See the following model-agnostic examples: * [How-to manage data guides show how to perform data operations](https://weaviate.io/developers/weaviate/manage-data/create). * [How-to search guides show how to perform search operations](https://weaviate.io/developers/weaviate/search/basics).