# Friendli Docs

## Docs

- [friendli api chat-completions create](https://friendli.ai/docs/cli/api/chat-completions/create.md): Create chat completions using the Friendli API. Customize your requests with various options like model selection, message input, token limits, and more to generate tailored results.
- [friendli api completions create](https://friendli.ai/docs/cli/api/completions/create.md): Create text completions using the Friendli API. Customize your completions with various options like prompts, model selection, token limits, and more to create precise, tailored outputs.
- [friendli endpoint create](https://friendli.ai/docs/cli/endpoint/create.md): Create and deploy new endpoints with the Friendli API. Customize with model selection, GPU configuration, and more to efficiently serve your machine learning models.
- [friendli endpoint get](https://friendli.ai/docs/cli/endpoint/get.md): Get detailed information about a specific endpoint using the Friendli API.
- [friendli endpoint list](https://friendli.ai/docs/cli/endpoint/list.md): View all your deployed endpoints with the Friendli API. Easily list endpoints for efficient model management.
- [friendli endpoint terminate](https://friendli.ai/docs/cli/endpoint/terminate.md): Terminate a running endpoint with the Friendli API using the endpoint ID. Easily manage and stop your deployed models when needed.
- [Installation](https://friendli.ai/docs/cli/installation.md): Install the friendli-client package to access advanced features for AI integration. Supports Python 3.8+, with options for machine learning libraries and Hugging Face checkpoint conversion.
- [friendli login](https://friendli.ai/docs/cli/login.md): Sign in to Friendli using the command line interface.
- [friendli logout](https://friendli.ai/docs/cli/logout.md): Sign out of Friendli using the command line interface.
- [friendli model convert](https://friendli.ai/docs/cli/model/convert.md): Convert Hugging Face model checkpoints to Friendli format for deployment. Includes options for quantization, data type selection, and model optimization using the Friendli API.
- [friendli model list](https://friendli.ai/docs/cli/model/list.md): View all available models with the Friendli API. Easily list models to streamline your deployment and optimization processes.
- [friendli project list](https://friendli.ai/docs/cli/project/list.md): List all accessible projects with the Friendli API. Easily manage your available projects for efficient workflow management.
- [friendli project switch](https://friendli.ai/docs/cli/project/switch.md): Switch between project contexts using the Friendli API. Quickly change the active project by providing the project ID for smooth workflow management.
- [friendli team list](https://friendli.ai/docs/cli/team/list.md): View all available teams with the Friendli API. Easily list teams for project organization.
- [friendli team switch](https://friendli.ai/docs/cli/team/switch.md): Switch between team contexts using the Friendli API. Quickly change the active team by providing the team ID for efficient collaboration and management.
- [friendli version](https://friendli.ai/docs/cli/version.md): Check the installed package version of Friendli using the command line interface.
- [friendli whoami](https://friendli.ai/docs/cli/whoami.md): Show your Friendli user information using the command line interface.
- [CUDA Compatibility](https://friendli.ai/docs/guides/container/cuda_compatibility.md): The Friendli Engine supports CUDA-enabled NVIDIA GPUs, which means it relies on a specific version of CUDA and requires a compatible CUDA compute capability.
- [Inference with gRPC](https://friendli.ai/docs/guides/container/inference_with_grpc.md): Run a gRPC inference server with Friendli Container and interact with it through the friendli-client SDK.
- [Introducing Friendli Container](https://friendli.ai/docs/guides/container/introduction.md): While Friendli Serverless Endpoints and Dedicated Endpoints offer convenient cloud-based solutions, some users crave even more control and flexibility. For those pioneers, Friendli Container is the answer.
- [Observability for Friendli Container](https://friendli.ai/docs/guides/container/monitoring.md): Observability is an integral part of DevOps. To support this, Friendli Container exports internal metrics in Prometheus text format.
- [Optimizing Inference with Policy Search](https://friendli.ai/docs/guides/container/optimizing_inference_with_policy_search.md): For specialized cases like MoE or quantized models, optimizing the execution policy in Friendli Engine can boost inference performance by 1.5x to 2x, improving throughput and reducing latency.
- [QuickStart: Friendli Container Trial](https://friendli.ai/docs/guides/container/quickstart.md): Learn how to get started with Friendli Container in this step-by-step guide. Activate your free trial, access the Container registry, prepare your container secret, run your Friendli Container, and monitor it using Grafana.
- [Running Friendli Container](https://friendli.ai/docs/guides/container/running_friendli_container.md): Friendli Container enables you to effortlessly deploy your generative AI model on your own machine. This tutorial will guide you through the process of running a Friendli Container.
- [Running Friendli Container on SageMaker](https://friendli.ai/docs/guides/container/running_friendli_container_on_sagemaker.md): Create a real-time inference endpoint in Amazon SageMaker with a Friendli Container backend. By utilizing Friendli Container in your SageMaker pipeline, you'll benefit from the Friendli Engine's speed and resource efficiency.
- [Serving MoE Models](https://friendli.ai/docs/guides/container/serving_moe_models.md): Explore the steps to serve Mixture of Experts (MoE) models such as Mixtral 8x7B using Friendli Container.
- [Serving Multi-LoRA Models](https://friendli.ai/docs/guides/container/serving_multi_lora_models.md): The Friendli Engine introduces an innovative approach to this challenge through Multi-LoRA (Low-Rank Adaptation) serving, a method that allows for the simultaneous serving of multiple LLMs, optimized for specific tasks without the need for extensive retraining.
- [Serving Quantized Models](https://friendli.ai/docs/guides/container/serving_quantized_models.md): Tutorial for serving quantized models with the Friendli Engine. The Friendli Engine supports FP8, INT8, and AWQ model checkpoints.
- [Endpoints](https://friendli.ai/docs/guides/dedicated_endpoints/endpoints.md): Endpoints are the actual deployments of your models on your specified GPU resource.
- [Frequently Asked Questions and Troubleshooting](https://friendli.ai/docs/guides/dedicated_endpoints/faq.md): While following our tutorials, you might have had questions regarding the details of the requirements and specifications. We have collected the frequently asked questions in a separate document.
- [Fine-tuning](https://friendli.ai/docs/guides/dedicated_endpoints/fine-tuning.md): Effortlessly fine-tune your model with Friendli Dedicated Endpoints, which leverages the Parameter-Efficient Fine-Tuning (PEFT) method to reduce training costs while preserving model quality, similar to full-parameter fine-tuning.
- [Deploy with Hugging Face Models](https://friendli.ai/docs/guides/dedicated_endpoints/huggingface_tutorial.md): Hands-on tutorial for launching and deploying LLMs using Friendli Dedicated Endpoints with Hugging Face models.
- [Introducing Friendli Dedicated Endpoints](https://friendli.ai/docs/guides/dedicated_endpoints/introduction.md): Friendli Dedicated Endpoints gives you the reins to explore the full potential of your custom generative AI models on the hardware of your choice, whether you're crafting innovative eloquent texts, generating stunning images, or even more.
- [Models](https://friendli.ai/docs/guides/dedicated_endpoints/models.md): Within your Friendli Dedicated Endpoints projects, you can prepare and manage the models that you wish to deploy. You may upload your models within your project to deploy them directly on your endpoints. Alternatively, you may manage them in Hugging Face repositories or Weights & Biases artifacts, as our endpoints can load models from your project, Hugging Face repositories, and Weights & Biases artifacts.
- [Pricing](https://friendli.ai/docs/guides/dedicated_endpoints/pricing.md): Friendli Dedicated Endpoints pricing detail page.
- [Projects](https://friendli.ai/docs/guides/dedicated_endpoints/projects.md): Friendli Dedicated Endpoints projects are a basic working unit for your team.
- [QuickStart: Friendli Dedicated Endpoints](https://friendli.ai/docs/guides/dedicated_endpoints/quickstart.md): Learn how to get started with Friendli Dedicated Endpoints in this step-by-step guide. Create an account, select your project, choose a model you wish to serve, deploy your endpoint, and seamlessly generate text, code, and more with ease.
- [Deploy with W&B Models](https://friendli.ai/docs/guides/dedicated_endpoints/wandb_tutorial.md): Hands-on tutorial for launching and deploying LLMs using Friendli Dedicated Endpoints with Weights & Biases artifacts.
- [Image Generation Models](https://friendli.ai/docs/guides/image-generation.md): Dive into the characteristics of popular Image Generation Models available on Friendli Dedicated Endpoints.
- [Unleash the Power of Generative AI with Friendli Suite: Your End-to-End Solution](https://friendli.ai/docs/guides/introduction.md): Friendli Suite empowers you to explore generative AI with three solutions: Serverless Endpoints for quick access to open-source models, Dedicated Endpoints for deploying custom models on dedicated GPUs, and Containers for secure, on-premise control. Powered by the optimized Friendli Engine, each option ensures fast, cost-efficient AI serving for text, code, and image generation.
- [Friendli Documentation](https://friendli.ai/docs/guides/overview.md): Get started with FriendliAI products and explore APIs.
- [Personal Access Tokens](https://friendli.ai/docs/guides/personal_access_tokens.md): Learn how to manage credentials in Friendli Suite, including using personal access tokens for authentication and authorization.
- [Advanced Applications on Friendli Serverless Endpoints (Coming Soon!)](https://friendli.ai/docs/guides/serverless_endpoints/applications.md): Stay tuned for detailed guides on how to perform tasks like Retrieval-Augmented Generation (RAG), Conditional Image Generation, and Fine-tuning Custom Models.
- [Function Calling](https://friendli.ai/docs/guides/serverless_endpoints/function-calling.md): Learn how to do OpenAI-compatible function calling on Friendli Serverless Endpoints.
- [Integrations](https://friendli.ai/docs/guides/serverless_endpoints/integrations.md): Friendli integrates with LangChain, LiteLLM, LlamaIndex, and MongoDB to streamline GenAI application deployment. LangChain and LlamaIndex enable tool calling AI agents and Retrieval-Augmented Generation (RAG), while MongoDB provides memory via vector databases, and LiteLLM boosts performance through load balancing.
- [Introducing Friendli Serverless Endpoints](https://friendli.ai/docs/guides/serverless_endpoints/introduction.md): Guide for Friendli Serverless Endpoints, allowing you to seamlessly integrate state-of-the-art AI models into your workflows, regardless of your technical expertise.
- [OpenAI Compatibility](https://friendli.ai/docs/guides/serverless_endpoints/openai-compatibility.md): Friendli Serverless Endpoints is compatible with the OpenAI API standard through the Python API Libraries and the Node API Libraries. Friendli Dedicated Endpoints and Friendli Container are also OpenAI API compatible.
- [Pricing](https://friendli.ai/docs/guides/serverless_endpoints/pricing.md): Friendli Serverless Endpoints offer a range of models tailored to various tasks.
- [QuickStart: Friendli Serverless Endpoints](https://friendli.ai/docs/guides/serverless_endpoints/quickstart.md): Learn how to get started with Friendli Serverless Endpoints in this step-by-step guide. Create an account, choose from powerful AI models like Llama 3.1, and seamlessly generate text, code, and more with ease.
- [Rate Limits](https://friendli.ai/docs/guides/serverless_endpoints/rate_limit.md): Understand the rate limits for Friendli Serverless Endpoints, including Requests per Minute (RPM) and Tokens per Minute (TPM), to ensure efficient usage of resources and balanced performance when interacting with AI models.
- [Structured Outputs](https://friendli.ai/docs/guides/serverless_endpoints/structured-outputs.md): Generate structured outputs using FriendliAI's Structured Outputs feature.
- [Text Generation Models](https://friendli.ai/docs/guides/serverless_endpoints/text-generation.md): Dive into the characteristics of six popular Text Generation Models (TGMs) available on Friendli Serverless Endpoints.
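Because the endpoints are OpenAI-compatible, existing OpenAI client code can target Friendli by swapping the API key and base URL. A minimal sketch, assuming the `openai` Python package, a `FRIENDLI_TOKEN` environment variable, and an illustrative base URL and model id (verify both against the OpenAI Compatibility guide):

```python
import os

def build_client():
    """Construct an OpenAI SDK client pointed at Friendli (sketch)."""
    from openai import OpenAI  # requires `pip install openai`
    return OpenAI(
        api_key=os.environ["FRIENDLI_TOKEN"],             # Friendli personal access token
        base_url="https://api.friendli.ai/serverless/v1",  # assumed base URL; check the docs
    )

def build_request(prompt: str) -> dict:
    """A standard OpenAI-style chat-completions request body."""
    return {
        "model": "meta-llama-3.1-8b-instruct",  # illustrative model id
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }

# Only attempt a real call when a token is configured.
if os.environ.get("FRIENDLI_TOKEN"):
    client = build_client()
    resp = client.chat.completions.create(**build_request("Hello!"))
    print(resp.choices[0].message.content)
```

The point of the sketch is that only the client construction changes; the request body and response handling stay exactly as they would for OpenAI.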
- [Tool Assisted API (Beta)](https://friendli.ai/docs/guides/serverless_endpoints/tool-assisted-api.md): Tool Assisted API enhances a model's capabilities by integrating tools that extend its functionality beyond simple conversational interactions. By using this API, the model becomes more dynamic, providing more comprehensive and actionable responses. Currently, Friendli Serverless Endpoints supports a variety of built-in tools specifically designed for Chat Completion tasks.
- [Build an agent with Gradio](https://friendli.ai/docs/guides/tutorials/build-an-agent-with-gradio.md): Build and deploy smart AI agents with Friendli Serverless Endpoints and Gradio in under 50 lines.
- [Build an agent with LangChain](https://friendli.ai/docs/guides/tutorials/build-an-agent-with-langchain.md): Build an AI agent with LangChain and Friendli Serverless Endpoints, integrating tool calling for dynamic and efficient responses.
- [Chat docs with LangChain](https://friendli.ai/docs/guides/tutorials/chat-docs-with-langchain.md)
- [Chat docs with MongoDB](https://friendli.ai/docs/guides/tutorials/chat-docs-with-mongodb.md)
- [Go Playground with Next.js](https://friendli.ai/docs/guides/tutorials/go-playground-with-nextjs.md)
- [RAG app with LlamaIndex](https://friendli.ai/docs/guides/tutorials/rag-app-with-llamaindex.md)
- [Tool calling with Serverless Endpoints](https://friendli.ai/docs/guides/tutorials/tool-calling-with-serverless-endpoints.md): Build AI agents with Friendli Serverless Endpoints using tool calling for dynamic, real-time interactions with LLMs.
- [Vision](https://friendli.ai/docs/guides/vision.md): Guide to using Friendli's Vision feature for image analysis. Covers usage via Playground and API (URL & Base64 examples).
- [Container chat completions](https://friendli.ai/docs/openapi/container/chat-completions.md): Given a list of messages forming a conversation, the model generates a response.
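The function-calling and tool-calling guides above rely on the standard OpenAI tool-definition shape: each tool is a JSON-Schema description of a callable function passed alongside the messages. A hedged sketch of that request body; the function name, fields, and model id are illustrative, not taken from the Friendli docs:

```python
def weather_tool() -> dict:
    """One tool definition: a JSON-Schema description of a callable function."""
    return {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical function name
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        },
    }

# The tool list rides alongside the messages in a chat-completions request;
# the model may then respond with a tool call instead of plain text.
request_body = {
    "model": "meta-llama-3.1-8b-instruct",  # illustrative model id
    "messages": [{"role": "user", "content": "Weather in Seoul?"}],
    "tools": [weather_tool()],
}
```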
- [Container chat completions chunk object](https://friendli.ai/docs/openapi/container/chat-completions-chunk-object.md): Represents a streamed chunk of a chat completions response returned by the model, based on the provided input.
- [Container completions](https://friendli.ai/docs/openapi/container/completions.md): Generate text based on the given text prompt.
- [Container completions chunk object](https://friendli.ai/docs/openapi/container/completions-chunk-object.md): Represents a streamed chunk of a completions response returned by the model, based on the provided input.
- [Container detokenization](https://friendli.ai/docs/openapi/container/detokenization.md): By giving a list of tokens, generate a detokenized output text string.
- [Container image generations (Beta)](https://friendli.ai/docs/openapi/container/image-generations.md): Given a description, the model generates image(s).
- [Container overview](https://friendli.ai/docs/openapi/container/overview.md): OpenAPI reference of the Friendli Container API.
- [Container tokenization](https://friendli.ai/docs/openapi/container/tokenization.md): By giving a text input, generate a tokenized output of token IDs.
- [Dedicated chat completions](https://friendli.ai/docs/openapi/dedicated/chat-completions.md): Given a list of messages forming a conversation, the model generates a response.
- [Dedicated chat completions chunk object](https://friendli.ai/docs/openapi/dedicated/chat-completions-chunk-object.md): Represents a streamed chunk of a chat completions response returned by the model, based on the provided input.
- [Dedicated completions](https://friendli.ai/docs/openapi/dedicated/completions.md): Generate text based on the given text prompt.
- [Dedicated completions chunk object](https://friendli.ai/docs/openapi/dedicated/completions-chunk-object.md): Represents a streamed chunk of a completions response returned by the model, based on the provided input.
- [Dedicated detokenization](https://friendli.ai/docs/openapi/dedicated/detokenization.md): By giving a list of tokens, generate a detokenized output text string.
- [Dedicated create endpoint from W&B artifact](https://friendli.ai/docs/openapi/dedicated/endpoint-wandb-artifact-create.md): Create an endpoint from a Weights & Biases artifact.
- [Dedicated image generations (Beta)](https://friendli.ai/docs/openapi/dedicated/image-generations.md): Given a description, the model generates image(s).
- [Dedicated overview](https://friendli.ai/docs/openapi/dedicated/overview.md): OpenAPI reference of the Friendli Dedicated Endpoints API.
- [Dedicated tokenization](https://friendli.ai/docs/openapi/dedicated/tokenization.md): By giving a text input, generate a tokenized output of token IDs.
- [Friendli Suite API Reference](https://friendli.ai/docs/openapi/introduction.md): OpenAPI reference of the Friendli Suite API. You can interact with the API through HTTP requests from any language.
- [Serverless chat completions](https://friendli.ai/docs/openapi/serverless/chat-completions.md): Given a list of messages forming a conversation, the model generates a response.
- [Serverless chat completions chunk object](https://friendli.ai/docs/openapi/serverless/chat-completions-chunk-object.md): Represents a streamed chunk of a chat completions response returned by the model, based on the provided input.
- [Serverless completions](https://friendli.ai/docs/openapi/serverless/completions.md): Generate text based on the given text prompt.
- [Serverless completions chunk object](https://friendli.ai/docs/openapi/serverless/completions-chunk-object.md): Represents a streamed chunk of a completions response returned by the model, based on the provided input.
- [Serverless detokenization](https://friendli.ai/docs/openapi/serverless/detokenization.md): By giving a list of tokens, generate a detokenized output text string.
- [Serverless overview](https://friendli.ai/docs/openapi/serverless/overview.md): OpenAPI reference of the Friendli Serverless Endpoints API.
- [Serverless tokenization](https://friendli.ai/docs/openapi/serverless/tokenization.md): By giving a text input, generate a tokenized output of token IDs.
- [Serverless tool assisted chat completions (Beta)](https://friendli.ai/docs/openapi/serverless/tool-assisted-chat-completions.md): Given a list of messages forming a conversation, the model generates a response. Additionally, the model can utilize built-in tools for tool calls, enhancing its capability to provide more comprehensive and actionable responses.
- [Serverless tool assisted chat completions chunk object (Beta)](https://friendli.ai/docs/openapi/serverless/tool-assisted-chat-completions-chunk-object.md): Represents a streamed chunk of a tool assisted chat completions response returned by the model, based on the provided input.
- [LangChain Node.js SDK](https://friendli.ai/docs/sdk/integrations/langchain/nodejs.md): Utilize the LangChain Node.js SDK with FriendliAI for seamless integration and enhanced tool calling capabilities in your applications.
- [LangChain Python SDK](https://friendli.ai/docs/sdk/integrations/langchain/python.md): Utilize the LangChain Python SDK with FriendliAI for easy integration and advanced tool calling in your applications.
- [LiteLLM](https://friendli.ai/docs/sdk/integrations/litellm.md): LiteLLM SDK supports all FriendliAI models, offering easy integration with serverless, dedicated, and fine-tuned endpoints.
- [LlamaIndex](https://friendli.ai/docs/sdk/integrations/llamaindex.md): Easily integrate large language models with the LlamaIndex SDK, featuring FriendliAI for seamless interaction.
- [OpenAI Node.js SDK](https://friendli.ai/docs/sdk/integrations/openai/nodejs.md): Easily integrate FriendliAI with the OpenAI Node.js SDK.
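The chunk-object references above describe streamed responses delivered as OpenAI-style server-sent events: each `data:` line carries a JSON chunk whose `choices[0].delta` holds a fragment of the reply, terminated by a `[DONE]` sentinel. A minimal sketch of assembling such a stream; the sample lines are illustrative, not captured from a real endpoint:

```python
import json

def collect_stream(sse_lines):
    """Concatenate content deltas from `data: {...}` SSE lines into one string."""
    text = []
    for line in sse_lines:
        if not line.startswith("data: "):
            continue  # skip comments, blank keep-alive lines, etc.
        payload = line[len("data: "):]
        if payload.strip() == "[DONE]":  # end-of-stream sentinel
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        text.append(delta.get("content", ""))  # first delta may carry only a role
    return "".join(text)

# Illustrative stream in the OpenAI-compatible chunk format.
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    'data: {"choices": [{"delta": {"content": "Hello"}}]}',
    'data: {"choices": [{"delta": {"content": ", world!"}}]}',
    "data: [DONE]",
]
print(collect_stream(sample))  # -> Hello, world!
```

In practice the OpenAI and LangChain SDKs listed here do this assembly for you; the sketch only shows what the chunk objects contain on the wire.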
- [OpenAI Python SDK](https://friendli.ai/docs/sdk/integrations/openai/python.md): Integrate FriendliAI with the OpenAI Python SDK for chat, streaming, and more.
- [Friendli Integrations](https://friendli.ai/docs/sdk/integrations/overview.md): Effortlessly integrate FriendliAI models into your projects with support for popular SDKs and frameworks.
- [Vercel AI SDK](https://friendli.ai/docs/sdk/integrations/vercel-ai-sdk.md): Easily integrate FriendliAI models with the Vercel AI SDK, supporting serverless, dedicated, and fine-tuned endpoints.
- [FriendliAI + Weaviate (Node.js)](https://friendli.ai/docs/sdk/integrations/weaviate/nodejs.md): Utilize Weaviate, an open-source vector database, to build applications with less hallucination.
- [FriendliAI + Weaviate (Python)](https://friendli.ai/docs/sdk/integrations/weaviate/python.md): Utilize Weaviate, an open-source vector database, to build applications with less hallucination.

## Optional

- [Blog](https://friendli.ai/blog)
- [GitHub](https://github.com/friendliai)