- October 27, 2023
- 2 min read
LangChain Integration with Friendli Dedicated Endpoints

In this article, we will demonstrate how to use Friendli Dedicated Endpoints with LangChain. Friendli Dedicated Endpoints is our SaaS service for deploying generative AI models on various cloud platforms, powered by Friendli Inference, our flagship LLM serving engine. LangChain is a popular framework for building language model applications, offering developers a convenient way to combine multiple components into a single application. Using Friendli Dedicated Endpoints with LangChain lets developers not only build language model applications easily, but also leverage Friendli Inference to improve the performance and cost-efficiency of serving their models.
Building a Friendli LLM interface for LangChain
LangChain provides various LLM interfaces and also makes it easy to define a custom one by inheriting from LangChain’s base LLM class. To get started, you’ll need a running Friendli Inference deployment and an API key; please refer to our docs for running a deployment on Friendli Dedicated Endpoints. Friendli Inference also provides a Python SDK for running completion tasks, so we’ll use its completion API to implement our custom interface.
Here is our Friendli Inference LLM interface for LangChain:
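The sketch below is a minimal version of such an interface: it subclasses LangChain’s base LLM class and implements the `_call` method with a completion request. The LangChain import paths differ across versions, and the Friendli SDK client and parameter names used here (`Friendli`, `completions.create`, `model`, `max_tokens`) are assumptions, so check the Friendli Python SDK documentation for the exact API.

```python
from typing import Any, List, Optional

# Import paths may differ across LangChain versions
# (e.g. langchain_core.language_models.llms in newer releases).
from langchain.callbacks.manager import CallbackManagerForLLMRun
from langchain.llms.base import LLM

# Assumed Friendli SDK client; the actual class and method names may differ.
from friendli import Friendli


class FriendliLLM(LLM):
    """LangChain LLM interface backed by a Friendli Dedicated Endpoint."""

    endpoint_id: str  # ID of the deployed endpoint
    token: str  # Friendli API key
    max_tokens: int = 256

    @property
    def _llm_type(self) -> str:
        return "friendli"

    def _call(
        self,
        prompt: str,
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> str:
        client = Friendli(token=self.token)
        # Run a non-streaming completion against the dedicated endpoint.
        result = client.completions.create(
            model=self.endpoint_id,
            prompt=prompt,
            max_tokens=self.max_tokens,
            stop=stop,
        )
        return result.choices[0].text
```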
Now we can simply create an instance and use it like any other LLM in the LangChain framework:
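For example, assuming the `FriendliLLM` class above and a deployed endpoint (the endpoint ID and token below are placeholders), the interface can be called directly or composed into a chain:

```python
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate

llm = FriendliLLM(endpoint_id="YOUR_ENDPOINT_ID", token="YOUR_FRIENDLI_TOKEN")

# Call the LLM directly, like any other LangChain LLM.
print(llm("Tell me a fun fact about llamas."))

# Or compose it into a simple chain.
prompt = PromptTemplate.from_template("Summarize the following text:\n{text}")
chain = LLMChain(llm=llm, prompt=prompt)
print(chain.run(text="Friendli Inference is a fast and efficient LLM serving engine."))
```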
Streaming
Friendli Inference also supports streaming responses, so instead of waiting for the full response, you can receive intermediate results during generation. The LangChain framework exposes streaming through the _stream and _astream methods, so we’ll implement them using Friendli Inference’s stream option.
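The sketch below extends the `FriendliLLM` class with a `_stream` method. The `stream=True` flag and the shape of the streamed events are assumptions about the Friendli SDK, and `_astream` can be implemented analogously with the SDK’s async client.

```python
from typing import Any, Iterator, List, Optional

from langchain.callbacks.manager import CallbackManagerForLLMRun
# Import path may differ across LangChain versions.
from langchain.schema.output import GenerationChunk

from friendli import Friendli  # assumed SDK client, as in the previous snippet


class FriendliStreamingLLM(FriendliLLM):
    """FriendliLLM extended with token streaming."""

    def _stream(
        self,
        prompt: str,
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> Iterator[GenerationChunk]:
        client = Friendli(token=self.token)
        # `stream=True` is assumed to make the SDK yield partial completions
        # as they are generated instead of a single final response.
        for event in client.completions.create(
            model=self.endpoint_id,
            prompt=prompt,
            max_tokens=self.max_tokens,
            stop=stop,
            stream=True,
        ):
            chunk = GenerationChunk(text=event.text)
            if run_manager:
                run_manager.on_llm_new_token(chunk.text)
            yield chunk
```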
With the streaming interface, you can display the response to the user in real time as it’s being generated:
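For instance, with the streaming class sketched above, LangChain’s stream method yields text chunks that can be printed as they arrive (the prompt is just an example):

```python
llm = FriendliStreamingLLM(endpoint_id="YOUR_ENDPOINT_ID", token="YOUR_FRIENDLI_TOKEN")

# Print each chunk as soon as it is generated instead of waiting
# for the full completion.
for chunk in llm.stream("Write a haiku about fast LLM inference."):
    print(chunk, end="", flush=True)
print()
```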
In summary, we’ve implemented a custom Friendli Inference LLM interface for LangChain and looked at how it can be used with basic examples. In our next blog post, we will see how to build more complex LLM applications using Friendli Inference and LangChain. Get started today with Friendli Inference!
Written by
FriendliAI Tech & Research
General FAQ
What is FriendliAI?
FriendliAI is a GPU-inference platform that lets you deploy, scale, and monitor large language and multimodal models in production, without owning or managing GPU infrastructure. We offer three things for your AI models: unmatched speed, cost efficiency, and operational simplicity. Find out which product is the best fit for you here.
How does FriendliAI help my business?
Friendli Inference allows you to squeeze more tokens per second out of every GPU. Because you need fewer GPUs to serve the same load, the metric that matters, tokens per dollar, comes out higher even if the hourly GPU rate looks similar on paper. View pricing
Which models and modalities are supported?
Over 380,000 text, vision, audio, and multi-modal models are deployable out of the box. You can also upload custom models or LoRA adapters. Explore models
Can I deploy models from Hugging Face directly?
Yes. Selecting “Friendli Endpoints” on the Hugging Face Hub gives you a one-click deploy that takes you to our model deployment page. The page provides an easy-to-use interface for setting up Friendli Dedicated Endpoints, a managed service for generative AI inference. Learn more about our Hugging Face partnership
Still have questions?
If you want a customized solution for the key issue slowing your growth, email contact@friendli.ai or click Contact Sales. Our experts (not a bot) will reply within one business day.