- May 3, 2024
- 2 min read
Building a RAG Chatbot with Friendli, MongoDB Atlas, and LangChain

Recently, LangChain introduced support for Friendli as an LLM inference serving engine. This integration allows you to leverage Friendli Inference’s blazing-fast performance and cost-efficiency for your RAG (Retrieval-Augmented Generation) pipelines.
In this guide, we will build a simple RAG-based chatbot that answers questions about the contents of a PDF document. This tutorial will use Friendli Serverless Endpoints for LLM inference and MongoDB Atlas for the vector store.
Dependencies
First, let’s install the required packages.
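The exact package set may vary with your setup; a minimal sketch that covers this tutorial, assuming OpenAI embeddings for the vector store and the langchain-mongodb integration package, looks like this:

```bash
pip install langchain langchain-community langchain-mongodb langchain-openai friendli-client pymongo pypdf
```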
Setting Up MongoDB Atlas
While you can run MongoDB locally, we will use MongoDB Atlas, a managed service, for this tutorial. Sign up for a MongoDB Cloud account and create a new cluster. After the cluster is set up, create a new DB and collection following their guide.
Once the DB is set up, note the database, collection, and index names shown in your MongoDB Cloud UI.
Then, initialize the MongoDB client with the appropriate variables.
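A minimal sketch, where the connection string and the database, collection, and index names below are placeholders for the ones you created in the Atlas UI:

```python
from pymongo import MongoClient

# Replace these with your own Atlas connection string and the names
# you chose in the MongoDB Cloud UI (all values here are placeholders).
MONGODB_ATLAS_CLUSTER_URI = "mongodb+srv://<user>:<password>@<cluster>.mongodb.net"
DB_NAME = "langchain_db"
COLLECTION_NAME = "bpipe_paper"
ATLAS_VECTOR_SEARCH_INDEX_NAME = "vector_index"

client = MongoClient(MONGODB_ATLAS_CLUSTER_URI)
collection = client[DB_NAME][COLLECTION_NAME]
```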
Next, test the connection.
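A simple ping works; a successful call returns an `{'ok': 1}` document:

```python
# Raises an exception if the cluster is unreachable or the credentials are wrong.
print(client.admin.command("ping"))
```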
Creating a Vector Search Index
To use MongoDB Atlas as a vector store, you need to create a vector search index for querying.
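You can configure the index in the Atlas UI, or programmatically with pymongo’s search-index helpers (available in recent pymongo releases). In the sketch below, the vector field path and the dimension count are assumptions: "embedding" is the default field used by the LangChain MongoDB vector store, and 1536 dimensions matches the OpenAI embeddings used later in this tutorial.

```python
from pymongo.operations import SearchIndexModel

# Field path and dimension count must match your documents and embedding model.
index_model = SearchIndexModel(
    definition={
        "fields": [
            {
                "type": "vector",
                "path": "embedding",
                "numDimensions": 1536,
                "similarity": "cosine",
            }
        ]
    },
    name=ATLAS_VECTOR_SEARCH_INDEX_NAME,
    type="vectorSearch",
)
collection.create_search_index(model=index_model)
```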
Loading Documents and Embeddings
Now, let’s load a PDF document, split it into chunks, and insert the chunks into MongoDB Atlas together with their embeddings. In our case, we’ll load the BPipe paper from the ICML 2023 conference.
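A sketch of the loading step; the local file path is a placeholder for your downloaded copy of the BPipe paper, and the embedding model (OpenAI here) is an assumption, since any LangChain embedding integration works:

```python
from langchain_community.document_loaders import PyPDFLoader
from langchain_mongodb import MongoDBAtlasVectorSearch
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load the PDF and split it into overlapping chunks for retrieval.
loader = PyPDFLoader("bpipe_paper.pdf")  # placeholder path to the BPipe paper
docs = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=150
).split_documents(loader.load())

# Embed each chunk and insert it into the Atlas collection.
vector_store = MongoDBAtlasVectorSearch.from_documents(
    documents=docs,
    embedding=OpenAIEmbeddings(),  # requires OPENAI_API_KEY
    collection=collection,
    index_name=ATLAS_VECTOR_SEARCH_INDEX_NAME,
)
```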
Initializing the LLM with Friendli
Next, let’s initialize the LLM using Friendli Serverless Endpoints with Meta’s new Llama 3 70B model.
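A sketch using LangChain’s ChatFriendli integration; the model identifier below is the Llama 3 70B Instruct id on Friendli Serverless Endpoints at the time of writing (check the Friendli docs for the current id):

```python
from langchain_community.chat_models.friendli import ChatFriendli

# Reads your personal access token from the FRIENDLI_TOKEN environment
# variable; you can also pass it explicitly via friendli_token=...
llm = ChatFriendli(model="meta-llama-3-70b-instruct")
```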
Building the RAG Chain
We have now prepared all the components of our RAG pipeline, so let’s ask a question about the PDF file. In our case, we’ll find out what the ‘memory imbalance problem’ is in the context of BPipe.
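Here is one way to wire the pieces together with LangChain Expression Language; the prompt wording and the `format_docs` helper are illustrative choices:

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

retriever = vector_store.as_retriever()

prompt = ChatPromptTemplate.from_template(
    "Answer the question based only on the following context:\n\n"
    "{context}\n\n"
    "Question: {question}"
)

def format_docs(docs):
    # Join the retrieved chunks into a single context string.
    return "\n\n".join(doc.page_content for doc in docs)

# Retrieved chunks fill {context}; the raw question passes through unchanged.
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

print(rag_chain.invoke("What is the memory imbalance problem in BPipe?"))
```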
Upon execution, the RAG-applied model returns a response that correctly describes the memory imbalance problem from the PDF file, even though the paper was excluded from the data used to train the original model.
By following these steps and incorporating the provided code, you’ll be well on your way to implementing RAG in your applications. Remember, this is just a starting point – feel free to experiment and customize the process to suit your specific needs.
Ready to Unleash the Power of Your LLM? Experience Friendli Inference's performance! We offer three options to suit your preferences:
- Friendli Container: Deploy the engine on your own infrastructure for ultimate control.
- Friendli Dedicated Endpoints: Run any custom generative AI models on dedicated GPU instances on autopilot.
- Friendli Serverless Endpoints: No setup required, simply call our APIs and let us handle the rest.
Visit https://friendli.ai/try-friendli to begin your journey into the world of high-performance LLM serving with Friendli Inference!
Written by
FriendliAI Tech & Research
General FAQ
What is FriendliAI?
FriendliAI is a GPU inference platform that lets you deploy, scale, and monitor large language and multimodal models in production, without owning or managing GPU infrastructure. We offer three things for your AI models: unmatched speed, cost efficiency, and operational simplicity. Find out which product is the best fit for you here.
How does FriendliAI help my business?
Friendli Inference lets you squeeze more tokens per second out of every GPU. Because you need fewer GPUs to serve the same load, the metric that truly matters, tokens per dollar, comes out higher even if the hourly GPU rate looks similar on paper. View pricing
Which models and modalities are supported?
Over 380,000 text, vision, audio, and multi-modal models are deployable out of the box. You can also upload custom models or LoRA adapters. Explore models
Can I deploy models from Hugging Face directly?
Yes. One-click deployment is available: selecting “Friendli Endpoints” on the Hugging Face Hub takes you to our model deployment page, which provides an easy-to-use interface for setting up Friendli Dedicated Endpoints, a managed service for generative AI inference. Learn more about our Hugging Face partnership
Still have questions?
If you want a customized solution for the key issue that is slowing your growth, email contact@friendli.ai or click Contact Sales; our experts (not a bot) will reply within one business day.