  • May 3, 2024
  • 2 min read

Building a RAG Chatbot with Friendli, MongoDB Atlas, and LangChain

Recently, LangChain introduced support for Friendli as an LLM inference serving engine. This integration allows you to leverage Friendli Engine’s blazing-fast performance and cost-efficiency for your RAG (Retrieval-Augmented Generation) pipelines.

In this guide, we will build a simple RAG-based chatbot that answers questions about the contents of a PDF document. This tutorial will use Friendli Serverless Endpoints for LLM inference and MongoDB Atlas for the vector store.

Dependencies

First, let’s install the required packages:

bash
pip install langchain langchain-community langchain-mongodb friendli-client pypdf pymongo langchain-openai tiktoken
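
This tutorial also uses OpenAI embeddings for the vector store and Friendli Serverless Endpoints for generation, both of which read their credentials from environment variables. A minimal sketch (the values below are placeholders):

python
import os

# Placeholders: replace with your actual credentials.
os.environ["OPENAI_API_KEY"] = "YOUR OPENAI API KEY"  # used by OpenAIEmbeddings
os.environ["FRIENDLI_TOKEN"] = "YOUR FRIENDLI TOKEN"  # used by ChatFriendli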

Setting Up MongoDB Atlas

While you can run MongoDB locally, we will use MongoDB Atlas, a managed service, for this tutorial. Sign up for a MongoDB Cloud account and create a new cluster. After the cluster is set up, create a new DB and collection following their guide.

Once the DB is set up, check your MongoDB Cloud UI for the database, collection, and index names.

Then, initialize the MongoDB client with the appropriate variables:

python
from pymongo import MongoClient

MONGODB_ATLAS_CLUSTER_URI = "YOUR CLUSTER URI"

client = MongoClient(MONGODB_ATLAS_CLUSTER_URI)

# Fill in your information here
DB_NAME = "my_rag"
COLLECTION_NAME = "test"
ATLAS_VECTOR_SEARCH_INDEX_NAME = "index_name"

MONGODB_COLLECTION = client[DB_NAME][COLLECTION_NAME]

Test the connection by running:

python
client.server_info()
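
If you prefer, you can also create the collection from code instead of the Atlas UI. A small sketch using the client and names defined above (MongoDB would otherwise create the collection lazily on the first insert):

python
# Optional: create the collection explicitly instead of through the Atlas UI.
if COLLECTION_NAME not in client[DB_NAME].list_collection_names():
    client[DB_NAME].create_collection(COLLECTION_NAME)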

Creating a Vector Search Index

To use MongoDB as a vector store, you need to create a vector search index for querying. Configure the search index as follows:

json
{
  "fields": [
    {
      "numDimensions": 1536,
      "path": "embedding",
      "similarity": "cosine",
      "type": "vector"
    }
  ]
}
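
You can paste this definition into the Atlas UI’s JSON editor. Alternatively, here is a sketch of creating the same index from code with pymongo, assuming pymongo 4.7 or later (which supports the vectorSearch index type); note that a newly created index can take a minute to become queryable:

python
from pymongo.operations import SearchIndexModel

# Sketch: create the vector search index programmatically instead of in the UI.
search_index_model = SearchIndexModel(
    definition={
        "fields": [
            {
                "numDimensions": 1536,  # dimension of the OpenAI embeddings used below
                "path": "embedding",
                "similarity": "cosine",
                "type": "vector",
            }
        ]
    },
    name=ATLAS_VECTOR_SEARCH_INDEX_NAME,
    type="vectorSearch",
)
MONGODB_COLLECTION.create_search_index(model=search_index_model)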

Loading Documents and Embeddings

Now, let’s load a PDF document, split it into chunks, and insert the chunks into MongoDB Atlas along with their embeddings. In our case, we’ll load the BPipe paper from the ICML 2023 conference:

python
from langchain_mongodb.vectorstores import MongoDBAtlasVectorSearch
from langchain_community.document_loaders import PyPDFLoader
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

loader = PyPDFLoader("https://openreview.net/pdf?id=HVKmLi1iR4")
data = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=150)
docs = text_splitter.split_documents(data)

vector_store = MongoDBAtlasVectorSearch.from_documents(
    documents=docs,
    embedding=OpenAIEmbeddings(disallowed_special=()),
    collection=MONGODB_COLLECTION,
    index_name=ATLAS_VECTOR_SEARCH_INDEX_NAME,
)
retriever = vector_store.as_retriever()
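
Before wiring up the full chain, you can sanity-check the vector store with a quick similarity search once the index has finished building. For example:

python
# Retrieve the chunks most similar to a test query.
results = vector_store.similarity_search("What is the memory imbalance problem?", k=3)
for doc in results:
    print(doc.page_content[:200], "\n---")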

Initializing the LLM with Friendli

Now, let’s initialize the LLM using Friendli Serverless Endpoints with Meta’s Llama 3.1 70B Instruct model:

python
from langchain_community.chat_models.friendli import ChatFriendli

llm = ChatFriendli(model="meta-llama-3.1-70b-instruct")
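
With your Friendli personal access token exported as the FRIENDLI_TOKEN environment variable, you can quickly verify the endpoint with a direct call before adding retrieval:

python
# A standalone call to the model, without any retrieved context.
print(llm.invoke("Explain pipeline parallelism in one sentence.").content)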

Building the RAG Chain

We have prepared all the components for our RAG pipeline. Here’s how to ask questions about the PDF file. In our case, we’ll ask what the ‘memory imbalance problem’ is in the context of BPipe.

python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough

retriever = vector_store.as_retriever()

template = """Use the following pieces of context to answer the question at the end.
If you don’t know the answer, just say that you don’t know, don’t try to make up an answer.
Use three sentences maximum and keep the answer as concise as possible.
Always say “thanks for asking!” at the end of the answer.

{context}

Question: {question}

Helpful Answer:"""

prompt = PromptTemplate.from_template(template)

def format_docs(docs):
    # Concatenate the retrieved chunks into a single context string for the prompt.
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

rag_chain.invoke("What is the memory imbalance problem?")

Upon execution, you will get a response like the following from the RAG-applied model, which correctly describes the information from the PDF file even though the document was not part of the model’s training data:

text
The memory imbalance problem refers to a situation where some pipeline stages in a pipeline parallelism setup consume more memory than others, leading to inefficient utilization of GPU memory. This can occur when training large language models (LLMs) using pipeline parallelism, where the model is split into multiple stages, and each stage is processed sequentially on a different GPU.

In a pipeline parallelism setup, each stage typically requires a certain amount of memory to store the model’s parameters, optimizer states, and intermediate activations. However, the memory requirements can vary significantly between stages, depending on the model’s architecture and the specific computations performed at each stage.

The memory imbalance problem arises when some stages require significantly more memory than others, leading to:

1. **Memory underutilization**: GPUs with less memory-intensive stages may not be fully utilized, wasting computational resources.
2. **Memory overflow**: GPUs with more memory-intensive stages may run out of memory, causing the training process to slow down or even fail.

The memory imbalance problem can be exacerbated by the following factors:

* **Model size**: Larger models require more memory, making it more challenging to balance memory usage across pipeline stages.
* **Batch size**: Increasing the batch size can amplify the memory imbalance problem, as more data needs to be stored in memory.
* **Pipeline schedule**: The order in which pipeline stages are executed can affect the memory imbalance, with some schedules leading to more pronounced imbalances than others.

The memory imbalance problem can have significant consequences, including:

* **Slower training times**: Inefficient memory utilization can lead to slower training times, making it more challenging to train large language models.
* **Increased costs**: Underutilized GPUs can result in wasted computational resources, increasing the overall cost of training.

The BPIPE approach, described in the original document, addresses the memory imbalance problem by transferring activations between pipeline stages to balance memory usage and ensure efficient GPU memory utilization.
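
Since ChatFriendli supports streaming, you can also stream the answer token by token instead of waiting for the full response. A quick sketch using the rag_chain defined above:

python
# Stream the generated answer chunk by chunk.
for chunk in rag_chain.stream("What is the memory imbalance problem?"):
    print(chunk, end="", flush=True)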

By following these steps and incorporating the provided code, you’ll be well on your way to implementing RAG in your applications. Remember, this is just a starting point – feel free to experiment and customize the process to suit your specific needs.

Ready to Unleash the Power of Your LLM? Experience Friendli Engine's performance! We offer three options to suit your preferences.

Visit https://friendli.ai/try-friendli/ to begin your journey into the world of high-performance LLM serving with the Friendli Engine!


Written by

FriendliAI Tech & Research

