Friendli integrates with LangChain, LiteLLM, LlamaIndex, and MongoDB to streamline the deployment of compound GenAI applications. LangChain and LlamaIndex make it easier to build tool-calling AI agents and Retrieval-Augmented Generation (RAG) pipelines. MongoDB supports these agentic systems by providing memory through vector databases, while LiteLLM enhances performance through load balancing and evaluation.

Get a quick overview of Friendli Serverless Endpoints’ integrations and learn more through the linked resources.

LangChain

LangChain is a framework for developing applications powered by large language models (LLMs). To use Friendli Serverless Endpoints for LLM inference in LangChain, prepare a Friendli Token.

To install the required packages, run:

pip install langchain langchain-community friendli-client

Here is sample code for a streaming chat to get started with LangChain and FriendliAI:

from langchain_community.chat_models.friendli import ChatFriendli

llm = ChatFriendli(model="meta-llama-3.1-70b-instruct")

for chunk in llm.stream("Tell me a funny joke."):
    print(chunk.content, end="", flush=True)

Output:

Here's one:
Why couldn't the bicycle stand up by itself?
(Wait for it...)
Because it was two-tired!
Hope that brought a smile to your face!
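
The Friendli Token has to be available before `ChatFriendli` is constructed. A common setup is to export it as an environment variable; the name `FRIENDLI_TOKEN` used here is an assumption to confirm against the integration documentation. The following is a minimal sketch of a fail-fast check, where `require_friendli_token` is a hypothetical helper and not part of any library:

```python
import os


def require_friendli_token(env_var: str = "FRIENDLI_TOKEN") -> str:
    """Return the token stored in env_var, or raise a clear error if it is unset."""
    # The variable name is an assumption; adjust it to match your setup.
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"{env_var} is not set. Export your Friendli Token before creating the client."
        )
    return token
```

Calling `require_friendli_token()` at the top of a script surfaces a missing token as a readable error instead of an authentication failure on the first request.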

Resources

MongoDB

MongoDB Atlas is a developer data platform that offers vector stores and vector search for compound GenAI applications and works with both LangChain and LlamaIndex. To use Friendli Serverless Endpoints for LLM inference alongside MongoDB, prepare a Friendli Token. The sample below also uses OpenAI embeddings, so an OpenAI API key is required.

To install the required packages, run:

pip install pymongo friendli-client langchain langchain-mongodb langchain-community pypdf langchain-openai tiktoken

Here is sample RAG code to get started with MongoDB and FriendliAI using LangChain:

# Note: You can find a detailed explanation of this code in the blog post below.
from pymongo import MongoClient
from langchain_mongodb.vectorstores import MongoDBAtlasVectorSearch
from langchain_community.chat_models.friendli import ChatFriendli
from langchain_community.document_loaders import PyPDFLoader
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough

# Fill in your Cluster URI here.
MONGODB_ATLAS_CLUSTER_URI = "{YOUR CLUSTER URI}"

client = MongoClient(MONGODB_ATLAS_CLUSTER_URI)

# Fill in your DB information here.
DB_NAME = "{YOUR DB NAME}"
COLLECTION_NAME = "{YOUR COLLECTION NAME}"
ATLAS_VECTOR_SEARCH_INDEX_NAME = "{YOUR INDEX NAME}"

MONGODB_COLLECTION = client[DB_NAME][COLLECTION_NAME]

# Fill in your PDF link here.
loader = PyPDFLoader("{YOUR PDF DOCUMENT LINK}")
data = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=150)
docs = text_splitter.split_documents(data)

vector_store = MongoDBAtlasVectorSearch.from_documents(
    documents=docs,
    embedding=OpenAIEmbeddings(disallowed_special=()),
    collection=MONGODB_COLLECTION,
    index_name=ATLAS_VECTOR_SEARCH_INDEX_NAME,
)
retriever = vector_store.as_retriever()

llm = ChatFriendli(model="meta-llama-3.1-70b-instruct")

prompt = PromptTemplate.from_template(
    """
    Use the following pieces of context to answer the question.
    {context}
    Question: {question}
    Helpful Answer:
    """
)

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

# Input your user query here, then print the generated answer.
answer = rag_chain.invoke("{Sample Query Texts}")
print(answer)
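
The sample above assumes that an Atlas Vector Search index with the name stored in `ATLAS_VECTOR_SEARCH_INDEX_NAME` already exists on the collection. A sketch of a matching index definition follows. It assumes the vectors are stored in a field named `embedding` and that the embedding model produces 1536-dimensional vectors (the size of OpenAI's `text-embedding-ada-002`); both values are assumptions to adjust for your setup, and `build_vector_index_definition` is a hypothetical helper.

```python
import json


def build_vector_index_definition(
    path: str = "embedding",      # assumed field that stores the vectors
    num_dimensions: int = 1536,   # assumed embedding size; must match your model
    similarity: str = "cosine",
) -> dict:
    """Build an Atlas Vector Search index definition as a plain dict."""
    if similarity not in {"cosine", "euclidean", "dotProduct"}:
        raise ValueError(f"unsupported similarity: {similarity}")
    return {
        "fields": [
            {
                "type": "vector",
                "path": path,
                "numDimensions": num_dimensions,
                "similarity": similarity,
            }
        ]
    }


# Print the definition as JSON, ready to review before creating the index.
print(json.dumps(build_vector_index_definition(), indent=2))
```

If the dimension count in the index does not match the embedding model, vector queries are unlikely to return useful results, so it is worth checking this value first when retrieval comes back empty.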

Resources

LlamaIndex

LlamaIndex is a data framework designed to connect LLMs to custom data sources. To use Friendli Serverless Endpoints for LLM inference in LlamaIndex, prepare a Friendli Token. An OpenAI API key is also required to access the OpenAI embedding API.

To install the required packages, run:

pip install llama-index-llms-friendli llama-index

Here is sample code for a streaming RAG chat to get started with LlamaIndex and FriendliAI:

from llama_index.llms.friendli import Friendli
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex

Settings.llm = Friendli()

# Assumes a directory named 'data_folder' contains your PDF files.
documents = SimpleDirectoryReader('data_folder').load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine(streaming=True)

# Input your user query here.
response = query_engine.query("{Sample Query Texts}")
response.print_response_stream()
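
The LlamaIndex sample depends on a populated data directory and on two credentials. A small preflight check can catch a missing piece before any index is built. This is a sketch only: `preflight` is a hypothetical helper, and the `FRIENDLI_TOKEN` and `OPENAI_API_KEY` variable names are assumptions about how the credentials are supplied.

```python
import os
from pathlib import Path


def preflight(data_dir, required_env=("FRIENDLI_TOKEN", "OPENAI_API_KEY")):
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    folder = Path(data_dir)
    if not folder.is_dir():
        problems.append(f"directory not found: {data_dir}")
    elif not any(p.is_file() for p in folder.iterdir()):
        problems.append(f"no files in directory: {data_dir}")
    # Environment variable names are assumptions; adjust to your setup.
    for name in required_env:
        if not os.environ.get(name):
            problems.append(f"environment variable not set: {name}")
    return problems
```

For example, `for problem in preflight("data_folder"): print(problem)` lists everything that still needs fixing in one pass.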

Resources

LiteLLM

LiteLLM is a versatile platform offering access to 100+ LLMs in the OpenAI API format. To use Friendli Serverless Endpoints for LLM inference in LiteLLM, prepare a Friendli Token.

To install the required package, run:

pip install litellm

Here is sample code for a streaming chat to get started with LiteLLM and FriendliAI:

from litellm import completion

response = completion(
    # Simply change the model ID to use different LLM inference models & engines.
    model="friendliai/meta-llama-3-70b-instruct",
    messages=[
        {"role": "user", "content": "Hello from LiteLLM"}
    ],
    stream=True,
)

for chunk in response:
    # delta.content can be None (for example on the final chunk), so fall back to "".
    print(chunk.choices[0].delta.content or "", end="", flush=True)

Output:

Hello from an AI! It's great to meet you, LiteLLM! How's your day going so far?
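
When streaming, each chunk carries only a fragment of the reply in `choices[0].delta.content`, and that field can be `None` on some chunks, typically the last one. A sketch of collecting the fragments into one string follows. The chunks are stand-in objects built with `SimpleNamespace` so the example runs without a network call; `fake_chunk` and `collect_stream` are hypothetical helpers, not LiteLLM functions.

```python
from types import SimpleNamespace


def fake_chunk(content):
    """Build a stand-in object shaped like an OpenAI-style streaming chunk."""
    delta = SimpleNamespace(content=content)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


def collect_stream(chunks) -> str:
    """Join the text fragments of a stream, skipping chunks with no content."""
    parts = []
    for chunk in chunks:
        piece = chunk.choices[0].delta.content
        if piece:  # None or "" carries no text
            parts.append(piece)
    return "".join(parts)


stream = [fake_chunk("Hello"), fake_chunk(" from"), fake_chunk(" an AI!"), fake_chunk(None)]
print(collect_stream(stream))  # prints "Hello from an AI!"
```

The same `collect_stream` loop can be pointed at a real streaming response when the full text is needed after the stream ends, for logging or evaluation.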

Resources