  • December 18, 2024
  • 3 min read

Leveraging Milvus and Friendli Serverless Endpoints for Advanced RAG and Multi-Modal Queries

link to Colab: Leveraging Milvus and Friendli Serverless Endpoints for Advanced RAG and Multi-Modal Queries.ipynb

In this article, we'll explore how to use Milvus with Friendli Serverless Endpoints to perform Retrieval-Augmented Generation (RAG) over your own documents and materials, as well as to run multi-modal queries that incorporate images and other visual content. This powerful combination enables more sophisticated and context-aware AI applications.

Understanding RAG and Multi-Modal Models

Retrieval-Augmented Generation (RAG)

RAG is a technique that enhances language models by providing them with relevant information, primarily retrieved from a vector database-powered knowledge base. This approach allows AI models to generate more accurate and contextually appropriate responses by referencing designated external data sources.
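
Conceptually, the flow is simple: embed the question, retrieve the most similar chunks from the vector store, and hand those chunks to the LLM as grounding context. Here is a minimal sketch of that pattern, using hypothetical `vector_db.search` and `llm.generate` helpers; the concrete Milvus + Friendli version is built step by step in the hands-on section below:

python
def rag_answer(question, vector_db, llm):
    # Hypothetical sketch of the RAG pattern, not a real library API.
    context_chunks = vector_db.search(question, top_k=3)   # retrieve the most similar chunks
    context = "\n".join(context_chunks)                    # assemble the grounding context
    prompt = f"Context:\n{context}\n\nQuestion: {question}"
    return llm.generate(prompt)                            # generate an answer grounded in the context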

Multi-Modal Models

Multimodal models can process and understand multiple types of input data, such as text, images, and audio. They can analyze and generate responses based on diverse information sources, enabling more comprehensive and nuanced interactions.

Why Combine RAG and Multi-Modal Models?

The combination of RAG and multi-modal capabilities significantly improves AI systems by providing the following features simultaneously:

  1. Accepting richer, more diverse input types of the user’s choice
  2. Providing up-to-date information
  3. Enhancing accuracy and relevance of responses
  4. Enabling context-aware interactions

Hands-On Implementation

Let's dive into the practical implementation of RAG and multi-modal queries using the Milvus vector database and Friendli Serverless Endpoints.

Step 1: Install Prerequisites and Download Milvus Docs

First, we'll install the necessary libraries and download the Milvus documentation that we'll use as the knowledge base for our RAG pipeline:

bash
!pip install --upgrade pymilvus requests tqdm langchain langchain-community langchain-huggingface langchain-openai friendli-client tiktoken

!wget https://github.com/milvus-io/milvus-docs/releases/download/v2.4.6-preview/milvus_docs_2.4.x_en.zip
!rm -rf milvus_docs
!unzip -q milvus_docs_2.4.x_en.zip -d milvus_docs

Step 2: Process Documentation Files

Next, we'll read the Milvus documentation files and use a simple splitting strategy: each file is split on "# " headers, and each resulting section is treated as an individual chunk:

python
from glob import glob

text_lines = []

for file_path in glob("milvus_docs/en/faq/*.md", recursive=True):
   with open(file_path, "r") as file:
       file_text = file.read()

   text_lines += file_text.split("# ")

Step 3: Prepare Embeddings

We'll use LangChain's Hugging Face embeddings integration with the lightweight `all-MiniLM-L6-v2` model to create vector representations of our text:

python
from langchain_huggingface import HuggingFaceEmbeddings

embeddings_model_name = "sentence-transformers/all-MiniLM-L6-v2"
embedding = HuggingFaceEmbeddings(model_name=embeddings_model_name)

test_embedding = embedding.embed_query("This is a test")
embedding_dim = len(test_embedding)
print(embedding_dim)
print(test_embedding[:10])

Step 4: Set Up Milvus Client

Now, let's prepare the Milvus client for our RAG implementation. In this simple example, we use Milvus Lite, which runs locally and stores all data in a single local file. You can also consider other Milvus deployment options:

  • If you only need a local vector database for small-scale data or prototyping, setting the uri to a local file, e.g. `./milvus.db`, is the most convenient method, as it automatically uses Milvus Lite to store all data in that file.
  • For larger-scale data and traffic in production, you can set up a Milvus server on Docker or Kubernetes. In that setup, use the server address and port as your uri, e.g. `http://localhost:19530`. If you enable the authentication feature on Milvus, set the token to "<your_username>:<your_password>"; otherwise, there is no need to set the token.
  • You can also use fully managed Milvus on Zilliz Cloud. Simply set the uri and token to the Public Endpoint and API key of your Zilliz Cloud instance.
python
from pymilvus import MilvusClient

milvus_client = MilvusClient(uri="./milvus_demo.db")

collection_name = "my_rag_collection"
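
The snippet above uses Milvus Lite. Connecting to a self-hosted Milvus server or to Zilliz Cloud only changes how the client is constructed; a hedged sketch of the alternatives from the list above (the URIs and token values are placeholders to fill in with your own):

python
# Self-hosted Milvus on Docker/Kubernetes; the token is only needed if authentication is enabled.
# milvus_client = MilvusClient(uri="http://localhost:19530", token="<your_username>:<your_password>")

# Fully managed Milvus on Zilliz Cloud; use your instance's Public Endpoint and API key.
# milvus_client = MilvusClient(uri="<your_zilliz_public_endpoint>", token="<your_zilliz_api_key>")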

Step 5: Create Milvus Collection

We'll create a collection in the Milvus client if it doesn't already exist:

python
if milvus_client.has_collection(collection_name):
   milvus_client.drop_collection(collection_name)

milvus_client.create_collection(
   collection_name=collection_name,
   dimension=embedding_dim,
   metric_type="IP",  # Inner product distance
   consistency_level="Strong",  # Strong consistency level
)

Step 6: Embed and Insert Text into Milvus

Let's embed our text and insert it into the Milvus collection:

python
from tqdm import tqdm

data = []

for i, line in enumerate(tqdm(text_lines, desc="Creating embeddings")):
   data.append({"id": i, "vector": embedding.embed_query(line), "text": line})

milvus_client.insert(collection_name=collection_name, data=data)
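
The loop above embeds one line at a time, which keeps the progress bar informative but is slow for large corpora. If throughput matters, LangChain's `embed_documents` can embed all chunks in a single batched call; a sketch of the same insert under that assumption:

python
# Batch-embed every chunk in one call instead of one embed_query call per line.
vectors = embedding.embed_documents(text_lines)

data = [
    {"id": i, "vector": vec, "text": line}
    for i, (line, vec) in enumerate(zip(text_lines, vectors))
]
milvus_client.insert(collection_name=collection_name, data=data)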

Step 7: Perform RAG Query

Now we can ask a question and search for relevant data within our Milvus database:

python
question = "How is data stored in milvus?"

search_res = milvus_client.search(
   collection_name=collection_name,
   data=[
       embedding.embed_query(question)
   ],
   limit=3,  # Return top 3 results
   search_params={"metric_type": "IP", "params": {}},  # Inner product distance
   output_fields=["text"],  # Return the text field
)

import json

retrieved_lines_with_distances = [
   (res["entity"]["text"], res["distance"]) for res in search_res[0]
]
print(json.dumps(retrieved_lines_with_distances, indent=4))


context = "\n".join(
   [line_with_distance[0] for line_with_distance in retrieved_lines_with_distances]
)

Step 8: Create Prompts for RAG

Let's create the system and user prompts for our RAG query:

python
SYSTEM_PROMPT = """
Human: You are an AI assistant. You are able to find answers to the questions from the contextual passage snippets provided.
"""
USER_PROMPT = f"""
Use the following pieces of information enclosed in <context> tags to provide an answer to the question enclosed in <question> tags.
<context>
{context}
</context>
<question>
{question}
</question>
"""

Step 9: Set Up Friendli Token

Obtain your FRIENDLI_TOKEN from the Friendli Suite and set it as an environment variable:

python
import os

if "FRIENDLI_TOKEN" not in os.environ:
   os.environ["FRIENDLI_TOKEN"] = 'flp_FILL_IN_WITH_YOUR_OWN_PERSONAL_ACCESS_TOKEN'

Step 10: Execute RAG Query

Now we can execute our RAG query using the Friendli Serverless Endpoints:

python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
   model="meta-llama-3.1-70b-instruct",
   base_url="https://api.friendli.ai/serverless/v1",
   api_key=os.environ["FRIENDLI_TOKEN"],
)

from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

prompt = ChatPromptTemplate.from_messages([
   ("system", SYSTEM_PROMPT),
   ("user", USER_PROMPT)
])
output_parser = StrOutputParser()

chain = prompt | llm | output_parser

print(chain.invoke({"input": question}))

This produces the answer based on the provided documents:

In Milvus, data is stored in two forms: inserted data and metadata.
Inserted data (vector data, scalar data, and collection-specific schema) is stored in persistent storage as incremental logs. Milvus supports multiple object storage backends, including MinIO, AWS S3, Google Cloud Storage (GCS), Azure Blob Storage, Alibaba Cloud OSS, and Tencent Cloud Object Storage (COS).
Metadata, on the other hand, is generated within Milvus and is stored in etcd, with each Milvus module having its own metadata.

Step 11: Multi-Modal Queries

For multi-modal queries, we'll use the Llama-3.2-11b-vision model:

python
from langchain_core.messages import HumanMessage

multimodalllm = ChatOpenAI(
   model="llama-3.2-11b-vision-instruct",
   base_url="https://api.friendli.ai/serverless/beta",
   api_key=os.environ["FRIENDLI_TOKEN"],
)

image_url = "https://milvus.io/docs/v2.4.x/assets/highly-decoupled-architecture.png"
message = HumanMessage(
   content=[
       {"type": "text", "text": "describe what is in this image"},
       {"type": "image_url", "image_url": {"url": image_url}},
   ],
)

response = multimodalllm.invoke([message])
print(response.content)

From its response, we can infer that the model correctly understands the image:

The image depicts a flowchart of the components of a system, with the following components:
**Coordinator Service**
* Root
* Query

Step 12: Combine RAG and Multi-Modal Capabilities

Finally, let's combine the RAG and multi-modal capabilities:

python
question = "How is data stored in milvus with respect to this picture?"
USER_PROMPT = f"""
Use the following pieces of information enclosed in <context> tags to provide an answer to the question enclosed in <question> tags.
<context>
{context}
</context>
<question>
{question}
</question>
"""

message = HumanMessage(
   content=[
       {"type": "text", "text": USER_PROMPT},
       {"type": "image_url", "image_url": {"url": image_url}},
   ],
)

response = multimodalllm.invoke([message])
print(response.content)

The model generates a correct response based on both the image and the documents:

**Step 1: Identify the components involved in storing data in Milvus.**
The components involved in storing data in Milvus include:
*   Access Layer
*   Message Storage
*   Worker Node

**Step 2: Determine how data is stored in Milvus.**
Data is stored in the Access Layer and Message Storage.

**Step 3: Determine where data is stored in Milvus.**
Data is stored in both Access Layer and Message Storage.

**Answer:** Data is stored in both Access Layer and Message Storage.

Conclusion

This tutorial has demonstrated how to leverage Milvus and Friendli Serverless Endpoints to implement advanced RAG and multi-modal queries. By combining these powerful technologies, you can create more sophisticated AI applications that can understand and process diverse types of information, leading to more accurate and context-aware responses.


Written by

FriendliAI Tech & Research
