  • June 25, 2024
  • 3 min read

Level Up Your Client-Side Interactions with Friendli's gRPC Support

Alongside HTTP methods, Friendli Engine, your favorite LLM serving engine for generating creative text formats, also offers ways to interact with its completion services through gRPC! This blog post dives into what gRPC is and how you can leverage it within the Friendli client for a more efficient and performant experience. You can also access relevant information from our documentation.

What is gRPC?

gRPC is a modern, open-source, high-performance Remote Procedure Call (RPC) framework. It allows applications to communicate efficiently by treating remote service calls like local function calls, which translates to advanced functionality (such as built-in streaming) and more streamlined communication than traditional approaches like REST APIs.
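To make this concrete, here is a hypothetical Protocol Buffers service definition (illustration only, not Friendli's actual schema) showing how a completion-style RPC with a streamed response might be declared:

```protobuf
syntax = "proto3";

// Hypothetical service for illustration; Friendli's real .proto may differ.
service Completions {
  // Server-streaming RPC: the client sends one request and
  // receives a stream of response chunks.
  rpc Complete (CompletionRequest) returns (stream CompletionChunk);
}

message CompletionRequest {
  string prompt = 1;
  int32 top_k = 2;
}

message CompletionChunk {
  string text = 1;
}
```

The `stream` keyword on the return type is what enables chunk-by-chunk delivery, the same pattern the streaming examples below rely on.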

Friendli Containers Support HTTP and gRPC

Friendli Containers expose their functionality through both conventional HTTP and gRPC. The gRPC completion service also supports response streaming, allowing you to receive results in chunks as they become available, which is ideal for scenarios where responses are lengthy.

Using gRPC with the friendli-client SDK

To utilize gRPC with Friendli Containers, you'll need the friendli-client SDK (version 1.4.1 or later). Here's a breakdown of how to integrate it into your code:

1. Enable gRPC when Launching Friendli Container:

Start the Friendli Container with the --grpc true flag to activate the gRPC server for completions.

sh
# Fill in the values of the following variables.
export HF_MODEL_NAME=""  # Hugging Face model name (e.g., "meta-llama/Meta-Llama-3-8B-Instruct")
export FRIENDLI_CONTAINER_SECRET=""  # Friendli container secret
export FRIENDLI_CONTAINER_IMAGE=""  # Friendli container image (e.g., "registry.friendli.ai/trial")
export GPU_ENUMERATION=""  # GPUs (e.g., '"device=0,1"')

docker run \
  --gpus $GPU_ENUMERATION \
  -p 8000:8000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -e FRIENDLI_CONTAINER_SECRET=$FRIENDLI_CONTAINER_SECRET \
  $FRIENDLI_CONTAINER_IMAGE \
    --hf-model-name $HF_MODEL_NAME \
    --grpc true \
    [LAUNCH_OPTIONS]

2. Choose Your Flavor: Sync or Async

This article provides examples for both synchronous and asynchronous programming styles. Assuming that the Friendli Container gRPC server is running on 0.0.0.0:8000:

- Synchronous Approach:

python
from friendli import Friendli

client = Friendli(base_url="0.0.0.0:8000", use_grpc=True)

stream = client.completions.create(
    prompt="Explain what gRPC is.",
    stream=True,  # must be True for the gRPC completion service
    top_k=1,
)

for chunk in stream:
    print(chunk.text, end="", flush=True)

- Asynchronous Approach:

python
import asyncio
from friendli import AsyncFriendli

client = AsyncFriendli(base_url="0.0.0.0:8000", use_grpc=True)

async def run():
    stream = await client.completions.create(
        prompt="Explain what gRPC is.",
        stream=True,  # Should be True
        top_k=1,
    )

    async for chunk in stream:
        print(chunk.text, end="", flush=True)

asyncio.run(run())

3. Remember to Close Connections Properly

By default, the library closes the HTTP and gRPC connections when the client object is garbage-collected. For better resource management, however, it's recommended to close connections explicitly, either by calling the .close() method or by using the client as a context manager in a with block. For example, the example above can be rewritten with proper cleanup using a context manager:

- Synchronous Approach:

python
from friendli import Friendli

client = Friendli(base_url="0.0.0.0:8000", use_grpc=True)

with client:
    stream = client.completions.create(
        prompt="Explain what gRPC is. Also give me a Python code snippet of gRPC client.",
        stream=True,
        top_k=1,
        min_tokens=10,
    )

    for chunk in stream:
        print(chunk.text, end="", flush=True)

- Asynchronous Approach:

python
import asyncio
from friendli import AsyncFriendli

client = AsyncFriendli(base_url="0.0.0.0:8000", use_grpc=True)

async def run():
    async with client:
        stream = await client.completions.create(
            prompt="Explain what gRPC is.",
            stream=True,  # must be True for the gRPC completion service
            top_k=1,
        )

        async for chunk in stream:
            print(chunk.text, end="", flush=True)

asyncio.run(run())
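The reason the with form is safer than relying on garbage collection can be seen with a toy stand-in class (not the Friendli client): __exit__ runs even when an exception interrupts streaming, so the connection is always released.

```python
class Resource:
    """Toy stand-in for a client that holds a connection."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Runs on normal exit *and* when an exception propagates.
        self.close()
        return False  # do not suppress the exception

r = Resource()
try:
    with r:
        raise RuntimeError("failure mid-stream")
except RuntimeError:
    pass

print(r.closed)  # True: closed despite the exception
```

The same guarantee holds for async with, whose __aexit__ is awaited before the exception propagates out of the block.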

Embrace Streamlined Communication with Friendli's gRPC

gRPC offers a powerful alternative for interacting with Friendli's completion services. With its ability to handle streaming responses, gRPC provides an efficient and performant solution for various use cases. So, next time you're building applications that require real-time or chunked data from Friendli, consider leveraging the power of gRPC!

Learn more about FriendliAI at our website, blogs, or by using the Friendli Suite!


Written by


FriendliAI Tech & Research

