  • June 25, 2024
  • 3 min read

Level Up Your Client-Side Interactions with Friendli's gRPC Support

Alongside HTTP methods, Friendli Engine, your favorite LLM serving engine for generating creative text formats, also offers ways to interact with its completion services through gRPC! This blog post dives into what gRPC is and how you can leverage it within the Friendli client for a more efficient and performant experience. You can also access relevant information from our documentation.

What is gRPC?

gRPC is a modern, open-source, high-performance Remote Procedure Call (RPC) framework. It allows applications to communicate efficiently by treating remote service calls like local function calls, which translates to advanced functionality (such as built-in streaming) and more streamlined communication than traditional approaches like REST APIs.
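To make this concrete, here is a hypothetical Protocol Buffers service definition (illustration only, not Friendli's actual schema) showing how a completion-style RPC with a streamed response might be declared:

```protobuf
syntax = "proto3";

// Hypothetical service for illustration; Friendli's real .proto may differ.
service Completions {
  // Server-streaming RPC: the client sends one request and
  // receives a stream of response chunks.
  rpc Complete (CompletionRequest) returns (stream CompletionChunk);
}

message CompletionRequest {
  string prompt = 1;
  int32 top_k = 2;
}

message CompletionChunk {
  string text = 1;
}
```

The `stream` keyword on the return type is what enables chunk-by-chunk delivery, the same pattern the streaming examples below rely on.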

Friendli Containers Support HTTP and gRPC

Friendli Containers expose their functionality through both conventional HTTP and gRPC. The gRPC completion service also supports response streaming, allowing you to receive results in chunks as they become available, which is ideal for scenarios where responses are lengthy.

Using gRPC with the friendli-client SDK

To utilize gRPC with Friendli Containers, you'll need the friendli-client SDK (version 1.4.1 or later). Here's a breakdown of how to integrate it into your code:

1. Enable gRPC when Launching Friendli Container:

Start the Friendli Container with the --grpc true flag to activate the gRPC server for completions.

sh
# Fill in the values of the following variables.
export HF_MODEL_NAME=""  # Hugging Face model name (e.g., "meta-llama/Meta-Llama-3-8B-Instruct")
export FRIENDLI_CONTAINER_SECRET=""  # Friendli container secret
export FRIENDLI_CONTAINER_IMAGE=""  # Friendli container image (e.g., "registry.friendli.ai/trial")
export GPU_ENUMERATION=""  # GPUs (e.g., '"device=0,1"')

docker run \
  --gpus $GPU_ENUMERATION \
  -p 8000:8000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -e FRIENDLI_CONTAINER_SECRET=$FRIENDLI_CONTAINER_SECRET \
  $FRIENDLI_CONTAINER_IMAGE \
    --hf-model-name $HF_MODEL_NAME \
    --grpc true \
    [LAUNCH_OPTIONS]

2. Choose Your Flavor: Sync or Async

This article provides examples for both synchronous and asynchronous programming styles. Assuming that the Friendli Container gRPC server is running on 0.0.0.0:8000:

- Synchronous Approach:

python
from friendli import Friendli

client = Friendli(base_url="0.0.0.0:8000", use_grpc=True)

stream = client.completions.create(
    prompt="Explain what gRPC is.",
    stream=True,  # must be True for the gRPC completion service
    top_k=1,
)

for chunk in stream:
    print(chunk.text, end="", flush=True)

- Asynchronous Approach:

python
import asyncio
from friendli import AsyncFriendli

client = AsyncFriendli(base_url="0.0.0.0:8000", use_grpc=True)

async def run():
    stream = await client.completions.create(
        prompt="Explain what gRPC is.",
        stream=True,  # Should be True
        top_k=1,
    )

    async for chunk in stream:
        print(chunk.text, end="", flush=True)

asyncio.run(run())

3. Remember to Close Connections Properly

By default, the library closes the HTTP and gRPC connections when the client object is garbage-collected. For better resource management, however, it's recommended to close connections explicitly, either by calling the .close() method or by using the client as a context manager in a with block. For example, the example above can be rewritten with proper cleanup using a context manager:

- Synchronous Approach:

python
from friendli import Friendli

client = Friendli(base_url="0.0.0.0:8000", use_grpc=True)

with client:
    stream = client.completions.create(
        prompt="Explain what gRPC is. Also give me a Python code snippet of gRPC client.",
        stream=True,
        top_k=1,
        min_tokens=10,
    )

    for chunk in stream:
        print(chunk.text, end="", flush=True)

- Asynchronous Approach:

python
import asyncio
from friendli import AsyncFriendli

client = AsyncFriendli(base_url="0.0.0.0:8000", use_grpc=True)

async def run():
    async with client:
        stream = await client.completions.create(
            prompt="Explain what gRPC is.",
            stream=True,  # must be True for the gRPC completion service
            top_k=1,
        )

        async for chunk in stream:
            print(chunk.text, end="", flush=True)

asyncio.run(run())
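The reason the with form is safer than relying on garbage collection can be seen with a toy stand-in class (not the Friendli client): __exit__ runs even when an exception interrupts streaming, so the connection is always released.

```python
class Resource:
    """Toy stand-in for a client that holds a connection."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Runs on normal exit *and* when an exception propagates.
        self.close()
        return False  # do not suppress the exception

r = Resource()
try:
    with r:
        raise RuntimeError("failure mid-stream")
except RuntimeError:
    pass

print(r.closed)  # True: closed despite the exception
```

The same guarantee holds for async with, whose __aexit__ is awaited before the exception propagates out of the block.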

Embrace Streamlined Communication with Friendli's gRPC

gRPC offers a powerful alternative for interacting with Friendli's completion services. With its ability to handle streaming responses, gRPC provides an efficient and performant solution for various use cases. So, next time you're building applications that require real-time or chunked data from Friendli, consider leveraging the power of gRPC!

Learn more about FriendliAI at our website, blogs, or by using the Friendli Suite!


Written by


FriendliAI Tech & Research

