- June 25, 2024
Level Up Your Client-Side Interactions with Friendli's gRPC Support
Alongside HTTP, Friendli Inference, your favorite LLM serving engine, also lets you interact with its completion services through gRPC! This blog post covers what gRPC is and how you can use it from the Friendli client for more efficient communication. You can find further details in our documentation.
What is gRPC?
gRPC is a modern, open-source, high-performance Remote Procedure Call (RPC) framework that lets applications communicate by treating remote service calls like local function calls. It runs over HTTP/2 and typically serializes messages with Protocol Buffers, which gives it built-in streaming and a more compact wire format than typical JSON-based REST APIs.
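To illustrate the "remote call that looks local" idea, here is a toy sketch in plain Python. It is not gRPC: it uses JSON instead of Protocol Buffers and an in-process function instead of a network connection, and every name in it (`Stub`, `handle_request`, `PROCEDURES`, `add`) is a hypothetical stand-in chosen for illustration.

```python
import json

# Hypothetical "server side": a registry of procedures that can be called remotely.
def add(a: int, b: int) -> int:
    return a + b

PROCEDURES = {"add": add}

def handle_request(raw: bytes) -> bytes:
    """Decode a request, run the named procedure, and encode the result."""
    request = json.loads(raw)
    result = PROCEDURES[request["method"]](*request["args"])
    return json.dumps({"result": result}).encode()

class Stub:
    """Client-side stub: a method call is turned into a serialized request."""

    def __init__(self, transport):
        self._transport = transport

    def __getattr__(self, method):
        # Only reached for names that are not real attributes, e.g. "add".
        def call(*args):
            raw = json.dumps({"method": method, "args": list(args)}).encode()
            return json.loads(self._transport(raw))["result"]
        return call

# The transport here is a direct function call standing in for a network hop.
stub = Stub(transport=handle_request)
total = stub.add(2, 3)  # reads like a local call, but goes through serialization
```

In real gRPC, the stub classes are generated from a service definition rather than written by hand, but the calling code has the same shape.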
Friendli Containers Support HTTP and gRPC
Friendli Containers expose their functionality through both conventional HTTP and gRPC. The gRPC interface for the completion services supports response streaming, so you receive results in chunks as they become available, which is ideal when responses are lengthy.
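The difference between a single response and a streamed one can be sketched without any gRPC machinery. The snippet below is a pure-Python illustration only; `generate_tokens`, `unary_completion`, and `streaming_completion` are hypothetical stand-ins for a model server and are not part of Friendli's API.

```python
from typing import Iterator, List

def generate_tokens() -> Iterator[str]:
    """Hypothetical stand-in for a model that produces text piece by piece."""
    for token in ["gRPC ", "streams ", "results ", "in ", "chunks."]:
        yield token

def unary_completion() -> str:
    """Unary style: the caller sees nothing until the full text is ready."""
    return "".join(generate_tokens())

def streaming_completion() -> Iterator[str]:
    """Response-streaming style: each chunk is handed over as soon as it exists."""
    yield from generate_tokens()

# Consume the stream chunk by chunk, as a streaming client would.
chunks: List[str] = []
for chunk in streaming_completion():
    chunks.append(chunk)  # a real client could print or render each chunk here

full_text = unary_completion()
```

Both paths end with the same text; streaming simply lets the client start working with the first chunk before the last one has been generated.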
Using gRPC with the friendli-client SDK
To use gRPC with Friendli Containers, you'll need the friendli-client SDK (version 1.4.1 or later). Here's how to integrate it into your code:
1. Enable gRPC when Launching Friendli Container:
Start the Friendli Container with the --grpc true flag to activate the gRPC server for completions.
```sh
# Fill in the values of the following variables.
export HF_MODEL_NAME=""              # Hugging Face model name (e.g., "meta-llama/Meta-Llama-3-8B-Instruct")
export FRIENDLI_CONTAINER_SECRET=""  # Friendli container secret
export FRIENDLI_CONTAINER_IMAGE=""   # Friendli container image (e.g., "registry.friendli.ai/trial")
export GPU_ENUMERATION=""            # GPUs (e.g., '"device=0,1"')

docker run \
  --gpus $GPU_ENUMERATION \
  -p 8000:8000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -e FRIENDLI_CONTAINER_SECRET=$FRIENDLI_CONTAINER_SECRET \
  $FRIENDLI_CONTAINER_IMAGE \
    --hf-model-name $HF_MODEL_NAME \
    --grpc true
```
2. Choose Your Flavor: Sync or Async
This article provides examples for both synchronous and asynchronous programming styles. Assuming that the Friendli Container gRPC server is running on 0.0.0.0:8000:
- Synchronous Approach:
```python
from friendli import Friendli

client = Friendli(base_url="0.0.0.0:8000", use_grpc=True)

stream = client.completions.create(
    prompt="Explain what gRPC is.",
    stream=True,  # Should be True
    top_k=1,
)

for chunk in stream:
    print(chunk.text, end="", flush=True)
```
- Asynchronous Approach:
```python
import asyncio
from friendli import AsyncFriendli

client = AsyncFriendli(base_url="0.0.0.0:8000", use_grpc=True)

async def run():
    stream = await client.completions.create(
        prompt="Explain what gRPC is.",
        stream=True,  # Should be True
        top_k=1,
    )

    async for chunk in stream:
        print(chunk.text, end="", flush=True)

asyncio.run(run())
```
3. Remember to Close Connections Properly
By default, the library closes the HTTP and gRPC connections when the client object is garbage-collected. For better resource management, however, we recommend closing connections explicitly, either by calling the .close() method or by using the client as a context manager in a with block. For example, the earlier examples can be written with a context manager so that the connection is closed properly:
- Synchronous Approach:
```python
from friendli import Friendli

client = Friendli(base_url="0.0.0.0:8000", use_grpc=True)

with client:
    stream = client.completions.create(
        prompt="Explain what gRPC is. Also give me a Python code snippet of gRPC client.",
        stream=True,
        top_k=1,
        min_tokens=10,
    )

    for chunk in stream:
        print(chunk.text, end="", flush=True)
```
- Asynchronous Approach:
```python
import asyncio
from friendli import AsyncFriendli

client = AsyncFriendli(base_url="0.0.0.0:8000", use_grpc=True)

async def run():
    async with client:
        stream = await client.completions.create(
            prompt="Explain what gRPC is.",
            stream=True,  # Should be True
            top_k=1,
        )

        async for chunk in stream:
            print(chunk.text, end="", flush=True)

asyncio.run(run())
```
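If a with block does not fit your code structure, the explicit .close() route can be sketched with try/finally. The helper below uses only the calls shown in this post (`completions.create`, `chunk.text`, and `close`); `collect_completion` and `FakeClient` are hypothetical names, and `FakeClient` is a stand-in that lets the sketch run without a server.

```python
from types import SimpleNamespace

def collect_completion(client, prompt: str) -> str:
    """Stream a completion into one string and always close the client afterwards."""
    try:
        stream = client.completions.create(prompt=prompt, stream=True, top_k=1)
        return "".join(chunk.text for chunk in stream)
    finally:
        client.close()  # runs even if streaming raises an exception

class FakeClient:
    """Hypothetical stand-in that mimics the small client surface used above."""

    def __init__(self):
        self.closed = False
        self.completions = SimpleNamespace(create=self._create)

    def _create(self, prompt, stream, top_k):
        # Return two fake chunks, each exposing a .text attribute.
        return iter([SimpleNamespace(text="Hello, "), SimpleNamespace(text="gRPC!")])

    def close(self):
        self.closed = True

fake = FakeClient()
result = collect_completion(fake, "Explain what gRPC is.")
```

Against a running container, you would pass `Friendli(base_url="0.0.0.0:8000", use_grpc=True)` in place of `FakeClient()`.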
Embrace Streamlined Communication with Friendli's gRPC
gRPC offers a powerful alternative for interacting with Friendli's completion services. With its ability to handle streaming responses, gRPC provides an efficient and performant solution for various use cases. So, next time you're building applications that require real-time or chunked data from Friendli, consider leveraging the power of gRPC!
Learn more about FriendliAI at our website, blogs, or by using the Friendli Suite!
Written by
FriendliAI Tech & Research