- October 27, 2023
- 3 min read
LangChain Integration with PeriFlow Cloud
In this article, we will show how to use PeriFlow Cloud with LangChain. PeriFlow Cloud is our SaaS service for deploying generative AI models; it runs PeriFlow, our flagship LLM serving engine, on various cloud platforms. LangChain is a popular framework for building language model applications, offering developers a convenient way to combine multiple components into a single application. Using PeriFlow Cloud with LangChain lets developers not only write language model applications easily, but also leverage PeriFlow's capabilities to improve the performance and cost-effectiveness of serving the LLM.
Building a PeriFlow LLM interface for LangChain
LangChain provides various LLM model interfaces, and also makes it easy to define a custom interface by inheriting from LangChain's base LLM class. To get started, you need a running PeriFlow deployment and its API key; please refer to our quickstart for running a deployment on PeriFlow Cloud. PeriFlow also provides a Python SDK for text completion, so we'll use its completion API to implement our custom interface.
Here is our PeriFlow LLM interface for LangChain:
```python
from typing import Any

from langchain.callbacks.manager import CallbackManagerForLLMRun
from langchain.llms.base import LLM
from periflow import Completion, V1CompletionOptions


class PeriFlowEndpoint(LLM):
    """PeriFlow LLM interface.

    api_key: PeriFlow Cloud API key
    endpoint: PeriFlow Cloud deployment endpoint
    options: Text completion options. Please check out
        https://docs.periflow.ai/openapi/create-completions for full options
    """

    api_key: str | None = None
    endpoint: str = ""
    options: dict = dict(
        max_tokens=200,
        top_p=0.8,
        temperature=0.5,
        no_repeat_ngram=3,
    )

    @property
    def _llm_type(self) -> str:
        """Return type of llm."""
        return "periflow"

    def _call(
        self,
        prompt: str,
        stop: list[str] | None = None,
        run_manager: CallbackManagerForLLMRun | None = None,
        **kwargs: Any,
    ) -> str:
        """LLM inference method."""
        options = V1CompletionOptions(
            prompt=prompt,
            stop=stop,
            **self.options,
        )
        # Define an API endpoint instance
        api = Completion(endpoint=self.endpoint, deployment_security_level="public")
        # Request text generation from the PeriFlow Cloud deployment
        completion = api.create(options=options, stream=False)
        return completion.choices[0].text  # Returns the generated text
```
Now we can simply create an instance and use it like any other LLMs in the LangChain framework:
```python
pf_llm = PeriFlowEndpoint(
    api_key="PERIFLOW_API_KEY",
    endpoint="https://periflow-deployment-endpoint",
)
pf_llm.predict("Python is a popular")
# >> "general-purpose programming language that supports..."
```
PeriFlow also supports streaming responses, so instead of waiting for the full response, you can receive intermediate results during generation. The LangChain framework exposes a streaming interface through the _stream and _astream methods, so we'll implement them using PeriFlow's stream option.
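Before wiring this into LangChain, the core streaming pattern can be sketched in plain Python (no network or LangChain required; `fake_stream` and `on_token` below are hypothetical stand-ins for the deployment's token stream and LangChain's callback handler):

```python
from typing import Callable, Iterator, Optional


def fake_stream() -> Iterator[str]:
    """Stand-in for the token stream a deployment would return."""
    yield from ["Hello", ", ", "world"]


def stream_completion(on_token: Optional[Callable[[str], None]] = None) -> Iterator[str]:
    # Mirrors _stream: yield each token, and notify a handler if one is given
    for token in fake_stream():
        yield token
        if on_token:
            on_token(token)


received = []
text = "".join(stream_completion(on_token=received.append))
print(text)      # the full response, assembled from streamed tokens
print(received)  # the handler saw every token as it arrived
```

The caller can consume tokens as they arrive while a side-channel handler (here, `received.append`) is notified of each one, which is exactly the role `run_manager.on_llm_new_token` plays below.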
```python
import json
from typing import Any, Iterator

from langchain.schema.output import GenerationChunk


class PeriFlowEndpoint(LLM):
    ...

    def _stream(
        self,
        prompt: str,
        stop: list[str] | None = None,
        run_manager: CallbackManagerForLLMRun | None = None,
        **kwargs: Any,
    ) -> Iterator[GenerationChunk]:
        """LLM inference method with streaming option."""
        options = V1CompletionOptions(
            prompt=prompt,
            stop=stop,
            **self.options,
        )
        api = Completion(endpoint=self.endpoint, deployment_security_level="public")
        # Request generation with the streaming option
        stream = api.create(options=options, stream=True)
        # Receive and yield generated tokens in a streaming fashion
        for line in stream:
            chunk = GenerationChunk(text=json.dumps(line.model_dump()))
            yield chunk
            if run_manager:  # If a callback manager is given, invoke its token handler
                run_manager.on_llm_new_token(line.text, chunk=chunk)
```
With the streaming interface, you can display the response to the user as it’s being generated in real-time:
```python
from periflow.schema.api.v1.completion import V1CompletionLine

async for resp in pf_llm.astream("Tell me a story"):
    line = V1CompletionLine.model_validate_json(resp)
    print(line, end="")  # Asynchronously prints generated tokens
```
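Because _stream serializes each completion line to JSON before wrapping it in a GenerationChunk, the consumer decodes each chunk to recover the token. Here is a minimal sketch of that round trip in plain Python (the simulated lines below are hypothetical stand-ins for a deployment's output, not real PeriFlow responses):

```python
import json

# Hypothetical completion lines, as line.model_dump() might produce them
simulated_lines = [{"text": "Once"}, {"text": " upon"}, {"text": " a time"}]

pieces = []
for line in simulated_lines:
    chunk_text = json.dumps(line)           # what _stream puts in each GenerationChunk
    token = json.loads(chunk_text)["text"]  # what the consumer recovers per chunk
    pieces.append(token)

full_text = "".join(pieces)
print(full_text)  # the streamed tokens joined into the full response
```

Serializing each line to JSON keeps the chunk payload self-describing, so the consumer can validate it back into a typed object (as V1CompletionLine.model_validate_json does above) rather than parsing raw text.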
In summary, we’ve implemented a custom PeriFlow LLM interface for LangChain and shown how to use it with basic examples. In our next blog post, we will see how to build more complex LLM applications using PeriFlow and LangChain. Get started today with PeriFlow!
FriendliAI Tech & Research