October 27, 2023
3 min read
LangChain Integration with PeriFlow Cloud

In this article, we will show how to use PeriFlow Cloud with LangChain. PeriFlow Cloud is our SaaS for deploying generative AI models; it runs PeriFlow, our flagship LLM serving engine, on various cloud platforms. LangChain is a popular framework for building language model applications, offering developers a convenient way to combine multiple components into a single application. Using PeriFlow Cloud with LangChain lets developers not only write language model applications easily, but also leverage the capabilities of PeriFlow to make serving LLMs faster and more cost-effective.
Building a PeriFlow LLM interface for LangChain
LangChain provides various LLM interfaces and also makes it easy to define a custom one by inheriting from LangChain's base LLM class. To get started, you need a running PeriFlow deployment and its API key; please refer to our quickstart for running a deployment on PeriFlow Cloud. PeriFlow also provides a Python SDK for text completion, so we'll use its completion API to implement our custom interface.
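Before wiring the SDK into LangChain, here is a minimal sketch of calling its completion API directly, assuming a public deployment (the endpoint URL is a placeholder for your own deployment's endpoint, and the prompt is just an example):

from periflow import Completion, V1CompletionOptions

# Placeholder endpoint; replace it with your own deployment's endpoint
api = Completion(
    endpoint="https://periflow-deployment-endpoint",
    deployment_security_level="public",
)
options = V1CompletionOptions(prompt="Python is a popular", max_tokens=50)
completion = api.create(options=options, stream=False)
print(completion.choices[0].text)  # Prints the generated continuation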
Here is our PeriFlow LLM interface for LangChain:
from typing import Any

from langchain.callbacks.manager import CallbackManagerForLLMRun
from langchain.llms.base import LLM
from periflow import Completion, V1CompletionOptions


class PeriFlowEndpoint(LLM):
    """PeriFlow LLM interface.

    api_key: PeriFlow Cloud API key
    endpoint: PeriFlow Cloud deployment endpoint
    options: Text completion options.
        Please check out https://docs.periflow.ai/openapi/create-completions for the full options.
    """

    api_key: str | None = None
    endpoint: str = ""
    # Default completion options; see the docs link above for the full list
    options: dict = dict(
        max_tokens=200,
        top_p=0.8,
        temperature=0.5,
        no_repeat_ngram=3,
    )

    @property
    def _llm_type(self) -> str:
        """Return the type of the LLM."""
        return "periflow"

    def _call(
        self,
        prompt: str,
        stop: list[str] | None = None,
        run_manager: CallbackManagerForLLMRun | None = None,
        **kwargs: Any,
    ) -> str:
        """LLM inference method."""
        options = V1CompletionOptions(
            prompt=prompt,
            stop=stop,
            **self.options,
        )
        # Define an API endpoint instance
        api = Completion(endpoint=self.endpoint, deployment_security_level="public")
        # Request text generation from the PeriFlow Cloud deployment
        completion = api.create(options=options, stream=False)
        return completion.choices[0].text  # Return the generated text
Now we can simply create an instance and use it like any other LLM in the LangChain framework:
pf_llm = PeriFlowEndpoint(
    api_key="PERIFLOW_API_KEY",
    endpoint="https://periflow-deployment-endpoint",
)
pf_llm.predict("Python is a popular")
# >> "general-purpose programming language that supports..."
Streaming
PeriFlow also supports streaming responses, so instead of waiting for the full response, you can receive intermediate results during generation. The LangChain framework exposes streaming through the _stream and _astream methods, so we'll implement them using PeriFlow's stream option.
import json
from typing import Iterator

from langchain.schema.output import GenerationChunk


class PeriFlowEndpoint(LLM):
    ...

    def _stream(
        self,
        prompt: str,
        stop: list[str] | None = None,
        run_manager: CallbackManagerForLLMRun | None = None,
        **kwargs: Any,
    ) -> Iterator[GenerationChunk]:
        """LLM inference method with the streaming option."""
        options = V1CompletionOptions(
            prompt=prompt,
            stop=stop,
            **self.options,
        )
        api = Completion(endpoint=self.endpoint, deployment_security_level="public")
        # Request generation with the streaming option
        stream = api.create(options=options, stream=True)
        for line in stream:
            # Wrap each generated line as a GenerationChunk and yield it
            chunk = GenerationChunk(text=json.dumps(line.model_dump()))
            yield chunk
            if run_manager:
                # If a callback manager is given, invoke its token handler
                run_manager.on_llm_new_token(line.text, chunk=chunk)
With the streaming interface, you can display the response to the user in real time as it's being generated:
from periflow.schema.api.v1.completion import V1CompletionLine

# Inside an async context (e.g., in a coroutine run with asyncio.run):
async for resp in pf_llm.astream("Tell me a story"):
    line = V1CompletionLine.model_validate_json(resp)
    print(line.text, end="")  # Asynchronously prints each generated token
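If you don't need async, the synchronous stream method works the same way; a minimal sketch:

for resp in pf_llm.stream("Tell me a story"):
    line = V1CompletionLine.model_validate_json(resp)
    print(line.text, end="")  # Prints tokens as they arrive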
In summary, we’ve implemented a custom PeriFlow LLM interface for LangChain and shown how to use it with basic examples. In our next blog post, we will see how to build more complex LLM applications using PeriFlow and LangChain. Get started today with PeriFlow!
Written by

FriendliAI Tech & Research