LangChain Integration with PeriFlow Cloud


In this article, we show how to use PeriFlow Cloud with LangChain. PeriFlow Cloud is our SaaS offering for deploying generative AI models; it runs PeriFlow, our flagship LLM serving engine, on various cloud platforms. LangChain is a popular framework for building language model applications that gives developers a convenient way to combine multiple components into a single application. Using PeriFlow Cloud with LangChain lets developers write language model applications easily while leveraging PeriFlow to improve the performance and cost-effectiveness of LLM serving.

Building a PeriFlow LLM interface for LangChain

LangChain provides various LLM interfaces and also makes it easy to define a custom one by inheriting from its base LLM class. To get started, you need a running PeriFlow deployment and its API key; please refer to our quickstart for running a deployment on PeriFlow Cloud. PeriFlow provides a Python SDK for text completion, so we’ll use its completion API to implement our custom interface.

Here is our PeriFlow LLM interface for LangChain:

from typing import Any

from langchain.callbacks.manager import CallbackManagerForLLMRun
from langchain.llms.base import LLM
from periflow import Completion, V1CompletionOptions

class PeriFlowEndpoint(LLM):
    """PeriFlow LLM interface

    api_key:   PeriFlow Cloud API Key
    endpoint:  PeriFlow Cloud deployment endpoint
    options:   Text completion options.
               Please check out the PeriFlow documentation for full options
    """

    api_key: str | None = None
    endpoint: str = ""
    # Assumption: defaults to no extra options; set the ones your deployment needs.
    options: dict = {}

    @property
    def _llm_type(self) -> str:
        """Return type of llm."""
        return "periflow"

    def _call(
        self,
        prompt: str,
        stop: list[str] | None = None,
        run_manager: CallbackManagerForLLMRun | None = None,
        **kwargs: Any,
    ) -> str:
        """LLM inference method."""
        # Assumption: the prompt, stop sequences, and stored options are passed as keyword arguments.
        options = V1CompletionOptions(prompt=prompt, stop=stop, **self.options)
        # Define an API endpoint instance
        api = Completion(endpoint=self.endpoint, deployment_security_level="public")
        # Requests text generation to PeriFlow Cloud deployment
        completion = api.create(options=options, stream=False)
        return completion.choices[0].text  # Returns generated text

Now we can simply create an instance and use it like any other LLM in the LangChain framework:

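pf_llm = PeriFlowEndpoint(
    api_key="<YOUR_API_KEY>",        # placeholder (assumed)
    endpoint="<YOUR_ENDPOINT>",      # placeholder (assumed)
)
pf_llm.predict("Python is a popular")
# >> "general-purpose programming language that supports..."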
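
To make the subclassing pattern concrete without requiring a deployment, here is a minimal, library-free sketch of how a base class can expose a public method that delegates to a backend-specific `_call`. The names `BaseLLM` and `EchoLLM` are hypothetical stand-ins, and the stop-sequence handling is illustrative shared logic only; none of this is LangChain's actual implementation.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class BaseLLM(ABC):
    """Simplified stand-in for a framework base class (not LangChain's real LLM)."""

    @property
    @abstractmethod
    def llm_type(self) -> str:
        """Short identifier for the backend."""

    @abstractmethod
    def _call(self, prompt: str, stop: list[str] | None = None) -> str:
        """Backend-specific completion; subclasses implement only this."""

    def predict(self, prompt: str, stop: list[str] | None = None) -> str:
        # Shared logic lives in the base class: validate, delegate, post-process.
        if not prompt:
            raise ValueError("prompt must not be empty")
        text = self._call(prompt, stop=stop)
        # Illustrative post-processing: cut the text at each requested stop sequence.
        for s in stop or []:
            idx = text.find(s)
            if idx != -1:
                text = text[:idx]
        return text


class EchoLLM(BaseLLM):
    """Hypothetical backend that returns a canned continuation."""

    @property
    def llm_type(self) -> str:
        return "echo"

    def _call(self, prompt: str, stop: list[str] | None = None) -> str:
        return f"{prompt} language. It is widely used."


llm = EchoLLM()
result = llm.predict("Python is a popular", stop=["."])
print(result)
```

The point of the design is that a new backend only has to supply `_call` and a type name; everything callers interact with is inherited.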


PeriFlow also supports streaming responses, so instead of waiting for the full response, you can receive intermediate results during generation. LangChain supports streaming as well through the _stream and _astream methods, so we’ll implement _stream using PeriFlow’s stream option.

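import json
from typing import Any, Iterator

from langchain.callbacks.manager import CallbackManagerForLLMRun
from langchain.llms.base import LLM
from langchain.schema.output import GenerationChunk
from periflow import Completion, V1CompletionOptions

class PeriFlowEndpoint(LLM):
    # Fields, _llm_type, and _call are the same as in the listing above.

    def _stream(
        self,
        prompt: str,
        stop: list[str] | None = None,
        run_manager: CallbackManagerForLLMRun | None = None,
        **kwargs: Any,
    ) -> Iterator[GenerationChunk]:
        """LLM inference method with streaming option."""
        # Assumption: the prompt, stop sequences, and stored options are passed as keyword arguments.
        options = V1CompletionOptions(prompt=prompt, stop=stop, **self.options)
        api = Completion(endpoint=self.endpoint, deployment_security_level="public")
        stream = api.create(options=options, stream=True)  # Requests generation with streaming option
        for line in stream:
            # Receives and returns generated tokens in streaming fashion
            chunk = GenerationChunk(text=json.dumps(line.model_dump()))
            yield chunk
            if run_manager:
                # If the callback manager is given, invokes its token handler
                run_manager.on_llm_new_token(line.text, chunk=chunk)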
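
The generator-plus-callback shape of `_stream` can be tried without any deployment. The sketch below uses only the standard library; `Chunk`, `fake_backend`, and `stream` are hypothetical stand-ins for the chunk type, the PeriFlow stream, and the method above.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Iterator


@dataclass
class Chunk:
    """Stand-in for a streamed generation chunk."""
    text: str


def fake_backend(prompt: str) -> Iterator[str]:
    """Hypothetical backend that emits a fixed reply token by token."""
    for token in ["Once", " upon", " a", " time"]:
        yield token


def stream(
    prompt: str,
    on_new_token: Callable[[str], None] | None = None,
) -> Iterator[Chunk]:
    """Yield chunks as they arrive and notify an optional token callback."""
    for token in fake_backend(prompt):
        chunk = Chunk(text=token)
        yield chunk
        if on_new_token is not None:
            # Runs when the consumer asks for the next chunk.
            on_new_token(token)


seen: list[str] = []
chunks = list(stream("Tell me a story", on_new_token=seen.append))
full_text = "".join(c.text for c in chunks)
print(full_text)
```

One consequence of calling the callback after `yield`: it fires only once the consumer requests the next chunk, so a consumer that stops early will not trigger the callback for the last chunk it received.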

With the streaming interface, you can display the response to the user in real time as it is being generated:

import asyncio

from periflow.schema.api.v1.completion import V1CompletionLine

async def main() -> None:
    async for resp in pf_llm.astream("Tell me a story"):
        line = V1CompletionLine.model_validate_json(resp)
        print(line.text, end="", flush=True)   # Asynchronously prints generated tokens

asyncio.run(main())
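
Because each streamed chunk carries a JSON-serialized line, the consumer has to decode it before reading the text. The following standard-library sketch shows that round trip with an async generator; `Line`, `fake_astream`, and `collect` are hypothetical stand-ins for the completion line type, the LLM's async stream, and the consuming code.

```python
from __future__ import annotations

import asyncio
import json
from dataclasses import asdict, dataclass
from typing import AsyncIterator


@dataclass
class Line:
    """Stand-in for one streamed completion line."""
    index: int
    text: str


async def fake_astream(prompt: str) -> AsyncIterator[str]:
    """Hypothetical async stream that yields each line serialized as JSON."""
    for i, token in enumerate(["Once", " upon", " a", " time"]):
        await asyncio.sleep(0)  # hand control back to the event loop, as a network read would
        yield json.dumps(asdict(Line(index=i, text=token)))


async def collect(prompt: str) -> str:
    """Decode each JSON payload and join the token texts."""
    pieces: list[str] = []
    async for raw in fake_astream(prompt):
        line = Line(**json.loads(raw))  # decode the JSON payload back into a record
        pieces.append(line.text)
    return "".join(pieces)


story = asyncio.run(collect("Tell me a story"))
print(story)
```

Serializing the whole line keeps metadata such as the index available to the consumer, at the cost of a decode step on every chunk.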

In summary, we’ve implemented a custom PeriFlow LLM interface for LangChain and shown how to use it with basic examples. In our next blog post, we will see how to build more complex LLM applications using PeriFlow and LangChain. Get started today with PeriFlow!

