- June 10, 2024
- 3 min read
Introducing Structured Output on Friendli Inference for Building LLM Agents

Large language models (LLMs) excel at creative text generation, but we often need their outputs to follow a strict structure. This is where our exciting new "structured output" feature comes in.
Why Structured Generation Matters
Imagine you're building a data pipeline. You want an LLM to analyze text and output the sentiment in a machine-readable format, like JSON, which can then be fed into other system components (e.g., for building LLM agents) that expect inputs with exact syntax. Free-flowing text is likely to cause parsing errors when consumed naively by downstream code, so you need a guaranteed structure for seamless integration with other tools. Structured output lets you achieve this by:
- Enforcing Patterns (Regex): Constrain the LLM's output to a specific pattern (e.g., CSV rows), a character set, or the characters of a particular language (e.g., Korean Hangul).
- Enforcing Format (JSON): Specify a particular format, like a JSON schema, so the output can be imported directly into your code. Structured output makes this happen seamlessly within your query.
The Challenge: Dealing with the Probabilistic Nature of LLMs
LLMs are probabilistic: they shine at generating creative text, but following strict rules can be tricky for them. Even with careful prompt engineering, enforcing specific formats and patterns isn't always straightforward.
Structured Output: Enforcing Strict Syntax for LLMs
Here's how we ensure structured output:
- Token Filtering: We build a filter that allows only valid "tokens" to be generated by the LLM at each decoding step. Structured output lets you define this filter, ensuring the LLM's output adheres to your format (see the sketch after this list).
- The `response_format` Query Parameter: Concretely, Friendli Inference accepts this option in your LLM queries, letting you specify the desired output structure. It supports JSON schemas and regular expressions, which extends to many use cases, including character restrictions and CSV generation!
- Integrated in Friendli Inference: Friendli Inference powers all Friendli products, including Friendli Container, Dedicated Endpoints, and Serverless Endpoints, so this feature is readily available in every one of them.
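To make the token-filtering idea concrete, here is a minimal, illustrative sketch in plain Python. This is not Friendli Inference's actual implementation: a real engine compiles the schema or regex into a matcher over the tokenizer's vocabulary and applies the mask to the logits at every decoding step, and every name below is invented for illustration.

```python
import math

# Hypothetical logits over a toy vocabulary (token -> log-probability).
logits = {"{": -0.1, "\"": -0.7, "Hello": -0.3}

def mask_disallowed(logits: dict[str, float], allowed: set[str]) -> dict[str, float]:
    """Set every token the format does not allow next to -inf."""
    return {tok: (lp if tok in allowed else -math.inf) for tok, lp in logits.items()}

# A matcher derived from the JSON schema or regex reports which tokens keep
# the partial output valid; e.g., a JSON object must start with "{".
allowed_now = {"{"}
print(mask_disallowed(logits, allowed_now))
# {'{': -0.1, '"': -inf, 'Hello': -inf}
```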
Real-World Examples
Let's see structured output in action:
- Example 1: Structured Sentiment Analysis (JSON Schema): Imagine analyzing customer reviews and needing sentiment scores in a specific JSON format for further analysis. Structured output allows you to define the exact JSON schema, ensuring the LLM outputs data perfectly formatted for your needs. Let’s check out the example code below, where the comments describe each of the components:
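(The sketch below assumes Friendli's OpenAI-compatible chat completions API; the base URL, model name, and the exact field names inside `response_format` are assumptions for illustration, so check the Friendli documentation for the authoritative shapes.)

```python
import json
import os

from openai import OpenAI  # Friendli endpoints expose an OpenAI-compatible API

# Assumed base URL for Friendli Serverless Endpoints; adjust per product.
client = OpenAI(
    base_url="https://api.friendli.ai/serverless/v1",
    api_key=os.environ["FRIENDLI_TOKEN"],
)

# The JSON schema the output must follow: three properties describing
# the sentiment of a customer review.
json_schema = {
    "type": "object",
    "properties": {
        "sentiment": {"type": "string", "enum": ["positive", "neutral", "negative"]},
        "score": {"type": "number"},    # model's confidence in [0, 1]
        "summary": {"type": "string"},  # one-sentence justification
    },
    "required": ["sentiment", "score", "summary"],
}

completion = client.chat.completions.create(
    model="meta-llama-3-8b-instruct",  # placeholder model name
    messages=[
        {"role": "user",
         "content": "Analyze the sentiment of this review: "
                    "'The battery lasts forever, but the screen scratches easily.'"},
    ],
    # Assumed payload shape; see the Friendli docs for the exact fields.
    response_format={"type": "json_object", "schema": json.dumps(json_schema)},
)

# The constrained output parses cleanly as JSON.
result = json.loads(completion.choices[0].message.content)
print(result["sentiment"], result["score"], result["summary"])
```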
As you can see from the example, the output message contains a JSON-formatted LLM output with the three properties that we specified in our json_schema.
- Example 2: Language-Specific Results (Regex): If you need search results consisting of only a specific set of characters (e.g., Korean letters), structured output lets you define a regular expression that restricts the LLM's output to those characters (i.e., ensuring only Korean text is generated). Let’s check out the example code below:
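(Reusing the `client` from Example 1; as before, the exact `response_format` fields are assumptions for illustration.)

```python
completion = client.chat.completions.create(
    model="meta-llama-3-8b-instruct",  # placeholder model name
    messages=[
        {"role": "user", "content": "Describe Seoul in one short sentence, in Korean."},
    ],
    # Character class: Hangul syllables, digits, spaces, and basic punctuation.
    # Assumed payload shape; see the Friendli docs for the exact fields.
    response_format={"type": "regex", "schema": "[가-힣0-9 .,!?]*"},
)
print(completion.choices[0].message.content)
```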
You can see that the content of the generated message contains only characters permitted by our regex.
- Example 3: Data Wrangling with CSVs (Regex): One can easily scrape product information from websites and import it into a spreadsheet with AI. Structured output lets you define a CSV format, allowing the LLM to directly generate comma-separated data, ready for Excel's powerful functionalities. Let’s check out the example code:
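(A sketch along the same lines, again reusing the `client` from Example 1, with a regex that forces rows of name, price, and quantity; the payload shape is an assumption as before.)

```python
# Regex forcing one or more CSV rows of the form: name,price,quantity
csv_regex = r"([A-Za-z ]+,[0-9]+\.[0-9]{2},[0-9]+\n)+"

completion = client.chat.completions.create(
    model="meta-llama-3-8b-instruct",  # placeholder model name
    messages=[
        {"role": "user",
         "content": "List three products scraped from an electronics store "
                    "as CSV rows of name, price, and quantity in stock."},
    ],
    # Assumed payload shape; see the Friendli docs for the exact fields.
    response_format={"type": "regex", "schema": csv_regex},
)

# Save directly as a spreadsheet-ready CSV file.
with open("products.csv", "w") as f:
    f.write(completion.choices[0].message.content)
```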
We can see that the result is in CSV format, ready to be imported into a spreadsheet.
Beyond Overcoming Probabilistic Errors
Structured output empowers you to create LLMs that are (almost) free from probabilistic errors when it comes to format and pattern adherence. This opens doors for building robust and reliable pipelines and LLM agents that leverage the power of LLMs with the control you crave.
Try it out today and unlock a whole new level of control over your LLM's creations! We offer three options to suit your preferences:
- Friendli Container: Deploy the engine on your own infrastructure for ultimate control.
- Friendli Dedicated Endpoints: Run any custom generative AI models on dedicated GPU instances, on autopilot.
- Friendli Serverless Endpoints: No setup required, simply call our APIs and let us handle the rest.
Visit https://friendli.ai/try-friendli to begin your journey into the world of high-performance LLM serving with the Friendli Inference!
Written by
FriendliAI Tech & Research
General FAQ
What is FriendliAI?
FriendliAI is a GPU-inference platform that lets you deploy, scale, and monitor large language and multimodal models in production, without owning or managing GPU infrastructure. We offer three things for your AI models: unmatched speed, cost efficiency, and operational simplicity. Find out which product is the best fit for you here.
How does FriendliAI help my business?
Our Friendli Inference allows you to squeeze more tokens per second out of every GPU. Because you need fewer GPUs to serve the same load, the metric that matters, tokens per dollar, comes out higher even if the hourly GPU rate looks similar on paper. View pricing
Which models and modalities are supported?
Over 380,000 text, vision, audio, and multi-modal models are deployable out of the box. You can also upload custom models or LoRA adapters. Explore models
Can I deploy models from Hugging Face directly?
Yes. A one-click deploy by selecting “Friendli Endpoints” on the Hugging Face Hub will take you to our model deployment page. The page provides an easy-to-use interface for setting up Friendli Dedicated Endpoints, a managed service for generative AI inference. Learn more about our Hugging Face partnership
Still have questions?
If you want a customized solution for the key issue that is slowing your growth, email contact@friendli.ai or click Talk to an expert; our experts (not a bot) will reply within one business day.