- June 10, 2024
- 3 min read
Introducing Structured Output on Friendli Inference for Building LLM Agents

Large language models (LLMs) excel at creative text generation, but we often need their outputs to follow a strict structure. This is where our exciting new "structured output" feature comes in.
Why Structured Generation Matters
Imagine you're building a data pipeline. You want an LLM to analyze text and output the sentiment in a machine-readable format, like JSON, which can then be fed into other system components (e.g., for building LLM agents) that expect inputs with exact syntax. Free-flowing text is likely to cause parsing errors when consumed naively by downstream code, so you need a guaranteed structure for seamless integration with other tools. Structured output lets you achieve this by:
- Enforcing Patterns (Regex): Constrain the LLM's output to a specific pattern (e.g., CSV rows), a character set, or the characters of a particular language (e.g., Korean Hangul).
- Enforcing Format (JSON): Specify a particular format, like a JSON schema, so the output can be imported directly into your code. Structured output makes this happen seamlessly within your query.
The Challenge: Dealing with the Probabilistic Nature of LLMs
LLMs are probabilistic: they shine at generating creative text, but following strict rules can be tricky for them. Even with careful prompt engineering, enforcing specific formats and patterns isn't always straightforward.
Structured Output: Enforcing Strict Syntax for LLMs
Here's how we ensure structured output:
- Token Filtering: We build a filter that allows only valid "tokens" to be generated by the LLM at each decoding step. Structured output lets you define this filter, ensuring the LLM's output adheres to your format (see the sketch after this list).
- The `response_format` Query Parameter: Concretely, Friendli Inference accepts this option in your LLM queries, letting you specify the desired output structure. It supports JSON schemas and regular expressions, which extends to many use cases, including character restrictions and CSV generation!
- Integrated in Friendli Inference: Friendli Inference powers all Friendli products, including Friendli Container, Dedicated Endpoints, and Serverless Endpoints, so this feature is readily available in every one of them.
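To make the token-filtering idea concrete, here is a minimal, illustrative sketch in plain Python. This is not Friendli Inference's actual implementation: a real engine compiles the schema or regex into a matcher over the tokenizer's vocabulary and applies the mask to the logits at every decoding step, and every name below is invented for illustration.

```python
import math

# Hypothetical logits over a toy vocabulary (token -> log-probability).
logits = {"{": -0.1, "\"": -0.7, "Hello": -0.3}

def mask_disallowed(logits: dict[str, float], allowed: set[str]) -> dict[str, float]:
    """Set every token the format does not allow next to -inf."""
    return {tok: (lp if tok in allowed else -math.inf) for tok, lp in logits.items()}

# A matcher derived from the JSON schema or regex reports which tokens keep
# the partial output valid; e.g., a JSON object must start with "{".
allowed_now = {"{"}
print(mask_disallowed(logits, allowed_now))
# {'{': -0.1, '"': -inf, 'Hello': -inf}
```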
Real-World Examples
Let's see structured output in action:
- Example 1: Structured Sentiment Analysis (JSON Schema): Imagine analyzing customer reviews and needing sentiment scores in a specific JSON format for further analysis. Structured output allows you to define the exact JSON schema, ensuring the LLM outputs data perfectly formatted for your needs. Let’s check out the example code below, where the comments describe each of the components:
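(The sketch below assumes Friendli's OpenAI-compatible chat completions API; the base URL, model name, and the exact field names inside `response_format` are assumptions for illustration, so check the Friendli documentation for the authoritative shapes.)

```python
import json
import os

from openai import OpenAI  # Friendli endpoints expose an OpenAI-compatible API

# Assumed base URL for Friendli Serverless Endpoints; adjust per product.
client = OpenAI(
    base_url="https://api.friendli.ai/serverless/v1",
    api_key=os.environ["FRIENDLI_TOKEN"],
)

# The JSON schema the output must follow: three properties describing
# the sentiment of a customer review.
json_schema = {
    "type": "object",
    "properties": {
        "sentiment": {"type": "string", "enum": ["positive", "neutral", "negative"]},
        "score": {"type": "number"},    # model's confidence in [0, 1]
        "summary": {"type": "string"},  # one-sentence justification
    },
    "required": ["sentiment", "score", "summary"],
}

completion = client.chat.completions.create(
    model="meta-llama-3-8b-instruct",  # placeholder model name
    messages=[
        {"role": "user",
         "content": "Analyze the sentiment of this review: "
                    "'The battery lasts forever, but the screen scratches easily.'"},
    ],
    # Assumed payload shape; see the Friendli docs for the exact fields.
    response_format={"type": "json_object", "schema": json.dumps(json_schema)},
)

# The constrained output parses cleanly as JSON.
result = json.loads(completion.choices[0].message.content)
print(result["sentiment"], result["score"], result["summary"])
```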
As you can see from the example, the output message contains a JSON-formatted LLM output with the three properties that we specified in our json_schema.
- Example 2: Language-Specific Results (Regex): If you need search results consisting of only a specific set of characters (e.g., Korean letters), structured output lets you define a regular expression that restricts the LLM's output to those characters (i.e., ensuring only Korean text is generated). Let’s check out the example code below:
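(Reusing the `client` from Example 1; as before, the exact `response_format` fields are assumptions for illustration.)

```python
completion = client.chat.completions.create(
    model="meta-llama-3-8b-instruct",  # placeholder model name
    messages=[
        {"role": "user", "content": "Describe Seoul in one short sentence, in Korean."},
    ],
    # Character class: Hangul syllables, digits, spaces, and basic punctuation.
    # Assumed payload shape; see the Friendli docs for the exact fields.
    response_format={"type": "regex", "schema": "[가-힣0-9 .,!?]*"},
)
print(completion.choices[0].message.content)
```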
You can see that the content of the generated message contains only characters permitted by our regex.
- Example 3: Data Wrangling with CSVs (Regex): One can easily scrape product information from websites and import it into a spreadsheet with AI. Structured output lets you define a CSV format, allowing the LLM to directly generate comma-separated data, ready for Excel's powerful functionalities. Let’s check out the example code:
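(A sketch along the same lines, again reusing the `client` from Example 1, with a regex that forces rows of name, price, and quantity; the payload shape is an assumption as before.)

```python
# Regex forcing one or more CSV rows of the form: name,price,quantity
csv_regex = r"([A-Za-z ]+,[0-9]+\.[0-9]{2},[0-9]+\n)+"

completion = client.chat.completions.create(
    model="meta-llama-3-8b-instruct",  # placeholder model name
    messages=[
        {"role": "user",
         "content": "List three products scraped from an electronics store "
                    "as CSV rows of name, price, and quantity in stock."},
    ],
    # Assumed payload shape; see the Friendli docs for the exact fields.
    response_format={"type": "regex", "schema": csv_regex},
)

# Save directly as a spreadsheet-ready CSV file.
with open("products.csv", "w") as f:
    f.write(completion.choices[0].message.content)
```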
We can see that the result is in CSV format, ready to be imported into a spreadsheet.
Beyond Overcoming Probabilistic Errors
Structured output empowers you to create LLMs that are (almost) free from probabilistic errors when it comes to format and pattern adherence. This opens doors for building robust and reliable pipelines and LLM agents that leverage the power of LLMs with the control you crave.
Try it out today and unlock a whole new level of control over your LLM's creations! We offer three options to suit your preferences:
- Friendli Container: Deploy the engine on your own infrastructure for ultimate control.
- Friendli Dedicated Endpoints: Run any custom generative AI models on dedicated GPU instances, on autopilot.
- Friendli Serverless Endpoints: No setup required, simply call our APIs and let us handle the rest.
Visit https://friendli.ai/try-friendli to begin your journey into the world of high-performance LLM serving with the Friendli Inference!
Written by
FriendliAI Tech & Research
General FAQ
What is FriendliAI?
FriendliAI is a GPU-inference platform that lets you deploy, scale, and monitor large language and multimodal models in production, without owning or managing GPU infrastructure. We offer three things for your AI models: unmatched speed, cost efficiency, and operational simplicity. Find out which product is the best fit for you here.
How does FriendliAI help my business?
Our Friendli Inference allows you to squeeze more tokens per second out of every GPU. Because you need fewer GPUs to serve the same load, the metric that matters, tokens per dollar, comes out higher even if the hourly GPU rate looks similar on paper. View pricing
Which models and modalities are supported?
Over 380,000 text, vision, audio, and multi-modal models are deployable out of the box. You can also upload custom models or LoRA adapters. Explore models
Can I deploy models from Hugging Face directly?
Yes. A one-click deploy by selecting “Friendli Endpoints” on the Hugging Face Hub will take you to our model deployment page. The page provides an easy-to-use interface for setting up Friendli Dedicated Endpoints, a managed service for generative AI inference. Learn more about our Hugging Face partnership
Still have questions?
If you want a customized solution for the key issue that is slowing your growth, email contact@friendli.ai or click Talk to an expert; our experts (not a bot) will reply within one business day.