
OUR RESEARCH

Published by FriendliAI and SNU at OSDI 2022

Orca: A Distributed Serving System for Transformer-Based Generative Models

Orca introduces iteration-level scheduling and selective batching (together known as continuous batching) for serving large-scale Transformer-based models on generation tasks. Orca significantly outperforms NVIDIA FasterTransformer, delivering a 36.9x throughput improvement at the same level of latency.
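The core idea is that the serving system forms a new batch at every decoding iteration rather than waiting for an entire batch to finish. Below is a minimal Python sketch of iteration-level scheduling with continuous batching; it is not Orca's implementation, and `Request`, `serve`, and the `step` callback (standing in for one forward pass of the model) are illustrative names.

```python
from dataclasses import dataclass, field

@dataclass
class Request:
    prompt_tokens: list           # input token ids
    max_new_tokens: int           # generation budget
    generated: list = field(default_factory=list)

def finished(req: Request, eos_id: int = 0) -> bool:
    return (len(req.generated) >= req.max_new_tokens
            or (req.generated and req.generated[-1] == eos_id))

def serve(queue: list, step, max_batch: int = 8) -> list:
    """Iteration-level scheduling: rebuild the batch at every decoding
    iteration instead of waiting for the whole batch to drain."""
    running, results = [], []
    while queue or running:
        # Admit waiting requests the moment a slot frees up
        # (continuous batching).
        while queue and len(running) < max_batch:
            running.append(queue.pop(0))
        # One model iteration yields exactly one new token per request;
        # selective batching lets requests of different lengths share it.
        for req, tok in zip(running, step(running)):
            req.generated.append(tok)
        # Retire finished requests immediately; the rest keep their slot.
        results += [r for r in running if finished(r)]
        running = [r for r in running if not finished(r)]
    return results

# Toy usage: a fake "model" that emits the EOS token on its third step.
def toy_step(batch):
    return [0 if len(r.generated) == 2 else 1 for r in batch]

done = serve([Request([1, 2], 5), Request([3], 5)], toy_step)
print([r.generated for r in done])  # [[1, 1, 0], [1, 1, 0]]
```

Because a finished request is replaced immediately, short requests never wait for long ones, which is where the throughput gain over request-level batching comes from.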

Read more

Published by FriendliAI and SNU at ICML 2023

BPipe: Memory-Balanced Pipeline Parallelism for Training Large Language Models

BPipe balances memory utilization across GPUs and improves the training efficiency of LLMs by eliminating activation recomputation or enabling larger micro-batch sizes. BPipe employs an activation balancing method that transfers intermediate activations between GPUs during training, so that all GPUs use comparable amounts of memory.
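The imbalance BPipe targets can be seen with a back-of-the-envelope calculation. Under the widely used 1F1B pipeline schedule, stage i of p stages buffers activations for up to p - i in-flight micro-batches, so the first stage holds roughly p times the activation memory of the last. The Python sketch below pairs each stage with its mirror stage and splits their buffers evenly; this is an illustrative simplification of activation balancing, not the paper's exact transfer schedule.

```python
def onef1b_activation_buffers(num_stages: int) -> list:
    """Under a 1F1B schedule, stage i keeps activations for up to
    (num_stages - i) in-flight micro-batches: memory is imbalanced,
    with the first stage holding the most and the last the fewest."""
    return [num_stages - i for i in range(num_stages)]

def balanced_buffers(num_stages: int) -> list:
    """Sketch of memory balancing: pair stage i with its mirror stage
    (num_stages - 1 - i) and let the memory-heavy member park part of
    its activations on the memory-light one, so each member of a pair
    ends up holding roughly the pair's average."""
    raw = onef1b_activation_buffers(num_stages)
    out = list(raw)
    for i in range(num_stages // 2):
        j = num_stages - 1 - i
        out[i] = out[j] = (raw[i] + raw[j]) / 2
    return out

print(onef1b_activation_buffers(8))  # [8, 7, 6, 5, 4, 3, 2, 1]
print(balanced_buffers(8))           # [4.5, 4.5, 4.5, ..., 4.5]
```

With the peak requirement roughly halved, the freed memory can be spent on skipping activation recomputation or on larger micro-batches, which is the efficiency gain described above.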

Read more

Published by FriendliAI and SNU in the Proceedings of the VLDB Endowment

Hippo: Sharing Computations in Hyper-Parameter Optimization

Hippo is a hyper-parameter optimization system that reuses computation across trials to significantly reduce the overall amount of computation. Hippo breaks each trial's hyper-parameter sequence into stages and merges stages common across trials to form a stage tree.
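A minimal sketch of the stage-tree idea, assuming each stage can be keyed by an (operation, hyper-parameter) pair: trials that share a prefix of stages share the corresponding tree nodes, so the shared computation runs only once. The class and stage names are illustrative, not Hippo's API.

```python
class StageTree:
    """Merge trials' stage sequences into a tree: a path from the root
    is a trial, and shared prefixes collapse into shared nodes."""

    def __init__(self):
        self.root = {}
        self.unique_stages = 0

    def add_trial(self, stages) -> int:
        """Insert one trial's stage sequence; return how many stages are
        new (must be computed) rather than reused from earlier trials."""
        node, new = self.root, 0
        for stage in stages:
            if stage not in node:
                node[stage] = {}
                new += 1
                self.unique_stages += 1
            node = node[stage]
        return new

# Two learning-rate-schedule trials sharing their first two stages:
# only 4 of the 6 stages need to be computed after merging.
tree = StageTree()
t1 = [("preprocess", "std"), ("train", "lr=0.1"), ("train", "lr=0.01")]
t2 = [("preprocess", "std"), ("train", "lr=0.1"), ("train", "lr=0.001")]
print(tree.add_trial(t1))   # 3 -- all stages are new
print(tree.add_trial(t2))   # 1 -- the two-stage prefix is reused
print(tree.unique_stages)   # 4 instead of 6
```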

Read more
