Research
Published by FriendliAI and SNU at OSDI 2022
Orca: A Distributed Serving System for Transformer-Based Generative Models
Orca introduces iteration-level scheduling and selective batching (together known as continuous batching) for serving large-scale Transformer-based models on generation tasks. Orca significantly outperforms NVIDIA FasterTransformer, delivering a 36.9x throughput improvement at the same level of latency.
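As a rough illustration of iteration-level scheduling, the Python sketch below re-forms the serving batch after every decoding step instead of waiting for the entire batch to finish, so completed requests leave immediately and waiting requests join right away. The `Request` class, `run_iteration` stub, and batch-size limit are illustrative assumptions, not Orca's actual implementation.

```python
from dataclasses import dataclass, field
from collections import deque

@dataclass
class Request:
    prompt: str
    max_tokens: int
    generated: list = field(default_factory=list)

    def finished(self) -> bool:
        return len(self.generated) >= self.max_tokens

def run_iteration(batch):
    """Stand-in for one forward pass of the model: every request in the
    current batch produces exactly one new token."""
    for req in batch:
        req.generated.append("tok")  # dummy token

def serve(queue: deque, max_batch_size: int = 8):
    """Iteration-level scheduling: rebuild the batch after *every*
    decoding step rather than once per batch of requests."""
    batch = []
    while queue or batch:
        # Admit new requests up to the batch-size limit.
        while queue and len(batch) < max_batch_size:
            batch.append(queue.popleft())
        run_iteration(batch)
        # Return completed requests; keep the rest for the next step.
        done = [r for r in batch if r.finished()]
        batch = [r for r in batch if not r.finished()]
        for r in done:
            print("completed:", r.prompt, len(r.generated), "tokens")

serve(deque([Request("hello", 3), Request("world", 5)]))
```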
Published by FriendliAI and SNU at ICML ‘23
BPipe: Memory-Balanced Pipeline Parallelism for Training Large Language Models
BPipe balances memory utilization across GPUs and enhances the training efficiency of LLMs by eliminating recomputation or increasing the micro-batch size. BPipe employs an activation balancing method that transfers intermediate activations between GPUs during training, enabling all GPUs to utilize comparable amounts of memory.
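The sketch below illustrates the imbalance that motivates activation balancing: under a 1F1B pipeline schedule, earlier stages keep more micro-batch activations in flight than later ones, so a balancing plan can pair memory-heavy stages with memory-light ones and shift part of the surplus. Both functions are hypothetical simplifications for illustration, not the BPipe algorithm itself.

```python
def inflight_activations(num_stages: int, num_microbatches: int):
    """Peak number of in-flight micro-batch activations per stage under a
    1F1B schedule: stage i holds up to (num_stages - i) activations, so
    early stages need far more memory than later ones."""
    return [min(num_stages - i, num_microbatches) for i in range(num_stages)]

def balance_plan(num_stages: int, num_microbatches: int):
    """Toy balancing plan: pair stage i with its mirror stage
    (num_stages - 1 - i) and move half of the surplus activations from the
    memory-heavy stage to the memory-light one, so paired GPUs hold
    comparable amounts of activation memory."""
    load = inflight_activations(num_stages, num_microbatches)
    plan = []
    for i in range(num_stages // 2):
        j = num_stages - 1 - i
        surplus = (load[i] - load[j]) // 2
        if surplus > 0:
            plan.append((i, j, surplus))  # (from_stage, to_stage, count)
    return plan

print(inflight_activations(8, 32))  # [8, 7, 6, 5, 4, 3, 2, 1]
print(balance_plan(8, 32))          # [(0, 7, 3), (1, 6, 2), (2, 5, 1)]
```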
Published by FriendliAI and SNU in the Proceedings of the VLDB Endowment
Hippo: Sharing Computations in Hyper-Parameter Optimization
Hippo is a hyper-parameter optimization system that reuses computation across trials to significantly reduce the overall amount of computation. Hippo breaks down hyper-parameter sequences into stages and merges common stages to form a stage tree.
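A minimal sketch of the stage-tree idea: each trial is treated as a sequence of stages, and trials that share a prefix of stages are merged so the shared work runs only once. `StageNode` and `build_stage_tree` are hypothetical names for illustration, not Hippo's API.

```python
class StageNode:
    """One training stage in the stage tree, shared by every trial whose
    hyper-parameter sequence is identical up to this point."""
    def __init__(self, stage=None):
        self.stage = stage      # e.g. ("train", "lr=0.1", "epochs=5")
        self.children = {}      # stage -> StageNode
        self.trials = []        # ids of trials passing through this node

def build_stage_tree(trials):
    """Break each trial into its sequence of stages and merge common
    prefixes into a tree, so a shared stage is computed only once."""
    root = StageNode()
    for trial_id, stages in trials.items():
        node = root
        for stage in stages:
            node = node.children.setdefault(stage, StageNode(stage))
            node.trials.append(trial_id)
    return root

# Two trials share their first stage and diverge afterwards, so the
# shared stage would be trained once and its result reused by both.
trials = {
    "trial_a": [("train", "lr=0.1", "epochs=5"), ("train", "lr=0.01", "epochs=5")],
    "trial_b": [("train", "lr=0.1", "epochs=5"), ("train", "lr=0.001", "epochs=5")],
}
root = build_stage_tree(trials)
shared = next(iter(root.children.values()))
print(len(root.children), shared.trials)  # 1 ['trial_a', 'trial_b']
print(len(shared.children))               # 2 (trials diverge at stage two)
```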