PeriFlow’s Enriched Coverage for Sought-After LLMs: MPT, LLaMA, and Dolly


We have some exciting news to share!

As you probably know, FriendliAI’s PeriFlow supports various LLMs, including GPT and T5. We further added support for three more highly sought-after open-source models: MPT [1], LLaMA [2], and Dolly [3].


MosaicML provides tools that streamline the process of training machine learning models, and it has recently open-sourced several LLMs. Recognizing this value, Databricks announced its acquisition of MosaicML for $1.3B [4].

MosaicML’s MPT-7B [5] and MPT-30B [1] were trained using state-of-the-art techniques such as ALiBi and FlashAttention. MPT-30B in particular supports long-context inference by leveraging an 8K context window during training. Furthermore, it stands out as the first public model trained on an NVIDIA H100 cluster.
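To give a flavor of what ALiBi does (this is a minimal illustrative sketch of the published technique, not PeriFlow’s or MPT’s actual implementation): instead of positional embeddings, ALiBi adds a fixed, head-specific linear penalty to attention scores based on the distance between the query and key positions, which is what lets models extrapolate to longer contexts.

```python
def alibi_slopes(n_heads: int) -> list:
    # For a power-of-two head count, ALiBi slopes form a geometric
    # sequence: m_i = 2^(-8i/n) for heads i = 1..n.
    start = 2.0 ** (-8.0 / n_heads)
    return [start ** (i + 1) for i in range(n_heads)]

def alibi_bias(n_heads: int, seq_len: int) -> list:
    # Bias added to causal attention scores before softmax:
    # bias[h][q][k] = -slope_h * (q - k) for k <= q, i.e. more distant
    # keys are penalized linearly; future positions are masked elsewhere.
    return [
        [[-slope * (q - k) if k <= q else 0.0 for k in range(seq_len)]
         for q in range(seq_len)]
        for slope in alibi_slopes(n_heads)
    ]
```

Because the penalty is a function of relative distance only, the same bias pattern extends naturally to sequence lengths unseen during training.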


LLaMA is a collection of foundation models from Meta, available in various parameter sizes: 7B, 13B, 33B, and 65B. Remarkably, the LLaMA-13B model surpasses the GPT-3 175B model on certain tasks [2], despite having an order of magnitude fewer parameters.

The true value of LLaMA lies in its contribution to the research community: it openly shares the training methodology, including the model architecture and code. This transparency fosters a collaborative environment in which researchers can either fine-tune existing LLaMA models or create their own models from scratch by adopting LLaMA’s insights. For example, Alpaca [6], Vicuna [7], Gorilla [8], and Koala [9] are fine-tuned derivatives of the LLaMA models, while RedPajama [10] is a fully open-source reproduction of LLaMA.


Dolly is an open-source language model developed by Databricks, based on the Pythia model from EleutherAI [11]. In addition to the model checkpoint, Databricks introduced ‘databricks-dolly-15k’ [12], a new high-quality human-generated instruction dataset that played a crucial role in fine-tuning Dolly. By virtue of this new dataset, Dolly is the first open-source instruction-following language model suited to both research and commercial applications.

In summary, PeriFlow supports most popular LLMs, and can now serve MPT, LLaMA, and Dolly. PeriFlow also supports various data types, including fp32, fp16, bf16, and int8 (for int8, please refer to our recent blog post!), as well as tensor and pipeline parallelism for various serving environments. Enjoy PeriFlow’s high performance for serving LLMs including MPT, LLaMA, and Dolly!
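As a back-of-the-envelope illustration of why the data type matters for serving (this is generic arithmetic, not PeriFlow internals), the memory needed just to hold a model’s weights scales with bytes per parameter, so int8 cuts the footprint of a 7B model to a quarter of fp32:

```python
# Rough weight-only memory footprint per data type for a model with a
# given parameter count. Activations and the KV cache are extra.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1}

def weight_memory_gib(n_params: float, dtype: str) -> float:
    # GiB = bytes / 2^30
    return n_params * BYTES_PER_PARAM[dtype] / 2**30

for dtype in ("fp32", "fp16", "bf16", "int8"):
    print(f"7B model in {dtype}: {weight_memory_gib(7e9, dtype):.1f} GiB")
```

This is why lower-precision types, combined with tensor and pipeline parallelism, determine which GPUs can serve a given model at all.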

For more information about FriendliAI, check the link. For more about PeriFlow, check the link.


[2] Touvron, Hugo, et al. “Llama: Open and efficient foundation language models.” arXiv preprint arXiv:2302.13971 (2023).











