- July 3, 2023
- 2 min read
PeriFlow’s Enriched Coverage for Sought-After LLMs: MPT, LLaMA, and Dolly
We have some exciting news to share!
As you probably know, FriendliAI’s PeriFlow supports various LLMs, including GPT and T5. We have now added support for three more highly sought-after open-source models: MPT, LLaMA, and Dolly.
MosaicML provides tools that streamline the process of training machine learning models, and it has recently open-sourced several LLMs. Recognizing its value, Databricks recently announced the acquisition of MosaicML for $1.3B.
MosaicML’s MPT-7B and MPT-30B have been trained with state-of-the-art techniques such as ALiBi and FlashAttention. MPT-30B in particular supports long-context inference, leveraging an 8K context window during training. Furthermore, it stands out as the first public model trained on an NVIDIA H100 cluster.
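To give a feel for what ALiBi does, here is a minimal NumPy sketch (not MPT’s actual implementation) of the linear attention bias it adds to attention scores in place of positional embeddings. The function name and shapes are our own for illustration; the slope schedule follows the ALiBi paper’s geometric sequence for a power-of-two head count.

```python
import numpy as np

def alibi_bias(num_heads: int, seq_len: int) -> np.ndarray:
    """ALiBi-style linear bias, shape (num_heads, seq_len, seq_len).

    Added to attention scores before softmax; assumes num_heads is a
    power of two. Future positions get zero bias here because they are
    removed by the causal mask anyway.
    """
    # Head-specific slopes form a geometric sequence: 2^(-8/n), 2^(-16/n), ...
    slopes = 2.0 ** (-8.0 * np.arange(1, num_heads + 1) / num_heads)
    # Signed distance of each key position j from each query position i (j - i).
    distance = np.arange(seq_len)[None, :] - np.arange(seq_len)[:, None]
    # Penalize attention to past tokens linearly with distance:
    # farther tokens receive a larger negative bias.
    return slopes[:, None, None] * np.minimum(distance, 0)[None, :, :]
```

Because the penalty is relative rather than tied to absolute positions, a model trained this way degrades gracefully when serving sequences longer than those seen in training, which is one reason MPT can support long-context inference.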
LLaMA is a collection of foundation models from Meta, available in various parameter sizes: 7B, 13B, 33B, and 65B. Remarkably, the LLaMA-13B model surpasses the GPT-3 175B model on certain tasks, despite having an order of magnitude fewer parameters.
The true value of LLaMA lies in its contribution to the research community: openly sharing the training methodology, including the model architecture and code. This transparency fosters a collaborative environment in which researchers can either fine-tune existing LLaMA models or create their own models from scratch by adopting LLaMA’s insights. For example, Alpaca, Vicuna, Gorilla, and Koala are fine-tuned derivatives of the LLaMA models, while RedPajama is a fully open-source reproduction of LLaMA.
Dolly is an open-source language model developed by Databricks, based on EleutherAI’s Pythia model. In addition to the model checkpoint, Databricks introduced ‘databricks-dolly-15k’, a new high-quality human-generated instruction dataset that played a crucial role in fine-tuning Dolly. By virtue of this dataset, Dolly is the first open-source instruction-following language model available for both research and commercial use.
In summary, PeriFlow supports a wide range of LLMs, and it can now serve MPT, LLaMA, and Dolly. PeriFlow moreover supports various data types, including fp32, fp16, bf16, and int8 (for int8, please refer to our recent blog post!), as well as tensor and pipeline parallelism for various serving environments. Enjoy PeriFlow’s high performance for serving LLMs including MPT, LLaMA, and Dolly!
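To see why the choice of data type matters for serving, here is a rough back-of-the-envelope sketch (our own illustration, not part of PeriFlow) of the GPU memory needed just to hold a model’s weights at each precision:

```python
# Bytes per parameter for the data types mentioned above.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1}

def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Approximate weight memory in GiB; ignores activations and KV cache."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30

# A 7B-parameter model (e.g. MPT-7B or LLaMA-7B) needs roughly:
#   fp32 -> ~26 GiB, fp16/bf16 -> ~13 GiB, int8 -> ~6.5 GiB
```

When the weights at a given precision exceed a single GPU’s memory, tensor or pipeline parallelism splits the model across devices, which is why PeriFlow supports both alongside the lower-precision data types.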
Touvron, Hugo, et al. “LLaMA: Open and efficient foundation language models.” arXiv preprint arXiv:2302.13971 (2023).
FriendliAI Tech & Research