Introducing GPT-FAI 13B: A Large-scale Language Model Trained with FriendliAI’s PeriFlow


We are happy to announce the release of GPT-FAI 13B, a large-scale language model trained with FriendliAI’s PeriFlow. GPT-FAI 13B is a 13-billion-parameter version of GPT-3 trained on publicly available datasets. The release allows AI researchers to study various topics in large-scale language models for non-commercial purposes.

We trained GPT-FAI 13B with FriendliAI’s PeriFlow, an end-to-end cloud service for large-scale AI training and serving. Training large models like GPT-FAI 13B requires substantial GPU power; a cloud service avoids the effort and cost of building and maintaining one's own server clusters. Even so, such training requires many GPUs and can take anywhere from a few days to a few months. Over long training runs, it becomes paramount to optimize execution and swiftly handle any faults that occur. Luckily, training GPT-FAI 13B was a breeze. :)
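Automatic fault handling in long training runs typically builds on periodic checkpointing: if a node crashes, the job restarts from the last saved step instead of from scratch. The following minimal sketch illustrates that generic pattern (it is not PeriFlow's actual implementation; all names and numbers are hypothetical):

```python
import os
import pickle

CKPT_PATH = "checkpoint.pkl"  # hypothetical checkpoint file

def save_checkpoint(step, state):
    # Write to a temp file, then rename: a crash mid-write
    # never corrupts the last good checkpoint.
    tmp = CKPT_PATH + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump({"step": step, "state": state}, f)
    os.replace(tmp, CKPT_PATH)

def load_checkpoint():
    if os.path.exists(CKPT_PATH):
        with open(CKPT_PATH, "rb") as f:
            return pickle.load(f)
    return {"step": 0, "state": 0.0}

def train(total_steps, fail_at=None):
    ckpt = load_checkpoint()
    step, state = ckpt["step"], ckpt["state"]
    while step < total_steps:
        if fail_at is not None and step == fail_at:
            raise RuntimeError("simulated hardware fault")
        state += 0.1  # stand-in for one optimizer step
        step += 1
        if step % 10 == 0:
            save_checkpoint(step, state)
    return step, state

# Simulate a fault at step 42, then "restart" the job:
# it resumes from the step-40 checkpoint, not from step 0.
try:
    train(100, fail_at=42)
except RuntimeError:
    pass
final_step, final_state = train(100)
print(final_step)  # 100
```

The atomic rename in `save_checkpoint` matters: a fault during checkpointing itself must not destroy the previous checkpoint, or recovery becomes impossible.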

FriendliAI built PeriFlow for any client who wants to build large-scale AI models. PeriFlow is FriendliAI’s product that realizes the company’s vision: “Make large-scale AI simple for the enterprise.” With PeriFlow, one can simplify the process of training large-scale AI models on hundreds of GPUs and serving them. PeriFlow employs various optimization technologies developed by FriendliAI, letting it train faster and handle faults more smoothly. PeriFlow is multi-cloud: it currently supports popular cloud vendors such as AWS, Azure, and GCP.

Once the trained model is available, one can deploy it with our Friendli Engine for inference serving. Friendli Engine shows significant improvements over state-of-the-art systems such as NVIDIA FasterTransformer, a well-known inference engine for Transformer models. We provide a performance comparison below. With Friendli Engine, anyone can create and serve large-scale AI models with ease.

GPT-FAI 13B performance
We evaluated our model on various downstream tasks with lm-evaluation-harness. Note that our model was not fine-tuned on the downstream tasks, nor did we use any sophisticated prompt engineering, so the following zero-shot results may not fully represent the performance of our model.
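For multiple-choice tasks, lm-evaluation-harness scores each candidate answer by the log-likelihood the model assigns to it and picks the highest-scoring one; zero-shot means no gradient updates or task-specific examples are involved. A minimal sketch of that scoring rule, with a toy log-probability table standing in for the real model (all names and numbers here are hypothetical, not the harness's code):

```python
# Toy per-token log-probabilities standing in for a language model.
# In the real harness, these come from the model's forward pass.
TOKEN_LOGPROB = {
    "the": -1.0, "sky": -2.0, "is": -1.5,
    "blue": -1.2, "green": -4.5, "loud": -6.0,
}

def sequence_logprob(tokens):
    # Log-likelihood of a token sequence = sum of token log-probs;
    # unknown tokens get a low default score.
    return sum(TOKEN_LOGPROB.get(t, -10.0) for t in tokens)

def zero_shot_choice(context, candidates):
    # Score "context + candidate" for each option; return the argmax.
    scores = {c: sequence_logprob(context + [c]) for c in candidates}
    return max(scores, key=scores.get)

answer = zero_shot_choice(["the", "sky", "is"], ["blue", "green", "loud"])
print(answer)  # blue
```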

Evaluation results of GPT-FAI 13B

Illustrating training monitoring and fault handling
Figure 1 shows a metric collected during GPT-FAI 13B training with PeriFlow. The different colors in the graph indicate that, despite various faults occurring, the training process automatically recovered each time.


Figure 1. Collected metrics during GPT-FAI 13B training

Serving performance of the Friendli Engine


Figure 2. Inference latency and throughput of GPT 13B/175B on PeriFlow Serving and FasterTransformer

Figure 2 compares inference serving performance when using FasterTransformer versus the Friendli Engine on GPT 13B and 175B models with respect to text generation latency and throughput. The experiments were conducted on a single A100 GPU for the 13B model and 16 A100 GPUs for the 175B model. Friendli Engine significantly outperforms FasterTransformer, showing an order of magnitude higher throughput at the same level of latency. We plan to share more details in a few months. Stay tuned!
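On the metrics in Figure 2: throughput is the work completed per second (e.g. generated tokens across all requests), while latency is how long each individual request takes end to end. Batching more requests together raises throughput but can also raise per-request latency, which is why the two are plotted against each other. A toy calculation illustrating the trade-off (the numbers are illustrative only, not from our benchmark):

```python
def serving_metrics(batch_size, batch_time_s, tokens_per_request):
    # Throughput: tokens generated per second across the whole batch.
    throughput = batch_size * tokens_per_request / batch_time_s
    # Latency: a request finishes only when its batch finishes.
    latency = batch_time_s
    return throughput, latency

# One request at a time vs. batching 8 together: the batched step
# takes longer, but its cost is amortized over 8 requests.
solo_tp, solo_lat = serving_metrics(1, 0.5, 32)    # 64 tokens/s, 0.5 s
batch_tp, batch_lat = serving_metrics(8, 1.0, 32)  # 256 tokens/s, 1.0 s
print(round(solo_tp), round(batch_tp))  # 64 256
```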

FriendliAI

Contact us if you want to try out Friendli Engine!


