Introducing GPT-FAI 13B: A Large-scale Language Model Trained with FriendliAI’s PeriFlow


We are happy to announce the release of GPT-FAI 13B, a large-scale language model trained with FriendliAI’s PeriFlow. GPT-FAI 13B is a 13-billion parameter version of GPT-3 trained on publicly available datasets. The release lets AI researchers study a range of topics in large-scale language models for non-commercial purposes.

We trained GPT-FAI 13B with FriendliAI’s PeriFlow, an end-to-end large-scale AI training and serving cloud service. Training large models like GPT-FAI 13B requires a lot of GPU power, and a cloud service avoids the effort and cost of building and maintaining server clusters. Even so, such training uses many GPUs and can take anywhere from a few days to a few months. Over such long training periods, optimizing execution and swiftly handling the faults that occur become paramount. Luckily, training GPT-FAI 13B was a breeze. :)

FriendliAI has built PeriFlow for any client who would like to build large-scale AI models. PeriFlow is the product that realizes the company’s vision, “Make large-scale AI simple for the enterprise.” It simplifies the process of training large-scale AI models on hundreds of GPUs and serving them. PeriFlow employs various optimization technologies developed by FriendliAI to train faster and handle faults more smoothly. PeriFlow is multi-cloud; it currently supports the popular cloud vendors AWS, Azure, and GCP.

Once the trained model is available, one can deploy it on PeriFlow Serving for inference. PeriFlow Serving shows significant improvement over NVIDIA FasterTransformer, a well-known state-of-the-art inference system for Transformer models. We provide a comparison of performance results below. With PeriFlow, anyone can create and serve large-scale AI models.

GPT-FAI 13B performance
We evaluated our model on various downstream tasks with lm-evaluation-harness. Note that our model is not fine-tuned on the downstream tasks, nor did we use any sophisticated prompt engineering, so the following zero-shot results may not exactly represent the performance of our model.

Illustrating training monitoring and fault handling
Figure 1 shows a metric collected during GPT-FAI 13B training with PeriFlow. The different colors in the graph show that training recovered automatically despite the various faults that occurred.
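To make the idea of automatic recovery concrete, here is a minimal sketch of checkpoint-and-resume in plain Python. It is not PeriFlow's implementation, which this post does not describe; the function names, the JSON checkpoint format, and the simulated faults are all assumptions made for illustration.

```python
import json
import os


def save_checkpoint(path, step, state):
    """Write the checkpoint atomically so a crash mid-write cannot corrupt it."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump({"step": step, "state": state}, f)
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return (step, state) from the last checkpoint, or a fresh start."""
    if not os.path.exists(path):
        return 0, 0.0
    with open(path) as f:
        ckpt = json.load(f)
    return ckpt["step"], ckpt["state"]


def train(path, total_steps, checkpoint_every, fail_at):
    """Toy training loop that raises a simulated fault at the steps in fail_at."""
    step, state = load_checkpoint(path)
    while step < total_steps:
        if step in fail_at:
            fail_at.discard(step)  # each simulated fault fires only once
            raise RuntimeError(f"simulated fault at step {step}")
        state += 1.0  # stand-in for one optimizer step (hypothetical)
        step += 1
        if step % checkpoint_every == 0:
            save_checkpoint(path, step, state)
    return step, state


def train_with_recovery(path, total_steps, checkpoint_every, fail_at):
    """Restart from the last checkpoint after every fault; count the restarts."""
    restarts = 0
    while True:
        try:
            step, state = train(path, total_steps, checkpoint_every, fail_at)
            return step, state, restarts
        except RuntimeError:
            restarts += 1
```

Calling `train_with_recovery(path, 100, 10, {25, 57})` on a fresh checkpoint path finishes all 100 steps after two restarts. Each restart repeats only the steps taken since the last checkpoint, which is why the checkpoint interval matters on runs that last days or months.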

PeriFlow Serving performance

Figure 2 compares FasterTransformer and PeriFlow Serving on GPT 13B and 175B models with respect to text generation latency and throughput. The experiments were conducted on a single A100 GPU for the 13B model and 16 A100 GPUs for the 175B model. PeriFlow significantly outperforms FasterTransformer, showing an order of magnitude higher throughput at the same level of latency. We plan to share more details in a few months. Stay tuned!
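For readers who want to run a similar comparison on their own serving stack, the sketch below shows one common way to compute latency and throughput from per-request timings. The post does not state how the Figure 2 metrics were measured, so the definitions here (mean end-to-end latency, generated tokens per second over the whole measurement window) and the `RequestRecord` type are assumptions, not FriendliAI's methodology.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RequestRecord:
    """Timing for one text generation request (hypothetical record format)."""
    start_s: float         # time the request was sent, in seconds
    end_s: float           # time the last token was received, in seconds
    generated_tokens: int  # number of tokens the model generated


def mean_latency_s(records):
    """Average end-to-end latency per request, in seconds."""
    return mean(r.end_s - r.start_s for r in records)


def throughput_tokens_per_s(records):
    """Generated tokens per second over the whole measurement window."""
    window_s = max(r.end_s for r in records) - min(r.start_s for r in records)
    return sum(r.generated_tokens for r in records) / window_s


# Three overlapping requests served concurrently.
records = [
    RequestRecord(start_s=0.0, end_s=2.0, generated_tokens=100),
    RequestRecord(start_s=0.5, end_s=2.5, generated_tokens=100),
    RequestRecord(start_s=1.0, end_s=4.0, generated_tokens=200),
]
print(f"mean latency: {mean_latency_s(records):.2f} s")
print(f"throughput: {throughput_tokens_per_s(records):.1f} tokens/s")
```

Because requests overlap, throughput can rise while per-request latency stays flat, so a fair comparison reports throughput at a matched latency level.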

Contact us if you want to try out PeriFlow!

