Introducing GPT-FAI 13B: A Large-scale Language Model Trained with FriendliAI’s PeriFlow


We are happy to announce the release of GPT-FAI 13B, a large-scale language model trained with FriendliAI’s PeriFlow. GPT-FAI 13B is a 13-billion-parameter version of GPT-3 trained on publicly available datasets. The release allows AI researchers to study various topics in large-scale language models for non-commercial purposes.

We trained GPT-FAI 13B with FriendliAI’s PeriFlow, an end-to-end cloud service for large-scale AI training and serving. Training large models like GPT-FAI 13B requires substantial GPU power; a cloud service avoids the effort and cost of building and maintaining one's own server clusters. Even so, such training requires many GPUs and can take anywhere from a few days to a few months. Over long training runs, it becomes paramount to optimize execution and swiftly handle any faults that occur. Luckily, training GPT-FAI 13B was a breeze. :)
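Automatic fault handling in long training runs typically builds on periodic checkpointing: if a node crashes, the job restarts from the last saved step instead of from scratch. The following minimal sketch illustrates that generic pattern (it is not PeriFlow's actual implementation; all names and numbers are hypothetical):

```python
import os
import pickle

CKPT_PATH = "checkpoint.pkl"  # hypothetical checkpoint file

def save_checkpoint(step, state):
    # Write to a temp file, then rename: a crash mid-write
    # never corrupts the last good checkpoint.
    tmp = CKPT_PATH + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump({"step": step, "state": state}, f)
    os.replace(tmp, CKPT_PATH)

def load_checkpoint():
    if os.path.exists(CKPT_PATH):
        with open(CKPT_PATH, "rb") as f:
            return pickle.load(f)
    return {"step": 0, "state": 0.0}

def train(total_steps, fail_at=None):
    ckpt = load_checkpoint()
    step, state = ckpt["step"], ckpt["state"]
    while step < total_steps:
        if fail_at is not None and step == fail_at:
            raise RuntimeError("simulated hardware fault")
        state += 0.1  # stand-in for one optimizer step
        step += 1
        if step % 10 == 0:
            save_checkpoint(step, state)
    return step, state

# Simulate a fault at step 42, then "restart" the job:
# it resumes from the step-40 checkpoint, not from step 0.
try:
    train(100, fail_at=42)
except RuntimeError:
    pass
final_step, final_state = train(100)
print(final_step)  # 100
```

The atomic rename in `save_checkpoint` matters: a fault during checkpointing itself must not destroy the previous checkpoint, or recovery becomes impossible.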

FriendliAI built PeriFlow for any client who wants to build large-scale AI models. PeriFlow is FriendliAI’s product that realizes the company’s vision: “Make large-scale AI simple for the enterprise.” With PeriFlow, one can simplify the process of training large-scale AI models on hundreds of GPUs and serving them. PeriFlow employs various optimization technologies developed by FriendliAI, letting it train faster and handle faults more smoothly. PeriFlow is multi-cloud: it currently supports popular cloud vendors such as AWS, Azure, and GCP.

Once the trained model is available, one can deploy it with our Friendli Engine for inference serving. Friendli Engine shows significant improvements over state-of-the-art systems such as NVIDIA FasterTransformer, a well-known inference engine for Transformer models. We provide a performance comparison below. With Friendli Engine, anyone can create and serve large-scale AI models with ease.

GPT-FAI 13B performance
We evaluated our model on various downstream tasks with lm-evaluation-harness. Note that our model was not fine-tuned on the downstream tasks, nor did we use any sophisticated prompt engineering, so the following zero-shot results may not fully represent the performance of our model.
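For multiple-choice tasks, lm-evaluation-harness scores each candidate answer by the log-likelihood the model assigns to it and picks the highest-scoring one; zero-shot means no gradient updates or task-specific examples are involved. A minimal sketch of that scoring rule, with a toy log-probability table standing in for the real model (all names and numbers here are hypothetical, not the harness's code):

```python
# Toy per-token log-probabilities standing in for a language model.
# In the real harness, these come from the model's forward pass.
TOKEN_LOGPROB = {
    "the": -1.0, "sky": -2.0, "is": -1.5,
    "blue": -1.2, "green": -4.5, "loud": -6.0,
}

def sequence_logprob(tokens):
    # Log-likelihood of a token sequence = sum of token log-probs;
    # unknown tokens get a low default score.
    return sum(TOKEN_LOGPROB.get(t, -10.0) for t in tokens)

def zero_shot_choice(context, candidates):
    # Score "context + candidate" for each option; return the argmax.
    scores = {c: sequence_logprob(context + [c]) for c in candidates}
    return max(scores, key=scores.get)

answer = zero_shot_choice(["the", "sky", "is"], ["blue", "green", "loud"])
print(answer)  # blue
```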

Evaluation results of GPT-FAI 13B

Illustrating training monitoring and fault handling
Figure 1 shows a metric collected during GPT-FAI 13B training with PeriFlow. The different colors in the graph indicate that, despite various faults occurring, the training process automatically recovered each time.


Figure 1. Collected metrics during GPT-FAI 13B training

Serving performance of the Friendli Engine


Figure 2. Inference latency and throughput of GPT 13B/175B on PeriFlow Serving and FasterTransformer

Figure 2 compares inference serving performance when using FasterTransformer versus the Friendli Engine on GPT 13B and 175B models with respect to text generation latency and throughput. The experiments were conducted on a single A100 GPU for the 13B model and 16 A100 GPUs for the 175B model. Friendli Engine significantly outperforms FasterTransformer, showing an order of magnitude higher throughput at the same level of latency. We plan to share more details in a few months. Stay tuned!
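On the metrics in Figure 2: throughput is the work completed per second (e.g. generated tokens across all requests), while latency is how long each individual request takes end to end. Batching more requests together raises throughput but can also raise per-request latency, which is why the two are plotted against each other. A toy calculation illustrating the trade-off (the numbers are illustrative only, not from our benchmark):

```python
def serving_metrics(batch_size, batch_time_s, tokens_per_request):
    # Throughput: tokens generated per second across the whole batch.
    throughput = batch_size * tokens_per_request / batch_time_s
    # Latency: a request finishes only when its batch finishes.
    latency = batch_time_s
    return throughput, latency

# One request at a time vs. batching 8 together: the batched step
# takes longer, but its cost is amortized over 8 requests.
solo_tp, solo_lat = serving_metrics(1, 0.5, 32)    # 64 tokens/s, 0.5 s
batch_tp, batch_lat = serving_metrics(8, 1.0, 32)  # 256 tokens/s, 1.0 s
print(round(solo_tp), round(batch_tp))  # 64 256
```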

FriendliAI

Contact us if you want to try out Friendli Engine!


