  • May 20, 2022
  • 3 min read

Introducing GPT-FAI 13B: A Large-scale Language Model Trained with FriendliAI’s Friendli Training


We are happy to announce that we are releasing GPT-FAI 13B, a large-scale language model trained with Friendli Training (formerly known as PeriFlow). GPT-FAI 13B is a 13-billion-parameter version of GPT-3 trained on publicly available datasets. The release allows AI researchers to study various aspects of large-scale language models for non-commercial purposes.

We trained GPT-FAI 13B with Friendli Training, an end-to-end cloud service for training and serving large-scale AI models. Training a model of this size requires a large number of GPUs, and a training run can take anywhere from a few days to a few months; using a cloud service spares you the effort and cost of building and maintaining your own server clusters. Over such long training periods, it becomes all the more important to optimize execution and to handle any faults swiftly. Luckily, training GPT-FAI 13B was a breeze. :)

FriendliAI built Friendli Training for any client who would like to build large-scale AI models. Friendli Training is the product that realizes the company’s vision, “Make large-scale AI simple for the enterprise.” With Friendli Training, one can simplify the process of training large-scale AI models on hundreds of GPUs and serving them. Friendli Training employs various optimization technologies developed by FriendliAI, so it trains faster and handles faults more smoothly. Friendli Training is also multi-cloud; it currently supports popular cloud vendors such as AWS, Azure, and GCP.

Once the trained model is available, one can deploy it using our Friendli Engine for inference serving. Friendli Engine shows significant improvements over state-of-the-art systems such as NVIDIA FasterTransformer, a well-known inference engine for Transformer models. A performance comparison is provided below. With Friendli Engine, anyone can create and serve large-scale AI models with ease.

GPT-FAI 13B performance
We evaluated our model on various downstream tasks with lm-evaluation-harness. Note that our model is not fine-tuned on the downstream tasks, nor did we use any sophisticated prompt engineering, so the following zero-shot results may not fully reflect the model’s capabilities.

Evaluation results of GPT-FAI 13B
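
For readers who want to reproduce a similar zero-shot setup, the sketch below shows one way to call EleutherAI’s lm-evaluation-harness from Python. The checkpoint path is a hypothetical placeholder, the task list is illustrative, and the exact model type string and argument names vary between harness versions.

# A minimal sketch, assuming the EleutherAI lm-evaluation-harness is installed.
# The checkpoint path is hypothetical, and the model type string and argument
# names differ between harness versions.
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf-causal",                             # Hugging Face causal-LM adapter ("hf" in newer versions)
    model_args="pretrained=/path/to/gpt-fai-13b",  # hypothetical local checkpoint path
    tasks=["hellaswag", "piqa", "winogrande", "arc_easy"],  # illustrative zero-shot tasks
    num_fewshot=0,                                 # zero-shot, matching the results above
    batch_size=8,
    device="cuda:0",
)
print(results["results"])

Because no fine-tuning or prompt engineering is involved, the scores from such a run depend mainly on the pretrained checkpoint and the harness version used.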

Illustrating training monitoring and fault handling
Figure 1 shows a metric collected during GPT-FAI 13B training with Friendli Training. The different colors in the graph demonstrate that despite various faults occurring, the training process was able to recover automatically.

Figure 1. Collected metrics during GPT-FAI 13B training

Serving performance of the Friendli Engine

Figure 2. Inference latency and throughput of GPT 13B/175B on Friendli Engine and FasterTransformer

Figure 2 compares inference serving performance when using FasterTransformer versus the Friendli Engine on GPT 13B and 175B models with respect to text generation latency and throughput. The experiments were conducted on a single A100 GPU for the 13B model and 16 A100 GPUs for the 175B model. Friendli Engine significantly outperforms FasterTransformer, showing an order of magnitude higher throughput at the same level of latency. We plan to share more details in a few months. Stay tuned!
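
For context on the two metrics, the sketch below measures text-generation latency and throughput for a single batch of requests, using a small open model from Hugging Face Transformers as a stand-in. This is not the Friendli Engine or FasterTransformer benchmark setup; the model name, batch size, and generation length are illustrative.

# A minimal sketch of measuring text-generation latency and throughput.
# "gpt2" is a small stand-in model; the experiments above used GPT 13B/175B.
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).to("cuda").eval()

prompts = ["Large-scale language models"] * 8  # one simple batch of requests
inputs = tokenizer(prompts, return_tensors="pt").to("cuda")

start = time.perf_counter()
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=64)
elapsed = time.perf_counter() - start

# Count only the newly generated tokens across the batch.
generated_tokens = (outputs.shape[1] - inputs["input_ids"].shape[1]) * outputs.shape[0]
print(f"latency: {elapsed:.2f} s for the batch")
print(f"throughput: {generated_tokens / elapsed:.1f} tokens/s")

Comparing systems at "the same level of latency" means fixing an acceptable per-request latency and asking how many tokens per second each system can sustain under that constraint.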


Contact us if you want to try out Friendli Engine!


Written by


FriendliAI Tech & Research
