
  • January 17, 2023
  • 3 min read

Fine-tuning and Serving CodeGen, a Code Generation Model, with Friendli Inference


CodeGen, unveiled in 2022 by Salesforce, is a language model that allows users to create programs with natural language without extensive programming knowledge. CodeGen is an exciting tool as it enables humans and AI to program together, making programming easier and faster than ever before.

As an example of its capabilities, CodeGen can take a “Return n-th Fibonacci number” request and quickly generate the corresponding JavaScript code. This means developers can turn concepts into working code without writing it by hand, saving valuable time and resources. Let’s take a look at the example below.

Input to CodeGen:

```javascript
/* Return n-th Fibonacci number.
  >>> fib(10)
  55
  >>> fib(1)
  1
  >>> fib(8)
  21
  */
const fib = (n) => {
```

Output generated by CodeGen:

```javascript
  if (n < 2) {
    return n;
  }
  return fib(n - 1) + fib(n - 2);
};
```

CodeGen is multilingual, meaning the model can support popular programming languages such as Python, Java, JavaScript, C++, and Go. CodeGen picks the appropriate language from cues in the prompt, as in the example above; here, it generated JavaScript code by looking at the comment style and the function signature.
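As a concrete illustration of this prompting style, here is a minimal sketch of querying a published CodeGen checkpoint through the Hugging Face transformers library. This is not Friendli-specific; the checkpoint name and generation settings are assumptions, and loading the 16B model requires a large GPU, so the heavy call is kept behind a main guard.

```python
def build_prompt(description: str, signature: str) -> str:
    """Compose a CodeGen prompt: a doc comment plus a function signature.
    The comment style and signature also cue the target language
    (JavaScript in this case)."""
    return f"/* {description} */\n{signature}"


if __name__ == "__main__":
    # Assumes `transformers` and `torch` are installed and a GPU is available.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    checkpoint = "Salesforce/codegen-16B-multi"  # one of the published CodeGen checkpoints
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForCausalLM.from_pretrained(checkpoint)

    prompt = build_prompt("Return n-th Fibonacci number.", "const fib = (n) => {")
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(outputs[0]))
```

The comment plus the opening of the function signature is usually enough for the model to continue with a plausible function body in the cued language.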

Another example is test generation. CodeGen quickly generates tests, which is extremely helpful when verifying program correctness. With tests generated automatically, software developers can improve the quality of their code with much less effort. The following shows an example of test code generation.

Input to CodeGen:

```javascript
// sumToN is a function that sums numbers from 1 to n.
const sumToN = (n) => {
  return n * (n + 1) / 2;
}

// write a test code that asserts the correctness of sumToN()
const testSumToN = () => {
```

Output generated by CodeGen:

```javascript
  assert.equal(sumToN(1), 1);
  assert.equal(sumToN(2), 3);
  assert.equal(sumToN(3), 6);
  assert.equal(sumToN(4), 10);
  assert.equal(sumToN(5), 15);
  assert.equal(sumToN(6), 21);
  assert.equal(sumToN(7), 28);
  assert.equal(sumToN(8), 36);
  assert.equal(sumToN(9), 45);
  assert.equal(sumToN(10), 55);
}
```

Users of CodeGen can now ensure code quality with minimal time and resource commitment, allowing them more time to focus on the core logic of their program, resulting in faster delivery time, improved efficiency, and greater productivity.

With Friendli Inference, one can adapt CodeGen (i.e., perform fine-tuning or parameter-efficient training) and serve it effortlessly and quickly for their applications, reflecting FriendliAI’s commitment to making generative AI training and serving simple and cost-efficient. With Friendli Inference, our clients can run CodeGen inference much faster than with any other available solution.

We compared the serving performance of Friendli Inference (a.k.a. PeriFlow or Orca) against NVIDIA Triton + FasterTransformer in serving CodeGen with 16B parameters on an NVIDIA A100 80GB GPU. In the experiments, input token lengths range from 128 to 512, and output token lengths range from 32 to 256. The figure below shows throughput (req/s) and mean latency (ms).

Throughput and mean latency comparison on Friendli Inference
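For reference, the two metrics in the figure can be computed from the timestamped records of a load test. The following is a minimal sketch with hypothetical numbers, not the actual benchmark harness used here.

```python
def summarize(records):
    """records: list of (start_s, end_s) wall-clock timestamps, one per request.
    Returns (throughput in req/s, mean latency in ms)."""
    latencies_ms = [(end - start) * 1000.0 for start, end in records]
    # Throughput is completed requests divided by the total wall-clock window.
    wall_time_s = max(end for _, end in records) - min(start for start, _ in records)
    throughput = len(records) / wall_time_s
    mean_latency = sum(latencies_ms) / len(latencies_ms)
    return throughput, mean_latency


# Hypothetical load test: 4 requests completed over a 2-second window.
records = [(0.0, 0.5), (0.2, 0.9), (1.0, 1.6), (1.3, 2.0)]
throughput, mean_latency = summarize(records)
print(throughput, mean_latency)  # throughput 2.0 req/s, mean latency ≈ 625 ms
```

A real harness would additionally sweep the request arrival rate, since throughput and mean latency trade off against each other as load increases.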

Our findings reveal that Friendli Inference (Orca) outperforms Triton + FasterTransformer, achieving an astonishing 30X higher throughput at the same latency level thanks to its novel architecture. Note that the actual gain can vary depending on workloads and hardware. Together with our previous posts about Orca’s speedup for both large- and small-scale GPT-3 and T5 models, this demonstrates that Friendli Inference is an ideal choice for minimizing costs when serving CodeGen.
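A key idea behind Orca (described in the Orca paper at OSDI ’22) is iteration-level scheduling: instead of waiting for an entire batch to finish generating before admitting new requests, a finished request’s slot is refilled at the next token iteration. The toy simulation below illustrates why this helps when output lengths vary, as in the 32-to-256-token workload above; it counts abstract token steps and is not Friendli’s implementation.

```python
def static_batching_steps(lengths, batch_size):
    """Request-level batching: each fixed batch runs until its longest
    request finishes, so short requests wait on long ones."""
    steps = 0
    for i in range(0, len(lengths), batch_size):
        steps += max(lengths[i:i + batch_size])
    return steps


def continuous_batching_steps(lengths, batch_size):
    """Iteration-level scheduling: when a request finishes, its slot is
    refilled from the queue at the very next token iteration."""
    pending = list(lengths)
    slots = []  # remaining output tokens for each in-flight request
    steps = 0
    while pending or slots:
        while pending and len(slots) < batch_size:
            slots.append(pending.pop())
        steps += 1  # one token iteration for every in-flight request
        slots = [r - 1 for r in slots if r > 1]
    return steps


# With mixed output lengths, iteration-level scheduling wastes far fewer slots:
lengths = [32, 256, 32, 256]
print(static_batching_steps(lengths, 2))      # 512 steps
print(continuous_batching_steps(lengths, 2))  # 288 steps
```

The gap widens as output lengths become more skewed, which is one reason the measured speedup depends on the workload.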

In summary, CodeGen opens up an exciting opportunity for humans and AI to program together. With Friendli Inference, users can quickly adapt CodeGen and serve the model much more efficiently. We are thrilled about the potential to open doors for any users who wish to leverage Friendli Inference to combine the best of human creativity with AI-driven coding capabilities!

For more information about FriendliAI, check the link. To learn more about Friendli Inference, check the link.


Written by

FriendliAI Tech & Research




