Fine-tuning and Serving CodeGen, a Code Generation Model, with PeriFlow


CodeGen, unveiled in 2022 by Salesforce, is a language model that allows users to create programs with natural language instead of having to use extensive programming knowledge. CodeGen is an exciting tool as it enables humans and AI to program together, making programming easier and faster than ever before.

As an example of its capabilities, CodeGen can take a “Return n-th Fibonacci number” request and quickly generate the corresponding JavaScript code. This means developers can quickly turn concepts into lines of code without having to write the code themselves, saving valuable time and resources. Let’s take a look at the example below.

Input to CodeGen:

/* Return n-th Fibonacci number.
   >>> fib(10)
   >>> fib(1)
   >>> fib(8)
*/
const fib = (n) => {

Output generated by CodeGen:

 if (n < 2) {
   return n;
 }
 return fib(n - 1) + fib(n - 2);
};
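Assembled into one program (with the braces closed), the prompt and the generated completion run as expected. A quick sketch to verify:

```javascript
// Return n-th Fibonacci number.
const fib = (n) => {
  if (n < 2) {
    return n;
  }
  return fib(n - 1) + fib(n - 2);
};

console.log(fib(10)); // 55
console.log(fib(1));  // 1
console.log(fib(8));  // 21
```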

CodeGen is multilingual: the model supports popular programming languages such as Python, Java, JavaScript, C++, and Go. CodeGen chooses the appropriate language based on clues from the user, just like in the above example; in this case, it generated JavaScript code by looking at the style of the comments and the function signature.

Another example is test generation. CodeGen quickly generates tests, which is extremely helpful when validating a program. With tests generated automatically, software developers can improve the quality of their code with much less effort. The following shows an example of test code generation.

Input to CodeGen:

// sumToN is a function that sums numbers from 1 to n.
const sumToN = (n) => {
  return n * (n + 1) / 2;
};
// write a test that asserts the correctness of sumToN()
const testSumToN = () => {

Output generated by CodeGen:

  assert.equal(sumToN(1), 1);
  assert.equal(sumToN(2), 3);
  assert.equal(sumToN(3), 6);
  assert.equal(sumToN(4), 10);
  assert.equal(sumToN(5), 15);
  assert.equal(sumToN(6), 21);
  assert.equal(sumToN(7), 28);
  assert.equal(sumToN(8), 36);
  assert.equal(sumToN(9), 45);
  assert.equal(sumToN(10), 55);

Thanks to these capabilities, CodeGen users can ensure code quality with a minimal commitment of time and resources. This frees them to focus on the core logic of their programs, resulting in faster delivery, improved efficiency, and greater productivity.

With PeriFlow, one can adapt CodeGen (i.e., perform fine-tuning or parameter-efficient training) and serve it for applications effortlessly and quickly, which further underscores FriendliAI’s commitment to making generative AI training and serving simple and cost-efficient. Our clients can use PeriFlow to run CodeGen inference much faster than with any other available solution.

We compared PeriFlow Serving (Orca) against NVIDIA Triton + FasterTransformer in serving CodeGen with 16B parameters on an NVIDIA A100 80GB GPU. In the experiments, input token lengths range from 128 to 512, and output token lengths range from 32 to 256. The figure below shows throughput (req/s) and mean latency (ms).

Our findings reveal that PeriFlow Serving (Orca) outperforms Triton + FasterTransformer, achieving an astonishing 30x higher throughput at the same latency level thanks to its novel architecture. Note that the actual gain can vary depending on workloads and hardware. In addition to our previous posts about Orca’s speedup on large- and small-scale GPT-3 and T5 models, this demonstrates that PeriFlow is an ideal choice for minimizing costs when serving CodeGen.
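For reference, the two metrics in the figure can be derived from per-request timings as follows. This is a minimal sketch with hypothetical numbers for illustration, not our actual benchmark harness:

```javascript
// Per-request latencies in ms (hypothetical values for illustration).
const latenciesMs = [120, 95, 140, 110, 135];
// Wall-clock duration of the whole run, in seconds.
const runSeconds = 0.5;

// Throughput: completed requests divided by wall-clock time.
const throughput = latenciesMs.length / runSeconds; // req/s

// Mean latency: average time each request took end to end.
const meanLatencyMs =
  latenciesMs.reduce((sum, x) => sum + x, 0) / latenciesMs.length;

console.log(`throughput: ${throughput} req/s`);     // 10 req/s
console.log(`mean latency: ${meanLatencyMs} ms`);   // 120 ms
```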

In summary, CodeGen opens up an exciting opportunity for humans and AI to program together. With PeriFlow, one can quickly adapt CodeGen and serve the model much more efficiently. We are thrilled about what we can provide, as it has the potential to open doors for many users who wish to combine the best of human creativity with AI-driven coding capabilities through PeriFlow!

For more information about FriendliAI, check the link. For more information about PeriFlow, check the link.

