  • January 17, 2023
  • 3 min read

Fine-tuning and Serving CodeGen, a Code Generation Model, with Friendli Engine

CodeGen, unveiled in 2022 by Salesforce, is a language model that allows users to create programs with natural language without extensive programming knowledge. CodeGen is an exciting tool as it enables humans and AI to program together, making programming easier and faster than ever before.

As an example of its capabilities, CodeGen can take a “Return n-th Fibonacci number” request and generate the corresponding JavaScript code. Developers can describe what they want and let the model produce the implementation instead of writing every line by hand, which saves time and resources. Here is an example.

Input to CodeGen:

javascript
/* Return n-th Fibonacci number .
  >>> fib(10)
  55
  >>> fib(1)
  1
  >>> fib(8)
  21
  */
  const fib = (n) => {

Output generated by CodeGen:

javascript
 if (n < 2) {
   return n;
 }
 return fib(n - 1) + fib(n - 2);
};

CodeGen is multilingual: it supports popular programming languages such as Python, Java, JavaScript, C++, and Go. It chooses the target language from cues in the prompt. In the example above, the comment style and the function signature led it to generate JavaScript.

Another example is test generation. CodeGen can generate tests for existing code, which helps developers check program correctness and improve code quality with much less effort. The following shows an example of test code generation.

Input to CodeGen:

javascript
// sumToN is a function that sums numbers from 1 to n.
const sumToN = (n) => {
 return n * (n + 1) / 2;
}
// write test code that asserts the correctness of sumToN()
const testSumToN = () => {

Output generated by CodeGen:

javascript
  assert.equal(sumToN(1), 1);
  assert.equal(sumToN(2), 3);
  assert.equal(sumToN(3), 6);
  assert.equal(sumToN(4), 10);
  assert.equal(sumToN(5), 15);
  assert.equal(sumToN(6), 21);
  assert.equal(sumToN(7), 28);
  assert.equal(sumToN(8), 36);
  assert.equal(sumToN(9), 45);
  assert.equal(sumToN(10), 55);
}

With tests generated this way, CodeGen users can check code quality with less time and effort and spend more of their attention on the core logic of their programs, which leads to faster delivery and greater productivity.

With Friendli Engine, one can adapt CodeGen (i.e., fine-tune it or apply parameter-efficient training) and serve it quickly and with little effort, in line with FriendliAI’s commitment to making generative AI serving and training simple and cost-efficient. In our experiments, Friendli Engine performs CodeGen inference much faster than the baseline we compared against.

We compared the serving performance of Friendli Engine (a.k.a. PeriFlow or Orca) against NVIDIA Triton + FasterTransformer when serving CodeGen with 16B parameters on an NVIDIA A100 80GB GPU. In the experiments, input lengths range from 128 to 512 tokens and output lengths range from 32 to 256 tokens. The figure below shows throughput (req/s) and mean latency (ms).

Throughput and mean latency comparison on Friendli Engine
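
For readers who want to compute the same two metrics on their own workload, here is a minimal sketch. It is not the harness used for the experiments above. The record fields (`startMs`, `endMs`), the function names, the uniform sampling of request lengths, and the toy timestamps are all assumptions made for illustration.

```javascript
// Summarize completed requests. Each record holds hypothetical start and end
// timestamps in milliseconds.
const summarize = (records) => {
  if (records.length === 0) {
    throw new Error("need at least one completed request");
  }
  const firstStart = Math.min(...records.map((r) => r.startMs));
  const lastEnd = Math.max(...records.map((r) => r.endMs));
  const totalLatencyMs = records.reduce(
    (sum, r) => sum + (r.endMs - r.startMs),
    0
  );
  const wallSeconds = (lastEnd - firstStart) / 1000;
  return {
    throughputReqPerSec: records.length / wallSeconds, // req/s
    meanLatencyMs: totalLatencyMs / records.length, // ms
  };
};

// Draw a request shape within the ranges stated in the text. Uniform
// sampling is an assumption; the post does not state the distribution.
const sampleRequestShape = (random = Math.random) => {
  const uniformInt = (low, high) =>
    low + Math.floor(random() * (high - low + 1));
  return {
    inputTokens: uniformInt(128, 512),
    outputTokens: uniformInt(32, 256),
  };
};

// Toy timestamps for illustration only; these are not measurements.
const toyRecords = [
  { startMs: 0, endMs: 400 },
  { startMs: 100, endMs: 700 },
  { startMs: 200, endMs: 1000 },
  { startMs: 500, endMs: 2000 },
];

console.log(summarize(toyRecords));
// prints { throughputReqPerSec: 2, meanLatencyMs: 825 }
```

Throughput is measured over wall-clock time, so overlapping requests raise it without changing any single request's latency. That is why the two metrics are reported together.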

Our findings show that Friendli Engine (Orca) outperforms Triton + FasterTransformer, achieving 30X higher throughput at the same latency level, thanks to its novel architecture. Note that the actual gain can vary depending on workloads and hardware. Together with our previous posts about Orca’s speedup for both large- and small-scale GPT-3 and T5 models, this result demonstrates that Friendli Engine is an ideal choice for minimizing the cost of serving CodeGen.
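
A throughput comparison "at the same latency level" can be read off two measured curves by interpolating each one at a common latency target and taking the ratio. The sketch below shows one way to do that with linear interpolation. It is an illustration and not necessarily how the figure above was produced. The function names, field names, and sample points are hypothetical placeholders rather than measured results.

```javascript
// Each point is one measured load level, sorted by ascending mean latency.
// Returns the linearly interpolated throughput at the target latency, or
// null when the target lies outside the measured range.
const throughputAtLatency = (points, targetLatencyMs) => {
  for (let i = 1; i < points.length; i += 1) {
    const low = points[i - 1];
    const high = points[i];
    if (
      targetLatencyMs >= low.meanLatencyMs &&
      targetLatencyMs <= high.meanLatencyMs
    ) {
      const span = high.meanLatencyMs - low.meanLatencyMs;
      const fraction =
        span === 0 ? 0 : (targetLatencyMs - low.meanLatencyMs) / span;
      return (
        low.throughputReqPerSec +
        fraction * (high.throughputReqPerSec - low.throughputReqPerSec)
      );
    }
  }
  return null;
};

// Ratio of candidate throughput to baseline throughput at one latency.
const throughputGain = (candidate, baseline, targetLatencyMs) => {
  const candidateValue = throughputAtLatency(candidate, targetLatencyMs);
  const baselineValue = throughputAtLatency(baseline, targetLatencyMs);
  if (candidateValue === null || baselineValue === null) {
    return null;
  }
  return candidateValue / baselineValue;
};

// Placeholder curves for illustration only; these are not measurements.
const systemA = [
  { meanLatencyMs: 100, throughputReqPerSec: 1 },
  { meanLatencyMs: 200, throughputReqPerSec: 3 },
];
const systemB = [
  { meanLatencyMs: 100, throughputReqPerSec: 0.5 },
  { meanLatencyMs: 200, throughputReqPerSec: 1.5 },
];

console.log(throughputGain(systemA, systemB, 150)); // prints 2
```

Returning `null` outside the measured range avoids extrapolating a gain from load levels that were never tested.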

In summary, CodeGen opens up an exciting opportunity for humans and AI to program together. With Friendli Engine, users can quickly adapt CodeGen and serve the model much more efficiently. We are thrilled about the potential to open doors for any users who wish to leverage Friendli Engine to combine the best of human creativity with AI-driven coding capabilities!

For more information about FriendliAI, check the link.
For more information about Friendli Engine, check the link.


Written by

FriendliAI Tech & Research
