Latest post
  • July 24, 2024
  • 4 min read

Llama 3 70B Outperforms GPT-4o in Function Calling with Friendli Tools


  • July 22, 2024
  • 11 min read

Building AI Agents Using Function Calling with LLMs

Function calling
AI agents
  • July 18, 2024
  • 6 min read

Function Calling: Connecting LLMs with Functions and APIs

Function calling
LLMs
  • July 12, 2024
  • 7 min read

Showcasing FriendliAI’s Integration with LiteLLM

LiteLLM
  • June 27, 2024
  • 11 min read

Building AI-Powered Web Applications In 20 Minutes with FriendliAI, Vercel AI SDK, and Next.js

Vercel
Vercel AI SDK
Next.js
  • June 25, 2024
  • 3 min read

Level Up Your Client-Side Interactions with Friendli's gRPC Support

gRPC
  • June 20, 2024
  • 4 min read

Deploying Weights & Biases Model Checkpoints on Friendli Dedicated Endpoints

Weights & Biases
  • June 12, 2024
  • 5 min read

Deploying Your Inference Endpoints on AWS Sagemaker with Friendli Container

AWS
Sagemaker
Container
  • June 10, 2024
  • 8 min read

Introducing Structured Output on Friendli Engine for Building LLM Agents

LLM agents
Friendli Engine
  • May 22, 2024
  • 11 min read

Measuring LLM Serving Performance with LLMServingPerfEvaluator

LLM
Serving
  • May 3, 2024
  • 5 min read

Building a RAG Chatbot with Friendli, MongoDB Atlas, and LangChain

RAG
LangChain
MongoDB
  • April 29, 2024
  • 3 min read

Meta Llama 3 now available on Friendli

Llama3
Meta
  • April 12, 2024
  • 3 min read

Easily Migrating LLM Inference Serving from vLLM to Friendli Container

vLLM
Friendli Container
Serving
  • April 8, 2024
  • 3 min read

Building Your RAG Application on LlamaIndex with Friendli Engine: A Step-by-Step Guide

RAG
LlamaIndex
  • April 3, 2024
  • 6 min read

Improve Latency and Throughput with Weight-Activation Quantization in FP8

WAQ
FP8
  • February 28, 2024
  • 3 min read

Running Quantized Mixtral 8x7B on a Single GPU

Mixtral
AWQ
  • February 20, 2024
  • 4 min read

Serving Performances of Mixtral 8x7B, a Mixture of Experts (MoE) Model

Mixtral
MoE
  • February 15, 2024
  • 4 min read

Which Quantization to Use to Reduce the Size of LLMs?

AWQ
Quantization
LLM
  • February 7, 2024
  • 2 min read

Friendli TCache: Optimizing LLM Serving by Reusing Computations

LLM
Serving
  • February 2, 2024
  • 4 min read

Grouped Query Attention (GQA) vs. Multi Head Attention (MHA): Optimizing LLM Inference Serving

GQA
MHA
MQA
  • January 24, 2024
  • 3 min read

Faster and Cheaper Mixtral 8×7B on Friendli Serverless Endpoints

LLM
Serving
  • January 12, 2024
  • 3 min read

The LLM Serving Engine Showdown: Friendli Engine Outshines

LLM
Serving Engine
  • January 4, 2024
  • 2 min read

Friendli Serverless Endpoints: Unleashing Generative AI for Everyone

inference
generative AI models
  • December 11, 2023
  • 3 min read

Groundbreaking Performance of the Friendli Engine for LLM Serving on an NVIDIA H100 GPU

LLM
NVIDIA H100
  • November 16, 2023
  • 3 min read

Simultaneously Serving Multiple LoRAs on a single GPU with Friendli Engine

LoRA
multi-LoRA
  • November 7, 2023
  • 2 min read

Faster serving of the 4-bit quantized Llama 2 70B model with fewer GPUs: Friendli Engine vs. vLLM

Quantization
Large Language Models
  • October 30, 2023
  • 3 min read

Comparing two LLM serving frameworks: Friendli Engine vs. vLLM

LLM
Inference
Serving
  • October 27, 2023
  • 4 min read

Chat Docs: A RAG Application with Friendli Engine and LangChain

LangChain
Large Language Models
LLM
  • October 27, 2023
  • 3 min read

LangChain Integration with Friendli Dedicated Endpoints

LangChain
Large Language Models
Model Serving
  • October 26, 2023
  • 3 min read

Retrieval-Augmented Generation: A Dive into Contextual AI

Large Language Models
Model Serving
LangChain
  • October 23, 2023
  • 3 min read

Unlocking Efficiency of Serving LLMs with Activation-aware Weight Quantization (AWQ) on Friendli Engine

Quantization
Large Language Models
Transformers
  • October 16, 2023
  • 4 min read

Understanding Activation-Aware Weight Quantization (AWQ): Boosting Inference Serving Efficiency in LLMs

Quantization
Large Language Models
Transformers
  • September 27, 2023
  • 2 min read

Iteration batching (a.k.a. continuous batching) to increase LLM inference serving throughput

LLM
LLM Serving
Generative AI Tools
  • July 13, 2023
  • 5 min read

Accelerating LLM Training with Memory-Balanced Pipeline Parallelism

Large Language Models
Transformers
Distributed Systems
  • July 3, 2023
  • 2 min read

Friendli Engine's Enriched Coverage for Sought-After LLMs: MPT, LLaMA, and Dolly

Transformers
Generative Model
Large Model
  • June 27, 2023
  • 3 min read

Get an Extra Speedup of LLM Inference with Integer Quantization on Friendli Engine

Quantization
Transformers
Generative Model
  • January 17, 2023
  • 3 min read

Fine-tuning and Serving CodeGen, a Code Generation Model, with Friendli Engine

CodeGen
MLOps
Transformers
  • November 1, 2022
  • 1 min read

Save on Training Costs of Generative AI with Friendli Training

Machine Learning
AI
VC
  • October 8, 2022
  • 2 min read

Serve generative AI models like T5 faster than ever with Friendli Engine (32.8x faster for T5–3B)

Generative AI
Transformers
MLOps
  • August 4, 2022
  • 2 min read

Friendli Engine: How Good is it on Small Models?

Machine Learning
Transformers
Generative Model
  • July 18, 2022
  • 7 min read

Friendli Engine: How to Serve Large-scale Transformer Models

AI
Machine Learning
System Architecture
  • May 20, 2022
  • 3 min read

Introducing GPT-FAI 13B: A Large-scale Language Model Trained with FriendliAI’s Friendli Training

GPT-3
MLOps
MLOps Platform