Latest post
- July 24, 2024
- 4 min read
Llama 3 70B Outperforms GPT-4o in Function Calling with Friendli Tools
![Llama 3 70B Outperforms GPT-4o in Function Calling with Friendli Tools thumbnail](/images/0fa8cf-thumbnail.png)
![Building AI Agents Using Function Calling with LLMs thumbnail](/images/06c3d3-thumbnail.png)
- July 22, 2024
- 11 min read
Building AI Agents Using Function Calling with LLMs
Function calling
AI agents
![Function Calling: Connecting LLMs with Functions and APIs thumbnail](/images/1d68d8-thumbnail.png)
- July 18, 2024
- 6 min read
Function Calling: Connecting LLMs with Functions and APIs
Function calling
LLMs
![Showcasing FriendliAI’s Integration with LiteLLM thumbnail](/images/85357a-thumbnail.png)
- July 12, 2024
- 7 min read
Showcasing FriendliAI’s Integration with LiteLLM
LiteLLM
![Building AI-Powered Web Applications In 20 Minutes with FriendliAI, Vercel AI SDK, and Next.js thumbnail](/images/e0b63a-thumbnail.png)
- June 27, 2024
- 11 min read
Building AI-Powered Web Applications In 20 Minutes with FriendliAI, Vercel AI SDK, and Next.js
Vercel
Vercel AI SDK
Next.js
![Level Up Your Client-Side Interactions with Friendli's gRPC Support thumbnail](/images/dcf30d-thumbnail.jpg)
- June 25, 2024
- 3 min read
Level Up Your Client-Side Interactions with Friendli's gRPC Support
gRPC
![Deploying Weights & Biases Model Checkpoints on Friendli Dedicated Endpoints thumbnail](/images/88125d-thumbnail.jpg)
- June 20, 2024
- 4 min read
Deploying Weights & Biases Model Checkpoints on Friendli Dedicated Endpoints
Weights & Biases
![Deploying Your Inference Endpoints on AWS SageMaker with Friendli Container thumbnail](/images/2995d8-thumbnail.jpg)
- June 12, 2024
- 5 min read
Deploying Your Inference Endpoints on AWS SageMaker with Friendli Container
AWS
SageMaker
Container
![Introducing Structured Output on Friendli Engine for Building LLM Agents thumbnail](/images/37654e-thumbnail.jpg)
- June 10, 2024
- 8 min read
Introducing Structured Output on Friendli Engine for Building LLM Agents
LLM agents
Friendli Engine
![Measuring LLM Serving Performance with LLMServingPerfEvaluator thumbnail](/images/a77d5e-thumbnail.jpg)
- May 22, 2024
- 11 min read
Measuring LLM Serving Performance with LLMServingPerfEvaluator
LLM
Serving
![Building a RAG Chatbot with Friendli, MongoDB Atlas, and LangChain thumbnail](/images/6c16e1-thumbnail.jpeg)
- May 3, 2024
- 5 min read
Building a RAG Chatbot with Friendli, MongoDB Atlas, and LangChain
RAG
LangChain
MongoDB
![Meta Llama 3 now available on Friendli thumbnail](/images/530bcb-thumbnail.jpeg)
- April 29, 2024
- 3 min read
Meta Llama 3 now available on Friendli
Llama 3
Meta
![Easily Migrating LLM Inference Serving from vLLM to Friendli Container thumbnail](/images/c5cf76-thumbnail.jpeg)
- April 12, 2024
- 3 min read
Easily Migrating LLM Inference Serving from vLLM to Friendli Container
vLLM
Friendli Container
Serving
![Building Your RAG Application on LlamaIndex with Friendli Engine: A Step-by-Step Guide thumbnail](/images/667b26-thumbnail.jpeg)
- April 8, 2024
- 3 min read
Building Your RAG Application on LlamaIndex with Friendli Engine: A Step-by-Step Guide
RAG
LlamaIndex
![Improve Latency and Throughput with Weight-Activation Quantization in FP8 thumbnail](/images/702b48-thumbnail.jpeg)
- April 3, 2024
- 6 min read
Improve Latency and Throughput with Weight-Activation Quantization in FP8
WAQ
FP8
![Running Quantized Mixtral 8x7B on a Single GPU thumbnail](/images/17509f-thumbnail.jpeg)
- February 28, 2024
- 3 min read
Running Quantized Mixtral 8x7B on a Single GPU
Mixtral
AWQ
![Serving Performances of Mixtral 8x7B, a Mixture of Experts (MoE) Model thumbnail](/images/4bc0a3-thumbnail.jpeg)
- February 20, 2024
- 4 min read
Serving Performances of Mixtral 8x7B, a Mixture of Experts (MoE) Model
Mixtral
MoE
![Which Quantization to Use to Reduce the Size of LLMs? thumbnail](/images/3fda89-thumbnail.jpeg)
- February 15, 2024
- 4 min read
Which Quantization to Use to Reduce the Size of LLMs?
AWQ
Quantization
LLM
![Friendli TCache: Optimizing LLM Serving by Reusing Computations thumbnail](/images/99f820-thumbnail.jpeg)
- February 7, 2024
- 2 min read
Friendli TCache: Optimizing LLM Serving by Reusing Computations
LLM
Serving
![Grouped Query Attention (GQA) vs. Multi-Head Attention (MHA): Optimizing LLM Inference Serving thumbnail](/images/d3da1b-thumbnail.jpeg)
- February 2, 2024
- 4 min read
Grouped Query Attention (GQA) vs. Multi-Head Attention (MHA): Optimizing LLM Inference Serving
GQA
MHA
MQA
![Faster and Cheaper Mixtral 8×7B on Friendli Serverless Endpoints thumbnail](/images/630c1d-thumbnail.jpeg)
- January 24, 2024
- 3 min read
Faster and Cheaper Mixtral 8×7B on Friendli Serverless Endpoints
LLM
Serving
![The LLM Serving Engine Showdown: Friendli Engine Outshines thumbnail](/images/df2ca8-thumbnail.jpeg)
- January 12, 2024
- 3 min read
The LLM Serving Engine Showdown: Friendli Engine Outshines
LLM
Serving Engine
![Friendli Serverless Endpoints: Unleashing Generative AI for Everyone thumbnail](/images/aacd7c-thumbnail.jpeg)
- January 4, 2024
- 2 min read
Friendli Serverless Endpoints: Unleashing Generative AI for Everyone
inference
generative AI models
![Groundbreaking Performance of the Friendli Engine for LLM Serving on an NVIDIA H100 GPU thumbnail](/images/8c6428-thumbnail.jpeg)
- December 11, 2023
- 3 min read
Groundbreaking Performance of the Friendli Engine for LLM Serving on an NVIDIA H100 GPU
LLM
NVIDIA H100
![Simultaneously Serving Multiple LoRAs on a single GPU with Friendli Engine thumbnail](/images/c51924-thumbnail.jpeg)
- November 16, 2023
- 3 min read
Simultaneously Serving Multiple LoRAs on a single GPU with Friendli Engine
LoRA
multi-LoRA
![Faster serving of the 4-bit quantized Llama 2 70B model with fewer GPUs: Friendli Engine vs. vLLM thumbnail](/images/fa66a5-thumbnail.jpeg)
- November 7, 2023
- 2 min read
Faster serving of the 4-bit quantized Llama 2 70B model with fewer GPUs: Friendli Engine vs. vLLM
Quantization
Large Language Models
![Comparing two LLM serving frameworks: Friendli Engine vs. vLLM thumbnail](/images/e4b1cc-thumbnail.jpeg)
- October 30, 2023
- 3 min read
Comparing two LLM serving frameworks: Friendli Engine vs. vLLM
LLM
Inference
Serving
![Chat Docs: A RAG Application with Friendli Engine and LangChain thumbnail](/images/dbdfda-thumbnail.jpeg)
- October 27, 2023
- 4 min read
Chat Docs: A RAG Application with Friendli Engine and LangChain
LangChain
Large Language Models
LLM
![LangChain Integration with Friendli Dedicated Endpoints thumbnail](/images/2dfb33-thumbnail.jpeg)
- October 27, 2023
- 3 min read
LangChain Integration with Friendli Dedicated Endpoints
LangChain
Large Language Models
Model Serving
![Retrieval-Augmented Generation: A Dive into Contextual AI thumbnail](/images/207382-thumbnail.jpeg)
- October 26, 2023
- 3 min read
Retrieval-Augmented Generation: A Dive into Contextual AI
Large Language Models
Model Serving
LangChain
![Unlocking Efficiency of Serving LLMs with Activation-aware Weight Quantization (AWQ) on Friendli Engine thumbnail](/images/f8bfb9-thumbnail.jpeg)
- October 23, 2023
- 3 min read
Unlocking Efficiency of Serving LLMs with Activation-aware Weight Quantization (AWQ) on Friendli Engine
Quantization
Large Language Models
Transformers
![Understanding Activation-Aware Weight Quantization (AWQ): Boosting Inference Serving Efficiency in LLMs thumbnail](/images/638109-thumbnail.jpeg)
- October 16, 2023
- 4 min read
Understanding Activation-Aware Weight Quantization (AWQ): Boosting Inference Serving Efficiency in LLMs
Quantization
Large Language Models
Transformers
![Iteration batching (a.k.a. continuous batching) to increase LLM inference serving throughput thumbnail](/images/22006d-thumbnail.jpeg)
- September 27, 2023
- 2 min read
Iteration batching (a.k.a. continuous batching) to increase LLM inference serving throughput
LLM
LLM Serving
Generative AI Tools
![Accelerating LLM Training with Memory-Balanced Pipeline Parallelism thumbnail](/images/e40a44-thumbnail.jpeg)
- July 13, 2023
- 5 min read
Accelerating LLM Training with Memory-Balanced Pipeline Parallelism
Large Language Models
Transformers
Distributed Systems
![Friendli Engine's Enriched Coverage for Sought-After LLMs: MPT, LLaMA, and Dolly thumbnail](/images/6d7612-thumbnail.jpeg)
- July 3, 2023
- 2 min read
Friendli Engine's Enriched Coverage for Sought-After LLMs: MPT, LLaMA, and Dolly
Transformers
Generative Model
Large Model
![Get an Extra Speedup of LLM Inference with Integer Quantization on Friendli Engine thumbnail](/images/8ac6e0-thumbnail.jpeg)
- June 27, 2023
- 3 min read
Get an Extra Speedup of LLM Inference with Integer Quantization on Friendli Engine
Quantization
Transformers
Generative Model
![Fine-tuning and Serving CodeGen, a Code Generation Model, with Friendli Engine thumbnail](/images/5f1b45-thumbnail.jpeg)
- January 17, 2023
- 3 min read
Fine-tuning and Serving CodeGen, a Code Generation Model, with Friendli Engine
CodeGen
MLOps
Transformers
![Save on Training Costs of Generative AI with Friendli Training thumbnail](/images/a49019-thumbnail.jpeg)
- November 1, 2022
- 1 min read
Save on Training Costs of Generative AI with Friendli Training
Machine Learning
AI
VC
![Serve generative AI models like T5 faster than ever with Friendli Engine (32.8x faster for T5-3B) thumbnail](/images/dd182a-thumbnail.jpeg)
- October 8, 2022
- 2 min read
Serve generative AI models like T5 faster than ever with Friendli Engine (32.8x faster for T5-3B)
Generative AI
Transformers
MLOps
![Friendli Engine: How Good is it on Small Models? thumbnail](/images/8934b3-thumbnail.jpeg)
- August 4, 2022
- 2 min read
Friendli Engine: How Good is it on Small Models?
Machine Learning
Transformers
Generative Model
![Friendli Engine: How to Serve Large-scale Transformer Models thumbnail](/images/81801a-thumbnail.jpeg)
- July 18, 2022
- 7 min read
Friendli Engine: How to Serve Large-scale Transformer Models
AI
Machine Learning
System Architecture
![Introducing GPT-FAI 13B: A Large-scale Language Model Trained with FriendliAI’s Friendli Training thumbnail](/images/dbee9b-thumbnail.jpeg)
- May 20, 2022
- 3 min read
Introducing GPT-FAI 13B: A Large-scale Language Model Trained with FriendliAI’s Friendli Training
GPT-3
MLOps
MLOps Platform