

- August 8, 2025
- 2 min read
Introducing N-gram Speculative Decoding: Faster Inference for Structured Tasks
Speculative Decoding
Inference
Dedicated Endpoints

- August 6, 2025
- 2 min read
WBA: The Community-Driven Platform for Blind Testing the World’s Best AI Models
WBA
AI Comparison
Model Evaluation

- July 25, 2025
- 2 min read
Announcing Online Quantization: Faster, Cheaper Inference with Same Accuracy
Quantization
Inference
Dedicated Endpoints

- July 15, 2025
- 2 min read
LG AI Research Partners with FriendliAI to Launch EXAONE 4.0 for Fast, Scalable API
LG AI Research
EXAONE 4.0
Partnership

- July 1, 2025
- 5 min read
The Essential Checklist: Fix 6 Common Errors When Sharing Models on Hugging Face
Hugging Face
Models

- June 5, 2025
- 3 min read
One Click from W&B to FriendliAI: Deploy Models as Live Endpoints
Weights & Biases
W&B
AI DevOps

- May 15, 2025
- 4 min read
Cut Latency for Image & Video AI Models: A Guide to Multimodal Caching
Multimodal
Inference
Optimization

- May 14, 2025
- 3 min read
Explore 370K+ AI Models on FriendliAI's Models Page
Multimodal Models
Model Deployment
Hugging Face Integration

- May 2, 2025
- 2 min read
How to Use Hugging Face Multi-LoRA Adapters
Multi-LoRA
LoRA Adapter
Hugging Face LoRA