Introduction

This guide describes how to serve Mixture of Experts (MoE) models, such as Mixtral 8x7B, with Friendli Container.

Search for the optimal policy and run Friendli Container

To serve MoE models efficiently, you first need to run a policy search to find the optimal execution policy. For instructions, see Running Policy Search. When the search finds an optimal policy, it compiles that policy into a file you can use to create serving endpoints, and the engine then serves each endpoint using that policy.
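To make the "search, then save, then serve" flow concrete, here is a toy Python sketch of the general idea. Everything in it is hypothetical: the `Policy` fields (`tensor_parallel`, `expert_parallel`), the `toy_latency` cost model, and the JSON policy file are illustrative assumptions, not Friendli Container's actual policy format, search algorithm, or CLI. A real search measures candidates on your GPUs instead of using a formula.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from itertools import product
from pathlib import Path
from typing import Callable, Iterable


@dataclass(frozen=True)
class Policy:
    # Hypothetical execution knobs; the real policy contents are not documented here.
    tensor_parallel: int
    expert_parallel: int


def toy_latency(p: Policy) -> float:
    # Made-up cost model standing in for real on-device measurements.
    return 10.0 / p.tensor_parallel + 0.5 * p.tensor_parallel + abs(p.expert_parallel - 2)


def search_policy(candidates: Iterable[Policy], benchmark: Callable[[Policy], float]) -> Policy:
    """Return the candidate with the lowest benchmarked latency (exhaustive search)."""
    return min(candidates, key=benchmark)


def save_policy(policy: Policy, path: Path) -> None:
    # Persist the chosen policy so a serving process can load it later.
    path.write_text(json.dumps(asdict(policy)))


def load_policy(path: Path) -> Policy:
    return Policy(**json.loads(path.read_text()))


candidates = [Policy(tp, ep) for tp, ep in product([1, 2, 4], [1, 2, 8])]
best = search_policy(candidates, toy_latency)

with tempfile.TemporaryDirectory() as d:
    policy_file = Path(d) / "policy.json"
    save_policy(best, policy_file)
    # A serving engine would read this file at startup and run with the stored policy.
    reloaded = load_policy(policy_file)

print(best, reloaded == best)
```

The exhaustive `min` over candidates keeps the sketch simple. A production search would likely prune the space and use measured throughput or latency, but the separation of "search once, save the result, reuse it when serving" is the part that mirrors the workflow described above.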