Introduction

This guide explains how to serve Mixture of Experts (MoE) models, such as Mixtral 8x7B, using Friendli Container.

Searching for the Optimal Policy and Running Friendli Container

To serve MoE models efficiently, you need to run a policy search to find the optimal execution policy. Learn how to run the policy search at Running Policy Search. Once the search completes successfully, the optimal policy is compiled into a policy file, which can be used for creating serving endpoints. The engine then serves the endpoint using the optimal policy.
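
The two-phase workflow described above (search once, then serve with the resulting policy file) can be sketched as a small planning helper. This is only an illustration under stated assumptions: the function name `plan_steps`, the step labels, and the idea of detecting a policy file by looking for any file in a directory are hypothetical and are not part of Friendli Container. The actual commands and options for each step are described in Running Policy Search.

```python
from pathlib import Path
from typing import List


def plan_steps(policy_dir: Path, policy_glob: str = "*") -> List[str]:
    """Return the ordered workflow steps for serving an MoE model.

    Hypothetical helper: the step names are labels for this sketch only,
    and `policy_glob` is an assumed way to recognize a compiled policy file.
    """
    # A policy file is considered present if the directory exists and
    # contains at least one regular file matching the pattern.
    has_policy = policy_dir.is_dir() and any(
        p.is_file() for p in policy_dir.glob(policy_glob)
    )
    if has_policy:
        # The policy was already searched and compiled, so only serving remains.
        return ["serve_with_policy"]
    # No policy file yet: run the search first, then serve with its output.
    return ["run_policy_search", "serve_with_policy"]
```

For example, `plan_steps(Path("./policy"))` returns both steps when the directory is missing or empty, and only the serving step once a policy file has been written there. Keeping the policy file in a persistent directory is what lets later launches skip the search.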