To serve Mixture of Experts (MoE) models efficiently, first run a policy search to find the optimal execution policy.
For instructions on running the search, see Running Policy Search.
Once the search finds an optimal policy, it compiles the policy into a file that you can use to create serving endpoints.
When you create an endpoint with this file, the engine serves it using the optimal policy.
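
The search, compile, and serve flow described above can be pictured with a small toy sketch. This is not the engine's actual API or policy file format: the function names (`search_policy`, `compile_policy`, `load_policy`), the candidate fields, the JSON layout, and the latency numbers are all hypothetical and exist only to illustrate the idea of picking the best-measured candidate, saving it to a file, and loading that file at serving time.

```python
import json
import tempfile
from pathlib import Path


def search_policy(candidates, measure_latency):
    """Return the candidate policy with the lowest measured latency."""
    return min(candidates, key=measure_latency)


def compile_policy(policy, path):
    """Write the chosen policy to a file (JSON is an illustrative choice)."""
    path = Path(path)
    path.write_text(json.dumps(policy, sort_keys=True))
    return path


def load_policy(path):
    """Read a policy file back, as a serving process might at startup."""
    return json.loads(Path(path).read_text())


# Hypothetical candidate execution policies.
candidates = [
    {"name": "a", "expert_batch": 1},
    {"name": "b", "expert_batch": 4},
    {"name": "c", "expert_batch": 8},
]

# Stand-in for real benchmarking: fixed, made-up latencies in milliseconds.
fake_latency_ms = {"a": 12.0, "b": 9.5, "c": 11.0}

# 1. Search: pick the candidate with the lowest (fake) latency.
best = search_policy(candidates, lambda p: fake_latency_ms[p["name"]])

# 2. Compile: persist the winning policy to a file.
policy_file = compile_policy(best, Path(tempfile.gettempdir()) / "moe_policy.json")

# 3. Serve: an endpoint would load the file and run with that policy.
served_policy = load_policy(policy_file)
```

A real policy search measures candidates on the target hardware rather than reading from a fixed table, and the resulting file is whatever the search tool produces; the sketch only mirrors the order of the steps.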