- March 11, 2025
- 9 min read
The Complete Guide to Friendli Container AWS EKS Add-On

Maximize Your Gen AI Inference with Friendli Container on AWS EKS
Are you an enterprise using AWS and looking to optimize your generative AI inference at scale? Look no further than the Friendli Container AWS EKS Add-On. Installing this add-on integrates Friendli Container directly into your Amazon EKS workflow, with the convenience of consolidated AWS billing, and unlocks reduced inference costs, faster scaling, and improved throughput for your workloads.
Read on to discover how easy it is to set up and unleash the full potential of this powerful microservice for your enterprise.
Friendli Container: The Ultimate Inference Supercharger
The Friendli Container is a Docker image designed to bring our cutting-edge Friendli Inference solution into your environment. It provides a microservice-based container that incorporates key optimizations from our managed service, allowing you to leverage the fastest AI inference engine on the market, tailored to work seamlessly within your setup. While it doesn’t include every optimization from our managed service, it brings key performance-enhancing features for high-performance inference to your infrastructure.
Optimized to reduce latency, minimize GPU usage, and maximize cost-efficiency, the Friendli Container provides scalable, isolated environments for AI model deployment, helping you achieve superior performance.
- Over 50% reduction in GPU usage
- Over 2x lower latency
- Over 2x higher throughput
While Friendli Container unlocks unprecedented power, truly harnessing its full capabilities requires supporting infrastructure that efficiently manages GPU resources and orchestrates operations.
Amazon EKS: Simplifying Kubernetes Operations
Kubernetes (K8s) is the de facto industry standard for managing containerized applications, enabling businesses to deploy, scale, and manage workloads across environments. With powerful features like automated scaling, load balancing, and self-healing, Kubernetes simplifies the management of complex applications at scale. However, managing Kubernetes efficiently requires deep understanding and expertise. This is where Amazon EKS steps in.
Amazon EKS is a fully managed service that simplifies the deployment, management, and scaling of containerized applications using Kubernetes on AWS. EKS eliminates the complexity of Kubernetes cluster management, offering a secure, scalable, and highly available platform for running containerized workloads. Moreover, it integrates seamlessly with other AWS services, providing a comprehensive solution for orchestrating containers in the cloud.
Thus, many organizations have adopted Amazon EKS for scalable generative AI inference, including:
- Adobe, a leading digital creativity SaaS company, built its generative AI solution, Adobe Firefly, using Amazon EKS.
- Mobileye, an autonomous driving technology company, leverages Amazon EKS for computer vision and AI applications.
- Omi, a startup providing AI-powered 3D rendering solutions, utilizes Amazon EKS to fuel its generative AI models.
Key Benefits of Amazon EKS:
- Fully Managed Kubernetes: AWS takes care of the Kubernetes control plane, removing the need for manual setup and maintenance. This allows teams to focus on applications rather than infrastructure.
- Seamless AWS Integration: EKS integrates smoothly with AWS services like EC2, IAM, S3, and CloudWatch, enabling you to easily enhance your applications with the full range of AWS features.
- Scalability and Flexibility: EKS automatically scales your cluster and workloads based on demand. It supports running applications across multiple AWS Availability Zones, ensuring high availability and resilience.
- Enhanced Security: EKS benefits from AWS's security infrastructure, offering built-in encryption, IAM roles, and network policies to control access and protect your applications.
Figure 1: A Kubernetes cluster in action. Reference: Amazon EKS. [Online] Available: https://docs.aws.amazon.com/eks/latest/userguide/kubernetes-concepts.html [Accessed Feb. 26, 2025]
In short, AWS EKS simplifies Kubernetes management, letting you focus on what matters most — building great applications.
Why Deploy Friendli Container on AWS EKS?
If you’re looking to slash costs and boost performance immediately, deploying Friendli Container as an EKS add-on lets you do just that—right within your existing EKS workflow. Here's how:
- Instant Cost Savings: Friendli Container leverages proprietary technologies to reduce inference GPU costs by over 50%, maximizing ROI and delivering exceptional performance.
- Streamlined Billing: Simplify your accounting with consolidated billing. All AWS services, including the Friendli Container add-on, are grouped into a single invoice for easy tracking and budgeting.
- Effortless Subscription: AWS handles subscriptions for you, ensuring minimal administrative overhead.
- Automated Updates: Regular updates to the Friendli Container add-on are automatically applied, keeping your system secure and optimized without manual intervention.
By deploying Friendli Container on AWS EKS, you can quickly and easily enhance your Generative AI workflows with a secure, scalable, and cost-efficient platform that ensures immediate cost savings.
How to Use Friendli Container on AWS EKS
We will walk you through setting up an EKS cluster and deploying Friendli Container, providing the expected output for each step. By the end, you will have a working inference service successfully deployed on your EKS cluster.
1. Prerequisite: Add GPU Node Group to your EKS Cluster
Before proceeding, ensure you have an active AWS EKS cluster. If you haven't created one yet, please follow the AWS EKS documentation to set up your EKS cluster.
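If you prefer the command line, here is a minimal sketch using eksctl; the cluster name and region are placeholder assumptions, so adjust them to your environment.

```shell
# Create a basic EKS cluster ("my-cluster" and the region are placeholders).
eksctl create cluster --name my-cluster --region us-east-1
```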
If you have already added a GPU Node Group to your EKS cluster, you can skip this part.
When selecting the AWS region for your new EKS cluster, the availability of GPU instances is one of the key factors to consider. As of February 2025, Friendli Container supports NVIDIA H100, A100, A10G, and L4 devices. You can check the instance availability here; a CLI check is also sketched after the table below.
| NVIDIA Device | AWS EC2 Instance Types |
| --- | --- |
| H100 | p5.48xlarge |
| A100 | p4d.24xlarge |
| A10G | g5.xlarge, g5.2xlarge, g5.4xlarge, g5.8xlarge, g5.12xlarge, g5.16xlarge, g5.24xlarge, g5.48xlarge |
| L4 | g6.xlarge, g6.2xlarge, g6.4xlarge, g6.8xlarge, g6.12xlarge, g6.16xlarge, g6.24xlarge, g6.48xlarge, gr6.4xlarge, gr6.8xlarge |
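To confirm that a given instance type is actually offered in your target region, you can query the EC2 API. A minimal sketch, assuming g6.2xlarge and us-east-1 as placeholders:

```shell
# List the Availability Zones in us-east-1 that offer g6.2xlarge.
aws ec2 describe-instance-type-offerings \
  --location-type availability-zone \
  --filters Name=instance-type,Values=g6.2xlarge \
  --region us-east-1
```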
If you’re going to use multi-GPU VM instance types, installing the NVIDIA GPU Operator is highly recommended for proper resource management. You can consult the NVIDIA GPU Operator documentation, and an example of installing the GPU Operator using Helm can be found here.
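As a rough sketch, a Helm-based installation typically looks like the following; consult the NVIDIA documentation for current chart versions and options.

```shell
# Add NVIDIA's Helm repository and install the GPU Operator
# into its own namespace.
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update
helm install gpu-operator nvidia/gpu-operator \
  --namespace gpu-operator --create-namespace --wait
```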
Now let’s add a GPU node group to your EKS cluster using the console (an equivalent eksctl command is sketched after these steps).
- Open Amazon EKS console and choose the cluster that you want to create a node group in.
- Select the “Compute” tab and click “Add node group”.
- Configure the new node group by entering the name, Node IAM role, and other information. You can click “Create recommended role” to create the IAM role. Click “Next”.
- On the next page, select “Amazon Linux 2023 (x86_64) Nvidia” for AMI type.
- Select the appropriate instance type for the GPU device of your choice.
- Configure the disk size. It should be large enough to hold the model you want to deploy. (For the example in this guide, a disk size of 60 GB is recommended.)
- Configure the desired node group size.
- Go through the rest of the steps, review the changes and click “Create”.
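If you’d rather script this step, here is a minimal eksctl sketch; the cluster name, node group name, region, and sizes are placeholder assumptions.

```shell
# Create a GPU node group named "gpu-l4" backed by g6.2xlarge instances,
# with a 60 GB root volume for model downloads.
eksctl create nodegroup \
  --cluster my-cluster \
  --region us-east-1 \
  --name gpu-l4 \
  --node-type g6.2xlarge \
  --nodes 1 \
  --node-volume-size 60
```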
2. Configure Friendli Container EKS add-on
- Open Amazon EKS console and choose the cluster that you want to configure.
- Select the “Add-ons” tab and click “Get more add-ons”.
- Scroll down and under the section “AWS Marketplace add-ons”, search and check “Friendli Container”, and click “Next”.
- Now you’ll need an active subscription to Friendli Container. The number of license units you need to purchase is determined by the number of GPU devices you want to use for running Friendli Container.
- Click “Next”, review your settings, and click “Create”.
For pricing details, check Friendli Container on AWS Marketplace. For trials, custom offers, and other inquiries, please see here for contact information.
Now you need to allow Kubernetes ServiceAccounts to contact AWS License Manager, so that your Friendli Inference Deployments can be activated properly.
Before you continue, please make sure “Amazon EKS Pod Identity Agent” EKS add-on is installed in your cluster. You can click “Get more add-ons” and enable “Amazon EKS Pod Identity Agent” under the “AWS add-ons” section.
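Alternatively, the agent can be installed from the CLI; a minimal sketch with a placeholder cluster name:

```shell
# Install the EKS Pod Identity Agent add-on on the cluster.
aws eks create-addon \
  --cluster-name my-cluster \
  --addon-name eks-pod-identity-agent
```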
- Open Amazon EKS console and choose the cluster that you want to configure.
- Select the “Access” tab.
- Under the “Pod Identity associations” section, click “Create”.
The “Create Pod Identity association” page will appear. Now let’s configure the IAM role, Kubernetes namespace, and Kubernetes service account.
- IAM Role
- Click “Create recommended role”.
- In step 1 (Select trusted entity), “EKS - Pod Identity” should be selected for the use case. Leave it as is and click “Next”.
- In step 2 (Add permissions), search for “AWSLicenseManagerConsumptionPolicy” and enable it. Click “Next”.
- In step 3 (Name, review, and create), give the appropriate Role name and click “Create”.
- Go back to the “Create Pod Identity association” page and select the IAM role you just created.
- Kubernetes namespace.
- This is the Kubernetes namespace where you want to create Friendli Inference Deployments. When in doubt, you can use “default”.
- Later on, if you are going to create Friendli Inference Deployments in another namespace, you should create the Pod Identity association for that namespace.
- Kubernetes service account.
- For most cases, this should be “default”.
- Later on, if you are going to configure Friendli Inference Deployments to use custom service accounts, you should create the Pod Identity association for that service account.
Click “Create”, then under the “Pod Identity associations” section, you should be able to see the association you just created.
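The same association can also be created from the CLI. A minimal sketch assuming the role created above; the cluster name, account ID, and role name are placeholders:

```shell
# Associate the IAM role with the "default" service account
# in the "default" namespace (the role ARN is a placeholder).
aws eks create-pod-identity-association \
  --cluster-name my-cluster \
  --namespace default \
  --service-account default \
  --role-arn arn:aws:iam::123456789012:role/FriendliLicenseRole
```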
3. Create Friendli Deployment
You need to be able to use the “kubectl” CLI tool to access your EKS cluster. Consult this guide from AWS for more details.
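In most cases, this comes down to updating your kubeconfig for the cluster; a minimal sketch with placeholder region and cluster name:

```shell
# Point kubectl at your EKS cluster, then verify connectivity.
aws eks update-kubeconfig --region us-east-1 --name my-cluster
kubectl get nodes
```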
To deploy a private or gated model from the HuggingFace model hub, you need to create a HuggingFace access token with “read” permission, then store it in a Kubernetes secret:
```shell
kubectl create secret generic hf-secret --from-literal token=YOUR_TOKEN_HERE
```
FriendliDeployment is a Kubernetes custom resource that lets you easily create Friendli Inference Deployments without configuring low-level Kubernetes resources like pods, services, and deployments.
Below is a sample FriendliDeployment that deploys Meta Llama 3.1 8B on one g6.2xlarge instance.
```yaml
apiVersion: friendli.ai/v1alpha1
kind: FriendliDeployment
metadata:
  namespace: default
  name: friendlideployment-sample
spec:
  model:
    huggingFace:
      repository: meta-llama/Llama-3.1-8B-Instruct
      # "token:" section is not needed if the model is
      # a public one.
      token:
        name: hf-secret
        key: token
  resources:
    nodeSelector:
      # Use the name of the node group you want to use.
      eks.amazonaws.com/nodegroup: gpu-l4
    numGPUs: 1
    requests:
      cpu: '6'
      ephemeral-storage: 30Gi
      memory: 25Gi
    limits:
      cpu: '6'
      ephemeral-storage: 30Gi
      memory: 25Gi
  deploymentStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 0
      maxUnavailable: 1
  service:
    inferencePort: 6000
```
You can modify this YAML file for your use case.
- The “token:” section under spec.model.huggingFace refers to the Kubernetes secret you created for storing the HuggingFace access token. If accessing your model does not require an access token, you can omit the “token:” section entirely.
- In the example above, nodeSelector is “eks.amazonaws.com/nodegroup: gpu-l4”. This assumes that the name of the GPU node group is “gpu-l4”. You need to edit the node selector to match the name of your node group.
- CPU and memory resource requirements are tuned for the g6.2xlarge instance; you may need to edit these values if you use a different instance type.
If your cluster has the NVIDIA GPU Operator installed, you need to put the “nvidia.com/gpu” resource in the “requests:” and “limits:” sections, as GPU nodes will advertise the “nvidia.com/gpu” resource alongside ordinary resources like “cpu” and “memory”. You can then omit “numGPUs” from your FriendliDeployment. Below is the equivalent resources section for a GPU Operator-enabled cluster.
```yaml
resources:
  nodeSelector:
    eks.amazonaws.com/nodegroup: gpu-l4
  requests:
    cpu: '6'
    ephemeral-storage: 30Gi
    memory: 25Gi
    nvidia.com/gpu: '1'
  limits:
    cpu: '6'
    ephemeral-storage: 30Gi
    memory: 25Gi
    nvidia.com/gpu: '1'
```
Save your YAML file as “friendlideployment.yaml”, and execute “kubectl apply -f friendlideployment.yaml”.
```shell
$ kubectl apply -f friendlideployment.yaml
friendlideployment.friendli.ai/friendlideployment-sample created
$ kubectl get pods
NAME                                         READY   STATUS    RESTARTS   AGE
friendlideployment-sample-7d7b877c77-zjgqq   2/2     Running   0          3m18s
$ kubectl get services
NAME                        TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
friendlideployment-sample   ClusterIP   172.20.95.224   <none>        6000/TCP   18m
kubernetes                  ClusterIP   172.20.0.1      <none>        443/TCP    28h
```
Now you can port-forward to the service to access it from your local machine.
```shell
$ kubectl port-forward svc/friendlideployment-sample 6000
Forwarding from 127.0.0.1:6000 -> 6000
Forwarding from [::1]:6000 -> 6000
```
In another terminal, use the curl tool to send an inference request.
```shell
$ curl http://localhost:6000/v1/completions \
    -H 'Content-Type: application/json' \
    --data-raw '{"prompt": "Hi!", "max_tokens": 10, "stream": false}'
{"choices":[{"finish_reason":"length","index":0,"seed":15349211611234757311,"text":" I'm Alex, and I'm excited to share","tokens":[358,2846,8683,11,323,358,2846,12304,311,4430]}],"id":"cmpl-b2e4b4cba711448c847ab89d763588da","object":"text_completion","usage":{"completion_tokens":10,"prompt_tokens":3,"total_tokens":13}}
```
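Since the request body accepts a “stream” flag, you can presumably request a streamed response the same way; a sketch under that assumption:

```shell
# Request a streamed completion; tokens should arrive incrementally
# rather than in a single JSON response.
curl http://localhost:6000/v1/completions \
  -H 'Content-Type: application/json' \
  --data-raw '{"prompt": "Hi!", "max_tokens": 10, "stream": true}'
```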
For more information about Friendli Container usage, check our documentation and contact us for inquiries.
Cleanup
You can remove the FriendliDeployment using the kubectl CLI tool.
```shell
$ kubectl delete friendlideployment friendlideployment-sample
friendlideployment.friendli.ai "friendlideployment-sample" deleted
```
You may also want to scale down or delete your GPU node group to avoid being charged for unused GPU instances.
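A minimal sketch for scaling the example node group down to zero with eksctl; the cluster and node group names are placeholders.

```shell
# Scale the GPU node group to zero nodes to stop GPU instance charges.
eksctl scale nodegroup \
  --cluster my-cluster \
  --name gpu-l4 \
  --nodes 0 \
  --nodes-min 0
```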
That’s it! You’ve now learned how to effectively utilize Friendli Container on AWS EKS to optimize your LLM inference workflows. If you’d like to explore more, feel free to refer to the detailed guide here. This will help you dive deeper into the deployment process and take full advantage of the benefits AWS EKS has to offer.
Conclusion
The Friendli Container AWS EKS Add-On delivers a high-performance, scalable, and cost-effective solution for deploying AI models in production environments. By leveraging AWS EKS and Friendli Container’s powerful optimizations, you can dramatically reduce inference costs and improve throughput for AI inference workloads.
If you're looking for a completely automated, further optimized solution that goes beyond the Friendli Container Amazon EKS Add-On and handles everything for you, consider exploring Friendli Dedicated Endpoints.
If you have any questions or need support, don't hesitate to reach out to us or consult our documentation.
Written by
FriendliAI Tech & Research