  • June 12, 2024
  • 2 min read

Deploying Your Inference Endpoints on AWS Sagemaker with Friendli Container


This blog post will guide you through creating an Amazon SageMaker model from model artifacts stored in an S3 bucket and a Friendli container image from ECR. We'll then configure and deploy this model as a SageMaker endpoint for real-time inference. You can then invoke this endpoint to receive generative AI inference responses.

By utilizing Friendli Containers in your SageMaker pipeline, you'll benefit from the Friendli Engine's speed and resource efficiency. We'll explore how to create inference endpoints using both the AWS Console and the boto3 Python SDK.

The General Workflow

  1. Create a Model: Within SageMaker Inference, define a new model by specifying the model artifacts in your S3 bucket and the Friendli container image from ECR.
  2. Configure the Endpoint: Create a SageMaker Inference endpoint configuration by selecting the instance type and the number of instances required.
  3. Create the Endpoint: Utilize the configured settings to launch a SageMaker Inference endpoint.
  4. Invoke the Endpoint: Once deployed, send requests to your endpoint to receive inference responses.

Prerequisites

Before beginning, you need to push the Friendli Container image to an ECR repository on AWS. First, prepare the Friendli Container image by following the instructions in "Pulling Friendli Container Image." Then, tag and push the image to the Amazon ECR repository as guided in "Pushing a Docker image to an Amazon ECR private repository."
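For reference, the private-repository image URI that SageMaker expects follows a fixed format. Here is a minimal helper that assembles it (the function name and the example values are ours, purely for illustration):

```python
def ecr_image_uri(account_id: str, region: str, repository: str, tag: str = "latest") -> str:
    """Build a private ECR image URI in the form SageMaker expects:
    <ACCOUNT>.dkr.ecr.<REGION>.amazonaws.com/<REPO>:<TAG>
    """
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"

# Hypothetical account, region, and repository values:
print(ecr_image_uri("123456789012", "us-east-1", "friendli-container"))
```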

Using the AWS Console

Let's delve into the step-by-step instructions for creating an inference endpoint using the AWS Console.

Step 1: Creating a Model

  1. You can start creating a model by clicking the “Create model” button under SageMaker > Inference > Models.

  2. Configure the model by specifying the model artifacts in your S3 bucket and the Friendli container image pushed to ECR.

Step 2: Creating an Endpoint Configuration

  1. Create an endpoint configuration under SageMaker > Inference > Endpoint configurations by selecting the instance type and the number of instances required.

Step 3: Creating a SageMaker Inference Endpoint

  1. You can start by clicking the “Create endpoint” button under SageMaker > Inference > Endpoints.
  2. Select “Use an existing endpoint configuration”.
  3. Select the endpoint configuration created in Step 2.
  4. Finish by clicking on the “Create endpoint” button.

Step 4: Invoking the Endpoint

When the endpoint status becomes “InService”, you can invoke the endpoint with the following script, after filling in the endpoint name and the region name:

python
import boto3
import json

endpoint_name = "FILL OUT ENDPOINT NAME"
region_name = "FILL OUT AWS REGION"

sagemaker_runtime = boto3.client("sagemaker-runtime", region_name=region_name)

prompt = "Story title: 3 llamas go for a walk\nSummary: The 3 llamas crossed a bridge and something unexpected happened\n\nOnce upon a time"
payload = {
    "prompt": prompt,
    "max_tokens": 512,
    "temperature": 0.8,
}

response = sagemaker_runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    Body=json.dumps(payload),
    ContentType="application/json",
)

print(response["Body"].read().decode("utf-8"))
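Rather than watching the console, you can also wait for the “InService” status programmatically. Below is a minimal sketch (the helper name and the polling intervals are our own; boto3 also ships an equivalent `endpoint_in_service` waiter):

```python
import time

def wait_until_in_service(sm_client, endpoint_name, poll_seconds=30, timeout_seconds=1800):
    """Poll DescribeEndpoint until the endpoint reaches InService.

    Raises if the endpoint enters Failed or the timeout elapses."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        status = sm_client.describe_endpoint(EndpointName=endpoint_name)["EndpointStatus"]
        if status == "InService":
            return status
        if status == "Failed":
            raise RuntimeError(f"Endpoint {endpoint_name} failed to deploy")
        time.sleep(poll_seconds)
    raise TimeoutError(f"Endpoint {endpoint_name} not InService within {timeout_seconds}s")
```

Pass in a `boto3.client("sagemaker")` instance and the endpoint name; the function returns once the endpoint is ready to serve requests.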

Using the boto3 SDK

Next, let’s walk through the process of creating a SageMaker endpoint using the boto3 Python SDK. You can achieve this with the code snippet below. Be sure to fill in the fields marked for your specific use case:

python
import boto3
from sagemaker import get_execution_role

sm_client = boto3.client(service_name="sagemaker")
runtime_sm_client = boto3.client(service_name="sagemaker-runtime")  # used to invoke the endpoint

account_id = boto3.client("sts").get_caller_identity()["Account"]
region = boto3.Session().region_name
role = get_execution_role()

endpoint_name = "FILL OUT ENDPOINT NAME"
endpoint_config_name = endpoint_name + "-config"
model_name = "FILL OUT MODEL NAME"
container_image = "FILL OUT ECR IMAGE NAME"  # <ACCOUNT>.dkr.ecr.<REGION>.amazonaws.com/IMAGE
instance_type = "ml.g5.12xlarge"  # instance type

container = {
    "Image": container_image,
    "Environment": {
        # Fill out the environment variables required by your Friendli Container,
        # such as the location of your model artifacts.
    },
}

# Step 1: Create the model from the Friendli ECR image.
sm_client.create_model(
    ModelName=model_name, ExecutionRoleArn=role, PrimaryContainer=container
)

# Step 2: Create the endpoint configuration.
sm_client.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "ModelName": model_name,
        "InstanceType": instance_type,
        "InitialInstanceCount": 1,
    }],
)

# Step 3: Create the endpoint.
sm_client.create_endpoint(
    EndpointName=endpoint_name, EndpointConfigName=endpoint_config_name
)
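When you are done experimenting, remember to tear the resources down so the GPU instance stops incurring charges. A minimal sketch using the standard boto3 `delete_*` calls (the helper function name and its argument names are ours):

```python
def cleanup(sm_client, endpoint_name, endpoint_config_name, model_name):
    """Delete SageMaker resources in reverse order of creation."""
    sm_client.delete_endpoint(EndpointName=endpoint_name)
    sm_client.delete_endpoint_config(EndpointConfigName=endpoint_config_name)
    sm_client.delete_model(ModelName=model_name)
```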

You can invoke this endpoint by following Step 4 described in the “Using the AWS Console” section. By following these guides, you'll be able to seamlessly deploy your models using Friendli Containers on SageMaker endpoints and leverage their capabilities for real-time inference.

Learn more about Friendli Container and the Friendli Engine on our website!


Written by


FriendliAI Tech & Research

