I've trained more ML models than I've deployed. There's something comforting about the local loop: `model.fit()`, `model.evaluate()`, hitting 94% accuracy, then staring at the screen wondering, "Okay, how do I make this actually useful?"
If you're stuck there right now, this guide will help.
Note: I wrote this based on AWS documentation and standard SageMaker patterns. If you try it, drop a comment about what worked (or broke).
- A trained model saved as `model.pkl` (or `.joblib`)
- A `requirements.txt` with your dependencies
- AWS credentials configured (`aws configure`)
import joblib
joblib.dump(model, 'model.pkl')
Create a `requirements.txt` file (note that the PyPI package is named `scikit-learn`, not `sklearn`):
scikit-learn==1.2.0
pandas==1.5.0
numpy==1.23.0
Keep both files in the same folder.
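Before uploading anything, a quick pre-flight check can catch a missing file or an unpinned dependency. This is a minimal sketch using only the standard library; the helper names `missing_artifacts` and `unpinned_requirements` are my own for illustration, and the file names are the ones used in this guide.

```python
from pathlib import Path

REQUIRED_FILES = ("model.pkl", "requirements.txt")  # file names used in this guide


def missing_artifacts(folder):
    """Return the required file names that are absent from `folder`."""
    folder = Path(folder)
    return [name for name in REQUIRED_FILES if not (folder / name).is_file()]


def unpinned_requirements(text):
    """Return requirement lines that lack an exact `==` version pin."""
    lines = (line.strip() for line in text.splitlines())
    return [
        line for line in lines
        if line and not line.startswith("#") and "==" not in line
    ]


if __name__ == "__main__":
    # An empty list means both files are present in the current folder.
    print("Missing files:", missing_artifacts("."))
```

An empty list from both helpers means the folder is ready to upload.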
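import tarfile
import boto3

s3 = boto3.client('s3')
bucket_name = 'my-unique-ml-bucket-12345'  # Bucket names are global; make this unique

# In us-east-1, create_bucket must be called without a LocationConstraint.
# In any other region, pass CreateBucketConfiguration={'LocationConstraint': region}.
s3.create_bucket(Bucket=bucket_name)

# SageMaker expects model artifacts packaged as a .tar.gz archive
with tarfile.open('model.tar.gz', 'w:gz') as tar:
    tar.add('model.pkl')

s3.upload_file('model.tar.gz', bucket_name, 'models/model.tar.gz')
s3.upload_file('requirements.txt', bucket_name, 'models/requirements.txt')
model_s3_path = f's3://{bucket_name}/models/model.tar.gz'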
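One detail that trips people up when creating the bucket: S3 rejects an explicit `LocationConstraint` of `us-east-1`, while every other region requires one. Here is a small sketch of a helper that builds the right arguments either way. The name `create_bucket_kwargs` is hypothetical; only the `Bucket` and `CreateBucketConfiguration` parameters come from boto3.

```python
def create_bucket_kwargs(bucket_name, region):
    """Build keyword arguments for boto3's s3.create_bucket for a given region."""
    kwargs = {"Bucket": bucket_name}
    # us-east-1 is the default location and must not be passed explicitly.
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


# Usage (needs boto3 and AWS credentials, so it is left commented out here):
#   s3 = boto3.client("s3", region_name=region)
#   s3.create_bucket(**create_bucket_kwargs(bucket_name, region))
```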
Save this as `inference.py`:
import json
import os

import joblib
import numpy as np


def model_fn(model_dir):
    # Load the model once, when the container starts
    return joblib.load(os.path.join(model_dir, 'model.pkl'))


def input_fn(input_data, content_type):
    # Turn the raw request body into a NumPy array
    if content_type == 'application/json':
        data = json.loads(input_data)
        return np.array(data['features'])
    raise ValueError(f"Unsupported content type: {content_type}")


def predict_fn(input_data, model):
    return model.predict(input_data)


def output_fn(prediction, content_type):
    # Serialize predictions back to JSON for the response
    return json.dumps({'predictions': prediction.tolist()})
These four functions are what SageMaker calls when someone hits your endpoint.
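To make the call order concrete, here is a rough local simulation of how a request flows through the four handlers. It is a sketch under assumptions, not the real serving container: `simulate_request`, `StubModel`, and `stub_handlers` are hypothetical names, and the stub model just sums each row so the example runs without scikit-learn installed.

```python
import json


def simulate_request(handlers, model_dir, body, content_type="application/json"):
    """Chain the four handlers in the order a request passes through them."""
    model = handlers["model_fn"](model_dir)            # real container: once at startup
    data = handlers["input_fn"](body, content_type)    # per request: parse the body
    prediction = handlers["predict_fn"](data, model)   # per request: run the model
    return handlers["output_fn"](prediction, content_type)  # per request: serialize


class StubModel:
    """Hypothetical stand-in for a trained model: predicts the sum of each row."""

    def predict(self, rows):
        return [sum(row) for row in rows]


stub_handlers = {
    "model_fn": lambda model_dir: StubModel(),
    "input_fn": lambda body, content_type: json.loads(body)["features"],
    "predict_fn": lambda data, model: model.predict(data),
    "output_fn": lambda prediction, accept: json.dumps({"predictions": prediction}),
}

response = simulate_request(
    stub_handlers, "/tmp", json.dumps({"features": [[1, 2], [3, 4]]})
)
print(response)  # {"predictions": [3, 7]}
```

Swapping the stubs for the functions in your own `inference.py` gives you a cheap way to catch import errors and bad JSON handling before paying for an endpoint.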
Run this in a Python script:
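from sagemaker.sklearn.model import SKLearnModel
from sagemaker import get_execution_role

# get_execution_role() works inside SageMaker notebooks and Studio;
# elsewhere, pass your IAM role ARN as a string instead.
sklearn_model = SKLearnModel(
    model_data=model_s3_path,
    role=get_execution_role(),
    entry_point='inference.py',
    framework_version='1.2-1',  # SageMaker scikit-learn container version
    py_version='py3'
)

# The instance type is set at deploy time, not on the model
sklearn_model.deploy(
    initial_instance_count=1,
    instance_type='ml.m5.large',
    endpoint_name='my-model-endpoint'
)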
This takes 5–10 minutes. You'll see the status go from `Creating` → `InService`.
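If you want to watch the status from a separate script, you can poll it yourself. The sketch below assumes a boto3 SageMaker client (`boto3.client('sagemaker')`) is passed in; `describe_endpoint` and its `EndpointStatus` and `FailureReason` fields are part of that API, while the function name, the polling interval, and the timeout are my own choices.

```python
import time


def wait_for_endpoint(sm_client, endpoint_name, poll_seconds=30,
                      timeout_seconds=1800, sleep=time.sleep):
    """Poll describe_endpoint until the endpoint is InService, fails, or times out."""
    waited = 0
    while True:
        desc = sm_client.describe_endpoint(EndpointName=endpoint_name)
        status = desc["EndpointStatus"]
        if status == "InService":
            return status
        if status == "Failed":
            reason = desc.get("FailureReason", "unknown")
            raise RuntimeError(f"Endpoint failed: {reason}")
        if waited >= timeout_seconds:
            raise TimeoutError(f"Endpoint still {status} after {waited} seconds")
        sleep(poll_seconds)
        waited += poll_seconds


# Usage (needs boto3 and AWS credentials, so it is left commented out here):
#   import boto3
#   wait_for_endpoint(boto3.client("sagemaker"), "my-model-endpoint")
```

Raising on `Failed` instead of waiting out the timeout surfaces the failure reason right away, which is usually the first thing you need when debugging.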
import boto3
import json

runtime = boto3.client('sagemaker-runtime')

response = runtime.invoke_endpoint(
    EndpointName='my-model-endpoint',
    ContentType='application/json',
    Body=json.dumps({'features': [[5.1, 3.5, 1.4, 0.2]]})
)

result = json.loads(response['Body'].read().decode())
print(result)
If you see `{'predictions': [...]}`, it worked.
Endpoints cost money even when idle:
aws sagemaker delete-endpoint --endpoint-name my-model-endpoint
aws sagemaker delete-endpoint-config --endpoint-config-name my-model-endpoint
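The two commands above remove the endpoint and its config, but the SageMaker model resource created during deployment stays behind. Here is a sketch of a Python cleanup that looks up and removes all three. It assumes a boto3 SageMaker client is passed in; the function name `delete_endpoint_stack` is hypothetical, while the `describe_*` and `delete_*` calls are standard boto3 operations.

```python
def delete_endpoint_stack(sm_client, endpoint_name):
    """Delete an endpoint, its endpoint config, and the models the config references."""
    # Look everything up first, while the endpoint still exists.
    endpoint = sm_client.describe_endpoint(EndpointName=endpoint_name)
    config_name = endpoint["EndpointConfigName"]
    config = sm_client.describe_endpoint_config(EndpointConfigName=config_name)
    model_names = [variant["ModelName"] for variant in config["ProductionVariants"]]

    # Delete the billable endpoint first, then the leftovers.
    sm_client.delete_endpoint(EndpointName=endpoint_name)
    sm_client.delete_endpoint_config(EndpointConfigName=config_name)
    for name in model_names:
        sm_client.delete_model(ModelName=name)

    return {"endpoint": endpoint_name, "config": config_name, "models": model_names}


# Usage (needs boto3 and AWS credentials, so it is left commented out here):
#   import boto3
#   delete_endpoint_stack(boto3.client("sagemaker"), "my-model-endpoint")
```

The model artifact in S3 is not touched by this; remove it separately if you no longer need it.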
| Error | Fix |
|---|---|
| NoCredentialsError | Run `aws configure` again |
| InvalidRoleException | IAM role needs S3 + SageMaker permissions |
| ModelError | Check `inference.py` for missing imports |
| Endpoint stuck on Creating | Wait 5–10 more minutes |
Your IAM role needs:
- `s3:GetObject`, `s3:PutObject`
- `sagemaker:CreateModel`, `sagemaker:CreateEndpoint`
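As a starting point, the permissions listed above could be written as an IAM policy document like the one this sketch prints. Treat the scoping as an assumption: the S3 statement is limited to the example bucket from this guide, the SageMaker statement uses `"Resource": "*"` for simplicity, and a real deployment will probably need further actions (for example `sagemaker:CreateEndpointConfig`).

```python
import json

bucket_name = "my-unique-ml-bucket-12345"  # example bucket name from this guide

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # Read and write objects in the model bucket only
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": f"arn:aws:s3:::{bucket_name}/*",
        },
        {
            # SageMaker actions named in this guide; tighten Resource if you can
            "Effect": "Allow",
            "Action": ["sagemaker:CreateModel", "sagemaker:CreateEndpoint"],
            "Resource": "*",
        },
    ],
}

print(json.dumps(policy, indent=2))
```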
| Resource | Cost |
|---|---|
| ml.m5.large | ~$0.20/hour (~$144/month if 24/7) |
| S3 storage | ~$0.02/GB/month |
Delete endpoints when you're not using them. I've seen $50 surprises from idle endpoints.
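It helps to run the arithmetic yourself before leaving an endpoint up. This sketch uses the ~$0.20/hour figure quoted above and assumes a 30-day month; check the current SageMaker pricing page for the real rate in your region.

```python
def monthly_cost(hourly_rate, hours_per_day=24, days=30):
    """Estimate the monthly cost of an endpoint billed per hour."""
    return round(hourly_rate * hours_per_day * days, 2)


always_on = monthly_cost(0.20)             # 24 hours x 30 days
work_hours = monthly_cost(0.20, 8, 22)     # 8 hours a day, 22 working days

print(f"Always on:  ${always_on:.2f}/month")   # Always on:  $144.00/month
print(f"Work hours: ${work_hours:.2f}/month")  # Work hours: $35.20/month
```

Even the work-hours case adds up quickly, which is why deleting the endpoint after each session is the safest habit.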
If you're following this, check:
- `pip show boto3 sagemaker` lists both packages
- `joblib` and `numpy` are installed (`os` ships with Python)

If something breaks, comment below with the error. I'll update this guide.
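The same check can be scripted from Python, which also gives you the exact versions to paste into a comment if something breaks. This is a sketch using the standard library's `importlib.metadata`; the helper name `installed_versions` is my own.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(
        ["boto3", "sagemaker", "joblib", "numpy"]
    ).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```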
Deploying ML feels intimidating until you do it once. SageMaker handles most of the complexity. You just upload your model to S3, point SageMaker at it, and deploy.
I've trained models that sat on my laptop for months because I didn't know how to deploy them. Now I tell people: "Just run this script, it's not that hard."
If you're building something with this, drop a comment. I love seeing what people deploy.