End-to-end encrypted ML inference with Amazon SageMaker AI and FHE

Amazon has introduced a method for running machine learning inference on Amazon SageMaker AI using fully homomorphic encryption (FHE), which keeps data encrypted throughout the entire process, including queries, responses, and intermediate values. The approach, detailed in a new blog post, allows organizations in healthcare, energy, and telecommunications to process sensitive data in the cloud without exposing it to third parties, including the cloud provider itself.

Machine learning (ML) inference often requires processing sensitive data: medical records, proprietary business information, or personal communications. What if you could run ML inference in the cloud while hiding your data from the cloud itself? More specifically, what if you could enforce that your data stayed encrypted throughout the entire ML inference process?

This post shows you how to use Amazon SageMaker AI (https://aws.amazon.com/sagemaker/ai/) with fully homomorphic encryption (FHE) to perform ML inference. Using FHE, we present an approach to ML inference that’s designed to keep queries, responses, and intermediate values encrypted and unreadable by observers, including SageMaker AI itself.

FHE is a form of encryption that allows data to be processed while it remains encrypted. In the ML inference setting, you can use it to apply a model to an encrypted query without decrypting it, producing an encrypted prediction. Consider these scenarios where such a capability would provide value:

- Healthcare: A health insurance company wants to provide doctors with an ML model that predicts medical procedure outcomes based on diagnostic data.
  Publishing the model in the cloud simplifies deployment, but doctors can’t expose patient medical information to third parties due to privacy regulations.
- Energy sector: An oil and gas corporation uses ML to evaluate satellite photos of potential drill sites and select photos for further expert evaluation. They want to host the model in the cloud for cost savings but can’t expose photographs of politically sensitive locations to third parties.
- Telecommunications: A telecom operator wants to process customer emails to detect spam and phishing. They need cloud-based ML for scalability, but data protection regulations require that customer messages remain encrypted while held by third parties.

This blog has previously discussed FHE for ML inference in the post Enable fully homomorphic encryption with Amazon SageMaker endpoints for secure, real-time inferencing (https://aws.amazon.com/blogs/machine-learning/enable-fully-homomorphic-encryption-with-amazon-sagemaker-endpoints-for-secure-real-time-inferencing/), but this post goes a little further. That previous post showed how to implement FHE-based inference ‘from scratch’ by hand-crafting a linear-regression algorithm using a low-level library called SEAL (https://www.microsoft.com/en-us/research/project/microsoft-seal/). This post instead shows a more flexible approach based on concrete-ml (https://docs.zama.org/concrete-ml), a high-level library built specifically for FHE-based inference. It supports several common types of models ‘out of the box’ and is even API compatible with the well-known ML library scikit-learn.
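To make the idea of computing on ciphertexts concrete, here is a minimal and deliberately insecure toy in plain Python. It uses the Paillier cryptosystem, which is only additively homomorphic and is a stand-in chosen for brevity: it is not FHE and it is not the scheme concrete-ml uses. The function names (`keygen`, `encrypt`, `decrypt`, `encrypted_linear_score`) and the tiny fixed primes are illustrative assumptions, not part of any library.

```python
import math
import secrets


def keygen(p=1789, q=1861):
    """Build toy Paillier keys from two small fixed primes (demo only, insecure)."""
    n = p * q                      # public key
    lam = math.lcm(p - 1, q - 1)   # private: Carmichael function of n
    mu = pow(lam, -1, n)           # private: inverse of lam mod n (valid for g = n + 1)
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt integer m (negative values are wrapped mod n) under public key n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    # (1 + n)^m == 1 + m*n (mod n^2), so no big exponentiation is needed for g^m.
    return ((1 + (m % n) * n) * pow(r, n, n2)) % n2


def decrypt(n, private_key, c):
    """Decrypt c and map the result back to a signed integer."""
    lam, mu = private_key
    u = pow(c, lam, n * n)
    m = (((u - 1) // n) * mu) % n
    return m if m <= n // 2 else m - n


def encrypted_linear_score(n, enc_features, weights, bias):
    """Server side: compute Enc(sum(w_i * x_i) + bias) without any private key."""
    n2 = n * n
    acc = (1 + (bias % n) * n) % n2          # deterministic encryption of the bias
    for c, w in zip(enc_features, weights):
        # Raising a ciphertext to w multiplies its plaintext by w;
        # multiplying ciphertexts adds their plaintexts.
        acc = (acc * pow(c, w % n, n2)) % n2
    return acc


# Client encrypts the query, server scores it blindly, client decrypts.
n, private_key = keygen()
enc_query = [encrypt(n, x) for x in [3, -2, 5]]
enc_score = encrypted_linear_score(n, enc_query, weights=[4, 7, -1], bias=10)
print(decrypt(n, private_key, enc_score))  # 4*3 + 7*(-2) + (-1)*5 + 10 = 3
```

The server-side function never sees the private key or a plaintext feature, which is the property the scenarios above rely on. A fully homomorphic scheme extends this to the non-linear operations that real models need.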
In this post, you will learn how to:

- Train a concrete-ml model in SageMaker AI using a custom container
- Deploy that model to a SageMaker AI inference endpoint
- Create a custom client for concrete-ml inference
- Use that client to make queries to your inference endpoint

When finished, you will have a system, built on concrete-ml in SageMaker AI, that’s designed to perform end-to-end encrypted ML inference.

Solution overview

Using concrete-ml in SageMaker AI works as follows:

- The model owner prepares their data for training. Concrete-ml works well when all features have been normalized to the same scale, such as [-1, 1].
- The model owner uses this data to train an FHE-enabled version of their model. This model is designed to perform computations over encrypted data instead of plaintext.
- The model owner hosts this model in SageMaker AI.
- Clients encrypt their queries using the FHE scheme supported by the model.
- Clients send encrypted queries to the FHE-enabled model in the cloud.
- The model transforms the encrypted query into an encrypted prediction without decrypting values during the FHE computation.
- The model returns the encrypted response to the client, who decrypts it to retrieve the prediction.

This differs from, and complements, confidential computing environments like those provided by the Amazon Web Services (AWS) Nitro System (https://aws.amazon.com/ec2/nitro/) in Amazon Elastic Compute Cloud (Amazon EC2) (https://aws.amazon.com/ec2/). With AWS Nitro Enclaves, queries are decrypted and processed in plaintext within hardened, isolated environments that provide CPU and memory isolation. With FHE, queries remain encrypted throughout; security relies on mathematics rather than hardware or software.
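The first step of the overview asks for features on a common scale such as [-1, 1]. A minimal sketch of per-feature min-max scaling in plain Python follows. The helper names `fit_min_max` and `scale_to_unit_range` are hypothetical, and in practice you might use an existing scaler from your ML toolkit instead.

```python
def fit_min_max(rows):
    """Return the (min, max) of every feature column in the training rows."""
    return [(min(col), max(col)) for col in zip(*rows)]


def scale_to_unit_range(rows, bounds):
    """Map each feature linearly so its training minimum is -1 and maximum is 1."""
    scaled = []
    for row in rows:
        out = []
        for value, (lo, hi) in zip(row, bounds):
            if hi == lo:
                out.append(0.0)  # constant feature: no spread to scale
            else:
                out.append(2 * (value - lo) / (hi - lo) - 1)
        scaled.append(out)
    return scaled


train_rows = [[0, 10], [5, 20], [10, 30]]
bounds = fit_min_max(train_rows)
print(scale_to_unit_range(train_rows, bounds))
```

Whatever scaling is fitted on the training data also has to be applied by clients to each plaintext query before encryption, so the bounds need to be shared with the client code.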
Prerequisites

To implement this solution, you need:

- A local development environment with Python (https://www.python.org/) 3.12 installed, the ability to install packages using pip (https://pip.pypa.io/en/stable/), and Docker (https://www.docker.com/) or other container-building software installed locally. These instructions also recommend working in virtual environments (https://virtualenv.pypa.io/en/latest/), but this isn’t strictly necessary.
- An AWS account, containing:
  - Repositories in Amazon Elastic Container Registry (Amazon ECR) (https://aws.amazon.com/ecr/) to hold the images for training and inference containers
  - Locations in Amazon Simple Storage Service (Amazon S3) (https://aws.amazon.com/s3/) to hold:
    - The model
    - The training code (if you wish it to be stored in a separate bucket from the model)
    - Keys and ciphertexts

    We suggest you follow the security best practices for Amazon S3 (https://docs.aws.amazon.com/AmazonS3/latest/userguide/security-best-practices.html).
  - Roles in AWS Identity and Access Management (IAM) for:
    - The model creator
    - The inference endpoint creator
    - The inference endpoint itself
    - The clients

Find IAM policies for these roles, along with a worked example for the MNIST corpus of handwritten digits (https://www.kaggle.com/datasets/hojjatk/mnist-dataset), in the repository of sample code (https://github.com/aws-samples/sample-end-to-end-encrypted-ml-inference-with-amazon-sagemaker-ai-and-fhe/tree/main).

Before starting, note that at the time of writing, concrete-ml is available from Zama for prototyping or non-commercial use (https://community.zama.org/t/about-the-zama-open-source-licenses/223) without requiring a paid license. However, you may require a commercial license for commercial use (https://www.zama.org/post/open-source).

Training

Build and deploy the training container

To build the training container:

- Assume the model-trainer role.
- Create a file named Dockerfile.training (https://github.com/aws-samples/sample-end-to-end-encrypted-ml-inference-with-amazon-sagemaker-ai-and-fhe/blob/main/training/Dockerfile.training) locally.
- Add the following content to Dockerfile.training:
  Verify that the version numbers match across the entire system. The concrete-ml library requires version parity across the entire system for Python, the concrete-ml package, and the concrete-python package.
- Build the container image:
- Push the image to Amazon ECR (https://docs.aws.amazon.com/AmazonECR/latest/userguide/docker-push-ecr-image.html):
  - Run the authentication command to log Docker in to your Amazon ECR registry:
  - Tag the image with your repository name:
  - Push the tagged image:
- Verify that the container is available:
  You should see JSON output containing your image with a non-empty imageDigest field and the latest tag.

Train the model

To train the model, complete the following steps. Note: in these steps, concrete-ml is no different from any other ML framework, and the training container is no different from any other custom training container (https://docs.aws.amazon.com/sagemaker/latest/dg/adapt-training-container.html).

Note that training occurs over plaintext data. That is, concrete-ml doesn’t require pre-processing of this data beyond normalization. But if additional pre-processing is necessary for regular training, it remains necessary here and must occur before, or as part of, the training job.

Create the training script

- Create a file named training_script.py (https://github.com/aws-samples/sample-end-to-end-encrypted-ml-inference-with-amazon-sagemaker-ai-and-fhe/blob/main/training/training_script.py).
- Add the following template code to training_script.py:
- Implement the data loading logic in the do_training function.
- Implement the model training logic in the do_training function.
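The steps above leave the body of do_training to you. As a rough illustration only, and not the template from the sample repository, the sketch below shows one way such a script could look. It assumes a hypothetical `train.csv` with the label in the last column, a logistic-regression model, the standard SageMaker container paths under `/opt/ml`, and the `SM_MODEL_DIR` and `SM_CHANNEL_TRAINING` environment variables that the SageMaker training toolkit sets. The concrete-ml calls (`LogisticRegression`, `compile`, `FHEModelDev.save`) reflect that library's documented deployment flow as best understood here; check them against the version pinned in your container.

```python
import argparse
import os
from pathlib import Path


def parse_args(argv=None):
    """Read locations from the command line, falling back to SageMaker defaults."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--model-dir", default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model")
    )
    parser.add_argument(
        "--train",
        default=os.environ.get("SM_CHANNEL_TRAINING", "/opt/ml/input/data/training"),
    )
    # Quantization bit width: an assumed tuning knob, not a value from the post.
    parser.add_argument("--n-bits", type=int, default=8)
    return parser.parse_args(argv)


def do_training(train_dir, model_dir, n_bits):
    """Train on plaintext data, compile to an FHE circuit, and save the artifacts."""
    # Imported lazily so the module can be inspected without concrete-ml installed.
    import numpy as np
    from concrete.ml.deployment import FHEModelDev
    from concrete.ml.sklearn import LogisticRegression

    # Data loading (assumption: features already normalized, label in last column).
    data = np.loadtxt(Path(train_dir) / "train.csv", delimiter=",")
    features, labels = data[:, :-1], data[:, -1].astype(int)

    # Model training uses the scikit-learn style API on plaintext data.
    model = LogisticRegression(n_bits=n_bits)
    model.fit(features, labels)

    # Compilation builds the FHE circuit from a representative input set.
    model.compile(features)

    # Writes client.zip and server.zip into model_dir, which should start empty.
    FHEModelDev(path_dir=str(model_dir), model=model).save()


def main(argv=None):
    args = parse_args(argv)
    do_training(args.train, args.model_dir, args.n_bits)
```

Calling `main()` from the container entry point would run the job. How the two zip files then reach the model bucket is not covered by this sketch.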
Create a custom framework

For convenience, we recommend that you create a custom framework (https://docs.aws.amazon.com/sagemaker/latest/dg/frameworks.html) to integrate your training container into SageMaker AI. To do so:

- Create a file named framework.py (https://github.com/aws-samples/sample-end-to-end-encrypted-ml-inference-with-amazon-sagemaker-ai-and-fhe/blob/main/training/framework.py).
- Add the following content to framework.py:
- Update the image_uri value with your Amazon ECR training container location.

Launch the training job

This section shows how to launch the training job with a Python script, but you can also do it using the console or the AWS Command Line Interface (AWS CLI). Note: training jobs incur charges based on instance type and duration.

- Create a virtual environment for Python 3.12.
- Activate the virtual environment.
- Install the following packages (https://github.com/aws-samples/sample-end-to-end-encrypted-ml-inference-with-amazon-sagemaker-ai-and-fhe/blob/main/requirements_txt_files/requirements_training.txt) using pip:
- Create a file named start_training.py (https://github.com/aws-samples/sample-end-to-end-encrypted-ml-inference-with-amazon-sagemaker-ai-and-fhe/blob/main/training/start_training.py).
- Add the following content to start_training.py:
- Update the instance_type, role, output_path, code_location, and inputs values with your specific configuration.
- Execute this file:
- Verify that the training completed successfully by checking the training job status:
  Look for TrainingJobStatus: Completed. Then verify that the output files exist:
  Confirm server.zip and client.zip are present.

After training completes, the training container saves two files to the model bucket: server.zip (used by the inference endpoint) and client.zip (used by clients to encrypt queries).
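The two verification steps can also be scripted. The sketch below is one possible way to do so, assuming boto3-style clients are passed in; the helper names `training_completed` and `missing_artifacts`, and the bucket and prefix values in the usage note, are hypothetical. It relies on the `describe_training_job` and `list_objects_v2` calls of the SageMaker and Amazon S3 clients.

```python
REQUIRED_ARTIFACTS = {"server.zip", "client.zip"}


def training_completed(sagemaker_client, job_name):
    """Return True when the named training job reports the Completed status."""
    response = sagemaker_client.describe_training_job(TrainingJobName=job_name)
    return response["TrainingJobStatus"] == "Completed"


def missing_artifacts(s3_client, bucket, prefix):
    """Return the required file names that are not found under the given prefix."""
    response = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix)
    # Compare on the final path component so the prefix layout does not matter.
    found = {obj["Key"].rsplit("/", 1)[-1] for obj in response.get("Contents", [])}
    return REQUIRED_ARTIFACTS - found
```

With boto3 installed and credentials for the model-trainer role, you could call `training_completed(boto3.client("sagemaker"), "<your-job-name>")` and `missing_artifacts(boto3.client("s3"), "<your-model-bucket>", "<your-prefix>/")`. An empty set from the second call means both files are present.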
Inference

Build and deploy the inference container

FHE-based ML inference is more complex than standard ML inference because of several new technical constraints:

- Clients need model-specific information from client.zip to generate cryptographic keys.
- FHE ciphertexts can exceed SageMaker AI query size limits, so the client and service need to communicate them outside of SageMaker AI API calls.
- FHE evaluation might take longer than SageMaker AI timeouts, so inference uses the SageMaker AI mechanisms for asynchronous inference (https://docs.aws.amazon.com/sagemaker/latest/dg/async-inference.html).
- The endpoint needs an evaluation key (a type of public key) from the client to perform FHE evaluation.

To accommodate these new requirements and to streamline the user’s experience, we show you how to build a system in which:

- A custom client encrypts queries and attaches evaluation keys to them
- A custom inference endpoint retrieves server.zip when needed, and uses it to evaluate the FHE model
- The same custom client decrypts predictions from the inference endpoint
- The client and endpoint communicate ciphertexts and keys to each other using Amazon S3

To deploy and use this system, complete the following sections.

Write your predictor

Create a file named predictor.py (https://github.com/aws-samples/sample-end-to-end-encrypted-ml-inference-with-amazon-sagemaker-ai-and-fhe/blob/main/inference/endpoint/container_files/predictor.py) with the following content.

This predictor expects the ‘query’ to contain three Amazon S3 locations: two for where to find the encrypted query and the associated evaluation key, and one for where to write the prediction. It downloads the query and key, evaluates the FHE model on them, and writes the prediction back to Amazon S3.

Package the predictor into a container

To package this predictor into a container:

- Assume the endpoint-creator role.
- Create a new directory for the container files.
- Copy predictor.py into the new directory.
- Obtain the required boilerplate files wsgi.py (https://github.com/aws-samples/sample-end-to-end-encrypted-ml-inference-with-amazon-sagemaker-ai-and-fhe/blob/main/inference/endpoint/container_files/wsgi.py), nginx.conf (https://github.com/aws-samples/sample-end-to-end-encrypted-ml-inference-with-amazon-sagemaker-ai-and-fhe/blob/main/inference/endpoint/container_files/nginx.conf), and serve (https://github.com/aws-samples/sample-end-to-end-encrypted-ml-inference-with-amazon-sagemaker-ai-and-fhe/blob/main/inference/endpoint/container_files/serve) by downloading them from the sample repository or copying them from the SageMaker AI documentation for custom inference containers (https://docs.aws.amazon.com/sagemaker/latest/dg/adapt-inference-container.html). Note: if you do the latter, increase the timeout value in nginx.conf to allow FHE evaluation to complete.
- Create a file named Dockerfile.inference in that directory.
- Add the following content to the Dockerfile.inference file:
- Build the container image:
- Push the image to Amazon ECR (https://docs.aws.amazon.com/AmazonECR/latest/userguide/docker-push-ecr-image.html):
  - Run the authentication command to log Docker in to your Amazon ECR registry:
  - Tag the image with your repository name:
  - Push the tagged image:
- Verify the container is available:
  You should see JSON output containing your image with a non-empty imageDigest field and the latest tag.

Deploy the inference endpoint

Important: endpoints incur ongoing charges until deleted, and costs will vary based on instance type, training duration, and endpoint uptime. For detailed pricing information, see Amazon SageMaker AI Pricing (https://aws.amazon.com/sagemaker/pricing/). Remember to delete the endpoint when finished to avoid unnecessary costs.

Continuing to use the endpoint-creator role:

- Create a virtual environment.
- Activate this virtual environment.
- Use pip to install the following packages (https://github.com/aws-samples/sample-end-to-end-encrypted-ml-inference-with-amazon-sagemaker-ai-and-fhe/blob/main/requirements_txt_files/requirements_endpoint.txt):
- Create a file named start_inference_endpoint.py with the following content:
- Execute the script:
- Verify the endpoint is in service:
  Wait until EndpointStatus shows InService before proceeding. This might take several minutes.

The script prints the name of the endpoint. Record this name for the client.

Create the client

The user shouldn’t need to know anything about FHE to use your system. Therefore, the client hides all FHE details. Specifically, the client will:

- Retrieve client.zip from Amazon S3.
- Use client.zip to generate keys.
- Encrypt the query with those keys.
- Write the encrypted query and associated evaluation key to Amazon S3.
- Send these locations to the inference endpoint and receive back the Amazon S3 location of the encrypted prediction.
- Retrieve the encrypted prediction and decrypt it.

To create this client:

- Create a file named client.py (https://github.com/aws-samples/sample-end-to-end-encrypted-ml-inference-with-amazon-sagemaker-ai-and-fhe/blob/main/inference/client/run_inference.py).
- Add the following template code to client.py:
- Implement the get_query function to retrieve your plaintext query.
- Update the placeholder values for Amazon S3 locations, endpoint name, and model location.
- Add exception handling code for the placeholder