{"slug": "end-to-end-encrypted-ml-inference-with-amazon-sagemaker-ai-and-fhe", "title": "End-to-end encrypted ML inference with Amazon SageMaker AI and FHE", "summary": "Amazon has introduced a method for running machine learning inference on Amazon SageMaker AI using fully homomorphic encryption (FHE), which keeps data encrypted throughout the entire process—including queries, responses, and intermediate values. The approach, detailed in a new blog post, allows organizations in healthcare, energy, and telecommunications to process sensitive data in the cloud without exposing it to third parties, including the cloud provider itself.", "body_md": "[Artificial Intelligence](https://aws.amazon.com/blogs/machine-learning/)\n\n# End-to-end encrypted ML inference with Amazon SageMaker AI and FHE\n\nMachine learning (ML) inference often requires processing sensitive data—medical records, proprietary business information, or personal communications. What if you could run ML inference in the cloud while hiding your data from the cloud itself? More specifically, what if you could enforce that your data stayed encrypted throughout the entire ML inference process? This post will show you how to use [Amazon SageMaker AI](https://aws.amazon.com/sagemaker/ai/) with fully homomorphic encryption (FHE) to perform ML inference. Using FHE, we present an approach to ML inference that’s designed to keep queries, responses, and intermediate values encrypted and unreadable by observers—including SageMaker AI itself.\n\nFHE is a form of encryption that allows encrypted data to be processed in encrypted form without decryption. In the ML inference setting, you can use it to apply a model to an encrypted query without decryption, producing an encrypted prediction. Consider these scenarios where such a capability would provide value:\n\n**Healthcare**: A health insurance company wants to provide doctors with an ML model that predicts medical procedure outcomes based on diagnostic data. 
Publishing the model in the cloud simplifies deployment, but doctors can’t expose patient medical information to third parties due to privacy regulations.\n\n**Energy sector**: An oil and gas corporation uses ML to evaluate satellite photos of potential drill sites and select photos for further expert evaluation. They want to host the model in the cloud for cost savings but can’t expose photographs of politically sensitive locations to third parties.\n\n**Telecommunications**: A telecom operator wants to process customer emails to detect spam and phishing. They need cloud-based ML for scalability, but data protection regulations require that customer messages remain encrypted at third parties.\n\nThis blog has previously discussed FHE for ML inference in the post [Enable fully homomorphic encryption with Amazon SageMaker endpoints for secure, real-time inferencing](https://aws.amazon.com/blogs/machine-learning/enable-fully-homomorphic-encryption-with-amazon-sagemaker-endpoints-for-secure-real-time-inferencing/), but this post goes a little further. That previous post showed how to implement FHE-based inference ‘from scratch’ by hand-crafting a linear-regression algorithm using a low-level library called [SEAL](https://www.microsoft.com/en-us/research/project/microsoft-seal/). Instead, this post shows a much more flexible and higher-level approach based on [concrete-ml](https://docs.zama.org/concrete-ml), a high-level library built specifically for FHE-based inference. 
It supports several common types of models ‘out of the box’ and is even API compatible with the well-known ML library scikit-learn.\n\nIn this post, you will learn how to:\n\n- Train a concrete-ml model in SageMaker AI using a custom container\n- Deploy that model to a SageMaker AI inference endpoint\n- Create a custom client for concrete-ml inference\n- Use that client to make queries to your inference endpoint\n\nWhen finished you will have a system that uses concrete-ml in SageMaker AI designed to perform end-to-end encrypted ML inference.\n\n## Solution overview\n\nUsing concrete-ml in SageMaker AI works as follows:\n\n- The model owner prepares their data for training. Concrete-ml works well when all features have been normalized to the same scale, such as [-1, 1].\n- The model owner uses this data to train an FHE-enabled version of their model. This model is designed to perform computations over encrypted data instead of plaintext.\n- The model owner hosts this model in SageMaker AI.\n- Clients encrypt their queries using the FHE scheme supported by the model.\n- Clients send encrypted queries to the FHE-enabled model in the cloud.\n- The model transforms the encrypted query into an encrypted prediction without decrypting values during the FHE computation.\n- The model returns the encrypted response to the client, who decrypts it to retrieve the prediction.\n\nThis differs from, and complements, confidential computing environments like those provided by the Amazon Web Services (AWS) [Nitro System](https://aws.amazon.com/ec2/nitro/) in [Amazon Elastic Compute Cloud (Amazon EC2)](https://aws.amazon.com/ec2/). With AWS Nitro Enclaves, queries are decrypted and processed in plaintext within hardened, isolated environments that provide CPU and memory isolation. 
With FHE, queries remain encrypted throughout; security relies on mathematics rather than hardware or software.\n\n## Prerequisites\n\nTo implement this solution, you need:\n\n- A local development environment with [Python](https://www.python.org/) 3.12 installed, the ability to install packages using [pip](https://pip.pypa.io/en/stable/), and [Docker](https://www.docker.com/) or other container-building software installed locally. In addition, these instructions recommend that you work in [virtual environments](https://virtualenv.pypa.io/en/latest/), but this isn’t strictly necessary.\n- An AWS account, containing:\n  - Repositories in [Amazon Elastic Container Registry (Amazon ECR)](https://aws.amazon.com/ecr/) to hold the images for training and inference containers\n  - Locations in [Amazon Simple Storage Service (Amazon S3)](https://aws.amazon.com/s3/) to hold:\n    - The model\n    - The training code (if you wish it to be stored in a separate bucket from the model)\n    - Keys and ciphertexts\n\nWe suggest you follow the [security best practices for Amazon S3](https://docs.aws.amazon.com/AmazonS3/latest/userguide/security-best-practices.html).\n\n- Roles in AWS Identity and Access Management (IAM) for:\n  - The model creator\n  - The inference endpoint creator\n  - The inference endpoint itself\n  - The clients\n\nFind IAM policies for these roles, along with a worked example for the [MNIST corpus of handwritten digits](https://www.kaggle.com/datasets/hojjatk/mnist-dataset), in the repository of [sample code](https://github.com/aws-samples/sample-end-to-end-encrypted-ml-inference-with-amazon-sagemaker-ai-and-fhe/tree/main).\n\nBefore starting, note that at the time of writing, concrete-ml is available from Zama for [prototyping or non-commercial use](https://community.zama.org/t/about-the-zama-open-source-licenses/223) without requiring a paid license. 
However, you may require a [commercial license for commercial use](https://www.zama.org/post/open-source).\n\n## Training\n\n### Build and deploy the training container\n\nTo build the training container:\n\n- Assume the model-trainer role.\n- Create a [Dockerfile.training](https://github.com/aws-samples/sample-end-to-end-encrypted-ml-inference-with-amazon-sagemaker-ai-and-fhe/blob/main/training/Dockerfile.training) file locally.\n- Add the content of the linked sample file to `Dockerfile.training`. Verify that the version numbers match across the entire system: the `concrete-ml` library requires version parity for Python, the `concrete-ml` package, and the `concrete-python` package.\n- Build the container image.\n- [Push the image to Amazon ECR](https://docs.aws.amazon.com/AmazonECR/latest/userguide/docker-push-ecr-image.html):\n  - Run the authentication command to log Docker in to your Amazon ECR registry.\n  - Tag the image with your repository name.\n  - Push the tagged image.\n\n### Verify that the container is available\n\nList the images in your Amazon ECR repository. You should see JSON output containing your image with a non-empty `imageDigest` field and the `latest` tag.\n\n### Train the model\n\nTo train the model, complete the following steps.\n\nNote: in these steps, concrete-ml is no different from any other ML framework and the training container is no different from any other [custom training container](https://docs.aws.amazon.com/sagemaker/latest/dg/adapt-training-container.html). Training occurs over *plaintext* data. That is, concrete-ml doesn’t require pre-processing of this data beyond normalization. 
But if additional pre-processing is necessary for regular training, it remains necessary here (and must occur before, or as part of, the training job).\n\n#### Create the training script\n\n- Create a file named [training_script.py](https://github.com/aws-samples/sample-end-to-end-encrypted-ml-inference-with-amazon-sagemaker-ai-and-fhe/blob/main/training/training_script.py).\n- Add the template code from the linked sample file to `training_script.py`.\n- Implement the data loading logic in the `do_training` function.\n- Implement the model training logic in the `do_training` function.\n\n#### Create a custom framework\n\nFor convenience, we recommend that you create a custom [framework](https://docs.aws.amazon.com/sagemaker/latest/dg/frameworks.html) to integrate your training container into SageMaker AI. To do so:\n\n- Create a file named [framework.py](https://github.com/aws-samples/sample-end-to-end-encrypted-ml-inference-with-amazon-sagemaker-ai-and-fhe/blob/main/training/framework.py).\n- Add the content of the linked sample file to `framework.py`.\n- Update the `image_uri` value with your Amazon ECR training container location.\n\n#### Launch the training job\n\nThis section shows how to launch the training job with a Python script, but it can also be done using the console or the AWS Command Line Interface (AWS CLI). 
(Note: training jobs incur charges based on instance type and duration.)\n\n- Create a virtual environment for Python 3.12.\n- Activate the virtual environment.\n- Install the [required packages](https://github.com/aws-samples/sample-end-to-end-encrypted-ml-inference-with-amazon-sagemaker-ai-and-fhe/blob/main/requirements_txt_files/requirements_training.txt) using pip.\n- Create a file named [start_training.py](https://github.com/aws-samples/sample-end-to-end-encrypted-ml-inference-with-amazon-sagemaker-ai-and-fhe/blob/main/training/start_training.py).\n- Add the content of the linked sample file to `start_training.py`.\n- Update the `instance_type`, `role`, `output_path`, `code_location`, and `inputs` values with your specific configuration.\n- Execute this file.\n- Verify that the training completed successfully by checking the training job status. Look for `TrainingJobStatus: Completed`. Then verify that the output files exist: confirm that `server.zip` and `client.zip` are present.\n\nAfter training completes, the training container saves two files to the model bucket: `server.zip` (used by the inference endpoint) and `client.zip` (used by clients to encrypt queries).\n\n## Inference\n\n### Build and deploy the inference container\n\nFHE-based ML inference is more complex than standard ML inference because of some new technical constraints:\n\n- Clients need model-specific information from `client.zip` to generate cryptographic keys. 
\n- FHE ciphertexts can exceed SageMaker AI query size limits, so the client and service need to communicate them outside of SageMaker AI API calls.\n- FHE evaluation might take longer than SageMaker AI timeouts, so inference uses the SageMaker AI mechanisms for [asynchronous inference](https://docs.aws.amazon.com/sagemaker/latest/dg/async-inference.html).\n- The endpoint needs an evaluation key (a type of public key) from the client to perform FHE evaluation.\n\nTo accommodate these new requirements and to streamline the user’s experience, we show you how to build a system in which:\n\n- A custom client encrypts queries and attaches evaluation keys to them\n- A custom inference endpoint retrieves `server.zip` when needed, and uses it to evaluate the FHE model\n- The same custom client decrypts predictions from the inference endpoint\n- The client and endpoint communicate ciphertexts and keys to each other using Amazon S3\n\nTo deploy and use this system, complete the following sections.\n\n#### Write your predictor\n\nCreate a file named [predictor.py](https://github.com/aws-samples/sample-end-to-end-encrypted-ml-inference-with-amazon-sagemaker-ai-and-fhe/blob/main/inference/endpoint/container_files/predictor.py) with the content of the linked sample file.\n\nThis predictor expects the ‘query’ to contain three Amazon S3 locations: two for where to find the encrypted query and the associated evaluation key, and one for where to write the prediction. It downloads the query and key, evaluates the FHE model on them, and writes the prediction back to Amazon S3.\n\n#### Package the predictor into a container\n\nTo package this predictor into a container:\n\n- Assume the endpoint-creator role.\n- Create a new directory for the container files.\n- Copy `predictor.py` into the new directory. 
\n- Obtain the required boilerplate files ([nginx.conf](https://github.com/aws-samples/sample-end-to-end-encrypted-ml-inference-with-amazon-sagemaker-ai-and-fhe/blob/main/inference/endpoint/container_files/nginx.conf), [serve](https://github.com/aws-samples/sample-end-to-end-encrypted-ml-inference-with-amazon-sagemaker-ai-and-fhe/blob/main/inference/endpoint/container_files/serve), and [wsgi.py](https://github.com/aws-samples/sample-end-to-end-encrypted-ml-inference-with-amazon-sagemaker-ai-and-fhe/blob/main/inference/endpoint/container_files/wsgi.py)) by downloading them from the sample repository or copying them from the SageMaker AI documentation for [custom inference containers](https://docs.aws.amazon.com/sagemaker/latest/dg/adapt-inference-container.html). (Note: if you do the latter, increase the timeout value in `nginx.conf` to allow FHE evaluation to complete.)\n- Create a `Dockerfile.inference` file in that directory.\n- Add the container build instructions to the `Dockerfile.inference` file.\n- Build the container image.\n- [Push the image to Amazon ECR](https://docs.aws.amazon.com/AmazonECR/latest/userguide/docker-push-ecr-image.html):\n  - Run the authentication command to log Docker in to your Amazon ECR registry.\n  - Tag the image with your repository name.\n  - Push the tagged image.\n- Verify that the container is available by listing the images in your Amazon ECR repository. You should see JSON output containing your image with a non-empty `imageDigest` field and the `latest` tag.\n\n#### Deploy the inference endpoint\n\n(Important: endpoints incur ongoing charges until deleted, and costs will vary based on instance type, training duration, and endpoint uptime. For detailed pricing information, see [Amazon SageMaker AI Pricing](https://aws.amazon.com/sagemaker/pricing/). Remember to delete the endpoint when finished to avoid unnecessary costs.) 
Continuing to use the endpoint-creator role:\n\n- Create a virtual environment.\n- Activate this virtual environment.\n- Use pip to install the [required packages](https://github.com/aws-samples/sample-end-to-end-encrypted-ml-inference-with-amazon-sagemaker-ai-and-fhe/blob/main/requirements_txt_files/requirements_endpoint.txt).\n- Create a file named `start_inference_endpoint.py` that contains the endpoint deployment script.\n- Execute the script.\n- Verify that the endpoint is in service: wait until `EndpointStatus` shows `InService` before proceeding. This might take several minutes.\n\nThe script prints the name of the endpoint. Record this name for the client.\n\n### Create the client\n\nThe user shouldn’t need to know anything about FHE to use your system. Therefore, the client will hide all FHE details. Specifically, the client will:\n\n- Retrieve `client.zip` from Amazon S3.\n- Use `client.zip` to generate keys.\n- Encrypt the query with those keys.\n- Write the encrypted query and associated evaluation key to Amazon S3.\n- Send these locations to the inference endpoint and receive back the Amazon S3 location of the encrypted prediction.\n- Retrieve the encrypted prediction and decrypt it.\n\nTo create this client:\n\n- Create a file named [client.py](https://github.com/aws-samples/sample-end-to-end-encrypted-ml-inference-with-amazon-sagemaker-ai-and-fhe/blob/main/inference/client/run_inference.py).\n- Add the template code from the linked sample file to `client.py`.\n- Implement the `get_query()` function to retrieve your plaintext query.\n- Update the placeholder values for Amazon S3 locations, endpoint name, and model location.\n- Add exception handling code for the placeholder `<handle exception>` blocks to manage `TimeoutError`, `FileNotFoundError`, and `TarError` according to your application requirements.\n\n(You might have noticed that the client and endpoint treat encrypted queries and responses differently. 
Clients send encrypted queries to endpoints by manually writing them to Amazon S3 and submitting the Amazon S3 location as the actual query. Endpoints submit encrypted results directly, allowing SageMaker AI to handle the write to and read from Amazon S3. Why the difference? The encrypted response is a single byte-string, which SageMaker AI can handle naturally. The client’s query, however, is a JSON structure that must contain the location of the evaluation keys. The encrypted query would need to be encoded (such as with [Base64](https://en.wikipedia.org/wiki/Base64)) to be embedded in the same JSON, which would add unnecessary processing and network time. Hence, the sample code bypasses this encoding step by handling the encrypted queries itself.)\n\nThen:\n\n- Create a virtual environment.\n- Activate the virtual environment.\n- Install the [required packages](https://github.com/aws-samples/sample-end-to-end-encrypted-ml-inference-with-amazon-sagemaker-ai-and-fhe/blob/main/requirements_txt_files/requirements_client.txt).\n\nFinally:\n\n- Assume the client role.\n- Execute this script: `python client.py`\n- Verify that the FHE encryption is working correctly by comparing the prediction output to expected results.\n\n## Clean up resources\n\nTo avoid incurring future charges, delete the resources that you created:\n\n- Delete the inference endpoint through the SageMaker AI console or SDK.\n- Verify that the endpoint was deleted. Describing the endpoint should return an error indicating that the endpoint doesn’t exist.\n- Delete the endpoint configuration through the SageMaker AI console or SDK.\n- Verify that the endpoint configuration has been deleted. Listing endpoint configurations should show no matching endpoint configuration.\n- Delete the SageMaker AI model through the SageMaker AI console or SDK.\n- Verify that the model has been deleted. Listing models should show no matching models.\n- Delete the model artifacts, encrypted queries, encrypted responses, and evaluation keys from Amazon S3 through the 
Amazon S3 console or AWS CLI.\n- Verify that the Amazon S3 objects were deleted. Listing the locations should show empty results or no matching objects.\n- Delete the container images from Amazon ECR through the Amazon ECR console or AWS CLI.\n- Verify that the container images were deleted. Listing the repository should show no matching images.\n\n## Common issues\n\n- `TimeoutError` during inference: Increase `WaiterConfig` `max_attempts` or use a larger instance type.\n- `AccessDenied` errors: Verify that the IAM roles have the correct Amazon S3 and SageMaker AI permissions.\n- Container build failures: Verify that Docker has sufficient memory (over 8 GB).\n- Server errors during inference: Verify version parity across concrete-ml packages.\n\n## Performance and security considerations\n\nFHE provides cryptographic protection but comes with performance tradeoffs. The overhead depends on the model, but you can typically expect slowdowns of up to 100,000X compared to plaintext inference. You can reduce this slowdown in a few ways. The first is to increase the number of vCPUs in the instance. Another is to use a standard ML technique called ‘quantization’, which reduces the numeric precision used in model inference. Because the running time of concrete-ml increases with numeric precision, quantization might assist performance here even more than it would in normal ML inference. Quantization can reduce model accuracy, which isn’t otherwise affected by the conversion to FHE. However, quantization in the [model code](https://github.com/aws-samples/sample-end-to-end-encrypted-ml-inference-with-amazon-sagemaker-ai-and-fhe/blob/main/training/training_script.py) reduced overhead to 2800X (67 ms to 187 s on an ml.m5.xlarge instance) with no observable loss in accuracy. By increasing the number of vCPUs, you can reduce that further to 500X (46 s on an ml.m5.24xlarge instance).\n\nThis is still a significant slowdown for some applications. Because of this overhead, FHE isn’t yet suitable for interactive, latency-sensitive applications. 
However, it can be practical for asynchronous or batch processing workloads where privacy requirements outweigh latency concerns. For example, consider the use cases from the start of this post:\n\n- Providing doctors with an ML model that predicts medical procedure outcomes based on diagnostic data.\n- Evaluating satellite photos of potential oil/gas drill sites to select photos for further expert evaluation.\n- Detecting spam and phishing in email messages.\n\nEach of these use cases can tolerate additional latency on the order of seconds to minutes.\n\nIt’s [important that clients keep decrypted queries and predictions secret](https://docs.zama.org/concrete-ml/explanations/security_and_correctness), as a concrete-ml ciphertext and its decrypted plaintext (when combined) could reveal information about the secret encryption key. Also, it’s important to know that this system doesn’t protect the secrecy of the model. The queries and responses will be encrypted and opaque to SageMaker AI, but concrete-ml doesn’t encrypt the model itself. The model might still be visible to SageMaker AI. It also might be susceptible to ‘model stealing’ attacks by those who can see plaintext queries and responses. Lastly, concrete-ml doesn’t provide circuit privacy: it’s possible that information about the model can be revealed by ciphertexts. However, customers can still protect the model and ciphertexts with the standard security mechanisms that AWS provides for Amazon S3 and SageMaker AI. Remember: security is a [shared responsibility](https://aws.amazon.com/compliance/shared-responsibility-model) between AWS and each customer. In keeping with best practices, customers should:\n\n- Follow the principle of least privilege when creating IAM roles. Grant only the minimum permissions required for each role to perform its function. 
Review the sample IAM policies in the repository and adjust resource ARNs and actions to match your specific use case.\n- Enable Amazon S3 bucket encryption for values that are not FHE ciphertexts. This includes enabling default encryption on all Amazon S3 buckets that store models, data, and evaluation keys to protect data at rest.\n- Reduce Amazon S3 bucket permissions to the minimum required by the system.\n\n## Conclusion\n\nYou can use FHE-based tools in SageMaker AI to perform inference on encrypted data designed to remain unreadable throughout the entire process. This approach can give you the benefits of SageMaker AI—agility, scale, and managed infrastructure—while helping you maintain cryptographic protection from query all the way through response.\n\nTo learn more about security and encryption in AWS, refer to the following resources:\n\n- [What cryptographic services does AWS provide to customers?](https://aws.amazon.com/what-is/cryptography/#what-cryptographic-services-does-aws-provide-to-customers--1x0wovp)\n- [AWS Crypto Tools Documentation](https://docs.aws.amazon.com/aws-crypto-tools/)\n- [Cryptographic computing at AWS](https://aws.amazon.com/security/cryptographic-computing/)\n- [An overview of fully homomorphic encryption](https://homomorphicencryption.org/introduction/)\n- [Enable fully homomorphic encryption with Amazon SageMaker endpoints for secure, real-time inferencing](https://aws.amazon.com/blogs/machine-learning/enable-fully-homomorphic-encryption-with-amazon-sagemaker-endpoints-for-secure-real-time-inferencing/)\n- [Security in Amazon S3](https://docs.aws.amazon.com/AmazonS3/latest/userguide/security.html)\n- [Security in Amazon Elastic Container Service](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/security.html)\n- [Configure security in Amazon SageMaker AI](https://docs.aws.amazon.com/sagemaker/latest/dg/security.html)\n- [Security best practices in IAM](https://docs.aws.amazon.com/IAM/latest/UserGuide/best-practices.html)\n\nIf you have questions or 
comments, contact us at aws-crypto-compute@amazon.com.", "url": "https://wpnews.pro/news/end-to-end-encrypted-ml-inference-with-amazon-sagemaker-ai-and-fhe", "canonical_source": "https://aws.amazon.com/blogs/machine-learning/end-to-end-encrypted-ml-inference-with-amazon-sagemaker-ai-and-fhe/", "published_at": "2026-06-08 16:14:14+00:00", "updated_at": "2026-06-11 17:30:44.287933+00:00", "lang": "en", "topics": ["artificial-intelligence", "machine-learning", "ai-infrastructure", "ai-products", "ai-tools"], "entities": ["Amazon SageMaker AI", "Amazon Web Services", "FHE", "fully homomorphic encryption"], "alternates": {"html": "https://wpnews.pro/news/end-to-end-encrypted-ml-inference-with-amazon-sagemaker-ai-and-fhe", "markdown": "https://wpnews.pro/news/end-to-end-encrypted-ml-inference-with-amazon-sagemaker-ai-and-fhe.md", "text": "https://wpnews.pro/news/end-to-end-encrypted-ml-inference-with-amazon-sagemaker-ai-and-fhe.txt", "jsonld": "https://wpnews.pro/news/end-to-end-encrypted-ml-inference-with-amazon-sagemaker-ai-and-fhe.jsonld"}}