A100 and H100 Cloud Pricing and Availability
An analysis of the availability and pricing of A100 and H100 GPU-based compute instances across AWS, Microsoft Azure, and Google Cloud Platform
Introduction
The rapid expansion of AI and machine learning has created unprecedented demand for high-performance compute infrastructure, particularly in the public cloud. Within this landscape, internal and external research indicates that the H100 and A100 are two of the most sought-after chips, thanks to their exceptional performance, high memory capacity, and robust software support.
This report examines the pricing and availability of compute instances with A100 and H100 GPU chips across the leading cloud providers: Amazon Web Services, Microsoft Azure, and Google Cloud Platform.
A note from our Co-Founder & President
Our report focuses on the use of A100 and H100 GPUs for inference, recognizing that training remains the domain of a select few with the resources to pursue it. Beneath the industry hype, we believe that inference is where real economic impact and meaningful action reside.
GPU prices don’t follow logic; they follow hype and scarcity. As a result, GPU availability and pricing are a mess, challenging enterprises that cling to static contracts or single regions. Spot markets, when they exist, rise and fall like a rollercoaster, delivering shifts in cost efficiency of as much as 8x, while access to A100s and H100s remains patchy, inconsistent, and often gated by region or provider.
Hyperscalers – AWS, Google, Azure, and Oracle Cloud – as well as a handful of neocloud players like CoreWeave, Crusoe, and others remain true believers in the public, elastic, on-demand GPU market. Yet even now, most H100s and B200/300s are already pre-sold, tied to financial commitments spanning one to three years.
The GPU market is shaped by Reserved Instances (RIs), which have become the preferred model due to the prohibitively high cost of on-demand instances. With such high investment stakes, the industry has effectively reverted to glorified data centers; cloud elasticity is largely an illusion unless you can leverage automation and agents to stay ultra-agile in locating and provisioning what you need.
Many neoclouds build GPU capacity only after securing multi-year financial commitments from customers, effectively pre-selling their infrastructure before it exists. I cannot blame them: the cost, especially of NVIDIA’s latest GPUs, is very high. Vendor financing often pays for this hardware, an arrangement frequently described as ‘circular investments.’
Outside of a handful of top-tier organizations with access to elite talent, GPU selection is driven by human subjectivity. Humans tend to chase the latest shiny object, and GPUs are no exception. H100s and B200/300 are often chosen based on a self-fulfilling prophecy: “If I don’t buy now, I might not find any later.” This mindset fuels the hype, which may obscure reality. As NVIDIA’s founder and CEO Jensen Huang noted at GTC 2025, new chip generations rapidly displace demand for prior ones, and performance-per-watt improvements do justify the excitement. But this cycle also fuels over-ordering and speculation, distorting true market needs.
NVIDIA’s A100 and H100 units are expected to become more widely available next year, which should shift market dynamics significantly. As the Spot market gains momentum, substantial cost savings are on the horizon, including for On-Demand and reserved capacity. We are closely monitoring the inflection point – likely to emerge within the next two quarters – that could redefine procurement strategies and unlock new efficiencies across the industry.
The winners will be those who remain agile: hopping across regions, moving between clouds and neoclouds, and letting automation carry out the repetitive tasks of selecting and provisioning the best GPU options.
There is an uncomfortable truth to this: adapt or overpay.
Laurent Gil, Co-Founder and President of Cast AI
Methodology
This report draws on Cast AI’s proprietary global GPU intelligence, using live provider APIs, catalogs, marketplace feeds, and telemetry from millions of scheduling events observed across cloud-native applications. From January 2024 through September 2025, Cast AI continuously tracked A100, H100, and mainstream GPUs (T4, V100, L4, and A10G) in every cloud region, normalizing prices per GPU and aligning availability by region, zone, and time. This extract focuses on North America and Europe; the underlying dataset is global, continuously updated, and cross-validated against real outcomes. This 2026 edition expands the distribution and accessibility of the original report and covers the same January 2024 to September 2025 data period.
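Normalizing prices per GPU is what makes an 8-GPU instance comparable with a 1-GPU or 4-GPU one. The sketch below assumes normalization is a plain division of the hourly instance price by its GPU count; the report does not spell out Cast AI’s exact method, and the `InstanceQuote` type is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceQuote:
    """One observed hourly price for a cloud instance type (hypothetical record layout)."""
    provider: str
    instance_type: str
    region: str
    gpu_count: int
    hourly_price_usd: float


def price_per_gpu_hour(quote: InstanceQuote) -> float:
    """Normalize an instance's hourly price to a per-GPU hourly price."""
    if quote.gpu_count <= 0:
        raise ValueError(f"{quote.instance_type} has no GPUs")
    return quote.hourly_price_usd / quote.gpu_count


# The p5.48xlarge (8x H100) Spot price reported for eu-north-1 in September 2025.
quote = InstanceQuote("aws", "p5.48xlarge", "eu-north-1", 8, 12.16)
print(f"${price_per_gpu_hour(quote):.2f} per GPU-hour")
```

A per-GPU figure lets you rank instances of different sizes on one axis, although it ignores the vCPU, memory, and networking bundled with each instance.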
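Compute instances used in regional pricing comparison

| Provider | NVIDIA A100 | NVIDIA H100 |
|---|---|---|
| AWS | p4d.24xlarge | p5.48xlarge |
| Azure | Standard_ND96amsr_A100_v4 | Standard_ND96isr_H100_v5 |
| Google Cloud Platform | a2-highgpu-8g | a3-highgpu-8g |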
Key findings
Being flexible about cloud region choice makes a tremendous price difference
AWS
By continuously provisioning in the most favorable US region during specific time periods, teams could achieve savings ranging from 2x to nearly 5x compared to average Spot Instance prices.
Price evolution for the EC2 instance p4d.24xlarge with 8 A100 GPUs (Spot Instance)
Azure
Teams using the A100 instance Standard_ND96amsr_A100_v4 and willing to move workloads across US regions could cut costs by 7% to 32%, with the best periods yielding almost 1.5x efficiency.
Google Cloud Platform
Teams running workloads on the a3-highgpu-8g (H100) instance in Europe could reduce costs by up to 48%, unlocking almost 2x savings power during optimal periods.
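The percentages and multiples in these findings are two views of the same ratio: cutting cost by a fraction s lets the same budget buy 1 / (1 - s) times as much compute. The helpers below assume that is how the report’s multiples are derived; the figures above are consistent with it (32% is about 1.5x, 48% is almost 2x, and the 2x to 5x range corresponds to 50% to 80% savings).

```python
def savings_power(cost_reduction: float) -> float:
    """Convert a fractional cost reduction (0.48 means 48%) into a cost-efficiency multiple."""
    if not 0 <= cost_reduction < 1:
        raise ValueError("cost_reduction must be in [0, 1)")
    return 1 / (1 - cost_reduction)


def cost_reduction(power: float) -> float:
    """Inverse: convert a multiple (2.0 means 2x) back into a fractional cost reduction."""
    if power < 1:
        raise ValueError("power must be >= 1")
    return 1 - 1 / power


for label, reduction in [("Azure A100, best case", 0.32), ("GCP H100 Europe, best case", 0.48)]:
    print(f"{label}: {reduction:.0%} cheaper -> {savings_power(reduction):.2f}x")
```

Keeping both directions handy avoids a common misreading: a 5x multiple is an 80% saving, not a 500% one.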
GPU procurement should be viewed as a fluid, evolving market – one that demands agility, not rigid contracts.
GPU prices change dynamically, even for the H100 GPU chip
The price of the H100 GPU-powered p5.48xlarge Spot Instance fell by 88% between January 2024 ($105.20) and September 2025 ($12.16). This translates to an 8.65x improvement in cost efficiency and 4.35x savings power vs. average Spot Instance pricing.
**Price evolution for an AWS EC2 instance p5.48xlarge with 8 H100 GPUs in the eu-north-1 region (Spot Instance)**
Teams that are agile and can move workloads dynamically will capture outsized savings, while those locked into static regions or contracts will miss them. Adopting automated workload migration and multi-region strategies isn’t optional anymore; it’s the only way to turn cloud GPU volatility into a long-term cost advantage.
Availability of instances running on specific GPU chips varies by cloud provider and changes over time
In the US East region, A100 availability shows very different trajectories across providers:
AWS significantly expanded coverage, moving from just 50% in June 2024 to full 100% availability by September 2025, reflecting investment in scaling GPU capacity.
Azure, by contrast, maintained steady full coverage at 100% throughout the period, indicating maturity and stability in its A100 offering.
GCP, however, remained flat at 44% availability, suggesting limited regional support and no meaningful growth year over year.
Teams need to build multi-cloud strategies that take advantage of cloud providers’ growing GPU footprint while avoiding single-provider lock-in.
Availability of A100 instance types differs by region
- AWS delivers full A100 availability in us-east-1 and us-west-2, but only partial support in ca-central-1 and us-east-2.
- Azure provides broad coverage in key US regions.
- GCP’s support may be more fragmented outside us-central1 (note that while AWS offers two A100 instance types, GCP offers nine).
GPU procurement is a multidimensional challenge: it’s not enough to secure access to A100s from a preferred provider – you must also account for regional distribution, redundancy, latency, and compliance. Teams that ignore these gaps risk bottlenecks or capacity shortages, while those who plan with a multi-region, multi-cloud mindset will be able to scale reliably and cost-effectively.
Mid-range to low-range legacy GPUs still dominate the AWS and GCP offerings
The most available GPU instances across cloud providers are T4 and L4 (AWS and GCP) and A10 (Azure).
Securing A100 or H100 capacity is far more challenging, often requiring quotas or enterprise agreements. This uneven availability is leading organizations to use a mix of strategies, where they train models on high-end clusters when they can and then use more common mid-tier GPUs for deployment to keep costs down.
The GPU landscape: Which hyperscaler cloud provider offers the GPUs you need?
The availability of GPU instances varies significantly across cloud providers, regions, and Availability Zones. Understanding where these GPUs are available is essential for planning AI/ML workloads, as it affects the ability to scale, the risk of interruptions, and overall operational efficiency.
This section examines how GPU availability differs across providers and regions to help organizations identify the best locations to access the resources they need.
Availability of A100 instance types differs by region
The tables below display the percentage of A100 instance types that a cloud provider offers in a specific region. For example, AWS has two A100 instance types, so 50% availability means that only one of them is offered in the region.
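The availability metric is a simple share of a provider’s A100 catalog. A minimal sketch follows; the second example is our own reading, not a figure stated in the report: GCP’s flat 44% in US East is consistent with four of its nine A100 instance types being offered there.

```python
def availability_pct(offered_in_region: int, total_types: int) -> float:
    """Share of a provider's A100 instance types that are offered in one region, in percent."""
    if total_types <= 0 or not 0 <= offered_in_region <= total_types:
        raise ValueError("need total_types > 0 and 0 <= offered_in_region <= total_types")
    return 100 * offered_in_region / total_types


print(availability_pct(1, 2))         # one of AWS's two A100 types
print(round(availability_pct(4, 9)))  # four of GCP's nine A100 types
```

Because the denominator is the provider’s own catalog size, the same percentage means different absolute choice on different clouds: 100% on AWS is two instance types, while 44% on GCP is four.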
In North America, AWS delivers full A100 availability in us-east-1 and us-west-2, but partial support in ca-central-1 and us-east-2. Azure provides broad coverage in key US regions, while GCP’s support may be more fragmented outside us-central1. Note that while AWS offers two A100 instance types, GCP offers nine.
In Europe, GPU shopping decisions require even closer attention to regional disparities in availability. AWS offers full A100 coverage in eu-central-1 but only half the options in eu-west-1, with no visible support elsewhere, limiting flexibility for customers outside those hubs. Azure shows broader distribution, with full coverage in westeurope and strong partial support (80%) in secondary regions like francecentral, italynorth, polandcentral, and swedencentral, giving users more regional choice but not always full instance variety. GCP, meanwhile, offers A100 instances in Europe only in europe-west4, leaving other EU regions uncovered.
North America: US and Canada
Europe (GDPR regions)
For teams, this means Europe-based workloads may face heavier regional concentration and fewer options than in the US, making capacity planning, latency trade-offs, and multi-cloud strategies more important when securing GPUs.
The key takeaway is that GPU procurement is a multidimensional challenge: it’s not enough to secure access to A100s from a preferred provider; you must also account for regional distribution, redundancy, latency, and compliance. Teams that ignore these gaps risk bottlenecks or capacity shortages, while those who plan with a multi-region, multi-cloud mindset will be able to scale reliably and cost-effectively.
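One way to act on this is to shortlist regions before provisioning. The sketch below filters the European A100 availability figures quoted earlier in this section by a minimum availability and a minimum number of qualifying regions per provider (a crude stand-in for a redundancy requirement). GCP’s europe-west4 is omitted because the text gives no percentage for it; the function and thresholds are illustrative assumptions.

```python
# A100 availability (% of the provider's A100 instance types) in European regions,
# as described in the text. GCP's europe-west4 share is not stated, so it is omitted.
EU_A100_AVAILABILITY = {
    ("aws", "eu-central-1"): 100,
    ("aws", "eu-west-1"): 50,
    ("azure", "westeurope"): 100,
    ("azure", "francecentral"): 80,
    ("azure", "italynorth"): 80,
    ("azure", "polandcentral"): 80,
    ("azure", "swedencentral"): 80,
}


def candidate_regions(availability, min_pct, min_regions_per_provider=1):
    """Group regions meeting `min_pct` by provider; drop providers with too few regions."""
    by_provider = {}
    for (provider, region), pct in availability.items():
        if pct >= min_pct:
            by_provider.setdefault(provider, []).append(region)
    return {
        provider: sorted(regions)
        for provider, regions in by_provider.items()
        if len(regions) >= min_regions_per_provider
    }


# Require at least 80% availability and two qualifying regions per provider.
print(candidate_regions(EU_A100_AVAILABILITY, min_pct=80, min_regions_per_provider=2))
```

With these inputs only Azure keeps two or more qualifying regions, which mirrors the observation that European A100 capacity is more concentrated than in the US.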
Availability of instances running on specific GPU chips varies by cloud provider and changes over time
The table below displays the percentage of A100 GPU instance types offered by the cloud service provider in the US East region.
Availability of A100 GPUs in the US East region (September 2025)
In the US East region, A100 availability shows very different trajectories across providers:
- AWS significantly expanded coverage, moving from just 50% in June 2024 to full 100% availability by September 2025, reflecting a clear investment in scaling GPU capacity.
- Azure, by contrast, maintained steady full coverage at 100% throughout the period, indicating maturity and stability in its A100 offering.
- GCP, however, remained flat at 44% availability, suggesting limited regional support and no meaningful growth year over year.
For teams, this means AWS has rapidly caught up with Azure in A100 flexibility, while GCP still offers fewer options, making it less attractive for workloads needing diverse or scalable A100 instances in the US East region.
Mid-range to low-range legacy GPUs still dominate the cloud offerings
Cloud providers have broadly deployed legacy and mid-tier GPUs such as T4, L4, or A10 because these architectures are mature, proven, and relatively inexpensive.
The picture looks different for NVIDIA’s flagship A100 and H100. These chips remain in exceptionally high demand and short supply, except on Azure and Google Cloud, where they rank among the top three most available GPUs. However, many cloud providers reserve them primarily for enterprise customers and AI startups under contract.
For users, the implications are clear. Securing A100 or H100 capacity is far more challenging, often requiring quotas or enterprise agreements. This uneven availability is leading organizations to use a mix of strategies, where they train models on high-end clusters when they can and then use more common mid-tier GPUs for deployment to keep costs down.
Building a solid foundation with On-Demand GPUs: Low regional price variability
Identifying the best On-Demand regions is key for running GPU workloads efficiently and reliably: different regions vary in capacity, pricing, and likelihood of interruptions, which can directly impact performance, cost, and job completion times. This section dives into the price evolution of specific compute instances across AWS, Azure, and GCP to show the potential cost savings users may achieve by selecting a particular region in a particular timeframe.
US and Canada
In the US and Canada, AWS offers broad coverage but small cost differentials, with potential cost savings of up to 12% for A100. For Azure and GCP, savings are generally in the single digits (4–9%), with the exception of Azure’s H100 compute instance Standard_ND96isr_H100_v5, which offered 12% cost savings in the eastus region from August 2024 to September 2025.
Europe (GDPR regions)
In Europe, AWS shows minimal savings potential and regional variability. Azure also delivers very limited regional arbitrage in Europe, with cost savings below 5% at best. The highest potential savings await GCP users who need the a3-highgpu-8g compute instance running on H100 chips: almost 9%.
When purchasing On-Demand GPUs or reserving them for one- or three-year terms, pricing remains largely consistent across regions within the same cloud provider: US regions share near-identical rates, as do EU regions. Therefore, availability, not cost, should guide your regional selection. In contrast, Spot pricing varies significantly by region.
Optimizing GPU costs with Spot Instances: Best regions for running AI workloads
Spot Instances are discounted cloud virtual machines that let you use unused capacity at much lower prices. However, the provider can interrupt and reclaim them at any time. Spot availability can vary widely across regions and AZs, affecting instance fulfillment, interruption rates, and pricing.
By targeting regions where Spot Instances are most consistently available and least prone to interruptions, organizations can maximize cost savings, minimize workflow disruptions, and ensure smoother execution of AI/ML workloads at scale.
Spot Instance GPU pricing in the US and Canada
AWS users stand to gain a lot from running workloads on Spot Instances and choosing their regions wisely, especially on A100, with savings as high as 80%. H100 Spot pricing is more volatile but can still deliver significant savings.
Azure tends to be more conservative, offering modest Spot savings with lower volatility; discounts reached 32% in only a handful of months during the period analyzed. GCP Spot pricing tends to be more stable across regions, hence the smaller potential cost savings.
By continuously provisioning in the most favorable US region during each period, teams could achieve savings ranging from 2x to nearly 5x compared to average Spot Instance prices. In practice, this means a team willing to move workloads monthly could cut costs between 50% and 80%, with the best periods yielding almost five times the efficiency. The conclusion is clear: dynamic regional optimization unlocks massive cost advantages, turning cloud GPU economics from prohibitively expensive into sustainably efficient.
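The region-hopping strategy boils down to picking the cheapest region in each period and comparing it with that period’s typical price. A minimal sketch follows; the prices and region names are hypothetical placeholders, not data from the report, and using the unweighted mean across regions as the "average Spot Instance price" is an assumption.

```python
from statistics import mean

# Hypothetical monthly Spot prices (USD/hour) for one instance type in three regions.
# These numbers are made up for illustration; they are not from the report's dataset.
SPOT_PRICES = {
    "2025-01": {"region-a": 30.0, "region-b": 18.0, "region-c": 24.0},
    "2025-02": {"region-a": 28.0, "region-b": 26.0, "region-c": 12.0},
    "2025-03": {"region-a": 10.0, "region-b": 25.0, "region-c": 22.0},
}


def hop_to_cheapest(prices_by_period):
    """For each period, pick the cheapest region and compare it with the regional average."""
    results = []
    for period, regional in sorted(prices_by_period.items()):
        best_region = min(regional, key=regional.get)
        best, avg = regional[best_region], mean(regional.values())
        results.append((period, best_region, best, avg, avg / best))
    return results


for period, region, best, avg, power in hop_to_cheapest(SPOT_PRICES):
    print(f"{period}: {region} at ${best:.2f} vs avg ${avg:.2f} -> {power:.2f}x")
```

Note that the winning region changes every month in this toy data, which is exactly why the report argues for automation rather than a one-time regional choice.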
AWS
##### p4d.24xlarge (A100)
##### p5.48xlarge (H100)
Azure
##### Standard_ND96amsr_A100_v4 (A100)
##### Standard_ND96isr_H100_v5 (H100)
Google Cloud Platform
##### a2-highgpu-8g (A100)
##### a3-highgpu-8g (H100)
Spot Instance GPU pricing in Europe (GDPR regions)
Just like in the US, AWS is the best performer in Europe, consistently delivering 60–70% discounts for A100. The pricing of its H100 compute instance was highly volatile during the period analyzed but eventually reached a significant level of cost savings. GCP offers varying levels of savings (including uniform pricing for its A100 instance across European regions), while Azure users can benefit from savings in the 25–40% range.
Running GPUs on Spot in Europe can cut costs by 2-3x on AWS if workloads tolerate interruptions – or if you have an automation solution like Cast AI in place to handle your workloads if the provider reclaims the instance. Such solutions use ML-powered models to predict the likelihood of a given Spot Instance getting interrupted and move workloads to a different instance before that happens to ensure continued operation.
AWS
##### p4d.24xlarge (A100)
##### p5.48xlarge (H100)
This instance is an intriguing case from the price evolution perspective:
- January to October 2024 – The best available Spot price stayed constant at ~$105.20/hour. That suggests limited regional competition or capacity at the time, a single “floor” price across the board.
- Until November 2024, the Spot and On-Demand prices were the same, likely due to tight capacity. Later, both started to fall.
- In November 2024, there was a sharp price drop, with the average falling to $57.86 per hour, which represents nearly a 45% decrease compared to the baseline from early 2024.
- December 2024 until September 2025 – The trend accelerates, hitting $22.78/hour in December 2024, and continuing into the $12–33/hour range throughout 2025. Net effect: Between Jan 2024 ($105.20) and Sep 2025 ($12.16), the Spot price fell by 88.4%, which translates to nearly an 8.65x improvement in cost efficiency (savings power of 4.35x vs. the average Spot Instance pricing).
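The headline figures in this timeline follow from two ratios against the ~$105.20/hour baseline: the price drop is 1 - price / baseline, and the cost-efficiency multiple is baseline / price. The sketch below recomputes them from the price points listed above.

```python
# Spot price points for p5.48xlarge in eu-north-1, taken from the timeline above (USD/hour).
BASELINE = 105.20  # best available Spot price, January to October 2024
POINTS = {"2024-11 (average)": 57.86, "2024-12": 22.78, "2025-09": 12.16}

for label, price in POINTS.items():
    drop = 1 - price / BASELINE    # fractional price decrease vs. the baseline
    efficiency = BASELINE / price  # how many times more compute the same budget buys
    print(f"{label}: {drop:.1%} lower, {efficiency:.2f}x cost efficiency")
```

The September 2025 line reproduces the 88.4% drop and 8.65x efficiency figure; the November 2024 average works out to a 45.0% drop.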
The H100 Spot pricing curve in eu-north-1 shows how aggressively cloud providers use price as a demand lever. For much of 2024, prices held steady at an artificial floor of ~$105/hour, reflecting tight supply and limited regional competition. But once capacity expanded, AWS reduced Spot rates: first by nearly half in November 2024, then by more than 80% by mid-2025.
This 8.65x swing in cost efficiency makes one thing clear: Spot Instance markets are engineered more for supply-demand balancing than for price stability.
Azure
##### Standard_ND96amsr_A100_v4 (A100)
##### Standard_ND96isr_H100_v5 (H100)
Google Cloud Platform
##### a2-highgpu-8g (A100)
##### a3-highgpu-8g (H100)
For enterprises, the takeaway is that those who are agile and can move workloads dynamically will capture outsized savings, while those locked into static regions or contracts will miss them. Adopting automated workload migration and multi-region strategies isn’t optional anymore; it’s the only way to turn cloud GPU volatility into a long-term cost advantage.
Best practices for optimizing GPU availability, cost, and utilization
1. Be flexible when choosing cloud regions and Availability Zones
The availability and pricing analysis in this report demonstrates that it’s important to stay flexible when choosing the cloud region or Availability Zone for running your GPU workloads. Prices fluctuate often, and for some providers, like Google Cloud Platform, AZ selection can make a huge difference.
For example, in June 2025, the difference in availability between the us-central1-c and us-central1-f zones was 45%. In practice, us-central1-f offered four chip types while us-central1-c offered ten. How do you pick the right instance, in the right location, and at the right time?
Solution: Use an automation platform that selects compute instances based on real-time price fluctuations
Yotpo, a SaaS provider of cutting-edge marketing solutions that help organizations accelerate growth, was looking to optimize its cloud costs. An investigation into cost patterns revealed that regional pricing differences could make a massive cost impact. However, manually moving workloads between Availability Zones quickly became unsustainable.
We saw that if we moved the same EC2 instance to another Availability Zone (AZ) within the same region, the cost would drop by 20%. And that’s quite an easy fix.
So we started moving workloads to cheaper instances. Then we also looked at the instance type itself. Is it the most cost-effective instance for our workload?
We’ve been doing all that work manually and at some point, we realized that we couldn’t keep doing it this way. It just didn’t make sense to spend so much time on this. That’s when we started looking for a tool that would do this job for us automatically.
Achi Solomon, Director of DevOps at Yotpo
Yotpo integrated Cast AI to automatically provision the most cost-effective instance type and size, taking into account real-time workload demand and changing provider pricing. The team realized significant cost savings immediately after they implemented Cast and automatically migrated workloads to Spot Instances.
The accompanying graph compares Yotpo’s cloud costs within two time periods:
- Period A is a 3-day range during which the team onboarded Cast.
- Period B shows the level of savings Yotpo achieved after completing the implementation of Cast.
2. Run AI workloads on Spot Instances
Running AI workloads on Spot Instances carries risk because the cloud provider may reclaim capacity at any time, leading to unexpected interruptions and job failures. For long training runs or distributed workloads, this means progress can be lost. Spot users must design their pipelines to withstand sudden preemptions; otherwise, the savings can quickly be offset by lost time and wasted compute.
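The standard way to make a long job survive preemption is to checkpoint progress periodically and resume from the last checkpoint on a fresh instance. The toy loop below is a sketch under that assumption: the JSON state, file path, and `stop_after` parameter (which simulates a reclaim) are illustrative, and a real pipeline would save model and optimizer state with its framework’s own tools.

```python
import json
import os
import tempfile


def save_checkpoint(path: str, state: dict) -> None:
    """Write to a temp file, then rename, so a crash mid-write keeps the old checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)  # atomic rename on POSIX filesystems


def load_checkpoint(path: str) -> dict:
    """Resume from the last checkpoint, or start fresh if none exists."""
    if not os.path.exists(path):
        return {"step": 0, "loss": None}
    with open(path) as f:
        return json.load(f)


def train(path: str, total_steps: int, checkpoint_every: int, stop_after=None) -> dict:
    """Toy training loop; `stop_after` simulates a Spot reclaim after N steps in this run."""
    state = load_checkpoint(path)
    steps_this_run = 0
    while state["step"] < total_steps:
        state["step"] += 1
        state["loss"] = 1.0 / state["step"]  # placeholder for a real training step
        steps_this_run += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
        if stop_after is not None and steps_this_run >= stop_after:
            break  # the instance was "reclaimed"
    return state


if __name__ == "__main__":
    ckpt = os.path.join(tempfile.mkdtemp(), "ckpt.json")
    first = train(ckpt, total_steps=100, checkpoint_every=10, stop_after=25)
    print(first["step"], load_checkpoint(ckpt)["step"])  # reclaimed at 25, last save at 20
    second = train(ckpt, total_steps=100, checkpoint_every=10)  # resumes from step 20
    print(second["step"])
```

The checkpoint interval is the trade-off knob: here a reclaim at step 25 costs only the five steps since the last save, at the price of writing a checkpoint every ten steps.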
Solution: To run workloads on Spot Instances confidently, use an automation solution that provisions instances for you and has a fallback mechanism to keep your workloads running during Spot droughts
Foretellix, the leading provider of data automation for AI-powered autonomy, uses Cast’s automation to ensure that applications can automatically switch from Spot Instances to Reserved Instances during high-demand periods when Spot Instances may not be available.
When we switched to Cast AI, we saw the feature that automatically reverts to Reserved Instances when no Spot Instances were available. We actually were able to get rid of all these Reserved Instances that we kept and just use Cast as is. We now know that whenever there are no Spot Instances it will automatically switch to Reserved Instances without any effort on our part. It’s a no-brainer for us.
This was especially helpful at the end of last year when there’s typically a spike in demand for Spot Instances. Cast AI automatically switched to Reserved Instances on demand, without us having to do anything, which was a great benefit. We no longer had to move workloads from Spot Instances to Reserved Instances manually in preparation for a Spot drought.
Ron Grosberg, VP, Research & Development at Foretellix
Cast eliminates the complexity of handling Spot Instance interruptions via an automatic fallback mechanism that moves workloads to on-demand instances when needed – all the while maintaining optimal cost and high service availability.
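Conceptually, a fallback mechanism is an ordered list of purchase options tried until one has capacity. The sketch below assumes a hypothetical `request_fn` that returns an instance ID, or `None` when the provider has no capacity; it is not Cast AI’s implementation, which the report does not describe.

```python
from typing import Callable, Optional, Sequence, Tuple

# A capacity request returns an instance ID, or None when no capacity is available.
# `request_fn` stands in for a real provisioning call and is hypothetical.
RequestFn = Callable[[str], Optional[str]]


def provision_with_fallback(
    request_fn: RequestFn,
    tiers: Sequence[str] = ("spot", "reserved", "on-demand"),
) -> Tuple[str, str]:
    """Try each purchase option in order of preference and return the first that succeeds."""
    for tier in tiers:
        instance_id = request_fn(tier)
        if instance_id is not None:
            return tier, instance_id
    raise RuntimeError("no capacity available in any tier")


# Simulate a Spot drought: Spot requests fail, reserved capacity is free.
def fake_request(tier: str) -> Optional[str]:
    return None if tier == "spot" else f"i-{tier}-001"


print(provision_with_fallback(fake_request))
```

A production system would also move workloads back to Spot once capacity returns; this sketch covers only the initial fallback decision.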
3. Consolidate workloads on a larger GPU instance
Here’s an example of the AWS G5 instance types that offer varying GPU configurations depending on their size.
In terms of price per GPU, larger multi-GPU instances provide better value than several smaller, single-GPU instances.
When you compare prices per GPU, there isn’t much difference between the 4-GPU (g5.24xlarge) and 8-GPU (g5.48xlarge) instances. This means that larger instances are the more cost-effective choice for tasks that use many GPUs.
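Choosing instance sizes for a GPU requirement is a small minimum-cost covering problem: find the cheapest mix of instances that provides at least N GPUs. The dynamic program below sketches this; the catalog names and prices are hypothetical placeholders, not AWS list prices, so check your provider’s price list before drawing conclusions.

```python
from functools import lru_cache

# Hypothetical catalog: instance name -> (GPUs per instance, USD per hour).
# Placeholder prices for illustration only.
CATALOG = {
    "small-1gpu": (1, 2.50),
    "medium-4gpu": (4, 8.00),
    "large-8gpu": (8, 15.00),
}


def cheapest_fleet(gpus_needed: int, catalog=CATALOG):
    """Return (hourly cost, instance names) of the cheapest mix with at least `gpus_needed` GPUs."""
    items = tuple(catalog.items())

    @lru_cache(maxsize=None)
    def best(n: int):
        if n <= 0:
            return 0.0, ()
        options = []
        for name, (gpus, price) in items:
            cost, fleet = best(n - gpus)
            options.append((cost + price, fleet + (name,)))
        return min(options)  # lowest cost; ties broken by instance names

    return best(gpus_needed)


for name, (gpus, price) in CATALOG.items():
    print(f"{name}: ${price / gpus:.2f} per GPU-hour")
print(cheapest_fleet(12))
```

With these placeholder prices, 12 GPUs are cheapest as one 8-GPU plus one 4-GPU instance rather than twelve single-GPU ones.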
How can teams choose the optimal instance size for multiple workloads?
Solution: Use automation to select the most optimal instance size and bin-pack workloads efficiently
Fairgen, the provider of a generative AI-powered market research solution, needed to optimize resource utilization across its SaaS and AI infrastructure without sacrificing performance or user experience. Provisioning cloud resources manually was impossible to scale, especially with AI workloads that demanded more powerful machines, larger clusters, and dynamically adjustable capacity.
Using Cast, Fairgen automated the management of its clusters, optimized resource allocation in real time, and ultimately cut operational costs by 70%.
For us, the ability to define and use different machine templates in the same environment for different stages of our AI pipeline has been a game-changer.
We prepare everything before running our major research workloads. Our system includes a preprocessor that determines, based on the specific step and complexity, which template should be used—small, medium, large, or even one we call the “master” (64 vCPUs, just in case). Interestingly, we rarely need the master, but when we do, it’s because we know it’s actually more cost-effective to run one large process than to scatter smaller tasks inefficiently.
Mati KonenVP of Engineering at Fairgen
The cluster autoscaler, workload autoscaling, and bin-packing combination significantly reduced CPU overprovisioning and dropped the cost per requested CPU.
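Bin-packing is what lets a cluster run on fewer, fuller nodes. As an illustration only, here is the classic first-fit decreasing heuristic packing GPU requests onto identical 8-GPU nodes; Cast AI’s own bin-packing algorithm is not described in the report, and the workload names and sizes are hypothetical.

```python
def first_fit_decreasing(requests, node_capacity):
    """Pack workload GPU requests onto identical nodes using first-fit decreasing.

    `requests` maps workload name -> GPUs requested. Returns one dict per node
    (workload -> GPUs). FFD is a classic heuristic, not an optimal packer.
    """
    nodes = []  # each entry: [free_gpus, {workload: gpus}]
    for name, gpus in sorted(requests.items(), key=lambda kv: kv[1], reverse=True):
        if gpus > node_capacity:
            raise ValueError(f"{name} needs {gpus} GPUs; nodes have {node_capacity}")
        for node in nodes:
            if node[0] >= gpus:  # first existing node with enough room
                node[0] -= gpus
                node[1][name] = gpus
                break
        else:  # no node had room: open a new one
            nodes.append([node_capacity - gpus, {name: gpus}])
    return [assigned for _, assigned in nodes]


# Hypothetical GPU requests from six workloads, packed onto 8-GPU nodes.
workloads = {"train-a": 4, "embed": 1, "train-b": 3, "infer-1": 2, "infer-2": 2, "eval": 1}
for i, node in enumerate(first_fit_decreasing(workloads, node_capacity=8)):
    print(f"node {i}: {node} ({sum(node.values())}/8 GPUs)")
```

The 13 requested GPUs fit on two nodes, the minimum possible; placing the largest requests first is what keeps the small ones from fragmenting capacity.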