# Neuralwatt: Energy-based pricing for AI inference. Efficient prompts cost less

> Source: <https://portal.neuralwatt.com/>
> Published: 2026-06-21 16:09:18+00:00

Neuralwatt Cloud

#
Run Inference with Real Visibility

into Power, Cost, and Efficiency

The first AI inference API with energy-based pricing. Know exactly what your AI costs —
in dollars *and* kilowatt-hours.

Use Neuralwatt Cloud as a hosted service, or bring Neuralwatt Deploy into your own data center.

## Try it now

Send a prompt and see energy-aware inference in action.

## Inference Priced by Energy Consumed

Token-based pricing hides the true cost of AI inference. We're changing that. Pay per kilowatt-hour and know exactly what resources your AI workloads consume.

### Transparent

See energy consumption per request. No hidden costs, no opaque token multipliers.

### Predictable

Energy costs are consistent. No surprises from model-specific pricing variations.

### Efficient

Optimize your AI workloads. Compare energy efficiency across models and make informed decisions.

## Why Neuralwatt?

Three pillars that define every layer of our platform.

### Energy Reporting

Every customer gets real-time energy metrics. Know exactly what your AI workloads consume.

- Per-request energy metrics
- Dashboard with usage trends
- Model efficiency comparisons

### Performance

State-of-the-art inference powered by vLLM with tensor parallelism, continuous batching, and advanced KV caching.

- As low as 15ms time to first token
- High throughput at scale
- Multi-GPU tensor parallelism

### Efficiency

More intelligence per kilowatt-hour. Optimized infrastructure for maximum compute efficiency.

- 40% more energy efficient
- Energy-aware scheduling
- Optimized GPU utilization

### Multi-Model API

Access multiple LLMs through a single API. Switch models seamlessly without managing separate connections.

### OpenAI Compatible

Drop-in replacement for OpenAI APIs. Just change your base URL and you're ready to go.

## The Neuralwatt Platform

Three integrated capabilities for high-performance, energy-efficient AI — from the data center to the API.

### Neuralwatt Cloud

YOU ARE HEREHosted Inference Service

The first AI inference service with energy-based pricing. OpenAI-compatible API with real-time energy transparency per request.

### Neuralwatt Deploy

On-Premise Optimization

Bring Neuralwatt's energy optimization directly into your data center. Full control over your hardware, security, and power consumption.

### Neuralwatt Optimize

Power Optimization Engine

Intelligent layer between AI workloads and GPUs that continuously tunes power consumption in real time with less than 0.1% performance overhead.

## Featured Models

Access the latest open-source models from leading providers. All with OpenAI-compatible APIs.

### GPT-OSS 120B

OpenAI

[Request Access](/models)

## Start with Energy-Transparent AI

Get started with $5 in free credits. Pay per kWh or per token — your choice. Real-time energy reporting included with every account.

### Enterprise & Dedicated Inference

Need dedicated GPU capacity, custom SLAs, or on-premises deployment? Our enterprise solutions offer guaranteed performance with full energy transparency.

- Dedicated GPU infrastructure
- SLA guarantees up to 99.9%
- Volume pricing & custom models

[Contact Enterprise Sales](/cdn-cgi/l/email-protection#c1a8afa7ae81afa4b4b3a0adb6a0b5b5efa2aeac)
