Zach Anderson
Apr 15, 2026 15:39
NVIDIA’s Blackwell architecture delivers $0.12 per million tokens versus $4.20 on Hopper, reshaping AI infrastructure economics for enterprise deployments.
NVIDIA is pushing enterprises to abandon conventional cost metrics for AI infrastructure, arguing that cost per token, not raw compute power, determines whether companies can profitably scale their AI operations. The chip giant’s latest benchmarks show its Blackwell architecture cutting token generation costs to $0.12 per million tokens, down from $4.20 on the previous Hopper generation.
That is a 35x reduction that fundamentally changes the math on AI deployment economics.
The Metric Shift
NVIDIA’s argument is simple: data centers have become “AI token factories,” and measuring them by FLOPS per dollar misses the point entirely. Raw compute and actual token output aren’t the same thing, a distinction that becomes stark when comparing architectures.
Running DeepSeek-R1, Blackwell’s GB300 NVL72 configuration generates 6,000 tokens per second per GPU versus just 90 on HGX H200. The hourly cost difference? Blackwell runs about $2.65 per GPU hour compared to $1.41 for Hopper. Hopper is the cheaper hardware by the hour, but its output is dramatically worse.
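The relationship between hourly rate, throughput, and cost per token can be sketched in a few lines. This is a rough illustration rather than NVIDIA’s own methodology: it assumes the throughput figures are tokens per second per GPU, and the function and variable names are hypothetical. With the rounded inputs quoted above, it gives roughly $0.12 per million tokens for Blackwell and about $4.35 for Hopper, close to but not exactly the reported $4.20, which presumably rests on unrounded inputs.

```python
def cost_per_million_tokens(gpu_hour_usd: float, tokens_per_sec_per_gpu: float) -> float:
    """Dollars per one million generated tokens for a single GPU.

    Assumption: throughput is measured in tokens per second per GPU.
    """
    tokens_per_hour = tokens_per_sec_per_gpu * 3600
    return gpu_hour_usd / tokens_per_hour * 1_000_000


# Rounded figures quoted in the article.
blackwell = cost_per_million_tokens(gpu_hour_usd=2.65, tokens_per_sec_per_gpu=6000)
hopper = cost_per_million_tokens(gpu_hour_usd=1.41, tokens_per_sec_per_gpu=90)

print(f"Blackwell: ${blackwell:.2f} per million tokens")
print(f"Hopper:    ${hopper:.2f} per million tokens")
print(f"Ratio:     {hopper / blackwell:.1f}x")
```

The higher hourly rate is divided by a far larger token count, which is why the more expensive GPU ends up with the lower cost per token.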
The efficiency gains compound at scale. Blackwell delivers 2.8 million tokens per megawatt, over 50x what Hopper manages. For enterprises building on-premises AI infrastructure where power costs are locked in for years, that throughput advantage matters more than sticker price.
Why This Timing Matters
Gartner recently projected AI token costs could plummet more than 90% by 2030, and NVIDIA’s data suggests that decline is already accelerating. The company emphasizes that its software optimizations, including TensorRT-LLM and the newly production-ready Dynamo serving layer, continue improving token output on existing hardware, meaning costs keep dropping post-purchase.
Cloud partners CoreWeave, Nebius, Nscale, and Together AI have already deployed Blackwell infrastructure at scale. For enterprises weighing build-versus-buy decisions, these providers now offer access to sub-dollar-per-million-token economics without the capital commitment.
The Hidden Complexity
NVIDIA’s “inference iceberg” framework highlights what specification sheets miss: FP4 precision support, speculative decoding, KV-cache offloading, and disaggregated serving all determine real-world output. A GPU lacking these optimizations, regardless of peak specs, delivers fewer tokens and higher effective costs.
The company is essentially arguing that competitors offering cheaper hardware are selling a false economy. Whether that holds depends on how quickly alternative architectures can close the token-output gap, and whether enterprises prioritize upfront savings over operational efficiency.
For now, NVIDIA’s benchmark data gives infrastructure buyers a concrete framework: stop comparing hourly rates and start calculating what each delivered token actually costs.
Image source: Shutterstock
