In a recent blog post, Nvidia highlights that leading inference service providers, including Baseten, DeepInfra, Fireworks AI and Together AI, can cut the cost per processed token by as much as 10× relative to previous hardware generations such as the Hopper platform by pairing Blackwell with optimized software stacks and open-source models.
The Blackwell platform, built on Nvidia’s new microarchitecture, was designed specifically for AI workloads and raises both throughput and energy efficiency. As a result, more tokens can be processed on the same amount of infrastructure, and it is this throughput gain that directly drives down the operational cost per token.
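The relationship between throughput and cost per token is simple arithmetic, and a small sketch makes it concrete. The numbers below (hourly infrastructure cost, tokens per second) are illustrative assumptions, not Nvidia figures:

```python
# Hypothetical illustration: cost per token = hourly infrastructure cost
# divided by tokens served per hour. All numbers are assumptions.

def cost_per_million_tokens(hourly_cost_usd: float, tokens_per_second: float) -> float:
    """Dollar cost to serve one million tokens at a given sustained throughput."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost_usd / tokens_per_hour * 1_000_000

# Same hourly cost, but one platform sustains 10x the throughput:
baseline = cost_per_million_tokens(hourly_cost_usd=10.0, tokens_per_second=2_000)
faster = cost_per_million_tokens(hourly_cost_usd=10.0, tokens_per_second=20_000)

print(f"baseline:       ${baseline:.3f} per 1M tokens")   # ~$1.389
print(f"10x throughput: ${faster:.3f} per 1M tokens")     # ~$0.139
```

At a fixed infrastructure cost, a 10× throughput gain translates one-for-one into a 10× lower cost per token, which is the mechanism the article describes.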
Deployment examples show the broad economic impact of this approach. In healthcare, Sully.ai — using Blackwell together with open-source models — achieved a 90% reduction in inference costs while also shortening response times, improving the viability of automating tasks such as medical coding and clinical documentation workflows.
Other use cases include gaming platforms and customer-support tools, where companies reported token-cost reductions of between 4× and 10× when running Blackwell with low-precision formats (such as NVFP4) and open-source models instead of relying on expensive proprietary API providers.
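One reason low-precision formats help is that they shrink the memory footprint of model weights, letting the same hardware hold larger models or serve more requests. A back-of-the-envelope sketch, using only the nominal bit widths (real NVFP4 deployments add scaling-factor overhead, and the 70B parameter count is just an example):

```python
# Approximate weight storage at different precisions. Illustrative only:
# nominal bits per weight, ignoring quantization metadata overhead.

BITS_PER_WEIGHT = {"fp16": 16, "fp8": 8, "fp4": 4}

def model_size_gb(num_params: float, fmt: str) -> float:
    """Approximate weight storage in gigabytes for a model with num_params weights."""
    return num_params * BITS_PER_WEIGHT[fmt] / 8 / 1e9

params = 70e9  # e.g. a 70B-parameter open-source model (assumed size)
for fmt in ("fp16", "fp8", "fp4"):
    print(f"{fmt}: {model_size_gb(params, fmt):.0f} GB")  # 140, 70, 35 GB
```

Halving the bits per weight roughly halves memory traffic per token, which is a large part of why 4-bit formats raise inference throughput on the same GPU.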
This shift in the cost model is important not only for cloud providers, but also for enterprises and startups that want to scale AI-based applications without massive financial outlays. A substantial drop in cost per token could make AI less exclusive to the largest players and significantly more accessible to smaller organizations.
Industry analyses indicate that the cost reduction is driven not only by the hardware itself, but by the tight integration of hardware and software — optimized drivers, algorithms and open-source models run more efficiently on the Blackwell platform, maximizing utilization of compute resources.
The new inference cost structure could have a meaningful impact on the pace of commercialization of AI solutions in sectors such as healthcare, services and entertainment — especially in use cases where every processed token translates directly into operating expenses. Lower costs may also reduce barriers to entry for companies building products on top of large language models.

