    2digital.news

    Nvidia cuts AI inference costs by up to tenfold with Blackwell architecture and open-source models

    By Mikolaj Laszkiewicz, February 13, 2026 (2 Mins Read)

    In a blog post, Nvidia highlights that leading inference service providers, including Baseten, DeepInfra, Fireworks AI and Together AI, can cut the cost of processing a single token by as much as 10× compared with previous hardware generations such as the Hopper platform. They do so by combining Blackwell with optimized software stacks and open-source models.

    The Blackwell platform, based on Nvidia’s newly designed microarchitecture, was built specifically to handle AI workloads while increasing both throughput and energy efficiency. As a result, the same amount of infrastructure can process more tokens. Because the infrastructure cost is then spread over more tokens, the operational cost per token falls.
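
    The link between throughput and cost per token can be sketched with simple arithmetic. The snippet below is a minimal illustration, not Nvidia's or any provider's pricing model: the function name, the hourly cost and the throughput figures are all hypothetical and chosen only to show the relationship.

```python
def cost_per_million_tokens(hourly_cost_usd: float, tokens_per_second: float) -> float:
    """Cost in USD to process one million tokens at a steady throughput."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost_usd / tokens_per_hour * 1_000_000


# Hypothetical figures: same hourly infrastructure cost, 10x the throughput.
HOURLY_COST_USD = 36.0
baseline = cost_per_million_tokens(HOURLY_COST_USD, tokens_per_second=10_000)
faster = cost_per_million_tokens(HOURLY_COST_USD, tokens_per_second=100_000)

print(f"baseline: ${baseline:.2f} per million tokens")
print(f"faster:   ${faster:.2f} per million tokens")
print(f"cost ratio: {baseline / faster:.1f}x")
```

    With the hourly cost held constant, cost per token scales inversely with throughput, so a tenfold throughput gain yields a tenfold lower cost per token in this simplified model. Real deployments also differ in hardware price, utilization and energy use, which this sketch ignores.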

    Deployment examples show the broad economic impact of this approach. In healthcare, Sully.ai, which uses Blackwell together with open-source models, achieved a 90% reduction in inference costs (equivalent to a 10× lower cost) while also shortening response times. This improves the viability of automating tasks such as medical coding and clinical documentation.

    Other use cases include gaming platforms and customer-support tools, where companies reported token-cost reductions of between 4× and 10× when running Blackwell with low-precision formats (such as NVFP4) and open-source models instead of relying on expensive proprietary API providers.
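
    The article quotes savings both as percentages (a 90% reduction) and as multiples (4× to 10×). The two are the same quantity in different units, and the helper functions below, whose names are hypothetical, show a minimal way to convert between them.

```python
def factor_to_percent_reduction(factor: float) -> float:
    """Convert an 'N times cheaper' factor into a percentage cost reduction."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return (1 - 1 / factor) * 100


def percent_reduction_to_factor(percent: float) -> float:
    """Convert a percentage cost reduction into an 'N times cheaper' factor."""
    if not 0 <= percent < 100:
        raise ValueError("percent must be in the range [0, 100)")
    return 1 / (1 - percent / 100)


for factor in (4, 10):
    print(f"{factor}x cheaper = {factor_to_percent_reduction(factor):.0f}% reduction")

print(f"90% reduction = {percent_reduction_to_factor(90):.0f}x cheaper")
```

    Under this conversion, the 4× to 10× range corresponds to cost reductions of 75% to 90%.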

    This shift in the cost model is important not only for cloud providers, but also for enterprises and startups that want to scale AI-based applications without massive financial outlays. A substantial drop in cost per token could make AI less exclusive to the largest players and significantly more accessible to smaller organizations.

    Industry analyses indicate that the cost reduction is driven not only by the hardware itself but also by the tight integration of hardware and software. Optimized drivers, algorithms and open-source models run more efficiently on the Blackwell platform, which maximizes utilization of compute resources.

    The new inference cost structure could have a meaningful impact on the pace of commercialization of AI solutions in sectors such as healthcare, services and entertainment — especially in use cases where every processed token translates directly into operating expenses. Lower costs may also reduce barriers to entry for companies building products on top of large language models.
