    Nvidia cuts AI inference costs by up to tenfold with Blackwell architecture and open-source models

By Mikolaj Laszkiewicz · February 13, 2026 · 2 min read
In a recent blog post, Nvidia highlights that leading inference service providers, including Baseten, DeepInfra, Fireworks AI and Together AI, have cut the unit cost of processing a single token by as much as 10× relative to previous hardware generations such as the Hopper platform, by pairing Blackwell with optimized software stacks and open-source models.

    The Blackwell platform, based on Nvidia’s newly designed microarchitecture, was built specifically to handle AI workloads while increasing both throughput and energy efficiency. As a result, a higher number of tokens can be processed using the same amount of infrastructure. It is this increase in throughput that directly drives down the operational cost per token.
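To make that relationship concrete, here is a minimal sketch of the arithmetic, using hypothetical figures rather than any published Nvidia benchmark: with infrastructure cost held fixed, cost per token is simply cost divided by throughput, so a 10× throughput gain yields a 10× drop in unit cost.

```python
# Cost per token = infrastructure cost / tokens processed in the same period.
# All numbers below are illustrative, not measured Blackwell or Hopper figures.

def cost_per_million_tokens(hourly_cost_usd: float, tokens_per_second: float) -> float:
    """Unit cost in USD per one million tokens for a fixed-cost deployment."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost_usd / tokens_per_hour * 1_000_000

# Hypothetical: same hourly infrastructure cost, 10x the sustained throughput.
old_gen = cost_per_million_tokens(hourly_cost_usd=10.0, tokens_per_second=5_000)
new_gen = cost_per_million_tokens(hourly_cost_usd=10.0, tokens_per_second=50_000)

print(f"old: ${old_gen:.3f} per 1M tokens")   # ~ $0.556
print(f"new: ${new_gen:.3f} per 1M tokens")   # ~ $0.056
print(f"reduction: {old_gen / new_gen:.0f}x") # 10x
```

The point of the sketch is only that the cost reduction falls directly out of the throughput ratio; real deployments would also fold in energy, utilization and batch-size effects.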

    Deployment examples show the broad economic impact of this approach. In healthcare, Sully.ai — using Blackwell together with open-source models — achieved a 90% reduction in inference costs while also shortening response times, improving the viability of automating tasks such as medical coding and clinical documentation workflows.

    Other use cases include gaming platforms and customer-support tools, where companies reported token-cost reductions of between 4× and 10× when running Blackwell with low-precision formats (such as NVFP4) and open-source models instead of relying on expensive proprietary API providers.
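The economics behind the switch away from per-token API pricing can be sketched as a simple break-even calculation. The prices below are hypothetical placeholders, not any vendor's actual rates: a proprietary API bills per token, while a self-hosted Blackwell deployment is roughly a flat cost up to its capacity.

```python
# Illustrative break-even: at what monthly token volume does a flat-cost
# self-hosted deployment undercut a per-token proprietary API?
# All prices are hypothetical, chosen only to show the shape of the trade-off.

def api_monthly_cost(tokens: float, usd_per_million: float) -> float:
    """Usage-based billing: cost scales linearly with token volume."""
    return tokens / 1_000_000 * usd_per_million

def break_even_tokens(fixed_monthly_usd: float, api_usd_per_million: float) -> float:
    """Token volume at which flat self-hosting cost equals the API bill."""
    return fixed_monthly_usd / api_usd_per_million * 1_000_000

api_rate = 2.00       # hypothetical $ per 1M tokens from a proprietary API
gpu_fixed = 5_000.0   # hypothetical flat monthly cost of a self-hosted setup

volume = break_even_tokens(gpu_fixed, api_rate)
print(f"break-even at {volume / 1e9:.1f}B tokens/month")  # 2.5B
```

Above the break-even volume, every additional token is effectively free on the self-hosted side, which is where the reported 4× to 10× cost advantages accumulate.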

    This shift in the cost model is important not only for cloud providers, but also for enterprises and startups that want to scale AI-based applications without massive financial outlays. A substantial drop in cost per token could make AI less exclusive to the largest players and significantly more accessible to smaller organizations.

    Industry analyses indicate that the cost reduction is driven not only by the hardware itself, but by the tight integration of hardware and software — optimized drivers, algorithms and open-source models run more efficiently on the Blackwell platform, maximizing utilization of compute resources.

    The new inference cost structure could have a meaningful impact on the pace of commercialization of AI solutions in sectors such as healthcare, services and entertainment — especially in use cases where every processed token translates directly into operating expenses. Lower costs may also reduce barriers to entry for companies building products on top of large language models.
