AI Lost a Lot of Money as a Vending Machine Manager. An Experiment Exposes Gaps in Models’ Practical Skills

Project Vend involved handing control of a snack vending machine in the WSJ newsroom to an AI agent called “Claudius.” The system, based on Claude AI running on the Sonnet 3.7 large language model, was given tools to communicate via Slack, manage inventory, set prices, and order products online — all with the objective of maximizing profit from an initial virtual budget.

In practice, the AI agent was quickly manipulated by newsroom staff, who persuaded it to make drastic changes to its operating strategy. As a result of interactions with users, prices were dropped to zero, products were given away for free, and the system approved purchases of unusual items such as a PlayStation 5, a live fish, and alcohol, which were then distributed to employees. Instead of generating profit, the machine ultimately produced losses exceeding $1,000.

Anthropic (the creator of Claude AI) and the project’s partner, Andon Labs, described the outcome not as a failure but as a stress test — one designed to see how an agent copes with chaos, conflicting signals, and social manipulation. According to the researchers, the experiment revealed concrete weaknesses in AI’s ability to maintain coherent goals, manage context, and resist user influence — all critical requirements for deploying autonomous systems outside controlled laboratory settings.

Experts note that such tests do not demonstrate AI’s readiness to independently run real-world businesses. While simulations often show promising results, exposure to unpredictable human behavior and complex real environments clearly highlights the immaturity of current agents — particularly in long-term financial management and business strategy.

During certain phases of the experiment, Claude performed competently, identifying suppliers and negotiating prices, but it was easily provoked into decisions that ran counter to its core objective of profit maximization. This points to a fundamental limitation of today’s AI agents: they struggle to consistently uphold task priorities when confronted with misinformation, social pressure, or misleading inputs.

The WSJ newsroom trial was only one instance of Project Vend. Earlier tests conducted at Anthropic’s offices using similar vending machines also showed the AI mispricing products and approving unprofitable transactions. These experiences underscore that while autonomous AI agents are improving, their ability to make independent, sound business decisions remains limited — at least for now.

YouTube blocks background playback in mobile browsers — users must switch to the app or Premium

Texas begins releasing “glowing” flies to stop the dangerous screwworm from entering the United States

AI in mammography reduces detection of advanced breast cancer – results from the first randomized controlled trial

“One can use AI to predict hurricanes, cyberattacks, and disease, but not financial panics.” We spoke with an economist about where AI can actually help them

The Health Data Gold Rush. How OpenAI and Anthropic Are Competing for Medical Records

GPUs, Budgets, and API Grey Zones: The Hidden Cost of External Models in Pharma