AI Inference Market Heats Up as New Players Drive Prices Down

The artificial intelligence inference market is experiencing a dramatic transformation as a wave of new startups and established tech companies compete to offer inference-as-a-service, threatening to commoditize what many consider the next major battleground in AI computing. Companies like Foundry, led by Jared Quincy Davis, are pioneering new approaches by becoming cloud providers themselves rather than simply selling technology to existing clouds.

Inference—the process of generating outputs from trained AI models—has become the focus of numerous players across the AI ecosystem. Chip designers like Cerebras, Groq, and SambaNova Systems have all pivoted to selling inference as a service alongside their core hardware businesses. Groq was founded by two former Google engineers who recognized early that inference would capture the larger share of AI computing demand. Meanwhile, AI-focused data center operators including Lambda, CoreWeave, Together AI, and Crusoe—all Nvidia partners—are offering specialized inference services, competing directly with hyperscalers like Amazon Web Services and Microsoft Azure.

The proliferation of inference providers is fueling expectations that prices will soon drop dramatically. Davis compares the inference market to electricity: most customers simply want the service to work seamlessly, but those willing to shop around can choose among numerous specialized providers. The market is complex, with customers prioritizing different factors—some need maximum speed (measured as time-to-first-token or tokens-per-second), while others focus on cost efficiency or total job-completion time.
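To make those trade-offs concrete, here is a minimal Python sketch comparing two providers on the metrics buyers weigh. All provider names and numbers are hypothetical, not drawn from the article: the point is only that "fastest" and "cheapest" can be different vendors depending on the job.

```python
from dataclasses import dataclass

@dataclass
class InferenceQuote:
    """Hypothetical metrics a buyer might compare across providers."""
    name: str
    ttft_s: float            # time-to-first-token, in seconds
    tokens_per_s: float      # decode throughput once streaming starts
    usd_per_m_tokens: float  # price per million output tokens

def job_seconds(q: InferenceQuote, output_tokens: int) -> float:
    """Rough total completion time: startup latency plus decode time."""
    return q.ttft_s + output_tokens / q.tokens_per_s

def job_cost(q: InferenceQuote, output_tokens: int) -> float:
    """Cost of the job under simple per-token billing."""
    return output_tokens / 1_000_000 * q.usd_per_m_tokens

# Illustrative, made-up quotes: one provider optimized for speed,
# one for price. Which one "wins" depends on what the buyer weights.
quotes = [
    InferenceQuote("fast-provider", ttft_s=0.2, tokens_per_s=400, usd_per_m_tokens=1.50),
    InferenceQuote("cheap-provider", ttft_s=0.9, tokens_per_s=120, usd_per_m_tokens=0.40),
]
for q in quotes:
    print(f"{q.name}: {job_seconds(q, 2000):.1f}s, "
          f"${job_cost(q, 2000):.4f} for 2,000 output tokens")
```

A latency-sensitive chatbot would pick the first quote; a nightly batch-summarization job would likely pick the second, which is one reason the market resists a single price point.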

Profit margins in inference vary widely depending on the business model. Mitesh Agrawal, head of cloud for Lambda, explains that fixed-capacity compute offers predictable margins, while usage-based pricing tied to model inputs and outputs creates less predictable returns. The challenge lies in efficiently organizing multiple users across finite server resources. Despite the risks, companies offer inference-as-a-service to acquire customers who may eventually become traditional compute clients.
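The contrast Agrawal draws can be sketched with a break-even calculation. The prices below are entirely hypothetical: fixed-capacity rental costs the same whether or not the hardware is busy, while usage-based billing tracks token volume, so which model is cheaper for the customer (and which leaves the provider with predictable revenue) depends on utilization.

```python
# Minimal sketch (all numbers assumed, not from the article) contrasting
# fixed-capacity rental with usage-based, per-token pricing.

HOURS_PER_MONTH = 730

def fixed_capacity_cost(gpu_hourly_usd: float) -> float:
    """Flat monthly cost: the box is paid for whether or not it's busy."""
    return gpu_hourly_usd * HOURS_PER_MONTH

def usage_based_cost(tokens_per_month: float, usd_per_m_tokens: float) -> float:
    """Billing tracks traffic, so the bill swings with demand."""
    return tokens_per_month / 1e6 * usd_per_m_tokens

# Hypothetical rates: a $2.50/hour GPU vs $0.50 per million tokens.
fixed = fixed_capacity_cost(2.50)
for monthly_tokens in (0.5e9, 2e9, 8e9):
    usage = usage_based_cost(monthly_tokens, 0.50)
    cheaper = "fixed capacity" if fixed < usage else "usage-based"
    print(f"{monthly_tokens / 1e9:.1f}B tokens/mo: "
          f"fixed ${fixed:,.0f} vs usage ${usage:,.0f} -> {cheaper} wins")
```

At low volume the per-token plan is cheaper for the customer; past the break-even point, reserved capacity wins. The same crossover explains Agrawal's margin point: usage-based revenue only covers the provider's fixed hardware cost if enough traffic from enough users can be packed onto each server.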

However, not everyone fears the anticipated price war. Davis invokes the Jevons paradox—an economic principle suggesting that when something becomes cheaper or more efficient, total consumption actually increases. “If I make something 10 times cheaper, people won’t spend 10 times less,” Davis explained. “They’ll spend more.” This theory is supported by Nvidia CEO Jensen Huang’s recent statements that newer models like OpenAI’s o1 require significantly more compute for inference because they run multiple models to verify their work and “reason” through problems.
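For a rough sense of the arithmetic behind Davis's claim, here is a small sketch under an assumed constant-elasticity demand model (the elasticity values are illustrative, not from the article). Total spend rises after a price cut only when elasticity exceeds 1, which is the Jevons-paradox regime Davis is betting on.

```python
# Sketch of the Jevons-paradox arithmetic, under an assumed
# constant-elasticity model: demand scales as price**(-elasticity),
# and total spend = price * demand.

def total_spend(base_spend: float, price_cut: float, elasticity: float) -> float:
    """Spend after prices fall by `price_cut`x, assuming constant elasticity."""
    new_price = 1 / price_cut               # e.g. 10x cheaper -> 0.1x price
    new_demand = price_cut ** elasticity    # demand response to the cut
    return base_spend * new_price * new_demand

for e in (0.5, 1.0, 1.5):
    print(f"elasticity {e}: $100 of monthly spend becomes "
          f"${total_spend(100, 10, e):.0f} after a 10x price cut")
```

With elasticity 0.5 the market shrinks in dollar terms, at 1.0 spend is flat, and at 1.5 the 10x cut grows total spend roughly threefold—the "they'll spend more" outcome Davis describes.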

Investor Sriram Viswanathan of Celesta Capital, who backed SambaNova Systems, predicts a “wildly competitive” few years ahead but believes winners will emerge based on merit—specifically the performance and power efficiency of their underlying architectures rather than go-to-market strategies alone.

Key Quotes

Part of the reason inference is a little commoditizable is customers are kind of paying for tokens at the end of the day.

Jared Quincy Davis, founder of AI-computing startup Foundry, explains why the inference market is becoming commoditized, highlighting how the standardization around token-based pricing is making services increasingly interchangeable.

Some companies just want output and they don’t care about infrastructure.

Mitesh Agrawal, head of cloud for Lambda, describes the customer mindset driving inference-as-a-service adoption, emphasizing that many businesses prioritize simplicity over technical control when deploying AI applications.

If I make something 10 times cheaper, people won’t spend 10 times less, nor will they even hold their budgets the same. They’ll spend more.

Davis invokes the Jevons paradox to argue that falling inference prices won’t hurt the market but will instead drive increased consumption, as better ROI encourages companies to deploy AI more extensively.

The core innovation can’t be in the go-to-market but in the performance and power of the underlying architecture.

Sriram Viswanathan, founding managing partner at Celesta Capital and SambaNova investor, predicts that technical superiority rather than sales strategy will determine which inference providers survive the coming competitive shakeout.

Our Take

The inference wars signal AI’s transition from research novelty to industrial utility. What’s particularly striking is how companies across the entire stack—from chip designers to cloud providers—are converging on the same business model, suggesting inference represents the “last mile” of AI monetization. The Jevons paradox argument is compelling: historically, technologies that become cheaper and more accessible see explosive growth in usage, from computing power to data storage. If this pattern holds, the inference market could expand far faster than current projections suggest. However, the sustainability question looms large. Many current players are likely using inference as a loss leader or customer acquisition strategy rather than a profitable standalone business. The eventual consolidation could leave the market dominated by hyperscalers with economies of scale, or create opportunities for specialized providers with genuine technical differentiation. The real test will come when demand fluctuates: who can maintain profitability during the inevitable supply-demand mismatches?

Why This Matters

This development represents a critical inflection point in the AI industry’s maturation. As AI moves from experimental technology to production deployment at scale, inference—not training—is becoming the dominant cost for companies running AI applications. The commoditization of inference could dramatically lower barriers to entry for AI-powered products and services, potentially accelerating AI adoption across industries.

The competitive dynamics reveal how the AI value chain is evolving. Chip designers, cloud providers, and specialized startups are all converging on inference as a key revenue stream, suggesting this market could be worth hundreds of billions of dollars. For businesses deploying AI, falling inference costs mean improved ROI on AI investments and the ability to run more sophisticated models more frequently.

However, the “race to the bottom” on pricing raises questions about long-term sustainability for smaller players. As Viswanathan noted, the coming years will likely see consolidation, with winners determined by fundamental technical advantages rather than pricing alone. This shakeout will shape which companies control the infrastructure powering the AI economy for the next decade.


Source: https://www.businessinsider.com/new-players-startups-ai-inference-driving-prices-down-cheap-workload-2024-10