Nvidia Faces Fierce Competition as Startups Target AI Inference Market

Nvidia’s dominance in AI computing is facing unprecedented challenges as startups like SambaNova Systems, Cerebras, and Groq aggressively target the rapidly growing AI inference market. While Nvidia holds a trillion-dollar head start in AI chip technology, these emerging competitors are betting that inference computing, the production stage of AI in which trained models generate outputs, represents their best opportunity to capture market share.

Rodrigo Liang, who cofounded SambaNova Systems in 2017, predicts that 90% of AI computing workloads will shift to inference in the near future. The projection aligns with recent figures from Nvidia CFO Colette Kress, who revealed that inference now accounts for 40% of the company’s data center revenue, up from a much smaller share only a few years ago. Liang expects the inference market to mature within roughly six months.

The competitive strategy centers on speed and specialized architecture. Unlike Nvidia and AMD, which rely on graphics processing units (GPUs) originally designed for rendering graphics, SambaNova uses a reconfigurable dataflow unit (RDU) engineered specifically for machine learning models. Cerebras takes a different approach with its dinner-plate-sized, wafer-scale AI chip, while Groq builds its own proprietary architecture. All three companies claim to deliver the fastest inference computing in the world.

According to artificialanalysis.ai, Cerebras, SambaNova, and Groq currently rank as the three fastest APIs for Meta’s Llama 3.1 70B and 8B models. These startups measure performance in tokens per second—the rate at which AI systems can consume prompts and generate responses. This speed advantage is particularly crucial for agentic AI applications, where multiple AI models communicate with each other and latency can significantly impact user experience.
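
To make the tokens-per-second metric concrete, consider how throughput compounds across an agentic pipeline in which one model’s output feeds the next. The back-of-envelope sketch below is purely illustrative; the call count, token counts, time-to-first-token, and throughput tiers are invented assumptions, not measured vendor figures.

```python
# Back-of-envelope: how tokens-per-second throughput compounds into
# end-to-end latency for an agentic pipeline. All numbers below are
# illustrative assumptions, not measured vendor figures.

def call_latency_s(output_tokens: int, tokens_per_s: float, ttft_s: float) -> float:
    """Latency of one model call: time-to-first-token plus generation time."""
    return ttft_s + output_tokens / tokens_per_s

# Hypothetical agentic chain: 5 sequential model calls, 300 output tokens each.
calls, tokens_per_call, ttft = 5, 300, 0.2

for throughput in (50.0, 500.0, 2000.0):  # tokens/sec, illustrative tiers
    total = calls * call_latency_s(tokens_per_call, throughput, ttft)
    print(f"{throughput:>6.0f} tok/s -> {total:5.2f} s end-to-end")
```

Under these assumed numbers, the same five-call chain drops from about 31 seconds at 50 tokens per second to under 2 seconds at 2,000, which is why throughput claims loom so large in the agentic-AI pitch.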

To bypass direct competition with Nvidia, these startups are adopting innovative business models. They’re offering inference-as-a-service through cloud platforms, providing direct access to foundation models like Meta’s open-source Llama. This approach positions them as competitors to both chip manufacturers like Nvidia and AI model companies like OpenAI.
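
In practice, trying one of these services is an API call rather than a hardware purchase. A minimal sketch follows, assuming the provider exposes an OpenAI-compatible chat-completions endpoint (a common convention among hosted inference services); the base URL, model identifier, and environment variable names are hypothetical placeholders, not details from the article.

```python
# Minimal sketch of consuming inference-as-a-service. Assumes an
# OpenAI-compatible chat-completions API (a common convention among
# hosted inference providers); the base URL, model name, and env vars
# below are hypothetical placeholders.
import os

from openai import OpenAI  # pip install openai

client = OpenAI(
    base_url=os.environ["PROVIDER_BASE_URL"],  # e.g. a startup's hosted endpoint
    api_key=os.environ["PROVIDER_API_KEY"],
)

response = client.chat.completions.create(
    model="llama-3.1-70b",  # hypothetical model identifier
    messages=[{"role": "user", "content": "Summarize the AI inference market."}],
)
print(response.choices[0].message.content)
```

The low switching cost is the point: a customer can benchmark a new provider against an incumbent without buying hardware or retooling its stack.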

However, skeptics like SemiAnalysis chief analyst Dylan Patel argue that GPUs still offer superior total cost of ownership (TCO) per token once all advantages and expenses over a chip’s lifetime are taken into account. Both SambaNova and Cerebras dispute this claim, with Cerebras CEO Andrew Feldman asserting that GPU manufacturers’ TCO leadership reflects marketing power rather than technological superiority. Nvidia declined to comment but reportedly believes its networking strength, liquid cooling capabilities, and Arm-based CPU make it optimal for inference workloads.
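
Patel’s framing is straightforward to sketch: amortize the hardware over its useful life, add operating costs, and divide by the tokens served in that time. Every number in the illustration below is an invented placeholder meant to show the mechanics of the comparison, not to settle it.

```python
# Back-of-envelope TCO per token: amortized hardware cost plus operating
# cost, divided by lifetime tokens served. Every figure here is an
# invented placeholder illustrating the calculation, not vendor data.

def tco_per_million_tokens(hw_cost_usd: float, lifetime_years: float,
                           opex_usd_per_year: float, tokens_per_s: float,
                           utilization: float) -> float:
    """USD per million tokens over the hardware's useful life."""
    seconds = lifetime_years * 365 * 24 * 3600
    lifetime_tokens = tokens_per_s * utilization * seconds
    total_cost = hw_cost_usd + opex_usd_per_year * lifetime_years
    return total_cost / lifetime_tokens * 1e6

# Two hypothetical systems: cheaper-but-slower vs. pricier-but-faster.
slow = tco_per_million_tokens(250_000, 4, 40_000, 1_000, 0.6)
fast = tco_per_million_tokens(900_000, 4, 90_000, 5_000, 0.6)
print(f"slow: ${slow:.2f}/M tokens, fast: ${fast:.2f}/M tokens")
```

The sketch also shows why the dispute resists resolution: the answer flips with assumed utilization, useful life, and sustained throughput, none of which are reported on a standardized basis.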

Key Quotes

“I think you’re gonna see this inference game open up for all these other alternatives in a much, much broader way than the pre-training market opened up. Because it was so concentrated with very few players, Jensen could personally negotiate those deals in a way that, for startups, it’s hard to break up.”

SambaNova cofounder Rodrigo Liang explains why the inference market presents a better opportunity for startups to compete against Nvidia than the training market did. This matters because it suggests the AI chip landscape may become more diverse and competitive as the industry shifts toward deployment and production workloads.

“GPUs offer superior total cost of ownership per token.”

SemiAnalysis chief analyst Dylan Patel challenges the startup narrative, arguing that Nvidia’s GPUs remain more economical when lifetime costs are considered. This perspective is significant because it highlights that raw speed metrics don’t tell the complete story about chip competitiveness in real-world deployments.

“While GPU manufacturers may claim leadership in TCO, this is not a function of technology but rather the big bull horn they have.”

Cerebras CEO Andrew Feldman disputes the total cost of ownership advantage claimed by GPU manufacturers, suggesting that Nvidia’s market position stems from marketing power rather than technological superiority. This quote underscores the heated debate about whether established players or innovative startups offer better value propositions.

“There is typically a tradeoff when it comes to speed and cost. Higher inference speed can mean a larger hardware footprint, which in turn demands higher costs.”

Rodrigo Liang acknowledges the inherent tension between performance and economics in chip design, while claiming SambaNova overcomes this through architectural efficiency. This matters because it reveals the engineering challenges these startups must solve to truly compete with Nvidia’s established ecosystem.

Our Take

The battle for AI inference supremacy represents more than just chip competition—it’s a referendum on whether specialized, purpose-built architectures can overcome the entrenched advantages of general-purpose GPUs. Nvidia’s dominance stems not just from chip performance but from its comprehensive CUDA software ecosystem, established customer relationships, and manufacturing scale. The startups’ inference-as-a-service strategy is clever, allowing them to prove their technology’s value without requiring customers to make large capital commitments or retrain engineering teams.

However, the lack of standardized benchmarking makes it difficult to verify performance claims objectively. The fact that Nvidia doesn’t appear in artificialanalysis.ai comparisons while dominating MLPerf benchmarks illustrates how fragmented performance evaluation remains. If the inference market matures on the roughly six-month timeline Liang predicts, real-world deployments will provide the ultimate test of whether speed advantages translate into sustainable competitive positions against Nvidia’s trillion-dollar ecosystem.

Why This Matters

This competitive shift in the AI chip market signals a critical inflection point for the entire artificial intelligence industry. As AI transitions from the training phase to widespread deployment, inference computing is becoming the dominant workload, potentially representing 90% of all AI computing by Liang’s estimate. This transformation creates a rare opening for challengers to compete against Nvidia’s near-monopoly position.

The implications extend far beyond chip manufacturers. Faster, more cost-effective inference computing directly impacts the viability of AI applications across industries, from customer service chatbots to autonomous systems. If startups can deliver on their speed and cost promises, it could accelerate AI adoption by making real-time, interactive AI applications more practical and affordable.

For businesses investing in AI infrastructure, this competition introduces both opportunity and complexity. Organizations must now evaluate not just raw performance metrics but total cost of ownership, compatibility with existing systems, and long-term vendor viability. The emergence of inference-as-a-service models also provides alternatives to capital-intensive chip purchases, potentially democratizing access to high-performance AI computing for smaller companies and startups.

Source: https://www.businessinsider.com/nvidia-competition-sambanova-groq-cerebras-show-faster-inference-speed-2024-9