Nvidia Employee Calls Microsoft's AI Data Center Cooling 'Wasteful'

An internal controversy has emerged over Microsoft’s cooling approach for Nvidia’s latest Blackwell AI chips, with an Nvidia employee describing the setup as “wasteful” in an internal memo from early fall 2024. The incident occurred during the deployment of GB200 NVL72 server racks at a Microsoft facility supporting OpenAI’s infrastructure, where each rack houses 72 Nvidia GPUs designed to train and run advanced AI models.

The Nvidia Infrastructure Specialists (NVIS) team member noted that while Microsoft uses liquid cooling for the servers themselves, necessary given the intense heat generated by dozens of GPUs operating in tandem, the facility’s building-level cooling system appeared inefficient. According to Shaolei Ren, an associate professor at UC Riverside who studies data center resource usage, Microsoft likely employs a two-tier approach: liquid cooling at the server level and air cooling at the building level. Air cooling avoids water consumption, but it requires significantly more energy.
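The energy-versus-water trade-off Ren describes can be made concrete with back-of-envelope arithmetic using two standard data center metrics: PUE (power usage effectiveness, total facility energy divided by IT energy) and WUE (water usage effectiveness, liters of water per kWh of IT energy). The sketch below compares two hypothetical cooling profiles for an assumed 10 MW IT load; all figures are illustrative assumptions, not measurements from Microsoft’s facility.

```python
# Back-of-envelope comparison of two data center cooling strategies.
# All numbers are illustrative assumptions, not measured values.

IT_LOAD_MW = 10.0        # assumed IT (server) power draw
HOURS_PER_YEAR = 8760

def annual_footprint(pue: float, wue: float) -> tuple[float, float]:
    """Return (total energy in MWh/yr, water in megaliters/yr).

    PUE = total facility energy / IT energy.
    WUE = liters of water per kWh of IT energy.
    """
    it_energy_mwh = IT_LOAD_MW * HOURS_PER_YEAR
    total_energy_mwh = it_energy_mwh * pue
    water_megaliters = it_energy_mwh * 1000 * wue / 1e6  # kWh * L/kWh -> ML
    return total_energy_mwh, water_megaliters

# Hypothetical profiles:
# dry (air-cooled) building loop: higher energy overhead, near-zero water
air_energy, air_water = annual_footprint(pue=1.5, wue=0.0)
# evaporative cooling: lower energy overhead, significant water draw
evap_energy, evap_water = annual_footprint(pue=1.2, wue=1.8)

print(f"Air-cooled:  {air_energy:,.0f} MWh/yr, {air_water:,.1f} ML water/yr")
print(f"Evaporative: {evap_energy:,.0f} MWh/yr, {evap_water:,.1f} ML water/yr")
```

Under these assumed figures the air-cooled design draws roughly 25% more energy per year, while the evaporative design consumes on the order of 150 megaliters of water, which is the trade-off operators weigh when they describe a dry design as flexible but energy-hungry.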

Nvidia’s Blackwell architecture, announced in March 2024, represents a major leap forward, delivering roughly twice the performance of its Hopper predecessor, according to CEO Jensen Huang. The GB200 is part of the earlier Blackwell deployment wave, with the newer GB300 generation now available. Nvidia has been installing hundreds of thousands of these systems across major tech companies to meet surging demand for AI compute power.

Microsoft defended its approach, explaining that its “liquid cooling heat exchanger unit is a closed-loop system” deployed in existing air-cooled data centers to enhance cooling capacity. The company said this design makes the most of its existing global data center footprint while ensuring efficient heat dissipation for AI and hyperscale systems. Microsoft has committed to being “carbon negative, water positive, and zero waste” by 2030 and announced plans for zero-water cooling designs in next-generation facilities.

The internal memo also revealed typical deployment challenges, noting that onsite support was essential and “many hours were spent creating validation process documentation.” However, the production hardware quality showed improvement over early samples, with both racks achieving a 100% pass rate on compute performance tests. As AI infrastructure expands globally, the tension between energy consumption and water usage in data center cooling has become a critical issue, prompting public pushback in some regions where new facilities are planned.

Key Quotes

“Microsoft’s cooling system and data center cooling approach for their GB200 deployment seems wasteful due to the size and lack of facility water use, but does provide a lot of flexibility and fault tolerance.”

This quote from an Nvidia Infrastructure Specialists team member’s internal email reveals concerns about Microsoft’s cooling efficiency during Blackwell chip deployment, highlighting the trade-offs between different cooling approaches in AI data centers.

“This type of cooling system tends to be using more energy, but it doesn’t use water.”

Shaolei Ren, UC Riverside associate professor studying data center resources, explained the fundamental trade-off in cooling systems—air cooling avoids water consumption but requires significantly more energy, a critical consideration as AI infrastructure scales.

“These companies are profit-driven, they weigh in the water cost, the energy cost, and also the publicity cost.”

Professor Ren’s observation captures the complex calculus tech companies face when choosing cooling systems, balancing financial costs with public perception as communities increasingly scrutinize data center environmental impacts.

“Our customers, including Microsoft, have successfully deployed hundreds of thousands of Blackwell GB200 and GB300 NVL72 systems to meet the world’s growing need for artificial intelligence.”

An Nvidia spokesperson’s statement emphasizes the massive scale of Blackwell deployment despite cooling concerns, underscoring the relentless demand for AI compute power driving data center expansion worldwide.

Our Take

This incident reveals a fascinating tension within the AI infrastructure ecosystem. While Nvidia and Microsoft are partners in the AI boom, their priorities don’t always align perfectly: Nvidia focuses on chip performance and deployment efficiency, while Microsoft must balance multiple stakeholders, including local communities concerned about resource consumption. The “wasteful” characterization, though diplomatically walked back, points to genuine technical disagreements about optimal cooling strategies.

What’s particularly notable is how quickly these infrastructure decisions are being made under competitive pressure. The fact that production Blackwell systems achieved 100% pass rates despite deployment challenges suggests both companies are learning rapidly. However, the broader question remains: can the AI industry scale sustainably, or will environmental constraints eventually limit growth? Microsoft’s 2030 commitments are ambitious, but the current reality shows difficult trade-offs between water, energy, and operational flexibility that won’t be easily resolved.

Why This Matters

This story highlights the mounting environmental challenges facing the AI industry as it scales rapidly. The tension between water and energy consumption in data center cooling represents a fundamental infrastructure dilemma that will shape AI development for years to come. With Microsoft, Google, Amazon, and other tech giants racing to deploy more powerful AI chips, their cooling strategies have massive implications for both carbon emissions and water resources.

That an Nvidia employee criticized a major customer’s approach, even in an internal memo, underscores how urgent these efficiency concerns have become. As communities worldwide push back against data center water usage, particularly in drought-prone regions, companies face difficult trade-offs between environmental resources and public perception. Microsoft’s commitment to being water-positive by 2030 reflects growing pressure on tech companies to address their environmental footprint. The successful deployment of Blackwell chips, despite cooling concerns, demonstrates that AI infrastructure expansion continues at breakneck pace, making sustainable cooling solutions increasingly critical for the industry’s long-term viability and social license to operate.

Source: https://www.businessinsider.com/nvidia-microsoft-ai-gpu-blackwell-cooling-wasteful-2025-12