
Microsoft has unveiled its second-generation AI chip, Maia 200, a significant milestone in the company's end-to-end AI infrastructure strategy. Announced by Scott Guthrie, Executive Vice President of Cloud and AI, the new chip is engineered specifically for AI inference and is, by Microsoft's account, the most efficient inference system the company has ever deployed. Built on TSMC's 3-nanometer process, Maia 200 packs more than 140 billion transistors and delivers 30% better performance per dollar than the existing systems in Microsoft's fleet, positioning it as a cornerstone of the company's heterogeneous AI infrastructure.
From a technical standpoint, Maia 200 is built for large-scale inference workloads. The chip delivers more than 10 petaFLOPS of 4-bit (FP4) and more than 5 petaFLOPS of 8-bit (FP8) compute within a 750W SoC TDP envelope. Microsoft claims three times the FP4 performance of Amazon's third-generation Trainium chip and FP8 performance exceeding Google's seventh-generation TPU. The redesigned memory subsystem pairs 216GB of HBM3e at 7 TB/s of bandwidth with 272MB of on-chip SRAM and specialized data-movement engines, targeting one of the critical bottlenecks in AI inference: keeping massive models fed with data.
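To see why Microsoft emphasizes the memory subsystem, it helps to compare the quoted compute and bandwidth figures directly. The Python sketch below runs a back-of-the-envelope roofline calculation: the peak FP4 rate and HBM bandwidth are the article's numbers, while the decode-phase model and the batch sizes are hypothetical, chosen only to illustrate when inference flips from memory-bound to compute-bound.

```python
# Back-of-the-envelope roofline check from the figures quoted above. The peak
# rate and bandwidth come from the article; the decode-phase model below and
# its batch sizes are hypothetical, chosen only to illustrate the arithmetic.

PEAK_FP4_FLOPS = 10e15   # >10 petaFLOPS at FP4, per the article
HBM_BANDWIDTH = 7e12     # 7 TB/s of HBM3e bandwidth

# Arithmetic intensity (FLOPs per byte moved) needed to saturate compute:
ridge = PEAK_FP4_FLOPS / HBM_BANDWIDTH   # ~1,429 FLOPs/byte

# During LLM decode, every weight is streamed once per step and contributes
# ~2 FLOPs (one multiply-add) per sequence in the batch, so with FP4 weights
# (0.5 byte each) the intensity is roughly 2 * batch / 0.5 = 4 * batch.
def decode_intensity(batch_size: int, bytes_per_weight: float = 0.5) -> float:
    return 2 * batch_size / bytes_per_weight

for batch in (1, 64, 512):
    ai = decode_intensity(batch)
    regime = "compute-bound" if ai >= ridge else "memory-bound"
    print(f"batch {batch:3d}: ~{ai:6.0f} FLOPs/byte -> {regime}")
```

At the quoted ratios, small-batch decoding is overwhelmingly memory-bound, which is why the 7 TB/s of HBM3e bandwidth and the large on-chip SRAM matter as much as the headline petaFLOPS.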
Maia 200 will play a crucial role in powering Microsoft's cloud services and AI offerings. The chip will serve multiple models, including OpenAI's latest GPT-5.2 family, and will bring performance gains to Microsoft Foundry and Microsoft 365 Copilot users. Microsoft's Superintelligence team, led by Mustafa Suleyman, plans to use Maia 200 for synthetic data generation and reinforcement learning to improve next-generation in-house models. The company has also released a Maia SDK preview that gives developers PyTorch integration, a Triton compiler, and access to Maia's low-level programming language, so models can be optimized across heterogeneous hardware accelerators.
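Microsoft has not published the SDK's full API surface, but since the preview pairs a Triton compiler with PyTorch integration, kernels would presumably be authored in standard Triton. The sketch below is an ordinary open-source Triton vector-add kernel; nothing in it is Maia-specific, and the assumption that the Maia toolchain accepts stock Triton unmodified is ours, not Microsoft's.

```python
import torch
import triton
import triton.language as tl

# A standard Triton elementwise-add kernel, written against the stock
# open-source Triton API; whether the Maia SDK compiles it as-is is an
# assumption for illustration.
@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)                       # one program per block
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements                       # guard the ragged tail
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = out.numel()
    grid = (triton.cdiv(n, 1024),)                    # enough programs to cover n
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out
```

On today's GPUs this compiles through Triton's existing backends; on Maia hardware, the SDK's Triton compiler would presumably slot into the same role, which is the point of shipping Triton support in the first place.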
Currently deployed in Microsoft's US Central datacenter region near Des Moines, Iowa, Maia 200 is set to expand to the US West 3 region near Phoenix, Arizona, with additional regions to follow. At the systems level, the chip introduces a two-tier scale-up network built on standard Ethernet rather than proprietary fabrics such as InfiniBand, enabling clusters of up to 6,144 accelerators while reducing power consumption and total cost of ownership. Industry analysts read the move as a strategic shift toward hardware independence, reducing Microsoft's reliance on third-party chip suppliers like Nvidia and AMD. It also reflects a broader trend among hyperscalers to vertically integrate their technology stacks for tighter optimization and cost control in a rapidly evolving AI landscape.
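To make the 6,144-accelerator figure concrete, the short sketch below shows how a two-tier design multiplies out to that scale. The group and tier sizes are invented for illustration and are not Microsoft's disclosed topology.

```python
# Illustrative sizing for a two-tier scale-up fabric. The 6,144-accelerator
# cluster size comes from the article; the per-tier group sizes below are
# hypothetical, chosen only to show how two tiers multiply out to that scale.

ACCELERATORS_PER_GROUP = 64   # assumption: chips linked directly in tier 1
GROUPS_PER_CLUSTER = 96       # assumption: groups joined by the tier-2 layer

cluster = ACCELERATORS_PER_GROUP * GROUPS_PER_CLUSTER
assert cluster == 6144        # matches the cluster size quoted above

# With two switching tiers, any chip reaches any other in a bounded number
# of hops (tier-1 -> tier-2 -> tier-1), so latency stays flat as the cluster
# grows toward its full size.
print(f"{GROUPS_PER_CLUSTER} groups x {ACCELERATORS_PER_GROUP} chips = {cluster} accelerators")
```

The appeal of such a design is that worst-case path length stays constant as the cluster fills out, while commodity Ethernet keeps per-port cost and power below proprietary alternatives.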