Microsoft’s Maia Chip Targets A.I. Inference as Big Tech Rethinks Training
Microsoft CEO Satya Nadella speaking onstage. As training costs soar, Microsoft is betting its latest chip on running models efficiently, not teaching them. JASON REDMOND/AFP via Getty Images
Microsoft this week (Jan. 26) unveiled its latest in-house A.I. chip, Maia 200, calling it “the most efficient inference system” the company has ever built. Microsoft claims the chip outperforms rival Big Tech processors such as Amazon’s Trainium 3 and Google’s TPU v7 on key benchmarks, while delivering 30 percent better performance per dollar than its existing Azure hardware fleet.
Maia 200 is a custom application-specific integrated circuit (ASIC) designed primarily for A.I. inference rather than training. The distinction is critical. Training is the process by which an A.I. model learns: engineers feed it vast amounts of data and iteratively adjust its parameters until it becomes better at recognizing patterns and making predictions. Training is computationally intensive, costly and typically done only periodically.
Inference, by contrast, is what happens once the model is deployed. Every time a user asks Microsoft Copilot a question, receives a response from a chatbot built on GPT, or generates an image, the model is performing inference—using what it has already learned to produce an output. This happens on an enormous scale, often millions or billions of times per day. In simple terms, training builds the brain; inference is the brain at work.
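To make the distinction concrete, here is a minimal sketch in PyTorch using a deliberately tiny stand-in model (the layer sizes, data and learning rate are illustrative, not anything Microsoft runs): training loops over data and repeatedly adjusts the weights, while inference simply applies the finished weights to a new input.

```python
import torch
import torch.nn as nn

# Toy stand-in for a model; real LLMs have billions of parameters.
model = nn.Linear(16, 4)

# --- Training: feed data and iteratively adjust parameters (costly, periodic) ---
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()
x, y = torch.randn(32, 16), torch.randn(32, 4)   # illustrative training data
for _ in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)   # how wrong is the model right now?
    loss.backward()               # compute gradients
    optimizer.step()              # nudge the weights

# --- Inference: apply the learned weights to a new request (runs constantly at scale) ---
model.eval()
with torch.no_grad():             # no gradients, no weight updates
    answer = model(torch.randn(1, 16))
```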
The Maia chip is fabricated on TSMC’s 3-nanometer process and contains more than 140 billion transistors. Its architecture is optimized for low-precision compute formats such as FP4 and FP8, which modern A.I. models increasingly favor for inference workloads. It is designed to keep large language models constantly supplied with data and to generate tokens efficiently, without expending energy on features that are unnecessary for inference.
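The payoff of low-precision formats is smaller weights and less memory traffic for every generated token. Native FP4 and FP8 execution depends on the specific accelerator and its software stack, so the sketch below (again PyTorch, with a hypothetical toy network) illustrates the same principle with a 16-bit format: casting weights down halves the bytes that have to move through memory on each forward pass; FP8 and FP4 shrink them further still.

```python
import torch
import torch.nn as nn

# Hypothetical toy network standing in for one layer of a large model.
model = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU(), nn.Linear(4096, 4096))

fp32_bytes = sum(p.numel() * p.element_size() for p in model.parameters())

# Cast weights to a 16-bit format; FP8/FP4 on dedicated hardware go lower still.
model = model.to(torch.bfloat16)
bf16_bytes = sum(p.numel() * p.element_size() for p in model.parameters())

print(f"FP32 weights: {fp32_bytes / 1e6:.1f} MB")
print(f"BF16 weights: {bf16_bytes / 1e6:.1f} MB")  # half the memory traffic per token

with torch.no_grad():
    out = model(torch.randn(1, 4096, dtype=torch.bfloat16))  # inference in reduced precision
```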
Alongside Maia 200, Microsoft also previewed a full software development kit (SDK). Together, the chip and SDK form a vertically integrated system aimed at reducing Microsoft’s dependence on Nvidia’s CUDA ecosystem, long considered the dominant platform in A.I. infrastructure.
Maia 200 will power inference workloads for OpenAI’s GPT-5.2, Microsoft 365 Copilot and synthetic data generation pipelines used by Microsoft’s Superintelligence team. The chip is already running in Azure data centers, beginning with the U.S. Central region near Des Moines, Iowa, with additional regions planned.
Microsoft’s move reflects a broader shift across the industry. Major model builders, including Google, Amazon, Meta and OpenAI, are increasingly designing their own chips to reduce reliance on Nvidia, whose top-end GPUs—such as the H100 and B200—reportedly cost between $30,000 and $70,000 each and consume enormous amounts of power.
OpenAI, for example, is developing a custom A.I. chip in partnership with Broadcom, with OpenAI leading design and Broadcom handling manufacturing. Google relies heavily on its in-house TPUs within Google Cloud, tightly integrated with its TensorFlow framework and optimized for running trained models at scale. Anthropic uses Google’s TPUs to train and operate its Claude models, while Meta is reportedly in advanced talks with Google over a multibillion-dollar deal to use TPUs for its own workloads.
Amazon, meanwhile, offers Trainium chips for training and Inferentia chips for inference, targeting AWS customers with large-scale but cost-sensitive A.I. workloads. Together, these efforts signal a growing push by tech giants to control more of the A.I. stack—from software to silicon—as compute becomes one of the industry’s most strategic bottlenecks.
