AMD will challenge Nvidia’s dominance with its Instinct MI300 AI accelerators.
The market for artificial-intelligence (AI) accelerators is dominated by Nvidia (NVDA 1.95%). The company reportedly sold half a million AI GPUs in the third quarter, with more than half of that total going to Microsoft and Meta Platforms. Nvidia’s H100 GPUs are a hot commodity, with AI companies scrambling to secure enough accelerators to tap into soaring demand for AI services.
Advanced Micro Devices (AMD 0.43%) is looking to end Nvidia’s dominance with powerful AI accelerators of its own. The company officially launched its Instinct MI300 Series accelerators on Wednesday, kicking off a battle with Nvidia for a market that AMD expects to be worth $45 billion this year and $400 billion by 2027. The company plans on selling more than $2 billion worth of AI chips in 2024, up from essentially nothing today.
A massive leap in performance
AMD is launching two AI accelerator products. The MI300X is the product that will go toe-to-toe with Nvidia’s H100, featuring a whopping 192GB of high-bandwidth memory. That’s more than twice as much memory as the H100, which could provide AMD with a key advantage given that large language models, or LLMs, require massive amounts of memory.
AMD is making some big claims for the MI300X relative to the H100. The chip is capable of 1.6 times the performance when running inference on certain LLMs, with the 176-billion-parameter BLOOM model given as an example. AMD also touted the fact that a single MI300X accelerator can run inference on a 70-billion-parameter model, something a single H100 cannot do.
The MI300A isn’t as powerful, with fewer GPU cores and less memory than the MI300X. But the MI300A includes AMD’s latest Zen 4 CPU cores, positioning the chip for the high-performance computing market. AMD is focusing on efficiency, saying that the MI300A delivers 1.9 times the performance per watt of its previous-generation MI250X.
Hardware isn’t everything
One significant advantage Nvidia has in the data-center GPU market is software. Nvidia's CUDA platform enables developers to harness its GPUs for computation tasks, and since it was first released 16 years ago, it has become the de facto industry standard. CUDA supports only Nvidia GPUs, so switching AI chip vendors is not as simple as swapping in an AMD accelerator in place of an Nvidia accelerator.
AMD’s answer to this problem is ROCm, its open GPU computing platform that’s now on its sixth iteration. ROCm supports the most popular AI frameworks, including TensorFlow and PyTorch, and AMD has expanded the ecosystem through partnerships. The company has also made a few AI-related acquisitions, including open-source AI software company Nod.ai, in an effort to catch up with Nvidia on the software front.
While Nvidia still holds a software advantage, AMD has a few big customers lined up for its new AI chips. Microsoft and Meta Platforms are on board: Microsoft is rolling out a new virtual server series on Azure powered by the MI300X, and Meta will use the MI300X for various AI inference workloads. Oracle will offer new bare metal instances featuring MI300X chips, and Dell, Hewlett-Packard Enterprise, Lenovo, and Supermicro are planning systems built around AMD’s new AI products.
AMD should have little problem selling as many AI chips as it can get manufactured in the short term, given the insatiable demand for AI accelerators. How that demand evolves is harder to predict. AI isn't going anywhere, but as competition intensifies and more options beyond Nvidia become available, pricing could come under pressure.
As Nassim Nicholas Taleb has said: “I’ve seen gluts not followed by shortages, but I’ve never seen a shortage not followed by a glut.” While AMD will probably win market share from Nvidia, AI chips may not remain so lucrative indefinitely.