Intel, GraphCore And Groq: Let The AI Cambrian Explosion Begin

As we approach the end of a year full of promises from AI startups, a few companies are meeting their promised 2019 launch dates. These include Intel, with its long-awaited Nervana platform, UK startup Graphcore and the stealthy Groq from Silicon Valley. Some of these announcements fall a bit short on details, but all claim to represent breakthroughs in performance and efficiency for training and/or inference processing. Other recent announcements include Cerebras’s massive wafer-scale AI engine inside its multi-million dollar CS-1 system and NVIDIA’s support for GPUs on ARM-based servers. I’ll opine on those soon, but here I will focus on Intel, Graphcore and Groq’s highly anticipated chips.

Intel demos Nervana NNP, previews Ponte Vecchio GPU

At an event in San Francisco on November 12, Intel announced it is sampling its Nervana chips for training and inference to select customers, including Baidu and Facebook, and took the opportunity to demonstrate working hardware. While this looked a lot like a launch, Intel carefully called it an “update.” Hopefully we will see a full launch soon, with more specs, pricing, named customers and OEM partners ready to ship product in volume.

Intel recently previewed impressive performance in the MLPerf inference benchmarks for the NNP-I (the “I” stands for inference). Keep in mind that these chips are the second iteration of Intel’s Nervana design, and I expect Intel incorporated significant customer input into the revisions. While Intel disclosed few details about the microarchitecture, it did tout an Inter-Chip Link (ICL), which supposedly delivers nearly 95% scalability as customers add more chips to solve larger problems. Intel also claimed that a rack of NNP-I chips will outperform a rack of NVIDIA’s T4 GPUs by nearly 4X, although I would note that this comparison pits 32 Intel chips against only 20 T4s. Improved compute density is a good thing, but more details will be required to properly assess the competitive landscape.

The NNP chips support all AI frameworks and benefit from the well-respected Nervana software stack. Intel also laid out its vision for the “One API” development environment, which will support Xeon CPUs, Nervana AI chips, FPGAs and future Xe GPUs. This software approach will be critical in helping Intel’s development community to optimize their code once for a broad range of devices.

Though details were scarce, Intel also announced its first data-center GPU at SC19, codenamed Ponte Vecchio. We know that Ponte Vecchio will go inside the Argonne National Laboratory’s Aurora exascale system in 2022, but we should see consumer versions sometime in 2020.

It is noteworthy that Intel sees a role for so many architectures, each targeting a specific class of workloads, a strategy Intel calls “domain-specific architectures.” The GPU can perform a wide variety of tasks, from traditional HPC to AI, while the Nervana chips are designed to train and query deep neural networks with extreme performance and efficiency. Some may say that Intel is taking a shotgun approach, fielding many architectures and hoping to hit something, but I believe the company is being smart: it is optimizing chips for specific tasks at a scale only Intel could muster.

The Graphcore Intelligent Processing Unit (IPU)

Unicorn UK startup Graphcore recently launched its IPU, complete with customers, partners, benchmarks and immediate availability. It is geared toward training and inference processing of AI neural networks, or any other computation that can be represented as a graph. Graphcore garnered financial and strategic backing from Dell, Microsoft and others, and announced availability of the IPU in both Dell servers and the Microsoft Azure cloud. Customers testing early silicon include the European search engine Qwant (image processing), Microsoft Azure (natural language processing), hedge fund manager Carmot Capital (Markov Chain Monte Carlo) and Imperial College London (robotic simultaneous localization and mapping).

Graphcore’s architecture was designed for the most computationally challenging problems, with 1,216 cores, 300 MB of in-processor memory delivering 45 TB/s of bandwidth, and 80 IPU-Links at 320 GB/s. The company’s strategy is not to take on NVIDIA on every front, but rather to focus on those applications ideally suited to its architecture. Consequently, the benchmarks Graphcore has published cover workloads that are relatively new in the industry; it has not yet published results for industry-standard benchmarks such as MLPerf. In a conversation with CEO Nigel Toon last week, I was reassured that more standard benchmarks are forthcoming that will enable true apples-to-apples comparisons. That being said, the published benchmarks span several workloads and are quite impressive in both throughput and latency.
