With the rise of artificial intelligence and machine learning applications, the amount of computation required for deep learning training keeps growing. Meeting that demand often takes a sizable cluster of servers. At the Hot Chips conference earlier this year, Intel showed the Nervana NNP-T “Spring Crest” chip; by acquiring Nervana, Intel picked up the IP it needed to build a “large training chip.”
(Image via AnandTech)
The chip is built on TSMC’s 16nm process with CoWoS packaging and four stacks of HBM2 memory, and the die covers 680 mm². Earlier this week, Supermicro showed off its latest Nervana NNP-T server at the Supercomputing Conference.
The hardware is based on PCIe expansion cards, slotting into chassis that were originally designed to host GPUs in traditional servers. The system uses a typical dual-socket (2P) layout with eight expansion cards in a 4U chassis, and the cards communicate with one another.
Each chip has a total bidirectional bandwidth of 3.58 Tbps, and the off-chip links support scaling up to 1024 nodes. Judging from the single 8-pin auxiliary power connector per PCIe card, peak power should sit at the standard 225 W.
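The 225 W figure follows from the power delivery limits of the card's connectors. A minimal sketch of the arithmetic, assuming the standard PCIe CEM budgets (75 W from an x16 slot, 150 W from one 8-pin auxiliary connector):

```python
# Back-of-envelope for the 225 W peak-power figure.
# A PCIe x16 slot supplies up to 75 W, and a single 8-pin
# auxiliary connector adds up to 150 W (PCIe CEM spec limits).
slot_power_w = 75
aux_8pin_w = 150

peak_power_w = slot_power_w + aux_8pin_w
print(peak_power_w)  # 225
```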
Later in the week, Supermicro said it had been cleared to show the 8-way OAM (OCP Accelerator Module) version of the server. This version routes chip-to-chip communication through the baseboard PCB rather than over bridges, as traditional PCIe card-to-card links do.
That layout leaves room for plenty of airflow between the modules for cooling, and it is compatible with the modular OCP standard. The NNP-T is Intel’s first chip to support bfloat16 for deep learning training, delivering up to 119 TOPs per chip.
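For context on the bfloat16 format itself: it keeps float32's full 8-bit exponent range but trims the mantissa to 7 bits, so a float32 can be converted simply by keeping its top 16 bits. A minimal sketch (using truncation for clarity; real hardware typically rounds to nearest):

```python
import struct

def float32_to_bfloat16_bits(x: float) -> int:
    """Truncate a float32 to bfloat16 by keeping the top 16 bits
    (1 sign bit, 8 exponent bits, 7 mantissa bits)."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bits >> 16

def bfloat16_bits_to_float32(b: int) -> float:
    """Widen bfloat16 back to float32 by zero-padding the mantissa."""
    (x,) = struct.unpack("<f", struct.pack("<I", b << 16))
    return x

# bfloat16 preserves float32's dynamic range but only about
# 2-3 significant decimal digits of precision:
y = bfloat16_bits_to_float32(float32_to_bfloat16_bits(3.14159))
print(y)  # 3.140625
```

The appeal for training is exactly this trade-off: gradients tolerate the reduced mantissa, while the full exponent range avoids the overflow/underflow headaches of fp16.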
There is also 60MB of on-chip memory and 24 dedicated “tensor” processor clusters, each with dual 32-by-32 matrix multiplication arrays. The chip packs 27 billion transistors, runs at a core frequency of 1.1 GHz, and is paired with 32GB of HBM2-2400 memory.
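Those figures can be sanity-checked against the quoted 119 TOPs. A rough estimate, assuming each 32-by-32 array performs 32×32 multiply-accumulates per cycle (a counting convention, not a confirmed microarchitectural detail):

```python
# Peak-throughput estimate from the figures quoted above.
clusters = 24              # tensor processor clusters
arrays_per_cluster = 2     # dual 32x32 matmul arrays
macs_per_array = 32 * 32   # assumed MACs per array per cycle
ops_per_mac = 2            # one multiply + one add
clock_hz = 1.1e9           # quoted core frequency

peak_tops = (clusters * arrays_per_cluster * macs_per_array
             * ops_per_mac * clock_hz) / 1e12
print(round(peak_tops, 1))  # 108.1
```

This lands near, but below, the quoted 119 TOPs; the gap presumably comes down to boost clocks or a different op-counting convention, so treat the sketch as an order-of-magnitude check rather than a spec.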
Technically, the PCIe connection can be upgraded to Gen 4.0 x16, but Intel’s current server CPUs do not yet support that feature. It has been suggested that some customers are upgrading their head nodes from 2P to 4P (Facebook uses 8P) to scale out this compute.
Supermicro states that its NNP-T systems are ready to be deployed for deep learning training.