Besides Intel and AMD, Taiwan’s VIA also builds x86 processors, though few people may know it. Centaur, VIA’s 24-year-old processor development arm, recently developed the world’s first x86 processor with an AI coprocessor. A working prototype exists, and chip testing began in September.
The new processor is manufactured on TSMC’s 16nm process, with a die area of no more than 195 square millimeters. Internally, a ring bus connects eight x86 CPU cores, 16MB of shared L3 cache, a four-channel DDR4-3200 memory controller, and a PCIe 3.0 controller (44 lanes). With southbridge and I/O functions also integrated, it is a complete SoC.
The highlight is the AI coprocessor, “NCORE”, which occupies about 34.4 square millimeters (17.6% of the die) and is mapped as a PCI device. It supports accelerating the construction and training of deep neural networks (DNNs) and is said to provide up to 20TB/s of memory bandwidth and 20 trillion AI operations per second.
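As a quick sanity check on the quoted share, the snippet below divides the NCORE area by the total die area. It assumes the die sits exactly at the 195 square millimeter upper bound given above; a smaller die would make NCORE’s share slightly larger. The function name is illustrative only.

```python
# Areas quoted in the article, in square millimeters.
DIE_AREA_MM2 = 195.0    # ASSUMPTION: the die is exactly at the stated upper bound
NCORE_AREA_MM2 = 34.4


def area_share(part_mm2: float, total_mm2: float) -> float:
    """Return part/total expressed as a percentage."""
    return 100.0 * part_mm2 / total_mm2


ncore_share = area_share(NCORE_AREA_MM2, DIE_AREA_MM2)
print(f"NCORE share of die: {ncore_share:.1f}%")
```

Under that assumption the result rounds to the 17.6% figure quoted above.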
The processor runs at 2.5GHz and supports the AVX-512 instruction set, which AMD’s Zen 2 architecture does not offer.
CHA processor core diagram
CHA processor block diagram
In recent days, Centaur has released many architectural details of the processor. Interestingly, it did not publish them itself; they came from The Linley Group, the California-based processor technology authority and publisher of the prestigious chip publication Microprocessor Report. The Linley Group studied Centaur’s architecture design documentation and interviewed the designers involved to produce its report.
Linley Gwennap, editor-in-chief of Microprocessor Report, said of the newly designed x86 processor: “Centaur’s high-profile return to the x86 market brings an innovative processor design that combines eight high-performance CPU cores with a custom deep learning accelerator (DLA). This is the industry’s first server processor design that integrates a DLA. The new accelerator, NCore, delivers neural network performance that beats even the most powerful CPUs, without requiring an expensive external GPU card.”
The Linley Group revealed that Centaur’s new x86 microarchitecture, called “CNS”, is designed to a higher specification than traditional PC processors: it decodes four x86 instructions per clock cycle and executes 10 micro-ops in parallel. The first processor is provisionally named “CHA”. Its AI coprocessor reaches a peak INT8 performance of up to 20 TOPS (20 trillion operations per second).
CNS Microarchitecture Diagram
NCore AI Coprocessor Architecture Diagram
The Linley Group measured the x86 processor’s AI performance using the authoritative MLPerf benchmark and found that the AI inference performance of the Centaur CHA processor is equivalent to that of 23 world-class Intel x86 cores, and those cores must support the 512-bit VNNI (Vector Neural Network Instructions) extension. In fact, Intel does not yet have a true 32-core product.
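To give a feel for what a 512-bit vector neural network instruction does, here is a minimal pure-Python sketch modeled loosely on the AVX-512 VNNI byte dot-product: each of the 16 32-bit lanes accumulates four products of an unsigned 8-bit value and a signed 8-bit value. The function and constant names are hypothetical, 32-bit overflow is ignored, and nothing here describes how Intel’s or Centaur’s hardware is actually implemented.

```python
VECTOR_BITS = 512
LANE_BITS = 32
LANES = VECTOR_BITS // LANE_BITS      # 16 accumulators per 512-bit vector
BYTES_PER_LANE = LANE_BITS // 8       # 4 INT8 pairs feed each accumulator


def dot_accumulate(acc, a, b):
    """Illustrative model of one VNNI-style step (not real hardware behavior).

    acc: 16 integer accumulators
    a:   64 unsigned 8-bit values (0..255)
    b:   64 signed 8-bit values (-128..127)
    """
    assert len(acc) == LANES
    assert len(a) == len(b) == LANES * BYTES_PER_LANE
    assert all(0 <= x <= 255 for x in a)
    assert all(-128 <= x <= 127 for x in b)
    out = []
    for lane in range(LANES):
        start = lane * BYTES_PER_LANE
        products = (a[i] * b[i] for i in range(start, start + BYTES_PER_LANE))
        out.append(acc[lane] + sum(products))   # overflow wrap is not modeled
    return out


acc = dot_accumulate([0] * LANES, [1] * 64, [2] * 64)
macs_per_step = LANES * BYTES_PER_LANE
print(acc[0], macs_per_step)
```

In this model one step performs 64 multiply-accumulates, which is the kind of per-core work the 23-core comparison is counting.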
The Centaur AI coprocessor follows a SIMD (single instruction, multiple data) concept similar to that of the VNNI instructions, but it has 16MB of dedicated memory and 20TB/s of bandwidth and can process 32,768 bits of data per clock cycle. With inference offloaded to the dedicated AI coprocessor, the x86 cores remain free to handle other general-purpose tasks.
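The quoted figures can be roughly reconciled with a back-of-the-envelope calculation. The sketch below starts from the 32,768 bits per cycle and the 2.5GHz clock reported in the article. Two inputs are assumptions of this sketch, not statements from the report: that each INT8 lane performs one multiply-accumulate counted as two operations, and that the dedicated memory can deliver two full-width reads per cycle.

```python
BITS_PER_CYCLE = 32768   # from the article
CLOCK_HZ = 2.5e9         # 2.5 GHz, from the article
OPS_PER_MAC = 2          # ASSUMPTION: one multiply plus one add per INT8 lane
READS_PER_CYCLE = 2      # ASSUMPTION: two full-width memory reads per cycle

int8_lanes = BITS_PER_CYCLE // 8        # 8-bit values processed per cycle
bytes_per_read = BITS_PER_CYCLE // 8    # width of one full read, in bytes

peak_tops = int8_lanes * OPS_PER_MAC * CLOCK_HZ / 1e12
peak_tb_per_s = bytes_per_read * READS_PER_CYCLE * CLOCK_HZ / 1e12

print(f"INT8 lanes per cycle: {int8_lanes}")
print(f"Peak throughput: {peak_tops:.2f} TOPS")
print(f"Peak memory bandwidth: {peak_tb_per_s:.2f} TB/s")
```

Under these assumptions both results come out at about 20.48, in line with the rounded 20 TOPS and 20TB/s figures. With a single read per cycle the bandwidth would be half that, so the second assumption matters.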
Centaur also provides developers with new algorithms that take advantage of the Centaur AI coprocessor’s ultra-low inference latency and its tight coupling with the x86 CPU cores.
At the ISC East conference in New York, Centaur also gave the CHA processor its public debut. In addition to traditional AI applications such as video analytics and real-time object detection and classification, it showcased cutting-edge applications such as semantic segmentation (pixel-level image classification) and human pose estimation (stick figures). It was an eye-opener.
Centaur is currently improving hardware performance and software efficiency to optimize the new platform, and the new processor is expected to go into production in the second half of next year.
Linley Group’s detailed report can be downloaded here
Comparison of different CPU architectures