One of the big announcements at AMD’s Data Center event a few weeks ago was its CDNA2-based compute accelerator, the Instinct MI250X. The MI250X uses two MI200 graphics compute dies built on TSMC’s N6 manufacturing node, with four HBM2E stacks per die. It also uses a new ‘2.5D’ packaging design, in which a bridge between the die and the substrate provides high-performance, low-power connectivity. This is the GPU going into Frontier, one of the US exascale systems due to come online soon. At this week’s Supercomputing conference, HPE, under the HPE Cray brand, showcased one of those blades, along with a full frontal shot of the MI250X. Many thanks to Patrick Kennedy of ServeTheHome for sharing these images and giving us permission to republish them.
The MI250X chip is a shimmery package in the OAM form factor. OAM stands for OCP Accelerator Module, a standard developed by the Open Compute Project (OCP), an industry standards body for servers and high-performance computing. It is the accelerator form factor that partners use, especially when packing a lot of accelerators into one system: eight of them, to be precise.
This is a 1U half-blade with two nodes. Each node pairs an AMD EPYC ‘Trento’ CPU (a custom-IO version of Milan that uses Infinity Fabric) with four MI250X accelerators. Everything is liquid cooled. AMD has said the MI250X can draw up to 560 W per accelerator, so eight of those plus two CPUs could mean this unit needs around 5 kilowatts of power and cooling. If that is just a half-blade, we are talking about serious compute and power density.
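The 5 kW figure is a back-of-the-envelope sum, sketched below in Python. The 560 W accelerator ceiling comes from AMD; the 280 W per-CPU value is an assumption (roughly the TDP of top-end standard Milan parts), since no figure for the custom ‘Trento’ chip is given here. The sketch also ignores memory, networking, pumps and power-conversion losses, so a real half-blade would likely draw somewhat more.

```python
# Rough power estimate for one 1U half-blade, following the article's reasoning.
# ACCEL_W: AMD's stated upper bound per MI250X.
# CPU_W: ASSUMED value (hypothetical), not a published 'Trento' figure.

ACCEL_W = 560          # watts per MI250X (upper bound quoted by AMD)
CPU_W = 280            # watts per CPU (assumption, not from the source)
NODES_PER_BLADE = 2    # two nodes per half-blade
ACCELS_PER_NODE = 4    # four MI250X per node


def half_blade_power_w(accel_w: int = ACCEL_W, cpu_w: int = CPU_W) -> int:
    """Sum accelerator and CPU power for one half-blade.

    Ignores memory, NICs, pumps and power-conversion losses.
    """
    accels = NODES_PER_BLADE * ACCELS_PER_NODE * accel_w
    cpus = NODES_PER_BLADE * cpu_w
    return accels + cpus


if __name__ == "__main__":
    total = half_blade_power_w()
    print(f"accelerators only: {NODES_PER_BLADE * ACCELS_PER_NODE * ACCEL_W} W")
    print(f"accelerators + CPUs: {total} W (~{total / 1000:.1f} kW)")
```

With these inputs the accelerators alone account for 4,480 W, and the two assumed CPUs bring the total to 5,040 W, which is where the "around 5 kilowatts" estimate lands.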
Each node looks fairly self-contained. The CPU on the right is not mounted upside down (the socket’s rear pin-outs are not visible), and it is liquid cooled as well. What look like four copper heat pipes, two on each side of the CPU, are actually the memory, in a full 8-channel configuration. These servers have no power supplies of their own; instead, they draw power from a unified backplane in the rack.
The rear connectors look something like this. Each Frontier rack will use HPE’s Slingshot interconnect fabric to scale out across the entire supercomputer.
Systems like this are undoubtedly over-engineered for sustained reliability: they get as much cooling as possible, enough power phases for a 560 W accelerator, and even from this picture you can see that the base motherboards the OAM modules connect into are easily 16 layers, if not 20 or 24. For reference, a cheap consumer motherboard today might have only four layers, while enthusiast motherboards have 8 or 10, sometimes 12 for HEDT.
At a global press briefing, keynote chair and world-renowned HPC professor Jack Dongarra suggested that Frontier is very close to being powered up as one of the first exascale systems in the United States. He did not say outright that it would beat the Aurora supercomputer (Sapphire Rapids + Ponte Vecchio) to the title of first, since he does not have the same insight into that system, but he sounded hopeful that Frontier would submit a 1+ ExaFLOP result to the TOP500 list in June 2022.
Many thanks again to Patrick Kennedy and ServeTheHome for permission to share his pictures.