ORNL completes the first phase of the Jaguar-Titan supercomputer transition
ORNL’s Jaguar supercomputer has completed the first phase of an upgrade that will keep it among the most powerful scientific computing systems in the world. Acceptance testing for the upgrade was completed in February. The testing suite included leading scientific applications focused on molecular dynamics, high temperature superconductivity, nuclear fusion, and combustion.
Jaguar, manufactured by Cray Inc., is operated by the Oak Ridge Leadership Computing Facility. Even before this month’s upgrade to 3.3 petaflops, it was among the world’s most powerful supercomputers, capable of 2,300 trillion calculations each second, or 2.3 petaflops. A person working at a rate of one calculation per second would need more than 70 million years to perform the same number of calculations.
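The 70-million-year comparison can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function name and the choice of a 365.25-day year are assumptions, not part of the original reporting.

```python
# Rough check of the "more than 70 million years" comparison.
# Assumption: one year is taken as 365.25 days.
CALCULATIONS_PER_SECOND = 2.3e15  # 2.3 petaflops, the pre-upgrade figure
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def years_at_one_per_second(calculations: float) -> float:
    """Years needed to do `calculations` operations at one per second."""
    return calculations / SECONDS_PER_YEAR


# One second of Jaguar's pre-upgrade work, done by hand at one per second.
years = years_at_one_per_second(CALCULATIONS_PER_SECOND)
print(f"{years / 1e6:.1f} million years")
```

Under these assumptions the result comes to roughly 73 million years, consistent with the "more than 70 million" figure.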
When the upgrade process is completed this autumn, the system will be renamed “Titan” and will be capable of at least 20 petaflops.
Users have had access to Jaguar throughout the upgrade process. “During our upgrade, we have kept our users on Jaguar every chance we get,” said Jack Wells, director of science at the OLCF. “We have already seen the positive impact on applications, for example in computational fluid dynamics, from the doubled memory.”
The DOE Office of Science–funded project, which was concluded ahead of schedule, upgraded Jaguar’s AMD Opteron cores to the newest 6200 series and increased their number by a third, from 224,256 to 299,008. Two six-core Opteron processors were removed from each of Jaguar’s 18,688 nodes and were replaced with a single 16-core processor. At the same time, the system’s interconnect was updated, and its memory was doubled to 600 terabytes.
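The core counts follow directly from the per-node figures. As a minimal sketch (variable names are illustrative), the totals and the one-third increase can be reproduced like this:

```python
from fractions import Fraction

NODES = 18_688  # compute nodes, unchanged by the upgrade

# Before: two six-core Opteron processors per node.
cores_before = NODES * 2 * 6
# After: a single 16-core Opteron 6200 series processor per node.
cores_after = NODES * 1 * 16

# Relative increase, kept as an exact fraction to avoid rounding.
increase = Fraction(cores_after - cores_before, cores_before)

print(cores_before, cores_after, increase)
```

Each node goes from 12 cores to 16, so the total grows by exactly one third regardless of the node count.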
In addition, 960 of Jaguar’s 18,688 compute nodes now contain an NVIDIA graphics processing unit (GPU). The GPUs were added to the system in anticipation of a much larger GPU installation later in the year. The GPUs act as accelerators, giving researchers a substantial boost in computing power while making the system far more energy efficient.
“Applications that were squeezing onto our earlier nodes can now make full use of the 16-core processor and increased memory. Doubling the memory can have a dramatic impact on application workflow,” Wells said.
“The new Gemini interconnect is much more scalable,” Wells added, “helping applications like molecular dynamics that have demanding network communication requirements.”
GPUs will add a level of parallelism to the system and will allow Titan to reach 20 petaflops or more within the same space as Jaguar and with essentially the same power requirements. While the Opteron processors have 16 cores and are therefore able to carry out 16 computing tasks simultaneously, the GPUs will be able to tackle hundreds of computing tasks at the same time.
With nearly 1,000 GPUs now available, researchers will have an opportunity to optimize their applications for the accelerated Titan system.
“This is going to be an exciting year in Oak Ridge as our users take advantage of our new architecture and get ready for the new NVIDIA Kepler GPUs in the fall,” Wells said. “A lot of work by many people is beginning to pay off.” —Leo Williams