Maintaining a Leadership Pace

- Materials Science
- Computational Chemistry
- Computational Fluid Dynamics
- Quantum Chromodynamics (QCD)
- Geosciences
- Global Climate Change

**A possible layout of the computer cabinets for a 5-teraflops massively parallel computer. Such a machine will be able to perform 5 trillion arithmetic calculations per second. By comparison, the Intel Paragon XP/S 150 at ORNL can perform 150 billion arithmetic calculations per second.**


The confluence of ever-accelerating scientific expectations and technical innovation is dramatically extending the limits of computational power. Indeed, "limit" may be an inappropriate word here. Machines now running exceed a teraflops (1 trillion FLoating point OPerations per Second, or 1 TF), 10-TF machines are being planned, and the path to petaflops (1000 TF) is being defined. Given this remarkable rate of change, computational capabilities must be described in terms of current technology together with near-term design objectives having a well-defined time horizon. It is clear, though, that machines in the range of 5 to 10 TF in the 1998–1999 time frame are an appropriate target; this is the target for which we are aiming.
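A back-of-the-envelope sketch gives a sense of these scales. The workload size below is a hypothetical example chosen only for illustration, and all rates are peak (not sustained) figures:

```python
# Peak rates mentioned in the text.
PARAGON_FLOPS = 150e9   # Intel Paragon XP/S 150: 150 billion operations/s
TARGET_FLOPS = 5e12     # proposed 5-TF machine: 5 trillion operations/s

# Hypothetical job requiring 10^18 floating-point operations.
workload = 1e18

hours_paragon = workload / PARAGON_FLOPS / 3600   # ≈ 1850 hours
hours_target = workload / TARGET_FLOPS / 3600     # ≈ 56 hours

print(f"Paragon XP/S 150: {hours_paragon:.0f} hours")
print(f"5-TF machine:     {hours_target:.0f} hours")
print(f"speedup:          {TARGET_FLOPS / PARAGON_FLOPS:.1f}x")
```

At peak rates, the proposed machine is roughly a factor of 33 beyond the XP/S 150, turning a multi-month computation into a few days.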

The Center for Computational Sciences (CCS) at ORNL has been one of the world's leaders in computational power over the last several years. However, contracts in place for systems at DOE Defense Programs (DP) laboratories (Los Alamos National Laboratory and Lawrence Livermore National Laboratory) provide for systems extending to 3 TF, well beyond the roughly 200 gigaflops (GF, or billion flops) in the CCS. The applications emphasis for these DP machines is assurance of the reliability of the U.S. nuclear stockpile in an era of no weapons testing, which essentially defines DOE's Accelerated Strategic Computing Initiative (ASCI).

Computation, modeling, and simulation requirements for the DOE Energy Research (ER) community also extend into the multiple-TF realm. Providing a substantial component of this computational capability and establishing routine TF-level distributed computing are the foci of this ORNL initiative. Given the striking accomplishments of the CCS in bringing its Intel Paragons to very high levels of productivity by

- connecting the two largest CCS Paragons over high-speed asynchronous transfer mode (ATM) OC-12 networks to provide a machine with peak performance near 200 GF and
- taking the distributed computing lead with Sandia National Laboratories (SNL) by solving huge problems through linking the Paragons at the two sites over ATM networks,

the CCS clearly represents the development environment required to bring a multiple-TF machine to optimal effectiveness for an extensive range of ER applications. This last point is of prime importance because the ER requirements extend across a very wide spectrum.

A striking array of challenges awaits TF-level machines. Meaningful modeling of mechanical behavior becomes possible, including static and dynamic properties of dislocations and dislocation arrays, radiation effects, and complex phenomena such as stress-corrosion cracking. Further, proper modeling of mechanical behavior can lead to an understanding of structural integrity and aging far surpassing current levels. Multiple-component-alloy design studies aimed at maximizing strength-to-weight ratios will be key to achieving fuel-efficiency goals for automobiles. Investigations of the performance of these alloys in simulated harsh environments will lead to alloys with improved longevity and safety in such environments and will also extend the range of operating conditions for many systems. Modeling of computer processor and memory chips with ever-decreasing feature sizes will require highly sophisticated semiconductor models incorporating accurate thermodynamic capabilities to deal properly with heat generation and dissipation strategies.

The complex nature of subsurface biological, physical, and chemical processes has, in the last decade, prompted extensive development of computer models to simulate and predict the fate of environmentally toxic materials in one of the most valuable resources of this nation—groundwater. The addition of valid bio-physicochemical processes to current models of this highly heterogeneous system necessitates computational systems in the TF range. Such systems will enable the incorporation of scientifically validated subsurface flow and transport processes in multiphase fluids and fractured geological formations and promote the inclusion of phenomena such as in-situ bioremediation and molecular-level geochemical kinetics. Perhaps even more significant are the large-scale water resources management strategies that we would be able to establish given TF-level computing. Risk assessment and decision analysis could be carried out for a time scale sufficiently long to provide scientifically sound and technologically plausible solutions, thereby providing invaluable support for those making environmental and safety decisions where financial realities compel priority setting.

In all of these application areas, and more that could be included, important problems already identified require computational power in the range of 5 to 10 TF and even more for some areas. Processing power is but one factor in the specification of the requisite balanced system. For a 5-TF system, additional requirements include

- computer memory—5 terabytes
- data storage capacity—7 petabytes
- input-output (I/O) capacity—200 gigabits/s
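These figures can be read as balance ratios against the 5-TF peak rate. A small sketch of that arithmetic (the bytes-per-flop framing is a common design heuristic, not a requirement stated in the text):

```python
# Balance ratios implied by the listed 5-TF system requirements.
PEAK_FLOPS = 5e12       # 5 TF peak rate
MEMORY_BYTES = 5e12     # 5 terabytes of memory
STORAGE_BYTES = 7e15    # 7 petabytes of data storage
IO_BITS_PER_S = 200e9   # 200 gigabits/s of I/O capacity

# Memory balance: bytes of memory per peak flop/s.
print(MEMORY_BYTES / PEAK_FLOPS)        # 1.0

# Time to stream the entire memory through the I/O system.
io_bytes_per_s = IO_BITS_PER_S / 8
print(MEMORY_BYTES / io_bytes_per_s)    # 200.0 seconds

# Storage-to-memory ratio.
print(STORAGE_BYTES / MEMORY_BYTES)     # 1400.0
```

The listed requirements thus correspond to one byte of memory per flop/s, a memory dump to storage in a few minutes, and room for over a thousand memory-sized datasets on disk.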

Further, the CCS partnership with SNL has shown the way to solving problems of extraordinary size and complexity by linking supercomputers over high-speed networks. Extending this strategy into the multiple-TF regime, an essential step in our view, will require networks operating in the range of 10 gigabits per second, roughly OC-192 rates.
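The gap between the current OC-12 links and the OC-192 target can be made concrete with a transfer-time sketch. Line rates below are nominal SONET rates (framing overhead is ignored), and the 1-terabyte dataset is a hypothetical example:

```python
# Nominal SONET line rates in bits/s.
OC12 = 622.08e6     # ~622 Mbit/s, the current ATM links between sites
OC192 = 9953.28e6   # ~9.95 Gbit/s, the rough 10-gigabit target

# Hypothetical 1-terabyte dataset to move between machines.
dataset_bits = 1e12 * 8

print(f"OC-12:  {dataset_bits / OC12 / 60:.0f} minutes")
print(f"OC-192: {dataset_bits / OC192 / 60:.0f} minutes")
print(f"ratio:  {OC192 / OC12:.0f}x")
```

OC-192 carries exactly 16 times the OC-12 line rate, cutting a multi-hour transfer to under a quarter of an hour.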

Extending computational capabilities into the range described here requires careful attention to the tenets of scalability. To illustrate the scope of the proposed system, we show a possible layout of the computer cabinets for a 5-TF machine.

Hardware does not a 5-TF computer make. Software challenges abound. Operating systems, I/O software, communications software and protocols, visualization systems, and network interfaces, together with applications software, must all work together with the hardware in solving problems. Indeed, it is here in the software realm that many of the most difficult scalability issues will emerge.

It is obvious that there are major challenges in designing, developing, and ensuring near-optimum performance from a system in the 5-TF range. However, the contributions this system would provide to the ER science/technology programs and to meeting ever-expanding expectations of these programs demand that these challenges be faced. Furthermore, the partnership between ORNL and SNL that has brought distributed computing to unprecedented levels of capability should be encouraged to extend these capabilities into the TF realm. With the Intel Teraflops machine now being assembled at SNL (the design goal is 1.8 TF) and the proposed TF-level machine at ORNL, this extension is assured.—*Kenneth L. Kliewer, director of the Center for Computational Sciences at ORNL*