Supercomputing '08 Panel

Exa and Yotta Scale Data:
Are We Ready?

Panel Chair: William Kramer, NERSC

Friday, November 21, 2008
10:30 am - 12:00 pm
Austin Convention Center, Austin, Texas


Panel Abstract

Soon after reaching teraflops, HPC facilities were handling petabytes of data. The challenges of exabyte and yottabyte data will be a significant, possibly dominant, limiter on the productivity of HPC users. This panel will address the question "Is the HPC community ready for exabyte data?" and will discuss the challenges of yottabytes.

Moderator:

William Kramer (Lawrence Berkeley National Laboratory)
Exa and Yotta Scale Data [PDF - 495K]

Panelists:

Garth Gibson (Carnegie Mellon University)
Exa & Yotta Scale Data [PDF - 2.6M]

Keith Gray (BP)

Rob Farber (Pacific Northwest National Laboratory)
Exa- to Yotta-scale Data: An Optimistic View [PDF - 145K]

Gary Grider (Los Alamos National Laboratory)
Exa-Yotta-yotta-yotta… For Checkpoint Only [PDF - 200K]

Specific topics included:

  • How to maintain performance and reliability while storage systems grow exponentially in size, file counts, and components
  • How to provide information lifecycle management across loosely integrated storage subsystems (a minimal policy sketch follows this list)
  • How to locate exabytes of data and move them to the right place at the right time
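
To make the lifecycle-management topic concrete, here is a minimal sketch of an age-based tiering policy in Python. The tier names, age thresholds, and File record are illustrative assumptions, not any real facility's API; a production ILM engine would weigh access patterns, ownership, and cost, not just age.

    from dataclasses import dataclass

    # (tier name, maximum age in days before data should move on);
    # None marks the final tier. All values are illustrative.
    TIERS = [("flash", 7), ("disk", 90), ("tape", None)]

    @dataclass
    class File:
        path: str
        age_days: float
        tier: str

    def target_tier(age_days: float) -> str:
        """Return the tier a file of this age should live on."""
        for name, max_age in TIERS:
            if max_age is None or age_days <= max_age:
                return name
        return TIERS[-1][0]

    def plan_migrations(files):
        """Yield (file, destination) pairs for files on the wrong tier."""
        for f in files:
            dest = target_tier(f.age_days)
            if dest != f.tier:
                yield f, dest

    if __name__ == "__main__":
        catalog = [File("/scratch/run42/ckpt.h5", 3, "flash"),
                   File("/scratch/run17/out.nc", 45, "flash"),
                   File("/scratch/run03/out.nc", 400, "disk")]
        for f, dest in plan_migrations(catalog):
            print(f"migrate {f.path}: {f.tier} -> {dest}")

The hard part at exabyte scale is not the policy itself but evaluating it over trillions of files spread across subsystems that do not share a catalog, which is why the bullet stresses loosely integrated subsystems.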

Petascale computing infrastructures for scientific discovery make petascale demands on information storage capacity, performance, concurrency, reliability, availability, and manageability. The last decade has shown that parallel file systems can barely keep pace with high performance computing along these dimensions, which poses a critical challenge when petascale requirements are considered. The Petascale Data Storage Institute (http://www.pdsi-scidac.org/) focuses on data storage problems found in petascale computing environments and leverages its members' experience with applications and with diverse file and storage systems. The institute allows a group of researchers to collaborate extensively on developing requirements, standards, algorithms, and development and performance tools.

The drive to exascale computing means faster computers need more data, faster (a back-of-envelope sketch follows this list):

  • Data movement at Petabytes/sec
  • Exabyte- and yottabyte-sized files (hundreds of millions of Library of Congress equivalents)
  • Trillions of files
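
A quick arithmetic sketch of why "petabytes per second" is the right order of magnitude: checkpointing is the canonical driver (compare Gary Grider's talk), and the memory size, checkpoint interval, and overhead budget below are illustrative assumptions, not figures from the panel.

    # Back-of-envelope I/O arithmetic; all inputs are assumptions.
    EXABYTE = 10**18   # bytes
    PETABYTE = 10**15  # bytes

    memory_bytes = 1 * EXABYTE    # assume ~1 EB of memory on an exascale machine
    io_bandwidth = 1 * PETABYTE   # assume storage sustains 1 PB/s

    # Time to dump all of memory at that rate.
    checkpoint_seconds = memory_bytes / io_bandwidth
    print(f"Full-memory checkpoint at 1 PB/s: {checkpoint_seconds:.0f} s")

    # To keep checkpoint overhead under 10% with a 1-hour interval,
    # the dump must finish in 360 s, which sets a bandwidth floor.
    interval_s, overhead = 3_600, 0.10
    required_bw = memory_bytes / (interval_s * overhead)
    print(f"Bandwidth for 10% overhead: {required_bw / PETABYTE:.1f} PB/s")

Even under these mild assumptions the floor works out to a few petabytes per second, which is exactly the first bullet above.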

Data Challenges at this level include:

  • Scaling file system speeds and feeds
  • Scalable interoperable interfaces and protocols
  • Automating data distribution and fault mitigation
  • Enumerating and searching metadata across trillions of files
  • Assuring reliability and consistency for storage composed of hundreds of millions of components (see the arithmetic sketch after this list)
  • Software reliability of multi-layer, highly scaled, loosely coupled storage systems of the future
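
To see why component counts dominate the reliability challenge, here is a rough calculation assuming independent, exponentially distributed failures; the per-component MTBF is a typical spec-sheet figure, not a measurement from any cited system.

    # System MTBF under independent exponential failures scales as 1/N.
    # The per-component MTBF and component counts are assumptions.

    def system_mtbf_hours(component_mtbf_hours: float, n_components: int) -> float:
        """Expected hours between failures anywhere in the system."""
        return component_mtbf_hours / n_components

    disk_mtbf = 1_000_000  # hours; a common spec-sheet figure (~114 years)

    for n in (10**4, 10**6, 10**8):
        mtbf = system_mtbf_hours(disk_mtbf, n)
        print(f"{n:>11,} components -> one failure every {mtbf:.2f} hours")

At 10^8 components that is a failure roughly every 36 seconds, so fault handling must become a continuous background activity rather than an exceptional event.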

SUMMARY

There are significant issues in large-scale system integration that are not being addressed in other forums, such as current research portfolios or vendor user groups. Because of less-than-optimal integration technology, the time required to deploy, integrate, and stabilize a large-scale system may consume up to 25 percent of its useful life; for a machine with a four-year service life, that is roughly a year before full production. Improving the state of the art in large-scale systems integration therefore has the potential for great impact by increasing the scientific productivity of these systems.