4th Parallel Data Storage Workshop

held in conjunction with
Supercomputing '09

Chair: Garth Gibson, CMU

Sunday, November 15, 2009
9:00 a.m. - 5:30 p.m.
Oregon Convention Center, Portland, OR

SC09 Workshop Web Page

ACM Digital Library Proceedings

Workshop Abstract

Petascale computing infrastructures make petascale demands on information storage capacity, performance, concurrency, reliability, availability, and manageability. This one-day workshop focuses on the data storage problems and emerging solutions found in petascale scientific computing environments, with special attention to issues in which community collaboration can be crucial: problem identification, workload capture, solution interoperability, standards with community buy-in, and shared tools. The workshop seeks contributions on relevant topics, including but not limited to performance and benchmarking results and tools, failure tolerance problems and solutions, APIs for high-performance features, parallel file systems, high-bandwidth storage architectures, wide-area file systems, metadata-intensive workloads, autonomics for HPC storage, virtualization for storage systems, data-intensive and cloud storage, archival storage advances, and resource management innovations.


All papers presented at this workshop are also available online at the ACM Digital Library (table of contents of the proceedings).

8:55am - 9:00am
Welcome - Garth Gibson, Workshop Chair
9:00am - 10:00am
SESSION 1: Data-Intensive Cluster Storage
Mixing Hadoop and HPC Workloads on Parallel Filesystems
Esteban Molina-Estolano, Carlos Maltzahn, Scott Brandt, University of California Santa Cruz, Maya Gokhale, John May, Lawrence Livermore Nat Lab, John Bent, Los Alamos Nat Lab
Paper | Slides

DiskReduce: RAID for Data-Intensive Scalable Computing
Bin Fan, Wittawat Tantisiriroj, Lin Xiao, Garth Gibson, Carnegie Mellon University
Paper | Slides
10:00am - 10:30am
POSTER SESSION 1 - List of participants and links to posters
10:30am - 12:30pm
SESSION 2: Patterns in Petascale Storage Access
Data Layout Optimization for Petascale File Systems
Xian-He Sun, Yong Chen, Yanlong Yin, Illinois Institute of Technology
Paper | Slides

Case Studies in Storage Access by Loosely Coupled Petascale Applications
Justin M. Wozniak, Michael Wilde, Argonne Nat Lab
Paper | Slides

...And eat it too: High read performance in write-optimized HPC I/O middleware file formats
Milo Polte, Garth Gibson, Carnegie Mellon University, Jay Lofstead, Karsten Schwan, Matthew Wolf, Georgia Institute of Technology, John Bent, Meghan Wingate, Los Alamos Nat Lab, Scott A. Klasky, Qing Liu, Norbert Podhorszki, Oak Ridge Nat Lab, Manish Parashar, Rutgers University
Paper | Slides

Scalable I/O Tracing and Analysis
Karthik Vijayakumar, Frank Mueller, Xiaosong Ma, North Carolina State University, Philip C. Roth, Oak Ridge Nat Lab
Paper | Slides
12:30pm - 2:00pm
2:00pm - 3:00pm
SESSION 3: Integrating Enterprise Storage Features
pNFS, POSIX, and MPI-IO: A Tale of Three Semantics
Dean Hildebrand, Roger Haskin, IBM Almaden Research Center, Arifa Nisar, Northwestern University
Paper | Slides

Uncovering Errors: The Cost of Detecting Silent Data Corruption
Sumit Narayan, John A. Chandy, University of Connecticut, Samuel Lang, Philip Carns, Robert Ross, Argonne Nat Lab
Paper | Slides
3:00pm - 3:30pm
POSTER SESSION 2 - List of participants and links to posters
3:30pm - 4:30pm
SESSION 4: Integrating Databases
Fusing Data Management Services with File Systems
Scott Brandt, Carlos Maltzahn, Neoklis Polyzotis, Wang-Chiew Tan, University of California, Santa Cruz
Paper | Slides

Using the Active Storage Fabrics Model to Address Petascale Storage Challenges
Blake G. Fitch, Aleksandr Rayshubskiy, Michael C. Pitman, Robert S. Germain, IBM T.J. Watson Research Center, T.J. Christopher Ward, IBM Software Group Hursley Park
Paper | Slides
4:30pm - 5:00pm
Short Announcements (sign up onsite) & Town Hall Meeting
5:00pm - 5:30pm
POSTER SESSION 3 - List of participants and links to posters



Garth A. Gibson, Carnegie Mellon University and Panasas Inc.
Darrell Long, University of California, Santa Cruz
Peter Honeyman, University of Michigan, Ann Arbor,
    Center for Information Technology Integration
Gary A. Grider, Los Alamos National Laboratory
John Shalf, National Energy Research Scientific Computing Center,
    Lawrence Berkeley National Laboratory
Philip C. Roth, Oak Ridge National Laboratory
Evan J. Felix, Pacific Northwest National Laboratory
Lee Ward, Sandia National Laboratories
Rob Ross, Argonne National Laboratory
Karsten Schwan, Georgia Institute of Technology
William T. C. Kramer, National Center for Supercomputing Applications