pdsw 2020:

5th International Parallel Data Systems Workshop


HELD IN CONJUNCTION WITH SC20: THE INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS


DATE: November 12, 2020
VIRTUAL WORKSHOP

Time: 10:00 AM - 6:30 PM (EST)

SC Workshop page


SC20 now virtual

Our workshop, as with SC20, will be held virtually this year. Updates and info may be found at https://sc20.supercomputing.org/attend/virtual-event-faq/

-- links to papers and talks now available --

please refer to this web page for all deadline information


Program Chair:

Inria, France

Program Vice Chair:

Riken, Japan


Publicity Chair:

Inria, France

Web & Publications Chair:

Carnegie Mellon University

General Chair:

Argonne National Laboratory, USA


Reproducibility Co-Chairs:

University of California, Santa Cruz


University of California, Santa Cruz



keynote speaker

PDSW'20 is proud to announce that Nitin Agrawal, ThoughtSpot, will be our keynote speaker. He will highlight potential challenges, recent work, and research opportunities in designing systems to handle the discrepancy between data generation and access. Please watch for further details here.

Nitin Agrawal is a Principal Engineer at ThoughtSpot working on search infrastructure and in-memory databases. His past work is broadly in systems, with an emphasis on storage, mobile, and distributed systems. It has received multiple best-paper awards, an outstanding patent award, and widespread media attention, and has led to commercial and academic impact. He served as the program-committee chair for USENIX FAST '18 and HotStorage '16, and earned his doctorate in Computer Science from the University of Wisconsin - Madison. More info at http://pages.cs.wisc.edu/~nitina/.


agenda

 

This schedule is also available on the SC20 Website.

10:00am – 10:07am Welcome & Introduction
Slides
10:07am – 11:00am Keynote Speaker - Nitin Agrawal, ThoughtSpot
Sink or Swim: How Not to Drown in Colossal Streams of Data?
Slides
11:00am - 11:05am Break
Session 1 - Chair: Yong Chen, Texas Tech University
11:05am - 11:32am Keeping It Real: Why HPC Data Services Don't Achieve I/O Microbenchmark Performance
Philip Carns* (Argonne National Laboratory)
Kevin Harms (Argonne National Laboratory)
Bradley W. Settlemyer (Los Alamos National Laboratory)
Brian Atkinson (Los Alamos National Laboratory)
Robert B. Ross (Argonne National Laboratory)
Paper | Slides
11:32am - 11:55am Towards On-Demand I/O Forwarding in HPC Platforms
Jean Luca Bez* (Federal University of Rio Grande do Sul)
Francieli Z. Boito (University of Bordeaux, Inria, CNRS, Bordeaux-INP)
Alberto Miranda (Barcelona Supercomputing Center)
Ramon Nou (Barcelona Supercomputing Center)
Toni Cortes (Barcelona Supercomputing Center, Polytechnic University of Catalonia)
Philippe O. A. Navaux (Federal University of Rio Grande do Sul)
Paper | Slides
11:55am - 12:10pm Break
Session 2 - Chair: Jay Lofstead, Sandia National Laboratories
12:10pm – 12:35pm Gauge: An Interactive Data-Driven Visualization Tool for HPC Application I/O Performance Analysis
Eliakin del Rosario* (Texas A&M University)
Mikaela Currier* (Texas A&M University)
Mihailo Isakov (Texas A&M University)
Sandeep Madireddy (Argonne National Laboratory)
Prasanna Balaprakash (Argonne National Laboratory)
Philip Carns (Argonne National Laboratory)
Robert B. Ross (Argonne National Laboratory)
Kevin Harms (Argonne National Laboratory)
Shane Snyder (Argonne National Laboratory)
Michel A. Kinsy (Texas A&M University)
Paper | Slides
12:35pm - 12:58pm Fractional-Overlap Declustered Parity: Evaluating Reliability for Storage Systems
Huan Ke* (University of Chicago)
Haryadi S. Gunawi (University of Chicago)
Dominic Manno (Los Alamos National Laboratory)
David Bonnie (Los Alamos National Laboratory)
Bradley W. Settlemyer (Los Alamos National Laboratory)
Paper | Slides
12:58pm - 1:27pm GPU Direct I/O with HDF5
John Ravi* (North Carolina State University)
Suren Byna (Lawrence Berkeley National Laboratory)
Quincey Koziol (Lawrence Berkeley National Laboratory)
Paper | Slides
1:27pm - 2:30pm Break
Session 3 - Chair: Suren Byna, Lawrence Berkeley National Laboratory
2:30pm - 2:57pm Emulating I/O Behavior in Scientific Workflows on High Performance Computing Systems
Fahim Chowdhury* (Florida State University)
Yue Zhu (Florida State University)
Francesco Di Natale (Lawrence Livermore National Laboratory)
Adam Moody (Lawrence Livermore National Laboratory)
Elsa Gonsiorowski (Lawrence Livermore National Laboratory)
Kathryn Mohror (Lawrence Livermore National Laboratory)
Weikuan Yu (Florida State University)
Paper | Slides
2:57pm - 3:23pm Pangeo Benchmarking Analysis: Object Storage vs. POSIX File System
Haiying Xu* (National Center for Atmospheric Research)
Kevin Paul (National Center for Atmospheric Research)
Anderson Banihirwe (National Center for Atmospheric Research)
Paper | Slides
3:23pm - 3:46pm Fingerprinting the Checker Policies of Parallel File Systems
Runzhou Han* (Iowa State University)
Duo Zhang (Iowa State University)
Mai Zheng (Iowa State University)
Paper | Slides
3:46pm - 4:00pm Break
WIP Session - Chair: Jay Lofstead, Sandia National Laboratories
4:00pm - 4:05pm Deriving Storage Insights from the IO500
Luke Logan (Illinois Institute of Technology)
Jay Lofstead (Sandia National Laboratories)
Anthony Kougkas (Illinois Institute of Technology)
Xian-He Sun (Illinois Institute of Technology)
Abstract | Slides
4:05pm - 4:09pm I/O Traces of HPC Applications
Chen Wang (University of Illinois at Urbana-Champaign)
Kathryn Mohror (Lawrence Livermore National Laboratory)
Marc Snir (University of Illinois at Urbana-Champaign)
Abstract | Slides
4:09pm - 4:14pm Scalable Communication and Data Persistence Layer for NVM-based Storage Systems
Hiroki Ohtsuji (Fujitsu Laboratories Ltd)
Takuya Okamoto (Fujitsu Ltd.)
Erika Hayashi (Fujitsu Laboratories Ltd)
Eiji Yoshida (Fujitsu Laboratories Ltd)
Abstract | Slides
4:14pm - 4:25pm Live Q&A with speakers
4:25pm - 4:40pm Closing Remarks
4:40pm - 5:40pm Virtual Happy Hour: Live discussion open to all Workshop Attendees
https://meet.google.com/jef-brsy-zpd

WORKSHOP ABSTRACT


We are pleased to announce the 5th International Parallel Data Systems Workshop (PDSW’20). PDSW'20 will be hosted in conjunction with SC20: The International Conference for High Performance Computing, Networking, Storage and Analysis.

Efficient data storage and data management are crucial to scientific productivity in both traditional simulation-oriented HPC environments and Big Data analysis environments. Achieving both is made harder by the growing volume of experimental and observational data, the widening gap between the performance of computational hardware and storage hardware, and the emergence of new data-driven algorithms in machine learning.

The goal of this workshop is to facilitate research that addresses the most critical challenges in scientific data storage and data processing. We therefore encourage the community to submit original manuscripts that:

  • introduce and evaluate novel algorithms or architectures,
  • inform the community of important scientific case studies or workloads, or
  • validate the reproducibility of previously published work.

Special attention will be given to issues in which community collaboration is crucial for problem identification, workload capture, solution interoperability, standardization, and shared tools. We also strongly encourage papers to share complete experimental environment information (software version numbers, benchmark configurations, etc.) to facilitate collaboration.

Topics of interest include the following:

  • Scalable architectures for data storage, archival, and virtualization
  • Performance benchmarking, resource management, and workload studies
  • Programmability of storage systems
  • Parallel file systems, metadata management, and complex data management
  • Alternative data storage models, including object stores and key-value stores
  • Programming models and frameworks for data intensive computing
  • Techniques for data integrity, availability, reliability, and fault tolerance
  • Productivity tools for data intensive computing, data mining, and knowledge discovery
  • Application of emerging big data frameworks towards scientific computing and analysis
  • Enabling cloud and container-based models for scientific data analysis
  • Data filtering, compression, and reduction techniques
  • Tools and techniques for managing data movement among compute and data intensive components
  • Integrating computation into the memory and storage hierarchy to facilitate in-situ and in-transit data processing

CALL FOR PAPERS

 

CALL FOR PAPERS - updated with new submission deadlines!


Regular paper SUBMISSIONS

All papers will be evaluated by a competitive peer review process under the supervision of the workshop program committee. Selected papers and associated talk slides will be made available on the workshop web site. The papers will also be published by the IEEE TCHPC.

Authors of regular papers are strongly encouraged to submit Artifact Description (AD) Appendices that can help others reproduce and validate their experimental results. While the inclusion of AD Appendices is optional for PDSW'20, submissions accompanied by AD Appendices will be given favorable consideration for the PDSW Best Paper award. PDSW'20 follows the SC20 reproducibility and transparency initiative, and AD Appendices for PDSW'20 submissions will use the SC20 format. The AD should include a field for one or more links to data repositories (Zenodo, figshare, etc.) and code repositories (GitHub, GitLab, Bitbucket, etc.). For artifacts that will be placed in the code repository, we encourage authors to follow the PDSW'20 Artifact Packaging Guidelines on how to structure the artifact, as doing so will make it easier to use for the reviewing committee and for future readers of the paper.

Submit a previously unpublished paper as a PDF file, indicating authors and affiliations. Papers must be at most 5 pages long, in a font no smaller than 10 point, not including references and optional reproducibility appendices. Papers must use the IEEE conference paper template available at: https://www.ieee.org/conferences/publishing/templates.html.

Deadlines

Submissions deadline: Paper (in PDF format) due Sep. 6, 2020, 11:59 PM AoE
Submissions website: https://submissions.supercomputing.org/
Notification: Sep. 28, 2020
Copyright forms due: Oct. 18, 2020 - note new date
Pre-recorded presentation due: Oct. 7, 2020, 11:59 PM AoE
Slides due before workshop: Oct. 7, 2020, 11:59 PM AoE - note new date
Camera ready files due: Oct. 18, 2020, 11:59 PM AoE - note new date
* Submissions must be in the IEEE conference format


Work In Progress (WIP) Session


There will be a WIP session where presenters give brief (5-minute) talks on their ongoing work, with fresh problems and solutions. WIP content is typically material that may not be mature or complete enough for a full paper submission, and it will not be included in the proceedings. Authors are invited to submit a one-page abstract or a 5-minute pre-recorded presentation (voice over PowerPoint). Please use the IEEE conference paper template when preparing the one-page abstract. Feel free to condense the author list contents to offer more space.

Deadlines

Work in Progress (WIP) submissions due: Sep. 29, 2020, 11:59 PM AoE - note new date
Notification: On or before Oct. 1, 2020
Pre-recorded presentation due: Oct. 7, 2020, 11:59 PM AoE - note new date

Submissions by email: Please email your submission as a PDF attachment of the one-page abstract OR as a link to your pre-recorded presentation to and . Put "PDSW 2020 WIP" as the first part of the message subject. To verify your submission, a reply will be sent confirming that it has been officially received. If you do not receive such an email within 2 hours of the above deadline, please send the original submission again.


Workshop Registration

To attend the workshop, please register through the Supercomputing '20 registration page. Registration opens August 21, 2020.


PROGRAM COMMITTEE:

  • Olivier Beaumont, Inria, France
  • Jalil Boukhobza, University of Western Brittany, France
  • Suren Byna, Lawrence Berkeley National Laboratory, USA
  • Raghunath Raja Chandrasekar, Amazon Web Services, USA
  • Yong Chen, Texas Tech University, USA
  • Yue Cheng, George Mason University, USA
  • Jason Cope, Data Direct Networks, USA
  • Toni Cortes, Universitat Politècnica de Catalunya, Spain
  • Matthieu Dorier, Argonne National Laboratory, USA
  • Lisa Gerhardt, Lawrence Berkeley National Laboratory, USA
  • Elsa Gonsiorowski, Lawrence Livermore National Laboratory, USA
  • Bingsheng He, National University of Singapore, Singapore
  • Johann Lombardi, Intel Corporation, USA
  • Xiaoyi Lu, Ohio State University, USA
  • Xiaosong Ma, Qatar Computing Research Institute, Qatar
  • Diana Moise, Hewlett Packard Enterprise, Switzerland
  • Anna Queralt, Barcelona Supercomputing Center, Spain
  • Brad Settlemyer, Los Alamos National Laboratory, USA
  • Xuanhua Shi, Huazhong University of Science and Technology, China
  • Vasily Tarasov, IBM Corporation, USA
  • Osamu Tatebe, University of Tsukuba, Japan
  • Amelie Chi Zhou, Shenzhen University, China

STEERING COMMITTEE:

  • Jay Lofstead, Sandia National Laboratories - Chair
  • Dean Hildebrand, Google - Vice-Chair
  • John Bent, Seagate
  • Ali R. Butt, Virginia Tech
  • Philip Carns, Argonne National Laboratory
  • Shane Canon, Lawrence Berkeley National Laboratory
  • Raghunath Raja Chandrasekar, Amazon Web Services
  • Yong Chen, Texas Tech University
  • Evan J. Felix, Pacific Northwest National Laboratory
  • Gary Grider, Los Alamos National Laboratory
  • William D. Gropp, University of Illinois at Urbana-Champaign
  • Dries Kimpe, KCG, USA
  • Glenn Lockwood, Lawrence Berkeley National Laboratory
  • Xiaosong Ma, Qatar Computing Research Institute, Qatar
  • Carlos Maltzahn, University of California, Santa Cruz
  • Suzanne McIntosh, New York University
  • Kathryn Mohror, Lawrence Livermore National Laboratory
  • Robert Ross, Argonne National Laboratory
  • Philip C. Roth, Oak Ridge National Laboratory
  • John Shalf, NERSC, Lawrence Berkeley National Laboratory
  • Xian-He Sun, Illinois Institute of Technology
  • Rajeev Thakur, Argonne National Laboratory
  • Lee Ward, Sandia National Laboratories
  • Brent Welch, Google