International Parallel Data Systems Workshop

pdsw 2019:

4th International Parallel Data Systems Workshop

HELD IN CONJUNCTION WITH SC19: THE INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS

Monday, November 18, 2019
Room 601

Colorado Convention Center
Denver, CO

Time: 9:00am - 5:30pm

Location: Room 601

SC Workshop page

Program Co-Chairs:

Lawrence Berkeley National Laboratory

Argonne National Laboratory

Publicity Chair:

EPCC

Work-In-Progress Chair:

Sandia National Laboratory

Web & Publications Chair:

Carnegie Mellon University

General Chair:

New York University,
Courant Institute of Mathematical Sciences Center for Data Science

Vice General Chair:

Sandia National Laboratory

Reproducibility Co-Chairs:

University of California, Santa Cruz

University of California, Santa Cruz


keynote speaker

PDSW'19 is proud to announce that Haoyuan (H.Y.) Li of Alluxio will be our keynote speaker. His talk is titled "Alluxio - Data Orchestration for Analytics and AI in the Cloud." Please watch for further details here.

Haoyuan (H.Y.) Li is the Founder, Chairman, and CTO of Alluxio. He holds a PhD in computer science from UC Berkeley’s AMPLab, where he co-created the Alluxio (formerly Tachyon) open source data orchestration system, co-created Apache Spark Streaming, and became an Apache Spark founding committer. He also holds an MS from Cornell University and a BS from Peking University, both in computer science.

agenda

Information on scheduling is available here.

9:00am – 9:10am	Welcome & Introduction Slides
9:10am – 10:00am	Keynote Speaker - Haoyuan (H.Y.) Li, Alluxio: Alluxio - Data Orchestration for Analytics and AI in the Cloud Slides
10:00am – 10:30am	Break
Session 1 - Chair: Yong Chen, Texas Tech University
10:30am - 10:55am	In Search of a Fast and Efficient Serverless DAG Engine *Benjamin Carver (George Mason University) Jingyuan Zhang (George Mason University) Ao Wang (George Mason University) Yue Cheng (George Mason University) Paper \| Slides
10:55am - 11:20am	Enabling Transparent Asynchronous I/O using Background Threads *Houjun Tang (Lawrence Berkeley National Laboratory) Quincey Koziol (Lawrence Berkeley National Laboratory) Suren Byna (Lawrence Berkeley National Laboratory) John Mainzer (HDF Group) Tonglin Li (Lawrence Berkeley National Laboratory) Paper \| Slides
11:20am - 11:40am	WIP Session 1
	Exploiting Different Storage Types with the Earth-System Data Middleware *Julian Kunkel, University of Reading Luciana Pedro, University of Reading Bryan Lawrence, University of Reading Sandro Fiore, CMCC Foundation Huang Hua, Seagate Technology LLC Abstract \| Slides
	Improving I/O Performance of HPC Application Using Intra-Job Scheduling *Arnab K. Paul, Virginia Tech Olaf Faaland, Lawrence Livermore National Laboratory Adam Moody, Lawrence Livermore National Laboratory Elsa Gonsiorowski, Lawrence Livermore National Laboratory Kathryn Mohror, Lawrence Livermore National Laboratory Ali R. Butt, Virginia Tech Lawrence Livermore National Laboratory Abstract \| Slides
	I/O Characteristics of Scientific Applications *Chen Wang, University of Illinois at Urbana-Champaign Adam Moody, Lawrence Livermore National Laboratory Elsa Gonsiorowski, Lawrence Livermore National Laboratory Kathryn Mohror, Lawrence Livermore National Laboratory Marc Snir, University of Illinois at Urbana-Champaign Abstract \| Slides
	fs123: A Scalable, Read-only Network Filesystem *John K. Salmon, D. E. Shaw Research Michael Fenn, D. E. Shaw Research Mark A. Moraes, D. E. Shaw Research David E. Shaw, D. E. Shaw Research Abstract \| Slides
Session 2 - Chair: Ron Oldfield, Sandia National Laboratories
11:40am – 12:05pm	Active Learning-based Automatic Tuning and Prediction of Parallel I/O Performance Megha Agarwal (Indian Institute of Technology Kanpur) Divyansh Singhvi (Indian Institute of Technology Kanpur) Preeti Malakar (Indian Institute of Technology Kanpur) *Suren Byna (Lawrence Berkeley National Laboratory) Paper \| Slides
12:05pm - 12:30pm	Applying Machine Learning to Understand Write Performance of Large-scale Parallel Filesystems *Bing Xie (Oak Ridge National Laboratory) Zilong Tan (Carnegie Mellon University) Philip Carns (Argonne National Laboratory) Jeff Chase (Duke University) Kevin Harms (Argonne National Laboratory) Jay Lofstead (Sandia National Laboratories) Sarp Oral (Oak Ridge National Laboratory) Sudharshan Vazhkudai (Oak Ridge National Laboratory) Feiyi Wang (Oak Ridge National Laboratory) Paper \| Slides
12:30pm - 2:00pm	Lunch Break
2:00pm - 2:40pm	PANEL: A House Divided: Why Don't Cloud Storage and HPC Storage Share More Technology? Presenters: Philip Carns, Glenn K. Lockwood Panelists: Brent Welch, Raghu Raja, Evan Burness Panel Notes \| Panel Slides
2:40pm - 3:00pm	WIP Session 2
	Discoverable Metadata for System Monitoring Data *S. Leak, NERSC A. Greiner, NERSC A. Gentile, NERSC J. Brant, NERSC Abstract \| Slides
	Integrating I/O Measurement into Performance Optimisation and Productivity (POP) Metrics *Radita Tapaning Hesti Liem, RWTH Aachen University Abstract \| Slides
	Profiling Composable HPC Data Services *Srinivasan Ramesh, University of Oregon Philip Carns, Argonne National Laboratory Robert Ross, Argonne National Laboratory Shane Snyder, Argonne National Laboratory Allen Malony, University of Oregon Abstract \| Slides
	Semi-Automatic Assessment of IO Behavior *Eugen Betke, DKRZ Julian Kunkel, University of Reading Abstract \| Slides
3:00pm - 3:30pm	Break
Session 3 - Chair: Brad Settlemyer, LANL
3:30pm - 3:55pm	Towards Physical Design Management in Storage Systems Kathryn Dahlgren (University of California, Santa Cruz) *Jeff LeFevre (University of California, Santa Cruz) Ashay Shirwadkar (University of California, Riverside) Ken Iizawa (Fujitsu Laboratories Ltd.) Aldrin Montana (University of California, Santa Cruz) Peter Alvaro (University of California, Santa Cruz) Carlos Maltzahn (University of California, Santa Cruz) Paper \| Slides
3:55pm - 4:20pm	A Foundation for Automated Placement of Data Douglas Otstott (Arizona State University) Sean Williams (New Mexico Consortium) *Latchesar Ionkov (Los Alamos National Laboratory) Michael Lang (Los Alamos National Laboratory) Ming Zhao (Arizona State University) Paper \| Slides
4:20pm - 4:45pm	Profiling Platform Storage Using IO500 and Mistral Nolan Monnier (Oral Roberts University, Sandia National Laboratories) Jay Lofstead (Sandia National Laboratories) Margaret Lawson (Sandia National Laboratories, University of Illinois) Matthew Curry (Sandia National Laboratories) Paper \| Slides
4:45pm - 5:10pm	Understanding Data Motion in the Modern HPC Data Center Glenn K. Lockwood (Lawrence Berkeley National Laboratory) Shane Snyder (Argonne National Laboratory) Suren Byna (Lawrence Berkeley National Laboratory) Philip Carns (Argonne National Laboratory) Nicholas J. Wright (Lawrence Berkeley National Laboratory) Paper \| Slides
5:10pm - 5:30pm	WIP Session 3
	Understanding Performance Bottlenecks to Improve Parallel Efficiency of Louvain Algorithm *Naw Safrin Sattar, University of New Orleans Shaikh Arifuzzaman, University of New Orleans Abstract \| Slides
	Unifying Tradeoffs in Parallel Data Systems *Xiaotian Yin, Futurewei Technologies Jian Li, Futurewei Technologies Tingqiu Tim Yuan, Huawei Technologies Abstract \| Slides
	Scalable Data Processing at Network transfer rates with nCorium Compute in Memory Modules Suresh Devalapalli, nCorium Brett Neuman, Los Alamos National Laboratory Arvindh Lalam, nCorium Abstract \| Slides
	Mitigating the Impact of Tail Latency of Storage Systems on Scalable Deep Learning *Hiroki Ohtsuji, Fujitsu Laboratories Ltd. Erika Hayashi, Fujitsu Laboratories Ltd. Naoto Fukumoto, Fujitsu Laboratories Ltd. Eiji Yoshida, Fujitsu Laboratories Ltd. Takuya Okamoto, Fujitsu Ltd. Takeru Kuramoto, University of Tsukuba Osamu Tatebe, University of Tsukuba Abstract \| Slides

WORKSHOP ABSTRACT

We are pleased to announce the 4th International Parallel Data Systems Workshop (PDSW’19). PDSW'19 will be hosted in conjunction with SC19: The International Conference for High Performance Computing, Networking, Storage and Analysis.

Efficient data storage and data management are crucial to scientific productivity in both traditional simulation-oriented HPC environments and Big Data analysis environments. Achieving this efficiency is made harder by the growing volume of experimental and observational data, the widening gap between the performance of computational hardware and storage hardware, and the emergence of new data-driven algorithms in machine learning.

The goal of this workshop is to facilitate research that addresses the most critical challenges in scientific data storage and data processing. We therefore encourage the community to submit original manuscripts that:

introduce and evaluate novel algorithms or architectures,
inform the community of important scientific case studies or workloads, or
validate the reproducibility of previously published work

Special attention will be given to issues in which community collaboration is crucial for problem identification, workload capture, solution interoperability, standardization, and shared tools. We also strongly encourage papers to share complete experimental environment information (software version numbers, benchmark configurations, etc.) to facilitate collaboration.

Topics of interest include the following:

Scalable architectures for data storage, archival, and virtualization
Performance benchmarking, resource management, and workload studies
Programmability of storage systems
Parallel file systems, metadata management, and complex data management
Alternative data storage models, including object stores and key-value stores
Programming models and frameworks for data intensive computing
Techniques for data integrity, availability, reliability, and fault tolerance
Productivity tools for data intensive computing, data mining, and knowledge discovery
Application of emerging big data frameworks towards scientific computing and analysis
Enabling cloud and container-based models for scientific data analysis
Data filtering/compressing/reduction techniques
Tools and techniques for managing data movement among compute and data intensive components
Integrating computation into the memory and storage hierarchy to facilitate in-situ and in-transit data processing

CALL FOR PAPERS

CALL FOR PAPERS - now available

Regular paper SUBMISSIONS

All papers will be evaluated by a competitive peer review process under the supervision of the workshop program committee. Selected papers and associated talk slides will be made available on the workshop web site. The papers will also be published by the IEEE TCHPC.

Authors are also strongly encouraged to automate the reproducibility and validation of their experimental results. Submissions that are accompanied by URLs to resources that allow reviewers to repeat automatic validation will be given favorable consideration for the PDSW Best Paper award. The PDSW reproducibility initiative will do its best to provide infrastructure and resources to support automated reproducibility and validation. PDSW reviewers, while appreciative, might not be able to validate non-automated artifact descriptions and evaluations included in (optional) reproducibility appendices. Read detailed information on the PDSW reproducibility initiative (bit.ly/pdsw-automatic).

Submit a previously unpublished paper as a PDF file and indicate authors and affiliations. Papers must be between 6 and 10 pages long, including references but not including optional reproducibility appendices. Papers must use the IEEE conference paper template available at: https://www.ieee.org/conferences/publishing/templates.html.

Deadlines

Submissions deadline: Paper (in PDF format) due ~~Sep. 1, 2019, 11:59 PM AoE~~
SUBMISSION CLOSED

Submissions website: https://submissions.supercomputing.org/
Notification: September 29, 2019
Copyright forms due: TBD
Camera ready files due: TBD
Slides due before workshop: November 10, 2019 to jdigney@cs.cmu.edu
* Submissions must be in the IEEE conference format

Work In Progress Session

There will be a WIP session where presenters provide brief (5-minute) talks on their on-going work, with fresh problems/solutions. WIP content is typically material that may not be mature or complete enough for a full paper submission and will not be included in the proceedings. A one-page abstract is required. Please use the IEEE conference paper template. Feel free to condense the author list contents to offer more space.

Deadlines

Work in Progress (WIP) submissions due: Nov. 3, 2019, 11:59 PM AoE
WIP Notification: Nov. 10, 2019
Submissions by email: Please email all submissions to Jay Lofstead at gflofst@sandia.gov. Put "PDSW 2019 WIP" as the first part of the message subject. A reply will be sent to confirm that your submission was received. If you do not receive such an email within 2 hours of the above deadline, please forward the original submission again.

Workshop Registration

To attend the workshop, please register through the Supercomputing '19 registration page. Registration opens July 11, 2019.

PROGRAM COMMITTEE:

Yong Chen, Texas Tech University
Yue Cheng, George Mason University
Jason Cope, DDN Storage
Stratos Efstathiadis, New York University
Lisa Gerhardt, Lawrence Berkeley National Laboratory
Elsa J. Gonsiorowski, Lawrence Livermore National Laboratory
Jian Huang, University of Illinois
Shadi Ibrahim, French Institute for Research in Computer Science and Automation (INRIA)
Sidharth Kumar, University of Alabama
Julian Kunkel, University of Reading
Johann Lombardi, Intel Corporation
Xiaoyi Lu, Ohio State University
Pierre Matri
Ron Oldfield, Sandia National Laboratories
Sangmi Pallickara, Colorado State University
Vasily Tarasov, IBM
Osamu Tatebe, University of Tsukuba
Gala Yadgar, Technion - Israel Institute of Technology
Amelie Chi Zhou, Shenzhen University

STEERING COMMITTEE:

John Bent, Seagate
Ali R. Butt, Virginia Tech
Shane Canon, Lawrence Berkeley National Laboratory
Raghunath Raja Chandrasekar, Amazon Web Services
Yong Chen, Texas Tech University
Evan J. Felix, Pacific Northwest National Laboratory
Gary Grider, Los Alamos National Laboratory
William D. Gropp, University of Illinois at Urbana-Champaign
Dean Hildebrand, Google
Dries Kimpe, 3 Red Partners
Jay Lofstead, Sandia National Laboratories
Xiaosong Ma, Qatar Computing Research Institute, Qatar
Carlos Maltzahn, University of California, Santa Cruz
Suzanne McIntosh, New York University
Kathryn Mohror, Lawrence Livermore National Laboratory
Robert Ross, Argonne National Laboratory
Philip C. Roth, Oak Ridge National Laboratory
John Shalf, Lawrence Berkeley National Laboratory
Xian-He Sun, Illinois Institute of Technology
Rajeev Thakur, Argonne National Laboratory
Lee Ward, Sandia National Laboratories
Brent Welch, Google
