pdsw 2022:

7th International Parallel Data Systems Workshop


HELD IN CONJUNCTION WITH SC22: THE INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS

In cooperation with: IEEE Computer Society


DATE: November 14, 2022
Kay Bailey Hutchison Convention Center
Dallas, TX

Time: 1:30 PM - 5:00 PM (CST)
Room C148
SC Workshop page


 

Program Co-Chairs:

Hong Kong Baptist University, China


Oak Ridge National Laboratory, USA

Reproducibility Co-Chairs:


University of California, Santa Cruz


Leiden University, Netherlands
General Chair:

RIKEN, Japan

Publicity Chair:

Lawrence Berkeley National Laboratory, USA

Web & Publications Chair:

Carnegie Mellon University

submissions closed
PDSW22 Reproducibility Addendum


Invited speaker


Arif Merchant, Google

Splinters - Distributed IO Sampling for Cloud Data Centers – Design and Applications
Splinters is a distributed system for sampling IO metadata in Google data centers. It ... is the main engine for the analysis of storage systems and workloads in Google.


agenda

1:30pm - 1:31pm Organizers' Welcome & Introduction
Kento Sato, Amelie Chi Zhou, Bing Xie, Jay Lofstead
1:31pm - 1:40pm Opening Remarks
Kento Sato, Amelie Chi Zhou, Bing Xie, Jay Lofstead
Slides
INVITED TALK SESSION: Chair -- Kento Sato, RIKEN
1:40pm - 2:30pm Invited Speaker - Arif Merchant, Google
Splinters - Distributed IO Sampling for Cloud Data Centers – Design and Applications
Slides
SESSION 1: Chair -- Jay Lofstead, Sandia National Labs
2:30pm - 2:50pm Drishti: Guiding End-Users in the I/O Optimization Journey
Jean Luca Bez, Lawrence Berkeley National Laboratory (LBNL)
Hammad Ather, Lawrence Berkeley National Laboratory (LBNL)
Suren Byna, Lawrence Berkeley National Laboratory (LBNL)
Paper | Slides
2:50pm - 2:55pm (WiP) Revisit Data Partitioning in Data-Intensive Workflows
Radita Liem, RWTH Aachen University, IT Center
Shadi Ibrahim, French Institute for Research in Computer Science and Automation (INRIA)
Paper | Slides
2:55pm - 3:00pm (WiP) Data Lifecycles for Optimizing Data Movement
Hyungro Lee, Pacific Northwest National Laboratory (PNNL)
Jesun Firoz, Pacific Northwest National Laboratory (PNNL)
Nathan R. Tallent, Pacific Northwest National Laboratory (PNNL)
Meng Tang, Illinois Institute of Technology
Anthony Kougkas, Illinois Institute of Technology
Xian-He Sun, Illinois Institute of Technology
Paper | Slides
3:00pm - 3:30pm Afternoon Break
SESSION 2: Chair -- Margaret Lawson, Google
3:30pm - 3:50pm Performance Comparison of DAOS and Lustre for Object Data Storage Approaches
Nicolau Manubens Gil, European Centre for Medium-Range Weather Forecasts (ECMWF)
Simon Smart, European Centre for Medium-Range Weather Forecasts (ECMWF)
Tiago Quintino, European Centre for Medium-Range Weather Forecasts (ECMWF)
Adrian Jackson, Edinburgh Parallel Computing Centre (EPCC)
Paper | Slides
3:50pm - 4:10pm Accelerating Flash-X Simulations with Asynchronous I/O
Rajeev Jain, Argonne National Laboratory (ANL)
Houjun Tang, Argonne National Laboratory (ANL)
Akash Dhruv, Argonne National Laboratory (ANL)
Austin Harris, Oak Ridge National Laboratory (ORNL)
Suren Byna, Lawrence Berkeley National Laboratory (LBNL)
Paper | Slides
4:10pm - 4:15pm (WiP) Dask-Enabled External Tasks for In Transit Analytics
Amal Gueroudji, Atomic Energy and Alternative Energies Commission (CEA)
Julien Bigot, Atomic Energy and Alternative Energies Commission (CEA)
Bruno Raffin, INRIA
Paper | Slides
SESSION 3: Chair -- Suren Byna, Lawrence Berkeley National Lab
4:15pm - 4:35pm DENKV: Addressing Design Trade-Offs of Key-Value Stores for Scientific Applications
Safdar Jamil, Sogang University, South Korea
Awais Khan, Oak Ridge National Laboratory (ORNL)
Kihyun Kim, Sogang University, South Korea
Jae-Kook Lee, Korea Inst. of Science and Technology Information (KISTI)
Dosil An, Korea Inst. of Science and Technology Information (KISTI)
Taeyoung Hong, Korea Inst. of Science and Technology Information (KISTI)
Sarp Oral, Oak Ridge National Laboratory (ORNL)
Youngjae Kim, Sogang University, South Korea
Paper | Slides
4:35pm - 4:55pm BTS: Exploring Effects of Background Task-Aware Scheduling for Key-Value CSDs
Yeohyeon Park, Sogang University, South Korea
Chang-Gyu Lee, Sogang University, South Korea
Seungjin Lee, Sogang University, South Korea
Inhyuk Park, SK hynix
Soonyeal Yang, SK hynix
Woosuk Chung, SK hynix
Youngjae Kim, Sogang University, South Korea
Paper | Slides
4:55pm - 5:00pm PDSW22 – Closing Remarks
Slides

WORKSHOP ABSTRACT


We are pleased to announce the 7th International Parallel Data Systems Workshop (PDSW’22). PDSW’22 will be hosted in conjunction with SC22: The International Conference for High Performance Computing, Networking, Storage and Analysis.

Efficient data storage and data management are crucial to scientific productivity in both traditional simulation-oriented HPC environments and Big Data analysis environments. The challenge of providing both is exacerbated by the growing volume of experimental and observational data, the widening gap between the performance of computational hardware and storage hardware, and the emergence of new data-driven algorithms in machine learning. The goal of this workshop is to facilitate research that addresses the most critical challenges in scientific data storage and data processing. PDSW will continue to build on the successful tradition established by its predecessor workshops: the Petascale Data Storage Workshop (PDSW, 2006-2015) and the Data Intensive Scalable Computing Systems workshop (DISCS, 2012-2015). These workshops were successfully combined in 2016, and the resulting joint workshop has attracted up to 38 full paper submissions and 140 attendees per year from 2016 to 2021.

We encourage the community to submit original manuscripts that:

  • introduce and evaluate novel algorithms or architectures,
  • inform the community of important scientific case studies or workloads, or
  • validate the reproducibility of previously published work.

Special attention will be given to issues in which community collaboration is crucial for problem identification, workload capture, solution interoperability, standardization, and shared tools. We also strongly encourage papers to share complete experimental environment information (software version numbers, benchmark configurations, etc.) to facilitate collaboration.

Topics of interest include the following:

  • Scalable architectures for distributed data storage, archival, and virtualization
  • The application of new data processing models and algorithms towards scientific computing and analysis
  • Performance benchmarking, resource management, and workload studies
  • Enabling cloud and container-based models for scientific data analysis
  • Techniques for data integrity, availability, reliability, and fault tolerance
  • Programming models and big data frameworks for data intensive computing
  • Hybrid cloud/on-premise data processing
  • Cloud-specific data storage and transit costs and opportunities
  • Programmability of storage systems
  • Data filtering/compressing/reduction techniques
  • Parallel file systems, metadata management, and complex data management
  • Integrating computation into the memory and storage hierarchy to facilitate in-situ and in-transit data processing
  • Alternative data storage models, including object stores and key-value stores
  • Productivity tools for data intensive computing, data mining, and knowledge discovery
  • Tools and techniques for managing data movement among compute and data intensive components
  • Cross-cloud data management
  • Storage system optimization and data analytics with machine learning
  • Innovative techniques and performance evaluation for new memory and storage systems

CALL FOR PAPERS

 

Call for papers available now (pdf). [updated August 12, 2022]


Regular paper SUBMISSIONS

All papers will be evaluated by a competitive peer review process under the supervision of the workshop program committee. Selected papers and associated talk slides will be made available on the workshop web site. The papers will also be published by the IEEE Computer Society.

Authors of regular papers are strongly encouraged to submit Artifact Description (AD) Appendices that can help to reproduce and validate their experimental results. While the inclusion of the AD Appendices is optional for PDSW’22, submissions that are accompanied by AD Appendices will be given favorable consideration for the PDSW Best Paper award.

PDSW’22 follows the SC22 Reproducibility Initiative (see Addendum). Artifact Description (AD) Appendices for PDSW’22 submissions use the SC22 format. The AD should include a field for one or more links to data (Zenodo, figshare, etc.) and code (GitHub, GitLab, Bitbucket, etc.) repositories. For artifacts placed in the code repository, we encourage authors to follow the PDSW22 guidelines on how to structure the artifact, as this will make it easier for the reviewing committee and for future readers of the paper.

Submit a previously unpublished paper as a PDF file, and indicate authors and affiliations. Papers must be at most 5 pages long, in a font no smaller than 10 point, excluding references and optional reproducibility appendices. Papers must use the IEEE conference paper template.

Deadlines - Regular Papers and Reproducibility Study Papers

Submissions due: Aug. 20, 2022, 11:59 PM AoE
Submissions website: https://submissions.supercomputing.org/
Notification: Sep. 9, 2022
Copyright forms due: TBD
Slides due before workshop: TBD
Camera ready files due: Sep. 30, 2022, 11:59 PM AoE


Work In Progress (WIP) Session


There will be a WIP session where presenters give brief 5-minute talks on their ongoing work, with fresh problems/solutions. WIP content is typically material that may not be mature or complete enough for a full paper submission and will not be included in the proceedings. A one-page abstract is required.

Deadlines - Work in Progress (WIP)

Work in Progress (WIP) submissions due: Sep. 16, 2022, 11:59 PM AoE
Notification: On or before Sep. 23, 2022
Submissions website: https://submissions.supercomputing.org/


Workshop Registration

Registration opens July 13, 2022. To help you prepare, details on registration pricing and on the policies affecting registration changes and cancellations will be available here on July 13.


PROGRAM COMMITTEE:

 

  • Jalil Boukhobza, University of Western Brittany, France
  • Suren Byna, Lawrence Berkeley National Laboratory
  • Yong Chen, Texas Tech University
  • Wei Der Chen, University of Edinburgh
  • Dong Dai, University of North Carolina at Charlotte
  • Matthieu Dorier, Argonne National Laboratory (ANL)
  • Bogdan Ghit, Databricks
  • Qian Gong, Oak Ridge National Laboratory
  • Luanzheng Guo, Pacific Northwest National Laboratory
  • Shadi Ibrahim, Inria
  • Tanzima Islam, Texas State University
  • Youngjae Kim, Sogang University
  • Johann Lombardi, Intel Corporation
  • Xiaoyi Lu, University of California, Merced
  • Xiaosong Ma, Qatar Computing Research Institute
  • Kathryn Mohror, Lawrence Livermore National Laboratory
  • Diana Moise, Hewlett Packard Enterprise
  • Sarah Neuwirth, Habilitation Candidate at Goethe University
  • M. Mustafa Rafique, Rochester Institute of Technology
  • Raghunath Raja Chandrasekar, Stealth Startup
  • Michael Schöttner, Duesseldorf University
  • Vasily Tarasov, IBM Corporation
  • Qing Zheng, Los Alamos National Lab

STEERING COMMITTEE:

  • John Bent, Cray
  • Ali R. Butt, Virginia Tech
  • Philip Carns, Argonne National Laboratory
  • Shane Canon, Lawrence Berkeley National Laboratory
  • Raghunath Raja Chandrasekar, Amazon Web Services
  • Yong Chen, Texas Tech University
  • Evan J. Felix, Pacific Northwest National Laboratory
  • Gary Grider, Los Alamos National Laboratory
  • William D. Gropp, University of Illinois at Urbana-Champaign
  • Dean Hildebrand, Google
  • Shadi Ibrahim, Inria, France
  • Dries Kimpe, KCG, USA
  • Glenn Lockwood, Lawrence Berkeley National Laboratory
  • Jay Lofstead, Sandia National Laboratories
  • Xiaosong Ma, Qatar Computing Research Institute, Qatar
  • Carlos Maltzahn, University of California, Santa Cruz
  • Suzanne McIntosh, New York University
  • Kathryn Mohror, Lawrence Livermore National Laboratory
  • Robert Ross, Argonne National Laboratory
  • Philip C. Roth, Oak Ridge National Laboratory
  • Kento Sato, RIKEN, Japan
  • John Shalf, NERSC, Lawrence Berkeley National Laboratory
  • Xian-He Sun, Illinois Institute of Technology
  • Rajeev Thakur, Argonne National Laboratory
  • Lee Ward, Sandia National Laboratories
  • Brent Welch, Google
  • Amelie Chi Zhou, Shenzhen University, China