PDSW-DISCS 2017:

2nd Joint International Workshop on Parallel Data Storage & Data Intensive Scalable Computing Systems


HELD IN CONJUNCTION WITH SC17: THE INTERNATIONAL CONFERENCE
FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS.

Monday, November 13, 2017
Denver, CO

Time: 9:00am - 6:00pm

Location: Room 601
SC Workshop page


Program Co-Chairs:

Kathryn Mohror, Lawrence Livermore National Laboratory

Brent Welch, Google
General Chair:

Google


KEYNOTE SPEAKER

PDSW-DISCS17 is proud to announce that Denis Serenyi, Google, will be our keynote speaker. His talk is titled "From GFS to Colossus: Cluster-Level Storage @ Google."




AGENDA


8:50am – 9:00am Welcome & Introduction
9:00am – 10:00am Keynote Speaker: Denis Serenyi, Google
From GFS to Colossus: Cluster-Level Storage @ Google
Slides
10:00am – 10:30am Break
10:30am – 12:00pm SESSION 1: Improving Storage System Performance
Chair: Suren Byna, Lawrence Berkeley National Laboratory
  EMPRESS—Extensible Metadata PRovider for Extreme-scale Scientific Simulations
Margaret Lawson (Sandia National Laboratories and Dartmouth College)
Jay Lofstead (Sandia National Laboratories)
Scott Levy (Sandia National Laboratories)
Patrick Widener (Sandia National Laboratories)
Craig Ulmer (Sandia National Laboratories)
Shyamali Mukherjee (Sandia National Laboratories)
Gary Templet (Sandia National Laboratories)
Todd Kordenbrock (DXC Technology)
Paper | Slides
  Taming Metadata Storms in Parallel Filesystems with MetaFS
Tim Shaffer (University of Notre Dame)
Douglas Thain (University of Notre Dame)
Paper | Slides
  Architecting HBM as a High Bandwidth, High Capacity, Self-Managed Last-Level Cache
Tyler Stocksdale (North Carolina State University)
Mu-Tien Chang (Samsung)
Hongzhong Zheng (Samsung Semiconductor Inc.)
Frank Mueller (North Carolina State University)
Paper | Slides
11:45am – 12:00pm WIP SESSION 1
  Direct-FUSE: Removing the Middleman for High-Performance FUSE File System Support
Yue Zhu
Teng Wang
Kathryn Mohror
Adam Moody
Kento Sato
Muhib Khan
Weikuan Yu
Abstract | Slides
  Accelerating the Data Deduplication Performance with GPU in Hybrid Storage Systems
Prince Hamandawana
Awais Khan
Changgyu Lee
Sungyong Park
Youngjae Kim
Abstract | Slides
  NUMA-Aware Thread and Resource Scheduling for Terabit Data Movement
Taeuk Kim
Awais Khan
Youngjae Kim
Sungyong Park
Scott Atchley
Abstract | Slides
12:00pm – 1:30pm Lunch (not provided)
1:30pm – 3:00pm SESSION 2: Scalability of Storage Systems
Chair: Carlos Maltzahn, University of California, Santa Cruz
  Optimized Scatter/Gather Data Operations for Parallel Storage
Latchesar Ionkov (Los Alamos National Laboratory)
Carlos Maltzahn (University of California, Santa Cruz)
Michael Lang (Los Alamos National Laboratory)
Paper | Slides
  Software-Defined Storage for Fast Trajectory Queries using a DeltaFS Indexed Massive Directory
Qing Zheng (Carnegie Mellon University)
George Amvrosiadis (Carnegie Mellon University)
Saurabh Kadekodi (Carnegie Mellon University)
Garth A. Gibson (Carnegie Mellon University)
Charles D. Cranor (Carnegie Mellon University)
Bradley W. Settlemyer (Los Alamos National Laboratory)
Gary Grider (Los Alamos National Laboratory)
Fan Guo (Los Alamos National Laboratory)
Paper | Slides
  CoSS: Proposing a Contract-Based Storage System for HPC
Matthieu Dorier (Argonne National Laboratory)
Matthieu Dreher (Argonne National Laboratory)
Tom Peterka (Argonne National Laboratory)
Robert Ross (Argonne National Laboratory)
Paper | Slides
2:45pm – 3:00pm WIP SESSION 2
  mpiFileUtils: A Parallel and Distributed Toolset for Managing Large Datasets
Danielle Sikich
Giuseppe Di Natale
Matthew Legendre
Adam Moody
Abstract | Slides
  Resource Requirement Specification for Novel Data-aware and Workflow-enabled HPC Job Schedulers
Emmanouil Farsarakis
Iakovos Panourgias
Adrian Jackson
Juan F. R. Herrera
Michele Weiland
Mark Parsons
Abstract | Slides
  A Study of NVRAM Performance Variability under Concurrent I/O Accesses
Anthony Kougkas
Hariharan Devarajan
Xian-He Sun
Abstract | Slides
3:00pm – 3:30pm Break
3:30pm – 5:10pm SESSION 3: Understanding I/O Performance
Chair: Elsa Gonsiorowski, Lawrence Livermore National Laboratory
  Diving into Petascale Production File Systems through Large Scale Profiling and Analysis
Feiyi Wang (Oak Ridge National Laboratory)
Hyogi Sim (Oak Ridge National Laboratory)
Cameron Harr (Lawrence Livermore National Laboratory)
Sarp Oral (Oak Ridge National Laboratory)
Paper | Slides
  Performance Analysis of Emerging Data Analytics and HPC Workloads
Christopher Daley (Lawrence Berkeley National Laboratory)
Sudip Dosanjh (Lawrence Berkeley National Laboratory)
Prabhat (Lawrence Berkeley National Laboratory)
Nicholas Wright (Lawrence Berkeley National Laboratory)
Paper | Slides
  Toward Scalable Monitoring on Large-Scale Storage for Software Defined Cyberinfrastructure
Arnab K. Paul (Virginia Tech)
Ryan Chard (Argonne National Laboratory)
Kyle Chard (University of Chicago)
Steven Tuecke (University of Chicago)
Ali R. Butt (Virginia Tech)
Ian Foster (Argonne National Laboratory, University of Chicago)
Paper | Slides
  UMAMI: A Recipe for Generating Meaningful Metrics through Holistic I/O Performance Analysis
Glenn Lockwood (Lawrence Berkeley National Laboratory)
Shane Snyder (Argonne National Laboratory)
Wucherl Yoo (Lawrence Berkeley National Laboratory)
Kevin Harms (Argonne National Laboratory)
Zachary Nault (Argonne National Laboratory)
Suren Byna (Lawrence Berkeley National Laboratory)
Philip Carns (Argonne National Laboratory)
Nicholas Wright (Lawrence Berkeley National Laboratory)
Paper | Slides
5:10pm – 5:25pm Break
5:25pm – 6:00pm WIP SESSION 3
Chair: Jay Lofstead, Sandia National Laboratories
  Exploring Use-cases for Non-Volatile Memories in support of HPC Resilience
Onkar Patil
Saurabh Hukerikar
Frank Mueller
Christian Engelmann
Abstract | Slides
  Evaluating Performance of Burst Buffer Models for Real-World Application Workloads in HPC Systems
Harsh Khetawat
Frank Mueller
Christopher Zimmer
Abstract | Slides
  Towards Structure-Aware Earth System Data Management
Jakob Lüttgau
Julian Kunkel
Bryan N. Lawrence
Abstract | Slides
  I/O Mini-apps, Compression, and I/O Libraries for Physics-based Simulations
Sean Ziegeler
Scot Breitenfeld
Jose Renteria
Jordan Henderson
Abstract | Slides
  Compiler-Assisted Scientific Workflow Optimization
Hadia Ahmed
Peter Pirkelbauer
Purushotham Bangalore
Anthony Skjellum
Abstract | Slides
  Micro-Storage Services for Open Ethernet Drive
Hariharan Devarajan
Anthony Kougkas
Xian-He Sun
Abstract | Slides
  Comprehensive Burst Buffer Evaluation
Eugen Betke
Julian Kunkel
Abstract | Slides
  Virtualized Big Data: Reproducing Simulation Output on Demand
Salvatore Di Girolamo
Pirmin Schmid
Thomas Schulthess
Torsten Hoefler
Abstract | Slides
  Establishing the IO-500 Benchmark
Julian Kunkel
John Bent
Jay Lofstead
George S. Markomanolis
Abstract | Slides

WORKSHOP ABSTRACT


(Find the complete proposal outlining the merger between PDSW and DISCS here.)

We are pleased to announce that the second Joint International Workshop on Parallel Data Storage and Data Intensive Scalable Computing Systems (PDSW-DISCS’17) will be hosted at SC17: The International Conference for High Performance Computing, Networking, Storage and Analysis. The objective of this one-day joint workshop is to combine two overlapping communities and to better promote and stimulate researchers’ interactions to address some of the most critical challenges for scientific data storage, management, devices, and processing infrastructure for both traditional compute-intensive simulations and data-intensive high performance computing solutions. Special attention will be given to issues in which community collaboration can be crucial for problem identification, workload capture, solution interoperability, standards with community buy-in, and shared tools.

Many scientific problem domains continue to be extremely data intensive. Traditional high performance computing (HPC) systems, and the programming models for using them such as MPI, were designed from a compute-centric perspective with an emphasis on achieving high floating point computation rates. But processing, memory, and storage technologies have not kept pace, and there is a widening performance gap between computation and the data management infrastructure. Hence data management has become the performance bottleneck for a significant number of applications targeting HPC systems. Concurrently, there are increasing challenges in meeting the growing demand for analyzing experimental and observational data. In many cases, this is leading new communities to look towards HPC platforms. In addition, the broader computing space has seen a revolution in new tools and frameworks to support Big Data analysis and machine learning.

There is a growing need for convergence between these two worlds. Consequently, the U.S. Office of Management and Budget has informed the U.S. Department of Energy that new machines beyond the first exascale machines must address both the traditional simulation workloads and data intensive applications. This coming convergence prompts integrating these two workshops into a single entity to address the common challenges.

The scope of the proposed joint PDSW-DISCS workshop is summarized as:

  • Scalable storage architectures, archival storage, storage virtualization, emerging storage devices and techniques

  • Performance benchmarking, resource management, and workload studies from production systems including both traditional HPC and data-intensive workloads

  • Programmability, APIs, and fault tolerance of storage systems

  • Parallel file systems, metadata management, and complex data management, object and key-value storage, and other emerging data storage/retrieval techniques

  • Programming models and frameworks for data intensive computing including extensions to traditional and nontraditional programming models, asynchronous multi-task programming models, or to data intensive programming models

  • Techniques for data integrity, availability and reliability especially

  • Productivity tools for data intensive computing, data mining and knowledge discovery

  • Application or optimization of emerging “big data” frameworks towards scientific computing and analysis

  • Techniques and architectures to enable cloud and container-based models for scientific computing and analysis

  • Techniques for integrating compute into a complex memory and storage hierarchy facilitating in situ and in transit data processing

  • Data filtering/compressing/reduction techniques that maintain sufficient scientific validity for large scale compute-intensive workloads

  • Tools and techniques for managing data movement among compute and data intensive components both solely within the computational infrastructure as well as incorporating the memory/storage hierarchy


CALL FOR PAPERS

The Parallel Data Storage Workshop holds a peer-reviewed, competitive process for selecting short papers. Submit a previously unpublished short paper of up to 5 pages (not including references), in a font no smaller than 10 point, as a PDF file as instructed on the workshop web site. Submitted papers will be reviewed under the supervision of the workshop program committee. Submissions should indicate authors and affiliations. Final papers must be no more than 5 pages long (not including references), in a font no smaller than 10 point, in PDF format. Selected papers and associated talk slides will be made available on the workshop web site; the papers will also be published in the digital library of the ACM or IEEE.


SUBMISSIONS

Deadlines

Submissions deadline: Paper (in PDF format) due September 1, 2017
DEADLINE EXTENDED: September 7, 2017 - closed
Submissions website: https://easychair.org/conferences/?conf=pdswdiscs17
Notification: September 29, 2017
Camera ready and copyright forms due: October 10, 2017
Slides due before workshop: Sunday, November 12, 2017 to jdigney@cs.cmu.edu
* Submissions must be in the ACM sigconf format

Paper Submission Details:

The PDSW-DISCS Workshop holds a peer-reviewed, competitive process for selecting short papers. Submit a previously unpublished short paper of no more than 5 pages (not including references), in a font no smaller than 10 point, as a PDF file as instructed on the workshop web site. Submitted papers will be reviewed under the supervision of the workshop program committee. Submissions should indicate authors and affiliations. Selected papers and associated talk slides will be made available on the workshop web site; the papers will also be published in the digital libraries of the IEEE and ACM.


Work-in-progress (WIP) Submissions



There will also be a WIP session at the workshop, where presenters give brief 5-minute talks on ongoing work that raises fresh problems or solutions but may not yet be mature or complete enough for a paper submission. A 1-page abstract is required.

Please email your submission to pdswdiscs17@easychair.org

WIP Submission Deadline: November 1, 2017
WIP Notification: November 7, 2017


ATTENDING THE WORKSHOP

Please be aware that all workshop attendees, both speakers and participants, must pay the SC17 registration fee. Workshops are no longer included as part of the technical program registration.

To attend the workshop, please register through the Supercomputing '17 registration page. Registration opens in July.


PROGRAM COMMITTEE:

  • Kathryn Mohror, Lawrence Livermore National Laboratory, Program Co-Chair
  • Brent Welch, Google, Program Co-Chair
  • Janine Bennett, Sandia National Laboratories
  • Angela Demke Brown, University of Toronto
  • Suren Byna, Lawrence Berkeley National Laboratory
  • Shane Canon, Lawrence Berkeley National Laboratory
  • Raghunath Raja Chandrasekar, Amazon Web Services
  • Yong Chen, Texas Tech University
  • Toni Cortes, Universitat Politècnica de Catalunya
  • Garth Gibson, Carnegie Mellon University
  • Elsa Gonsiorowski, Lawrence Livermore National Laboratory
  • Bingsheng He, National University of Singapore
  • Shadi Ibrahim, Inria
  • Dries Kimpe, KCG
  • Jay Lofstead, Sandia National Laboratories
  • Xiaosong Ma, Qatar Computing Research Institute
  • Carlos Maltzahn, University of California, Santa Cruz
  • Suzanne McIntosh, New York University
  • Sangmi Pallickara, Colorado State University
  • Rob Ross, Argonne National Laboratory
  • Philip C. Roth, Oak Ridge National Laboratory
  • Kento Sato, Lawrence Livermore National Laboratory

STEERING COMMITTEE:

  • John Bent, Cray
  • Ali R. Butt, Virginia Tech
  • Shane Canon, Lawrence Berkeley National Laboratory
  • Yong Chen, Texas Tech University
  • Evan J. Felix, Pacific Northwest National Laboratory
  • Garth A. Gibson, Carnegie Mellon University
  • William D. Gropp, University of Illinois at Urbana-Champaign
  • Gary Grider, Los Alamos National Laboratory
  • Dean Hildebrand, Google
  • Dries Kimpe, KCG, USA
  • Jay Lofstead, Sandia National Laboratories
  • Darrell Long, University of California, Santa Cruz
  • Xiaosong Ma, Qatar Computing Research Institute, Qatar
  • Carlos Maltzahn, University of California, Santa Cruz
  • Robert Ross, Argonne National Laboratory
  • Philip C. Roth, Oak Ridge National Laboratory
  • John Shalf, National Energy Research Scientific Computing Center,
    Lawrence Berkeley National Laboratory
  • Xian-He Sun, Illinois Institute of Technology
  • Rajeev Thakur, Argonne National Laboratory
  • Lee Ward, Sandia National Laboratories