PDSW-DISCS 2017:

2nd Joint International Workshop on Parallel Data Storage & Data Intensive Scalable Computing Systems


held in conjunction with SC17

Monday, November 13, 2017
Denver, CO


Program Co-Chairs:
Lawrence Livermore National Laboratory
Google

General Chair:
Google

About the Joint PDSW-DISCS Workshop


Many scientific problem domains have become extremely data intensive. For instance, simulations that model the behavior of plasma in a tokamak fusion device or in the Earth’s magnetosphere can generate tens or even hundreds of terabytes of data during a single run. The Linac Coherent Light Source at the SLAC National Accelerator Laboratory produced over 2 PB of data in 2013 alone. Traditional high performance computing (HPC) systems, and the programming models for using them such as MPI, were designed from a compute-centric perspective, with an emphasis on achieving high floating-point computation rates. However, processing, memory, and storage technologies have advanced at differing rates, resulting in a widening performance gap between computation and the data management infrastructure. As a result, data management has become the performance bottleneck for a significant number of applications targeting HPC systems.

The explosion of data processing systems built on infrastructure such as MapReduce has altered the storage and data management landscape, feeding new data processing techniques back into traditional HPC workflows for manipulating and exploring large data volumes. The Office of Management and Budget has informed the Department of Energy that machines beyond the first exascale systems must address both traditional simulation workloads and data intensive applications. This impending convergence prompted the integration of the two workshops into a single venue to address these common challenges.

The scope of the joint PDSW-DISCS workshop includes:

  • Storage architectures, virtualization, emerging storage devices and techniques
  • Performance benchmarking, resource management, and workload studies from production systems
  • Programmability, APIs, and fault tolerance of storage systems
  • Parallel file systems, metadata management, complex data management, object and key-value storage, and other emerging data storage/retrieval techniques
  • System architectures, interconnection networks, I/O, and power efficiency for data intensive computing
  • Programming models for data intensive computing (extensions to traditional programming models, extensions to data intensive programming models, and non-traditional programming languages/models)
  • Runtime systems, inter-node and inter-system communication, data compression and de-duplication, caching and prefetching, and data integrity for data intensive computing
  • Productivity tools for data intensive computing, data mining and knowledge discovery tools, mathematical and statistical techniques, and tools for performance, debugging, and administration
  • Techniques for integrating computation into a complex memory hierarchy, facilitating in situ and in transit data processing and avoiding I/O bottlenecks
  • Data filtering/compression/reduction techniques that maintain sufficient scientific validity for large-scale compute-intensive workloads
  • Workflow tools and techniques for managing data movement among compute-intensive and data-intensive components, both within the compute area and across the memory/storage hierarchy
  • Data management support for emerging programming models such as asynchronous multi-task programming models (e.g., Charm++ or Legion)