PDSW-DISCS 2018:

3rd Joint International Workshop on Parallel Data Storage & Data Intensive Scalable Computing Systems

Held in conjunction with SC18

Monday, November 12, 2018
Dallas, TX

Program Co-Chairs:

New York University

Amazon Web Services

General Chair:

Lawrence Livermore National Laboratory

Rangan Sukumar, Cray


Architectural Challenges Emerging from the Convergence of Big Data, High-Performance Computing and Artificial Intelligence


abstract: The convergence of Big Data, high-performance computing (HPC) and artificial intelligence (AI) is happening at several supercomputing facilities around the following use-cases: (i) integration of large-scale experiments and compute resources; (ii) strong-scaling AI/ML codes; (iii) leveraging ML models to steer simulation codes; (iv) analysis of tera/petabytes of scientific Big Data requiring HPC resources; and (v) collating data and compute portals for the design of discovery-oriented workflows. While tremendous progress is being made towards accelerating the performance of these end-to-end workflows using several knobs (such as algorithmic cleverness, mathematical approximations, sub-precision processing, domain-specific accelerators, memory hierarchies, etc.), new opportunities for architecting smarter storage, compute and analysis infrastructure are emerging. Based on our experience from use-case instantiations, we present open architectural challenges around (i) non-traditional data (graphs, imagery, sequences, etc.); (ii) enabling combinatorial search and generalizability, both critical for applying modern methods such as deep learning to science; (iii) the need for better programming models to implement data, ensemble and model parallelism; (iv) productivity tools to minimize cost-to-insight; and (v) I/O acceleration. These challenges reveal that real success of convergence, along the dimensions of both performance and productivity, can be achieved via architecture design that integrates flexibility in system design from a facility perspective while accommodating workflow-specific heterogeneity from an end-user perspective.

bio: Rangan Sukumar is an artificial intelligence researcher who architects productive and performant data solutions. He serves as the Senior Analytics Architect in the CTO’s office at Cray Inc. His role is three-fold: (i) analytics evangelist, demonstrating what Big Data and HPC can do for data-centric organizations; (ii) technology visionary, designing the roadmap for analytic products by evaluating customer requirements and aligning them with emerging hardware and software technologies; and (iii) solutions architect, creating bleeding-edge solutions for scientific and enterprise problems in the long tail of the Big Data market that require scale and performance beyond what cloud computing offers. Before his role at Cray, he served as the founding group leader, data scientist and artificial intelligence/machine learning researcher scaling algorithms on unique supercomputing infrastructures at the Oak Ridge National Laboratory. He has over 70 publications in the areas of disparate data collection, organization, processing, integration, fusion, analysis and inference, applied to a wide variety of domains such as healthcare, social network analysis, electric grid modernization and public policy informatics. An entrepreneur at heart, he also serves on technical advisory boards and committees for several startups.