PDSW-DISCS 2016:

1st Joint International Workshop on Parallel Data Storage & Data Intensive Scalable Computing Systems

held in conjunction with SC16

Monday, November 14, 2016
Salt Lake City, UT

Program Co-Chairs:

Lawrence Berkeley National Laboratory

General Co-Chairs:

Carnegie Mellon University

Texas Tech University

Ion Stoica, UC Berkeley

Trends and Challenges in Big Data Processing

Abstract: Almost six years ago we started the Spark project at UC Berkeley. Spark is a cluster computing engine that is optimized for in-memory processing and unifies support for a variety of workloads, including batch, interactive querying, streaming, and iterative computations. Spark is now the most active big data project in the open source community and is already being used by over one thousand organizations. In this talk, I'll take a look back at Spark's humble beginnings, the lessons we learned, and its success as a unified system. Furthermore, I'll outline the hardware and software trends, as well as the challenges and research opportunities. [slides - coming soon]

Speaker bio: Ion Stoica is a Professor in the EECS Department at the University of California, Berkeley. He does research on cloud computing and networked computer systems. Past work includes Dynamic Packet State (DPS), Chord DHT, Internet Indirection Infrastructure (i3), declarative networks, replay-debugging, and multi-layer tracing in distributed systems. He is an ACM Fellow and has received numerous awards, including the SIGCOMM Test of Time Award (2011) and the ACM Doctoral Dissertation Award (2001). In 2006, he co-founded Conviva, a startup to commercialize technologies for large-scale video distribution, and in 2013, he co-founded Databricks, a startup to commercialize Apache Spark.