PDSW-DISCS 2017: 2nd Joint International Workshop on Parallel Data Storage & Data Intensive Scalable Computing Systems
abstract: In the mid-2000s, Google began to run into the limits of the Google File System (GFS), and it became clear that a fundamentally different distributed filesystem architecture would be needed to meet Google’s scaling and performance needs going forward. Thus Colossus was born, representing an order-of-magnitude leap forward in both metadata scalability and predictable performance. Today Colossus is the cluster-level storage system that is home to most of Google’s production data. It has enabled a much higher degree of storage resource declustering than GFS, which allows Google to mix a variety of workloads in its clusters and drive down storage costs. This talk provides an overview of Colossus’s architecture for managing metadata, followed by a deep dive into how Colossus minimizes storage TCO (total cost of ownership) via data placement and rebalancing strategies. It concludes with a discussion of macro storage trends and their implications for storage-intensive applications. [slides]
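The abstract points to declustered data placement and rebalancing as the levers for driving down storage cost. The toy Python sketch below illustrates the general idea of fill-level-based placement and rebalancing only; the Disk class, place_chunk, rebalance, and the copies/tolerance parameters are invented for this illustration and do not describe Colossus’s actual algorithms.

```python
# Toy sketch of declustered placement and fill-based rebalancing.
# Generic illustration only, NOT Colossus's actual algorithm; all
# names and thresholds are invented. Assumes equally sized disks.
import heapq
import random


class Disk:
    def __init__(self, disk_id, capacity):
        self.disk_id = disk_id
        self.capacity = capacity
        self.used = 0

    @property
    def fill(self):
        return self.used / self.capacity


def place_chunk(disks, size, copies=3):
    """Spread a chunk's replicas across the least-full disks (declustering)."""
    targets = heapq.nsmallest(copies, disks, key=lambda d: d.fill)
    for d in targets:
        d.used += size
    return [d.disk_id for d in targets]


def rebalance(disks, chunk_size, tolerance=0.05):
    """Move chunk-sized units from the fullest disk to the emptiest
    until every disk's fill level is within `tolerance` of the mean."""
    moves = 0
    while True:
        mean = sum(d.fill for d in disks) / len(disks)
        src = max(disks, key=lambda d: d.fill)
        dst = min(disks, key=lambda d: d.fill)
        if src.fill - mean <= tolerance and mean - dst.fill <= tolerance:
            break
        src.used -= chunk_size
        dst.used += chunk_size
        moves += 1
    return moves


if __name__ == "__main__":
    random.seed(0)
    disks = [Disk(i, capacity=1_000) for i in range(10)]
    for _ in range(200):
        place_chunk(disks, size=random.choice([1, 4, 16]))
    # Simulate skew from deletions on a few disks, then rebalance.
    for d in disks[:3]:
        d.used //= 2
    print("before:", [round(d.fill, 2) for d in disks])
    print("moves: ", rebalance(disks, chunk_size=1))
    print("after: ", [round(d.fill, 2) for d in disks])
```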
bio: Denis Serenyi has over 15 years of experience in storage systems. Most recently, he has spent six years as a contributor and technical lead for Google's Colossus distributed filesystem. He has made major contributions to improving the scalability, performance, and efficiency of Colossus, specializing in data placement algorithms, load balancing, data striping, and encoding. Prior to his work at Google, he was a software architect at Panasas, where he developed their object-based storage devices. Denis holds a BA in Computer Science from Dartmouth College.