PDSW-DISCS 2017:

2nd Joint International Workshop on Parallel Data Storage & Data Intensive Scalable Computing Systems

held in conjunction with SC17

Monday, November 13, 2017
Denver, CO

Program Co-Chairs:

Lawrence Livermore National Laboratory

General Chair:


Denis Serenyi, Google

From GFS to Colossus: Cluster-Level Storage @ Google

Abstract: In the mid-2000s Google began to run into the limits of the Google File System (GFS), and it became clear that a fundamentally different distributed filesystem architecture would be needed to meet Google’s scaling and performance needs. Thus Colossus was born, which represents an order-of-magnitude leap forward in both metadata scalability and predictable performance. Today Colossus is the cluster-level storage system that is home to most of Google’s production data. It has enabled a much higher degree of storage resource declustering than GFS, which allows Google to mix a variety of workloads in its clusters and drive down storage costs. This talk provides an overview of Colossus’s architecture for managing metadata, followed by a deep dive into how Colossus minimizes storage TCO (total cost of ownership) through data placement and rebalancing strategies. It concludes with a discussion of macro storage trends and their implications for storage-intensive applications. [slides]

Bio: Denis Serenyi has over 15 years of experience in the field of storage systems. Most recently, Denis has spent six years as a contributor and technical lead for Google's Colossus distributed filesystem. He has made major contributions to improving the scalability, performance, and efficiency of Colossus, specializing in data placement algorithms, load balancing, data striping, and encoding. Before joining Google, he was a software architect at Panasas, developing their object-based storage devices. Denis has a BA in Computer Science from Dartmouth College.