# A Study of NVRAM Performance Variability under Concurrent I/O Accesses

Anthony Kougkas, Hariharan Devarajan, and Xian-He Sun Illinois Institute of Technology, Department of Computer Science, Chicago, IL {akougkas, hdevarajan}@hawk.iit.edu, sun@iit.edu

*Index Terms*—Performance Variability, Performance Modelling, Benchmarking, NVRAM, NVME, SSD, Non-Volatile

### I. INTRODUCTION

Modern HPC applications generate massive amounts of data. However, the speed of disk-based storage systems has improved much more slowly than that of memory, creating a significant I/O performance gap [1], [2]. To reduce this gap, the storage subsystem is going through extensive changes that add multiple levels of memory and storage in a hierarchy [3]. Newly emerging hardware technologies such as High Bandwidth Memory (HBM), Non-Volatile RAM (NVRAM), Solid-State Drives (SSD), and dedicated shared buffering nodes (e.g., burst buffers) have also been introduced to alleviate this issue [4], [5]. Several new supercomputers employ such low-latency devices to deal with the burstiness of I/O [6], [7], reducing the peak I/O requirements for external storage [8]. For example, the Cori system at the National Energy Research Scientific Computing Center (NERSC) [9] uses Cray's DataWarp technology [10]. The Trinity supercomputer at Los Alamos National Laboratory [11] uses burst buffers with a 3.7 PB capacity and 3.3 TB/s bandwidth. Summit at Oak Ridge National Lab will also employ fast NVMe storage for buffering, based on the first developer machine already deployed [12].

NERSC demonstrated [13] a 60% performance improvement under balanced usage compared with applications that do not use burst buffer acceleration. However, they also stated that when two compute nodes share a burst buffer node, their accesses compete for bandwidth, which results in significant performance degradation for both jobs. This phenomenon is even stronger for data-intensive applications, which spend significantly more time in I/O. As multiple layers of storage are introduced into HPC systems, the complexity of data movement among the layers increases significantly, making it harder to take advantage of the high-speed or low-latency storage systems [14].

### II. OUR APPROACH

In this study, we aim to explore and uncover any performance variability of NVRAM devices. The difference in storage medium (i.e., flash-based vs. spinning drives) dictates different access concurrency, device bandwidth and latency, sensitivity to random access, and other sources of performance variability such as garbage collection and data fragmentation.

#### A. Experimental Environment

As our testbed, we used Chameleon systems [15]. Specifically, we used the bare-metal configuration on the storage hierarchy nodes, which have several storage devices, including NVRAM. Table 1 lists the specifications of each device used. Even though this study explores how NVRAM handles concurrent accesses, we included all devices for comparison.

| Device     | RAM                 | NVRAM          | SSD            | HDD fast            | HDD                 |
|------------|---------------------|----------------|----------------|---------------------|---------------------|
| Model      | M386A4G40DM0        | Intel DC P3700 | Intel DC S3610 | Seagate ST600MP0005 | Seagate ST9250610NS |
| Connection | DDR4 2133 MHz       | PCIe Gen3 x8   | SATA 6 Gb/s    | 12 Gb/s SAS         | SATA 6 Gb/s         |
| Capacity   | 512 GB (32 GB x 16) | 1 TB           | 1.6 TB         | 600 GB              | 2.4 TB              |
| Latency    | 13.5 ns             | 20 µs          | 55-66 µs       | 2 ms                | 4.16 ms             |
| RPM        | -                   | -              | -              | 15000               | 7200                |
| Buffer     | -                   | -              | -              | 128 MB              | 64 MB               |

Table 1: Device specifications

As our driver program, we used our own synthetic benchmark. Each process writes 64 MB requests in a file-per-process pattern. We increase the number of concurrent processes while the total I/O remains fixed at 2 GB (i.e., strong scaling). We define a new metric, *medium-sensitivity*, as the rate at which each storage medium experiences bandwidth reduction due to concurrent access:

$$\text{Medium-Sensitivity} = \frac{\#\text{Processes}}{\#\text{Lanes}} \times \frac{\text{MaxBW} - \text{RealBW}}{\text{MaxBW}}$$
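As a minimal sketch of the arithmetic behind the workload and the metric above, the following Python snippet splits the fixed 2 GB volume into 64 MB requests across processes and evaluates the medium-sensitivity formula. Several details are assumptions and not stated in the text: MB and GB are treated as binary units (MiB, GiB), `#Lanes` is taken to be the number of parallel channels of the device interface, `MaxBW` is the peak bandwidth of the device, and `RealBW` is the bandwidth measured under concurrency. The function names and the numbers in the usage lines are hypothetical and are not measurements from this study.

```python
MIB = 1024 ** 2  # assumption: "MB" in the text means MiB
GIB = 1024 ** 3  # assumption: "GB" in the text means GiB


def requests_per_process(total_bytes, request_bytes, num_processes):
    """Split a fixed total I/O volume into whole requests per process."""
    if num_processes <= 0 or request_bytes <= 0:
        raise ValueError("num_processes and request_bytes must be positive")
    if total_bytes % request_bytes != 0:
        raise ValueError("total_bytes must be a multiple of request_bytes")
    total_requests = total_bytes // request_bytes
    base, extra = divmod(total_requests, num_processes)
    # The first `extra` processes issue one additional request each.
    return [base + 1 if rank < extra else base for rank in range(num_processes)]


def medium_sensitivity(num_processes, num_lanes, max_bw, real_bw):
    """(#Processes / #Lanes) * ((MaxBW - RealBW) / MaxBW).

    max_bw and real_bw must use the same unit (e.g., MB/s).
    """
    if num_processes <= 0 or num_lanes <= 0 or max_bw <= 0:
        raise ValueError("num_processes, num_lanes and max_bw must be positive")
    return (num_processes / num_lanes) * ((max_bw - real_bw) / max_bw)


# Hypothetical usage: 4 processes sharing the 2 GB workload.
print(requests_per_process(2 * GIB, 64 * MIB, 4))   # [8, 8, 8, 8]

# Hypothetical numbers: 16 processes, 8 lanes, 25% bandwidth loss.
print(medium_sensitivity(16, 8, 2000.0, 1500.0))    # 0.5
```

Under this reading, a device that sustains its peak bandwidth regardless of the number of processes scores 0, and the score grows both with the fraction of bandwidth lost and with the number of processes per lane.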

#### B. Initial Results

As can be seen in Figure 1, NVRAM demonstrated sensitivity very close to that of main memory. Specifically, for write operations, RAM has a sensitivity value of 0.43 and NVRAM a value of 3.1, whereas the traditional drives have values of 30 and 31, respectively. The same trends can be seen for read operations. These results are only the first step toward the more detailed study of NVRAM performance variability that we plan to conduct.



*Figure 1: Performance Variability (left figure - Write, right figure – Read)* 

## REFERENCES

[1] Dong, Bin, Xiuqiao Li, Limin Xiao, and Li Ruan. "A new file-specific stripe size selection method for highly concurrent data access." In *Grid Computing (GRID), 2012 ACM/IEEE 13th International Conference on*, pp. 22-30. IEEE, 2012.

[2] Shoshani, Arie, and Doron Rotem, eds. Scientific Data Management: Challenges, Technology, and Deployment. CRC Press, 2009.

[3] Bent, John, Gary Grider, Brett Kettering, Adam Manzanares, Meghan McClelland, Aaron Torres, and Alfred Torrez. "Storage challenges at Los Alamos National Lab." In Mass Storage Systems and Technologies (MSST), 2012 IEEE 28th Symposium on, pp. 1-5. IEEE, 2012.

[4] A. M. Caulfield, L. M. Grupp, and S. Swanson, "Gordon: Using flash memory to build fast, power-efficient clusters for data-intensive applications," ACM SIGPLAN Notices, vol. 44, no. 3, pp. 217–228, 2009.

[5] S. Kannan, A. Gavrilovska, K. Schwan, D. Milojicic, and V. Talwar, "Using active NVRAM for I/O staging," in Proceedings of the 2nd International Workshop on Petascale Data Analytics: Challenges and Opportunities. ACM, 2011, pp. 15–22.

[6] N. Mi, A. Riska, Q. Zhang, E. Smirni, and E. Riedel, "Efficient management of idleness in storage systems," ACM Transactions on Storage (TOS), vol. 5, no. 2, p. 4, 2009.

[7] Y. Kim, R. Gunasekaran, G. M. Shipman, D. Dillow, Z. Zhang, B. W. Settlemyer *et al.*, "Workload characterization of a leadership class storage cluster," in *Petascale Data Storage Workshop (PDSW), 2010 5th.* IEEE, 2010, pp. 1–5.

[8] Lawrence Livermore National Lab, "Large memory appliance/burst buffers use case." [Online]. Available: HTTPS://ASC.LLNL.GOV/CORAL-BENCHMARKS/LARGE MEMORY USE CASES LLNL.PDF

[9] NERSC, "Cori system burst buffer design." [Online]. Available: https://www.nersc.gov/users/computational-systems/cori/burst-buffer/

[10] Cray Inc, "DataWarp technology," 2017. [Online]. Available: HTTP://WWW.CRAY.COM/SITES/DEFAULT/FILES/RESOURCES/CRAYXC40-DATAWARP.PDF

[11] Los Alamos National Lab, "Trinity specs." [Online]. Available: HTTP://WWW.LANL.GOV/PROJECTS/TRINITY/SPECIFICATIONS.PHP

[12] Whitt, Justin L, "Oak Ridge Leadership Computing Facility: Summit and beyond," 2017. [Online]. Available: HTTPS://INDICO.CERN.CH/EVENT/618513/CONTRIBUTIONS/2527318/ATTACHMENTS/1437236/2210560/SUMMITPROJECTOVERVIEWF GJLW.PDF

[13] W. Bhimji, D. Bard, M. Romanus, D. Paul, A. Ovsyannikov, B. Friesen, M. Bryson, J. Correa, G. K. Lockwood, V. Tsulaia, S. Byna, S. Farrell, D. Gursoy, C. Daley, V. Beckner, B. V. Straalen, D. Trebotich, C. Tull, G. Weber, N. J. Wright, and K. Antypas, "Accelerating science with the NERSC burst buffer early user program," 2016.

[14] A. M. Caulfield, J. Coburn, T. Mollov, A. De, A. Akel, J. He, A. Jagatheesan, R. K. Gupta, A. Snavely, and S. Swanson, "Understanding the impact of emerging non-volatile memories on high-performance, IO-intensive computing," in *Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.* IEEE Computer Society, 2010, pp. 1–11.

[15] Chameleon.org. Chameleon system, 2016. [Online]. Available: HTTPS://WWW.CHAMELEONCLOUD.ORG/ABOUT/CHAMELEON/