A Cycle-accurate Microarchitecture-level NAND Flash Memory System Simulation Framework

Overview

NANDFlashSim is a flash simulation model that is decoupled from any specific flash firmware and supports detailed NAND flash transactions with cycle accuracy. This low-level simulation framework enables research on the NAND flash memory system itself as well as on many NAND flash-based devices. The simulation model of NANDFlashSim is validated against a hardware prototype (MSIS). We have also integrated it with a very-large-scale NAND flash-based storage system and evaluated a thousand thousand NANDFlashSim instances on the NERSC Hopper and Carver supercomputers. NANDFlashSim was originally designed and implemented as a library that forms part of two cycle-level simulation models: the hardware/software co-simulation model for exascale (CoDEx) of the National Energy Research Scientific Computing Center, Lawrence Berkeley National Laboratory, and a many-NVRAM-based SSD platform (FLASHWOOD) of UT Dallas and Penn State. To illustrate how NANDFlashSim is used, we provide a sample system built on it, but please note that the sample system does not contain a system clock of its own.

If you decide to use NANDFlashSim in your own research, please cite our MSST paper. For convenience, here is the BibTeX information.

Why NANDFlashSim?


Figure 1

Timing parameters of modern NAND flash memory, especially multi-level cell (MLC) devices, fluctuate significantly across manufacturers and node technologies, as shown in Figure 1. Furthermore, NAND flash performance depends on variables such as NAND flash packaging type, system configuration, fabrication process, NAND command protocol, and NAND I/O interface. As a consequence, simplified latency models ignore the substantial contributions of the flash firmware to memory system performance. This may lead architecture and system designers to overlook research potential in new algorithms for NAND flash memory systems, such as those involved in internal parallelism handling, wear-leveling, garbage collection, the flash translation layer, flash-aware file systems, and flash controllers.

Figure 2 visualizes the scheduling of four pipelined NAND I/O requests in simple interleaved-die legacy-program mode. Most latency approximation models simulate internal parallelism with a round-robin policy that stripes I/O requests over multiple dies and planes. However, scheduling requests on a multi-die, many-plane NAND flash architecture is more complex than on earlier NAND memory systems because of intrinsic latency variation. When advanced NAND flash I/O protocols are applied on top of this latency variation, deciding on an optimal system configuration is non-trivial.


Figure 2
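The effect of latency variation on round-robin striping can be illustrated with a small toy model. The sketch below is not part of NANDFlashSim; the function names and the latency values are made up for illustration, and each die is assumed to process one request at a time. It compares the completion time of a fixed round-robin assignment with a policy that always picks the die that becomes free first.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model (hypothetical, not NANDFlashSim code): each die serves its
// requests back to back, and the result is the time the last die finishes.

// Request i is always sent to die (i % numDies), regardless of its latency.
inline std::uint64_t RoundRobinMakespan(const std::vector<std::uint64_t>& latencies,
                                        std::size_t numDies) {
    assert(numDies > 0);
    std::vector<std::uint64_t> dieFreeAt(numDies, 0);
    for (std::size_t i = 0; i < latencies.size(); ++i) {
        dieFreeAt[i % numDies] += latencies[i];
    }
    return *std::max_element(dieFreeAt.begin(), dieFreeAt.end());
}

// Each request is sent to whichever die becomes free first
// (ties go to the lowest-numbered die).
inline std::uint64_t EarliestFreeMakespan(const std::vector<std::uint64_t>& latencies,
                                          std::size_t numDies) {
    assert(numDies > 0);
    std::vector<std::uint64_t> dieFreeAt(numDies, 0);
    for (std::uint64_t latency : latencies) {
        *std::min_element(dieFreeAt.begin(), dieFreeAt.end()) += latency;
    }
    return *std::max_element(dieFreeAt.begin(), dieFreeAt.end());
}
```

With two dies and alternating fast and slow requests of 200, 900, 200, and 900 time units (illustrative numbers only), round-robin places both slow requests on the same die and finishes at 1800, whereas the earliest-free policy finishes at 1300. A model that assumes one fixed latency per operation would not show this difference.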

Therefore, NANDFlashSim, a microarchitecture-level simulation model, is designed to be performance variation-aware and employs different page offsets in a physical block for many NAND flash-based applications. The memory system, the controller, and the NAND flash memory cells have independent synchronous clock domains. In addition, by employing multi-stage operations and command chains for each die, NANDFlashSim provides a set of timing details for a large array of NAND flash operation modes, including legacy mode, cache mode, internal data move mode, multi-plane mode, multi-plane cache mode, interleaved-die mode, interleaved-die cache mode, interleaved-die multi-plane mode, and interleaved-die multi-plane cache mode.


Figure 3
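The idea of independent synchronous clock domains can be sketched as follows. This is a minimal illustration under stated assumptions, not NANDFlashSim's actual implementation: the ClockDomain class, its method names, and the nanosecond periods are all hypothetical. Each domain counts its own clock edges as a shared simulated time advances.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of NANDFlashSim): one synchronous clock
// domain with a fixed period, driven by a shared simulated time in ns.
class ClockDomain {
public:
    explicit ClockDomain(std::uint64_t periodNs) : periodNs_(periodNs), ticks_(0) {
        assert(periodNs > 0);
    }

    // Advance this domain to absolute time nowNs (must not go backwards)
    // and return how many new clock edges occurred since the last call.
    std::uint64_t AdvanceTo(std::uint64_t nowNs) {
        const std::uint64_t total = nowNs / periodNs_;
        assert(total >= ticks_);
        const std::uint64_t fresh = total - ticks_;
        ticks_ = total;
        return fresh;
    }

    // Total number of clock edges seen so far.
    std::uint64_t Ticks() const { return ticks_; }

private:
    std::uint64_t periodNs_;
    std::uint64_t ticks_;
};
```

For example, with an assumed 10 ns host clock and a 25 ns NAND clock, advancing both domains to 100 ns yields 10 host edges but only 4 NAND edges. A cycle-level model therefore has to step each component at its own rate rather than share a single tick counter.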

These detailed NAND operation modes and their associated timings (which are co-validated with the hardware prototype shown in Figure 3) expose performance optimization points to NAND flash-based application designers and developers. In addition, NANDFlashSim supports highly reconfigurable architectures in terms of multiple dies and planes. This allows researchers to explore true internal parallelism in such architectures by exposing the intrinsic latency variations in NAND flash. NANDFlashSim removes the dependency on a particular flash firmware, which enables memory system designers and architects to develop and optimize diverse algorithms targeting NAND flash, such as buffer replacement algorithms, wear-leveling algorithms, flash file systems, flash translation layers, and I/O schedulers. The NAND flash device and system configurations, timing parameters, and energy consumption parameters can all be set independently and adjusted in configuration files.

Library Interface

NANDFlashSim provides a sample system that shows how to connect NANDFlashSim to your own system. However, since NANDFlashSim was originally designed to be integrated with other system-level simulation models that have independent clocks, we encourage using it as a library. The initial version of NANDFlashSim is provided as shared libraries, with interfaces for both static and dynamic linking. The overall NANDFlashSim interfaces and functionality are illustrated in Figure 4.


Figure 4


By default, the system model commits a request to NANDFlashSim by calling the AddTransaction API:

 AddTransaction( UINT32 nHostTransId, NAND_TRANS_OP nTransOp, UINT32 nAddr)

nHostTransId is an identifier for the caller, that is, the host system-level simulator. If the NAND flash system is still busy handling the previous request, AddTransaction returns NAND_FLASH_ERROR_BUSY. To check the status of a specific die in a NAND flash system instance, call the IsBusy API with the ID of the die you want to query. The caller can define a combination of various NAND flash commands through nTransOp and specify the destination through the nAddr parameter. After a transaction has been committed without an error return code, call the Update() API to supply clock signals. The clock periods of the host and the NAND flash system can be configured in the ini file for the simulated device (e.g., device.ini). In addition to the APIs mentioned above, NANDFlashSim provides several overloaded AddTransaction APIs and different versions of the Update API for flexibility and for boosting simulation performance.
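The calling pattern described above (commit, retry on busy, then clock the model) can be sketched as follows. The sketch does not link against the real library. MockNandFlashSystem is a hypothetical stand-in that only mimics the documented behavior, and the NAND_OP_READ and NAND_FLASH_SUCCESS names, the return types, the single-die busy model, and the RunReads helper are all assumptions made for illustration. Only the AddTransaction parameter list, NAND_FLASH_ERROR_BUSY, IsBusy with a die ID, and Update() come from the description above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-in declarations; the real NANDFlashSim headers may differ.
using UINT32 = std::uint32_t;
enum NAND_TRANS_OP { NAND_OP_READ, NAND_OP_PROG };  // hypothetical op codes
enum NAND_FLASH_ERROR {
    NAND_FLASH_SUCCESS = 0,     // hypothetical name for "no error"
    NAND_FLASH_ERROR_BUSY = 1   // named in the text; the value is assumed
};

// Hypothetical mock: every accepted transaction keeps the device busy
// for a fixed number of Update() calls.
class MockNandFlashSystem {
public:
    explicit MockNandFlashSystem(UINT32 busyCycles)
        : busyCycles_(busyCycles), remaining_(0), accepted_(0), lastAddr_(0) {
        assert(busyCycles > 0);
    }
    NAND_FLASH_ERROR AddTransaction(UINT32 nHostTransId, NAND_TRANS_OP nTransOp, UINT32 nAddr) {
        (void)nHostTransId;  // the mock does not track caller IDs
        (void)nTransOp;      // the mock treats all operations alike
        if (remaining_ > 0) return NAND_FLASH_ERROR_BUSY;
        remaining_ = busyCycles_;
        lastAddr_ = nAddr;
        ++accepted_;
        return NAND_FLASH_SUCCESS;
    }
    bool IsBusy(UINT32 /*nDieId*/) const { return remaining_ > 0; }
    void Update() { if (remaining_ > 0) --remaining_; }  // one clock tick
    UINT32 Accepted() const { return accepted_; }
    UINT32 LastAddr() const { return lastAddr_; }

private:
    UINT32 busyCycles_, remaining_, accepted_, lastAddr_;
};

// Commit one read per address, clocking the model while it reports busy,
// then drain the last request. Returns the number of Update() calls made.
inline UINT32 RunReads(MockNandFlashSystem& flash, const std::vector<UINT32>& addrs) {
    UINT32 cycles = 0;
    for (UINT32 id = 0; id < addrs.size(); ++id) {
        while (flash.AddTransaction(id, NAND_OP_READ, addrs[id]) == NAND_FLASH_ERROR_BUSY) {
            flash.Update();
            ++cycles;
        }
    }
    while (flash.IsBusy(0)) {
        flash.Update();
        ++cycles;
    }
    return cycles;
}
```

Because the sample system does not contain a system clock of its own, the host simulator owns the loop that calls Update(). In this mock, three reads against a device that stays busy for 5 cycles per request take 15 Update() calls in total.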