We aim to offer valuable research vehicles for understanding how the evolution of emerging non-volatile memory technologies and massively parallel chip architectures can be guided by closely coupled feedback among application, algorithm, and hardware design. To date, we have built several hardware prototypes, FPGA-based memory controllers, and simulation frameworks that capture low-level memory characteristics.

CAMEL's Open-Source Software

  • SimpleSSD -- An Open-Source Educational SSD Simulator for High-Performance Storage and Full-System Evaluations [ website ]

SimpleSSD is a high-fidelity SSD simulation framework designed for educational purposes. It builds a complete storage stack from scratch, models the detailed characteristics of SSD-internal hardware and software, delivers high simulation speed, and can be integrated into publicly available full-system simulators. We have validated the framework against a commercial SSD, and our experiments demonstrate high simulation accuracy.

  1. "Amber: Enabling Precise Full-System Simulation with Detailed Modeling of All SSD Resources," MICRO'18
  2. "FlashShare: Punching Through Server Storage Stack from Kernel to Firmware for Ultra-Low Latency SSDs," OSDI'18
  3. "SimpleSSD: Modeling Solid State Drive for Holistic System Simulation," IEEE CAL


  • OpenExpress -- Fully Hardware Automated Open Research Framework for Future Fast NVMe Devices [ paper ] [ download ]

NVMe is widely used as a de facto fast I/O communication interface by diverse types of storage and non-volatile memory subsystems. Industry vendors keep their own intellectual property (IP) for high-speed NVMe controllers and explore the software-stack challenges of future fast NVMe storage cards. Unfortunately, academia is prohibited from accessing or redesigning such NVMe hardware IP cores. A few third-party vendors do license hardware IP, but their cores carry expensive per-month prices ranging from $30K to $45K, and single-use source code costs around $100K. There are also restrictions on modifying the hardware IP and architecture, which is inadmissible in research communities. OpenExpress is a fully hardware-automated framework that requires no software intervention to process concurrent NVMe requests while supporting scalable data submission, rich outstanding I/O command queues, and submission/completion queue management.

  1. "OpenExpress: Fully Hardware Automated Open Research Framework for Future Fast NVMe Devices," USENIX ATC'20
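The queue mechanics that OpenExpress implements in hardware can be sketched in software. The toy model below is illustrative only (it is not OpenExpress code, and all names are hypothetical): the host writes submission queue entries and rings a tail doorbell, and the controller drains the entries and posts completion entries that carry the updated head pointer.

```python
class NVMeQueuePair:
    """Toy model of an NVMe submission/completion queue pair
    (illustrative sketch only; not OpenExpress source code)."""

    def __init__(self, depth=64):
        self.depth = depth
        self.sq = [None] * depth          # submission queue entries
        self.cq = []                      # completion queue (simplified)
        self.sq_tail = 0                  # host-written doorbell value
        self.sq_head = 0                  # controller consume pointer

    def host_submit(self, command):
        """Host writes an SQ entry, then rings the tail doorbell."""
        next_tail = (self.sq_tail + 1) % self.depth
        if next_tail == self.sq_head:
            raise RuntimeError("submission queue full")
        self.sq[self.sq_tail] = command
        self.sq_tail = next_tail          # doorbell write

    def controller_process(self):
        """Controller drains commands between head and tail, posts CQEs."""
        completed = []
        while self.sq_head != self.sq_tail:
            cmd = self.sq[self.sq_head]
            self.sq_head = (self.sq_head + 1) % self.depth
            cqe = {"cid": cmd["cid"], "status": 0, "sq_head": self.sq_head}
            self.cq.append(cqe)
            completed.append(cqe)
        return completed

qp = NVMeQueuePair(depth=8)
qp.host_submit({"cid": 1, "opcode": "READ", "lba": 0})
qp.host_submit({"cid": 2, "opcode": "WRITE", "lba": 128})
print([c["cid"] for c in qp.controller_process()])   # -> [1, 2]
```

As in real NVMe, each completion entry reports the controller's new submission-queue head so the host can reclaim slots.
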

  • OpenNVM -- An Open-Sourced FPGA-based NVM Controller for Low Level Memory Characterization [ website ] [ download ]

OpenNVM copes with diverse memory transactions and covers a variety of evaluation workloads without any FPGA logic-block updates. In our design, evaluation scripts are managed by a host, while all NVM-related transactions are handled by our FPGA-based NVM controller, which connects to a hardware circuit board that accommodates different types of NVM products and to our custom-made power measurement board. This open platform generates exhaustive empirical data on emerging non-volatile memories from a configurable, programmable FPGA-based hardware prototype, serving memory research communities, especially those studying non-volatile memory.

  1. "FlashAbacus: A Self-governing Flash-based Accelerator for Low-power System," EuroSys'18
  2. "NearZero: An Integration of Phase Change Memory with Multi-core Coprocessor," IEEE CAL'17
  3. "OpenNVM: An Open-Sourced FPGA-based NVM Controller for Low Level Memory Characterization," ICCD'15
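To make the script-driven transaction flow above concrete, the sketch below packs NVM transactions into fixed-size records that a host-side script could hand to a controller for decoding. The descriptor layout, opcodes, and field widths are assumptions for illustration, not OpenNVM's actual format.

```python
import struct

# Hypothetical transaction descriptor layout (NOT OpenNVM's actual format):
# a host-side script packs each NVM transaction into a fixed-size record
# that an FPGA-based controller could decode without logic-block updates.
OPCODES = {"READ": 0x1, "WRITE": 0x2, "ERASE": 0x3}

def pack_transaction(opcode, address, burst_len):
    """Pack one NVM transaction as <opcode:u8, burst:u8, pad:u16, addr:u32>."""
    return struct.pack("<BBHI", OPCODES[opcode], burst_len, 0, address)

def unpack_transaction(record):
    """Decode a packed descriptor back into its fields."""
    op, burst, _pad, addr = struct.unpack("<BBHI", record)
    name = {v: k for k, v in OPCODES.items()}[op]
    return {"opcode": name, "burst": burst, "address": addr}

rec = pack_transaction("WRITE", address=0x1F400, burst_len=16)
print(len(rec), unpack_transaction(rec))
# -> 8 {'opcode': 'WRITE', 'burst': 16, 'address': 128000}
```

A fixed-width record like this is what lets new workloads be expressed entirely in host scripts, with no FPGA resynthesis.
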


  • NANDFlashSim -- A cycle-accurate and hardware-validated NAND flash simulation model (open source project) [ website ] [ download ]

NANDFlashSim is a flash simulation model that is decoupled from specific flash firmware and supports detailed NAND flash transactions with cycle accuracy. This low-level simulation framework enables research on the NAND flash memory system itself as well as on many NAND flash-based devices such as flash-based SSDs, eMMC, CF memory cards, and mobile NAND flash media. We have evaluated a hundred thousand NANDFlashSim instances on the NERSC Hopper and Carver supercomputers. The source code is available via the download link above.

  1. "NANDFlashSim: High-Fidelity, Micro-Architecture-Aware NAND Flash Memory Simulation," ACM Transactions on Storage (TOS)
  2. "HIOS: A Host Interface I/O Scheduler for Solid State Disks," ISCA'14
  3. "Sprinkler: Maximizing Resource Utilization in Many-Chip Solid State Disks," HPCA'14
  4. "Triple-A: A Non-SSD Based Autonomic All-Flash Array for Scalable High Performance Computing Storage Systems," ASPLOS'14
  5. "Physically Addressed Queueing (PAQ): Improving Parallelism in Solid State Disks," ISCA'12
  6. "Understanding System Characteristics of Online Erasure Coding on Scalable, Distributed and Large-Scale SSD Array Systems," MSST'12
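The kind of cycle-level decomposition NANDFlashSim performs can be illustrated with a toy timing model that splits each NAND command into array and bus phases. The constants below are ballpark figures for SLC-class NAND, not values taken from the simulator.

```python
# Illustrative sketch of cycle-level NAND timing (not NANDFlashSim code).
# Timing constants are ballpark SLC-class figures, in nanoseconds.
T_R    = 25_000      # array-to-register read time (tR)
T_PROG = 200_000     # page program time (tPROG)
T_BERS = 1_500_000   # block erase time (tBERS)
T_CYCLE = 25         # one byte per I/O bus cycle at an assumed 40 MHz

PAGE_BYTES = 4096

def op_latency_ns(op):
    """Latency of one NAND command, decomposed into array and bus phases."""
    xfer = PAGE_BYTES * T_CYCLE                 # page moves over the 8-bit bus
    if op == "READ":
        return T_R + xfer                       # sense, then shift data out
    if op == "PROGRAM":
        return xfer + T_PROG                    # shift data in, then program
    if op == "ERASE":
        return T_BERS                           # no data transfer phase
    raise ValueError(op)

for op in ("READ", "PROGRAM", "ERASE"):
    print(op, op_latency_ns(op))
# READ 127400 / PROGRAM 302400 / ERASE 1500000
```

Separating the array phase from the bus phase is what makes multi-plane and interleaved-die scheduling studies possible at this level.
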


  • Trace Repository -- I/O Traces for Storage Simulation and System Evaluation

Storage traces are widely used for storage simulation and system evaluation. Since high-performance SSDs, flash arrays, and NVM systems exhibit different I/O timing behaviors, traditional traces need to be revised or re-collected on relatively modern systems. To address this, we are collecting different types of traces under many combinations of devices and systems. Our trace repository distributes traces collected on the Kandemir, Wilson, John, and Donofrio machines under different parallel file systems such as Lustre and Ceph. We hope to secure more traces that can help the storage system and architecture communities conduct better, easy-to-reproduce research.

  1. "PEN: Design and Evaluation of Partial-Erase for 3D NAND-Based High Density SSDs," FAST'18
  2. "TraceTracker: Hardware/Software Co-Evaluation for Large-Scale I/O Workload Reconstruction," IISWC'17
  3. "Understanding System Characteristics of Online Erasure Coding on Scalable, Distributed and Large-Scale SSD Array Systems," IISWC'17
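As a minimal sketch of how such traces are consumed, the snippet below summarizes a replay-style trace. The record layout (timestamp in microseconds, device, LBA, size in bytes, operation) is hypothetical and does not reflect the repository's actual trace formats.

```python
# Hypothetical trace records: (timestamp_us, device, lba, size_bytes, op).
# This layout is an assumption for illustration, not the repository format.
trace = [
    (0,   "sda", 2048, 4096, "R"),
    (120, "sda", 4096, 8192, "W"),
    (350, "sda", 2048, 4096, "R"),
]

def summarize(records):
    """Compute per-operation byte counts and request inter-arrival times,
    two of the timing behaviors that distinguish modern devices."""
    bytes_by_op = {"R": 0, "W": 0}
    gaps = []
    prev_ts = None
    for ts, _dev, _lba, size, op in records:
        bytes_by_op[op] += size
        if prev_ts is not None:
            gaps.append(ts - prev_ts)
        prev_ts = ts
    return bytes_by_op, gaps

counts, gaps = summarize(trace)
print(counts, gaps)   # -> {'R': 8192, 'W': 8192} [120, 230]
```

Inter-arrival gaps are exactly the quantity that becomes unrealistic when old traces are replayed against much faster devices, which is why re-collection on modern systems matters.
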


  • FlashGPU -- A MacSim-based GPU Simulation Model Integrated with SimpleSSD

FlashGPU is a MacSim-based GPU simulation model that integrates with SimpleSSD. This research framework can be used to explore an emerging GPU platform that employs flash within its discrete device. The current version of FlashGPU replaces global memory with Z-NAND, which exhibits ultra-low latency, and architects a flash core to manage request dispatches and address translations underneath the L2 cache banks of the GPU cores. While Z-NAND is a hundred times faster than conventional 3D-stacked flash, its latency is still longer than that of DRAM. We expect that many different types of persistent memory subsystems, algorithms, and controllers can be researched to address such long-latency issues in cases where one needs to place flash within a massive GPU core network.

  1. "ZnG: Architecting GPU Multi-Processors with New Flash for Scalable Data Analysis," ISCA'20
  2. "FlashGPU: Placing New Flash Next to GPU Cores," DAC'19
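The flash core's address-translation role can be sketched as a simple page-striping function that maps a global-memory address to a flash location. The page size, channel and way counts, and mapping policy below are assumptions for illustration, not FlashGPU's actual design.

```python
# Illustrative sketch (not FlashGPU source): a "flash core" beneath the
# GPU L2 banks must translate a global-memory byte address into a
# Z-NAND location. All geometry constants here are assumptions.
PAGE_BYTES = 2048      # small Z-NAND-like page (assumed)
CHANNELS = 16
WAYS = 4

def translate(addr):
    """Map a byte address to (channel, way, page, offset), striping
    consecutive pages across channels first, then ways, so adjacent
    requests can proceed in parallel."""
    page_no = addr // PAGE_BYTES
    offset = addr % PAGE_BYTES
    channel = page_no % CHANNELS
    way = (page_no // CHANNELS) % WAYS
    page = page_no // (CHANNELS * WAYS)
    return channel, way, page, offset

print(translate(0))              # -> (0, 0, 0, 0)
print(translate(2048 * 17 + 5))  # page 17 -> (1, 1, 0, 5)
```

Channel-first striping is one common way to hide flash latency behind parallelism; studying alternatives is precisely the kind of research the paragraph above invites.
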


  • SCIC -- A SystemC Interface Converter for DRAMSim

The SystemC interface converter (SCIC) enables DRAMSim to be integrated with comprehensive pin-level system simulation models. SCIC manages the protocol differences between DRAMSim and the SystemC interface, and provides storage resources for modeling data movement. Additionally, this project introduces a pin-level protocol (Transaction Level 0) to the memory system model of DRAMSim; the memory system can therefore be harmonized with other simulators that employ SystemC or HDL simulation. The source code is available for download.

  1. "SCIC: A System C Interface Converter for DRAMSim," Lawrence Berkeley National Laboratory Technical Report
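Conceptually, an interface converter like SCIC bridges a callback-driven memory model to a cycle-driven, pin-level consumer, buffering state for in-flight transactions. The sketch below illustrates that adapter idea with hypothetical class names; it is not SCIC's SystemC code.

```python
class CallbackDRAM:
    """Stand-in for a DRAMSim-like model that completes requests via
    callbacks after a fixed latency (hypothetical, for illustration)."""

    def __init__(self, latency_cycles=4):
        self.latency = latency_cycles
        self.inflight = []                 # list of (ready_cycle, addr)

    def add_transaction(self, addr, now):
        self.inflight.append((now + self.latency, addr))

    def tick(self, now, on_done):
        still = []
        for ready, addr in self.inflight:
            if ready <= now:
                on_done(addr)              # fire the completion callback
            else:
                still.append((ready, addr))
        self.inflight = still

class PinLevelBridge:
    """Drives the model one clock at a time and latches completions,
    the way a cycle-driven pin-level (TL0) consumer expects."""

    def __init__(self, dram):
        self.dram = dram
        self.cycle = 0
        self.done_addrs = []

    def clock(self):
        self.dram.tick(self.cycle, self.done_addrs.append)
        self.cycle += 1

bridge = PinLevelBridge(CallbackDRAM(latency_cycles=4))
bridge.dram.add_transaction(0x100, now=0)
for _ in range(6):
    bridge.clock()
print(bridge.done_addrs)   # -> [256]
```

The bridge owns the clock and the completion latch, which is the essential conversion between event/callback semantics and per-cycle pin semantics.
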

Solid State Drive Simulator/Emulator

  • FLASHWOOD -- Comprehensive Large-scale NVRAM Storage Simulation Framework

In this simulation framework, the NVRAM software stack on a multi-channel architecture is fully implemented, and diverse parameters and algorithms are reconfigurable (e.g., buffer cache, NVMHCI, flash drivers, flash translation layers, and physical layouts). Hardware components are emulated at cycle level by multiple NANDFlashSim instances, DRAM simulation instances, and virtual channel and controller modules. Flashwood can also evaluate dynamic energy and power consumption by capturing the activities of all the different components. The flash software and device simulation code in the framework comprise around two hundred thousand lines and ten thousand lines, respectively.
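Activity-based energy accounting of the kind described above can be sketched as follows; the component names, power figures, and clock rate are assumptions for illustration, not Flashwood's actual parameters.

```python
# Toy activity-based energy accounting (assumed figures, not Flashwood's):
# each emulated component logs its active cycles, and dynamic energy is
# active power times active time, summed across components.
ACTIVE_POWER_MW = {"nand_die": 30.0, "dram": 120.0, "channel": 15.0}
CYCLE_NS = 10                      # assumed 100 MHz emulation clock

def dynamic_energy_nj(active_cycles):
    """Energy (nJ) = sum over components of P (mW) * active time (us)."""
    total = 0.0
    for comp, cycles in active_cycles.items():
        time_us = cycles * CYCLE_NS / 1000.0
        total += ACTIVE_POWER_MW[comp] * time_us   # mW * us = nJ
    return total

print(dynamic_energy_nj({"nand_die": 2000, "dram": 500, "channel": 2000}))
# -> 1500.0
```

Because every cycle-level module already reports its activity, energy falls out of the same bookkeeping that drives timing.
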

  • CoDEN -- A Hardware/Software CoDesign Emulation Platform for SSD-Accelerated Near Data Processing

CoDEN is a novel hardware/software co-design emulation platform that not only offers a flexible and scalable design space accommodating a broad range of SSD controller and firmware policies, but also captures the details of the entire software/hardware stack for SSD-accelerated near-data processing. CoDEN connects to a host through the high-performance PCI Express (PCIe) bus and is recognized by the host as a real SSD storage device.

  • ASURA -- SSD emulation kernel driver

The SSD emulation kernel driver provides a logical volume to native file systems (e.g., Windows NTFS, EXT4) as a pseudo SSD device. Asura's virtual channels and cycle-level NAND flash simulation instances model actual runtime behavior with cycle accuracy by hooking kernel I/O dispatch routines. To enable large-scale SSD emulation, the driver stores only the metadata of kernel modules, flash firmware data, and device simulation models, omitting actual data contents. In addition, Asura can be instantiated as multiple driver instances to emulate an SSD RAID system. Asura has been implemented as a filter driver (WDM) for Windows (NTFS) and as a loadable kernel module for Linux (EXT4).
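The metadata-only trick that keeps the emulator's footprint small can be sketched as follows; the names and structures are illustrative, not the driver's actual code. The hooked write path updates mapping and versioning metadata per logical block but never retains the payload, so a very large SSD can be emulated in little host memory.

```python
class MetadataOnlySSD:
    """Sketch of metadata-only emulation (illustrative, not Asura code):
    account for every I/O, keep the mapping state, discard the data."""

    def __init__(self):
        self.l2p = {}            # logical block -> (physical page, version)
        self.next_page = 0

    def write(self, lba, payload):
        """Hooked write path: update mapping metadata, drop the payload."""
        version = self.l2p.get(lba, (None, 0))[1] + 1
        self.l2p[lba] = (self.next_page, version)
        self.next_page += 1
        return len(payload)      # bytes "written"; contents are not stored

    def footprint_entries(self):
        return len(self.l2p)

ssd = MetadataOnlySSD()
for lba in range(1000):
    ssd.write(lba, b"x" * 4096)      # 4 MB of writes...
ssd.write(0, b"y" * 4096)            # ...plus one overwrite
print(ssd.footprint_entries(), ssd.l2p[0])   # -> 1000 (1000, 2)
```

Reads in such an emulator can return synthesized data with correct timing, since only the timing and mapping behavior, not the contents, is under study.
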