Myoungsoo Jung | ResearchProject

WHAT ARE WE STUDYING? (2-minute introduction, in Korean)

What Are We Doing? (2-minute introduction, in Korean)

RESEARCH TOPICS

CXL HARDWARE AND SOFTWARE CO-SOLUTION WITH REAL

As the big data era arrives, resource disaggregation has attracted significant attention thanks to its scale-out capability, cost efficiency, and transparent elasticity. Many industry prototypes and academic simulation- or emulation-based studies explore a wide spectrum of approaches to memory disaggregation and put significant effort into making it practical. However, memory disaggregation has not been successfully realized so far because of several fundamental challenges. In this project, we provide a large memory system built on the world's first CXL solution framework, which can achieve outstanding performance in big data applications such as machine learning, in-memory databases, and real-world graph analytics. Our CXL solution opens up a new direction for memory disaggregation and ensures direct-access, high-performance capabilities.

Publications

Breaking Barriers: Expanding GPU Memory with Sub-Two Digit Nanosecond Latency CXL Controller (HotStorage'24)

Bridging Software-Hardware for CXL Memory Disaggregation in Billion-Scale Nearest Neighbor Search (ToS'24)

Cache in Hand: Expander-Driven CXL Prefetcher for Next Generation CXL-SSD (HotStorage'23)

CXL-ANNS: Software-Hardware Collaborative Memory Disaggregation and Computation for Billion-Scale Approximate Nearest Neighbor Search (ATC'23)

Failure Tolerant Training with Persistent Memory Disaggregation over CXL (IEEE Micro'23)

Memory Pooling with CXL (IEEE Micro'23)

Training Resilience with Persistent Memory Pooling using CXL Technology (HCM@HPCA'23)

Direct Access, High-Performance Memory Disaggregation with DirectCXL (ATC'22)

Practical Memory Disaggregation using Compute Express Link (WORDS'22)

Hello Bytes, Bye Blocks: PCIe Storage Meets Compute Express Link for Memory Expansion (CXL-SSD) (HotStorage'22)

Realizing Scale-Out, High-Performance Memory Disaggregation with Compute Express Link (CXL) (KAIST'22)

MACHINE LEARNING AND BIG DATA ANALYTICS WITH STORAGE/SCM

Most storage systems and operating system (OS) kernels still rely on conventional rule-based strategies. This makes sense given how little latency is available for making decisions at runtime, but such strategies are usually greedy, and heuristic algorithms often settle for sub-optimal solutions. We are exploring machine learning algorithms that make system-related decisions, such as I/O request pattern prediction and hot/cold data management, while being aware of the diverse device-level characteristics of memory and storage systems. In addition, we are designing and implementing hardware acceleration architectures within memory and storage subsystems to enable inference and training at runtime with minimal overhead.

Publications

Flagger: Cooperative Acceleration for Large-Scale Cross-Silo Federated Learning Aggregation (ISCA'24)

Bridging Software-Hardware for CXL Memory Disaggregation in Billion-Scale Nearest Neighbor Search (ToS'24)

CXL-ANNS: Software-Hardware Collaborative Memory Disaggregation and Computation for Billion-Scale Approximate Nearest Neighbor Search (ATC'23)

GraphTensor: Comprehensive GNN-Acceleration Framework for Efficient Parallel Processing of Massive Datasets (IPDPS'23)

Failure Tolerant Training with Persistent Memory Disaggregation over CXL (IEEE Micro'23)

Hardware/Software Co-Programmable Framework for Computational SSDs to Accelerate Deep Learning Service on Large-Scale Graphs (FAST'22)

PreGNN: Hardware Acceleration to Take Preprocessing Off the Critical Path in Graph Neural Networks (IEEE CAL'22)

Large-scale Graph Neural Network Services through Computational SSD and In-Storage Processing Architectures (HotChips'22)

HolisticGNN: Geometric Deep Learning Engines for Computational SSDs (NVMW'22)

Platform-Agnostic Lightweight Deep Learning for Garbage Collection Scheduling in SSDs (HotStorage'20)

TensorPRAM: Designing a Scalable Heterogeneous Deep Learning Accelerator with Byte-addressable PRAMs (HotStorage'20)

TURNING NEW MEMORY COMPUTING INTO REALITY

New memory technologies, including PRAM and ultra-low-latency memory, define memory hierarchies that differ from the conventional one. So far, most academic proposals related to new memory subsystems and storage systems, such as persistence control, firmware algorithms, and cross-layer optimizations, have all been evaluated through simulation and/or emulation. In this project, we build a set of ACTUAL hardware and software resources from the ground up. Currently, we have secured multi-core processor IPs, PRAM and NVM controllers, and the corresponding datapath, all of which are capable of running Linux. In addition, we are exploring new territory to integrate such new memory into diverse computing system domains, including domain-specific accelerators, AI accelerators, machine learning platforms, and fully hardware-automated FPGA storage subsystems.

Publications

LightPC: Hardware and Software Co-Design for Energy-Efficient Full System Persistence (ISCA'22)

Slow is Fast: Rethinking In-Memory Graph Analysis with Persistent Memory (NVMW'22)

Empirical Guide to Use of Persistent Memory for Large-Scale In-Memory Graph Analysis (ICCD'21)

Automatic-SSD: Full Hardware Automation over New Memory for High Performance and Energy Efficient PCIe Storage Cards (ICCAD'20)

DRAM-less: Hardware Acceleration of Data Processing with New Memory (HPCA'20)

OpenExpress: Fully Hardware Automated Open Research Framework for Future Fast NVMe Devices (USENIX ATC'20)

BIBIM: A Prototype Multi-Partition Aware Heterogeneous New Memory (HotStorage'18)

NearZero: An Integration of Phase Change Memory with Multi-core Coprocessor (IEEE Computer Architecture Letters 2017)

ENERGY EFFICIENT HETEROGENEOUS COMPUTING

Heterogeneous computing is widely applied to data processing and big data analytics applications by incorporating many dissimilar processors, such as general-purpose graphics processing units (GPGPUs), many integrated core (MIC) processors, and field-programmable gate array (FPGA) based coprocessors. However, it now faces many challenges that stem from different programming interfaces and data movement models. We are researching energy-efficient heterogeneous computing with diverse types of FPGA devices (Xilinx and Altera) and a thousand coprocessors in the Kandemir machine (NVIDIA GPGPUs and Xeon Phi). The main goals of this research are i) to remove data movement by aggressively integrating memory with hardware accelerators and ii) to enable low-power hardware acceleration with hardware/software cross-optimizations.

Publications

DockerSSD: Containerized In-Storage Processing and Hardware Acceleration for Computational SSDs (HPCA'24)

Containerized In-Storage Processing Model and Hardware Acceleration for Fully-Flexible Computational SSDs (IEEE CAL'23)

PreGNN: Hardware Acceleration to Take Preprocessing Off the Critical Path in Graph Neural Networks (IEEE CAL'22)

DRAM-less: Hardware Acceleration of Data Processing with New Memory (HPCA'20)

FlashGPU: Placing New Flash Next to GPU Cores (DAC'19)

FUSE: Fusing STT-MRAM into GPUs to Alleviate Off-Chip Memory Access Overheads (HPCA'19)

Computing with Near Data (SIGMETRICS'19)

FlashAbacus: A Self-governing Flash-based Accelerator for Low-power Systems (EUROSYS'18)

Enhancing Computation-to-Core Assignment with Physical Location Information (PLDI'18)

NearZero: An Integration of Phase Change Memory with Multi-core Coprocessor (IEEE Computer Architecture Letters 2017)

NVMMU: A Non-Volatile Memory Management Unit for Heterogeneous GPU-SSD Architectures (PACT'15)

GPUdrive: Reconsidering Storage Accesses for GPU Acceleration (ASBD at ISCA'14)

NEXT GENERATION NON-VOLATILE MEMORY

We are researching next-generation non-volatile memory (NVM) systems as a memory extension or as a storage medium that can replace NAND flash. In particular, this project includes 1) characterizing the challenges of emerging NVMs such as resistive RAM (RRAM), phase-change RAM (PCRAM), and magnetic RAM (STT-RAM), 2) building system-level prototypes, 3) exploring killer applications that exploit these emerging NVMs, and 4) architecting new platforms with byte-addressable NVMs.

Publications

Vigil-KV: Hardware-Software Co-Design to Integrate Strong Latency Determinism into Log-Structured Merge Key-Value Stores (ATC'22)

Hello Bytes, Bye Blocks: PCIe Storage Meets Compute Express Link for Memory Expansion (CXL-SSD) (HotStorage'22)

What You Can't Forget: Exploiting Parallelism for Zoned Namespaces (HotStorage'22)

Automatic-SSD: Full Hardware Automation over New Memory for High Performance and Energy Efficient PCIe Storage Cards (ICCAD'20)

OpenExpress: Fully Hardware Automated Open Research Framework for Future Fast NVMe Devices (USENIX ATC'20)

LL-PCM: Low-Latency Phase Change Memory Architecture (DAC'19)

Invalid Data-Aware Coding to Enhance the Read Performance of High-Density Flash Memories (MICRO'18)

BIBIM: A Prototype Multi-Partition Aware Heterogeneous New Memory (HotStorage'18)

ReveNAND: A Fast-Drift Aware Resilient 3D NAND Flash Design (ACM TACO)

PEN: Design and Evaluation of Partial-Erase for 3D NAND-Based High Density SSDs (FAST'18)

Exploiting Data Longevity for Enhancing the Lifetime of Flash-based Storage Class Memory (SIGMETRICS'17)

NearZero: An Integration of Phase Change Memory with Multi-core Coprocessor (IEEE Computer Architecture Letters 2017)

DUANG: Lightweight Page Migration and Adaptive Asymmetry in Memory Systems (HPCA'16)

OpenNVM: An Open-Sourced FPGA-based NVM Controller for Low Level Memory Characterization (ICCD'15)

Area, Power, and Latency Considerations of STT-MRAM to Substitute for Main Memory (MemoryForum at ISCA'14)

ZombieNAND: Resurrecting Dead NAND Flash for Improved SSD Longevity (MASCOTS'14)

Design of a Large-Scale Storage-Class RRAM System (ICS'13)

Challenges in Getting Flash Drives Closer to CPU (USENIX HotStorage'13)

NVM CONTROLLER/SOFTWARE DESIGN

Modern SSDs can be plagued by enormous performance variations, depending on whether the underlying architectural complexities and NVM management overheads can be hidden. Designing a smart NVM controller is key to hiding these architectural complexities and reducing internal firmware overheads. In this project, we present a set of novel storage optimizations, including various concurrency methods, I/O scheduling algorithms, and garbage collection avoidance mechanisms.

Publications

Vigil-KV: Hardware-Software Co-Design to Integrate Strong Latency Determinism into Log-Structured Merge Key-Value Stores (ATC'22)

What You Can't Forget: Exploiting Parallelism for Zoned Namespaces (HotStorage'22)

ScalaRAID: Optimizing Linux Software RAID System for Next-Generation Storage (HotStorage'22)

SOML Read: Rethinking the Read Operation Granularity of 3D NAND (ASPLOS'19)

Invalid Data-Aware Coding to Enhance the Read Performance of High-Density Flash Memories (MICRO'18)

PEN: Design and Evaluation of Partial-Erase for 3D NAND-Based High Density SSDs (FAST'18)

Exploiting Intra-Request Slack to Improve SSD Performance (ASPLOS'17)

DUANG: Lightweight Page Migration and Adaptive Asymmetry in Memory Systems (HPCA'16)

HIOS: A Host Interface I/O Scheduler for Solid State Disks (ISCA'14)

Sprinkler: Maximizing Resource Utilization in Many-Chip Solid State Disks (HPCA'14)

Physically Addressed Queueing (PAQ): Improving Parallelism in Solid State Disks (ISCA'12)

Taking Garbage Collection Overheads off the Critical Path in SSDs (USENIX Middleware'12)

Middleware - Firmware Cooperation for High-Speed Solid State Drives (USENIX Middleware'12)

GREEN HIGH PERFORMANCE COMPUTING

Just as general-purpose graphics processing units (GPGPUs) have risen as accelerators for specific high performance computing (HPC) workloads, non-volatile memory (NVM) is increasingly used as an accelerator for I/O-intensive scientific applications. In addition, flash drives and NVM technologies are beginning to replace disks at the major data centers of companies such as Amazon, Facebook, and Dropbox. In this work, we 1) show how to efficiently manage flash drives and emerging NVM technologies as I/O accelerators in HPC and datacenter systems, 2) redesign the current memory/storage hierarchy and HPC storage stack from scratch with emerging NVM, and 3) develop novel and efficient hardware/software cooperative techniques that are aware of system-level characteristics as well as the complexities of the underlying NVM technologies.

Publications

DockerSSD: Containerized In-Storage Processing and Hardware Acceleration for Computational SSDs (HPCA'24)

Design of Global Data Deduplication for A Scale-out Distributed Storage System (ICDCS'18)

Understanding System Characteristics of Online Erasure Coding on Scalable, Distributed and Large-Scale SSD Array Systems (IISWC'17)

TraceTracker: Hardware/Software Co-Evaluation for Large-Scale I/O Workload Reconstruction (IISWC'17)

NVMMU: A Non-Volatile Memory Management Unit for Heterogeneous GPU-SSD Architectures (PACT'15)

CoDEN: A Hardware/Software CoDesign Emulation Platform for SSD-Accelerated Near Data Processing (ASBD at ISCA'15)

Triple-A: A Non-SSD Based Autonomic All-Flash Array for Scalable High Performance Computing Storage Systems (ASPLOS'14)

Exploring the Future of Out-Of-Core Computing with Compute-Local Non-Volatile Memory (Supercomputing'13)

PARALLEL I/O PROCESSING

Exploiting internal parallelism across hundreds of NAND flash memory chips is becoming a key design issue in high-speed SSDs. The main goal of this memory-level parallelism project is to take full advantage of both system-level and memory-level parallelism so that an SSD can offer short latency at full bandwidth. In this project, we are exploring internal SSD/NVM architectures across a full design space that spans system-level and memory-level organizations, with a variety of parameters such as a standard queue, multiple buses, chips, and diverse advanced flash operations.
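
As a rough illustration of what internal parallelism means here, the sketch below stripes logical pages across buses (channels) and chips. It is a generic channel-first striping scheme chosen for illustration, not the allocation policy of any publication listed on this page; the function names `stripe` and `channel_load` and the 8-channel, 4-chip geometry are assumptions.

```python
from collections import Counter


def stripe(lpn, channels=8, chips_per_channel=4):
    """Map a logical page number to (channel, chip, page offset).

    Channel-first striping (an assumed policy): consecutive pages go to
    different channels first, then to different chips on each channel.
    """
    channel = lpn % channels
    chip = (lpn // channels) % chips_per_channel
    offset = lpn // (channels * chips_per_channel)
    return channel, chip, offset


def channel_load(lpns, channels=8, chips_per_channel=4):
    """Count how many of the given pages land on each channel."""
    return Counter(stripe(lpn, channels, chips_per_channel)[0] for lpn in lpns)


if __name__ == "__main__":
    # A sequential 16-page request spreads evenly over all 8 channels.
    print(channel_load(range(16)))
    # A stride-8 access pattern lands on a single channel and serializes there.
    print(channel_load(range(0, 128, 8)))
```

Under this mapping, a sequential request can keep every bus busy, while a strided pattern that matches the channel count concentrates on one bus. Imbalances of this kind are one reason the choice of queueing and page allocation parameters matters.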

Publications

ScalaAFA: Constructing User-Space All-Flash Array Engine with Holistic Designs (USENIX ATC'24)

What You Can't Forget: Exploiting Parallelism for Zoned Namespaces (HotStorage'22)

ScalaRAID: Optimizing Linux Software RAID System for Next-Generation Storage (HotStorage'22)

Exploring Fault-Tolerant Erasure Codes for Scalable All-Flash Array Clusters (IEEE Transactions on Parallel and Distributed Systems (TPDS) 2019)

Understanding System Characteristics of Online Erasure Coding on Scalable, Distributed and Large-Scale SSD Array Systems (IISWC'17)

Exploring Parallel Data Access Methods in Emerging Non-Volatile Memory Systems (IEEE Transactions on Parallel and Distributed Systems'2017)

Exploring the Potentials of Parallel Garbage Collection in SSDs for Enterprise Storage System (Supercomputing'16)

Triple-A: A Non-SSD Based Autonomic All-Flash Array for Scalable High Performance Computing Storage Systems (ASPLOS'14)

HIOS: A Host Interface I/O Scheduler for Solid State Disks (ISCA'14)

Sprinkler: Maximizing Resource Utilization in Many-Chip Solid State Disks (HPCA'14)

Physically Addressed Queueing (PAQ): Improving Parallelism in Solid State Disks (ISCA'12)

An Evaluation of Different Page Allocation Strategies on High-Speed SSDs (USENIX HotStorage'12)

MEMORY MODELING FOR HW/SW CO-DESIGN

To explore the impact of diverse NVM technologies on modern computer architectures and systems, fast, high-fidelity, and accurate NVM simulation and emulation tools are required. Unfortunately, modeling such a broad range of NVM technologies is a non-trivial research problem, as there are multiple design parameters and unprecedented device-level considerations. In this project, we are developing several research frameworks, including open-source simulation models, FPGA-based NVM emulators, and hardware validation prototypes. In addition to offering valuable research vehicles, we also propose a hardware-software codesign environment that will allow application, algorithm, and system developers to influence the direction of future architectures, thereby satisfying the demands of diverse computing areas.

Publications

DockerSSD: Containerized In-Storage Processing and Hardware Acceleration for Computational SSDs (HPCA'24)

FlashGPU: Placing New Flash Next to GPU Cores (DAC'19)

Amber: Enabling Precise Full-System Simulation with Detailed Modeling of All SSD Resources (MICRO'18)

Parallelizing Garbage Collection with I/O to Improve Flash Resource Utilization (HPDC'18)

SimpleSSD: Modeling Solid State Drive for Holistic System Simulation (IEEE CAL)

OpenNVM: An Open-Sourced FPGA-based NVM Controller for Low Level Memory Characterization (ICCD'15)

NVM-Charade: An Open Sourced FPGA-Based NVM Characterization Scheme (WARP@ISCA'15)

NANDFlashSim: High-Fidelity, Micro-Architecture-Aware NAND Flash Memory Simulation (ACM Transactions on Storage 2016)

An Evaluation of Different Page Allocation Strategies on High-Speed SSDs (USENIX HotStorage'12)

Intrinsic Latency Variation Aware NAND Flash Memory System Modeling and Simulation at Microarchitecture level (MSST'12)

SSD CHARACTERIZATIONS

Storage applications leveraging SSD technology are being widely deployed in diverse computing systems. These applications accelerate system performance by exploiting several SSD-specific characteristics. However, modern SSDs have undergone a dramatic technology and architecture shift in the past few years, which makes widely held assumptions and expectations regarding them highly questionable. The main goal of this project is to question popular assumptions and expectations regarding SSDs through an extensive experimental analysis. This project uses two different types of SSD, which are the most popular in many market segments: 1) PCI Express based SSDs and 2) mass storage type SSDs. This project also offers insightful analyses for system-level kernel and architecture designers.

Publications

Faster than Flash: An In-Depth Study of System Challenges for Emerging Ultra-Low Latency SSDs (IISWC'19)

Exploring System Challenges of Ultra-Low Latency Solid State Drives (HotStorage'18)

TraceTracker: Hardware/Software Co-Evaluation for Large-Scale I/O Workload Reconstruction (IISWC'17)

Exploring Parallel Data Access Methods in Emerging Non-Volatile Memory Systems (IEEE Transactions on Parallel and Distributed Systems'2017)

Exploring Design Challenges in Getting Solid State Drives Closer to CPU (IEEE Transactions on Computers'2016)

Power, Energy and Thermal Considerations in SSD-Based I/O Acceleration (USENIX HotStorage'14)

Revisiting Widely-held Expectations of SSD and Rethinking Implications for Systems (SIGMETRICS'13)

An Evaluation of Different Page Allocation Strategies on High-Speed SSDs (USENIX HotStorage'12)

PROJECTS FOR JUST FUN:

IPAD (2004~2005)

A forerunner of the high-end portable media player, which supports processing and managing images and playing entertainment content such as music, flash, and digital movies as a standalone device. IPAD hinted at the potential of hand-held smart devices such as Apple's iPad, yet our IPAD was developed four years before the first-generation iPad. Our IPAD provides a method to upload images processed on the X25 embedded platform directly to a web blog over wireless networks, so users do not need to connect their device to a PC or laptop at all.

Code Wizard (2005)

An intuitive drag-and-drop programming tool that enables people who do not know how to program robots to easily develop their own. Most people can create a program through intuitive drag-and-drop programming. The Code Wizard project provides programmable robot suites and a convenient mechanism to control them. These robot suites consist of several peripheral devices, such as interactive servo motors and touch sensors.

CLASS-MATE (2005)

An education game framework based on the object-oriented paradigm, where the goal is to develop a humanoid, built with the Class-mate library, that battles against other humanoids. Developers who are not familiar with OOP can improve their programming skills and easily learn OOP features such as polymorphism and inheritance as part of the game play. The purpose of the Class-mate project is very similar to that of the Java Robocode project. However, unlike Java, the C++ RTL has no VM through which diverse user-programmed objects can be linked. Class-mate therefore leverages COM-based dynamically linkable objects and provides a framework for playing with and coding robots.