Parallel Computer Architecture

This course will mainly introduce computer organization and design, covering the following topics: i) instruction-level parallelism, including parallel processing, superscalar, VLIW, static instruction scheduling, dynamic scheduling, and precise exception handling; ii) memory-level parallelism; iii) data-level parallelism, including multi-core architectures and GPUs; iv) thread-level parallelism; and v) NVM-level parallelism. The course is project-centric, with five gem5 lab projects. Most projects are step-by-step tutorials that teach you how to conduct simulation-based architectural exploration and studies. They include CPU design analysis, exploration of different branch predictors, multi-threading evaluations in full-system mode, and SSD internal parallelism analysis on gem5. For undergraduate students, the course also includes quick review lectures covering instruction set architecture, MIPS/RISC architecture, pipelining, hazards, and cache architecture.


No. | Title | Topics included
#1 | Overview [ pdf ] [ ppsx ] | Logistics
#2 | Quick Review: Instruction Set Architecture [ pdf ] | ISA, MIPS, RISC, and RISC-V
#3 | Quick Review: MIPS Architecture [ pdf ] | RTL, Combinational Logic, Single-Cycle Datapath, ALU Control Unit
#4 | Quick Review: Multi-cycle Machine and Pipelining [ pdf ] | Multi-cycle Machine, FSM, Pipeline Design, and Pipeline Paradox
#5 | Quick Review: Hazards [ pdf ] | Dependencies and Hazards, Data Forwarding, Pipeline Integration
#6 | Instruction-Level Parallel Processing [ pdf ] | Superscalar, Deep Pipelining, VLIW, Tracing, Loop Unrolling
#7 | Static Scheduling [ pdf ] | Compiler Optimization, Register Renaming, Software Pipelining
#8 | Dynamic Scheduling (Scoreboard) [ pdf ] | Out-of-Order Execution, Scoreboard Algorithm
#9 | Tomasulo Scheduling [ pdf ] | Reservation Stations, Decentralized Buffers, Tomasulo Algorithm
#10 | Reorder Buffer [ pdf ] | Register Renaming, Alias Table, Advanced ROB Processors
#11 | Precise Exception [ pdf ] | Exception Handling, Speculative Execution, LSQ, Spectre/Meltdown
#12 | Branch Prediction [ pdf ] | Branch History Tables, Correlated Prediction, Tournament, RAS
#13 | Cache [ pdf ] | Cache Basics, Practical/Approximate LRUs, Uncacheable Speculative Write-Combining (USWC)
#14 | Virtual Memory [ pdf ] | Virtual Memory, Page Tables, Multi-level Paging, MMU, TLB
#15 | Volatile Memory (DRAM) [ pdf Δ ] | MAT, Bank/Rank, Channel, EDO, Burst EDO, DDR, LRDIMM, FBDIMM
#16 | Volatile Memory (Advanced) [ pdf Δ ] | Xn Prefetching, Clock Skew, Single-ended, Reflection
#17 | Volatile Memory (Advanced) [ pdf Δ ] | Scheduling, Multi-channel, Bank-level Parallelism, LPDDR, GDDR
#18 | Non-Volatile Memory (Practical Approach) [ pdf Δ ] | PMEM, Persistence Support, DAX, FSDAX, DEVDAX, New Instructions
#19 | Multicore with Coherence/Consistency [ pdf Δ ] | MESI, MSI, MOESI, MESID, Directory, Snooping, Multi-level Cache Issues


Operating Systems

The purpose of this course is to teach the general concepts and principles behind operating systems. Topics covered in this class include i) kernel and process abstractions and programming; ii) scheduling and synchronization; iii) memory management and address translation; iv) caching and virtual memory; v) file systems, storage devices, files, and reliability; and vi) full virtualization and paravirtualization. In addition to the lectures, there are term projects that use an operating system simulator/emulator built for educational purposes. Through these projects, we expect you not only to learn Linux practices but also to make significant progress in studying operating system design and implementation. The projects are written in C/C++ rather than Java or Python, which we believe gives students a more realistic experience of operating systems.

All homework is an individual assignment, whereas projects are group assignments. Because it is typically difficult to determine each member's contribution, projects are submitted through one git repository per group (e.g., Bitbucket), and the TA will check all push and pull transactions when grading a team. Intermediate pushes therefore help the TA see how students are progressing. Teams will have two or three members; team size will be decided based on final registration information and announced in class.


No. | Title | Topics included
#3 | Hardware management | OS history, architecture and hardware support, I/O ports, memory-mapped I/O, DMA, typical memory layout, and an example boot-up sequence
#4 | Processes | Multiprogramming, execution stack, address spaces, context switch, process creation, inter-process communication
#5 | Threads | Light-weight processes, thread state, lifecycle of a thread, dispatch loop, events, interrupts, thread execution
#6 | Concurrency management | UNIX process management, process tree, fork(), exec(), pthread, join(), OpenMP
#7 | Synchronization | A high-level view of parallelism and synchronization
#8 | Atomicity | Race conditions, critical sections, mutexes, instruction-level atomicity, spin lock
#9 | Deadlock | Bounded buffer problem, read/write lock, semaphore, condition variable, monitor, circular waiting, and deadlock avoidance
#10 | CPU scheduler | Processor behavior analysis, scheduling architecture, FCFS, SJF, STCF, RR, and CPU burst prediction
#11 | Advanced CPU scheduler | Priority scheduling basics, priority boost, EDF, MLQ, MLFQ, fair scheduling, lottery scheduling, stride scheduling, multiprocessor-aware scheduling, MQMS, process migration, CFS, and red-black process tree
#12 | Virtual memory | Segmentation, multi-segment model, segment translation, translation table, swapping, paging, sharing, multi-level translation, two-level page table, and inverted page table
#13 | Cache and TLB | Cache basics, direct-mapped cache, set-associative cache, fully associative cache, address translation on caching, TLB, demand paging, page table entry, and software-driven TLB
#14 | Page replacement | Page faults, FIFO, MIN, LRU, Belady's anomaly, clock algorithm, N-chance approximate LRU, and free list
#15 | Disk scheduling | Disk architecture and organization, interfaces, transferring data, caching, FCFS, SSTF, SCAN, C-SCAN, C-LOOK, and device-level command queueing
#16 | Beyond disks | Block addressing, chunk sizing, RAID performance analysis, RAID-0, RAID-1, hybrid RAID, RAID-4, RAID-5, RAID-6, flash, SSD, garbage collection, TRIM, and wear-leveling
#17 | File system basics | File system overview, MBR, partition, root file system, mount, virtual file system, file allocation table and file metadata, FAT analysis
#18 | inode, block, and block group | inode, inode block pointers, link, ext optimizations for many small files, fast file system, ext and ext2
#19 | Journaling | Consistency and reliability, file system checker, write-ahead log, commits and checkpoints, crash recovery, metadata journaling
#20 | Log-structure | Extents and B-trees, log-structured file system, buffering writes, garbage collection, and copy-on-write
#21 | Full virtualization | VMM organization and functions, guest, virtual machine hardware, protected mode, privileged instructions, binary translation, caching translated code, and shadow page tables
#22 | Paravirtualization | Hardware support for VMs, virtualization performance analysis, AMD-V and VT-x, second-level address translation, Xen, hypercalls, virtual devices


Operating System Concepts, 9th Edition, Silberschatz, Galvin, Gagne

Suggested references

Operating Systems: Three Easy Pieces, Remzi and Andrea Arpaci-Dusseau.
Free, PDFs available online
The Design and Implementation of the FreeBSD OS
The Practice of Programming
The Mythical Man-Month


  • Project-0: Install Pintos and implement print_name event [ description ]
  • Project-1: Threads, timer and priority scheduler [ description is available at YSCEC ] [ DESIGNDOC sample ]
  • Project-2: System call implementation [ DESIGNDOC sample ]
  • Project-3: Virtual memory implementation [ DESIGNDOC sample ]


Memory Architecture and Storage Systems

Modern flash-based solid state disks (SSDs) can suffer from enormous performance variations depending on whether the underlying architectural complexities and flash management overheads can be hidden. Designing a smart flash controller and storage system is key to hiding these architectural complexities and reducing internal firmware overheads. This course first covers the core components of SSD architecture and the key concepts behind flash firmware. It then presents a set of novel storage optimizations, including various concurrency methods, I/O scheduling algorithms, and garbage collection avoidance mechanisms.

The topics covered in this class are as follows:

Lectures (in progress)

  1. Logistics
  2. Flash introduction -- NAND flash basics, floating-gate basics, and reliability issues of NAND cells
  3. Fundamental address mapping (FTL) -- Simple mapping algorithm and physical block management
  4. Advanced address mapping (FTL) -- Associativity, garbage collection and wear leveling basics
  5. Garbage collection basics -- Foreground and background garbage collection techniques
  6. Flash-level controller [ pdf ] [ ppsx ] -- Flash memory transactions and advanced command control
  7. SSD architecture and system-level controller [ pdf ] [ ppsx ] -- SSD Architecture and I/O parallelism-centric design techniques
  8. Wear-leveling algorithms -- Reliability management functions and hot/cold data management
  9. Internal buffer management -- DRAM caching, flash aware replacement and buffering mechanisms

The lectures will also include invited talks covering a series of industrial topics, such as Non-Volatile Memory Express (NVMe) architecture, device management, and distributed flash controllers. We will also provide a simulation framework for your project that can deepen your knowledge of storage systems.


Advanced Programming Language

This course is mainly designed to introduce the design and implementation of programming languages. From the design perspective, we will study language features for expressing algorithms. From the implementation perspective, we will study the basic concepts behind the tools, such as compilers and interpreters, that map these language features onto modern computing hardware. Rather than dwelling on the features of a particular language, this course focuses on fundamental concepts, the differences among programming languages, the reasons for those differences, and the implications of those differences for language implementation. Topics covered in this class include:

  • formal aspects of syntax and semantics
  • naming, scoping, and binding
  • scanning, parsing, semantic analysis, and code generation
  • control flow, subroutines, exception handling, and concurrency
  • type systems, composite types, data abstraction, and storage management
  • imperative, functional, logic-based, and object-oriented programming paradigms


Students should be familiar with basic programming concepts and have some programming practice in a language of their choice. We will assume that you either know the material covered in these fundamentals or are willing to learn it as necessary.


Programming Language Pragmatics, Michael L. Scott, Morgan Kaufmann Publishers (Elsevier)

Suggested references

Structure and Interpretation of Computer Programs, Harold Abelson, MIT Press / McGraw-Hill


Computer Organization and Design

This course will mainly introduce computer organization and design, including the following topics:

  • Instruction set design, illustrated by the MIPS instruction set architecture.
  • Design of the datapath and control for a simple processor.
  • CPU performance analysis and systems-level view of computer arithmetic.
  • Parallelism, pipelining, hazards, and dependencies.
  • Cache and memory design.
  • Hierarchical memory.
  • I/O subsystems, storage systems, I/O performance analysis.

Even though the class topics are related to architecture fundamentals, students are expected to have some hardware and computer science background. This course will include two or three simple projects, each leveraging a different style of simulation model built for educational purposes. One goal of these projects is for students to learn i) how to use full-system simulation software and ii) how to perform simulation-based architectural studies, which in turn can be a good stepping stone for future research. The simulation frameworks can be built on most 32-bit and 64-bit flavors of UNIX as well as on Windows NT-based operating systems, but we recommend modifying them on a UNIX-like system. The projects will be relatively simple compared to what an advanced computer architecture course usually covers, but students should be capable of analyzing and modifying software models written in C/C++.


We expect that you know C/C++ and data structures, have done some assembly language programming, and that you know about series and products, logarithms, advanced algebra, some calculus, and basic probability (means, standard deviations, etc.). We will assume that you either know the material that is supposed to be covered in those fundamental topics, or that you are willing to learn the material as necessary.
