SCIC: A System C Interface Converter for DRAMSim

Myoungsoo Jung
Department of Computer Science and Engineering
The Pennsylvania State University
Project Duration: 3 weeks (Jun, 2011)

1 Introduction

The memory system simulator DRAMSim [3] is a well-known cycle-accurate model that communicates with other system models through abstracted protocol data (Transaction Level 1 [2]) and function-call-style interfaces. Co-simulating DRAMSim with a SystemC-based Hardware Description Language (HDL) simulator (e.g., XTSC) therefore requires a SystemC interface for DRAMSim. This manual introduces the SystemC interface converter (SCIC), which enables DRAMSim to be integrated with comprehensive pin-level system simulation models. SCIC handles the protocol differences between the DRAMSim interface and the SystemC interface. DRAMSim does not inherently provide data storage for modeling word-level data transactions, so SCIC also provides storage resources for modeling data movement. In addition, this work introduces a pin-level protocol (Transaction Level 0) to the memory system model of DRAMSim, so that the memory system can interoperate with other simulators that employ SystemC or HDL simulation [4, 5]. Lastly, SCIC is designed to use only the public interface of DRAMSim, without modifying the original DRAMSim source code, thereby providing forward compatibility with future versions of DRAMSim at no extra integration cost.

2 System Overview

An example system that employs the proposed SCIC module and a CPU module is illustrated in Figure 1. The two modules are synchronized by a global clock, and their control and data I/O signals are connected to each other. SCIC is a bridge-class module that is aware of both the SystemC interface and the function-style DRAMSim interface; it transfers clock events from its own clock input port to the DRAMSim clock update function. At the same time, SCIC handles the differences between the pin-level read/write protocol of the CPU model and the protocol of the memory system model. By parsing input signals according to each protocol, the converter reassembles information such as the destination address, request type, and payload. From this information, SCIC builds a memory transaction message on behalf of the transmitting module and forwards it to the receiving module. In addition to data communication, the CPU model may need a way to identify the memory device status in order to handle transaction congestion; SCIC therefore generates output signals that report the results and state (ready/busy) of the memory system. Using these control signals, the CPU model can manage memory coherency regardless of the I/O scheduling policy employed by DRAMSim.



Figure 1: System overview of SCIC.
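The reassembly step described in this section can be sketched in plain C++, without the SystemC library. This is only an illustration under stated assumptions: the PinSample and Transaction types and the decodePins function are hypothetical names, not part of SCIC or DRAMSim, and the real converter may sample and validate its ports differently.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical snapshot of the SCIC input pins at one clock edge.
struct PinSample {
    bool de = false;                     // DIMM enable: a request is present when high
    bool we = false;                     // write enable: high = write, low = read
    std::uint64_t da = 0;                // destination address (DA[0:62])
    std::array<std::uint64_t, 4> din{};  // DIN0..DIN3 payload words
};

// Hypothetical transaction message handed to the memory model.
struct Transaction {
    std::uint64_t address;
    bool isWrite;
    std::array<std::uint64_t, 4> payload;  // meaningful only for writes
};

// Reassemble a transaction from the pins. Returns false (and leaves `out`
// untouched) when DE is low or when the memory system is busy (R/B high).
bool decodePins(const PinSample& pins, bool rbBusy, Transaction& out) {
    if (!pins.de || rbBusy) {
        return false;
    }
    assert((pins.da >> 63) == 0);  // DA[0:62] carries 63 address bits
    out.address = pins.da;
    out.isWrite = pins.we;
    if (pins.we) {
        out.payload = pins.din;
    } else {
        out.payload.fill(0);
    }
    return true;
}
```

A caller would sample the pins on each clock edge and forward the resulting Transaction to the memory model only when decodePins returns true.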


2.1 Pin Configuration

To transfer data between a CPU model and DRAMSim, SCIC instantiates seven SystemC ports for control signals, as listed in Table 1, and eight SystemC ports for data I/O. The number of data ports is based on the burst length that DRAMSim originally provides, and each data port of SCIC comprises 64 pins; this pin configuration follows the JEDEC specification for the DDR memory series [1]. Since the memory system of DRAMSim has its own transaction queue, a CPU model can issue as many I/O requests as the limited storage of the memory queue allows. This is why SCIC employs separate input and output ports for data communication rather than bidirectional I/O ports. In Table 1, CLK is an input port that receives the global system clock signal. SCIC updates the memory system model of DRAMSim on every clock cycle of the signal on the CLK port. From the CPU model's point of view, the other control ports fall into two groups. The first group is used to issue memory I/O requests and consists of the R/B, WE, DE, and DA ports. Note that no I/O request can be issued while R/B is high. The second group (CA, AO, and WO) carries feedback from the memory system. Using these feedback ports, the CPU model can track I/O requests, which DRAMSim may complete out of order depending on the scheduling policy the user selects, and thereby maintain memory data coherency. Note that the CPU model is unaware of an I/O completion until CA goes high. System designers must connect SystemC signals to the corresponding ports as described in Table 1 before the SystemC kernel starts. Control flows and timing behavior for these ports are described in Section 4.




| Port Name | Function | Description |
|---|---|---|
| CLK | CLOCK | The CLK port is the clock signal supply for the memory system. |
| R/B | READY/BUSY OUTPUT OF BUS INTERFACE UNIT | The R/B output indicates the status of the memory system's bus interface unit. When its signal is high, the memory system cannot accept requests. |
| WE | WRITE ENABLE | The WE input controls the I/O ports, which carry input data when the WE input signal is high. |
| DE | DIMM ENABLE | The DE input is the memory system selection control. When the memory system is in the busy state (R/B high), a high input signal on the DE port is ignored. |
| DA[0:62] | DESTINATION ADDRESS | The DA port designates a destination address in the memory system. |
| CA | COMPLETION ACKNOWLEDGE | The CA output signals that the memory system has completed the memory transaction indicated by the AO and WO output ports. |
| AO[0:62] | ADDRESS OUTPUT | The AO output carries the I/O address of the transaction that the memory system has completed. |
| WO | WRITE OUTPUT | The WO output indicates the type of the memory transaction request that the memory system has completed. |

Table 1: Descriptions for each port of SCIC.
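The relationship between the transaction queue and the R/B output can be illustrated with a small plain C++ model. This is a sketch under stated assumptions: BusInterfaceModel and its methods are hypothetical, the queue capacity stands in for whatever transaction queue size the DRAMSim instance is configured with, and requests are retired in order here only for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <queue>

// Hypothetical stand-in for the bus interface unit: a bounded transaction
// queue whose fullness drives the R/B output.
class BusInterfaceModel {
public:
    explicit BusInterfaceModel(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // R/B output: true (high) means busy, i.e. no more requests can be queued.
    bool readyBusy() const { return pending_.size() >= capacity_; }

    // Called when DE is sampled high. Returns false when the request is
    // ignored because R/B is high.
    bool issue(std::uint64_t address) {
        if (readyBusy()) {
            return false;
        }
        pending_.push(address);
        return true;
    }

    // Retire the oldest request (in-order here; DRAMSim may reorder).
    void completeOne() {
        if (!pending_.empty()) {
            pending_.pop();
        }
    }

private:
    std::size_t capacity_;
    std::queue<std::uint64_t> pending_;
};
```

In this model a request issued while readyBusy() is true is simply dropped, which mirrors the rule that a high DE input is ignored while R/B is high.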

3 Memory System with SystemC

A memory system with a SystemC interface can be built without any modification of DRAMSim by employing SCIC. The simplest way to employ SCIC is to attach to it a pointer to an instance of DRAMSim. System designers can create instances of different memory systems with different configurations before attaching them. Note that SCIC is not involved in configuring the memory system; the number of ranks, the number of banks, the transaction queue size, and the characteristics of the memory device depend entirely on the input parameters of the particular DRAMSim instance. To simulate SCIC and a CPU model together, system designers should connect the appropriate signals between the two models using sc_signal, a SystemC primitive. The SCIC ports to which signals should be connected are illustrated in Figure 2.



Figure 2: Interface configuration of SCIC.


Once all ports are connected, SCIC starts forwarding clock events from the CLK port to the update() function of DRAMSim at both rising and falling edges, so that the memory system of DRAMSim is updated at double rate. While the attached memory system is busy handling outstanding I/O requests and the transaction queue is full, any newly arriving memory I/O request is discarded. Therefore, system designers should make sure that the R/B port of SCIC is low before sending an I/O request. The clock period and duty cycle for DRAMSim are set when the sc_clock provided by SystemC is initialized, and the time resolution can be set at the same point (the default time resolution is one nanosecond, which corresponds to the clock rate that DDR3 uses). An example of initializing the clock timing is shown in Table 2. During this initialization, a system designer can assign different clock periods and duty cycles to the CPU model and DRAMSim.



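// Time resolution and default time unit: 1 ns
sc_set_time_resolution(1, SC_NS);
sc_set_default_time_unit(1, SC_NS);
// 2 ns period, 50% duty cycle, starts at 0 ns, first edge is rising
sc_clock CLK("clock", 2, SC_NS, 0.5, 0, SC_NS, true);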



Table 2: An example to set the system time.
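The double-rate update can be sketched without SystemC by stepping a square-wave clock by hand. The sketch below assumes the Table 2 settings (2 ns period, 50% duty cycle, rising edge first); MemorySystemStub and runClock are hypothetical names, and the stub only counts calls rather than modeling DRAMSim.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the DRAMSim memory system; it only counts updates.
struct MemorySystemStub {
    std::uint64_t updates = 0;
    void update() { ++updates; }
};

// Step a square-wave clock and forward every edge to update().
// periodNs is the clock period and durationNs the simulated time, both in ns.
// Returns the number of rising edges, i.e. full clock cycles started.
std::uint64_t runClock(MemorySystemStub& mem,
                       std::uint64_t periodNs,
                       std::uint64_t durationNs) {
    assert(periodNs >= 2 && periodNs % 2 == 0);  // 50% duty cycle at 1 ns resolution
    const std::uint64_t half = periodNs / 2;
    bool level = false;
    std::uint64_t risingEdges = 0;
    for (std::uint64_t t = 0; t < durationNs; t += half) {
        level = !level;  // rising edge at t = 0, falling edge half a period later
        if (level) {
            ++risingEdges;
        }
        mem.update();    // called on both edges: double rate
    }
    return risingEdges;
}
```

With a 2 ns period, 10 ns of simulated time covers five clock cycles and therefore ten calls to update().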

3.1 Internal Storage Resource Management for Modeling Word-level Data Transaction

DRAMSim simulates cycle-accurate memory performance without storage for memory transactions when the NO_STORAGE preprocessor macro is defined at compile time. When the macro definition is removed, however, DRAMSim handles a memory transaction using a pointer supplied by the CPU model. A data communication protocol that relies on a CPU-side buffer is inappropriate for pin-level interfaces with SystemC and HDL; instead of the pointer, SCIC provides eight data I/O ports and internal storage for handling memory transactions. SCIC allocates the internal storage when the input signal of the DE port is high. SCIC treats the storage differently depending on the WE input port: if the WE input is high, SCIC copies data from the DIN ports into the internal storage and passes it to DRAMSim together with the other transaction information. If, in contrast, the WE input is low, SCIC drives the DOUT ports with the contents of the internal storage at the time the I/O completes.

Since the memory system of DRAMSim supports several transaction reordering schemes, the I/O completion order can differ from the order of the input requests. SCIC must therefore preserve internal storage space until DRAMSim completes the corresponding read/write requests. To manage storage resources, SCIC maintains a map called the scoreboard; using the scoreboard, SCIC looks up the allocated storage and releases it when the memory system of DRAMSim completes the relevant transaction. Because SCIC, unlike DRAMSim, provides and manages the internal storage pool for memory transactions itself, transparently to users, the CPU model can deal with the memory system in a more general way. Note that all of the internal storage management described above is performed without advancing simulation time, that is, within SystemC delta cycles. The overhead of managing the internal buffers therefore has no impact on memory latency.
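A scoreboard of this kind can be sketched as a map from address to the buffers reserved for that address. This is a minimal illustration, not the SCIC implementation: the Scoreboard class, the Burst type, and the choice to key entries by address with a first-in, first-out queue per address are all assumptions made for the example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <unordered_map>

// One buffer per transaction: four 64-bit words (hypothetical layout).
typedef std::array<std::uint64_t, 4> Burst;

class Scoreboard {
public:
    // Reserve a buffer when a request is accepted (DE sampled high).
    void allocate(std::uint64_t address, const Burst& data) {
        table_[address].push_back(data);
    }

    // Look up the oldest buffer for an address and release it once the
    // memory model reports completion. Returns false if none is outstanding.
    bool release(std::uint64_t address, Burst& out) {
        auto it = table_.find(address);
        if (it == table_.end()) {
            return false;
        }
        assert(!it->second.empty());  // empty queues are erased below
        out = it->second.front();
        it->second.pop_front();
        if (it->second.empty()) {
            table_.erase(it);
        }
        return true;
    }

    // Number of buffers still held for in-flight transactions.
    std::size_t outstanding() const {
        std::size_t n = 0;
        for (const auto& entry : table_) {
            n += entry.second.size();
        }
        return n;
    }

private:
    std::unordered_map<std::uint64_t, std::deque<Burst>> table_;
};
```

Because release() looks entries up by address rather than by arrival order, buffers can be freed in whatever order the memory model completes its transactions.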



Figure 3: Timing diagram for write operation.


3.2 Stimulus

In addition to enabling DRAMSim to be integrated with comprehensive pin-level system simulation models, this work provides a CPU model, called the stimulus, so that SCIC can operate without the assistance of other components. The main role of the stimulus is to issue the memory instructions specified in a user trace file through the SCIC protocol interface. Since SCIC operates with a double-rate clock signal, the stimulus loads memory instructions from the given trace file on both the low and high phases of the clock; it generates output signals for the destination address, data, and memory access type and drives them onto the corresponding SCIC input ports. If the memory system behind SCIC is busy handling outstanding memory transactions, the stimulus suspends the loaded memory transactions and resumes them later. When the stimulus has no instruction to execute, it drives a falling edge on the DE signal, thereby informing the memory system that it is idle. When users employ a specific CPU model simulator that works with SystemC or HDL, this module can be removed.
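The suspend-and-resume behavior of the stimulus can be sketched as follows. The trace format used here ("R" or "W" followed by a hexadecimal address, one instruction per line) is purely hypothetical, since this manual does not specify the trace file layout; MemInstruction, parseTrace, and Stimulus are likewise illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

struct MemInstruction {
    bool isWrite;
    std::uint64_t address;
};

// Parse a trace in a hypothetical "<R|W> <hex address>" line format.
std::vector<MemInstruction> parseTrace(const std::string& text) {
    std::vector<MemInstruction> out;
    std::istringstream in(text);
    std::string op;
    std::uint64_t addr = 0;
    while (in >> op >> std::hex >> addr) {
        assert(op == "R" || op == "W");
        MemInstruction mi;
        mi.isWrite = (op == "W");
        mi.address = addr;
        out.push_back(mi);
    }
    return out;
}

class Stimulus {
public:
    explicit Stimulus(std::vector<MemInstruction> trace)
        : trace_(std::move(trace)), next_(0) {}

    // Called once per clock edge. Returns true and fills `out` when an
    // instruction is driven (DE high). Returns false when DE stays low,
    // either because SCIC is busy (R/B high) or the trace is exhausted.
    bool tick(bool rbBusy, MemInstruction& out) {
        if (rbBusy || next_ >= trace_.size()) {
            return false;  // the pending instruction, if any, is kept for later
        }
        out = trace_[next_++];
        return true;
    }

    // True once every instruction in the trace has been issued.
    bool idle() const { return next_ >= trace_.size(); }

private:
    std::vector<MemInstruction> trace_;
    std::size_t next_;
};
```

An instruction that cannot be issued because R/B is high stays at the head of the trace, so the next call to tick() with R/B low resumes from the same instruction.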

3.3 Compiling DRAMSim with SCIC

To compile DRAMSim with pin-level protocol interfaces, a user has to do two things: 1) specify the location of the SystemC library and 2) define the preprocessor macro USE_SYSTEM_C_INTERFACE. Since the SystemC-related routines are guarded by this macro, the user should set its value to '1' at compile time to enable the routines associated with SCIC. Otherwise, the simulator works with the function-style interface that DRAMSim employs. An example makefile is shown in Table 3.



# Configurations related to System C library
TARGET_ARCH = linux
SYSTEMC = /home/MJ/Develop/systemc-2.2.0
INCDIR = -I. -I.. -I$(SYSTEMC)/include
LIBDIR = -L. -L.. -L$(SYSTEMC)/lib-$(TARGET_ARCH)
LIBS = -lsystemc -lm $(EXTRA_LIBS)

# The preprocessor for SCIC
CXXFLAGS = -DNO_STORAGE -Wall -DDEBUG_BUILD -DUSE_SYSTEM_C_INTERFACE=1


Table 3: An example of makefile for compiling memory system with SystemC.

4 SCIC System Interface

This section describes the pin-level protocol interfaces for the read and write operations of SCIC. For both operations, the first thing a CPU model must do is check that the output signal of the R/B port is low. Any I/O request issued by the CPU model is discarded if the memory transaction queue of DRAMSim is already full of outstanding transactions.

4.1 Write Operation

Figure 3 illustrates a timing diagram for a write operation. The writing procedure for SCIC is as follows:

After detecting a falling edge on the SCIC R/B port, a CPU model can put a destination address on the DA input port and send data to the DIN ports of SCIC. According to the burst length defined in the DRAMSim configuration, the data is delivered through the four DIN ports (DIN0 to DIN3) in big-endian format. If the burst length is shorter than the number of bus bits that the DDR memory family provides, an invalidate signature can be used; the invalidate signature is a hexadecimal value consisting entirely of 'f' digits. Once the destination address and data have been delivered to the corresponding ports, the CPU model has to drive a rising edge on the WE and DE input ports of SCIC so that the memory system starts serving the memory transaction. The CPU model has to clear the DE signal after issuing the write request. There are two reasons for this cleanup. First, SCIC is able to serve outstanding memory transactions when the signal of the DE input port is low. Second, an invalid memory transaction may corrupt memory data consistency, because SCIC attempts to write data whenever the signal of the DE port is high.
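The data packing step of the write procedure can be sketched as below. The exact byte layout is an assumption: this example takes "big-endian" to mean that the first payload byte becomes the most significant byte of DIN0, and it fills unused ports with the invalidate signature. The names packDin and kInvalidWord are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Invalidate signature: a 64-bit word whose hex digits are all 'f'.
const std::uint64_t kInvalidWord = 0xFFFFFFFFFFFFFFFFULL;

// Pack up to 32 payload bytes into DIN0..DIN3. Assumed big-endian layout:
// the first byte becomes the most significant byte of DIN0.
std::array<std::uint64_t, 4> packDin(const std::vector<std::uint8_t>& bytes) {
    // Only whole 64-bit words are handled in this sketch.
    assert(bytes.size() <= 32 && bytes.size() % 8 == 0);
    std::array<std::uint64_t, 4> din;
    din.fill(kInvalidWord);  // ports beyond the burst carry the signature
    for (std::size_t w = 0; w < bytes.size() / 8; ++w) {
        std::uint64_t word = 0;
        for (std::size_t b = 0; b < 8; ++b) {
            word = (word << 8) | bytes[w * 8 + b];
        }
        din[w] = word;
    }
    return din;
}
```

For a 16-byte payload, DIN0 and DIN1 carry the data and DIN2 and DIN3 are left at the invalidate signature. One consequence of this scheme is that a data word consisting entirely of 'f' digits cannot be distinguished from an unused port.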



Figure 4: Timing diagram for read operation.


4.2 Read Operation

The process of issuing read requests to SCIC is not much different from what the CPU model does for a write operation. The read timing behavior of SCIC is illustrated in Figure 4. Specifically, the read process is defined as follows:

When requesting a read transaction, the only difference from a write operation is that the CPU model drives a falling edge on the WE input port. Later, the CPU model can obtain the data through the four DOUT ports of SCIC. Specifically, SCIC drives a rising edge on the CA port when the memory system finishes reading data from the designated bank. That is, through an event on the CA port, the CPU model can detect that output signals of SCIC are available. Note that, as stated earlier, the output order can differ from the input order because of the scheduling policy of DRAMSim's bus interface unit. The CPU model is therefore responsible for identifying the order of read transaction outputs. Since SCIC echoes the information supplied with the read request through the AO and WO ports, the CPU model can recognize which memory transaction request has completed by inspecting these two output ports. As when sending data, the CPU model assembles the data in big-endian format. SCIC clears the output signals of the four DOUT ports after one cycle by driving the invalidate signature (all 'f's).
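On the CPU side, matching out-of-order completions and reassembling the read data might look like the sketch below. It is an illustration under stated assumptions: OutstandingRequests and unpackDout are hypothetical names, requests are identified only by the (address, type) pair reported on AO and WO, and the byte order assumes the most significant byte of DOUT0 comes first.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

// Invalidate signature: a 64-bit word whose hex digits are all 'f'.
const std::uint64_t kInvalidWord = 0xFFFFFFFFFFFFFFFFULL;

// CPU-side record of outstanding requests, keyed by (address, isWrite),
// mirroring what the AO and WO ports report on completion.
class OutstandingRequests {
public:
    void issued(std::uint64_t address, bool isWrite) {
        pending_.insert(std::make_pair(address, isWrite));
    }

    // Call when CA is sampled high; ao and wo are the AO and WO port values.
    // Returns false if no matching request was outstanding.
    bool completed(std::uint64_t ao, bool wo) {
        auto it = pending_.find(std::make_pair(ao, wo));
        if (it == pending_.end()) {
            return false;
        }
        pending_.erase(it);  // retire exactly one matching request
        return true;
    }

    std::size_t count() const { return pending_.size(); }

private:
    std::multiset<std::pair<std::uint64_t, bool> > pending_;
};

// Reassemble read data from DOUT0..DOUT3 (assumed big-endian), skipping
// ports that carry the invalidate signature.
std::vector<std::uint8_t> unpackDout(const std::array<std::uint64_t, 4>& dout) {
    std::vector<std::uint8_t> bytes;
    for (std::size_t w = 0; w < dout.size(); ++w) {
        if (dout[w] == kInvalidWord) {
            continue;
        }
        for (int shift = 56; shift >= 0; shift -= 8) {
            bytes.push_back(static_cast<std::uint8_t>(dout[w] >> shift));
        }
    }
    assert(bytes.size() <= 32);  // at most four 64-bit words per burst
    return bytes;
}
```

Since the DOUT ports are cleared to the invalidate signature after one cycle, a CPU model following this sketch would call unpackDout in the same cycle in which it observes CA high.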

5 References

[1]     JEDEC. http://www.jedec.org/standards-documents. In JEDEC STANDARD DOCUMENTS, 2011.

[2]     L. Cai and D. Gajski. Transaction level modeling: An overview. In Proceedings of CODES-ISSS, October 2003.

[3]     P. Rosenfeld, E. Cooper-Balis, and B. Jacob. DRAMSim2: A cycle accurate memory system simulator. In IEEE Computer Architecture Letters, January 2011.

[4]     J. Shalf and D. Donofrio. http://www.nersc.gov/projects/codex. In CoDEx: CoDesign for Exascale, Architectural Simulation and Modeling for Exascale Platform Development. Lawrence Berkeley National Laboratory, 2011.

[5]     Tensilica. http://www.nersc.gov/projects/codex. In The XTensa Modeling Protocol (XTMP) and XTensa SystemC Modeling (XTSC) for Fast System Modeling and Simulation, 2011.