CAMEL's High Performance Computing Systems

  • KANDEMIR -- All-Flash Array based High Performance Computing Testbed

Kandemir is an HPC testbed that uses only NVM/SSDs as its storage system. We plan to scale Kandemir up to around 3,000 heavyweight processor cores and more than 50,000 coprocessor cores. All of these cluster cores will be connected to around 600 flash-based SSD systems and a few FPGA-based NVDIMM systems that we are designing. Development is currently in PHASE1, which initially comprises around 400 Haswell cores and 10K coprocessor cores.

PHASE1: Kandemir PHASE1 is an NVM-equipped HPC system that employs 472 Haswell-EP Xeon processor cores, 10,167 coprocessor cores and 3.5TB of DRAM on an all-flash array system. With this research tool, we are analyzing a wide spectrum of flash-based storage workloads and tailoring a parallel file system (Lustre) to the underlying high-performance SSD systems. In this phase, we are building the all-flash array systems with sixty different NVM Express-based SSD devices and two hundred SATA-based high-performance SSD devices.

  • WILSON&ALLEGRETTO -- SSD-based Object Storage Cluster

Wilson&Allegretto consists of three object storage cluster testbeds, each employing a single metadata server and three object storage nodes. Each of these nodes employs two 3GHz Xeon E5-2620 processors (8 cores each), 12MB of on-chip cache and 32GB of DRAM (DDR4-2133 ECC memory). Each node also has two local high-performance SSDs, and for the parallel file system, each object storage node contains 12 MLC-based SSDs behind a hardware RAID controller. Wilson&Allegretto also contains three evaluation nodes (serving as computation nodes), each employing 256GB of DRAM and two octa-core E5-series Xeon processors. These nodes are used to characterize the performance and overheads of object-based file systems (e.g., Ceph) and to develop an SSD-oriented parallel file system. In total, there are more than 150 high-performance SSDs and 240 cores.
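The 240-core total can be reproduced with a short tally. This is only a sketch under stated assumptions: it assumes the per-node processor specification applies to the metadata servers as well as the object storage nodes, and that "8 cores" refers to each E5-2620 processor rather than to the pair. The constant and function names are hypothetical, chosen for illustration.

```python
# Hypothetical tally of Wilson&Allegretto processor cores.
# Assumption: every metadata server and object storage node has two
# 8-core processors; the three evaluation nodes are shared, not per testbed.

TESTBEDS = 3
NODES_PER_TESTBED = 1 + 3   # one metadata server + three object storage nodes
CPUS_PER_NODE = 2
CORES_PER_CPU = 8           # assumption: "8 cores" is per processor

EVAL_NODES = 3
EVAL_CPUS_PER_NODE = 2
EVAL_CORES_PER_CPU = 8      # octa-core E5-series Xeon


def total_cores():
    """Return (storage-cluster cores, evaluation-node cores, total)."""
    storage = TESTBEDS * NODES_PER_TESTBED * CPUS_PER_NODE * CORES_PER_CPU
    evaluation = EVAL_NODES * EVAL_CPUS_PER_NODE * EVAL_CORES_PER_CPU
    return storage, evaluation, storage + evaluation


if __name__ == "__main__":
    storage, evaluation, total = total_cores()
    print(f"storage: {storage}, evaluation: {evaluation}, total: {total}")
```

Under these assumptions the storage testbeds contribute 3 × 4 × 2 × 8 = 192 cores and the evaluation nodes 3 × 2 × 8 = 48, which matches the stated total of 240.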

  • SHALF -- NVDIMM-based Compute Node

Shalf is a development cluster for NVM/DRAM hybrid memory that consists of three specialized computation nodes. Each node employs 32GB of real NVDIMM devices and 32GB of registered DDR4 ECC DRAM modules, along with two deca-core 2.3GHz processors. All the NVDIMM nodes in Shalf are connected via a dual-port Gigabit Ethernet network. Shalf is used to develop new types of file systems, system software and scientific applications that exploit NVDIMMs.

  • DONOFRIO -- FPGA/CPU Hybrid Server Node

Donofrio is a customized acceleration compute node that contains a Xeon processor and an Altera FPGA accelerator, which are tightly coupled. We believe that new heterogeneous architectures will go beyond homogeneous parallelism, embrace heterogeneity, and exploit the bounty of transistors to incorporate application-customized hardware. Donofrio is used for developing diverse accelerators and FPGA modules for hybrid computing.

Empirical Analysis Equipment

  • ZHANG -- SSD and GPU Cluster Testbed

Zhang is a small cluster that connects tens of open testbed platforms. Each testbed platform can host four GPGPU devices along with NVM Express SSD devices. These platforms are used not only to measure device-level performance (e.g., temperature, latency, throughput and power) but also to stress the actual devices to evaluate their durability and reliability. In particular, to capture dynamic thermal behavior, specific devices are monitored by a FLIR thermal imaging camera, and their dynamic behaviors (based on the differences) are automatically recorded by imaging software.

  • Multicore/FPGA Accelerator and Evaluation Boards

There are around five multicore-based accelerators that can be attached to a PCIe interface. These accelerators work with our customized general-purpose computation kernels and are used to form a low-power heterogeneous computing cluster. We also have different types of Xilinx and Altera FPGA evaluation boards, which can be used for a wide spectrum of data processing and low-power heterogeneous computation.