Cloud computing is emerging as a dominant computing platform for providing scalable online services to a global client base. Today's popular online services (e.g., web search, social networking, and business analytics) are characterized by massive working sets, high degrees of parallelism, and real-time constraints. These characteristics set scale-out applications apart from desktop (SPEC), parallel (PARSEC), and traditional commercial server applications. In order to stimulate research in the field of cloud and data-centric computing, we have created CloudSuite, a benchmark suite based on real-world online services. CloudSuite covers a broad range of application categories commonly found in today's datacenters. The first release includes data analytics, data serving, media streaming, large-scale and computation-intensive tasks, web search, and web serving.
Computer architects have long relied on software simulation to measure dynamic performance metrics (e.g., CPI) of a proposed design. Unfortunately, with the ever-growing size and complexity of modern microprocessors, detailed software simulators have become four or more orders of magnitude slower than their hardware counterparts. The low simulation throughput is especially prohibitive for large-scale multiprocessor systems because simulation turnaround grows at least linearly with the number of processors. This project proposes the SimFlex framework to support fast, accurate, and flexible simulation of large-scale systems. SimFlex applies rigorous statistical sampling theory to reduce simulation turnaround by several orders of magnitude while achieving high accuracy and confidence in its estimates. SimFlex relies heavily on well-defined component interface models to facilitate both model integration and compile-time simulator optimization.
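To illustrate why sampling cuts turnaround so dramatically, the sketch below estimates mean CPI from a small random sample of per-interval measurements and reports a 95% confidence interval. This is a minimal, self-contained illustration of the underlying statistics, not SimFlex code; the workload values are synthetic and the interval uses the standard normal approximation.

```python
import random
import statistics

random.seed(42)

# Synthetic per-interval CPI values standing in for a long-running
# workload (illustrative only; not SimFlex data).
population = [1.0 + 0.5 * random.random() for _ in range(1_000_000)]

# Simulate in detail only a small random sample of intervals
# instead of the full run.
sample = random.sample(population, 1_000)
mean = statistics.mean(sample)
stdev = statistics.stdev(sample)

# 95% confidence interval under the normal approximation (z = 1.96).
half_width = 1.96 * stdev / len(sample) ** 0.5

true_mean = statistics.mean(population)
print(f"estimated CPI {mean:.4f} +/- {half_width:.4f} "
      f"(full-run mean {true_mean:.4f})")
```

With 1,000 sampled intervals out of a million, the detailed-simulation effort drops by three orders of magnitude while the confidence interval remains tight, which is the essence of the sampling argument.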
Emerging datacenter applications operate on vast datasets that are kept in DRAM to minimize latency. The large number of servers needed to accommodate this massive memory footprint requires frequent server-to-server communication in applications such as key-value stores and graph-based applications that rely on large, irregular data structures. The fine-grained nature of these accesses is a poor match for commodity networking technologies, including RDMA, which incur delays of 10-1000x over local DRAM operations. Scale-Out NUMA is an architecture, programming model, and communication protocol for low-latency, distributed in-memory processing, designed to bridge the latency gap between local and remote memory access.
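A back-of-envelope calculation makes the latency gap concrete. The numbers below are illustrative assumptions (a 100 ns local DRAM access and a 3 us commodity remote round trip fall inside the 10-1000x range stated above), not measurements from Scale-Out NUMA:

```python
# Illustrative latencies; assumed values, not measured ones.
LOCAL_DRAM_NS = 100       # one local DRAM access
REMOTE_NS = 3_000         # one commodity RDMA round trip
ACCESSES = 1_000_000      # fine-grained pointer chases, e.g. a graph walk

local_total_ms = LOCAL_DRAM_NS * ACCESSES / 1e6
remote_total_ms = REMOTE_NS * ACCESSES / 1e6

print(f"local: {local_total_ms:.0f} ms, remote: {remote_total_ms:.0f} ms "
      f"({REMOTE_NS // LOCAL_DRAM_NS}x slower)")
```

Because each access is tiny and dependent on the previous one, the per-access round-trip latency dominates and cannot be amortized by batching, which is why fine-grained workloads suffer the full gap.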
VISA: Vertically Integrated Server Architecture
The impending plateau of voltage levels, combined with a continued increase in chip density (according to Moore's law), is making energy the number one concern in the design of future digital computing platforms. These platforms are likely to be built on "dark silicon", where a limited power budget allows only a fraction of a chip's real estate to be active at a time while the rest is turned off, or "dark". The Vertically Integrated Server Architecture (VISA) project targets design for dark silicon, where an integrated hardware/software approach to specialization implements performance- and energy-hungry services with minimal energy. Specialization allows future technologies to utilize dark silicon effectively and maintain a constant power envelope by keeping only the needed services active on-chip while monitoring and shutting off unneeded resources. Specialization maximizes transistor efficiency and makes better use of available real estate, achieving two or more orders of magnitude reduction in energy through hand-in-hand collaboration of software and hardware.
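The dark-silicon trend follows from simple arithmetic: with voltage (and thus per-transistor switching power) roughly flat, a fixed chip power budget caps the number of simultaneously active transistors even as each generation doubles the total count. The sketch below uses entirely hypothetical numbers to show the active fraction shrinking generation over generation:

```python
# All values are hypothetical, chosen only to illustrate the trend.
POWER_BUDGET_W = 100.0
POWER_PER_MTRANSISTOR_W = 0.05   # assumed flat without voltage scaling

transistors_m = 1_000            # millions of transistors, first generation
for gen in range(4):
    # The budget fixes how many transistors can switch at once.
    max_active_m = POWER_BUDGET_W / POWER_PER_MTRANSISTOR_W
    active_fraction = min(1.0, max_active_m / transistors_m)
    print(f"gen {gen}: {transistors_m} Mtransistors, "
          f"{active_fraction:.0%} can be active")
    transistors_m *= 2           # Moore's-law doubling per generation
```

Once the transistor count crosses the budget-imposed cap, the active fraction halves every generation, which is exactly the opportunity specialization exploits: spend the dark area on many specialized blocks and power up only the one in use.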