PARSA at EPFL engages in research and educational activities to pioneer future server design.

ARIES: Architectures and Algorithms for Emerging Reconfigurable Systems

As transistor scaling slows down, other ways of sustaining performance growth have to be explored. Field-programmable gate arrays (FPGAs) are in the spotlight these days, not only for their malleability and energy efficiency, but also because they have recently been integrated into the cloud. This integration makes them available to everyone in need of the immense computing power and data throughput they can offer. However, one important issue needs to be resolved first: the time to compile an industrial-scale design for an FPGA must be drastically reduced. Researchers have been looking for ways to accelerate FPGA compilation through parallelism, but an ideal solution has not yet been found. This project addresses this challenge by exploring new and effective strategies for software and hardware acceleration of the main computational bottlenecks: FPGA placement and routing.

CloudSuite: A Benchmark Suite for Cloud Services

Cloud computing is emerging as a dominant computing platform for providing scalable online services to a global client base. Today’s popular online services (e.g., web search, social networking, and business analytics) are characterized by massive working sets, high degrees of parallelism, and real-time constraints. These characteristics set scale-out applications apart from desktop (SPEC), parallel (PARSEC), and traditional commercial server applications. To stimulate research in the field of cloud and data-centric computing, we have created CloudSuite, a benchmark suite consisting of popular first-party cloud workloads in prevalent ISAs. The suite consists of eight containerized benchmarks representing popular online services and analytics workloads in datacenters.

Cloud-Native Server Architecture

With the end of Dennard Scaling and a slowdown in Moore’s Law, network fabrics and storage capacity have in recent years been improving at faster rates than logic density. To mitigate the impending logic speed bottleneck, careful integration and specialisation of computation and data services has emerged as a promising approach to improving performance, cost and efficiency in servers. The Cloud-Native Server Architecture project seeks server architectures in which common datacenter services are implemented through cross-layer integration and specialisation, from algorithms all the way down to silicon, and in which data movement is minimised through a tighter coupling of logic with emerging storage and network technologies.

ColTrain: Co-Located Training and Inference for ML

DNN training and inference are composed of similar basic operators but have fundamentally different requirements. The former is throughput-bound and relies on high-precision floating-point arithmetic for convergence, while the latter is latency-bound and tolerant of low-precision arithmetic. Nevertheless, both workloads exhibit high computational demands and can benefit from hardware accelerators. Unfortunately, the disparity in resource requirements forces datacenter operators to choose between deploying separate custom accelerators for training and inference or using training accelerators for inference. Both options are suboptimal: the former results in datacenter heterogeneity, increasing management costs, while the latter results in inefficient inference. Furthermore, dedicated inference accelerators face load fluctuations, leading to overprovisioning and, ultimately, low average utilization. The ColTrain project’s mission is to restore datacenter homogeneity through technologies that co-locate training and inference without sacrificing either inference efficiency or quality of service (QoS) guarantees.

Midgard: Reinventing Virtual Memory

Virtual memory is a pillar of memory isolation, protection, and security in digital platforms and is used even in widely used hardware accelerators such as GPUs, NICs, and FPGAs, as well as in secure CPU architectures. As services host more data in server memory for faster access, the traditional virtual memory implementation has emerged as a bottleneck. Modern graph analytics workloads (e.g., on social media) spend over 20% of their time in virtual memory translation and protection checks. Midgard introduces an intermediate namespace for data lookups and memory protection in the memory system without requiring any modifications to application software or to the programming interface of modern platforms (e.g., Linux, Android, macOS/iOS). With Midgard, data lookups are done directly in the Midgard namespace in on-chip memory, and translation to fixed-size pages is needed only to access physical memory. Midgard future-proofs virtual memory because the overhead of translation and protection checks for physical memory decreases as on-chip memory capacity grows in future products.
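
The lookup flow described above can be illustrated with a minimal Python sketch. This is a toy model under stated assumptions, not Midgard's actual design: the names `Region`, `ToyMidgard`, and `PAGE_SIZE`, the region-level permission check, and the dictionary-based cache are all hypothetical and chosen only for illustration. The sketch shows a virtual address being mapped to an intermediate address, an on-chip cache indexed by that intermediate address, and a page-based translation that runs only when the cache misses.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4096  # assumed fixed page size for the translation to physical memory


@dataclass
class Region:
    """Hypothetical contiguous virtual range mapped into the intermediate namespace."""
    virt_base: int
    size: int
    mid_base: int
    writable: bool


@dataclass
class ToyMidgard:
    regions: list        # virtual -> intermediate mappings (illustrative)
    page_table: dict     # intermediate page number -> physical frame number
    cache: dict = field(default_factory=dict)    # on-chip data, keyed by intermediate address
    memory: dict = field(default_factory=dict)   # physical memory, keyed by physical address
    page_translations: int = 0                   # counts page-based translations performed

    def to_intermediate(self, vaddr, write=False):
        # Protection is checked here, per region (an assumption of this sketch).
        for r in self.regions:
            if r.virt_base <= vaddr < r.virt_base + r.size:
                if write and not r.writable:
                    raise PermissionError(f"write to read-only region at {vaddr:#x}")
                return r.mid_base + (vaddr - r.virt_base)
        raise LookupError(f"unmapped virtual address {vaddr:#x}")

    def to_physical(self, maddr):
        # Page-based translation, needed only when going to physical memory.
        self.page_translations += 1
        frame = self.page_table[maddr // PAGE_SIZE]
        return frame * PAGE_SIZE + maddr % PAGE_SIZE

    def load(self, vaddr):
        maddr = self.to_intermediate(vaddr)
        if maddr in self.cache:          # hit: no page translation at all
            return self.cache[maddr]
        paddr = self.to_physical(maddr)  # miss: translate, then fill the cache
        value = self.memory.get(paddr, 0)
        self.cache[maddr] = value
        return value


system = ToyMidgard(
    regions=[Region(virt_base=0x1000, size=0x2000, mid_base=0x80000, writable=False)],
    page_table={0x80: 7, 0x81: 3},
    memory={7 * PAGE_SIZE + 0x10: 42},
)
first = system.load(0x1010)   # cache miss: one page translation
second = system.load(0x1010)  # cache hit: served from the intermediate namespace
```

In this toy model the second load never reaches the page table, which mirrors the argument that translation cost shrinks as more data is served from on-chip memory.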

Quick & Flexible Computer Architecture Simulation

Computer system hardware and software designers have traditionally relied on fast emulation and full-system simulation to instrument a design of interest, develop and debug system software, model new hardware components, and measure design metrics of interest. Post-Moore platforms have in recent years seen not only a proliferation of accelerators but also a need for system hardware/software co-design to help integrate heterogeneity into the system stack. Effective integration requires open-source tools that enable fast instrumentation of application and system software, full-system models for network and storage controllers, and models of multi-node computer systems. Full-system server instrumentation and modeling require speedups of several orders of magnitude to enable practical turnaround. QFlex is a family of full-system instrumentation tools based on QEMU. It currently supports the ARM ISA and includes a trace-based model to quickly instrument existing QEMU images, timing models to simulate multi-core CPUs in detail, and an FPGA-accelerated mode that enables high-performance instrumented code.

Secure FPGAs in the Cloud

Today, the biggest cloud datacenters are heterogeneous by design: they incorporate CPUs, GPUs, and FPGAs, with the goal of providing the best computing performance for highly varied services and workloads. Cloud computing implies a multitenant environment in which users share the same computing resources. In that setting, it is important to consider the security risks arising from FPGAs being shared by multiple users, because FPGAs are susceptible to security hazards such as denial-of-service attacks, side-channel attacks, and timing-fault attacks. This project aims to systematically explore the security vulnerabilities of FPGAs and to develop methodologies, algorithms, and tools that will enable cloud providers to prevent attacks, protect against them, recover from them, and identify (locate) malicious users.

HARNESS: Servers for the Post-Moore Era

Server architecture is entering an age of heterogeneity, as silicon performance scaling approaches its horizon and economies of scale enable cost-effective deployment of custom silicon in the datacentre. Traditionally, customized components have been deployed as discrete expansion boards to reduce cost and design complexity and to ensure compatibility with rigidly designed CPU silicon and its surrounding infrastructure. A prime example of this pattern is the tethering of today’s Remote Direct Memory Access (RDMA) network interface cards to commodity PCIe interconnects. Although using a commodity I/O interconnect has enabled RDMA to be deployed at large scale in today’s datacentres, our prior work on the “Scale-Out NUMA” project has shown that judiciously integrating architectural support directly on the CPU silicon provides significant benefits. Namely, integration affords lower RDMA latency and the ability to perform richer operations, such as atomic accesses to software objects and remote procedure calls. The HARNESS project therefore aims at co-designing server silicon with software to support the performance-critical primitives in the datacentre, in particular those pertaining to networked systems and storage stacks.

Scale-Out NUMA

Emerging datacenter applications operate on vast datasets that are kept in DRAM to minimize latency. The large number of servers needed to accommodate this massive memory footprint requires frequent server-to-server communication in applications such as key-value stores and graph-based applications that rely on large irregular data structures. The fine-grained nature of the accesses is a poor match to commodity networking technologies, including RDMA, which incur delays of 10-1000x over local DRAM operations. Scale-Out NUMA is an architecture, programming model, and communication protocol for low-latency, distributed in-memory processing, designed to bridge the latency gap between local and remote memory access.