| Reader | Reading |
|---|---|
| Reader 1: Introduction | Chapter 6 (Sections 6.1 & 6.2) of Hennessy & Patterson's Computer Architecture |
|   | Smith, Alan Jay. The task of the referee. Computer 23.4 (1990) |
| Reader 2: Project | Platt, John R. Strong Inference: Certain systematic methods of scientific thinking may produce much more rapid progress than others. Science 146.3642 (1964) |
| Reader 3: Evaluation | Gupta, Udit, et al. Chasing carbon: The elusive environmental footprint of computing |
|   | Wunderlich, Roland E., et al. SMARTS: Accelerating microarchitecture simulation via rigorous statistical sampling |
| Reader 4: Parallel Software Construction I | Chapter 1.2 of Culler, Singh & Gupta's Parallel Computer Architecture: A Hardware/Software Approach (1999) |
| Reader 5: Parallel Software Construction II | Paszke, Adam, et al. PyTorch: An imperative style, high-performance deep learning library |
|   | Birrell, Andrew D., and Bruce Jay Nelson. Implementing remote procedure calls |
| Reader 6: Coherence | Background: Chapters 6 and 7 of Nagarajan, Sorin, Hill & Wood's A Primer on Memory Consistency and Cache Coherence |
|   | Background slides from CS-307 regarding coherence |
|   | Moshovos, Andreas, et al. JETTY: Filtering snoops for reduced energy consumption in SMP servers |
|   | Ferdman, Michael, et al. Cuckoo directory: A scalable directory for many-core systems |
| Reader 7: Memory Ordering | Background slides from CS-307 regarding hardware memory reordering |
|   | Background slides from CS-307 regarding compiler memory reordering |
|   | Adve, Sarita V., and Kourosh Gharachorloo. Shared memory consistency models: A tutorial |
|   | Blundell, Colin, Milo M. K. Martin, and Thomas F. Wenisch. InvisiFence: Performance-transparent memory ordering in conventional multiprocessors |
| Reader 8: Manycore Caches | Chapter 2 of Balasubramonian, Jouppi & Muralimanohar's Multi-Core Cache Hierarchies |
|   | Xie, Yuejian, and Gabriel H. Loh. PIPP: Promotion/insertion pseudo-partitioning of multi-core shared caches |
|   | Hardavellas, Nikos, et al. Reactive NUCA: Near-optimal block placement and replication in distributed caches |
| Reader 9: GPU and Multithreading | Background slides from CS-307 regarding GPU introduction |
|   | Background slides from CS-307 regarding GPU programming |
|   | Chapters 1 and 2 of Nemirovsky & Tullsen's Multithreading Architecture |
|   | Choquette, Jack. NVIDIA Hopper H100 GPU: Scaling performance |
| Reader 10: Interconnect I | Appendix F of Hennessy & Patterson's Computer Architecture |
| Reader 11: Interconnect II | Chapter 6 of Jerger & Peh's On-Chip Networks |
| Reader 12: DRAM Caches | Volos, Stavros, et al. Fat caches for scale-out servers |
|   | Sodani, Avinash, et al. Knights Landing: Second-generation Intel Xeon Phi product |
| Reader 13: Workload I | Ferdman, Michael, et al. Clearing the clouds: A study of emerging scale-out workloads on modern hardware |
|   | Gan, Yu, et al. An open-source benchmark suite for microservices and their hardware-software implications for cloud & edge systems |
| Reader 14: Workload II | Ustiugov, Dmitrii, et al. Benchmarking, analysis, and optimization of serverless function snapshots |
|   | Reddi, Vijay Janapa, et al. MLPerf inference benchmark |
| Reader 15: Servers | Dean, Jeffrey, and Luiz André Barroso. The tail at scale |
|   | Barroso, Luiz, et al. Attack of the killer microseconds |
| Reader 16: Cloud-Native CPUs | Lotfi-Kamran, Pejman, et al. Scale-out processors |
|   | Lotfi-Kamran, Pejman, Boris Grot, and Babak Falsafi. NOC-Out: Microarchitecting a scale-out processor |
| Reader 17: Cloud-Native Accelerators | Biswas, Arijit, and Sailesh Kottapalli. Next-Gen Intel Xeon CPU - Sapphire Rapids |
|   | Kocberber, Onur, et al. Meet the walkers: Accelerating index traversals for in-memory databases |
| Reader 18: AI Accelerators | Jouppi, Norman P., et al. Ten lessons from three generations shaped Google's TPUv4i |
|   | Drumond, Mario, et al. Equinox: Training (for free) on a custom inference accelerator |
| Reader 19: Near-Memory Computing | Drumond, Mario, et al. The Mondrian data engine |
|   | Aga, Shaizeen, Supreet Jeloka, Arun Subramaniyan, Satish Narayanasamy, David Blaauw, and Reetuparna Das. Compute caches |
| Reader 20: Cloud-Native Memory I | Li, Huaicheng, et al. Pond: CXL-based memory pooling systems for cloud platforms |
|   | Ousterhout, John, et al. The case for RAMClouds: Scalable high-performance storage entirely in DRAM |
| Reader 21: Cloud-Native Memory II | Gupta, Siddharth, et al. Rebooting virtual memory with Midgard |
|   | Bhattacharyya, Atri, et al. SecureCells: A secure compartmentalized architecture |
| Reader 22: Cloud-Native Networks I | Daglis, Alexandros, Mark Sutherland, and Babak Falsafi. RPCValet: NI-driven tail-aware balancing of µs-scale RPCs |
|   | Sutherland, Mark, et al. The NeBuLa RPC-optimized architecture |
| Reader 23: Cloud-Native Networks II | Karandikar, Sagar, et al. A hardware accelerator for protocol buffers |
|   | Pourhabibi, Arash, et al. Cerebros: Evading the RPC tax in datacenters |
| Reader 24: Datacenters | Chapters 1 and 2 of Barroso & Hölzle's The Datacenter as a Computer - An Introduction to the Design of Warehouse-Scale Machines |