| Reader 1: Introduction |
|
Chapter 6 (Sections 6.1 & 6.2) of Hennessy & Patterson's Computer Architecture |
| Reader 2: Evaluation Methodologies |
|
A. R. Alameldeen and D. A. Wood, IPC Considered Harmful for Multiprocessor Workloads |
|
T. F. Wenisch, R. E. Wunderlich, M. Ferdman, A. Ailamaki, B. Falsafi, and J. C. Hoe, SimFlex: Statistical Sampling of Computer System Simulation
|
J. Demme and S. Sethumadhavan, Rapid Identification of Architectural Bottlenecks via Precise Event Counting |
| Reader 3: Programming Models |
|
Chapter 1 (Sections 1.3.2 & 1.3.3) of Culler, Singh & Gupta’s Parallel Computer Architecture
|
J. Dean and S. Ghemawat, MapReduce: Simplified Data Processing on Large Clusters |
|
Chapter 1 of Kirk & Hwu's Programming Massively Parallel Processors – A Hands-on Approach |
| Reader 4: Coherence |
|
Chapters 6 and 7 of Sorin, Hill & Wood’s A Primer on Memory Consistency and Cache Coherence
|
M. Ferdman, P. Lotfi-Kamran, K. Balet, and B. Falsafi, Cuckoo Directory: A Scalable Directory for Many-Core Systems |
|
J. H. Kelm, D. R. Johnson, W. Tuohy, S. S. Lumetta, and S. J. Patel, Cohesion: A Hybrid Memory Model for Accelerators |
| Reader 5: Consistency |
|
S. Adve and K. Gharachorloo, Shared Memory Consistency Models: A Tutorial |
|
C. Blundell, M. M. K. Martin, and T. F. Wenisch, InvisiFence: Performance-Transparent Memory Ordering in Conventional Multiprocessors |
| Reader 6: Synchronization |
|
A. Kägi, D. Burger, and J. Goodman, Efficient Synchronization: Let Them Eat QOLB
| Reader 7: Transactional Memory |
|
Chapter 5 (Sections 5.1 & 5.2) of Harris, Larus & Rajwar's Transactional Memory
| Reader 8: CMP Caches |
|
Chapter 2 of Balasubramonian, Jouppi & Muralimanohar's Multi-Core Cache Hierarchies |
|
N. Hardavellas, M. Ferdman, B. Falsafi, and A. Ailamaki, Reactive NUCA: Near-Optimal Block Placement and Replication in Distributed Caches |
|
D. Sanchez and C. Kozyrakis, Vantage: Scalable and Efficient Fine-Grain Cache Partitioning |
| Reader 9: Interconnects |
|
L. M. Ni and P. K. McKinley, A Survey of Wormhole Routing in Direct Networks |
|
Chapters 1, 2, and 6 of Jerger & Peh's On-Chip Networks
| Reader 10: Storage |
|
S. Perumal and P. Kritzinger, A Tutorial on RAID Storage Systems |
|
A. M. Caulfield, L. M. Grupp, and S. Swanson, Gordon: Using Flash Memory to Build Fast, Power-Efficient Clusters for Data-Intensive Applications |
| Reader 11: Scaling Trends |
|
N. Hardavellas, M. Ferdman, B. Falsafi, and A. Ailamaki, Toward Dark Silicon in Servers |
|
M. Bohr, B. Joy, M. Muller, D. Patterson, C. Rowen, and L. Su, The Future of Microprocessors - Industry Luminaries Take Turns at the Crystal Ball |
| Reader 12: Servers |
|
M. Ferdman, A. Adileh, Y. O. Koçberber, S. Volos, M. Alisafaee, D. Jevdjic, C. Kaynak, A. D. Popescu, A. Ailamaki, and B. Falsafi, Clearing the Clouds: A Study of Emerging Scale-Out Workloads on Modern Hardware |
|
P. Lotfi-Kamran, B. Grot, M. Ferdman, S. Volos, Y. O. Koçberber, J. Picorel, A. Adileh, D. Jevdjic, S. Idgunji, E. Ozer, and B. Falsafi, Scale-Out Processors |
| Reader 13: Datacenters/Supercomputers |
|
Chapters 1 and 2 of Barroso & Hölzle's The Datacenter as a Computer - An Introduction to the Design of Warehouse-Scale Machines
|
D. Meisner, C. Sadler, L. A. Barroso, W.-D. Weber, and T. F. Wenisch, Power Management of Online Data-Intensive Services
|
The BlueGene/L Team, An Overview of the BlueGene/L Supercomputer |
| Reader 14: GPUs/Smart phones/Tablets |
|
S. W. Keckler, W. J. Dally, B. Khailany, M. Garland, and D. Glasco, GPUs and the Future of Parallel Computing |
|
A. Gutierrez, R. G. Dreslinski, T. F. Wenisch, T. Mudge, A. Saidi, C. Emmons, and N. Paver, Full-System Analysis and Characterization of Interactive Smartphone Applications |
|
P. Greenhalgh, big.LITTLE Processing with ARM Cortex-A15 & Cortex-A7: Improving Energy Efficiency in High-Performance Mobile Platforms