#1 | Feb. 21: Introduction
|
The Task of the Referee |
|
#2 | Feb. 28: Benchmarks and Analytics |
|
MLPerf Training Benchmark [Google et al., MLSys 20]
MLPerf Inference Benchmark [Harvard et al., ISCA 20]
|
#3 | Mar. 07: ML Inference at Scale
|
Efficiently Scaling Transformer Inference [arXiv 22]
Orca: A Distributed Serving System for Transformer-Based Generative Models [OSDI 22]
|
#4 | Mar. 14: Large Language Models (LLMs) |
|
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model [arXiv 22]
LLaMA: Open and Efficient Foundation Language Models [Meta AI, arXiv 23]
|
#5 | Mar. 21: Sustainability |
|
Sustainable AI: Environmental Implications, Challenges and Opportunities [MLSys 22]
Zeus: Understanding and Optimizing GPU Energy Consumption of DNN Training [NSDI 23]
|
#6 | Mar. 28: Systems & ML |
|
A Systematic Methodology for Analysis of Deep Learning Hardware and Software Platforms [Harvard, MLSys 20]
Walle: An End-to-End, General-Purpose, and Large-Scale Production System for Device-Cloud Collaborative Machine Learning [OSDI 22]
|
#7 | Apr. 04: Deep Learning with Low-Precision Encoding |
|
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale [NeurIPS 22]
FAST: DNN Training Under Variable Precision Block Floating Point with Stochastic Rounding [HPCA 22]
|
#8 | Apr. 18: Hardware Accelerators for Deep Learning |
|
Ten Lessons From Three Generations Shaped Google’s TPUv4i [Google, ISCA 21]
RaPiD: AI Accelerator for Ultra-low Precision Training and Inference [IBM, ISCA 21]
|
#9 | Apr. 25: Sparsity in Deep Neural Networks |
|
Pixelated Butterfly: Simple and Efficient Sparse Training for Neural Network Models [ICLR 22]
CrAM: A Compression-Aware Minimizer [arXiv 22]
|
#10 | May 02: Domain-Specific Languages for ML
|
Blink: Fast and Generic Collectives for Distributed ML [Microsoft et al., MLSys 20]
MSCCLang: Microsoft Collective Communication Language [Microsoft, ASPLOS 23]
|
#11 | May 09: New Training Paradigms
|
Beyond Data and Model Parallelism for Deep Neural Networks [Stanford, MLSys 19]
Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning [OSDI 22]
|
#12 | May 23: Federated Learning
|
FedScale: Benchmarking Model and System Performance of Federated Learning at Scale [ICML 22]
PAPAYA: Practical, Private, and Scalable Federated Learning [MLSys 22]
|
#13 | May 30: Decentralized Learning in Heterogeneous Environments
|
Decentralized Training of Foundation Models in Heterogeneous Environments [NeurIPS 22]
SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient [arXiv 23]
|