| Week | Date | Topic | Paper 1 | Presenter(s) | Paper 2 | Presenter(s) | Slides |
| --- | --- | --- | --- | --- | --- | --- | --- |
| #1 | Feb. 21 | Introduction | The Task of the Referee | Anne-Marie, Babak, Martin | | | |
| #2 | Feb. 28 | Benchmarks and Analytics | MLPerf Training Benchmark | Ayan | MLPerf Inference Benchmark | Ayan | |
| #3 | Mar. 07 | ML Inference at Scale | Efficiently Scaling Transformer Inference | Bugra | Orca: A Distributed Serving System for Transformer-Based Generative Models | Bugra | |
| #4 | Mar. 14 | Large Language Models (LLMs) | BLOOM: A 176B-Parameter Open-Access Multilingual Language Model | Amirkeivan | LLaMA: Open and Efficient Foundation Language Models | Amirkeivan | |
| #5 | Mar. 21 | Sustainability | Sustainable AI: Environmental Implications, Challenges and Opportunities | Siping | Zeus: Understanding and Optimizing GPU Energy Consumption of DNN Training | Shanqing | |
| #6 | Mar. 28 | Systems & ML | A Systematic Methodology for Analysis of Deep Learning Hardware and Software Platforms | Qingxuan | Walle: An End-to-End, General-Purpose, and Large-Scale Production System for Device-Cloud Collaborative Machine Learning | Qingxuan | |
| #7 | Apr. 04 | Deep Learning with Low-Precision Encoding | LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale | Martin | FAST: DNN Training Under Variable Precision Block Floating Point with Stochastic Rounding | Ayan | |
| #8 | Apr. 18 | Hardware Accelerators for Deep Learning | Ten Lessons From Three Generations Shaped Google’s TPUv4i | Vladimir | RaPiD: AI Accelerator for Ultra-low Precision Training and Inference | Siping | |
| #9 | Apr. 25 | Sparsity in ML | Pixelated Butterfly: Simple and Efficient Sparse Training for Neural Network Models | Bettina | CrAM: A Compression-Aware Minimizer | Bettina | |
| #10 | May 02 | Domain-Specific Languages for ML | Blink: Fast and Generic Collectives for Distributed ML | Shanqing | MSCCLang: Microsoft Collective Communication Language | Shanqing | |
| #11 | May 09 | New Training Paradigms | Beyond Data and Model Parallelism for Deep Neural Networks | Bugra | Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning | Simla | |
| #12 | May 23 | Federated Learning | FedScale: Benchmarking Model and System Performance of Federated Learning at Scale | Vladimir | PAPAYA: Practical, Private, and Scalable Federated Learning | Rishi | |
| #13 | May 30 | Decentralized Learning in Heterogeneous Environments | Decentralized Training of Foundation Models in Heterogeneous Environments | Siping | SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient | Rishi | |