
| Week | Date | Topic | Paper 1 | Presenter(s) | Paper 2 | Presenter(s) | Slides |
|------|------|-------|---------|--------------|---------|--------------|--------|
| #1 | Feb. 21 | Introduction | The Task of the Referee | Anne-Marie, Babak, Martin | | | PDF |
| #2 | Feb. 28 | Benchmarks and Analytics | MLPerf Training Benchmark | Ayan | MLPerf Inference Benchmark | Ayan | PDF |
| #3 | Mar. 07 | ML Inference at Scale | Efficiently Scaling Transformer Inference | Bugra | Orca: A Distributed Serving System for Transformer-Based Generative Models | Bugra | PDF |
| #4 | Mar. 14 | Large Language Models (LLMs) | BLOOM: A 176B-Parameter Open-Access Multilingual Language Model | Amirkeivan | LLaMA: Open and Efficient Foundation Language Models | Amirkeivan | PDF |
| #5 | Mar. 21 | Sustainability | Sustainable AI: Environmental Implications, Challenges and Opportunities | Siping | Zeus: Understanding and Optimizing GPU Energy Consumption of DNN Training | Shanqing | PDF |
| #6 | Mar. 28 | Systems & ML | A Systematic Methodology for Analysis of Deep Learning Hardware and Software Platforms | Qingxuan | Walle: An End-to-End, General-Purpose, and Large-Scale Production System for Device-Cloud Collaborative Machine Learning | Qingxuan | PDF |
| #7 | Apr. 04 | Deep Learning with Low-Precision Encoding | LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale | Martin | FAST: DNN Training Under Variable Precision Block Floating Point with Stochastic Rounding | Ayan | PDF |
| #8 | Apr. 18 | Hardware Accelerators for Deep Learning | Ten Lessons From Three Generations Shaped Google's TPUv4i | Vladimir | RaPiD: AI Accelerator for Ultra-low Precision Training and Inference | Siping | PDF |
| #9 | Apr. 25 | Sparsity in ML | Pixelated Butterfly: Simple and Efficient Sparse Training for Neural Network Models | Bettina | CrAM: A Compression-Aware Minimizer | Bettina | PDF |
| #10 | May 02 | Domain Specific Languages for ML | Blink: Fast and Generic Collectives for Distributed ML | Shanqing | MSCCLang: Microsoft Collective Communication Language | Shanqing | PDF |
| #11 | May 09 | New Training Paradigms | Beyond Data and Model Parallelism for Deep Neural Networks | Bugra | Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning | Simla | |
| #12 | May 23 | Federated Learning | FedScale: Benchmarking Model and System Performance of Federated Learning at Scale | Vladimir | PAPAYA: Practical, Private, and Scalable Federated Learning | Rishi | |
| #13 | May 30 | Decentralized Learning in Heterogeneous Environments | Decentralized Training of Foundation Models in Heterogeneous Environments | Siping | SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient | Rishi | |