#1 | Sep. 17: Introduction

The Task of the Referee

#2 | Sep. 24: Benchmarks and Analytics

MLPerf Training Benchmark [Google et al., MLSys 20]

MLPerf Inference Benchmark [Harvard et al., ISCA 20]

#3 | Oct. 01: Systems & ML

A Systematic Methodology for Analysis of Deep Learning Hardware and Software Platforms [Harvard, MLSys 20]

The Case for Learned Index Structures [MIT and Google, SIGMOD 18]

#4 | Oct. 08: Federated Learning

Towards Federated Learning at Scale: System Design [Google, MLSys 19]

The Non-IID Data Quagmire of Decentralized Machine Learning [ETHZ and CMU, ICML 20]

#5 | Oct. 15: Decentralized Learning

SwarmSGD: Scalable Decentralized SGD with Local Updates [IST Austria, arXiv 20]

Byzantine-Resilient Multi-Agent Optimization [MIT, IEEE 20]

#6 | Oct. 22: Deep Learning with Low-precision Computations

Training DNNs with Hybrid Block Floating Point [EPFL, NeurIPS 18]

Trained Quantization Thresholds for Accurate and Efficient Fixed-Point Inference of Deep Neural Networks [Xilinx, MLSys 20]

#7 | Oct. 29: Training with Low-precision Gradients

Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training [Tsinghua & Nvidia, ICLR 18]

PowerGossip: Practical Low-Rank Communication Compression in Decentralized Deep Learning [EPFL, arXiv 20]

#8 | Nov. 05: Distributed ImageNet & Transformer Training

Beyond Data and Model Parallelism for Deep Neural Networks [Stanford, MLSys 19]

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism [Nvidia, arXiv 20]

#9 | Nov. 12: Training with Model Parallelism

PipeDream: Generalized Pipeline Parallelism for DNN Training [Microsoft et al., SOSP 19]

Decoupled Parallel Backpropagation with Convergence Guarantee [Pittsburgh, ICML 18]

#10 | Nov. 19: Neural Architecture Search

ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware [MIT, ICLR 19]

MnasNet: Platform-Aware Neural Architecture Search for Mobile [Google, CVPR 19]

#11 | Nov. 26: Domain-Specific Languages for ML

Blink: Fast and Generic Collectives for Distributed ML [Microsoft et al., MLSys 20]

TVM: An Automated End-to-End Optimizing Compiler for Deep Learning [Washington, OSDI 18]

#12 | Dec. 03: ML Inference at Scale

Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective [Facebook, HPCA 18]

The Architectural Implications of Facebook's DNN-based Personalized Recommendation [Facebook, HPCA 20]

#13 | Dec. 10: Hardware Accelerators for Deep Learning

Simba: Scaling Deep-Learning Inference with Multi-Chip-Module-Based Architecture [UCB et al., MICRO 19]

Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices [MIT, IEEE JETCAS 19]

#14 | Dec. 17: Security

Game of Threads: Enabling Asynchronous Poisoning Attacks [UIUC, ASPLOS 20]

Stealing Machine Learning Models via Prediction APIs [EPFL, USENIX Security 16]