MLSys22 Talk: Efficient Strong Scaling Through Burst Parallel Training (DeepPool) - Detailed Analysis & Overview

MLSys22 talk: Efficient Strong Scaling Through Burst Parallel Training (DeepPool)
Training LLMs at Scale - Deepak Narayanan | Stanford MLSys #83
Lecture 3 - Parallel scaling concerns
Scaling Large Language Models: Getting Started with Large-Scale Parallel Training of LLMs
How to Train Billion-Parameter Models: DeepSpeed ZeRO vs. PyTorch FSDP
Keeping ML Models Healthy Monitoring Alerting and Retraining
Keras 3 Distributed Training: Scaling Models with JAX using DataParallel, and ModelParallel
Efficient programming with Scala and LLMs by Tomasz Godzik | Scalar 2026
Scaling PyTorch: Distributed Data Parallel & Model Parallelism
Distributed ML Talk @ UC Berkeley
06: Scaling Up, Training and Parallelism – Large Language Models (NUS CS6101 NUS.WING)
ASPLOS'24 - Lightning Talks - Session 11C - AdaPipe: Optimizing Pipeline Parallelism with Adaptive R

MLSys22 talk: Efficient Strong Scaling Through Burst Parallel Training (DeepPool)

A pre-recording of the

Training LLMs at Scale - Deepak Narayanan | Stanford MLSys #83

Episode 83 of the Stanford MLSys Seminar Series!

Lecture 3 - Parallel scaling concerns

Once you have split your problem up into

Scaling Large Language Models: Getting Started with Large-Scale Parallel Training of LLMs

Shashank Shekhar, Independent Researcher About the Speaker: Shashank Shekhar is an independent machine learning ...

How to Train Billion-Parameter Models: DeepSpeed ZeRO vs. PyTorch FSDP

Ever wonder how companies

Keeping ML Models Healthy Monitoring Alerting and Retraining

Deploying an ML model is just the beginning! Discover why sustained performance in production demands constant vigilance and ...

Keras 3 Distributed Training: Scaling Models with JAX using DataParallel, and ModelParallel

Training

Efficient programming with Scala and LLMs by Tomasz Godzik | Scalar 2026

Drawing from multiple Scala LLM workshops we conducted this past year, I will share insights to significantly enhance your AI ...

Scaling PyTorch: Distributed Data Parallel & Model Parallelism

As datasets and models grow in complexity, mastering distributed

Distributed ML Talk @ UC Berkeley

Here's a

06: Scaling Up, Training and Parallelism – Large Language Models (NUS CS6101 NUS.WING)

00:00 Week 05 Kahoot! (Winston/Min) 15:00 LECTURE START -

ASPLOS'24 - Lightning Talks - Session 11C - AdaPipe: Optimizing Pipeline Parallelism with Adaptive R

ASPLOS'24: International Conference on Architectural Support for Programming Languages and Operating Systems Lightning ...

How DDP works || Distributed Data Parallel || Quick explained

Discover how DDP harnesses multiple GPUs across machines to handle larger models and datasets, accelerating the