Ready-to-go Examples

This section contains ready-to-run examples demonstrating various features of the ezpz library.

New to ezpz? Start with ezpz.examples.test for a minimal DDP training loop, then move to ezpz.examples.fsdp when you're ready to shard model parameters. The remaining examples layer on additional features (Vision Transformers, tensor parallelism, Hugging Face integration), so pick whichever matches your workload.

Prerequisites

  • ezpz installed: pip install ezpz or pip install git+https://github.com/saforem2/ezpz
  • PyTorch (auto-detected at runtime)
  • MPI for multi-GPU training (mpi4py + MPICH or OpenMPI). Single-GPU and CPU work without MPI.
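
Before launching anything, it can help to confirm that the prerequisites above are importable. The sketch below uses only the standard library. The helper name check_prerequisites is hypothetical and not part of ezpz, and the import names ezpz, torch, and mpi4py are assumed to match the packages listed.

```python
import importlib.util


def check_prerequisites(modules=("ezpz", "torch", "mpi4py")):
    """Return a dict mapping each module name to True if it is importable.

    Hypothetical helper for illustration; not part of the ezpz API.
    """
    # find_spec locates a top-level module without importing it, so the
    # check is cheap and has no side effects such as initializing MPI.
    return {
        name: importlib.util.find_spec(name) is not None
        for name in modules
    }


if __name__ == "__main__":
    for name, found in check_prerequisites().items():
        print(f"{name}: {'found' if found else 'missing'}")
```

A missing mpi4py is only a problem for multi-GPU runs; single-GPU and CPU runs work without it.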

Running Examples

All examples use the same launch pattern:

ezpz launch python3 -m ezpz.examples.<name> [args]

For example, to run the test example:

ezpz launch python3 -m ezpz.examples.test

On a laptop this uses a local mpirun; inside a PBS or SLURM job, the scheduler is auto-detected.
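
To make the idea of scheduler auto-detection concrete, here is a minimal sketch of how such a check could work. It is not ezpz's actual implementation, which may differ. It assumes only that PBS exports PBS_JOBID and SLURM exports SLURM_JOB_ID inside a job, and the function name detect_scheduler is hypothetical.

```python
import os


def detect_scheduler(env=None):
    """Guess the job scheduler from environment variables.

    Illustrative only: ezpz's real detection logic may differ.
    """
    env = os.environ if env is None else env
    if "PBS_JOBID" in env:  # exported by PBS inside a job
        return "pbs"
    if "SLURM_JOB_ID" in env:  # exported by SLURM inside a job
        return "slurm"
    return "local"  # no scheduler variables found, e.g. on a laptop


if __name__ == "__main__":
    print(detect_scheduler())
```

Passing the environment as an argument keeps the function easy to exercise without being inside a real job.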

Picking an Example

| Example    | When to use                                                  | Level        |
|------------|--------------------------------------------------------------|--------------|
| test       | Starting point: simplest DDP training loop                   | Beginner     |
| fsdp       | Model too large for one GPU, or memory-efficient training    | Beginner     |
| vit        | Vision Transformer with FSDP + optional torch.compile        | Intermediate |
| fsdp_tp    | Very large models needing 2D parallelism (FSDP + TP)         | Advanced     |
| diffusion  | Diffusion model training with FSDP                           | Intermediate |
| hf         | Fine-tune causal LM with Accelerate + FSDP                   | Intermediate |
| hf_trainer | Using HF Trainer with ezpz's launcher                        | Beginner     |
| inference  | Distributed HF inference (benchmark / generate / eval modes) | Intermediate |
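
The example names in the table plug directly into the launch pattern shown under Running Examples. As a small sketch, the helper below assembles that command for a chosen example. The function launch_command and the EXAMPLES tuple are hypothetical conveniences, not part of ezpz, and no example-specific arguments are assumed.

```python
import shlex

# Example names taken from the table above.
EXAMPLES = (
    "test", "fsdp", "vit", "fsdp_tp",
    "diffusion", "hf", "hf_trainer", "inference",
)


def launch_command(name, *args):
    """Build the argv list for `ezpz launch python3 -m ezpz.examples.<name>`.

    Hypothetical helper for illustration; extra args are passed through as-is.
    """
    if name not in EXAMPLES:
        raise ValueError(f"unknown example: {name!r}")
    return ["ezpz", "launch", "python3", "-m", f"ezpz.examples.{name}", *args]


if __name__ == "__main__":
    # Print the shell-quoted command for each example in the table.
    for example in EXAMPLES:
        print(shlex.join(launch_command(example)))
```

The returned list can be passed to subprocess.run unchanged, which avoids shell-quoting issues when arguments contain spaces.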