🧰 ezpz CLI
Once installed, ezpz provides a CLI with a few useful utilities to help launch
distributed PyTorch applications.
Specifically, these are invoked as ezpz <command>:
- 🚀 ezpz launch: Launch commands with automatic job scheduler detection (PBS, Slurm)
- 💯 ezpz test: Run simple distributed smoke test [1]
- 📊 ezpz benchmark: Run all examples and generate a report
- 📮 ezpz submit: Submit jobs to PBS (qsub) or SLURM (sbatch); generates job scripts automatically
- 📦 ezpz yeet: Distribute files (envs, models, datasets, etc.) to all worker nodes via parallel rsync
- 🗜️ ezpz tar-env: Package current Python environment as a tarball
- 🩺 ezpz doctor: Health check your environment
- 💀 ezpz kill: Kill ezpz-launched python processes (local node or --all-nodes)
- 📝 ezpz.examples: Collection of distributed training examples (DDP, FSDP, ViT, FSDP+TP, diffusion, HF, HF Trainer, inference)
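
As a rough illustration of how these subcommands might be driven from a Python script, the sketch below builds argument lists for the commands named in the list and runs one possible first-time sequence (ezpz doctor, then ezpz test). The helper names build_ezpz_command and run_first_checks are hypothetical and not part of ezpz, and the doctor-then-test ordering is an assumption rather than a documented workflow.

```python
import shutil
import subprocess

# Subcommand names copied by hand from the list above; this set is an
# assumption and may not match the ezpz version you have installed.
KNOWN_SUBCOMMANDS = frozenset(
    {"launch", "test", "benchmark", "submit", "yeet", "tar-env", "doctor", "kill"}
)


def build_ezpz_command(subcommand, *args):
    """Return an argv list such as ["ezpz", "doctor"] (hypothetical helper)."""
    if subcommand not in KNOWN_SUBCOMMANDS:
        raise ValueError(f"unknown ezpz subcommand: {subcommand!r}")
    return ["ezpz", subcommand, *args]


def run_first_checks():
    """Run `ezpz doctor`, then `ezpz test`, stopping at the first failure.

    Returns the exit code of the last command that ran, or None when the
    `ezpz` executable is not found on PATH.
    """
    if shutil.which("ezpz") is None:
        return None
    returncode = 0
    for subcommand in ("doctor", "test"):
        returncode = subprocess.run(build_ezpz_command(subcommand)).returncode
        if returncode != 0:
            break
    return returncode
```

For example, build_ezpz_command("kill", "--all-nodes") yields the argument list for the all-nodes variant of ezpz kill mentioned above, while an unrecognized name raises ValueError before anything is executed.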
To see the list of available commands, run:

    $ ezpz --help
    Usage: ezpz [OPTIONS] COMMAND [ARGS]...

      ezpz distributed utilities.

    Options:
      --version   Show the version and exit.
      -h, --help  Show this message and exit.

    Commands:
      benchmark  Run all ezpz examples sequentially and generate a report.
      doctor     Inspect the environment for ezpz launch readiness.
      kill       Kill ezpz-launched python processes (or any matching pattern).
      launch     Launch a command across the active scheduler.
      submit     Submit a job to the active scheduler (PBS/SLURM).
      tar-env    Create (or locate) a tarball for the current environment.
      test       Run the distributed smoke test.
      yeet       Distribute files (envs, models, datasets, etc.) to worker nodes.
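
If a script needs to check which subcommands an installed version exposes, the help text can be parsed rather than hard-coded. The following is a minimal sketch, assuming the output keeps the layout shown in the help text above (a "Commands:" header followed by entries indented by two spaces, with any wrapped description lines indented further). The function name parse_commands is hypothetical, and SAMPLE_HELP is simply the help text from this page.

```python
SAMPLE_HELP = """\
Usage: ezpz [OPTIONS] COMMAND [ARGS]...

  ezpz distributed utilities.

Options:
  --version   Show the version and exit.
  -h, --help  Show this message and exit.

Commands:
  benchmark  Run all ezpz examples sequentially and generate a report.
  doctor     Inspect the environment for ezpz launch readiness.
  kill       Kill ezpz-launched python processes (or any matching pattern).
  launch     Launch a command across the active scheduler.
  submit     Submit a job to the active scheduler (PBS/SLURM).
  tar-env    Create (or locate) a tarball for the current environment.
  test       Run the distributed smoke test.
  yeet       Distribute files (envs, models, datasets, etc.) to worker nodes.
"""


def parse_commands(help_text):
    """Return {command: description} from the "Commands:" section.

    Assumes two-space indentation for command entries; deeper-indented
    lines are treated as continuations of the previous description.
    """
    commands = {}
    last = None
    in_commands = False
    for line in help_text.splitlines():
        if line.strip() == "Commands:":
            in_commands = True
            continue
        if not in_commands or not line.strip():
            continue
        indent = len(line) - len(line.lstrip(" "))
        if indent == 0:
            break  # an unindented line ends the Commands section
        if indent == 2:
            name, _, description = line.strip().partition(" ")
            commands[name] = description.strip()
            last = name
        elif last is not None:
            commands[last] = f"{commands[last]} {line.strip()}".strip()
    return commands
```

In practice the input could come from capturing the output of ezpz --help with subprocess.run(..., capture_output=True, text=True) instead of the embedded sample.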
[1] This is really just a wrapper around: