🧰 ezpz CLI

Once installed, ezpz provides a CLI with a few useful utilities to help launch distributed PyTorch applications.

The available subcommands, each invoked as ezpz <command>, are:

  • 🚀 ezpz launch: Launch commands with automatic job scheduler detection (PBS, Slurm)
    • 💯 ezpz test: Run a simple distributed smoke test [1].
    • 📊 ezpz benchmark: Run all examples and generate a report
    • 📮 ezpz submit: Submit jobs to PBS (qsub) or SLURM (sbatch); generates job scripts automatically
  • 📦 ezpz yeet: Distribute files (envs, models, datasets, etc.) to all worker nodes via parallel rsync
    • 🗜️ ezpz tar-env: Package current Python environment as a tarball
  • 🩺 ezpz doctor: Health check your environment
  • 💀 ezpz kill: Kill ezpz-launched python processes (local node or --all-nodes)
  • 📝 ezpz.examples: Collection of distributed training examples (DDP, FSDP, ViT, FSDP+TP, diffusion, HF, HF Trainer, inference)
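
A first session with the commands above might look like the following minimal sketch. It assumes ezpz is already installed and uses only subcommands named in the list; the helper name run_ezpz is hypothetical and exists only so the script prints what it would run when ezpz is not on PATH.

```shell
#!/usr/bin/env bash
# Minimal sketch: check the environment, then run the distributed smoke test.
# `run_ezpz` is a hypothetical helper, not part of ezpz itself.

run_ezpz() {
  if command -v ezpz >/dev/null 2>&1; then
    # ezpz is installed: forward all arguments to the real CLI.
    ezpz "$@"
  else
    # ezpz is missing: report the command instead of failing.
    echo "ezpz not found on PATH; would run: ezpz $*"
  fi
}

run_ezpz doctor   # health check of the environment
run_ezpz test     # distributed smoke test
```

Running the health check before the smoke test is a design choice in this sketch, not a requirement stated by ezpz: it surfaces environment problems before any distributed launch is attempted.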

  • ezpz --help

    To see the list of available commands, run:

    $ ezpz --help
    Usage: ezpz [OPTIONS] COMMAND [ARGS]...

      ezpz distributed utilities.

    Options:
      --version   Show the version and exit.
      -h, --help  Show this message and exit.

    Commands:
      benchmark  Run all ezpz examples sequentially and generate a report.
      doctor     Inspect the environment for ezpz launch readiness.
      kill       Kill ezpz-launched python processes (or any matching pattern).
      launch     Launch a command across the active scheduler.
      submit     Submit a job to the active scheduler (PBS/SLURM).
      tar-env    Create (or locate) a tarball for the current environment.
      test       Run the distributed smoke test.
      yeet       Distribute files (envs, models, datasets, etc.) to worker nodes.
    

  1. ezpz test is just a wrapper around:

    ezpz launch python3 -m ezpz.examples.test