# 🚀 ezpz launch
Single entry point for launching distributed applications.

This will:

- Automatically detect your PBS/Slurm job, and
- Launch `<cmd>` across all available accelerators.
This is done by detecting whether `ezpz launch` is being executed from inside a
PBS/Slurm job[^2]. If so, it determines the specifics of the active job (the
number of nodes and the number of GPUs per node) and uses this information to
build and execute the appropriate launch command (e.g. `mpiexec`, `srun`).
When not running inside a PBS/Slurm job, `ezpz launch` falls back to `mpirun`
with sensible defaults.
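As an illustrative sketch (not ezpz's actual implementation), this scheduler detection can be thought of as a check of the environment variables each scheduler sets: PBS jobs export `PBS_NODEFILE`, and Slurm jobs export `SLURM_JOB_ID`. The `detect_scheduler` helper below is hypothetical:

```python
import os


def detect_scheduler(env=os.environ):
    """Sketch: pick a launcher family from scheduler environment variables.

    PBS sets PBS_NODEFILE; Slurm sets SLURM_JOB_ID. If neither is present,
    we are outside any allocation and fall back to plain mpirun.
    """
    if env.get("PBS_NODEFILE"):
        return "pbs"    # launch via mpiexec, using the job's hostfile
    if env.get("SLURM_JOB_ID"):
        return "slurm"  # launch via srun
    return None         # no scheduler: fall back to `mpirun` defaults
```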
Arguments can be passed through to the `mpiexec`/`srun` launcher by separating
them from the `<cmd>` with `--`[^1]. For example, to run with 8 processes
total, 4 processes per node, on 2 hosts:

`ezpz launch -n 8 -ppn 4 -- python3 -m ezpz.examples.fsdp_tp`

Assuming your current job can satisfy this (i.e. at least 4 accelerators per
node and at least 2 nodes), this would launch `python3 -m
ezpz.examples.fsdp_tp` across 8 processes, 4 per node, on the first two hosts
allocated to your job.
```
$ ezpz launch --help
usage: ezpz launch [-h] [--print-source] [--filter FILTER [FILTER ...]]
                   [-n NPROC] [-ppn NPROC_PER_NODE] [-nh NHOSTS]
                   [--hostfile HOSTFILE]
                   ...

Launch a command on the current PBS/SLURM job.

Additional `<launcher flags>` can be passed through directly to the launcher
by including '--' as a separator before the command.

Examples:

    ezpz launch <launcher flags> -- <command> <args>
    ezpz launch -n 8 -ppn 4 --verbose --tag-output -- python3 -m ezpz.examples.fsdp_tp
    ezpz launch --nproc 8 -x EZPZ_LOG_LEVEL=DEBUG -- python3 my_script.py --my-arg val

positional arguments:
  command               Command (and arguments) to execute. Use '--' to
                        separate options when needed.

options:
  -h, --help            show this help message and exit
  --print-source        Print the location of the launch CLI source and exit.
  --filter FILTER [FILTER ...]
                        Filter output lines by these strings.
  -n NPROC, -np NPROC, --n NPROC, --np NPROC, --nproc NPROC, --world_size NPROC, --nprocs NPROC
                        Number of processes.
  -ppn NPROC_PER_NODE, --ppn NPROC_PER_NODE, --nproc_per_node NPROC_PER_NODE
                        Processes per node.
  -nh NHOSTS, --nh NHOSTS, --nhost NHOSTS, --nnode NHOSTS, --nnodes NHOSTS, --nhosts NHOSTS
                        Number of nodes to use.
  --hostfile HOSTFILE   Hostfile to use for launching.
```
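The `--` behavior described in the help text can be sketched as a simple split (a hypothetical `split_on_dashes` helper, not the actual CLI code): everything before the first `--` goes to the launcher, everything after it is the command; with no `--`, all arguments are treated as the command.

```python
def split_on_dashes(argv):
    """Split CLI args into (launcher_flags, command) at the first '--'.

    Mirrors the documented behavior: when no '--' is present, every
    argument is treated as part of the command to run.
    """
    if "--" in argv:
        i = argv.index("--")
        return argv[:i], argv[i + 1:]
    return [], argv


# e.g. `ezpz launch -n 8 -ppn 4 -- python3 -m ezpz.examples.fsdp_tp`
flags, cmd = split_on_dashes(
    ["-n", "8", "-ppn", "4", "--", "python3", "-m", "ezpz.examples.fsdp_tp"]
)
```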
## Examples
Use it to launch:

- Arbitrary command(s)
- An arbitrary Python string
- One of the Distributed Training examples (below)
- Your own distributed training script, e.g.
  `ezpz launch -n 16 -ppn 8 -- python3 -m your_app.train` to launch
  `your_app.train` across 16 processes, 8 per node.
## Sequence Diagram
Two primary control paths drive ezpz launch: a scheduler-aware path used when
running inside PBS/SLURM allocations, and a local fallback that shells out to
mpirun when no scheduler metadata is available.
```mermaid
sequenceDiagram
    autonumber
    actor User
    participant CLI as ezpz_launch
    participant Scheduler as PBS_or_Slurm
    participant MPI as mpirun_mpiexec
    participant App as User_application
    User->>CLI: ezpz launch <launch_flags> -- <cmd> <cmd_flags>
    CLI->>Scheduler: detect_scheduler()
    alt scheduler_detected
        Scheduler-->>CLI: scheduler_type, job_metadata
        CLI->>Scheduler: build_scheduler_command(cmd_to_launch)
        Scheduler-->>CLI: launch_cmd (mpiexec_or_srun)
        CLI->>MPI: run_command(launch_cmd)
        MPI->>App: start_ranks_and_execute
        App-->>MPI: return_codes
        MPI-->>CLI: aggregate_status
    else no_scheduler_detected
        Scheduler-->>CLI: unknown
        CLI->>MPI: mpirun -np 2 <cmd> <cmd_flags>
        MPI->>App: start_local_ranks
        App-->>MPI: return_codes
        MPI-->>CLI: aggregate_status
    end
    CLI-->>User: exit_code
```
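The command-building step in the diagram can be sketched as follows. This is a simplified, hypothetical `build_launch_cmd`, not the actual builder: the `mpiexec` flag spellings match the example logs later on this page, the `srun` flags are standard Slurm options, and the real implementation adds further flags (CPU binding, hostfiles, etc.).

```python
def build_launch_cmd(scheduler, nproc, ppn, hostfile=None):
    """Sketch: turn job metadata into a launcher command line."""
    if scheduler == "pbs":
        # mpiexec spelling as seen in the Aurora logs on this page
        cmd = ["mpiexec", "--envall", f"--np={nproc}", f"--ppn={ppn}"]
        if hostfile:
            cmd.append(f"--hostfile={hostfile}")
        return cmd
    if scheduler == "slurm":
        return ["srun", f"--ntasks={nproc}", f"--ntasks-per-node={ppn}"]
    # no scheduler detected: local mpirun fallback
    return ["mpirun", "-np", str(nproc)]
```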
## Distributed Training Examples

> 🚀 **Examples: Scalable and ready-to-go!**
> Any of the examples can be launched with `ezpz launch python3 -m ezpz.examples.<name>`.
### 🤗 HF Integration

- `ezpz.examples.{fsdp_tp,diffusion,hf,hf_trainer}` all support arbitrary 🤗 Hugging Face datasets, e.g.:

  ```bash
  dataset="stanfordnlp/imdb"  # or any other HF dataset
  ezpz launch python3 -m ezpz.examples.fsdp_tp --dataset "${dataset}"
  ezpz launch python3 -m ezpz.examples.diffusion --dataset "${dataset}"
  ezpz launch python3 -m ezpz.examples.hf \
      --model_name_or_path meta-llama/Llama-3.2-1B \
      --dataset_name="${dataset}" \
      --streaming \
      --bf16=true
  ezpz launch python3 -m ezpz.examples.hf_trainer \
      --model_name_or_path meta-llama/Llama-3.2-1B \
      --dataset_name="${dataset}" \
      --streaming \
      --bf16=true
  ```

- `ezpz.examples.hf` and `ezpz.examples.hf_trainer` both support arbitrary combinations of (compatible) `transformers.from_pretrained` models and HF Datasets (with support for streaming!). `hf` uses an explicit training loop with Accelerate, while `hf_trainer` wraps the HF `Trainer` API.

  ```bash
  ezpz launch python3 -m ezpz.examples.hf \
      --streaming \
      --dataset_name=eliplutchok/fineweb-small-sample \
      --tokenizer_name meta-llama/Llama-3.2-1B \
      --model_name_or_path meta-llama/Llama-3.2-1B \
      --bf16=true
  ezpz launch python3 -m ezpz.examples.hf_trainer \
      --streaming \
      --dataset_name=eliplutchok/fineweb-small-sample \
      --tokenizer_name meta-llama/Llama-3.2-1B \
      --model_name_or_path meta-llama/Llama-3.2-1B \
      --bf16=true
  ```
## Simple Example

**Output**

**Macbook Pro**

```console
$ ezpz launch python3 -c 'import ezpz; print(ezpz.setup_torch())'
[2026-01-08 14:56:54,307030][I][ezpz/launch:515:run] No active scheduler detected; falling back to local mpirun: mpirun -np 2 python3 -c 'import ezpz; print(ezpz.setup_torch())'
Using [2 / 2] available "mps" devices !!
0
1
[2025-12-23-162222] Execution time: 4s
```

**Aurora (2 Nodes)**

```console
$ ezpz launch python3 -c 'import ezpz; print(ezpz.setup_torch())'
[2026-01-08 14:58:01,994729][I][numexpr/utils:148:_init_num_threads] Note: detected 208 virtual cores but NumExpr set to maximum of 64, check "NUMEXPR_MAX_THREADS" environment variable.
[2026-01-08 14:58:01,997067][I][numexpr/utils:151:_init_num_threads] Note: NumExpr detected 208 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 16.
[2026-01-08 14:58:01,997545][I][numexpr/utils:164:_init_num_threads] NumExpr defaulting to 16 threads.
[2026-01-08 14:58:02,465850][I][ezpz/launch:396:launch] ----[🚀 ezpz.launch][started][2026-01-08-145802]----
[2026-01-08 14:58:04,765720][I][ezpz/launch:416:launch] Job ID: 8247203
[2026-01-08 14:58:04,766527][I][ezpz/launch:417:launch] nodelist: ['x4418c6s1b0n0', 'x4717c0s6b0n0']
[2026-01-08 14:58:04,766930][I][ezpz/launch:418:launch] hostfile: /var/spool/pbs/aux/8247203.aurora-pbs-0001.hostmgmt.cm.aurora.alcf.anl.gov
[2026-01-08 14:58:04,767616][I][ezpz/pbs:264:get_pbs_launch_cmd] ✓ Using [24/24] GPUs [2 hosts] x [12 GPU/host]
[2026-01-08 14:58:04,768399][I][ezpz/launch:367:build_executable] Building command to execute by piecing together:
[2026-01-08 14:58:04,768802][I][ezpz/launch:368:build_executable] (1.) launch_cmd: mpiexec --envall --np=24 --ppn=12 --hostfile=/var/spool/pbs/aux/8247203.aurora-pbs-0001.hostmgmt.cm.aurora.alcf.anl.gov --no-vni --cpu-bind=verbose,list:2-4:10-12:18-20:26-28:34-36:42-44:54-56:62-64:70-72:78-80:86-88:94-96
[2026-01-08 14:58:04,769517][I][ezpz/launch:369:build_executable] (2.) cmd_to_launch: python3 -c 'import ezpz; print(ezpz.setup_torch())'
[2026-01-08 14:58:04,770278][I][ezpz/launch:433:launch] Took: 3.01 seconds to build command.
[2026-01-08 14:58:04,770660][I][ezpz/launch:436:launch] Executing: mpiexec --envall --np=24 --ppn=12 --hostfile=/var/spool/pbs/aux/8247203.aurora-pbs-0001.hostmgmt.cm.aurora.alcf.anl.gov --no-vni --cpu-bind=verbose,list:2-4:10-12:18-20:26-28:34-36:42-44:54-56:62-64:70-72:78-80:86-88:94-96 python3 -c import ezpz; print(ezpz.setup_torch())
[2026-01-08 14:58:04,772125][I][ezpz/launch:220:get_aurora_filters] Filtering for Aurora-specific messages. To view list of filters, run with EZPZ_LOG_LEVEL=DEBUG
[2026-01-08 14:58:04,772651][I][ezpz/launch:443:launch] Execution started @ 2026-01-08-145804...
[2026-01-08 14:58:04,773070][I][ezpz/launch:138:run_command] Caught 24 filters
[2026-01-08 14:58:04,773429][I][ezpz/launch:139:run_command] Running command: mpiexec --envall --np=24 --ppn=12 --hostfile=/var/spool/pbs/aux/8247203.aurora-pbs-0001.hostmgmt.cm.aurora.alcf.anl.gov --no-vni --cpu-bind=verbose,list:2-4:10-12:18-20:26-28:34-36:42-44:54-56:62-64:70-72:78-80:86-88:94-96 python3 -c 'import ezpz; print(ezpz.setup_torch())'
```

*(CPU bind output: 24 lines, collapsed)*
*(Raw rank output: 24 lines, collapsed)*

```console
[2026-01-08 14:58:14,252433][I][ezpz/launch:447:launch] ----[🚀 ezpz.launch][stop][2026-01-08-145814]----
[2026-01-08 14:58:14,253726][I][ezpz/launch:448:launch] Execution finished with 0.
[2026-01-08 14:58:14,254184][I][ezpz/launch:449:launch] Executing finished in 9.48 seconds.
[2026-01-08 14:58:14,254555][I][ezpz/launch:450:launch] Took 9.48 seconds to run. Exiting.
took: 18s
```

### demo.py

```python
import ezpz

# automatic device + backend setup for distributed PyTorch
_ = ezpz.setup_torch()  # CUDA/NCCL, XPU/XCCL, {MPS, CPU}/GLOO, ...

device = ezpz.get_torch_device()  # {cuda, xpu, mps, cpu, ...}
rank = ezpz.get_rank()
world_size = ezpz.get_world_size()
# ...etc

if rank == 0:
    print(f"Hello from rank {rank} / {world_size} on {device}!")
```

We can launch this script with `ezpz launch python3 demo.py`:

**Output(s)**

**Aurora (2 nodes)**

```console
# from 2 nodes of Aurora:
$ ezpz launch python3 demo.py
[2026-01-08 07:26:19,723138][I][numexpr/utils:148:_init_num_threads] Note: detected 208 virtual cores but NumExpr set to maximum of 64, check "NUMEXPR_MAX_THREADS" environment variable.
[2026-01-08 07:26:19,725453][I][numexpr/utils:151:_init_num_threads] Note: NumExpr detected 208 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 16.
[2026-01-08 07:26:19,725932][I][numexpr/utils:164:_init_num_threads] NumExpr defaulting to 16 threads.
[2026-01-08 07:26:20,290222][I][ezpz/launch:396:launch] ----[🚀 ezpz.launch][started][2026-01-08-072620]----
[2026-01-08 07:26:21,566797][I][ezpz/launch:416:launch] Job ID: 8246832
[2026-01-08 07:26:21,567684][I][ezpz/launch:417:launch] nodelist: ['x4604c5s2b0n0', 'x4604c5s3b0n0']
[2026-01-08 07:26:21,568082][I][ezpz/launch:418:launch] hostfile: /var/spool/pbs/aux/8246832.aurora-pbs-0001.hostmgmt.cm.aurora.alcf.anl.gov
[2026-01-08 07:26:21,568770][I][ezpz/pbs:264:get_pbs_launch_cmd] ✓ Using [24/24] GPUs [2 hosts] x [12 GPU/host]
[2026-01-08 07:26:21,569557][I][ezpz/launch:367:build_executable] Building command to execute by piecing together:
[2026-01-08 07:26:21,569959][I][ezpz/launch:368:build_executable] (1.) launch_cmd: mpiexec --envall --np=24 --ppn=12 --hostfile=/var/spool/pbs/aux/8246832.aurora-pbs-0001.hostmgmt.cm.aurora.alcf.anl.gov --no-vni --cpu-bind=verbose,list:2-4:10-12:18-20:26-28:34-36:42-44:54-56:62-64:70-72:78-80:86-88:94-96
[2026-01-08 07:26:21,570821][I][ezpz/launch:369:build_executable] (2.) cmd_to_launch: python3 demo.py
[2026-01-08 07:26:21,571548][I][ezpz/launch:433:launch] Took: 2.11 seconds to build command.
[2026-01-08 07:26:21,571918][I][ezpz/launch:436:launch] Executing: mpiexec --envall --np=24 --ppn=12 --hostfile=/var/spool/pbs/aux/8246832.aurora-pbs-0001.hostmgmt.cm.aurora.alcf.anl.gov --no-vni --cpu-bind=verbose,list:2-4:10-12:18-20:26-28:34-36:42-44:54-56:62-64:70-72:78-80:86-88:94-96 python3 demo.py
[2026-01-08 07:26:21,573262][I][ezpz/launch:220:get_aurora_filters] Filtering for Aurora-specific messages. To view list of filters, run with EZPZ_LOG_LEVEL=DEBUG
[2026-01-08 07:26:21,573781][I][ezpz/launch:443:launch] Execution started @ 2026-01-08-072621...
[2026-01-08 07:26:21,574195][I][ezpz/launch:138:run_command] Caught 24 filters
[2026-01-08 07:26:21,574532][I][ezpz/launch:139:run_command] Running command: mpiexec --envall --np=24 --ppn=12 --hostfile=/var/spool/pbs/aux/8246832.aurora-pbs-0001.hostmgmt.cm.aurora.alcf.anl.gov --no-vni --cpu-bind=verbose,list:2-4:10-12:18-20:26-28:34-36:42-44:54-56:62-64:70-72:78-80:86-88:94-96 python3 demo.py
```

*(CPU bind output: 24 lines, collapsed)*

```console
Using [24 / 24] available "xpu" devices !!
Hello from rank 0 / 24 on xpu!
[2026-01-08 07:26:33,060432][I][ezpz/launch:447:launch] ----[🚀 ezpz.launch][stop][2026-01-08-072633]----
[2026-01-08 07:26:33,061512][I][ezpz/launch:448:launch] Execution finished with 0.
[2026-01-08 07:26:33,062045][I][ezpz/launch:449:launch] Executing finished in 11.49 seconds.
[2026-01-08 07:26:33,062531][I][ezpz/launch:450:launch] Took 11.49 seconds to run. Exiting.
took: 22s
```
[^1]: When no `--` is present, all arguments are treated as part of the command to run.
[^2]: By default, this will detect whether we're running behind a job scheduler (e.g. PBS or Slurm). If so, we automatically determine the specifics of the currently active job; explicitly, this will determine:

    - The number of available nodes
    - How many GPUs are present on each of these nodes
    - How many GPUs we have in total

    It then uses this information to automatically construct the appropriate {`mpiexec`, `srun`} launch command, and finally executes it.
[^3]: The `ezpz.History` class automatically computes distributed statistics (min, max, mean, std. dev.) across ranks for all recorded metrics.
    **NOTE**: This is automatically disabled when `ezpz.get_world_size() >= 384` (e.g. >= {32, 96} {Aurora, Polaris} nodes) due to the additional overhead introduced (but can be manually enabled, if desired).
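As a rough illustration of the cross-rank statistics described above (a hypothetical `summarize` helper, not the `ezpz.History` implementation), given one recorded value per rank:

```python
import statistics


def summarize(per_rank_values):
    """Min/max/mean/std over one metric's value from each rank.

    Population std. dev. is used here for concreteness; ezpz.History's
    exact convention may differ.
    """
    return {
        "min": min(per_rank_values),
        "max": max(per_rank_values),
        "mean": statistics.mean(per_rank_values),
        "std": statistics.pstdev(per_rank_values),
    }
```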