
🚀 ezpz launch

Single entry point for launching distributed applications.

ezpz launch <cmd>

This will:

  1. Automatically detect your PBS/Slurm job and
  2. Launch <cmd> across all available accelerators.

This works by detecting whether ezpz launch is being executed from inside a PBS/Slurm job[2].

If so, it determines the specifics of the active job (number of nodes and number of GPUs per node) and uses this information to build and execute the appropriate launch command (e.g. mpiexec or srun).

When not running inside a PBS/Slurm job, ezpz launch falls back to mpirun with sensible defaults.
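The detection-plus-fallback behavior can be sketched as follows. This is a hypothetical illustration, not the actual ezpz implementation: `detect_scheduler` and `build_launch_cmd` are made-up names, and the real logic inspects more than just the `PBS_NODEFILE` and `SLURM_JOB_ID` environment variables (which are, however, standard variables set by PBS and Slurm, respectively).

```python
# Hypothetical sketch of scheduler detection and the local fallback.
# The real ezpz implementation differs; function names here are invented.
import os


def detect_scheduler(env=None):
    env = os.environ if env is None else env
    if "PBS_NODEFILE" in env:
        return "pbs"
    if "SLURM_JOB_ID" in env:
        return "slurm"
    return None


def build_launch_cmd(cmd, scheduler, env=None):
    env = os.environ if env is None else env
    if scheduler == "pbs":
        return f"mpiexec --envall --hostfile {env['PBS_NODEFILE']} {cmd}"
    if scheduler == "slurm":
        return f"srun {cmd}"
    return f"mpirun -np 2 {cmd}"  # local fallback with sensible defaults


print(build_launch_cmd("hostname", detect_scheduler()))
```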

Arguments can be passed through to the underlying mpiexec/srun launcher by separating them from the <cmd> with --[1], e.g.:

ezpz launch <launch-args> -- <cmd> <cmd-args>

For example, to run with 8 processes total, 4 processes per node, on 2 hosts:

ezpz launch -n 8 -ppn 4 -nh 2 -- python3 -m ezpz.examples.fsdp_tp

Assuming your current job can satisfy this (i.e. at least 4 accelerators per node, and at least 2 nodes), this would launch python3 -m ezpz.examples.fsdp_tp across 8 processes, 4 per node, on the first two hosts allocated to your job.
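The `--` splitting rule can be sketched like this. Note this is a toy illustration, not ezpz's actual argparse-based handling; `split_args` is an invented name:

```python
# Hypothetical sketch of how '--' separates launcher flags from the command.
# Without '--', every argument belongs to the command (see footnote 1).
def split_args(argv):
    if "--" in argv:
        i = argv.index("--")
        return argv[:i], argv[i + 1:]  # (launcher flags, command + args)
    return [], argv


flags, cmd = split_args(
    ["-n", "8", "-ppn", "4", "--", "python3", "-m", "ezpz.examples.fsdp_tp"]
)
print(flags, cmd)
```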

  • ezpz launch --help
    usage: ezpz launch [-h] [--print-source] [--filter FILTER [FILTER ...]] [-n NPROC] [-ppn NPROC_PER_NODE] [-nh NHOSTS] [--hostfile HOSTFILE] ...
    
    Launch a command on the current PBS/SLURM job.
    
    Additional `<launcher flags>` can be passed through directly
    to the launcher by including '--' as a separator before
    the command.
    
    Examples:
    
        ezpz launch <launcher flags> -- <command> <args>
    
        ezpz launch -n 8 -ppn 4 --verbose --tag-output -- python3 -m ezpz.examples.fsdp_tp
    
        ezpz launch --nproc 8 -x EZPZ_LOG_LEVEL=DEBUG -- python3 my_script.py --my-arg val
    
    positional arguments:
    command               Command (and arguments) to execute. Use '--' to separate options when needed.
    
    options:
    -h, --help            show this help message and exit
    --print-source        Print the location of the launch CLI source and exit.
    --filter FILTER [FILTER ...]
                            Filter output lines by these strings.
    -n NPROC, -np NPROC, --n NPROC, --np NPROC, --nproc NPROC, --world_size NPROC, --nprocs NPROC
                            Number of processes.
    -ppn NPROC_PER_NODE, --ppn NPROC_PER_NODE, --nproc_per_node NPROC_PER_NODE
                            Processes per node.
    -nh NHOSTS, --nh NHOSTS, --nhost NHOSTS, --nnode NHOSTS, --nnodes NHOSTS, --nhosts NHOSTS
                            Number of nodes to use.
    --hostfile HOSTFILE   Hostfile to use for launching.
    

Examples

Use it to launch:

  • Arbitrary command(s):

    ezpz launch hostname
    
  • Arbitrary Python string:

    ezpz launch python3 -c 'import ezpz; ezpz.setup_torch()'
    
  • One of the Distributed Training examples:

    ezpz launch python3 -m ezpz.examples.test --profile
    ezpz launch -n 8 -- python3 -m ezpz.examples.fsdp_tp --tp 4
    
  • Your own distributed training script:

    ezpz launch -n 16 -ppn 8 -- python3 -m your_app.train --config configs/your_config.yaml
    

    to launch your_app.train across 16 processes, 8 per node.

Sequence Diagram

Two primary control paths drive ezpz launch: a scheduler-aware path used when running inside PBS/SLURM allocations, and a local fallback that shells out to mpirun when no scheduler metadata is available.

sequenceDiagram
    autonumber
    actor User
    participant CLI as ezpz_launch
    participant Scheduler as PBS_or_Slurm
    participant MPI as mpirun_mpiexec
    participant App as User_application

    User->>CLI: ezpz launch <launch_flags> -- <cmd> <cmd_flags>
    CLI->>Scheduler: detect_scheduler()
    alt scheduler_detected
        Scheduler-->>CLI: scheduler_type, job_metadata
        CLI->>Scheduler: build_scheduler_command(cmd_to_launch)
        Scheduler-->>CLI: launch_cmd (mpiexec_or_srun)
        CLI->>MPI: run_command(launch_cmd)
        MPI->>App: start_ranks_and_execute
        App-->>MPI: return_codes
        MPI-->>CLI: aggregate_status
    else no_scheduler_detected
        Scheduler-->>CLI: unknown
        CLI->>MPI: mpirun -np 2 <cmd> <cmd_flags>
        MPI->>App: start_local_ranks
        App-->>MPI: return_codes
        MPI-->>CLI: aggregate_status
    end
    CLI-->>User: exit_code
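The run/aggregate step at the end of both paths in the diagram can be sketched with subprocess. This is a rough sketch only; the real ezpz run_command also streams, filters, and logs output line by line:

```python
# Rough sketch of run_command -> aggregate_status -> exit_code from the
# diagram above (hypothetical simplification of ezpz's actual behavior).
import shlex
import subprocess
import sys


def run_command(launch_cmd):
    proc = subprocess.run(shlex.split(launch_cmd))
    return proc.returncode  # becomes the CLI's exit code


if __name__ == "__main__":
    code = run_command(f"{sys.executable} -c pass")
    print(f"exit_code={code}")
```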

Distributed Training Examples

  1. ๐Ÿ“ Examples: Scalable and ready-to-go!

    Example Module            What it Does
    ------------------------  -----------------------------------------------
    ezpz.examples.test        Train MLP with DDP on MNIST
    ezpz.examples.fsdp        Train CNN with FSDP on MNIST
    ezpz.examples.vit         Train ViT with FSDP on MNIST
    ezpz.examples.fsdp_tp     Train Transformer with FSDP + TP on HF Datasets
    ezpz.examples.diffusion   Train Diffusion LLM with FSDP on HF Datasets
    ezpz.examples.hf          Fine-tune causal LM with Accelerate + FSDP
    ezpz.examples.hf_trainer  Train LLM with FSDP + HF Trainer on HF Datasets

    Any of the examples can be launched with:

    ezpz launch python3 -m ezpz.examples.<example>
    
    🤗 HF Integration
    1. ezpz.examples.{fsdp_tp, diffusion, hf, hf_trainer} all support arbitrary 🤗 Hugging Face datasets, e.g.:

      dataset="stanfordnlp/imdb"  # or any other HF dataset
      ezpz launch python3 -m ezpz.examples.fsdp_tp --dataset "${dataset}"
      ezpz launch python3 -m ezpz.examples.diffusion --dataset "${dataset}"
      ezpz launch python3 -m ezpz.examples.hf \
          --model_name_or_path meta-llama/Llama-3.2-1B \
          --dataset_name="${dataset}" \
          --streaming \
          --bf16=true
      ezpz launch python3 -m ezpz.examples.hf_trainer \
          --model_name_or_path meta-llama/Llama-3.2-1B \
          --dataset_name="${dataset}" \
          --streaming \
          --bf16=true
      
    2. ezpz.examples.hf and ezpz.examples.hf_trainer both support arbitrary combinations of (compatible) transformers.from_pretrained models, and HF Datasets (with support for streaming!). hf uses an explicit training loop with Accelerate, while hf_trainer wraps the HF Trainer API.

      ezpz launch python3 -m ezpz.examples.hf \
          --streaming \
          --dataset_name=eliplutchok/fineweb-small-sample \
          --tokenizer_name meta-llama/Llama-3.2-1B \
          --model_name_or_path meta-llama/Llama-3.2-1B \
          --bf16=true
      
      ezpz launch python3 -m ezpz.examples.hf_trainer \
          --streaming \
          --dataset_name=eliplutchok/fineweb-small-sample \
          --tokenizer_name meta-llama/Llama-3.2-1B \
          --model_name_or_path meta-llama/Llama-3.2-1B \
          --bf16=true
      
    Simple Example
    ezpz launch python3 -c 'import ezpz; print(ezpz.setup_torch())'
    
    Output
    MacBook Pro
    #[01/08/26 @ 14:56:50][~/v/s/ezpz][dev][$โœ˜!?] [4s]
    ; ezpz launch python3 -c 'import ezpz; print(ezpz.setup_torch())'
    [2026-01-08 14:56:54,307030][I][ezpz/launch:515:run] No active scheduler detected; falling back to local mpirun: mpirun -np 2 python3 -c 'import ezpz; print(ezpz.setup_torch())'
    Using [2 / 2] available "mps" devices !!
    0
    1
    [2025-12-23-162222] Execution time: 4s sec
    
    Aurora (2 Nodes)
    #[aurora_frameworks-2025.2.0](torchtitan-aurora_frameworks-2025.2.0)[1m9s]
    #[01/08/26,14:56:42][x4418c6s1b0n0][/f/d/f/p/p/torchtitan][main][?]
    ; ezpz launch python3 -c 'import ezpz; print(ezpz.setup_torch())'
    
    
    [2026-01-08 14:58:01,994729][I][numexpr/utils:148:_init_num_threads] Note: detected 208 virtual cores but NumExpr set to maximum of 64, check "NUMEXPR_MAX_THREADS" environment variable.
    [2026-01-08 14:58:01,997067][I][numexpr/utils:151:_init_num_threads] Note: NumExpr detected 208 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 16.
    [2026-01-08 14:58:01,997545][I][numexpr/utils:164:_init_num_threads] NumExpr defaulting to 16 threads.
    [2026-01-08 14:58:02,465850][I][ezpz/launch:396:launch] ----[🐋 ezpz.launch][started][2026-01-08-145802]----
    [2026-01-08 14:58:04,765720][I][ezpz/launch:416:launch] Job ID: 8247203
    [2026-01-08 14:58:04,766527][I][ezpz/launch:417:launch] nodelist: ['x4418c6s1b0n0', 'x4717c0s6b0n0']
    [2026-01-08 14:58:04,766930][I][ezpz/launch:418:launch] hostfile: /var/spool/pbs/aux/8247203.aurora-pbs-0001.hostmgmt.cm.aurora.alcf.anl.gov
    [2026-01-08 14:58:04,767616][I][ezpz/pbs:264:get_pbs_launch_cmd] ✅ Using [24/24] GPUs [2 hosts] x [12 GPU/host]
    [2026-01-08 14:58:04,768399][I][ezpz/launch:367:build_executable] Building command to execute by piecing together:
    [2026-01-08 14:58:04,768802][I][ezpz/launch:368:build_executable] (1.) launch_cmd: mpiexec --envall --np=24 --ppn=12 --hostfile=/var/spool/pbs/aux/8247203.aurora-pbs-0001.hostmgmt.cm.aurora.alcf.anl.gov --no-vni --cpu-bind=verbose,list:2-4:10-12:18-20:26-28:34-36:42-44:54-56:62-64:70-72:78-80:86-88:94-96
    [2026-01-08 14:58:04,769517][I][ezpz/launch:369:build_executable] (2.) cmd_to_launch: python3 -c 'import ezpz; print(ezpz.setup_torch())'
    [2026-01-08 14:58:04,770278][I][ezpz/launch:433:launch] Took: 3.01 seconds to build command.
    [2026-01-08 14:58:04,770660][I][ezpz/launch:436:launch] Executing:
    mpiexec
    --envall
    --np=24
    --ppn=12
    --hostfile=/var/spool/pbs/aux/8247203.aurora-pbs-0001.hostmgmt.cm.aurora.alcf.anl.gov
    --no-vni
    --cpu-bind=verbose,list:2-4:10-12:18-20:26-28:34-36:42-44:54-56:62-64:70-72:78-80:86-88:94-96
    python3
    -c
    import ezpz; print(ezpz.setup_torch())
    [2026-01-08 14:58:04,772125][I][ezpz/launch:220:get_aurora_filters] Filtering for Aurora-specific messages. To view list of filters, run with EZPZ_LOG_LEVEL=DEBUG
    [2026-01-08 14:58:04,772651][I][ezpz/launch:443:launch] Execution started @ 2026-01-08-145804...
    [2026-01-08 14:58:04,773070][I][ezpz/launch:138:run_command] Caught 24 filters
    [2026-01-08 14:58:04,773429][I][ezpz/launch:139:run_command] Running command:
    mpiexec --envall --np=24 --ppn=12 --hostfile=/var/spool/pbs/aux/8247203.aurora-pbs-0001.hostmgmt.cm.aurora.alcf.anl.gov --no-vni --cpu-bind=verbose,list:2-4:10-12:18-20:26-28:34-36:42-44:54-56:62-64:70-72:78-80:86-88:94-96 python3 -c 'import ezpz; print(ezpz.setup_torch())'
    

    CPU bind output (24 lines)

    cpubind:list x4717c0s6b0n0 pid 118589 rank 12 0: mask 0x1c
    cpubind:list x4717c0s6b0n0 pid 118590 rank 13 1: mask 0x1c00
    ...
    cpubind:list x4418c6s1b0n0 pid 66460 rank 10 10: mask 0x1c000000000000000000000
    cpubind:list x4418c6s1b0n0 pid 66461 rank 11 11: mask 0x1c00000000000000000000000
    
    Using [24 / 24] available "xpu" devices !!
    

    Raw rank output (24 lines)

    8
    10
    0
    4
    ...
    18
    21
    
    [2026-01-08 14:58:14,252433][I][ezpz/launch:447:launch] ----[🐋 ezpz.launch][stop][2026-01-08-145814]----
    [2026-01-08 14:58:14,253726][I][ezpz/launch:448:launch] Execution finished with 0.
    [2026-01-08 14:58:14,254184][I][ezpz/launch:449:launch] Executing finished in 9.48 seconds.
    [2026-01-08 14:58:14,254555][I][ezpz/launch:450:launch] Took 9.48 seconds to run. Exiting.
    took: 18s
    
    demo.py
    import ezpz
    
    # automatic device + backend setup for distributed PyTorch
    _ = ezpz.setup_torch()  # CUDA/NCCL, XPU/XCCL, {MPS, CPU}/GLOO, ...
    
    device = ezpz.get_torch_device() # {cuda, xpu, mps, cpu, ...}
    rank = ezpz.get_rank()
    world_size = ezpz.get_world_size()
    # ...etc
    
    if rank == 0:
        print(f"Hello from rank {rank} / {world_size} on {device}!")
    

    We can launch this script with:

    ezpz launch python3 demo.py
    
    Output(s)
    MacBook Pro
    # from MacBook Pro
    $ ezpz launch python3 demo.py
    [2026-01-08 07:22:31,989741][I][ezpz/launch:515:run] No active scheduler detected; falling back to local mpirun: mpirun -np 2 python3 /Users/samforeman/python/ezpz_demo.py
    Using [2 / 2] available "mps" devices !!
    Hello from rank 0 / 2 on mps!
    
    Aurora (2 nodes)
    # from 2 nodes of Aurora:
    #[aurora_frameworks-2025.2.0](foremans-aurora_frameworks-2025.2.0)[C v7.5.0-gcc][43s]
    #[01/08/26,07:26:10][x4604c5s2b0n0][~]
    ; ezpz launch python3 demo.py
    
    [2026-01-08 07:26:19,723138][I][numexpr/utils:148:_init_num_threads] Note: detected 208 virtual cores but NumExpr set to maximum of 64, check "NUMEXPR_MAX_THREADS" environment variable.
    [2026-01-08 07:26:19,725453][I][numexpr/utils:151:_init_num_threads] Note: NumExpr detected 208 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 16.
    [2026-01-08 07:26:19,725932][I][numexpr/utils:164:_init_num_threads] NumExpr defaulting to 16 threads.
    [2026-01-08 07:26:20,290222][I][ezpz/launch:396:launch] ----[🐋 ezpz.launch][started][2026-01-08-072620]----
    [2026-01-08 07:26:21,566797][I][ezpz/launch:416:launch] Job ID: 8246832
    [2026-01-08 07:26:21,567684][I][ezpz/launch:417:launch] nodelist: ['x4604c5s2b0n0', 'x4604c5s3b0n0']
    [2026-01-08 07:26:21,568082][I][ezpz/launch:418:launch] hostfile: /var/spool/pbs/aux/8246832.aurora-pbs-0001.hostmgmt.cm.aurora.alcf.anl.gov
    [2026-01-08 07:26:21,568770][I][ezpz/pbs:264:get_pbs_launch_cmd] ✅ Using [24/24] GPUs [2 hosts] x [12 GPU/host]
    [2026-01-08 07:26:21,569557][I][ezpz/launch:367:build_executable] Building command to execute by piecing together:
    [2026-01-08 07:26:21,569959][I][ezpz/launch:368:build_executable] (1.) launch_cmd: mpiexec --envall --np=24 --ppn=12 --hostfile=/var/spool/pbs/aux/8246832.aurora-pbs-0001.hostmgmt.cm.aurora.alcf.anl.gov --no-vni --cpu-bind=verbose,list:2-4:10-12:18-20:26-28:34-36:42-44:54-56:62-64:70-72:78-80:86-88:94-96
    [2026-01-08 07:26:21,570821][I][ezpz/launch:369:build_executable] (2.) cmd_to_launch: python3 demo.py
    [2026-01-08 07:26:21,571548][I][ezpz/launch:433:launch] Took: 2.11 seconds to build command.
    [2026-01-08 07:26:21,571918][I][ezpz/launch:436:launch] Executing:
    mpiexec
    --envall
    --np=24
    --ppn=12
    --hostfile=/var/spool/pbs/aux/8246832.aurora-pbs-0001.hostmgmt.cm.aurora.alcf.anl.gov
    --no-vni
    --cpu-bind=verbose,list:2-4:10-12:18-20:26-28:34-36:42-44:54-56:62-64:70-72:78-80:86-88:94-96
    python3
    demo.py
    [2026-01-08 07:26:21,573262][I][ezpz/launch:220:get_aurora_filters] Filtering for Aurora-specific messages. To view list of filters, run with EZPZ_LOG_LEVEL=DEBUG
    [2026-01-08 07:26:21,573781][I][ezpz/launch:443:launch] Execution started @ 2026-01-08-072621...
    [2026-01-08 07:26:21,574195][I][ezpz/launch:138:run_command] Caught 24 filters
    [2026-01-08 07:26:21,574532][I][ezpz/launch:139:run_command] Running command:
    mpiexec --envall --np=24 --ppn=12 --hostfile=/var/spool/pbs/aux/8246832.aurora-pbs-0001.hostmgmt.cm.aurora.alcf.anl.gov --no-vni --cpu-bind=verbose,list:2-4:10-12:18-20:26-28:34-36:42-44:54-56:62-64:70-72:78-80:86-88:94-96 python3 demo.py
    

    CPU bind output (24 lines)

    cpubind:list x4604c5s3b0n0 pid 131587 rank 12 0: mask 0x1c
    cpubind:list x4604c5s3b0n0 pid 131588 rank 13 1: mask 0x1c00
    ...
    cpubind:list x4604c5s2b0n0 pid 121235 rank 10 10: mask 0x1c000000000000000000000
    cpubind:list x4604c5s2b0n0 pid 121236 rank 11 11: mask 0x1c00000000000000000000000
    
    Using [24 / 24] available "xpu" devices !!
    Hello from rank 0 / 24 on xpu!
    [2026-01-08 07:26:33,060432][I][ezpz/launch:447:launch] ----[🐋 ezpz.launch][stop][2026-01-08-072633]----
    [2026-01-08 07:26:33,061512][I][ezpz/launch:448:launch] Execution finished with 0.
    [2026-01-08 07:26:33,062045][I][ezpz/launch:449:launch] Executing finished in 11.49 seconds.
    [2026-01-08 07:26:33,062531][I][ezpz/launch:450:launch] Took 11.49 seconds to run. Exiting.
    took: 22s
    

  1. When no -- is present, all arguments are treated as part of the command to run. 

  2. By default, this detects whether we are running behind a job scheduler (e.g. PBS or Slurm).
    If so, it automatically determines the specifics of the currently active job; explicitly, this determines:

    1. The number of available nodes
    2. How many GPUs are present on each of these nodes
    3. How many GPUs we have in total

    It then uses this information to automatically construct the appropriate {mpiexec, srun} command and, finally, executes it.
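    A toy sketch of steps 1-3 above, hypothetical rather than the real ezpz.pbs logic: `job_specifics` is an invented name, and GPUs-per-node detection is site-specific (the 12 GPUs/host default simply mirrors Aurora's logs above):

    ```python
    # Hypothetical sketch of deriving job specifics from a PBS hostfile.
    # The hostfile lists one hostname per rank slot; unique lines = nodes.
    def job_specifics(hostfile_lines, gpus_per_node=12):
        nodes = sorted({line.strip() for line in hostfile_lines if line.strip()})
        nhosts = len(nodes)
        return {
            "nhosts": nhosts,
            "gpus_per_node": gpus_per_node,
            "total_gpus": nhosts * gpus_per_node,
        }


    print(job_specifics(["x4418c6s1b0n0\n", "x4717c0s6b0n0\n"]))
    ```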

  3. The ezpz.History class automatically computes distributed statistics (min, max, mean, std. dev.) across ranks for all recorded metrics.
    NOTE: This is automatically disabled when ezpz.get_world_size() >= 384 (e.g. >= 32 Aurora nodes or >= 96 Polaris nodes) due to the additional overhead introduced, but can be manually enabled if desired.
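    A hypothetical illustration of the per-metric statistics described in footnote 3: ezpz.History gathers each metric across ranks; here a plain list stands in for the gathered per-rank values, and `cross_rank_stats` is an invented name, not part of the ezpz API.

    ```python
    # Toy stand-in for computing {min, max, mean, std} over per-rank values.
    import math


    def cross_rank_stats(values):
        n = len(values)
        mean = sum(values) / n
        var = sum((v - mean) ** 2 for v in values) / n  # population variance
        return {
            "min": min(values),
            "max": max(values),
            "mean": mean,
            "std": math.sqrt(var),
        }


    print(cross_rank_stats([1.0, 2.0, 3.0, 4.0]))
    ```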