
🚀 ezpz launch

Single entry point for launching distributed applications.

ezpz launch <cmd>

This will:

  1. Automatically detect your PBS/Slurm job and
  2. Launch <cmd> across all available accelerators.

This works by detecting whether ezpz launch is being executed from inside a PBS/Slurm job[2].

If so, it determines the specifics of the active job (number of nodes and number of GPUs per node) and uses this information to build and execute the appropriate launch command (e.g. mpiexec or srun).

When not running inside a PBS/Slurm job, ezpz launch falls back to mpirun with sensible defaults.
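The detection-plus-fallback behavior can be sketched as follows. This is a hypothetical illustration, not the actual ezpz implementation: `detect_scheduler` and `build_launch_cmd` are made-up names, and the real logic inspects more than just the `PBS_NODEFILE` and `SLURM_JOB_ID` environment variables (which are, however, standard variables set by PBS and Slurm, respectively).

```python
# Hypothetical sketch of scheduler detection and the local fallback.
# The real ezpz implementation differs; function names here are invented.
import os


def detect_scheduler(env=None):
    env = os.environ if env is None else env
    if "PBS_NODEFILE" in env:
        return "pbs"
    if "SLURM_JOB_ID" in env:
        return "slurm"
    return None


def build_launch_cmd(cmd, scheduler, env=None):
    env = os.environ if env is None else env
    if scheduler == "pbs":
        return f"mpiexec --envall --hostfile {env['PBS_NODEFILE']} {cmd}"
    if scheduler == "slurm":
        return f"srun {cmd}"
    return f"mpirun -np 2 {cmd}"  # local fallback with sensible defaults


print(build_launch_cmd("hostname", detect_scheduler()))
```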

Arguments can be passed through to the underlying mpiexec/srun launcher by separating them from the <cmd> with --[1], e.g.:

ezpz launch <launch-args> -- <cmd> <cmd-args>

For example, to run with 8 processes total, 4 processes per node, on 2 hosts:

ezpz launch -n 8 -ppn 4 -nh 2 -- python3 -m ezpz.examples.fsdp_tp

Assuming your current job can satisfy this (i.e. at least 4 accelerators per node, and at least 2 nodes), this would launch python3 -m ezpz.examples.fsdp_tp across 8 processes, 4 per node, on the first two hosts allocated to your job.
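The `--` splitting rule can be sketched like this. Note this is a toy illustration, not ezpz's actual argparse-based handling; `split_args` is an invented name:

```python
# Hypothetical sketch of how '--' separates launcher flags from the command.
# Without '--', every argument belongs to the command (see footnote 1).
def split_args(argv):
    if "--" in argv:
        i = argv.index("--")
        return argv[:i], argv[i + 1:]  # (launcher flags, command + args)
    return [], argv


flags, cmd = split_args(
    ["-n", "8", "-ppn", "4", "--", "python3", "-m", "ezpz.examples.fsdp_tp"]
)
print(flags, cmd)
```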

  • ezpz launch --help
    usage: ezpz launch [-h] [--print-source] [--filter FILTER [FILTER ...]] [-n NPROC] [-ppn NPROC_PER_NODE] [-nh NHOSTS] [--hostfile HOSTFILE] ...
    
    Launch a command on the current PBS/SLURM job.
    
    Additional `<launcher flags>` can be passed through directly
    to the launcher by including '--' as a separator before
    the command.
    
    Examples:
    
        ezpz launch <launcher flags> -- <command> <args>
    
        ezpz launch -n 8 -ppn 4 --verbose --tag-output -- python3 -m ezpz.examples.fsdp_tp
    
        ezpz launch --nproc 8 -x EZPZ_LOG_LEVEL=DEBUG -- python3 my_script.py --my-arg val
    
    positional arguments:
    command               Command (and arguments) to execute. Use '--' to separate options when needed.
    
    options:
    -h, --help            show this help message and exit
    --print-source        Print the location of the launch CLI source and exit.
    --filter FILTER [FILTER ...]
                            Filter output lines by these strings.
    -n NPROC, -np NPROC, --n NPROC, --np NPROC, --nproc NPROC, --world_size NPROC, --nprocs NPROC
                            Number of processes.
    -ppn NPROC_PER_NODE, --ppn NPROC_PER_NODE, --nproc_per_node NPROC_PER_NODE
                            Processes per node.
    -nh NHOSTS, --nh NHOSTS, --nhost NHOSTS, --nnode NHOSTS, --nnodes NHOSTS, --nhosts NHOSTS
                            Number of nodes to use.
    --hostfile HOSTFILE   Hostfile to use for launching.
    

Examples

Use it to launch:

  • Arbitrary command(s):

    ezpz launch hostname
    
  • Arbitrary Python string:

    ezpz launch python3 -c 'import ezpz; ezpz.setup_torch()'
    
  • One of the Distributed Training examples:

    ezpz launch python3 -m ezpz.examples.test --profile
    ezpz launch -n 8 -- python3 -m ezpz.examples.fsdp_tp --tp 4
    
  • Your own distributed training script:

    ezpz launch -n 16 -ppn 8 -- python3 -m your_app.train --config configs/your_config.yaml
    

    to launch your_app.train across 16 processes, 8 per node.

Sequence Diagram

Two primary control paths drive ezpz launch: a scheduler-aware path used when running inside PBS/SLURM allocations, and a local fallback that shells out to mpirun when no scheduler metadata is available.

sequenceDiagram
    autonumber
    actor User
    participant CLI as ezpz_launch
    participant Scheduler as PBS_or_Slurm
    participant MPI as mpirun_mpiexec
    participant App as User_application

    User->>CLI: ezpz launch <launch_flags> -- <cmd> <cmd_flags>
    CLI->>Scheduler: detect_scheduler()
    alt scheduler_detected
        Scheduler-->>CLI: scheduler_type, job_metadata
        CLI->>Scheduler: build_scheduler_command(cmd_to_launch)
        Scheduler-->>CLI: launch_cmd (mpiexec_or_srun)
        CLI->>MPI: run_command(launch_cmd)
        MPI->>App: start_ranks_and_execute
        App-->>MPI: return_codes
        MPI-->>CLI: aggregate_status
    else no_scheduler_detected
        Scheduler-->>CLI: unknown
        CLI->>MPI: mpirun -np 2 <cmd> <cmd_flags>
        MPI->>App: start_local_ranks
        App-->>MPI: return_codes
        MPI-->>CLI: aggregate_status
    end
    CLI-->>User: exit_code
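The run/aggregate step at the end of both paths in the diagram can be sketched with subprocess. This is a rough sketch only; the real ezpz run_command also streams, filters, and logs output line by line:

```python
# Rough sketch of run_command -> aggregate_status -> exit_code from the
# diagram above (hypothetical simplification of ezpz's actual behavior).
import shlex
import subprocess
import sys


def run_command(launch_cmd):
    proc = subprocess.run(shlex.split(launch_cmd))
    return proc.returncode  # becomes the CLI's exit code


if __name__ == "__main__":
    code = run_command(f"{sys.executable} -c pass")
    print(f"exit_code={code}")
```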

Distributed Training Examples

  1. ๐Ÿ“ Examples: Scalable and ready-to-go!

    Example Module            What it Does
    ------------------------  -----------------------------------------------
    ezpz.examples.test        Train MLP with DDP on MNIST
    ezpz.examples.fsdp        Train CNN with FSDP on MNIST
    ezpz.examples.vit         Train ViT with FSDP on MNIST
    ezpz.examples.fsdp_tp     Train Transformer with FSDP + TP on HF Datasets
    ezpz.examples.diffusion   Train Diffusion LLM with FSDP on HF Datasets
    ezpz.examples.hf          Fine-tune causal LM with Accelerate + FSDP
    ezpz.examples.hf_trainer  Train LLM with FSDP + HF Trainer on HF Datasets

    Any of the examples can be launched with:

    ezpz launch python3 -m ezpz.examples.<example>
    
    🤗 HF Integration
    1. ezpz.examples.{fsdp_tp, diffusion, hf, hf_trainer} all support arbitrary 🤗 Hugging Face datasets, e.g.:

      dataset="stanfordnlp/imdb"  # or any other HF dataset
      ezpz launch python3 -m ezpz.examples.fsdp_tp --dataset "${dataset}"
      ezpz launch python3 -m ezpz.examples.diffusion --dataset "${dataset}"
      ezpz launch python3 -m ezpz.examples.hf \
          --model_name_or_path meta-llama/Llama-3.2-1B \
          --dataset_name="${dataset}" \
          --streaming \
          --bf16=true
      ezpz launch python3 -m ezpz.examples.hf_trainer \
          --model_name_or_path meta-llama/Llama-3.2-1B \
          --dataset_name="${dataset}" \
          --streaming \
          --bf16=true
      
    2. ezpz.examples.hf and ezpz.examples.hf_trainer both support arbitrary combinations of (compatible) transformers.from_pretrained models, and HF Datasets (with support for streaming!). hf uses an explicit training loop with Accelerate, while hf_trainer wraps the HF Trainer API.

      ezpz launch python3 -m ezpz.examples.hf \
          --streaming \
          --dataset_name=eliplutchok/fineweb-small-sample \
          --tokenizer_name meta-llama/Llama-3.2-1B \
          --model_name_or_path meta-llama/Llama-3.2-1B \
          --bf16=true
      
      ezpz launch python3 -m ezpz.examples.hf_trainer \
          --streaming \
          --dataset_name=eliplutchok/fineweb-small-sample \
          --tokenizer_name meta-llama/Llama-3.2-1B \
          --model_name_or_path meta-llama/Llama-3.2-1B \
          --bf16=true
      
    Simple Example
    ezpz launch python3 -c 'import ezpz; print(ezpz.setup_torch())'
    
    Output
    MacBook Pro
    #[01/08/26 @ 14:56:50][~/v/s/ezpz][dev][$โœ˜!?] [4s]
    ; ezpz launch python3 -c 'import ezpz; print(ezpz.setup_torch())'
    [2026-01-08 14:56:54,307030][I][ezpz/launch:515:run] No active scheduler detected; falling back to local mpirun: mpirun -np 2 python3 -c 'import ezpz; print(ezpz.setup_torch())'
    Using [2 / 2] available "mps" devices !!
    0
    1
    [2025-12-23-162222] Execution time: 4s sec
    
    Aurora (2 Nodes)
    #[aurora_frameworks-2025.2.0](torchtitan-aurora_frameworks-2025.2.0)[1m9s]
    #[01/08/26,14:56:42][x4418c6s1b0n0][/f/d/f/p/p/torchtitan][main][?]
    ; ezpz launch python3 -c 'import ezpz; print(ezpz.setup_torch())'
    
    
    [2026-01-08 14:58:01,994729][I][numexpr/utils:148:_init_num_threads] Note: detected 208 virtual cores but NumExpr set to maximum of 64, check "NUMEXPR_MAX_THREADS" environment variable.
    [2026-01-08 14:58:01,997067][I][numexpr/utils:151:_init_num_threads] Note: NumExpr detected 208 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 16.
    [2026-01-08 14:58:01,997545][I][numexpr/utils:164:_init_num_threads] NumExpr defaulting to 16 threads.
    [2026-01-08 14:58:02,465850][I][ezpz/launch:396:launch] ----[🐋 ezpz.launch][started][2026-01-08-145802]----
    [2026-01-08 14:58:04,765720][I][ezpz/launch:416:launch] Job ID: 8247203
    [2026-01-08 14:58:04,766527][I][ezpz/launch:417:launch] nodelist: ['x4418c6s1b0n0', 'x4717c0s6b0n0']
    [2026-01-08 14:58:04,766930][I][ezpz/launch:418:launch] hostfile: /var/spool/pbs/aux/8247203.aurora-pbs-0001.hostmgmt.cm.aurora.alcf.anl.gov
    [2026-01-08 14:58:04,767616][I][ezpz/pbs:264:get_pbs_launch_cmd] ✅ Using [24/24] GPUs [2 hosts] x [12 GPU/host]
    [2026-01-08 14:58:04,768399][I][ezpz/launch:367:build_executable] Building command to execute by piecing together:
    [2026-01-08 14:58:04,768802][I][ezpz/launch:368:build_executable] (1.) launch_cmd: mpiexec --envall --np=24 --ppn=12 --hostfile=/var/spool/pbs/aux/8247203.aurora-pbs-0001.hostmgmt.cm.aurora.alcf.anl.gov --no-vni --cpu-bind=verbose,list:2-4:10-12:18-20:26-28:34-36:42-44:54-56:62-64:70-72:78-80:86-88:94-96
    [2026-01-08 14:58:04,769517][I][ezpz/launch:369:build_executable] (2.) cmd_to_launch: python3 -c 'import ezpz; print(ezpz.setup_torch())'
    [2026-01-08 14:58:04,770278][I][ezpz/launch:433:launch] Took: 3.01 seconds to build command.
    [2026-01-08 14:58:04,770660][I][ezpz/launch:436:launch] Executing:
    mpiexec
    --envall
    --np=24
    --ppn=12
    --hostfile=/var/spool/pbs/aux/8247203.aurora-pbs-0001.hostmgmt.cm.aurora.alcf.anl.gov
    --no-vni
    --cpu-bind=verbose,list:2-4:10-12:18-20:26-28:34-36:42-44:54-56:62-64:70-72:78-80:86-88:94-96
    python3
    -c
    import ezpz; print(ezpz.setup_torch())
    [2026-01-08 14:58:04,772125][I][ezpz/launch:220:get_aurora_filters] Filtering for Aurora-specific messages. To view list of filters, run with EZPZ_LOG_LEVEL=DEBUG
    [2026-01-08 14:58:04,772651][I][ezpz/launch:443:launch] Execution started @ 2026-01-08-145804...
    [2026-01-08 14:58:04,773070][I][ezpz/launch:138:run_command] Caught 24 filters
    [2026-01-08 14:58:04,773429][I][ezpz/launch:139:run_command] Running command:
    mpiexec --envall --np=24 --ppn=12 --hostfile=/var/spool/pbs/aux/8247203.aurora-pbs-0001.hostmgmt.cm.aurora.alcf.anl.gov --no-vni --cpu-bind=verbose,list:2-4:10-12:18-20:26-28:34-36:42-44:54-56:62-64:70-72:78-80:86-88:94-96 python3 -c 'import ezpz; print(ezpz.setup_torch())'
    

    CPU bind output (24 lines)

    cpubind:list x4717c0s6b0n0 pid 118589 rank 12 0: mask 0x1c
    cpubind:list x4717c0s6b0n0 pid 118590 rank 13 1: mask 0x1c00
    ...
    cpubind:list x4418c6s1b0n0 pid 66460 rank 10 10: mask 0x1c000000000000000000000
    cpubind:list x4418c6s1b0n0 pid 66461 rank 11 11: mask 0x1c00000000000000000000000
    
    Using [24 / 24] available "xpu" devices !!
    

    Raw rank output (24 lines)

    8
    10
    0
    4
    ...
    18
    21
    
    [2026-01-08 14:58:14,252433][I][ezpz/launch:447:launch] ----[🐋 ezpz.launch][stop][2026-01-08-145814]----
    [2026-01-08 14:58:14,253726][I][ezpz/launch:448:launch] Execution finished with 0.
    [2026-01-08 14:58:14,254184][I][ezpz/launch:449:launch] Executing finished in 9.48 seconds.
    [2026-01-08 14:58:14,254555][I][ezpz/launch:450:launch] Took 9.48 seconds to run. Exiting.
    took: 18s
    
    demo.py
    import ezpz
    
    # automatic device + backend setup for distributed PyTorch
    _ = ezpz.setup_torch()  # CUDA/NCCL, XPU/XCCL, {MPS, CPU}/GLOO, ...
    
    device = ezpz.get_torch_device() # {cuda, xpu, mps, cpu, ...}
    rank = ezpz.get_rank()
    world_size = ezpz.get_world_size()
    # ...etc
    
    if rank == 0:
        print(f"Hello from rank {rank} / {world_size} on {device}!")
    

    We can launch this script with:

    ezpz launch python3 demo.py
    
    Output(s)
    MacBook Pro
    # from MacBook Pro
    $ ezpz launch python3 demo.py
    [2026-01-08 07:22:31,989741][I][ezpz/launch:515:run] No active scheduler detected; falling back to local mpirun: mpirun -np 2 python3 /Users/samforeman/python/ezpz_demo.py
    Using [2 / 2] available "mps" devices !!
    Hello from rank 0 / 2 on mps!
    
    Aurora (2 nodes)
    # from 2 nodes of Aurora:
    #[aurora_frameworks-2025.2.0](foremans-aurora_frameworks-2025.2.0)[C v7.5.0-gcc][43s]
    #[01/08/26,07:26:10][x4604c5s2b0n0][~]
    ; ezpz launch python3 demo.py
    
    [2026-01-08 07:26:19,723138][I][numexpr/utils:148:_init_num_threads] Note: detected 208 virtual cores but NumExpr set to maximum of 64, check "NUMEXPR_MAX_THREADS" environment variable.
    [2026-01-08 07:26:19,725453][I][numexpr/utils:151:_init_num_threads] Note: NumExpr detected 208 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 16.
    [2026-01-08 07:26:19,725932][I][numexpr/utils:164:_init_num_threads] NumExpr defaulting to 16 threads.
    [2026-01-08 07:26:20,290222][I][ezpz/launch:396:launch] ----[🐋 ezpz.launch][started][2026-01-08-072620]----
    [2026-01-08 07:26:21,566797][I][ezpz/launch:416:launch] Job ID: 8246832
    [2026-01-08 07:26:21,567684][I][ezpz/launch:417:launch] nodelist: ['x4604c5s2b0n0', 'x4604c5s3b0n0']
    [2026-01-08 07:26:21,568082][I][ezpz/launch:418:launch] hostfile: /var/spool/pbs/aux/8246832.aurora-pbs-0001.hostmgmt.cm.aurora.alcf.anl.gov
    [2026-01-08 07:26:21,568770][I][ezpz/pbs:264:get_pbs_launch_cmd] ✅ Using [24/24] GPUs [2 hosts] x [12 GPU/host]
    [2026-01-08 07:26:21,569557][I][ezpz/launch:367:build_executable] Building command to execute by piecing together:
    [2026-01-08 07:26:21,569959][I][ezpz/launch:368:build_executable] (1.) launch_cmd: mpiexec --envall --np=24 --ppn=12 --hostfile=/var/spool/pbs/aux/8246832.aurora-pbs-0001.hostmgmt.cm.aurora.alcf.anl.gov --no-vni --cpu-bind=verbose,list:2-4:10-12:18-20:26-28:34-36:42-44:54-56:62-64:70-72:78-80:86-88:94-96
    [2026-01-08 07:26:21,570821][I][ezpz/launch:369:build_executable] (2.) cmd_to_launch: python3 demo.py
    [2026-01-08 07:26:21,571548][I][ezpz/launch:433:launch] Took: 2.11 seconds to build command.
    [2026-01-08 07:26:21,571918][I][ezpz/launch:436:launch] Executing:
    mpiexec
    --envall
    --np=24
    --ppn=12
    --hostfile=/var/spool/pbs/aux/8246832.aurora-pbs-0001.hostmgmt.cm.aurora.alcf.anl.gov
    --no-vni
    --cpu-bind=verbose,list:2-4:10-12:18-20:26-28:34-36:42-44:54-56:62-64:70-72:78-80:86-88:94-96
    python3
    demo.py
    [2026-01-08 07:26:21,573262][I][ezpz/launch:220:get_aurora_filters] Filtering for Aurora-specific messages. To view list of filters, run with EZPZ_LOG_LEVEL=DEBUG
    [2026-01-08 07:26:21,573781][I][ezpz/launch:443:launch] Execution started @ 2026-01-08-072621...
    [2026-01-08 07:26:21,574195][I][ezpz/launch:138:run_command] Caught 24 filters
    [2026-01-08 07:26:21,574532][I][ezpz/launch:139:run_command] Running command:
    mpiexec --envall --np=24 --ppn=12 --hostfile=/var/spool/pbs/aux/8246832.aurora-pbs-0001.hostmgmt.cm.aurora.alcf.anl.gov --no-vni --cpu-bind=verbose,list:2-4:10-12:18-20:26-28:34-36:42-44:54-56:62-64:70-72:78-80:86-88:94-96 python3 demo.py
    

    CPU bind output (24 lines)

    cpubind:list x4604c5s3b0n0 pid 131587 rank 12 0: mask 0x1c
    cpubind:list x4604c5s3b0n0 pid 131588 rank 13 1: mask 0x1c00
    ...
    cpubind:list x4604c5s2b0n0 pid 121235 rank 10 10: mask 0x1c000000000000000000000
    cpubind:list x4604c5s2b0n0 pid 121236 rank 11 11: mask 0x1c00000000000000000000000
    
    Using [24 / 24] available "xpu" devices !!
    Hello from rank 0 / 24 on xpu!
    [2026-01-08 07:26:33,060432][I][ezpz/launch:447:launch] ----[🐋 ezpz.launch][stop][2026-01-08-072633]----
    [2026-01-08 07:26:33,061512][I][ezpz/launch:448:launch] Execution finished with 0.
    [2026-01-08 07:26:33,062045][I][ezpz/launch:449:launch] Executing finished in 11.49 seconds.
    [2026-01-08 07:26:33,062531][I][ezpz/launch:450:launch] Took 11.49 seconds to run. Exiting.
    took: 22s
    

  1. When no -- is present, all arguments are treated as part of the command to run. 

  2. By default, this detects whether we are running behind a job scheduler (e.g. PBS or Slurm).
    If so, it automatically determines the specifics of the currently active job; explicitly, this determines:

    1. The number of available nodes
    2. How many GPUs are present on each of these nodes
    3. How many GPUs we have in total

    It then uses this information to automatically construct the appropriate {mpiexec, srun} command and, finally, executes it.
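    A toy sketch of steps 1-3 above, hypothetical rather than the real ezpz.pbs logic: `job_specifics` is an invented name, and GPUs-per-node detection is site-specific (the 12 GPUs/host default simply mirrors Aurora's logs above):

    ```python
    # Hypothetical sketch of deriving job specifics from a PBS hostfile.
    # The hostfile lists one hostname per rank slot; unique lines = nodes.
    def job_specifics(hostfile_lines, gpus_per_node=12):
        nodes = sorted({line.strip() for line in hostfile_lines if line.strip()})
        nhosts = len(nodes)
        return {
            "nhosts": nhosts,
            "gpus_per_node": gpus_per_node,
            "total_gpus": nhosts * gpus_per_node,
        }


    print(job_specifics(["x4418c6s1b0n0\n", "x4717c0s6b0n0\n"]))
    ```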

  3. The ezpz.History class automatically computes distributed statistics (min, max, mean, std. dev.) across ranks for all recorded metrics.
    NOTE: This is automatically disabled when ezpz.get_world_size() >= 384 (e.g. >= 32 Aurora nodes or >= 96 Polaris nodes) due to the additional overhead introduced, but can be manually enabled if desired.
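    A hypothetical illustration of the per-metric statistics described in footnote 3: ezpz.History gathers each metric across ranks; here a plain list stands in for the gathered per-rank values, and `cross_rank_stats` is an invented name, not part of the ezpz API.

    ```python
    # Toy stand-in for computing {min, max, mean, std} over per-rank values.
    import math


    def cross_rank_stats(values):
        n = len(values)
        mean = sum(values) / n
        var = sum((v - mean) ** 2 for v in values) / n  # population variance
        return {
            "min": min(values),
            "max": max(values),
            "mean": mean,
            "std": math.sqrt(var),
        }


    print(cross_rank_stats([1.0, 2.0, 3.0, 4.0]))
    ```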