βοΈ ConfigurationβοΈ
ezpz behavior can be customized through environment variables:
-
The colorized logging output can be toggled via the
NO_COLORenvironment var, e.g. to turn off colors: -
Force logging from all ranks (not just rank 0):
-
Forcing a specific torch device (useful on GPU hosts when you want CPU-only):
-
Text Based Plots:
-
Changing the plot marker used in the text-based plots:
-
Changing the plot size:
The plots will automatically scale (up to a reasonable limit) with the dimensions of the terminal in which they're run.
If desired, these can be specified explicitly by overriding the
LINESandCOLUMNSenvironment variables, e.g.:
-
Common ConfigurationsβοΈ
Copy-paste these for common scenarios:
# Run offline β no external tracking
EZPZ_TRACKER_BACKENDS=none WANDB_DISABLED=1 ezpz launch -- python3 train.py
# Force CPU mode for debugging (no GPU required)
TORCH_DEVICE=cpu TORCH_BACKEND=gloo ezpz launch -np 2 -- python3 train.py
# Log to all tracking backends simultaneously
EZPZ_TRACKER_BACKENDS=wandb,mlflow,csv ezpz launch -- python3 train.py
# Quiet logging (warnings and errors only)
EZPZ_LOG_LEVEL=WARNING ezpz launch -- python3 train.py
# MLflow with AmSC credentials (see MLflow Credential Files below)
EZPZ_TRACKER_BACKENDS=mlflow ezpz launch -- python3 train.py
Choosing a parallelism strategy
See the Distributed Training Guide for a decision table on when to use DDP vs FSDP vs FSDP+TP.
Environment VariablesβοΈ
Device & DistributionβοΈ
| Variable | Purpose | Values / Default |
|---|---|---|
TORCH_DEVICE |
Force device selection instead of auto-detection. | cpu, cuda, mps, xpu. Auto-detected. |
TORCH_BACKEND |
Override distributed backend. | nccl, gloo, mpi, xccl. Auto-detected. |
TORCH_DDP_TIMEOUT |
DDP init timeout for slow launches. | Seconds. Default: 3600. |
MASTER_ADDR |
Rendezvous address for distributed init. | Hostname/IP. Auto-detected from scheduler. |
MASTER_PORT |
Rendezvous port for distributed init. | Port number. Auto-detected (picks a free port). |
HOSTFILE |
Path to hostfile when scheduler defaults are missing. | File path. Auto-created from PBS/SLURM. |
LoggingβοΈ
| Variable | Purpose | Values / Default |
|---|---|---|
NO_COLOR / NOCOLOR / COLOR / COLORTERM |
Enable/disable colored output to suit terminals or log sinks. | Set NO_COLOR=1 to disable. |
EZPZ_LOG_LEVEL |
Set ezpz logging verbosity. | DEBUG, INFO, WARNING, ERROR. Default: INFO. |
LOG_LEVEL |
General log level for various modules. | Same as above. |
LOG_FROM_ALL_RANKS |
Allow logs from all ranks (not just rank 0). | Set to 1 to enable. Default: rank 0 only. |
EZPZ_LOG_FROM_ALL_RANKS |
Alias for LOG_FROM_ALL_RANKS. |
Same as above. |
Log FormattingβοΈ
Prefix Style PresetsβοΈ
Set EZPZ_LOG_PREFIX_STYLE to apply a preset. Individual EZPZ_LOG_* vars
still override after the preset is applied.
| Style | Output | Description |
|---|---|---|
full |
[2026-04-04 21:04][I][ezpz/launch:601.run] msg |
Default β all components, full brackets |
minimal |
I ezpz/launch:601.run -- msg |
Level + path, no time, no brackets |
time |
21:04:13 I -- msg |
Time + level, no path, no brackets |
plain |
2026-04-04 21:04 I ezpz/launch:601.run -- msg |
All components, no brackets, no color |
none |
msg |
No prefix at all |
Individual OptionsβοΈ
| Variable | Purpose | Values / Default |
|---|---|---|
EZPZ_LOG_PREFIX_STYLE |
Apply a preset that configures multiple options at once (see table above). Individual vars override. | full (default), minimal, time, plain, none. |
EZPZ_LOG_USE_BRACKETS |
Wrap each prefix component in brackets: [time][level][path]. |
1 (default) / 0. |
EZPZ_LOG_USE_SINGLE_BRACKET |
Wrap all prefix components in one bracket: [time level path]. Only applies when EZPZ_LOG_USE_BRACKETS=0. |
0 (default) / 1. |
EZPZ_LOG_USE_COLORED_PREFIX |
Colorize the log prefix (time, level, path). Message content is unaffected. | 1 (default) / 0. |
EZPZ_LOG_SHOW_TIME |
Show timestamp in log output. | 1 (default) / 0. |
EZPZ_LOG_SHOW_LEVEL |
Show log level (I, W, E, etc.) in log output. |
1 (default) / 0. |
EZPZ_LOG_SHOW_PATH |
Show module/file path in log output. | 1 (default) / 0. |
EZPZ_LOG_RANK |
Show rank number in log output. | 0 (default) / 1. |
EZPZ_LOG_ENABLE_LINK_PATH |
Make file paths clickable links (terminal support required). | 0 (default) / 1. |
EZPZ_LOG_TIME_FORMAT |
Override the timestamp format string. | strftime format. Default: %Y-%m-%d %H:%M:%S. |
EZPZ_LOG_TIME_DETAILED |
Use microsecond-precision timestamps. | 0 (default) / 1. |
EZPZ_LOG_DAY_SEPARATOR |
Separator between year/month/day in timestamps. | Default: -. |
EZPZ_LOG_DAY_TIME_SEPARATOR |
Separator between date and time in timestamps. | Default: (space). |
Experiment TrackingβοΈ
| Variable | Purpose | Values / Default |
|---|---|---|
EZPZ_TRACKER_BACKENDS / EZPZ_TRACKERS |
Comma-separated tracker backends. EZPZ_TRACKERS is a shorthand alias. |
wandb, csv, mlflow. e.g. wandb,csv,mlflow. Default: wandb. |
WANDB_DISABLED |
Disable Weights & Biases logging. | Set to 1 to disable. |
WANDB_MODE |
Set W&B mode. | online, offline, dryrun. |
WANDB_PROJECT / WB_PROJECT / WB_PROJECT_NAME |
Set project name for W&B runs. Also used as MLflow experiment name if MLFLOW_EXPERIMENT_NAME is not set. |
String (aliases for the same setting). |
WANDB_API_KEY |
Supply W&B API key for authentication. | API key string. |
MLFLOW_TRACKING_URI |
MLflow tracking server URL or local path. | e.g. https://mlflow.example.com or file:///path/to/mlruns. |
MLFLOW_EXPERIMENT_NAME |
MLflow experiment name. Falls back to WANDB_PROJECT, then auto-derived from script name. |
String. |
MLFLOW_TRACKING_TOKEN |
Bearer token for MLflow server auth. | Token string. |
MLFLOW_TRACKING_INSECURE_TLS |
Skip TLS certificate verification for MLflow server. | true / false. |
AMSC_API_KEY |
API key for AMSC MLflow server. Automatically sent as X-API-Key header. |
API key string. |
EZPZ_LOCAL_HISTORY |
Enable local-only history (skip distributed aggregation). | Set to any truthy value (e.g. 1). |
EZPZ_NO_DISTRIBUTED_HISTORY |
Disable distributed history aggregation. | Set to any truthy value. Auto-enabled at 384+ ranks. |
EZPZ_TRACK_MEMORY |
Gate per-step device-memory tracking by ezpz.get_memory_metrics. Set to 0 to suppress the mem_alloc / mem_peak_alloc / mem_reserved / mem_peak_reserved keys even on supported devices. |
1 (default) / 0. Returns {} on CPU/MPS regardless. |
MLflow Credential FilesβοΈ
The MLflow backend automatically loads environment variables from dotenv files in this order:
-
~/.amsc.envβ User-level credentials (loaded first). Put yourAMSC_API_KEYandMLFLOW_TRACKING_URIhere so they work across all projects without committing secrets to version control. -
Project
.envβ Found by walking upward from the working directory (loaded second, overrides values from~/.amsc.env). Use this for project-specific tracking URIs or experiment names.
Example ~/.amsc.env:
AMSC_API_KEY=your-api-key-here
MLFLOW_TRACKING_URI=https://mlflow.american-science-cloud.org
MLFLOW_TRACKING_INSECURE_TLS=true
No python-dotenv?
If python-dotenv is not installed, set the variables directly in your
shell environment or job script. The dotenv loading is a convenience,
not a requirement.
PlottingβοΈ
| Variable | Purpose | Values / Default |
|---|---|---|
EZPZ_TPLOT_TYPE |
Select timeline plot type. | Plot type string. |
EZPZ_TPLOT_MARKER |
Marker style for terminal plots. | braille (highest resolution), fhd (more compatible). |
EZPZ_TPLOT_MAX_HEIGHT |
Max height (rows) for terminal plots. | Integer. Auto-scales to terminal height. |
EZPZ_TPLOT_MAX_WIDTH |
Max width (columns) for terminal plots. | Integer. Auto-scales to terminal width. |
EZPZ_TPLOT_RAW_MARKER |
Marker for raw timeline data points. | Same values as EZPZ_TPLOT_MARKER. |
Model & DebugβοΈ
| Variable | Purpose | Values / Default |
|---|---|---|
EZPZ_ATTENTION_FP32 |
Force FP32 attention in LLaMA models. | Set to 1 to enable. |
EZPZ_DEBUG_NAN |
Enable NaN debugging in model forward pass. | Set to 1 to enable. |
PYINSTRUMENT_PROFILER |
Enable pyinstrument profiling. | Set to 1 to enable. |
Launcher & JobsβοΈ
| Variable | Purpose | Values / Default |
|---|---|---|
CPU_BIND |
CPU binding strategy for MPI process pinning. | depth (with --depth=N), or list:core_ranges (e.g. list:2-4:10-12). Machine-specific defaults set automatically. |
EZPZ_RUN_COMMAND |
The command being executed (set by launcher, read-only). | Auto-set. |
EZPZ_JOB_NAME |
Job name for logging. | String. Falls back to scheduler job name. |
EZPZ_LOG_TIMESTAMP |
Timestamp for logging context. | Auto-set. |
EZPZ_JSON_LOG_JOB_NAME |
Job name for JSON log handler. | String. |
HYDRA_JOB_NAME |
Hydra job name (used as fallback for EZPZ_JOB_NAME). |
String. |
Auto-set / Read-onlyβοΈ
These are set automatically and don't typically need to be configured by users.
| Variable | Purpose |
|---|---|
EZPZ_VERSION |
Auto-set to package version at import time. |
PYTHONHASHSEED |
Fix Python hash seed for reproducibility. |