Experiment Tracking
The History class handles the full experiment tracking lifecycle:
recording metrics, computing distributed statistics, dispatching to
external backends (Weights & Biases, CSV, or custom), and producing
plots and reports at the end of a run.
Recent changes
Tracker integration (v0.11): History now owns a Tracker
internally. Pass backends= directly to History() instead of
managing wandb separately:
# Before (deprecated)
import ezpz
ezpz.setup_wandb(project_name="my-project")
history = History()
history.update({"loss": 0.42}, use_wandb=True)
# After
history = History(project_name="my-project", backends="wandb,csv")
history.update({"loss": 0.42}, step=step)
- The use_wandb parameter on update() is deprecated; use backends="wandb" in the constructor.
- If an active wandb.run is detected without backends= being set, History will use it automatically with a deprecation warning.
- Backend errors are isolated: a failing backend logs a warning but never crashes your training run.
Quick Start
import ezpz
rank = ezpz.setup_torch()
history = ezpz.History(
    project_name="my-project",
    backends="wandb,csv",
    outdir="./outputs",
    config={"lr": 1e-4, "batch_size": 32},
)
for step in range(100):
    loss = train_step(...)
    summary = history.update({"loss": loss.item()}, step=step)
    logger.info(summary)  # "loss=0.420000"
if rank == 0:
    history.finalize(outdir="./outputs")
History() with no backend arguments works identically for local-only
tracking: metrics are still accumulated, plotted, and saved to JSONL,
but nothing is dispatched externally.
Constructor
history = History(
# --- Local outputs ---
distributed_history=True, # aggregate stats across ranks (default: auto)
report_dir="./outputs/history", # markdown report directory
jsonl_path="./metrics.jsonl", # per-step JSONL log
# --- External backends ---
project_name="my-project", # passed to wandb
backends="wandb,csv", # comma-separated or list
config={"lr": 1e-4}, # run-level hyperparameters
outdir="./outputs", # directory for file-based backends
)
| Parameter | Default | Description |
|---|---|---|
| distributed_history | auto | Compute min/max/mean/std across ranks via all-reduce |
| report_dir | ./outputs/history | Directory for the markdown report |
| report_enabled | True | Generate a markdown report on finalize() |
| jsonl_path | None | Path for per-step JSONL log. When None, defaults to <report_dir>/<run_id>.jsonl |
| jsonl_overwrite | False | Truncate existing JSONL file |
| project_name | None | Project name for backends that support it (e.g. wandb) |
| backends | None | Comma-separated string or list of backend names |
| config | None | Run-level config dict logged via the tracker on init |
| outdir | None | Output directory for file-based backends (e.g. CSV) |
| tracker | None | Inject a pre-built Tracker instance directly |
Distributed auto-detection
Distributed aggregation is enabled when world_size <= 384
and disabled above that threshold to avoid all-reduce overhead on
very large jobs. Override with:
- History(distributed_history=False)
- EZPZ_NO_DISTRIBUTED_HISTORY=1 or EZPZ_LOCAL_HISTORY=1 env vars
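The auto-detection rule above can be sketched in a few lines. This is an illustration under stated assumptions, not ezpz's actual code: the helper name resolve_distributed_history is hypothetical, and both the precedence (explicit argument, then env vars, then the world-size threshold) and the check for the literal value "1" are guesses.

```python
import os

MAX_AUTO_WORLD_SIZE = 384  # threshold stated in the text above


def resolve_distributed_history(world_size, explicit=None, env=None):
    """Decide whether to aggregate metrics across ranks (hypothetical helper)."""
    env = os.environ if env is None else env
    # Assumption: an explicit constructor argument wins over everything else.
    if explicit is not None:
        return bool(explicit)
    # Assumption: either env var set to "1" disables aggregation.
    for name in ("EZPZ_NO_DISTRIBUTED_HISTORY", "EZPZ_LOCAL_HISTORY"):
        if env.get(name) == "1":
            return False
    # Otherwise fall back to the world-size threshold.
    return world_size <= MAX_AUTO_WORLD_SIZE
```

With an empty environment, a 384-rank job would aggregate and a 385-rank job would not.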
Recording Metrics
summary = history.update(
{"loss": 0.42, "lr": 1e-3},
step=42, # forwarded to tracker backends
precision=6, # decimal places in summary string
)
Each call to update():
- Appends values to the internal history dict
- Computes min, max, mean, std across all ranks (when distributed)
- Dispatches metrics to all configured backends
- Writes a JSONL entry to disk
- Returns a compact summary string suitable for the console
Console summary format
update() returns a column-aligned key=value(±std) string designed to
collapse the noisy loss=… loss/mean=… loss/max=… loss/min=… loss/std=…
shape into a single scannable line:
iter=180 loss=0.078(± 0.026) accuracy=0.984(±9.9e-3) dtf=0.014(±7.4e-4) lr=0.000059 memory=0.01/0.01GiB (0%)
Format rules (all handled by ezpz.utils.format_compact_summary):
- For each base metric X with a sibling X/std (auto-computed when distributed history is on), the std is appended inline as X=value(±std). The /mean, /min, /max, /avg companions are dropped from the console; trackers still receive them.
- Std values are right-aligned in a 6-char column so a row with (±0.070) lines up under one with (±5.1e-4) or (± 0.12).
- Counter-like keys (iter, step, epoch, batch, idx) are bare (no (±std)) and left-edge padded so the next field aligns across rows: iter=8 loss=… lines up under iter=180 loss=….
- Replicated hyperparameters (lr, momentum, weight_decay, beta1, beta2, eps, clip_grad, warmup_steps, …) are bare: their per-step std is always 0 across ranks, so no parenthetical and no padding gap is emitted.
- Std formatting: 2 sig figs, fixed-point for magnitudes ≥ 0.01 (0.16, 0.020, 1.3), scientific for magnitudes < 0.01 with the leading exponent zero stripped (5.1e-4, not 5.1e-04). Bounded width keeps columns stable across runs.
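The std formatting rule can be made concrete with a small sketch. It is not the code behind ezpz.utils.format_compact_summary: the names format_std and format_field are hypothetical, and the three-decimal value format and the zero case are assumptions read off the example line above.

```python
import math


def format_std(std, width=6):
    """Format a std with 2 significant figures, right-aligned (hypothetical)."""
    mag = abs(std)
    if mag == 0:
        text = "0.0"  # assumption: the source does not specify the zero case
    elif mag >= 0.01:
        # Fixed-point: choose decimals so that two significant figures remain.
        decimals = max(0, 1 - math.floor(math.log10(mag)))
        text = f"{std:.{decimals}f}"
    else:
        # Scientific notation with the leading exponent zero stripped.
        mantissa, exponent = f"{std:.1e}".split("e")
        text = f"{mantissa}e{int(exponent)}"
    return text.rjust(width)


def format_field(name, value, std):
    """Render one name=value(±std) token; the .3f precision is an assumption."""
    return f"{name}={value:.3f}(±{format_std(std)})"
```

Under these assumptions, format_field("loss", 0.078, 0.026) reproduces the loss token of the example line, including the padding space after the ± sign.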
Device memory tracking
To include per-step GPU memory in the same line, merge in
ezpz.get_memory_metrics before calling
update():
import ezpz
metrics = {"loss": loss.item(), "dtf": t_forward}
metrics |= ezpz.get_memory_metrics(device, prefix="train/")
summary = history.update(metrics)
The helper returns four keys when supported (CUDA, XPU): mem_alloc,
mem_peak_alloc, mem_reserved, mem_peak_reserved (units: GiB).
It returns {} on CPU / MPS so the laptop smoke path stays clean. Opt
out on supported devices via EZPZ_TRACK_MEMORY=0. The four memory keys
are auto-collapsed in the console line into a single
memory=alloc/reserved (peak%) token; full per-rank aggregations
still flow to the trackers.
Distributed statistics
For each scalar metric "loss", distributed history creates:
| Key | Value |
|---|---|
| loss/mean | Mean across all ranks |
| loss/max | Maximum across all ranks |
| loss/min | Minimum across all ranks |
| loss/std | Standard deviation across ranks |
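To make the four derived keys concrete, the sketch below computes them from a plain list that stands in for the values held by each rank. In a real run the reduction happens via all-reduce across processes. The helper name rank_statistics is hypothetical, and using the population standard deviation (dividing by N) is an assumption.

```python
import math


def rank_statistics(name, per_rank_values):
    """Derive name/mean, name/max, name/min and name/std from per-rank values."""
    n = len(per_rank_values)
    mean = sum(per_rank_values) / n
    # Assumption: population variance (divide by N, not N - 1).
    variance = sum((v - mean) ** 2 for v in per_rank_values) / n
    return {
        f"{name}/mean": mean,
        f"{name}/max": max(per_rank_values),
        f"{name}/min": min(per_rank_values),
        f"{name}/std": math.sqrt(variance),
    }


# Four simulated ranks, each reporting its local loss for one step.
stats = rank_statistics("loss", [0.40, 0.42, 0.44, 0.46])
```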
Tracking throughput and MFU
To report TFLOPS and MFU (Model FLOPS Utilization) alongside
loss, use ezpz.flops:
from ezpz.flops import compute_mfu, try_estimate
model_flops = try_estimate(model, input_shape=(batch, seq))
# ... per step:
history.update({
    "loss": loss.item(),
    "tflops": model_flops / dt / 1e12,
    "mfu": compute_mfu(model_flops, dt),
})
See the MFU Tracking recipe for the full pattern.
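The arithmetic behind these two numbers is short enough to sketch by hand. The functions below are hypothetical stand-ins rather than the ezpz.flops API: how compute_mfu obtains the hardware peak is not shown here, so the sketch takes it as an explicit peak_tflops argument, and the 312 TFLOPS figure is only an illustrative value.

```python
def achieved_tflops(model_flops, dt):
    """FLOPs executed in one step divided by step time, in TFLOPS."""
    return model_flops / dt / 1e12


def mfu(model_flops, dt, peak_tflops):
    """Model FLOPS Utilization: achieved throughput over hardware peak."""
    return achieved_tflops(model_flops, dt) / peak_tflops


# Example: 6e12 FLOPs per step, 0.5 s per step, illustrative 312 TFLOPS peak.
tflops = achieved_tflops(6e12, 0.5)
ratio = mfu(6e12, 0.5, peak_tflops=312.0)
```

Here the step sustains 12 TFLOPS, which is a little under 4% of the assumed peak.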
Backends
Backends control where metrics are dispatched when update() is called.
Pass one or more backend names via backends= in the constructor.
Weights & Biases
history = History(
project_name="my-project",
backends="wandb",
config={"lr": 1e-4, "batch_size": 32},
)
The wandb backend:
- Rank 0 initializes a real run (online/offline per WANDB_MODE)
- Rank != 0 gets mode="disabled": no network calls, no duplicate runs
- Resolves project name from: argument > WB_PROJECT > WANDB_PROJECT > WB_PROJECT_NAME env vars > script-derived default
- Auto-logs system info (hostname, torch version, ezpz version)
- Logs metrics on each update(), uploads training history table on finalize()
- Logs matplotlib plots as image artifacts
Set WANDB_DISABLED=1 or backends="none" to disable.
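The project-name precedence listed above amounts to a first-match lookup. The sketch below is an illustration, not the backend's code: resolve_project_name is a hypothetical helper, and the script-derived default is passed in by the caller because its derivation is not described here.

```python
import os

# Env vars in the precedence order given in the list above.
PROJECT_ENV_VARS = ("WB_PROJECT", "WANDB_PROJECT", "WB_PROJECT_NAME")


def resolve_project_name(argument=None, env=None, script_default="my-script"):
    """Return the first non-empty candidate (hypothetical helper)."""
    env = os.environ if env is None else env
    if argument:
        return argument
    for name in PROJECT_ENV_VARS:
        value = env.get(name)
        if value:
            return value
    # Assumption: the caller supplies whatever default is derived from the script.
    return script_default
```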
MLflow
Built-in backend that logs to an MLflow Tracking server or local filesystem.
Standalone setup
For one-call MLflow initialization outside of History, use
ezpz.setup_mlflow(), which mirrors ezpz.setup_wandb().
The backend can also be enabled from the environment via EZPZ_TRACKER_BACKENDS:
# Enable MLflow tracking
EZPZ_TRACKER_BACKENDS=mlflow ezpz launch python3 -m ezpz.examples.vit
# Use alongside wandb
EZPZ_TRACKER_BACKENDS=wandb,mlflow ezpz launch python3 -m ezpz.examples.vit
Features:
- Automatic time-series: Step counter auto-increments so MLflow shows line charts (not bar charts) by default
- System metrics: CPU/GPU/memory usage logged automatically via mlflow.enable_system_metrics_logging() (requires psutil)
- Environment params: Hostname, device, world size, git branch, torch version, etc. logged under ezpz.* prefix
- User config: Logged under config.* prefix (nested dicts flattened with dot-separated keys)
- Metric grouping: When distributed stats are present (loss/mean, /min, /max, /std), the raw per-rank value is renamed to loss/local so MLflow groups them under a collapsible loss/ section
- Artifact uploads: On finalize(), JSONL logs, markdown reports, plots, and datasets are uploaded as run artifacts
- Rank-aware: Only rank 0 creates a run; all other ranks are silent no-ops
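The "nested dicts flattened with dot-separated keys" behaviour for user config can be illustrated with a short recursive function. This is a sketch of the general technique, not the backend's implementation. The name flatten_config is hypothetical, and non-dict values such as lists are assumed to be passed through unchanged.

```python
def flatten_config(config, prefix="config"):
    """Flatten nested dicts into dot-separated keys under a prefix."""
    flat = {}
    for key, value in config.items():
        name = f"{prefix}.{key}"
        if isinstance(value, dict):
            # Recurse so that {"a": {"b": 1}} becomes {"config.a.b": 1}.
            flat.update(flatten_config(value, prefix=name))
        else:
            flat[name] = value
    return flat


params = flatten_config(
    {"lr": 1e-4, "optimizer": {"name": "adamw", "betas": [0.9, 0.999]}}
)
```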
Experiment name resolution (first match wins):
- project_name argument
- MLFLOW_EXPERIMENT_NAME env var
- WB_PROJECT / WANDB_PROJECT / WB_PROJECT_NAME env vars
- Auto-derived from script: ezpz.{parent}.{stem} (e.g. ezpz.examples.vit)
Authentication:
| Env var | Auth method |
|---|---|
| MLFLOW_TRACKING_TOKEN | Bearer token (MLflow native) |
| AMSC_API_KEY | X-API-Key header (for AMSC servers) |
Credentials are loaded automatically from dotenv files:
- ~/.amsc.env: user-level credentials (loaded first)
- Project .env: project-level overrides (loaded second)
Example ~/.amsc.env:
AMSC_API_KEY=your-api-key-here
MLFLOW_TRACKING_URI=https://mlflow.american-science-cloud.org
MLFLOW_TRACKING_INSECURE_TLS=true
python-dotenv not installed?
Set the variables directly in your shell or job script. Dotenv loading is a convenience, not a requirement.
CSV
history = History(backends="csv", outdir="./logs")
history.update({"loss": 0.5, "lr": 1e-4})
history.update({"loss": 0.3, "lr": 1e-4, "grad_norm": 0.8}) # columns auto-extend
The CSV backend writes to the outdir:
| File | Content |
|---|---|
| metrics.csv | One row per update() call, columns auto-extend as new keys appear |
| config.json | Merged config from the config= constructor argument |
| training_history.csv | Written by log_table() on finalize() |
- Rank 0 only: non-rank-0 processes buffer rows in memory but skip all file I/O
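One way to get auto-extending columns is to keep the rows in memory and rewrite the file whenever a new key appears. The sketch below shows that approach with the standard csv module. It is an assumption about how such a writer could work, not the CSV backend's code: the class name AutoExtendCSV is hypothetical and the rank-0-only logic is left out.

```python
import csv
import tempfile
from pathlib import Path


class AutoExtendCSV:
    """Append rows to a CSV file, widening the header when new keys appear."""

    def __init__(self, path):
        self.path = Path(path)
        self.columns = []
        self.rows = []

    def log(self, metrics):
        self.rows.append(dict(metrics))
        new_keys = [k for k in metrics if k not in self.columns]
        if new_keys:
            # Header changed: rewrite the whole file with the wider header.
            self.columns.extend(new_keys)
            with self.path.open("w", newline="") as f:
                writer = csv.DictWriter(f, fieldnames=self.columns, restval="")
                writer.writeheader()
                writer.writerows(self.rows)
        else:
            # Header unchanged: a plain append is enough.
            with self.path.open("a", newline="") as f:
                csv.DictWriter(f, fieldnames=self.columns, restval="").writerow(metrics)


with tempfile.TemporaryDirectory() as tmp:
    sink = AutoExtendCSV(Path(tmp) / "metrics.csv")
    sink.log({"loss": 0.5, "lr": 1e-4})
    sink.log({"loss": 0.3, "lr": 1e-4, "grad_norm": 0.8})  # adds a column
    with sink.path.open(newline="") as f:
        rows = list(csv.DictReader(f))
```

Rows written before a column existed end up with an empty cell for it. Rewriting the file costs more as the run grows, but it keeps the file a valid CSV at every step.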
None
Pass backends="none" (or set EZPZ_TRACKER_BACKENDS=none) to disable
tracking entirely. The resulting tracker is a NullTracker whose methods are all no-ops.
Multiple Backends
Combine backends to dispatch everywhere at once:
history = History(
project_name="my-project",
backends="wandb,mlflow,csv",
outdir="./outputs",
config={"lr": 1e-4, "batch_size": 32},
)
Every update() call fans out to all backends.
Error Isolation
Backend errors are isolated: if one backend fails during log(),
log_config(), or finish(), the others still receive the call. A
warning is logged but your training run is never interrupted by a
tracking failure. This means a flaky network connection to your MLflow
server won't crash a multi-day training job.
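The isolation pattern described above is a per-backend try/except inside the fan-out loop. The sketch below illustrates the idea with stand-in classes. FanOutTracker, FlakyBackend, and ListBackend are hypothetical names, not the ezpz Tracker.

```python
import logging

logger = logging.getLogger("tracker_sketch")


class FanOutTracker:
    """Dispatch each call to every backend, isolating individual failures."""

    def __init__(self, backends):
        self.backends = list(backends)

    def log(self, metrics, step=None):
        for backend in self.backends:
            try:
                backend.log(metrics, step=step)
            except Exception as exc:
                # Warn and continue so the remaining backends still get the call.
                logger.warning("backend %s failed: %s", type(backend).__name__, exc)


class FlakyBackend:
    def log(self, metrics, step=None):
        raise ConnectionError("tracking server unreachable")


class ListBackend:
    def __init__(self):
        self.entries = []

    def log(self, metrics, step=None):
        self.entries.append((step, metrics))


good = ListBackend()
tracker = FanOutTracker([FlakyBackend(), good])
tracker.log({"loss": 0.42}, step=1)  # does not raise
```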
Custom Backends
Register your own backend by subclassing TrackerBackend:
from ezpz.tracker import TrackerBackend, register_backend
class MyBackend(TrackerBackend):
    name = "my_backend"
    def __init__(self, **kwargs):
        self.entries = []  # in-memory store of logged metrics
    def log(self, metrics, step=None, commit=True):
        self.entries.append(metrics)
    def log_config(self, config):
        pass  # nothing to record for run-level config
    def finish(self):
        pass  # no connection to close
register_backend("my_backend", MyBackend)
# Now usable in History
history = History(backends="my_backend,csv", outdir="./logs")
Override the optional methods (log_table, log_image, watch) for
richer functionality.
Accessing Backend-Specific Features
Use the tracker property to reach backend-specific APIs:
# Attach wandb gradient tracking
wb = history.tracker.get_backend("wandb")
if wb is not None:
    wb.watch(model, log="all")
# Access the underlying wandb.Run
if history.tracker.wandb_run is not None:
    history.tracker.wandb_run.summary["best_loss"] = 0.01
# Log an image (wandb only)
history.tracker.log_image("sample", "outputs/sample.png", caption="Epoch 10")
Finalization
result = history.finalize(
outdir="./outputs",
run_name="my-experiment",
warmup=0.05, # drop first 5% of samples
num_chains=128, # chains in ridge plots
)
finalize() produces:
| Output | Path |
|---|---|
| xarray Dataset(s) | {outdir}/train.h5, {outdir}/eval.h5 (one per group) |
| Matplotlib plots | {outdir}/plots/mplot/{group}/*.png |
| Terminal plots | {outdir}/plots/tplot/{group}/*.txt |
| Markdown report | {outdir}/report.md |
| Metrics JSONL | {outdir}/metrics.jsonl |
| Metrics CSV | {outdir}/metrics.csv (when csv backend is active) |
| JSON log symlink | {outdir}/{timestamp}-rank0.jsonl → logs/... |
All output is co-located in {outdir}:
- The CSV backend is automatically redirected to {outdir} so metrics.csv, config.json, and training_history.csv land alongside plots and reports.
- A symlink to the structured JSON log file is created in {outdir} so you don't have to hunt for it under logs/.
Grouped output
When metrics use prefixed keys (e.g. "train/loss", "eval/acc"),
finalize() produces separate datasets, plots, and output
directories per group. Each group has its own draw dimension, so there is no
NaN padding between groups with different numbers of steps.
outputs/my-experiment/
├── train.h5           # train metrics only (101 steps)
├── eval.h5            # eval metrics only (1250 steps)
├── plots/
│   ├── mplot/
│   │   ├── train/     # train/ matplotlib plots
│   │   └── eval/      # eval/ matplotlib plots
│   └── tplot/
│       ├── train/     # train/ terminal plots
│       └── eval/      # eval/ terminal plots
├── report.md
└── metrics.jsonl
Return value: When multiple groups exist, finalize() returns
dict[str, xr.Dataset] mapping group prefix to its dataset. With a
single group (or unprefixed metrics), it returns a single xr.Dataset
for backward compatibility.
result = history.finalize(outdir="./outputs")
# Grouped (train/ + eval/ prefixes):
result["train"] # xr.Dataset with train metrics
result["eval"] # xr.Dataset with eval metrics
# Unprefixed (single group):
result.data_vars # xr.Dataset directly
It also uploads the training history table to any active backends and
calls tracker.finish() to flush and close all backend connections.
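The per-group split can be pictured as bucketing metric keys by their prefix, with each bucket keeping its own length. The sketch below is an illustration only. The helper names are hypothetical, and treating a trailing /mean, /min, /max, or /std as a statistic suffix rather than a group separator is an assumption about how unprefixed keys such as loss/mean stay in a single group.

```python
from collections import defaultdict

# Assumption: these suffixes mark distributed statistics, not group names.
STAT_SUFFIXES = {"mean", "min", "max", "std"}


def split_group(key):
    """Split "train/loss" into ("train", "loss"); unprefixed keys get group ""."""
    head, sep, rest = key.partition("/")
    if not sep or rest in STAT_SUFFIXES:
        return "", key
    return head, rest


def group_history(history):
    """Bucket metric series by prefix so each group keeps its own length."""
    groups = defaultdict(dict)
    for key, values in history.items():
        group, metric = split_group(key)
        groups[group][metric] = values
    return dict(groups)


grouped = group_history({
    "train/loss": [0.9, 0.5, 0.3],
    "train/loss/std": [0.1, 0.05, 0.02],
    "eval/acc": [0.7],
})
```

Because each group holds its own series, the three train steps and the single eval step never need to be padded to a common length.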
Terminal plot output
finalize() generates text-based plots directly in the terminal.
Each metric gets individual plots, and distributed runs include
min/max/mean/std variants:
[Terminal plots: five panels (accuracy, accuracy/min, accuracy/std, accuracy/mean, accuracy/max), each plotted against iter from 1.0 to 194.0]
Combined summary with all statistics overlaid
(· mean, - min, + max, plus the raw series):
[Terminal plot: accuracy/max, accuracy/min, accuracy/mean, and accuracy overlaid on one axis, iter from 1.0 to 194.0, values from 0.570 to 0.984]
Histograms for each statistic:
[Terminal histograms: four panels (accuracy/mean hist, accuracy/max hist, accuracy/min hist, accuracy/std hist)]
StopWatch: Timing Context Manager
from ezpz.history import StopWatch
with StopWatch("forward pass", wbtag="timing/forward"):
    output = model(batch)
Logs elapsed time to the logger and optionally to W&B. Useful for profiling individual phases within your training loop.
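For readers unfamiliar with the pattern, a timing context manager can be sketched with the standard library alone. This is not the StopWatch implementation: the function name stopwatch and the sink dictionary (standing in for an external tracker) are hypothetical.

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("timing_sketch")


@contextmanager
def stopwatch(label, sink=None):
    """Time the enclosed block, log the result, and optionally record it."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        logger.info("%s took %.4fs", label, elapsed)
        if sink is not None:
            sink[label] = elapsed  # stand-in for sending the value to a tracker


timings = {}
with stopwatch("forward pass", sink=timings):
    sum(range(1000))  # placeholder for the work being timed
```

Using try/finally means the elapsed time is still recorded if the timed block raises.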
Environment Variables
| Variable | Effect |
|---|---|
| EZPZ_NO_DISTRIBUTED_HISTORY | Disable distributed aggregation |
| EZPZ_LOCAL_HISTORY | Alias for above |
| EZPZ_TRACKER_BACKENDS | Fallback backend list when backends arg is None |
| EZPZ_TRACK_MEMORY | Set to 0 to suppress per-step device memory keys emitted by ezpz.get_memory_metrics. Default: 1 (enabled on CUDA/XPU, no-op on CPU/MPS) |
| WANDB_MODE | Controls wandb mode (disabled, offline, online) |
| WANDB_DISABLED | Set to 1 to disable wandb entirely |
| WB_PROJECT / WANDB_PROJECT | Default wandb project name |
| EZPZ_TPLOT_MARKER | Marker style for terminal plots (braille, fhd, hd) |
| EZPZ_TPLOT_TYPE | Default plot type (line, hist) |
See Also
- Quick Start for minimal setup
- Python API: ezpz.history for the full History API reference