Train ViT with FSDP on MNISTβοΈ
Key API Functions
setup_torch()β Initialize distributed trainingwrap_model()β Wrap model for DDP / FSDPViTConfigβ Vision Transformer configurationHistoryβ Track training metrics
See:
- π examples/ViT
- π src/ezpz/examples/vit.py
SourceβοΈ
src/ezpz/examples/vit.py
"""Train a lightweight Vision Transformer on fake or MNIST data.
Launch with:
ezpz launch -m ezpz.examples.vit --dataset mnist --batch_size 256
Quick smoke test on a laptop:
python -m ezpz.examples.vit --dataset fake --max_iters 1 \
--batch_size 4 --img_size 64 --patch_size 8 \
--num_heads 2 --head_dim 16 --depth 2 --num_classes 10
Model presets:
--model debug|small|medium|med|large
Help output (``python3 -m ezpz.examples.vit --help``):
usage: ezpz.examples.vit [-h] [--img_size IMG_SIZE] [--batch_size BATCH_SIZE]
[--num_heads NUM_HEADS] [--head_dim HEAD_DIM]
[--hidden-dim HIDDEN_DIM] [--mlp-dim MLP_DIM]
[--dropout DROPOUT]
[--attention-dropout ATTENTION_DROPOUT]
[--num_classes NUM_CLASSES] [--dataset {fake,mnist}]
[--depth DEPTH] [--patch_size PATCH_SIZE]
[--dtype DTYPE] [--compile]
[--num_workers NUM_WORKERS] [--max_iters MAX_ITERS]
[--warmup WARMUP] [--attn_type {native,sdpa}]
[--cuda_sdpa_backend {flash_sdp,mem_efficient_sdp,math_sdp,cudnn_sdp,all}]
[--fsdp]
Train a simple ViT
options:
-h, --help show this help message and exit
--img_size IMG_SIZE, --img-size IMG_SIZE
Image size
--batch_size BATCH_SIZE, --batch-size BATCH_SIZE
Batch size
--num_heads NUM_HEADS, --num-heads NUM_HEADS
Number of heads
--head_dim HEAD_DIM, --head-dim HEAD_DIM
Hidden Dimension
--hidden-dim HIDDEN_DIM, --hidden_dim HIDDEN_DIM
Hidden Dimension
--mlp-dim MLP_DIM, --mlp_dim MLP_DIM
MLP Dimension
--dropout DROPOUT Dropout rate
--attention-dropout ATTENTION_DROPOUT, --attention_dropout ATTENTION_DROPOUT
Attention Dropout rate
--num_classes NUM_CLASSES, --num-classes NUM_CLASSES
Number of classes
--dataset {fake,mnist}
Dataset to use
--depth DEPTH Depth
--patch_size PATCH_SIZE, --patch-size PATCH_SIZE
Patch size
--dtype DTYPE Data type
--compile Compile model
--num_workers NUM_WORKERS, --num-workers NUM_WORKERS
Number of workers
--max_iters MAX_ITERS, --max-iters MAX_ITERS
Maximum iterations
--warmup WARMUP Warmup iterations (or fraction) before starting to collect metrics.
--attn_type {native,sdpa}, --attn-type {native,sdpa}
Attention function to use.
--cuda_sdpa_backend {flash_sdp,mem_efficient_sdp,math_sdp,cudnn_sdp,all}, --cuda-sdpa-backend {flash_sdp,mem_efficient_sdp,math_sdp,cudnn_sdp,all}
CUDA SDPA backend to use.
--fsdp Use FSDP
"""
import argparse
import functools
import math
from pathlib import Path
import sys
import time
from typing import Any, Optional
import torch
import ezpz
import ezpz.distributed
# from TORCH_DTYPES_MAP
from ezpz.data.vision import get_fake_data, get_mnist
from ezpz.examples import get_example_outdir
from ezpz.models import summarize_model
from ezpz.models.vit.attention import AttentionBlock
logger = ezpz.get_logger(__name__)
fp = Path(__file__)
WBPROJ_NAME = f"ezpz.{fp.parent.stem}.{fp.stem}"
MODEL_PRESETS = {
"debug": {
"batch_size": 4,
"num_heads": 2,
"head_dim": 16,
"depth": 2,
},
"small": {
"batch_size": 128,
"num_heads": 16,
"head_dim": 64,
"depth": 24,
},
"medium": {
"batch_size": 64,
"num_heads": 12,
"head_dim": 64,
"depth": 16,
},
"large": {
"batch_size": 32,
"num_heads": 16,
"head_dim": 64,
"depth": 32,
},
}
MODEL_ALIASES = {"med": "medium"}
MODEL_PRESET_FLAGS = {
"batch_size": ["--batch_size", "--batch-size"],
"num_heads": ["--num_heads", "--num-heads"],
"head_dim": ["--head_dim", "--head-dim"],
"depth": ["--depth"],
}
MNIST_DEFAULTS = {
"img_size": 28,
"num_classes": 10,
"patch_size": 4,
}
MNIST_DEFAULT_FLAGS = {
"img_size": ["--img_size", "--img-size"],
"num_classes": ["--num_classes", "--num-classes"],
"patch_size": ["--patch_size", "--patch-size"],
}
try:
import wandb
except Exception:
wandb = None # type:ignore
logger.warning("Failed to import wandb")
class PatchEmbed(torch.nn.Module):
"""Convert images into patch embeddings."""
def __init__(
self,
img_size: int,
patch_size: int,
in_chans: int,
embed_dim: int,
) -> None:
super().__init__()
if img_size % patch_size != 0:
raise ValueError("img_size must be divisible by patch_size")
self.img_size = img_size
self.patch_size = patch_size
self.num_patches = (img_size // patch_size) ** 2
self.proj = torch.nn.Conv2d(
in_chans,
embed_dim,
kernel_size=patch_size,
stride=patch_size,
)
def forward(self, x: torch.Tensor) -> torch.Tensor:
x = self.proj(x)
x = x.flatten(2).transpose(1, 2)
return x
class SimpleVisionTransformer(torch.nn.Module):
"""Minimal Vision Transformer implementation without timm."""
def __init__(
self,
img_size: int,
patch_size: int,
in_chans: int,
embed_dim: int,
depth: int,
num_heads: int,
num_classes: int,
block_fn: Any,
class_token: bool = False,
global_pool: str = "avg",
dropout: float = 0.0,
) -> None:
super().__init__()
self.patch_embed = PatchEmbed(
img_size=img_size,
patch_size=patch_size,
in_chans=in_chans,
embed_dim=embed_dim,
)
num_patches = self.patch_embed.num_patches
self.class_token = class_token
self.global_pool = global_pool
if class_token:
self.cls_token = torch.nn.Parameter(torch.zeros(1, 1, embed_dim))
num_patches += 1
else:
self.cls_token = None
self.pos_embed = torch.nn.Parameter(
torch.zeros(1, num_patches, embed_dim)
)
self.pos_drop = torch.nn.Dropout(p=dropout)
self.blocks = torch.nn.ModuleList(
[
block_fn(dim=embed_dim, num_heads=num_heads)
for _ in range(depth)
]
)
self.norm = torch.nn.LayerNorm(embed_dim)
self.head = (
torch.nn.Linear(embed_dim, num_classes)
if num_classes > 0
else torch.nn.Identity()
)
self._init_weights()
def _init_weights(self) -> None:
torch.nn.init.trunc_normal_(self.pos_embed, std=0.02)
if self.cls_token is not None:
torch.nn.init.trunc_normal_(self.cls_token, std=0.02)
def forward(self, x: torch.Tensor) -> torch.Tensor:
x = self.patch_embed(x)
if self.cls_token is not None:
cls_tokens = self.cls_token.expand(x.size(0), -1, -1)
x = torch.cat((cls_tokens, x), dim=1)
x = self.pos_drop(x + self.pos_embed)
for block in self.blocks:
x = block(x)
x = self.norm(x)
if self.global_pool == "avg":
if self.cls_token is not None:
x = x[:, 1:]
x = x.mean(dim=1)
elif self.cls_token is not None:
x = x[:, 0]
else:
x = x.mean(dim=1)
return self.head(x)
def _arg_provided(argv: list[str], flags: list[str]) -> bool:
return any(flag in argv for flag in flags)
def apply_model_preset(args: argparse.Namespace, argv: list[str]) -> None:
if args.model is None:
return
model_name = args.model
model_key = MODEL_ALIASES.get(model_name)
if model_key is None:
model_key = model_name
preset = MODEL_PRESETS[model_key]
for field_name, value in preset.items():
flags = MODEL_PRESET_FLAGS.get(field_name, [])
if not _arg_provided(argv, flags):
setattr(args, field_name, value)
def apply_dataset_overrides(args: argparse.Namespace, argv: list[str]) -> None:
if args.dataset != "mnist":
return
for field_name, value in MNIST_DEFAULTS.items():
flags = MNIST_DEFAULT_FLAGS.get(field_name, [])
if not _arg_provided(argv, flags):
setattr(args, field_name, value)
def validate_dataset_args(args: argparse.Namespace) -> None:
if args.dataset != "mnist":
return
if args.img_size % args.patch_size != 0:
raise ValueError(
"MNIST img_size must be divisible by patch_size; "
f"got img_size={args.img_size} and patch_size={args.patch_size}."
)
def parse_args(argv: Optional[list[str]] = None) -> argparse.Namespace:
"""Parse CLI arguments for ViT training."""
if argv is None:
argv = sys.argv[1:]
parser = argparse.ArgumentParser(
prog="ezpz.examples.vit",
description="Train a simple ViT",
)
parser.add_argument(
"--img_size",
"--img-size",
type=int,
default=224,
help="Image size",
)
parser.add_argument(
"--batch_size",
"--batch-size",
type=int,
default=128,
help="Batch size",
)
parser.add_argument(
"--num_heads",
"--num-heads",
type=int,
default=16,
help="Number of heads",
)
parser.add_argument(
"--head_dim",
"--head-dim",
type=int,
default=64,
help="Hidden Dimension",
)
parser.add_argument(
"--hidden-dim",
"--hidden_dim",
type=int,
default=1024,
help="Hidden Dimension",
)
parser.add_argument(
"--mlp-dim", "--mlp_dim", type=int, default=2048, help="MLP Dimension"
)
parser.add_argument(
"--dropout", type=float, default=0.1, help="Dropout rate"
)
parser.add_argument(
"--attention-dropout",
"--attention_dropout",
type=float,
default=0.0,
help="Attention Dropout rate",
)
parser.add_argument(
"--num_classes",
"--num-classes",
type=int,
default=1000,
help="Number of classes",
)
parser.add_argument(
"--dataset",
default="mnist",
choices=["fake", "mnist"],
help="Dataset to use",
)
parser.add_argument(
"--model",
default=None,
choices=sorted([*MODEL_PRESETS.keys(), *MODEL_ALIASES.keys()]),
help="Model size preset (overrides defaults)",
)
parser.add_argument("--depth", type=int, default=24, help="Depth")
parser.add_argument(
"--patch_size",
"--patch-size",
type=int,
default=16,
help="Patch size",
)
parser.add_argument("--dtype", type=str, default="bf16", help="Data type")
parser.add_argument("--compile", action="store_true", help="Compile model")
parser.add_argument(
"--num_workers",
"--num-workers",
type=int,
default=0,
help="Number of workers",
)
parser.add_argument(
"--max_iters",
"--max-iters",
type=int,
default=100,
help="Maximum iterations",
)
parser.add_argument(
"--warmup",
type=float,
default=0.1,
help="Warmup iterations (or fraction) before starting to collect metrics.",
)
parser.add_argument(
"--attn_type",
"--attn-type",
default="native",
choices=["native", "sdpa"],
help="Attention function to use.",
)
parser.add_argument(
"--cuda_sdpa_backend",
"--cuda-sdpa-backend",
default="all",
choices=[
"flash_sdp",
"mem_efficient_sdp",
"math_sdp",
"cudnn_sdp",
"all",
],
help="CUDA SDPA backend to use.",
)
parser.add_argument("--fsdp", action="store_true", help="Use FSDP")
args = parser.parse_args(argv)
apply_model_preset(args, argv)
apply_dataset_overrides(args, argv)
validate_dataset_args(args)
return args
# def get_device_type():
# """Resolve the torch device type, falling back if MPS lacks collectives."""
# import os
#
# device_override = os.environ.get("TORCH_DEVICE")
# device_type = device_override or ezpz.get_torch_device()
# if isinstance(device_type, str) and device_type.startswith("mps"):
# logger.warning(
# "MPS does not support torch.distributed collectives; falling back to CPU"
# )
# return "cpu"
# return ezpz.get_torch_device_type()
@ezpz.timeitlogit(rank=ezpz.get_rank())
def train_fn(
block_fn: Any,
args: argparse.Namespace,
dataset: Optional[str] = "fake",
) -> ezpz.History:
"""Train the Vision Transformer on fake or MNIST data.
Args:
block_fn: Attention block constructor with attn_fn injected.
args: Training hyperparameters.
dataset: Dataset choice, either ``fake`` or ``mnist``.
Returns:
History of training metrics.
"""
# seed = int(os.environ.get('SEED', '0'))
# rank = ezpz.setup(backend='DDP', seed=seed)
world_size = ezpz.distributed.get_world_size()
local_rank = ezpz.distributed.get_local_rank()
# device_type = str(ezpz.get_torch_device(as_torch_device=False))
device_type = ezpz.distributed.get_torch_device_type()
device = torch.device(f"{device_type}:{local_rank}")
# torch.set_default_device(device)
logger.info("train_args=%s", vars(args))
if dataset == "fake":
dataset_dict = get_fake_data(
img_size=args.img_size,
batch_size=args.batch_size,
num_workers=args.num_workers,
)
elif dataset == "mnist":
dataset_dict = get_mnist(
train_batch_size=args.batch_size,
test_batch_size=args.batch_size,
download=(ezpz.distributed.get_rank() == 0),
)
else:
raise ValueError(
f"Unknown dataset: {dataset}. Expected 'fake' or 'mnist'."
)
# data = get
# train_set = FakeImageDataset(config.img_size)
# logger.info(f'{len(train_set)=}')
# train_loader = DataLoader(
# train_set,
# batch_size=config.batch_size,
# num_workers=args.num_workers,
# pin_memory=True,
# drop_last=True,
# )
in_chans = 1 if dataset == "mnist" else 3
model = SimpleVisionTransformer(
img_size=args.img_size,
patch_size=args.patch_size,
in_chans=in_chans,
embed_dim=(args.num_heads * args.head_dim),
depth=args.depth,
num_heads=args.num_heads,
num_classes=args.num_classes,
class_token=False,
global_pool="avg",
block_fn=block_fn,
dropout=args.dropout,
)
mstr = summarize_model(
model,
verbose=False,
depth=1,
input_size=(
args.batch_size,
in_chans,
args.img_size,
args.img_size,
),
)
model.to(device)
num_params = sum(
[
sum(
[
getattr(p, "ds_numel", 0)
if hasattr(p, "ds_id")
else p.nelement()
for p in model_module.parameters()
]
)
for model_module in model.modules()
]
)
model_size_in_billions = num_params / 1e9
logger.info(f"\n{mstr}")
logger.info(f"Model size: nparams={model_size_in_billions:.2f} B")
if wandb is not None and ezpz.verify_wandb():
if (run := getattr(wandb, "run")) is not None and run is wandb.run:
try:
if args.compile:
logger.info("Skipping wandb watch while compiling")
else:
wandb.run.watch(model, log="all") # type:ignore
except Exception as e:
logger.exception(e)
logger.warning(
"Failed to watch model with wandb; continuing..."
)
# model = ezpz.distributed.wrap_model(
# model=model,
# use_fsdp=args.fsdp,
# dtype=args.dtype,
# # device_id=int(ezpz.get_local_rank())
# )
if world_size > 1:
model = ezpz.distributed.wrap_model(
model=model,
use_fsdp=args.fsdp,
dtype=args.dtype,
device_id=int(ezpz.get_local_rank()),
)
# if args.fsdp:
# logger.info("Using FSDP for distributed training")
# if args.dtype in {"fp16", "bf16", "fp32"}:
# try:
# model = FSDP(
# model,
# mixed_precision=MixedPrecision(
# param_dtype=TORCH_DTYPES_MAP[args.dtype],
# reduce_dtype=torch.float32,
# cast_forward_inputs=True,
# ),
# )
# except Exception as exc:
# logger.warning(f"Encountered exception: {exc}")
# logger.warning(
# "Unable to wrap model with FSDP. Falling back to DDP..."
# )
# model = ezpz.distributed.wrap_model(model=model, f)
# else:
# try:
# model = FSDP(model)
# except Exception:
# model = ezpz.distributed.wrap_model(args=args, model=model)
# else:
# logger.info("Using DDP for distributed training")
# model = ezpz.distributed.prepare_model_for_ddp(model)
if args.compile:
logger.info("Compiling model")
model = torch.compile(model)
torch_dtype = ezpz.distributed.TORCH_DTYPES_MAP[args.dtype]
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters()) # type:ignore
model.train() # type:ignore
outdir = get_example_outdir(WBPROJ_NAME)
logger.info("Outputs will be saved to %s", outdir)
metrics_path = outdir.joinpath("metrics.jsonl")
eval_metrics_path = outdir.joinpath("metrics-eval.jsonl")
history = ezpz.history.History(
report_dir=outdir,
report_enabled=True,
jsonl_path=metrics_path,
jsonl_overwrite=True,
distributed_history=(1 < world_size <= 384),
)
eval_history = ezpz.history.History(
report_dir=outdir,
report_enabled=True,
jsonl_path=eval_metrics_path,
jsonl_overwrite=True,
distributed_history=(1 < world_size <= 384),
)
logger.info(
f"Training with {world_size} x {device_type} (s), using {torch_dtype=}"
)
warmup_iters = (
int(args.warmup)
if args.warmup >= 1.0
else int(
args.warmup
* (
args.max_iters
if args.max_iters is not None
else len(dataset_dict["train"]["loader"])
)
)
)
# data["train"].to(ezpz.distributed.get_torch_device_type())
last_step = -1
for step, batch in enumerate(dataset_dict["train"]["loader"]):
last_step = step
if args.max_iters is not None and step > int(args.max_iters):
break
if step < warmup_iters:
logger.info("warmup step %d / %d", step, warmup_iters)
t0 = time.perf_counter()
inputs = batch[0].to(device=device, non_blocking=True)
label = batch[1].to(device=device, non_blocking=True)
ezpz.distributed.synchronize()
with torch.autocast(device_type=device_type, dtype=torch_dtype):
t1 = time.perf_counter()
outputs = model(inputs)
loss = criterion(outputs, label)
acc = (outputs.argmax(dim=-1) == label).float().mean()
t2 = time.perf_counter()
ezpz.distributed.synchronize()
optimizer.zero_grad(set_to_none=True)
loss.backward()
ezpz.distributed.synchronize()
t3 = time.perf_counter()
optimizer.step()
ezpz.distributed.synchronize()
t4 = time.perf_counter()
if step >= warmup_iters:
loss_value = float(loss.detach().item())
acc_value = float(acc.detach().item())
if not math.isfinite(loss_value) or not math.isfinite(acc_value):
logger.warning(
"Skipping non-finite train metrics at step=%s", step
)
continue
train_msg = history.update(
{
"train/iter": step,
"train/loss": loss_value,
"train/acc": acc_value,
"train/dt": t4 - t0,
"train/dtd": t1 - t0,
"train/dtf": t2 - t1,
"train/dto": t3 - t2,
"train/dtb": t4 - t3,
}
).replace("train/", "")
logger.info("[train] %s", train_msg)
if "test" in dataset_dict:
model.eval() # type:ignore
eval_loss = 0.0
eval_acc = 0.0
eval_count = 0
eval_step = 0
with torch.no_grad():
for batch in dataset_dict["test"]["loader"]:
inputs = batch[0].to(device=device, non_blocking=True)
labels = batch[1].to(device=device, non_blocking=True)
with torch.autocast(device_type=device_type, dtype=torch_dtype):
outputs = model(inputs)
loss = criterion(outputs, labels)
correct = (outputs.argmax(dim=-1) == labels).sum()
batch_size = labels.numel()
eval_loss += loss.item() * batch_size
eval_acc += correct.item()
eval_count += batch_size
batch_loss = float(loss.detach().item())
batch_acc = float(correct.item() / batch_size)
if math.isfinite(batch_loss) and math.isfinite(batch_acc):
eval_msg = eval_history.update(
{
"eval/iter": eval_step,
"eval/loss": batch_loss,
"eval/acc": batch_acc,
}
).replace("eval/", "")
logger.info("[eval] %s", eval_msg)
eval_step += 1
if eval_count:
total_loss = torch.tensor(eval_loss, device=device)
total_correct = torch.tensor(eval_acc, device=device)
total_count = torch.tensor(eval_count, device=device)
if torch.distributed.is_available() and torch.distributed.is_initialized():
torch.distributed.all_reduce(total_loss)
torch.distributed.all_reduce(total_correct)
torch.distributed.all_reduce(total_count)
eval_loss_value = float(total_loss.item() / total_count.item())
eval_acc_value = float(total_correct.item() / total_count.item())
if not math.isfinite(eval_loss_value) or not math.isfinite(
eval_acc_value
):
logger.warning("Skipping non-finite eval metrics")
model.train() # type:ignore
return history
summary_rows = [
("loss", f"{eval_loss_value:.6f}"),
("acc", f"{eval_acc_value:.6f}"),
("samples", f"{int(total_count.item())}"),
]
header = ("eval metric", "value")
col1 = max(len(header[0]), *(len(row[0]) for row in summary_rows))
col2 = max(len(header[1]), *(len(row[1]) for row in summary_rows))
summary_table = [
"Eval summary:",
f"| {header[0]:<{col1}} | {header[1]:>{col2}} |",
f"|:{'-' * (col1 - 1)} | {'-' * (col2 - 1)}:|",
]
summary_table.extend(
f"| {name:<{col1}} | {value:>{col2}} |"
for name, value in summary_rows
)
logger.info("\n".join(f"[eval] {line}" for line in summary_table))
model.train() # type:ignore
if ezpz.distributed.get_rank() == 0:
if history.history and any(len(v) for v in history.history.values()):
dataset = history.finalize(
outdir=outdir,
run_name=WBPROJ_NAME,
dataset_fname="train",
verbose=False,
)
logger.info(f"{dataset=}")
else:
logger.warning("No train metrics recorded; skipping dataset save")
if "test" in dataset_dict:
if eval_history.history and any(
len(v) for v in eval_history.history.values()
):
eval_dataset = eval_history.finalize(
outdir=outdir,
run_name=WBPROJ_NAME,
dataset_fname="eval",
verbose=False,
)
logger.info(f"{eval_dataset=}")
else:
logger.warning("No eval metrics recorded; skipping dataset save")
return history
@ezpz.timeitlogit(rank=ezpz.get_rank())
def main(args: argparse.Namespace):
"""CLI entrypoint to configure logging and launch ViT training."""
t0 = time.perf_counter()
rank = ezpz.distributed.setup_torch()
t_setup = time.perf_counter()
if rank == 0 and ezpz.verify_wandb():
try:
fp = Path(__file__).resolve()
run = ezpz.setup_wandb(
project_name=f"ezpz.{fp.parent.name}.{fp.stem}"
)
if wandb is not None and run is not None and run is wandb.run:
# assert run is not None and run is wandb.run
wandb.config.update(ezpz.get_dist_info())
wandb.config.update({"args": {**vars(args)}})
except Exception:
logger.warning("Failed to setup wandb, continuing without!")
def attn_fn(
q: torch.Tensor, k: torch.Tensor, v: torch.Tensor
) -> torch.Tensor:
"""Scaled dot-product attention with configurable backend."""
scale = args.head_dim ** (-0.5)
q = q * scale
attn = q @ k.transpose(-2, -1)
attn = attn.softmax(dim=-1)
if args.attention_dropout > 0.0:
attn = torch.nn.functional.dropout(
attn,
p=args.attention_dropout,
training=torch.is_grad_enabled(),
)
x = attn @ v
return x
logger.info(f"Using {args.attn_type} for SDPA backend")
if args.attn_type == "native":
block_fn = functools.partial(AttentionBlock, attn_fn=attn_fn)
# if args.sdpa_backend == 'by_hand':
elif args.attn_type == "sdpa":
if torch.cuda.is_available():
torch.backends.cuda.enable_flash_sdp(False)
torch.backends.cuda.enable_mem_efficient_sdp(False)
torch.backends.cuda.enable_math_sdp(False)
torch.backends.cuda.enable_cudnn_sdp(False)
if args.cuda_sdpa_backend in ["flash_sdp", "all"]:
torch.backends.cuda.enable_flash_sdp(True)
if args.cuda_sdpa_backend in ["mem_efficient_sdp", "all"]:
torch.backends.cuda.enable_mem_efficient_sdp(True)
if args.cuda_sdpa_backend in ["math_sdp", "all"]:
torch.backends.cuda.enable_math_sdp(True)
if args.cuda_sdpa_backend in ["cudnn_sdp", "all"]:
torch.backends.cuda.enable_cudnn_sdp(True)
block_fn = functools.partial(
AttentionBlock,
attn_fn=lambda q, k, v: torch.nn.functional.scaled_dot_product_attention(
q,
k,
v,
dropout_p=(
args.attention_dropout if torch.is_grad_enabled() else 0.0
),
),
)
else:
raise ValueError(f"Unknown attention type: {args.attn_type}")
logger.info(f"Using AttentionBlock Attention with {args.compile=}")
train_start = time.perf_counter()
train_fn(block_fn, args=args, dataset=args.dataset)
train_end = time.perf_counter()
t1 = time.perf_counter()
timings = {
"main/setup_torch": t_setup - t0,
"main/train": train_end - train_start,
"main/total": t1 - t0,
}
logger.info("Timings: %s", timings)
if wandb is not None and getattr(wandb, "run", None) is not None:
try:
wandb.log(
{
(f"timings/{k}" if not k.startswith("timings/") else k): v
for k, v in timings.items()
}
)
# wandb.log(
# {
# (f"timings/{k}" if not k.startswith("timings/") else k): v
# for k, v in timings.items()
# }
# )
except Exception:
logger.warning("Failed to log timings to wandb")
if __name__ == "__main__":
args = parse_args()
main(args)
ezpz.distributed.cleanup()
Code WalkthroughβοΈ
Imports and Setup
AttentionBlock is the reusable multi-head self-attention layer shared
across ezpz's model zoo β importing it here lets the ViT assemble
transformer blocks without duplicating attention code. The get_fake_data
/ get_mnist helpers abstract dataset loading so the same training loop
works with randomly generated tensors (for quick smoke tests) or real MNIST
data. summarize_model prints a parameter-count breakdown useful for
verifying FSDP sharding.
import argparse
import functools
import math
from pathlib import Path
import sys
import time
from typing import Any, Optional
import torch
import ezpz
import ezpz.distributed
# from TORCH_DTYPES_MAP
from ezpz.data.vision import get_fake_data, get_mnist
from ezpz.examples import get_example_outdir
from ezpz.models import summarize_model
from ezpz.models.vit.attention import AttentionBlock
logger = ezpz.get_logger(__name__)
fp = Path(__file__)
WBPROJ_NAME = f"ezpz.{fp.parent.stem}.{fp.stem}"
Model Presets
Four named presets (debug, small, medium, large) bundle
batch_size, num_heads, head_dim, and depth together. The med
alias maps to medium. MNIST-specific defaults override img_size,
num_classes, and patch_size when --dataset mnist is used.
MODEL_PRESETS = {
"debug": {
"batch_size": 4,
"num_heads": 2,
"head_dim": 16,
"depth": 2,
},
"small": {
"batch_size": 128,
"num_heads": 16,
"head_dim": 64,
"depth": 24,
},
"medium": {
"batch_size": 64,
"num_heads": 12,
"head_dim": 64,
"depth": 16,
},
"large": {
"batch_size": 32,
"num_heads": 16,
"head_dim": 64,
"depth": 32,
},
}
MODEL_ALIASES = {"med": "medium"}
MODEL_PRESET_FLAGS = {
"batch_size": ["--batch_size", "--batch-size"],
"num_heads": ["--num_heads", "--num-heads"],
"head_dim": ["--head_dim", "--head-dim"],
"depth": ["--depth"],
}
MNIST_DEFAULTS = {
"img_size": 28,
"num_classes": 10,
"patch_size": 4,
}
MNIST_DEFAULT_FLAGS = {
"img_size": ["--img_size", "--img-size"],
"num_classes": ["--num_classes", "--num-classes"],
"patch_size": ["--patch_size", "--patch-size"],
}
PatchEmbed
Converts an image into a sequence of patch embeddings using a strided
Conv2d. The kernel size and stride both equal patch_size.
class PatchEmbed(torch.nn.Module):
"""Convert images into patch embeddings."""
def __init__(
self,
img_size: int,
patch_size: int,
in_chans: int,
embed_dim: int,
) -> None:
super().__init__()
if img_size % patch_size != 0:
raise ValueError("img_size must be divisible by patch_size")
self.img_size = img_size
self.patch_size = patch_size
self.num_patches = (img_size // patch_size) ** 2
self.proj = torch.nn.Conv2d(
in_chans,
embed_dim,
kernel_size=patch_size,
stride=patch_size,
)
def forward(self, x: torch.Tensor) -> torch.Tensor:
x = self.proj(x)
x = x.flatten(2).transpose(1, 2)
return x
SimpleVisionTransformer
Standard ViT: patch embedding, optional class token, learnable positional
embeddings, a stack of AttentionBlock layers, layer norm, and a linear
classification head.
class SimpleVisionTransformer(torch.nn.Module):
"""Minimal Vision Transformer implementation without timm."""
def __init__(
self,
img_size: int,
patch_size: int,
in_chans: int,
embed_dim: int,
depth: int,
num_heads: int,
num_classes: int,
block_fn: Any,
class_token: bool = False,
global_pool: str = "avg",
dropout: float = 0.0,
) -> None:
super().__init__()
self.patch_embed = PatchEmbed(
img_size=img_size,
patch_size=patch_size,
in_chans=in_chans,
embed_dim=embed_dim,
)
num_patches = self.patch_embed.num_patches
self.class_token = class_token
self.global_pool = global_pool
if class_token:
self.cls_token = torch.nn.Parameter(torch.zeros(1, 1, embed_dim))
num_patches += 1
else:
self.cls_token = None
self.pos_embed = torch.nn.Parameter(
torch.zeros(1, num_patches, embed_dim)
)
self.pos_drop = torch.nn.Dropout(p=dropout)
self.blocks = torch.nn.ModuleList(
[
block_fn(dim=embed_dim, num_heads=num_heads)
for _ in range(depth)
]
)
self.norm = torch.nn.LayerNorm(embed_dim)
self.head = (
torch.nn.Linear(embed_dim, num_classes)
if num_classes > 0
else torch.nn.Identity()
)
self._init_weights()
Weight initialization uses truncated normal for positional and class-token embeddings.
def _init_weights(self) -> None:
torch.nn.init.trunc_normal_(self.pos_embed, std=0.02)
if self.cls_token is not None:
torch.nn.init.trunc_normal_(self.cls_token, std=0.02)
The forward pass embeds patches, adds positional encodings, runs through transformer blocks, and pools to a single vector for classification.
def forward(self, x: torch.Tensor) -> torch.Tensor:
x = self.patch_embed(x)
if self.cls_token is not None:
cls_tokens = self.cls_token.expand(x.size(0), -1, -1)
x = torch.cat((cls_tokens, x), dim=1)
x = self.pos_drop(x + self.pos_embed)
for block in self.blocks:
x = block(x)
x = self.norm(x)
if self.global_pool == "avg":
if self.cls_token is not None:
x = x[:, 1:]
x = x.mean(dim=1)
elif self.cls_token is not None:
x = x[:, 0]
else:
x = x.mean(dim=1)
return self.head(x)
Argument Parsing and Preset Application
parse_args builds the CLI, then applies model-size presets and
MNIST-specific defaults for any flags that were not explicitly provided by
the user.
def _arg_provided(argv: list[str], flags: list[str]) -> bool:
return any(flag in argv for flag in flags)
def apply_model_preset(args: argparse.Namespace, argv: list[str]) -> None:
if args.model is None:
return
model_name = args.model
model_key = MODEL_ALIASES.get(model_name)
if model_key is None:
model_key = model_name
preset = MODEL_PRESETS[model_key]
for field_name, value in preset.items():
flags = MODEL_PRESET_FLAGS.get(field_name, [])
if not _arg_provided(argv, flags):
setattr(args, field_name, value)
def apply_dataset_overrides(args: argparse.Namespace, argv: list[str]) -> None:
if args.dataset != "mnist":
return
for field_name, value in MNIST_DEFAULTS.items():
flags = MNIST_DEFAULT_FLAGS.get(field_name, [])
if not _arg_provided(argv, flags):
setattr(args, field_name, value)
train_fn -- Distributed Setup and Data Loading
train_fn is the core training routine. It begins by querying the
distributed environment for world_size, local_rank, and device, then
loads either fake or MNIST data.
@ezpz.timeitlogit(rank=ezpz.get_rank())
def train_fn(
block_fn: Any,
args: argparse.Namespace,
dataset: Optional[str] = "fake",
) -> ezpz.History:
"""Train the Vision Transformer on fake or MNIST data."""
world_size = ezpz.distributed.get_world_size()
local_rank = ezpz.distributed.get_local_rank()
device_type = ezpz.distributed.get_torch_device_type()
device = torch.device(f"{device_type}:{local_rank}")
logger.info("train_args=%s", vars(args))
if dataset == "fake":
dataset_dict = get_fake_data(
img_size=args.img_size,
batch_size=args.batch_size,
num_workers=args.num_workers,
)
elif dataset == "mnist":
dataset_dict = get_mnist(
train_batch_size=args.batch_size,
test_batch_size=args.batch_size,
download=(ezpz.distributed.get_rank() == 0),
)
else:
raise ValueError(
f"Unknown dataset: {dataset}. Expected 'fake' or 'mnist'."
)
train_fn -- Model Creation, Wrapping, and Compilation
The SimpleVisionTransformer is instantiated, moved to the target device,
then wrapped with DDP or FSDP via ezpz.distributed.wrap_model when
running multi-GPU. Optional torch.compile follows.
in_chans = 1 if dataset == "mnist" else 3
model = SimpleVisionTransformer(
img_size=args.img_size,
patch_size=args.patch_size,
in_chans=in_chans,
embed_dim=(args.num_heads * args.head_dim),
depth=args.depth,
num_heads=args.num_heads,
num_classes=args.num_classes,
class_token=False,
global_pool="avg",
block_fn=block_fn,
dropout=args.dropout,
)
train_fn -- Training Loop
The loop iterates over the training dataloader, with a configurable warmup
period. Each step runs a forward pass under torch.autocast, computes
cross-entropy loss and accuracy, then backpropagates. Timing is recorded
for data-transfer, forward, backward, and optimizer phases.
torch_dtype = ezpz.distributed.TORCH_DTYPES_MAP[args.dtype]
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters()) # type:ignore
model.train() # type:ignore
for step, batch in enumerate(dataset_dict["train"]["loader"]):
last_step = step
if args.max_iters is not None and step > int(args.max_iters):
break
if step < warmup_iters:
logger.info("warmup step %d / %d", step, warmup_iters)
t0 = time.perf_counter()
inputs = batch[0].to(device=device, non_blocking=True)
label = batch[1].to(device=device, non_blocking=True)
ezpz.distributed.synchronize()
with torch.autocast(device_type=device_type, dtype=torch_dtype):
t1 = time.perf_counter()
outputs = model(inputs)
loss = criterion(outputs, label)
acc = (outputs.argmax(dim=-1) == label).float().mean()
t2 = time.perf_counter()
ezpz.distributed.synchronize()
optimizer.zero_grad(set_to_none=True)
loss.backward()
ezpz.distributed.synchronize()
t3 = time.perf_counter()
optimizer.step()
ezpz.distributed.synchronize()
t4 = time.perf_counter()
train_fn -- Metrics and Evaluation
After warmup, each step's loss, accuracy, and timing breakdown are logged
to an ezpz.History object. After training, if a test split exists, the
model is evaluated and per-batch eval metrics are accumulated and
all-reduced across ranks.
if step >= warmup_iters:
loss_value = float(loss.detach().item())
acc_value = float(acc.detach().item())
if not math.isfinite(loss_value) or not math.isfinite(acc_value):
logger.warning(
"Skipping non-finite train metrics at step=%s", step
)
continue
train_msg = history.update(
{
"train/iter": step,
"train/loss": loss_value,
"train/acc": acc_value,
"train/dt": t4 - t0,
"train/dtd": t1 - t0,
"train/dtf": t2 - t1,
"train/dto": t3 - t2,
"train/dtb": t4 - t3,
}
).replace("train/", "")
logger.info("[train] %s", train_msg)
if "test" in dataset_dict:
model.eval() # type:ignore
eval_loss = 0.0
eval_acc = 0.0
eval_count = 0
eval_step = 0
with torch.no_grad():
for batch in dataset_dict["test"]["loader"]:
inputs = batch[0].to(device=device, non_blocking=True)
labels = batch[1].to(device=device, non_blocking=True)
with torch.autocast(device_type=device_type, dtype=torch_dtype):
outputs = model(inputs)
loss = criterion(outputs, labels)
correct = (outputs.argmax(dim=-1) == labels).sum()
batch_size = labels.numel()
eval_loss += loss.item() * batch_size
eval_acc += correct.item()
eval_count += batch_size
On rank 0, history.finalize() persists training and eval metrics to disk
and optionally uploads them to W&B.
main -- Entrypoint
main calls ezpz.distributed.setup_torch() to initialize the process
group, optionally sets up W&B, configures the attention function (native
manual attention or torch.nn.functional.scaled_dot_product_attention),
and delegates to train_fn.
@ezpz.timeitlogit(rank=ezpz.get_rank())
def main(args: argparse.Namespace):
"""CLI entrypoint to configure logging and launch ViT training."""
t0 = time.perf_counter()
rank = ezpz.distributed.setup_torch()
t_setup = time.perf_counter()
if rank == 0 and ezpz.verify_wandb():
try:
fp = Path(__file__).resolve()
run = ezpz.setup_wandb(
project_name=f"ezpz.{fp.parent.name}.{fp.stem}"
)
if wandb is not None and run is not None and run is wandb.run:
wandb.config.update(ezpz.get_dist_info())
wandb.config.update({"args": {**vars(args)}})
except Exception:
logger.warning("Failed to setup wandb, continuing without!")
The attention function closure is defined inline and injected into
AttentionBlock via functools.partial. When --attn_type sdpa is used,
CUDA SDPA backends are selectively enabled.
def attn_fn(
q: torch.Tensor, k: torch.Tensor, v: torch.Tensor
) -> torch.Tensor:
"""Scaled dot-product attention with configurable backend."""
scale = args.head_dim ** (-0.5)
q = q * scale
attn = q @ k.transpose(-2, -1)
attn = attn.softmax(dim=-1)
if args.attention_dropout > 0.0:
attn = torch.nn.functional.dropout(
attn,
p=args.attention_dropout,
training=torch.is_grad_enabled(),
)
x = attn @ v
return x
logger.info(f"Using {args.attn_type} for SDPA backend")
if args.attn_type == "native":
block_fn = functools.partial(AttentionBlock, attn_fn=attn_fn)
Finally, train_fn is called and wall-clock timings for setup and training
are logged (and sent to W&B if available).
train_start = time.perf_counter()
train_fn(block_fn, args=args, dataset=args.dataset)
train_end = time.perf_counter()
t1 = time.perf_counter()
timings = {
"main/setup_torch": t_setup - t0,
"main/train": train_end - train_start,
"main/total": t1 - t0,
}
logger.info("Timings: %s", timings)
Script Entry Point
When run as a module (python -m ezpz.examples.vit), arguments are parsed
and main is called.
HelpβοΈ
--help
$ python3 -m ezpz.examples.vit --help
usage: ezpz.examples.vit [-h] [--img_size IMG_SIZE] [--batch_size BATCH_SIZE]
[--num_heads NUM_HEADS] [--head_dim HEAD_DIM]
[--hidden-dim HIDDEN_DIM] [--mlp-dim MLP_DIM]
[--dropout DROPOUT]
[--attention-dropout ATTENTION_DROPOUT]
[--num_classes NUM_CLASSES] [--dataset {fake,mnist}]
[--depth DEPTH] [--patch_size PATCH_SIZE]
[--dtype DTYPE] [--compile]
[--num_workers NUM_WORKERS] [--max_iters MAX_ITERS]
[--warmup WARMUP] [--attn_type {native,sdpa}]
[--cuda_sdpa_backend {flash_sdp,mem_efficient_sdp,math_sdp,cudnn_sdp,all}]
[--fsdp]
Train a simple ViT
options:
-h, --help show this help message and exit
--img_size IMG_SIZE, --img-size IMG_SIZE
Image size
--batch_size BATCH_SIZE, --batch-size BATCH_SIZE
Batch size
--num_heads NUM_HEADS, --num-heads NUM_HEADS
Number of heads
--head_dim HEAD_DIM, --head-dim HEAD_DIM
Hidden Dimension
--hidden-dim HIDDEN_DIM, --hidden_dim HIDDEN_DIM
Hidden Dimension
--mlp-dim MLP_DIM, --mlp_dim MLP_DIM
MLP Dimension
--dropout DROPOUT Dropout rate
--attention-dropout ATTENTION_DROPOUT, --attention_dropout ATTENTION_DROPOUT
Attention Dropout rate
--num_classes NUM_CLASSES, --num-classes NUM_CLASSES
Number of classes
--dataset {fake,mnist}
Dataset to use
--depth DEPTH Depth
--patch_size PATCH_SIZE, --patch-size PATCH_SIZE
Patch size
--dtype DTYPE Data type
--compile Compile model
--num_workers NUM_WORKERS, --num-workers NUM_WORKERS
Number of workers
--max_iters MAX_ITERS, --max-iters MAX_ITERS
Maximum iterations
--warmup WARMUP Warmup iterations (or fraction) before starting to
collect metrics.
--attn_type {native,sdpa}, --attn-type {native,sdpa}
Attention function to use.
--cuda_sdpa_backend {flash_sdp,mem_efficient_sdp,math_sdp,cudnn_sdp,all}, --cuda-sdpa-backend {flash_sdp,mem_efficient_sdp,math_sdp,cudnn_sdp,all}
CUDA SDPA backend to use.
--fsdp Use FSDP
OutputβοΈ
Output on Sunspot
$ ezpz launch python3 -m ezpz.examples.vit
[2025-12-31 12:13:01,324304][I][ezpz/launch:396:launch] ----[π ezpz.launch][started][2025-12-31-121301]----
[2025-12-31 12:13:02,176169][I][ezpz/launch:416:launch] Job ID: 12458339
[2025-12-31 12:13:02,176953][I][ezpz/launch:417:launch] nodelist: ['x1921c0s3b0n0', 'x1921c0s7b0n0']
[2025-12-31 12:13:02,177350][I][ezpz/launch:418:launch] hostfile: /var/spool/pbs/aux/12458339.sunspot-pbs-0001.head.cm.sunspot.alcf.anl.gov
[2025-12-31 12:13:02,178010][I][ezpz/pbs:264:get_pbs_launch_cmd] β
Using [24/24] GPUs [2 hosts] x [12 GPU/host]
[2025-12-31 12:13:02,178699][I][ezpz/launch:367:build_executable] Building command to execute by piecing together:
[2025-12-31 12:13:02,179082][I][ezpz/launch:368:build_executable] (1.) launch_cmd: mpiexec --envall --np=24 --ppn=12 --hostfile=/var/spool/pbs/aux/12458339.sunspot-pbs-0001.head.cm.sunspot.alcf.anl.gov --no-vni --cpu-bind=verbose,list:2-4:10-12:18-20:26-28:34-36:42-44:54-56:62-64:70-72:78-80:86-88:94-96
[2025-12-31 12:13:02,179891][I][ezpz/launch:369:build_executable] (2.) cmd_to_launch: python3 -m ezpz.examples.vit
[2025-12-31 12:13:02,180622][I][ezpz/launch:433:launch] Took: 1.46 seconds to build command.
[2025-12-31 12:13:02,180965][I][ezpz/launch:436:launch] Executing:
mpiexec
--envall
--np=24
--ppn=12
--hostfile=/var/spool/pbs/aux/12458339.sunspot-pbs-0001.head.cm.sunspot.alcf.anl.gov
--no-vni
--cpu-bind=verbose,list:2-4:10-12:18-20:26-28:34-36:42-44:54-56:62-64:70-72:78-80:86-88:94-96
python3
-m
ezpz.examples.vit
[2025-12-31 12:13:02,182157][I][ezpz/launch:443:launch] Execution started @ 2025-12-31-121302...
[2025-12-31 12:13:02,182600][I][ezpz/launch:139:run_command] Running command:
mpiexec --envall --np=24 --ppn=12 --hostfile=/var/spool/pbs/aux/12458339.sunspot-pbs-0001.head.cm.sunspot.alcf.anl.gov --no-vni --cpu-bind=verbose,list:2-4:10-12:18-20:26-28:34-36:42-44:54-56:62-64:70-72:78-80:86-88:94-96 python3 -m ezpz.examples.vit
cpubind:list x1921c0s7b0n0 pid 108722 rank 12 0: mask 0x1c
cpubind:list x1921c0s7b0n0 pid 108723 rank 13 1: mask 0x1c00
cpubind:list x1921c0s7b0n0 pid 108724 rank 14 2: mask 0x1c0000
cpubind:list x1921c0s7b0n0 pid 108725 rank 15 3: mask 0x1c000000
cpubind:list x1921c0s7b0n0 pid 108726 rank 16 4: mask 0x1c00000000
cpubind:list x1921c0s7b0n0 pid 108727 rank 17 5: mask 0x1c0000000000
cpubind:list x1921c0s7b0n0 pid 108728 rank 18 6: mask 0x1c0000000000000
cpubind:list x1921c0s7b0n0 pid 108729 rank 19 7: mask 0x1c000000000000000
cpubind:list x1921c0s7b0n0 pid 108730 rank 20 8: mask 0x1c00000000000000000
cpubind:list x1921c0s7b0n0 pid 108731 rank 21 9: mask 0x1c0000000000000000000
cpubind:list x1921c0s7b0n0 pid 108732 rank 22 10: mask 0x1c000000000000000000000
cpubind:list x1921c0s7b0n0 pid 108733 rank 23 11: mask 0x1c00000000000000000000000
cpubind:list x1921c0s3b0n0 pid 105486 rank 0 0: mask 0x1c
cpubind:list x1921c0s3b0n0 pid 105487 rank 1 1: mask 0x1c00
cpubind:list x1921c0s3b0n0 pid 105488 rank 2 2: mask 0x1c0000
cpubind:list x1921c0s3b0n0 pid 105489 rank 3 3: mask 0x1c000000
cpubind:list x1921c0s3b0n0 pid 105490 rank 4 4: mask 0x1c00000000
cpubind:list x1921c0s3b0n0 pid 105491 rank 5 5: mask 0x1c0000000000
cpubind:list x1921c0s3b0n0 pid 105492 rank 6 6: mask 0x1c0000000000000
cpubind:list x1921c0s3b0n0 pid 105493 rank 7 7: mask 0x1c000000000000000
cpubind:list x1921c0s3b0n0 pid 105494 rank 8 8: mask 0x1c00000000000000000
cpubind:list x1921c0s3b0n0 pid 105495 rank 9 9: mask 0x1c0000000000000000000
cpubind:list x1921c0s3b0n0 pid 105496 rank 10 10: mask 0x1c000000000000000000000
cpubind:list x1921c0s3b0n0 pid 105497 rank 11 11: mask 0x1c00000000000000000000000
[2025-12-31 12:13:08,706913][I][ezpz/dist:1501:setup_torch_distributed] Using torch_{device,backend}= {xpu, xccl}
[2025-12-31 12:13:08,709436][I][ezpz/dist:1366:setup_torch_DDP] Caught MASTER_PORT=45161 from environment!
[2025-12-31 12:13:08,710117][I][ezpz/dist:1382:setup_torch_DDP] Using torch.distributed.init_process_group with
- master_addr='x1921c0s3b0n0'
- master_port='45161'
- world_size=24
- rank=0
- local_rank=0
- timeout=datetime.timedelta(seconds=3600)
- backend='xccl'
[2025-12-31 12:13:08,711400][I][ezpz/dist:1014:init_process_group] Calling torch.distributed.init_process_group_with: rank=0 world_size=24 backend=xccl
[2025-12-31 12:13:09,470261][I][ezpz/dist:1727:setup_torch] Using device='xpu' with backend='xccl' + 'xccl' for distributed training.
[2025-12-31 12:13:09,471063][W][ezpz/dist:544:print_dist_setup] Using [24 / 24] available "xpu" devices !!
[2025-12-31 12:13:09,471499][I][ezpz/dist:1774:setup_torch] ['x1921c0s3b0n0'][device='xpu'][node=0/1][rank=00/23][local_rank=00/11]
[2025-12-31 12:13:09,470671][I][ezpz/dist:1774:setup_torch] ['x1921c0s3b0n0'][device='xpu'][node=0/1][rank=10/23][local_rank=10/11]
[2025-12-31 12:13:09,470709][I][ezpz/dist:1774:setup_torch] ['x1921c0s3b0n0'][device='xpu'][node=1/1][rank=01/23][local_rank=01/11]
[2025-12-31 12:13:09,470724][I][ezpz/dist:1774:setup_torch] ['x1921c0s3b0n0'][device='xpu'][node=0/1][rank=02/23][local_rank=02/11]
[2025-12-31 12:13:09,470717][I][ezpz/dist:1774:setup_torch] ['x1921c0s3b0n0'][device='xpu'][node=1/1][rank=03/23][local_rank=03/11]
[2025-12-31 12:13:09,470725][I][ezpz/dist:1774:setup_torch] ['x1921c0s3b0n0'][device='xpu'][node=0/1][rank=04/23][local_rank=04/11]
[2025-12-31 12:13:09,470729][I][ezpz/dist:1774:setup_torch] ['x1921c0s3b0n0'][device='xpu'][node=1/1][rank=05/23][local_rank=05/11]
[2025-12-31 12:13:09,470727][I][ezpz/dist:1774:setup_torch] ['x1921c0s3b0n0'][device='xpu'][node=0/1][rank=06/23][local_rank=06/11]
[2025-12-31 12:13:09,470702][I][ezpz/dist:1774:setup_torch] ['x1921c0s3b0n0'][device='xpu'][node=1/1][rank=07/23][local_rank=07/11]
[2025-12-31 12:13:09,470697][I][ezpz/dist:1774:setup_torch] ['x1921c0s3b0n0'][device='xpu'][node=0/1][rank=08/23][local_rank=08/11]
[2025-12-31 12:13:09,470703][I][ezpz/dist:1774:setup_torch] ['x1921c0s3b0n0'][device='xpu'][node=1/1][rank=09/23][local_rank=09/11]
[2025-12-31 12:13:09,470729][I][ezpz/dist:1774:setup_torch] ['x1921c0s3b0n0'][device='xpu'][node=1/1][rank=11/23][local_rank=11/11]
[2025-12-31 12:13:09,474499][I][ezpz/dist:2039:setup_wandb] Setting up wandb from rank=0
[2025-12-31 12:13:09,474926][I][ezpz/dist:2040:setup_wandb] Using WB_PROJECT=ezpz.examples.vit
[2025-12-31 12:13:09,470772][I][ezpz/dist:1774:setup_torch] ['x1921c0s7b0n0'][device='xpu'][node=0/1][rank=12/23][local_rank=00/11]
[2025-12-31 12:13:09,470811][I][ezpz/dist:1774:setup_torch] ['x1921c0s7b0n0'][device='xpu'][node=1/1][rank=13/23][local_rank=01/11]
[2025-12-31 12:13:09,470827][I][ezpz/dist:1774:setup_torch] ['x1921c0s7b0n0'][device='xpu'][node=0/1][rank=14/23][local_rank=02/11]
[2025-12-31 12:13:09,470866][I][ezpz/dist:1774:setup_torch] ['x1921c0s7b0n0'][device='xpu'][node=1/1][rank=15/23][local_rank=03/11]
[2025-12-31 12:13:09,470869][I][ezpz/dist:1774:setup_torch] ['x1921c0s7b0n0'][device='xpu'][node=0/1][rank=16/23][local_rank=04/11]
[2025-12-31 12:13:09,470813][I][ezpz/dist:1774:setup_torch] ['x1921c0s7b0n0'][device='xpu'][node=1/1][rank=17/23][local_rank=05/11]
[2025-12-31 12:13:09,470869][I][ezpz/dist:1774:setup_torch] ['x1921c0s7b0n0'][device='xpu'][node=0/1][rank=18/23][local_rank=06/11]
[2025-12-31 12:13:09,470871][I][ezpz/dist:1774:setup_torch] ['x1921c0s7b0n0'][device='xpu'][node=1/1][rank=19/23][local_rank=07/11]
[2025-12-31 12:13:09,470827][I][ezpz/dist:1774:setup_torch] ['x1921c0s7b0n0'][device='xpu'][node=0/1][rank=20/23][local_rank=08/11]
[2025-12-31 12:13:09,470825][I][ezpz/dist:1774:setup_torch] ['x1921c0s7b0n0'][device='xpu'][node=1/1][rank=21/23][local_rank=09/11]
[2025-12-31 12:13:09,470874][I][ezpz/dist:1774:setup_torch] ['x1921c0s7b0n0'][device='xpu'][node=0/1][rank=22/23][local_rank=10/11]
[2025-12-31 12:13:09,470870][I][ezpz/dist:1774:setup_torch] ['x1921c0s7b0n0'][device='xpu'][node=1/1][rank=23/23][local_rank=11/11]
wandb: Currently logged in as: foremans (aurora_gpt) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.23.1
wandb: Run data is saved locally in /lus/tegu/projects/datascience/foremans/projects/saforem2/ezpz/wandb/run-20251231_121309-g19jy6bl
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run snowy-hill-239
wandb: View project at https://wandb.ai/aurora_gpt/ezpz.examples.vit
wandb: View run at https://wandb.ai/aurora_gpt/ezpz.examples.vit/runs/g19jy6bl
[2025-12-31 12:13:10,974322][I][ezpz/dist:2069:setup_wandb] wandb.run=[snowy-hill-239](https://wandb.ai/aurora_gpt/ezpz.examples.vit/runs/g19jy6bl)
[2025-12-31 12:13:10,980450][I][ezpz/dist:2112:setup_wandb] Running on machine='SunSpot'
[2025-12-31 12:13:10,983391][I][examples/vit:509:main] Using native for SDPA backend
[2025-12-31 12:13:10,984013][I][examples/vit:535:main] Using AttentionBlock Attention with args.compile=False
[2025-12-31 12:13:10,984652][I][examples/vit:287:train_fn] asdict(config)={'img_size': 224, 'batch_size': 128, 'num_heads': 16, 'head_dim': 64, 'depth': 24, 'patch_size': 16, 'hidden_dim': 1024, 'mlp_dim': 4096, 'dropout': 0.0, 'attention_dropout': 0.0, 'num_classes': 1000}
[2025-12-31 12:14:34,029080][I][examples/vit:354:train_fn]
==========================================================================================
Layer (type:depth-idx) Output Shape Param #
==========================================================================================
VisionTransformer [128, 1000] 200,704
ββPatchEmbed: 1-1 [128, 196, 1024] 787,456
ββDropout: 1-2 [128, 196, 1024] --
ββIdentity: 1-3 [128, 196, 1024] --
ββIdentity: 1-4 [128, 196, 1024] --
ββSequential: 1-5 [128, 196, 1024] 201,547,776
ββIdentity: 1-6 [128, 196, 1024] --
ββLayerNorm: 1-7 [128, 1024] 2,048
ββDropout: 1-8 [128, 1024] --
ββLinear: 1-9 [128, 1000] 1,025,000
==========================================================================================
Total params: 203,562,984
Trainable params: 203,562,984
Non-trainable params: 0
Total mult-adds (G): 45.69
==========================================================================================
Input size (MB): 77.07
Forward/backward pass size (MB): 49532.61
Params size (MB): 813.45
Estimated Total Size (MB): 50423.13
==========================================================================================
[2025-12-31 12:14:34,032988][I][examples/vit:355:train_fn] Model size: nparams=0.91 B
[2025-12-31 12:14:34,038818][I][ezpz/dist:685:wrap_model] Wrapping model with: ddp
2025:12:31-12:14:34:(105486) |CCL_WARN| value of CCL_OP_SYNC changed to be 1 (default:0)
2025:12:31-12:14:34:(105486) |CCL_WARN| value of CCL_PROCESS_LAUNCHER changed to be pmix (default:hydra)
[2025-12-31 12:14:47,099101][I][ezpz/dist:685:wrap_model] Wrapping model with: ddp
[2025-12-31 12:14:47,296113][I][ezpz/history:220:__init__] Using History with distributed_history=True
[2025-12-31 12:14:47,312293][I][examples/vit:408:train_fn] Training with 24 x xpu (s), using torch_dtype=torch.bfloat16
[2025-12-31 12:15:32,154227][I][examples/vit:445:train_fn] iter=10 loss=7.111572 dt=0.744444 dtd=0.003141 dtf=0.021592 dto=0.698994 dtb=0.020717 loss/mean=7.036184 loss/max=7.148438 loss/min=6.926270 loss/std=0.051675 dt/mean=0.744634 dt/max=0.745104 dt/min=0.744115 dt/std=0.000000 dtd/mean=0.003989 dtd/max=0.005223 dtd/min=0.003121 dtd/std=0.000751 dtf/mean=0.021139 dtf/max=0.021592 dtf/min=0.020827 dtf/std=0.000188 dto/mean=0.698727 dto/max=0.699881 dto/min=0.697217 dto/std=0.000961 dtb/mean=0.020778 dtb/max=0.021557 dtb/min=0.020275 dtb/std=0.000320
[2025-12-31 12:15:33,015905][I][examples/vit:445:train_fn] iter=11 loss=7.032715 dt=0.702058 dtd=0.001719 dtf=0.022411 dto=0.657195 dtb=0.020732 loss/mean=7.011394 loss/max=7.086426 loss/min=6.919678 loss/std=0.041570 dt/mean=0.735905 dt/max=0.771187 dt/min=0.697317 dt/std=0.022530 dtd/mean=0.001865 dtd/max=0.002308 dtd/min=0.001698 dtd/std=0.000162 dtf/mean=0.021999 dtf/max=0.023057 dtf/min=0.021225 dtf/std=0.000596 dto/mean=0.691333 dto/max=0.726180 dto/min=0.652583 dto/std=0.022813 dtb/mean=0.020708 dtb/max=0.021126 dtb/min=0.020190 dtb/std=0.000228
[2025-12-31 12:15:33,747367][I][examples/vit:445:train_fn] iter=12 loss=6.982422 dt=0.717778 dtd=0.001721 dtf=0.023286 dto=0.671980 dtb=0.020791 loss/mean=7.006734 loss/max=7.068848 loss/min=6.895752 loss/std=0.042117 dt/mean=0.724329 dt/max=0.732337 dt/min=0.713690 dt/std=0.004574 dtd/mean=0.003245 dtd/max=0.005390 dtd/min=0.001609 dtd/std=0.001376 dtf/mean=0.023302 dtf/max=0.024685 dtf/min=0.021369 dtf/std=0.000678 dto/mean=0.677032 dto/max=0.684881 dto/min=0.669961 dto/std=0.003860 dtb/mean=0.020750 dtb/max=0.021848 dtb/min=0.020185 dtb/std=0.000450
[2025-12-31 12:15:34,556097][I][examples/vit:445:train_fn] iter=13 loss=7.114746 dt=0.742885 dtd=0.001825 dtf=0.023194 dto=0.697149 dtb=0.020716 loss/mean=7.035777 loss/max=7.121094 loss/min=6.938965 loss/std=0.043673 dt/mean=0.743002 dt/max=0.746045 dt/min=0.739701 dt/std=0.002211 dtd/mean=0.004097 dtd/max=0.005476 dtd/min=0.001825 dtd/std=0.001211 dtf/mean=0.022114 dtf/max=0.023244 dtf/min=0.021464 dtf/std=0.000590 dto/mean=0.696058 dto/max=0.699107 dto/min=0.692656 dto/std=0.002238 dtb/mean=0.020733 dtb/max=0.021334 dtb/min=0.020149 dtb/std=0.000345
[2025-12-31 12:15:35,310688][I][examples/vit:445:train_fn] iter=14 loss=7.011475 dt=0.720220 dtd=0.001705 dtf=0.022270 dto=0.675471 dtb=0.020774 loss/mean=7.039348 loss/max=7.114502 loss/min=6.937744 loss/std=0.041890 dt/mean=0.750122 dt/max=0.777821 dt/min=0.720220 dt/std=0.018705 dtd/mean=0.001850 dtd/max=0.002233 dtd/min=0.001625 dtd/std=0.000194 dtf/mean=0.022314 dtf/max=0.024492 dtf/min=0.021074 dtf/std=0.000813 dto/mean=0.705263 dto/max=0.730777 dto/min=0.675471 dto/std=0.018371 dtb/mean=0.020695 dtb/max=0.021149 dtb/min=0.020171 dtb/std=0.000279
[2025-12-31 12:15:36,066943][I][examples/vit:445:train_fn] iter=15 loss=7.011230 dt=0.735597 dtd=0.001686 dtf=0.022349 dto=0.690830 dtb=0.020732 loss/mean=7.028381 loss/max=7.112061 loss/min=6.949463 loss/std=0.039451 dt/mean=0.751743 dt/max=0.761701 dt/min=0.735432 dt/std=0.008930 dtd/mean=0.002519 dtd/max=0.004533 dtd/min=0.001567 dtd/std=0.000977 dtf/mean=0.022890 dtf/max=0.024537 dtf/min=0.021450 dtf/std=0.000851 dto/mean=0.705617 dto/max=0.715304 dto/min=0.690162 dto/std=0.008198 dtb/mean=0.020716 dtb/max=0.021369 dtb/min=0.020218 dtb/std=0.000304
[2025-12-31 12:15:36,829552][I][examples/vit:445:train_fn] iter=16 loss=7.066895 dt=0.728091 dtd=0.001674 dtf=0.023043 dto=0.682651 dtb=0.020723 loss/mean=7.036306 loss/max=7.118652 loss/min=6.950928 loss/std=0.040736 dt/mean=0.735283 dt/max=0.739360 dt/min=0.728091 dt/std=0.003266 dtd/mean=0.003385 dtd/max=0.005231 dtd/min=0.001674 dtd/std=0.001218 dtf/mean=0.022158 dtf/max=0.023252 dtf/min=0.021710 dtf/std=0.000435 dto/mean=0.688973 dto/max=0.694191 dto/min=0.682651 dto/std=0.003334 dtb/mean=0.020767 dtb/max=0.021937 dtb/min=0.020278 dtb/std=0.000388
[2025-12-31 12:15:37,624977][I][examples/vit:445:train_fn] iter=17 loss=7.056885 dt=0.739866 dtd=0.001641 dtf=0.022526 dto=0.694913 dtb=0.020786 loss/mean=7.031006 loss/max=7.094238 loss/min=6.946289 loss/std=0.042612 dt/mean=0.747859 dt/max=0.765443 dt/min=0.731697 dt/std=0.012958 dtd/mean=0.002078 dtd/max=0.003025 dtd/min=0.001545 dtd/std=0.000425 dtf/mean=0.022239 dtf/max=0.023510 dtf/min=0.021278 dtf/std=0.000711 dto/mean=0.702795 dto/max=0.719565 dto/min=0.686761 dto/std=0.012629 dtb/mean=0.020747 dtb/max=0.021336 dtb/min=0.020204 dtb/std=0.000284
[2025-12-31 12:15:38,414813][I][examples/vit:445:train_fn] iter=18 loss=7.021973 dt=0.730126 dtd=0.001684 dtf=0.022472 dto=0.685149 dtb=0.020821 loss/mean=7.009206 loss/max=7.126221 loss/min=6.934570 loss/std=0.041062 dt/mean=0.757921 dt/max=0.776361 dt/min=0.729369 dt/std=0.015862 dtd/mean=0.002343 dtd/max=0.004546 dtd/min=0.001629 dtd/std=0.000985 dtf/mean=0.022413 dtf/max=0.023448 dtf/min=0.021651 dtf/std=0.000472 dto/mean=0.712472 dto/max=0.730281 dto/min=0.685116 dto/std=0.015107 dtb/mean=0.020693 dtb/max=0.021261 dtb/min=0.020149 dtb/std=0.000297
[2025-12-31 12:15:39,209452][I][examples/vit:445:train_fn] iter=19 loss=7.029053 dt=0.733423 dtd=0.001668 dtf=0.022388 dto=0.688487 dtb=0.020880 loss/mean=7.010173 loss/max=7.071777 loss/min=6.963379 loss/std=0.031614 dt/mean=0.757083 dt/max=0.784014 dt/min=0.728249 dt/std=0.018796 dtd/mean=0.001842 dtd/max=0.002232 dtd/min=0.001557 dtd/std=0.000210 dtf/mean=0.022281 dtf/max=0.023509 dtf/min=0.021485 dtf/std=0.000466 dto/mean=0.712357 dto/max=0.738712 dto/min=0.684110 dto/std=0.018800 dtb/mean=0.020603 dtb/max=0.021067 dtb/min=0.020249 dtb/std=0.000253
[2025-12-31 12:15:39,959071][I][examples/vit:445:train_fn] iter=20 loss=6.989258 dt=0.729999 dtd=0.001656 dtf=0.023220 dto=0.684263 dtb=0.020860 loss/mean=6.997142 loss/max=7.051025 loss/min=6.924805 loss/std=0.037109 dt/mean=0.736245 dt/max=0.744330 dt/min=0.723712 dt/std=0.006234 dtd/mean=0.002480 dtd/max=0.003943 dtd/min=0.001558 dtd/std=0.000659 dtf/mean=0.023104 dtf/max=0.024808 dtf/min=0.021568 dtf/std=0.000875 dto/mean=0.689984 dto/max=0.697561 dto/min=0.679231 dto/std=0.005511 dtb/mean=0.020676 dtb/max=0.021082 dtb/min=0.020291 dtb/std=0.000217
[2025-12-31 12:15:40,714336][I][examples/vit:445:train_fn] iter=21 loss=6.949463 dt=0.738463 dtd=0.001676 dtf=0.022789 dto=0.693238 dtb=0.020760 loss/mean=7.004436 loss/max=7.096680 loss/min=6.928467 loss/std=0.038073 dt/mean=0.744181 dt/max=0.747653 dt/min=0.738463 dt/std=0.002405 dtd/mean=0.003571 dtd/max=0.005418 dtd/min=0.001676 dtd/std=0.001278 dtf/mean=0.022411 dtf/max=0.023831 dtf/min=0.021547 dtf/std=0.000525 dto/mean=0.697513 dto/max=0.700923 dto/min=0.693238 dto/std=0.001938 dtb/mean=0.020686 dtb/max=0.021187 dtb/min=0.020283 dtb/std=0.000298
[2025-12-31 12:15:41,522736][I][examples/vit:445:train_fn] iter=22 loss=7.014893 dt=0.755023 dtd=0.001776 dtf=0.025431 dto=0.706979 dtb=0.020837 loss/mean=7.004486 loss/max=7.110352 loss/min=6.913818 loss/std=0.038965 dt/mean=0.748414 dt/max=0.766116 dt/min=0.732838 dt/std=0.012672 dtd/mean=0.002078 dtd/max=0.003655 dtd/min=0.001651 dtd/std=0.000579 dtf/mean=0.022676 dtf/max=0.025431 dtf/min=0.021047 dtf/std=0.001224 dto/mean=0.702984 dto/max=0.719493 dto/min=0.687241 dto/std=0.012053 dtb/mean=0.020675 dtb/max=0.021195 dtb/min=0.020129 dtb/std=0.000270
[2025-12-31 12:15:42,268381][I][examples/vit:445:train_fn] iter=23 loss=7.009033 dt=0.732504 dtd=0.001736 dtf=0.022508 dto=0.687491 dtb=0.020769 loss/mean=7.007579 loss/max=7.079834 loss/min=6.947510 loss/std=0.039306 dt/mean=0.757627 dt/max=0.778756 dt/min=0.731312 dt/std=0.016388 dtd/mean=0.002094 dtd/max=0.003084 dtd/min=0.001562 dtd/std=0.000526 dtf/mean=0.022488 dtf/max=0.023932 dtf/min=0.021376 dtf/std=0.000707 dto/mean=0.712397 dto/max=0.732798 dto/min=0.685533 dto/std=0.016056 dtb/mean=0.020648 dtb/max=0.021215 dtb/min=0.020275 dtb/std=0.000233
[2025-12-31 12:15:43,132486][I][examples/vit:445:train_fn] iter=24 loss=7.014648 dt=0.791617 dtd=0.001653 dtf=0.025728 dto=0.743375 dtb=0.020861 loss/mean=7.015188 loss/max=7.098389 loss/min=6.912109 loss/std=0.045387 dt/mean=0.763487 dt/max=0.791617 dt/min=0.728050 dt/std=0.020098 dtd/mean=0.001822 dtd/max=0.002207 dtd/min=0.001626 dtd/std=0.000180 dtf/mean=0.022584 dtf/max=0.025728 dtf/min=0.021624 dtf/std=0.000843 dto/mean=0.718427 dto/max=0.746834 dto/min=0.683279 dto/std=0.019673 dtb/mean=0.020654 dtb/max=0.021240 dtb/min=0.020345 dtb/std=0.000230
[2025-12-31 12:15:43,923470][I][examples/vit:445:train_fn] iter=25 loss=6.988525 dt=0.730475 dtd=0.005105 dtf=0.022559 dto=0.682103 dtb=0.020708 loss/mean=6.990112 loss/max=7.058594 loss/min=6.914795 loss/std=0.035373 dt/mean=0.745873 dt/max=0.765539 dt/min=0.722147 dt/std=0.014330 dtd/mean=0.002260 dtd/max=0.005105 dtd/min=0.001588 dtd/std=0.000836 dtf/mean=0.023081 dtf/max=0.025434 dtf/min=0.021319 dtf/std=0.001245 dto/mean=0.699837 dto/max=0.717573 dto/min=0.677176 dto/std=0.013419 dtb/mean=0.020695 dtb/max=0.021258 dtb/min=0.020226 dtb/std=0.000287
[2025-12-31 12:15:44,671377][I][examples/vit:445:train_fn] iter=26 loss=6.978271 dt=0.728732 dtd=0.001722 dtf=0.027302 dto=0.678941 dtb=0.020767 loss/mean=7.000173 loss/max=7.060303 loss/min=6.930664 loss/std=0.038023 dt/mean=0.751072 dt/max=0.780739 dt/min=0.718407 dt/std=0.018831 dtd/mean=0.001792 dtd/max=0.002193 dtd/min=0.001632 dtd/std=0.000166 dtf/mean=0.022540 dtf/max=0.027302 dtf/min=0.021307 dtf/std=0.001100 dto/mean=0.706061 dto/max=0.735676 dto/min=0.673876 dto/std=0.019266 dtb/mean=0.020680 dtb/max=0.021233 dtb/min=0.020343 dtb/std=0.000255
[2025-12-31 12:15:45,426093][I][examples/vit:445:train_fn] iter=27 loss=6.985840 dt=0.735329 dtd=0.001691 dtf=0.022562 dto=0.690344 dtb=0.020733 loss/mean=6.988373 loss/max=7.067383 loss/min=6.901123 loss/std=0.034054 dt/mean=0.742626 dt/max=0.748739 dt/min=0.735329 dt/std=0.003427 dtd/mean=0.003301 dtd/max=0.005561 dtd/min=0.001586 dtd/std=0.001388 dtf/mean=0.022639 dtf/max=0.024640 dtf/min=0.021827 dtf/std=0.000775 dto/mean=0.695893 dto/max=0.701744 dto/min=0.690344 dto/std=0.003605 dtb/mean=0.020792 dtb/max=0.021600 dtb/min=0.020282 dtb/std=0.000347
[2025-12-31 12:15:46,187495][I][examples/vit:445:train_fn] iter=28 loss=6.958496 dt=0.743998 dtd=0.001662 dtf=0.022803 dto=0.698705 dtb=0.020828 loss/mean=7.000173 loss/max=7.078369 loss/min=6.931885 loss/std=0.038867 dt/mean=0.746971 dt/max=0.760301 dt/min=0.736719 dt/std=0.009229 dtd/mean=0.002349 dtd/max=0.003210 dtd/min=0.001617 dtd/std=0.000579 dtf/mean=0.022645 dtf/max=0.024866 dtf/min=0.021274 dtf/std=0.001089 dto/mean=0.701287 dto/max=0.713176 dto/min=0.691328 dto/std=0.008573 dtb/mean=0.020691 dtb/max=0.021143 dtb/min=0.020316 dtb/std=0.000245
[2025-12-31 12:15:46,977725][I][examples/vit:445:train_fn] iter=29 loss=6.984131 dt=0.770773 dtd=0.001760 dtf=0.024091 dto=0.724120 dtb=0.020802 loss/mean=6.992188 loss/max=7.054688 loss/min=6.922852 loss/std=0.032682 dt/mean=0.756948 dt/max=0.776455 dt/min=0.732494 dt/std=0.015026 dtd/mean=0.001839 dtd/max=0.002294 dtd/min=0.001539 dtd/std=0.000199 dtf/mean=0.022533 dtf/max=0.024091 dtf/min=0.021525 dtf/std=0.000700 dto/mean=0.711877 dto/max=0.731058 dto/min=0.687968 dto/std=0.014665 dtb/mean=0.020698 dtb/max=0.021198 dtb/min=0.020312 dtb/std=0.000234
[2025-12-31 12:15:47,731430][I][examples/vit:445:train_fn] iter=30 loss=6.968750 dt=0.728727 dtd=0.001725 dtf=0.022213 dto=0.683936 dtb=0.020853 loss/mean=6.986908 loss/max=7.042236 loss/min=6.934814 loss/std=0.030321 dt/mean=0.738177 dt/max=0.744469 dt/min=0.728727 dt/std=0.003020 dtd/mean=0.003073 dtd/max=0.004729 dtd/min=0.001595 dtd/std=0.000956 dtf/mean=0.022537 dtf/max=0.024266 dtf/min=0.021735 dtf/std=0.000633 dto/mean=0.691848 dto/max=0.697793 dto/min=0.683936 dto/std=0.002686 dtb/mean=0.020719 dtb/max=0.021254 dtb/min=0.020223 dtb/std=0.000282
[2025-12-31 12:15:48,481466][I][examples/vit:445:train_fn] iter=31 loss=7.029053 dt=0.736371 dtd=0.001709 dtf=0.022263 dto=0.691483 dtb=0.020916 loss/mean=6.982574 loss/max=7.067383 loss/min=6.832764 loss/std=0.049333 dt/mean=0.745070 dt/max=0.751480 dt/min=0.736371 dt/std=0.003513 dtd/mean=0.003256 dtd/max=0.005504 dtd/min=0.001579 dtd/std=0.001344 dtf/mean=0.022521 dtf/max=0.023703 dtf/min=0.021750 dtf/std=0.000556 dto/mean=0.698558 dto/max=0.705201 dto/min=0.691483 dto/std=0.003774 dtb/mean=0.020735 dtb/max=0.021304 dtb/min=0.020201 dtb/std=0.000310
[2025-12-31 12:15:49,251607][I][examples/vit:445:train_fn] iter=32 loss=7.045166 dt=0.749095 dtd=0.001659 dtf=0.023011 dto=0.703600 dtb=0.020825 loss/mean=7.009898 loss/max=7.066162 loss/min=6.954834 loss/std=0.026349 dt/mean=0.745355 dt/max=0.756215 dt/min=0.736894 dt/std=0.007296 dtd/mean=0.002465 dtd/max=0.004041 dtd/min=0.001549 dtd/std=0.000795 dtf/mean=0.022334 dtf/max=0.023693 dtf/min=0.021286 dtf/std=0.000763 dto/mean=0.699811 dto/max=0.710788 dto/min=0.691312 dto/std=0.007118 dtb/mean=0.020746 dtb/max=0.021371 dtb/min=0.020353 dtb/std=0.000265
[2025-12-31 12:15:49,997199][I][examples/vit:445:train_fn] iter=33 loss=7.020264 dt=0.727653 dtd=0.001750 dtf=0.022298 dto=0.682872 dtb=0.020734 loss/mean=6.983205 loss/max=7.062500 loss/min=6.895508 loss/std=0.035641 dt/mean=0.738254 dt/max=0.745029 dt/min=0.727653 dt/std=0.003339 dtd/mean=0.003172 dtd/max=0.004648 dtd/min=0.001657 dtd/std=0.000887 dtf/mean=0.022837 dtf/max=0.024299 dtf/min=0.022065 dtf/std=0.000576 dto/mean=0.691531 dto/max=0.697682 dto/min=0.682872 dto/std=0.002794 dtb/mean=0.020714 dtb/max=0.021391 dtb/min=0.020169 dtb/std=0.000318
[2025-12-31 12:15:50,759356][I][examples/vit:445:train_fn] iter=34 loss=6.914062 dt=0.741301 dtd=0.001698 dtf=0.024205 dto=0.694543 dtb=0.020855 loss/mean=6.980601 loss/max=7.046875 loss/min=6.911621 loss/std=0.036172 dt/mean=0.742607 dt/max=0.749077 dt/min=0.736232 dt/std=0.003409 dtd/mean=0.003402 dtd/max=0.005554 dtd/min=0.001676 dtd/std=0.001323 dtf/mean=0.022480 dtf/max=0.024390 dtf/min=0.021743 dtf/std=0.000859 dto/mean=0.695940 dto/max=0.702199 dto/min=0.691508 dto/std=0.003538 dtb/mean=0.020785 dtb/max=0.021545 dtb/min=0.020341 dtb/std=0.000307
[2025-12-31 12:15:51,512264][I][examples/vit:445:train_fn] iter=35 loss=6.983398 dt=0.734306 dtd=0.001671 dtf=0.022545 dto=0.689456 dtb=0.020635 loss/mean=6.997111 loss/max=7.062988 loss/min=6.936035 loss/std=0.028505 dt/mean=0.742718 dt/max=0.749095 dt/min=0.734306 dt/std=0.003453 dtd/mean=0.003206 dtd/max=0.005422 dtd/min=0.001573 dtd/std=0.001368 dtf/mean=0.022523 dtf/max=0.024025 dtf/min=0.021785 dtf/std=0.000639 dto/mean=0.696285 dto/max=0.702540 dto/min=0.689456 dto/std=0.003634 dtb/mean=0.020704 dtb/max=0.021270 dtb/min=0.020327 dtb/std=0.000209
[2025-12-31 12:15:52,257198][I][examples/vit:445:train_fn] iter=36 loss=6.991455 dt=0.735148 dtd=0.001708 dtf=0.022518 dto=0.690147 dtb=0.020774 loss/mean=6.995128 loss/max=7.067383 loss/min=6.942139 loss/std=0.032034 dt/mean=0.741631 dt/max=0.747561 dt/min=0.735148 dt/std=0.003155 dtd/mean=0.003153 dtd/max=0.005476 dtd/min=0.001570 dtd/std=0.001396 dtf/mean=0.022958 dtf/max=0.024346 dtf/min=0.021980 dtf/std=0.000610 dto/mean=0.694814 dto/max=0.700863 dto/min=0.690147 dto/std=0.003289 dtb/mean=0.020706 dtb/max=0.021249 dtb/min=0.020209 dtb/std=0.000304
[2025-12-31 12:15:53,082889][I][examples/vit:445:train_fn] iter=37 loss=7.010742 dt=0.754607 dtd=0.001597 dtf=0.024032 dto=0.708152 dtb=0.020825 loss/mean=6.982076 loss/max=7.057861 loss/min=6.918457 loss/std=0.034110 dt/mean=0.746617 dt/max=0.757576 dt/min=0.737717 dt/std=0.007612 dtd/mean=0.002758 dtd/max=0.005157 dtd/min=0.001536 dtd/std=0.001230 dtf/mean=0.022482 dtf/max=0.024220 dtf/min=0.021333 dtf/std=0.000915 dto/mean=0.700606 dto/max=0.711254 dto/min=0.690800 dto/std=0.007668 dtb/mean=0.020771 dtb/max=0.021395 dtb/min=0.020302 dtb/std=0.000313
[2025-12-31 12:15:53,822050][I][examples/vit:445:train_fn] iter=38 loss=6.975830 dt=0.725656 dtd=0.001658 dtf=0.025023 dto=0.678155 dtb=0.020820 loss/mean=6.994375 loss/max=7.053711 loss/min=6.947266 loss/std=0.024550 dt/mean=0.753547 dt/max=0.784940 dt/min=0.718371 dt/std=0.020758 dtd/mean=0.001801 dtd/max=0.002128 dtd/min=0.001635 dtd/std=0.000166 dtf/mean=0.022542 dtf/max=0.025023 dtf/min=0.021558 dtf/std=0.000725 dto/mean=0.708533 dto/max=0.739468 dto/min=0.674357 dto/std=0.020838 dtb/mean=0.020671 dtb/max=0.021199 dtb/min=0.020265 dtb/std=0.000210
[2025-12-31 12:15:54,657351][I][examples/vit:445:train_fn] iter=39 loss=6.992188 dt=0.764405 dtd=0.001675 dtf=0.025172 dto=0.716769 dtb=0.020789 loss/mean=6.970714 loss/max=7.027100 loss/min=6.914795 loss/std=0.034333 dt/mean=0.754408 dt/max=0.770382 dt/min=0.729280 dt/std=0.013428 dtd/mean=0.002184 dtd/max=0.003573 dtd/min=0.001631 dtd/std=0.000654 dtf/mean=0.022970 dtf/max=0.025172 dtf/min=0.021572 dtf/std=0.001008 dto/mean=0.708547 dto/max=0.723172 dto/min=0.684613 dto/std=0.012350 dtb/mean=0.020707 dtb/max=0.021262 dtb/min=0.020282 dtb/std=0.000285
[2025-12-31 12:15:55,440080][I][examples/vit:445:train_fn] iter=40 loss=7.038330 dt=0.733582 dtd=0.001704 dtf=0.024849 dto=0.686207 dtb=0.020821 loss/mean=6.987508 loss/max=7.041504 loss/min=6.928223 loss/std=0.026493 dt/mean=0.760830 dt/max=0.792707 dt/min=0.725636 dt/std=0.020782 dtd/mean=0.001795 dtd/max=0.002238 dtd/min=0.001635 dtd/std=0.000158 dtf/mean=0.022468 dtf/max=0.024849 dtf/min=0.021375 dtf/std=0.000751 dto/mean=0.715907 dto/max=0.747222 dto/min=0.682017 dto/std=0.020906 dtb/mean=0.020659 dtb/max=0.021324 dtb/min=0.020248 dtb/std=0.000278
[2025-12-31 12:15:56,186584][I][examples/vit:445:train_fn] iter=41 loss=6.945068 dt=0.727525 dtd=0.001735 dtf=0.022900 dto=0.682011 dtb=0.020879 loss/mean=6.969208 loss/max=7.058594 loss/min=6.916992 loss/std=0.032973 dt/mean=0.748452 dt/max=0.770154 dt/min=0.727149 dt/std=0.015412 dtd/mean=0.002192 dtd/max=0.003208 dtd/min=0.001573 dtd/std=0.000546 dtf/mean=0.022726 dtf/max=0.024180 dtf/min=0.021340 dtf/std=0.000864 dto/mean=0.702787 dto/max=0.723392 dto/min=0.681505 dto/std=0.014915 dtb/mean=0.020747 dtb/max=0.021484 dtb/min=0.020331 dtb/std=0.000298
[2025-12-31 12:15:57,002317][I][examples/vit:445:train_fn] iter=42 loss=7.038574 dt=0.776704 dtd=0.001668 dtf=0.022678 dto=0.731496 dtb=0.020862 loss/mean=6.976430 loss/max=7.039307 loss/min=6.919434 loss/std=0.034110 dt/mean=0.756732 dt/max=0.786670 dt/min=0.723782 dt/std=0.018553 dtd/mean=0.001802 dtd/max=0.002211 dtd/min=0.001626 dtd/std=0.000166 dtf/mean=0.022396 dtf/max=0.023093 dtf/min=0.021225 dtf/std=0.000533 dto/mean=0.711839 dto/max=0.741265 dto/min=0.678716 dto/std=0.018654 dtb/mean=0.020695 dtb/max=0.021359 dtb/min=0.020267 dtb/std=0.000279
[2025-12-31 12:15:57,815591][I][examples/vit:445:train_fn] iter=43 loss=7.035645 dt=0.758882 dtd=0.012152 dtf=0.022963 dto=0.702949 dtb=0.020818 loss/mean=6.980540 loss/max=7.037842 loss/min=6.929443 loss/std=0.032973 dt/mean=0.765730 dt/max=0.788675 dt/min=0.732798 dt/std=0.018196 dtd/mean=0.002373 dtd/max=0.012152 dtd/min=0.001639 dtd/std=0.002068 dtf/mean=0.022709 dtf/max=0.024687 dtf/min=0.021031 dtf/std=0.000978 dto/mean=0.719959 dto/max=0.741003 dto/min=0.687917 dto/std=0.017812 dtb/mean=0.020690 dtb/max=0.021201 dtb/min=0.020271 dtb/std=0.000228
[2025-12-31 12:15:58,555728][I][examples/vit:445:train_fn] iter=44 loss=6.953613 dt=0.719717 dtd=0.001672 dtf=0.022583 dto=0.674526 dtb=0.020936 loss/mean=6.987224 loss/max=7.059814 loss/min=6.906738 loss/std=0.037973 dt/mean=0.744013 dt/max=0.769423 dt/min=0.719717 dt/std=0.016983 dtd/mean=0.001891 dtd/max=0.002867 dtd/min=0.001532 dtd/std=0.000326 dtf/mean=0.022596 dtf/max=0.025096 dtf/min=0.021569 dtf/std=0.000788 dto/mean=0.698715 dto/max=0.723603 dto/min=0.674526 dto/std=0.016540 dtb/mean=0.020811 dtb/max=0.021485 dtb/min=0.020209 dtb/std=0.000297
[2025-12-31 12:15:59,320244][I][examples/vit:445:train_fn] iter=45 loss=6.965576 dt=0.736210 dtd=0.001692 dtf=0.024698 dto=0.689147 dtb=0.020674 loss/mean=6.996837 loss/max=7.056885 loss/min=6.898682 loss/std=0.037109 dt/mean=0.744331 dt/max=0.750168 dt/min=0.736210 dt/std=0.003461 dtd/mean=0.003146 dtd/max=0.005443 dtd/min=0.001592 dtd/std=0.001397 dtf/mean=0.022662 dtf/max=0.024698 dtf/min=0.021666 dtf/std=0.000771 dto/mean=0.697826 dto/max=0.704015 dto/min=0.689147 dto/std=0.003852 dtb/mean=0.020697 dtb/max=0.021296 dtb/min=0.020216 dtb/std=0.000292
[2025-12-31 12:16:00,081771][I][examples/vit:445:train_fn] iter=46 loss=6.968750 dt=0.741624 dtd=0.001717 dtf=0.023044 dto=0.696055 dtb=0.020807 loss/mean=6.971436 loss/max=7.033203 loss/min=6.863525 loss/std=0.034499 dt/mean=0.747135 dt/max=0.763320 dt/min=0.732857 dt/std=0.011744 dtd/mean=0.002163 dtd/max=0.003133 dtd/min=0.001573 dtd/std=0.000460 dtf/mean=0.022883 dtf/max=0.025204 dtf/min=0.021264 dtf/std=0.001195 dto/mean=0.701406 dto/max=0.716596 dto/min=0.688054 dto/std=0.011067 dtb/mean=0.020684 dtb/max=0.021180 dtb/min=0.020288 dtb/std=0.000252
[2025-12-31 12:16:00,841569][I][examples/vit:445:train_fn] iter=47 loss=6.917969 dt=0.734437 dtd=0.001652 dtf=0.023252 dto=0.688742 dtb=0.020790 loss/mean=6.979259 loss/max=7.023926 loss/min=6.917969 loss/std=0.027274 dt/mean=0.740831 dt/max=0.746801 dt/min=0.734437 dt/std=0.002858 dtd/mean=0.003643 dtd/max=0.005405 dtd/min=0.001531 dtd/std=0.001477 dtf/mean=0.022557 dtf/max=0.024191 dtf/min=0.021717 dtf/std=0.000603 dto/mean=0.693941 dto/max=0.701646 dto/min=0.688742 dto/std=0.002652 dtb/mean=0.020690 dtb/max=0.021128 dtb/min=0.020172 dtb/std=0.000248
[2025-12-31 12:16:01,595278][I][examples/vit:445:train_fn] iter=48 loss=6.997559 dt=0.732685 dtd=0.001657 dtf=0.022673 dto=0.687710 dtb=0.020645 loss/mean=6.980276 loss/max=7.027344 loss/min=6.907715 loss/std=0.035048 dt/mean=0.745125 dt/max=0.751038 dt/min=0.732685 dt/std=0.004158 dtd/mean=0.002830 dtd/max=0.004684 dtd/min=0.001657 dtd/std=0.001000 dtf/mean=0.022955 dtf/max=0.024705 dtf/min=0.022187 dtf/std=0.000669 dto/mean=0.698627 dto/max=0.705444 dto/min=0.687710 dto/std=0.004085 dtb/mean=0.020713 dtb/max=0.021169 dtb/min=0.020320 dtb/std=0.000278
[2025-12-31 12:16:02,343776][I][examples/vit:445:train_fn] iter=49 loss=6.997070 dt=0.729157 dtd=0.001664 dtf=0.023304 dto=0.683426 dtb=0.020763 loss/mean=6.977102 loss/max=7.046875 loss/min=6.920898 loss/std=0.028033 dt/mean=0.738032 dt/max=0.743651 dt/min=0.729157 dt/std=0.003392 dtd/mean=0.003257 dtd/max=0.005523 dtd/min=0.001553 dtd/std=0.001387 dtf/mean=0.022512 dtf/max=0.023798 dtf/min=0.021764 dtf/std=0.000579 dto/mean=0.691580 dto/max=0.697807 dto/min=0.683426 dto/std=0.003778 dtb/mean=0.020684 dtb/max=0.021177 dtb/min=0.020273 dtb/std=0.000231
[2025-12-31 12:16:03,087624][I][examples/vit:445:train_fn] iter=50 loss=6.925293 dt=0.730377 dtd=0.001676 dtf=0.022686 dto=0.684904 dtb=0.021110 loss/mean=6.973846 loss/max=7.045166 loss/min=6.856689 loss/std=0.040075 dt/mean=0.737219 dt/max=0.743345 dt/min=0.730377 dt/std=0.003285 dtd/mean=0.003267 dtd/max=0.005473 dtd/min=0.001551 dtd/std=0.001356 dtf/mean=0.022508 dtf/max=0.023485 dtf/min=0.021924 dtf/std=0.000446 dto/mean=0.690625 dto/max=0.697362 dto/min=0.684904 dto/std=0.003584 dtb/mean=0.020819 dtb/max=0.021817 dtb/min=0.020345 dtb/std=0.000324
[2025-12-31 12:16:03,908992][I][examples/vit:445:train_fn] iter=51 loss=7.015381 dt=0.749356 dtd=0.001659 dtf=0.024072 dto=0.702847 dtb=0.020778 loss/mean=6.979340 loss/max=7.034912 loss/min=6.907959 loss/std=0.035908 dt/mean=0.744960 dt/max=0.756240 dt/min=0.735700 dt/std=0.007744 dtd/mean=0.002572 dtd/max=0.004506 dtd/min=0.001551 dtd/std=0.000984 dtf/mean=0.022622 dtf/max=0.024409 dtf/min=0.021383 dtf/std=0.001038 dto/mean=0.699012 dto/max=0.709381 dto/min=0.689005 dto/std=0.007465 dtb/mean=0.020755 dtb/max=0.021618 dtb/min=0.020256 dtb/std=0.000345
[2025-12-31 12:16:04,680315][I][examples/vit:445:train_fn] iter=52 loss=6.945557 dt=0.718842 dtd=0.001774 dtf=0.022528 dto=0.673614 dtb=0.020926 loss/mean=6.967998 loss/max=7.070068 loss/min=6.917236 loss/std=0.039451 dt/mean=0.752761 dt/max=0.781505 dt/min=0.718842 dt/std=0.019747 dtd/mean=0.001806 dtd/max=0.002223 dtd/min=0.001580 dtd/std=0.000170 dtf/mean=0.022435 dtf/max=0.023659 dtf/min=0.021713 dtf/std=0.000526 dto/mean=0.707762 dto/max=0.735720 dto/min=0.673614 dto/std=0.019645 dtb/mean=0.020759 dtb/max=0.021476 dtb/min=0.020367 dtb/std=0.000264
[2025-12-31 12:16:05,486745][I][examples/vit:445:train_fn] iter=53 loss=7.022705 dt=0.726803 dtd=0.001651 dtf=0.022481 dto=0.681990 dtb=0.020681 loss/mean=6.981262 loss/max=7.050781 loss/min=6.899170 loss/std=0.040407 dt/mean=0.752933 dt/max=0.773121 dt/min=0.726373 dt/std=0.016582 dtd/mean=0.001963 dtd/max=0.003030 dtd/min=0.001545 dtd/std=0.000437 dtf/mean=0.022516 dtf/max=0.024074 dtf/min=0.021568 dtf/std=0.000693 dto/mean=0.707640 dto/max=0.726559 dto/min=0.680444 dto/std=0.016295 dtb/mean=0.020813 dtb/max=0.021907 dtb/min=0.020303 dtb/std=0.000374
[2025-12-31 12:16:06,253200][I][examples/vit:445:train_fn] iter=54 loss=6.986328 dt=0.714074 dtd=0.001667 dtf=0.022484 dto=0.669063 dtb=0.020860 loss/mean=6.987539 loss/max=7.055664 loss/min=6.936035 loss/std=0.028705 dt/mean=0.751958 dt/max=0.784207 dt/min=0.714074 dt/std=0.020769 dtd/mean=0.001780 dtd/max=0.002099 dtd/min=0.001626 dtd/std=0.000151 dtf/mean=0.022335 dtf/max=0.023637 dtf/min=0.021591 dtf/std=0.000496 dto/mean=0.707091 dto/max=0.738269 dto/min=0.669063 dto/std=0.020846 dtb/mean=0.020752 dtb/max=0.021352 dtb/min=0.020242 dtb/std=0.000278
[2025-12-31 12:16:07,036859][I][examples/vit:445:train_fn] iter=55 loss=7.020752 dt=0.726670 dtd=0.001671 dtf=0.023300 dto=0.680836 dtb=0.020863 loss/mean=6.975759 loss/max=7.020752 loss/min=6.936035 loss/std=0.021573 dt/mean=0.748641 dt/max=0.772063 dt/min=0.725118 dt/std=0.017210 dtd/mean=0.002003 dtd/max=0.003673 dtd/min=0.001546 dtd/std=0.000579 dtf/mean=0.022822 dtf/max=0.024702 dtf/min=0.021458 dtf/std=0.000904 dto/mean=0.703149 dto/max=0.725319 dto/min=0.678733 dto/std=0.017051 dtb/mean=0.020667 dtb/max=0.021340 dtb/min=0.020243 dtb/std=0.000276
[2025-12-31 12:16:07,827772][I][examples/vit:445:train_fn] iter=56 loss=6.957275 dt=0.730915 dtd=0.001995 dtf=0.022926 dto=0.685141 dtb=0.020853 loss/mean=6.961477 loss/max=7.016602 loss/min=6.924561 loss/std=0.020670 dt/mean=0.757991 dt/max=0.778522 dt/min=0.730217 dt/std=0.017030 dtd/mean=0.002200 dtd/max=0.003252 dtd/min=0.001647 dtd/std=0.000499 dtf/mean=0.022664 dtf/max=0.023615 dtf/min=0.021357 dtf/std=0.000580 dto/mean=0.712382 dto/max=0.732616 dto/min=0.684696 dto/std=0.016824 dtb/mean=0.020746 dtb/max=0.021194 dtb/min=0.020332 dtb/std=0.000239
[2025-12-31 12:16:08,579242][I][examples/vit:445:train_fn] iter=57 loss=7.012207 dt=0.737117 dtd=0.001701 dtf=0.023760 dto=0.690935 dtb=0.020722 loss/mean=6.988149 loss/max=7.083252 loss/min=6.920166 loss/std=0.034222 dt/mean=0.753169 dt/max=0.786585 dt/min=0.716214 dt/std=0.021419 dtd/mean=0.001782 dtd/max=0.002007 dtd/min=0.001624 dtd/std=0.000127 dtf/mean=0.022379 dtf/max=0.023760 dtf/min=0.021507 dtf/std=0.000557 dto/mean=0.708152 dto/max=0.740705 dto/min=0.672085 dto/std=0.021671 dtb/mean=0.020857 dtb/max=0.021699 dtb/min=0.020344 dtb/std=0.000374
[2025-12-31 12:16:09,362010][I][examples/vit:445:train_fn] iter=58 loss=6.983643 dt=0.763087 dtd=0.001641 dtf=0.024827 dto=0.715777 dtb=0.020842 loss/mean=6.971110 loss/max=7.022461 loss/min=6.918945 loss/std=0.027204 dt/mean=0.748961 dt/max=0.769213 dt/min=0.724905 dt/std=0.015968 dtd/mean=0.001924 dtd/max=0.002593 dtd/min=0.001557 dtd/std=0.000297 dtf/mean=0.022521 dtf/max=0.024827 dtf/min=0.021563 dtf/std=0.000732 dto/mean=0.703753 dto/max=0.723593 dto/min=0.680019 dto/std=0.015505 dtb/mean=0.020763 dtb/max=0.021321 dtb/min=0.020258 dtb/std=0.000261
[2025-12-31 12:16:10,113690][I][examples/vit:445:train_fn] iter=59 loss=6.993652 dt=0.732712 dtd=0.001718 dtf=0.022345 dto=0.687850 dtb=0.020799 loss/mean=6.973938 loss/max=7.031738 loss/min=6.913330 loss/std=0.033942 dt/mean=0.740580 dt/max=0.746455 dt/min=0.732712 dt/std=0.003285 dtd/mean=0.003164 dtd/max=0.005442 dtd/min=0.001579 dtd/std=0.001364 dtf/mean=0.022843 dtf/max=0.024134 dtf/min=0.021971 dtf/std=0.000731 dto/mean=0.693711 dto/max=0.699074 dto/min=0.687850 dto/std=0.003418 dtb/mean=0.020862 dtb/max=0.021546 dtb/min=0.020358 dtb/std=0.000337
[2025-12-31 12:16:10,896999][I][examples/vit:445:train_fn] iter=60 loss=6.969971 dt=0.747576 dtd=0.001661 dtf=0.023705 dto=0.701328 dtb=0.020880 loss/mean=6.962677 loss/max=7.035400 loss/min=6.872803 loss/std=0.038965 dt/mean=0.746533 dt/max=0.760261 dt/min=0.733879 dt/std=0.009983 dtd/mean=0.002699 dtd/max=0.004842 dtd/min=0.001575 dtd/std=0.001212 dtf/mean=0.022815 dtf/max=0.025205 dtf/min=0.021562 dtf/std=0.001125 dto/mean=0.700239 dto/max=0.713503 dto/min=0.687099 dto/std=0.009887 dtb/mean=0.020781 dtb/max=0.021246 dtb/min=0.020256 dtb/std=0.000274
[2025-12-31 12:16:11,665090][I][examples/vit:445:train_fn] iter=61 loss=6.933838 dt=0.748712 dtd=0.001657 dtf=0.023562 dto=0.702590 dtb=0.020903 loss/mean=6.976868 loss/max=7.038330 loss/min=6.912598 loss/std=0.033942 dt/mean=0.748078 dt/max=0.776145 dt/min=0.718618 dt/std=0.019553 dtd/mean=0.001892 dtd/max=0.002529 dtd/min=0.001623 dtd/std=0.000261 dtf/mean=0.022607 dtf/max=0.024323 dtf/min=0.021660 dtf/std=0.000795 dto/mean=0.702790 dto/max=0.728456 dto/min=0.674124 dto/std=0.019070 dtb/mean=0.020790 dtb/max=0.021661 dtb/min=0.020219 dtb/std=0.000346
[2025-12-31 12:16:12,412241][I][examples/vit:445:train_fn] iter=62 loss=6.958252 dt=0.732601 dtd=0.001710 dtf=0.022547 dto=0.687455 dtb=0.020889 loss/mean=6.965841 loss/max=7.051514 loss/min=6.913818 loss/std=0.035426 dt/mean=0.739734 dt/max=0.745430 dt/min=0.732601 dt/std=0.003257 dtd/mean=0.003280 dtd/max=0.005450 dtd/min=0.001559 dtd/std=0.001292 dtf/mean=0.022775 dtf/max=0.024696 dtf/min=0.021823 dtf/std=0.000722 dto/mean=0.692936 dto/max=0.698621 dto/min=0.687455 dto/std=0.003339 dtb/mean=0.020744 dtb/max=0.021340 dtb/min=0.020305 dtb/std=0.000279
[2025-12-31 12:16:13,174938][I][examples/vit:445:train_fn] iter=63 loss=7.009521 dt=0.740339 dtd=0.001691 dtf=0.023941 dto=0.693831 dtb=0.020876 loss/mean=6.973592 loss/max=7.028320 loss/min=6.924316 loss/std=0.029232 dt/mean=0.739645 dt/max=0.748818 dt/min=0.733537 dt/std=0.005658 dtd/mean=0.003032 dtd/max=0.005469 dtd/min=0.001573 dtd/std=0.001450 dtf/mean=0.022828 dtf/max=0.024976 dtf/min=0.021450 dtf/std=0.001211 dto/mean=0.692919 dto/max=0.702017 dto/min=0.685927 dto/std=0.005692 dtb/mean=0.020867 dtb/max=0.022359 dtb/min=0.020390 dtb/std=0.000440
[2025-12-31 12:16:13,919647][I][examples/vit:445:train_fn] iter=64 loss=6.896729 dt=0.731400 dtd=0.001672 dtf=0.022233 dto=0.686662 dtb=0.020833 loss/mean=6.962646 loss/max=7.011475 loss/min=6.896729 loss/std=0.027964 dt/mean=0.737714 dt/max=0.741256 dt/min=0.731400 dt/std=0.002549 dtd/mean=0.003417 dtd/max=0.005453 dtd/min=0.001575 dtd/std=0.001243 dtf/mean=0.022408 dtf/max=0.023186 dtf/min=0.021953 dtf/std=0.000294 dto/mean=0.691057 dto/max=0.695397 dto/min=0.686662 dto/std=0.002354 dtb/mean=0.020831 dtb/max=0.021627 dtb/min=0.020358 dtb/std=0.000299
[2025-12-31 12:16:14,724718][I][examples/vit:445:train_fn] iter=65 loss=6.983154 dt=0.740475 dtd=0.001657 dtf=0.026268 dto=0.691680 dtb=0.020869 loss/mean=6.979614 loss/max=7.029297 loss/min=6.906494 loss/std=0.025466 dt/mean=0.737195 dt/max=0.748038 dt/min=0.728805 dt/std=0.006987 dtd/mean=0.002650 dtd/max=0.005013 dtd/min=0.001557 dtd/std=0.001223 dtf/mean=0.022836 dtf/max=0.026268 dtf/min=0.021512 dtf/std=0.001202 dto/mean=0.690811 dto/max=0.701612 dto/min=0.682157 dto/std=0.007034 dtb/mean=0.020898 dtb/max=0.022102 dtb/min=0.020411 dtb/std=0.000431
[2025-12-31 12:16:15,460588][I][examples/vit:445:train_fn] iter=66 loss=6.993408 dt=0.717219 dtd=0.001660 dtf=0.022625 dto=0.672191 dtb=0.020743 loss/mean=6.968008 loss/max=7.035645 loss/min=6.893066 loss/std=0.032624 dt/mean=0.751308 dt/max=0.772520 dt/min=0.717219 dt/std=0.019154 dtd/mean=0.001972 dtd/max=0.002954 dtd/min=0.001552 dtd/std=0.000424 dtf/mean=0.022521 dtf/max=0.023821 dtf/min=0.021679 dtf/std=0.000597 dto/mean=0.705970 dto/max=0.727169 dto/min=0.672191 dto/std=0.018948 dtb/mean=0.020846 dtb/max=0.021620 dtb/min=0.020417 dtb/std=0.000325
[2025-12-31 12:16:16,284567][I][examples/vit:445:train_fn] iter=67 loss=6.943604 dt=0.745598 dtd=0.002759 dtf=0.023811 dto=0.698298 dtb=0.020731 loss/mean=6.967296 loss/max=7.018311 loss/min=6.920166 loss/std=0.025391 dt/mean=0.745446 dt/max=0.759088 dt/min=0.734187 dt/std=0.009741 dtd/mean=0.002868 dtd/max=0.005055 dtd/min=0.001623 dtd/std=0.001256 dtf/mean=0.022759 dtf/max=0.024636 dtf/min=0.021525 dtf/std=0.001030 dto/mean=0.699143 dto/max=0.711942 dto/min=0.687225 dto/std=0.009435 dtb/mean=0.020677 dtb/max=0.021094 dtb/min=0.020287 dtb/std=0.000258
[2025-12-31 12:16:17,011642][I][examples/vit:445:train_fn] iter=68 loss=6.976074 dt=0.713172 dtd=0.001813 dtf=0.023177 dto=0.667256 dtb=0.020926 loss/mean=6.972229 loss/max=7.019043 loss/min=6.924316 loss/std=0.023519 dt/mean=0.747308 dt/max=0.778938 dt/min=0.712412 dt/std=0.020759 dtd/mean=0.001799 dtd/max=0.002089 dtd/min=0.001641 dtd/std=0.000141 dtf/mean=0.022417 dtf/max=0.023780 dtf/min=0.021402 dtf/std=0.000554 dto/mean=0.702239 dto/max=0.732663 dto/min=0.667256 dto/std=0.020857 dtb/mean=0.020854 dtb/max=0.021991 dtb/min=0.020358 dtb/std=0.000379
[2025-12-31 12:16:17,757204][I][examples/vit:445:train_fn] iter=69 loss=7.019287 dt=0.736641 dtd=0.001659 dtf=0.023917 dto=0.690050 dtb=0.021014 loss/mean=6.978435 loss/max=7.033447 loss/min=6.893066 loss/std=0.031614 dt/mean=0.736665 dt/max=0.741711 dt/min=0.731619 dt/std=0.003202 dtd/mean=0.003064 dtd/max=0.005396 dtd/min=0.001570 dtd/std=0.001377 dtf/mean=0.022623 dtf/max=0.024116 dtf/min=0.021729 dtf/std=0.000645 dto/mean=0.690252 dto/max=0.696442 dto/min=0.686604 dto/std=0.003588 dtb/mean=0.020725 dtb/max=0.021321 dtb/min=0.020251 dtb/std=0.000325
[2025-12-31 12:16:18,541456][I][examples/vit:445:train_fn] iter=70 loss=6.986084 dt=0.760055 dtd=0.001617 dtf=0.025038 dto=0.712560 dtb=0.020840 loss/mean=6.975403 loss/max=7.017090 loss/min=6.896240 loss/std=0.027344 dt/mean=0.746176 dt/max=0.763127 dt/min=0.731291 dt/std=0.012082 dtd/mean=0.002768 dtd/max=0.005176 dtd/min=0.001572 dtd/std=0.001342 dtf/mean=0.022493 dtf/max=0.025038 dtf/min=0.021381 dtf/std=0.001002 dto/mean=0.700163 dto/max=0.716501 dto/min=0.684296 dto/std=0.012116 dtb/mean=0.020752 dtb/max=0.021390 dtb/min=0.020222 dtb/std=0.000294
[2025-12-31 12:16:19,334196][I][examples/vit:445:train_fn] iter=71 loss=6.953369 dt=0.771233 dtd=0.001641 dtf=0.022572 dto=0.726191 dtb=0.020829 loss/mean=6.974915 loss/max=7.038574 loss/min=6.913574 loss/std=0.029813 dt/mean=0.757112 dt/max=0.789940 dt/min=0.722059 dt/std=0.020186 dtd/mean=0.001796 dtd/max=0.002250 dtd/min=0.001634 dtd/std=0.000169 dtf/mean=0.022474 dtf/max=0.023432 dtf/min=0.021478 dtf/std=0.000466 dto/mean=0.712175 dto/max=0.744055 dto/min=0.677076 dto/std=0.020119 dtb/mean=0.020667 dtb/max=0.021133 dtb/min=0.020371 dtb/std=0.000207
[2025-12-31 12:16:20,077115][I][examples/vit:445:train_fn] iter=72 loss=7.032959 dt=0.734149 dtd=0.001679 dtf=0.022509 dto=0.688994 dtb=0.020967 loss/mean=6.976024 loss/max=7.038330 loss/min=6.899414 loss/std=0.038571 dt/mean=0.740182 dt/max=0.745929 dt/min=0.734149 dt/std=0.003146 dtd/mean=0.003266 dtd/max=0.005491 dtd/min=0.001601 dtd/std=0.001299 dtf/mean=0.022640 dtf/max=0.024450 dtf/min=0.021874 dtf/std=0.000701 dto/mean=0.693552 dto/max=0.699313 dto/min=0.688994 dto/std=0.002868 dtb/mean=0.020724 dtb/max=0.021125 dtb/min=0.020261 dtb/std=0.000253
[2025-12-31 12:16:20,836082][I][examples/vit:445:train_fn] iter=73 loss=7.007568 dt=0.745784 dtd=0.001600 dtf=0.024390 dto=0.699126 dtb=0.020668 loss/mean=6.979442 loss/max=7.045898 loss/min=6.938721 loss/std=0.027204 dt/mean=0.742050 dt/max=0.748410 dt/min=0.735671 dt/std=0.003312 dtd/mean=0.003212 dtd/max=0.005472 dtd/min=0.001600 dtd/std=0.001346 dtf/mean=0.022506 dtf/max=0.024390 dtf/min=0.021779 dtf/std=0.000684 dto/mean=0.695677 dto/max=0.702715 dto/min=0.691317 dto/std=0.003525 dtb/mean=0.020656 dtb/max=0.021041 dtb/min=0.020242 dtb/std=0.000207
[2025-12-31 12:16:21,664389][I][examples/vit:445:train_fn] iter=74 loss=6.933594 dt=0.749712 dtd=0.001732 dtf=0.024171 dto=0.702734 dtb=0.021074 loss/mean=6.961670 loss/max=7.008789 loss/min=6.928467 loss/std=0.021573 dt/mean=0.744975 dt/max=0.755956 dt/min=0.736076 dt/std=0.007805 dtd/mean=0.002906 dtd/max=0.004958 dtd/min=0.001630 dtd/std=0.001174 dtf/mean=0.022639 dtf/max=0.024171 dtf/min=0.021378 dtf/std=0.001003 dto/mean=0.698668 dto/max=0.708864 dto/min=0.689240 dto/std=0.007451 dtb/mean=0.020762 dtb/max=0.021149 dtb/min=0.020382 dtb/std=0.000228
[2025-12-31 12:16:22,453938][I][examples/vit:445:train_fn] iter=75 loss=6.971191 dt=0.717920 dtd=0.001719 dtf=0.022678 dto=0.672664 dtb=0.020860 loss/mean=6.967296 loss/max=7.068115 loss/min=6.936523 loss/std=0.028438 dt/mean=0.752582 dt/max=0.785606 dt/min=0.717920 dt/std=0.021208 dtd/mean=0.001796 dtd/max=0.002113 dtd/min=0.001629 dtd/std=0.000155 dtf/mean=0.022076 dtf/max=0.022856 dtf/min=0.021302 dtf/std=0.000369 dto/mean=0.707902 dto/max=0.740324 dto/min=0.672664 dto/std=0.021404 dtb/mean=0.020809 dtb/max=0.022220 dtb/min=0.020329 dtb/std=0.000479
[2025-12-31 12:16:23,201097][I][examples/vit:445:train_fn] iter=76 loss=6.988281 dt=0.727026 dtd=0.001768 dtf=0.022971 dto=0.681522 dtb=0.020766 loss/mean=6.961833 loss/max=7.000977 loss/min=6.909180 loss/std=0.025466 dt/mean=0.755362 dt/max=0.786134 dt/min=0.719971 dt/std=0.020749 dtd/mean=0.001817 dtd/max=0.002205 dtd/min=0.001637 dtd/std=0.000169 dtf/mean=0.022384 dtf/max=0.024022 dtf/min=0.021393 dtf/std=0.000706 dto/mean=0.710380 dto/max=0.739394 dto/min=0.674898 dto/std=0.020961 dtb/mean=0.020782 dtb/max=0.021310 dtb/min=0.020390 dtb/std=0.000236
[2025-12-31 12:16:23,947330][I][examples/vit:445:train_fn] iter=77 loss=6.943848 dt=0.730181 dtd=0.001688 dtf=0.022086 dto=0.685413 dtb=0.020994 loss/mean=6.956411 loss/max=7.021484 loss/min=6.871338 loss/std=0.025088 dt/mean=0.741112 dt/max=0.746444 dt/min=0.730181 dt/std=0.003829 dtd/mean=0.002945 dtd/max=0.005359 dtd/min=0.001569 dtd/std=0.001373 dtf/mean=0.022700 dtf/max=0.024477 dtf/min=0.021997 dtf/std=0.000720 dto/mean=0.694724 dto/max=0.701460 dto/min=0.685413 dto/std=0.003837 dtb/mean=0.020743 dtb/max=0.021349 dtb/min=0.020327 dtb/std=0.000262
[2025-12-31 12:16:24,773082][I][examples/vit:445:train_fn] iter=78 loss=6.978516 dt=0.761054 dtd=0.001700 dtf=0.024937 dto=0.713578 dtb=0.020839 loss/mean=6.965098 loss/max=7.022461 loss/min=6.865479 loss/std=0.034166 dt/mean=0.748827 dt/max=0.766382 dt/min=0.732463 dt/std=0.012634 dtd/mean=0.002365 dtd/max=0.004309 dtd/min=0.001603 dtd/std=0.000892 dtf/mean=0.022818 dtf/max=0.025502 dtf/min=0.021410 dtf/std=0.001281 dto/mean=0.702965 dto/max=0.718497 dto/min=0.686230 dto/std=0.012092 dtb/mean=0.020679 dtb/max=0.021142 dtb/min=0.020321 dtb/std=0.000249
[2025-12-31 12:16:25,518658][I][examples/vit:445:train_fn] iter=79 loss=6.922363 dt=0.727205 dtd=0.001661 dtf=0.023797 dto=0.680897 dtb=0.020849 loss/mean=6.969279 loss/max=7.024170 loss/min=6.917969 loss/std=0.028303 dt/mean=0.751349 dt/max=0.784285 dt/min=0.716415 dt/std=0.020322 dtd/mean=0.001798 dtd/max=0.002212 dtd/min=0.001578 dtd/std=0.000171 dtf/mean=0.022376 dtf/max=0.024469 dtf/min=0.021327 dtf/std=0.000698 dto/mean=0.706422 dto/max=0.737019 dto/min=0.671566 dto/std=0.020411 dtb/mean=0.020752 dtb/max=0.021284 dtb/min=0.020372 dtb/std=0.000268
[2025-12-31 12:16:26,272580][I][examples/vit:445:train_fn] iter=80 loss=6.928955 dt=0.734315 dtd=0.003959 dtf=0.022724 dto=0.686737 dtb=0.020895 loss/mean=6.962504 loss/max=7.028076 loss/min=6.925293 loss/std=0.027552 dt/mean=0.740758 dt/max=0.746343 dt/min=0.734315 dt/std=0.002784 dtd/mean=0.003410 dtd/max=0.005488 dtd/min=0.001590 dtd/std=0.001289 dtf/mean=0.022813 dtf/max=0.024931 dtf/min=0.021937 dtf/std=0.000724 dto/mean=0.693835 dto/max=0.700043 dto/min=0.686737 dto/std=0.003044 dtb/mean=0.020699 dtb/max=0.021340 dtb/min=0.020316 dtb/std=0.000263
[2025-12-31 12:16:27,084889][I][examples/vit:445:train_fn] iter=81 loss=6.980469 dt=0.746773 dtd=0.001703 dtf=0.023629 dto=0.700639 dtb=0.020802 loss/mean=6.972107 loss/max=7.020996 loss/min=6.912598 loss/std=0.028705 dt/mean=0.748201 dt/max=0.759930 dt/min=0.739375 dt/std=0.007907 dtd/mean=0.002643 dtd/max=0.004569 dtd/min=0.001562 dtd/std=0.000958 dtf/mean=0.022796 dtf/max=0.024607 dtf/min=0.021381 dtf/std=0.001134 dto/mean=0.702011 dto/max=0.712560 dto/min=0.692673 dto/std=0.007507 dtb/mean=0.020751 dtb/max=0.021455 dtb/min=0.020430 dtb/std=0.000278
[2025-12-31 12:16:27,843702][I][examples/vit:445:train_fn] iter=82 loss=6.937988 dt=0.734645 dtd=0.001703 dtf=0.023584 dto=0.688593 dtb=0.020765 loss/mean=6.963796 loss/max=7.047852 loss/min=6.843262 loss/std=0.039979 dt/mean=0.755872 dt/max=0.788405 dt/min=0.721559 dt/std=0.020939 dtd/mean=0.001804 dtd/max=0.002214 dtd/min=0.001632 dtd/std=0.000168 dtf/mean=0.022503 dtf/max=0.023584 dtf/min=0.021390 dtf/std=0.000611 dto/mean=0.710801 dto/max=0.742479 dto/min=0.676907 dto/std=0.021092 dtb/mean=0.020765 dtb/max=0.021568 dtb/min=0.020342 dtb/std=0.000294
[2025-12-31 12:16:28,646170][I][examples/vit:445:train_fn] iter=83 loss=6.970703 dt=0.768635 dtd=0.001662 dtf=0.023451 dto=0.722643 dtb=0.020879 loss/mean=6.966736 loss/max=7.012695 loss/min=6.913574 loss/std=0.026851 dt/mean=0.751228 dt/max=0.781235 dt/min=0.720570 dt/std=0.019909 dtd/mean=0.001883 dtd/max=0.002957 dtd/min=0.001620 dtd/std=0.000329 dtf/mean=0.022392 dtf/max=0.024167 dtf/min=0.021578 dtf/std=0.000560 dto/mean=0.706234 dto/max=0.735096 dto/min=0.675456 dto/std=0.019756 dtb/mean=0.020719 dtb/max=0.021357 dtb/min=0.020337 dtb/std=0.000256
[2025-12-31 12:16:29,459074][I][examples/vit:445:train_fn] iter=84 loss=6.983887 dt=0.747875 dtd=0.001695 dtf=0.024866 dto=0.700442 dtb=0.020872 loss/mean=6.968292 loss/max=7.023682 loss/min=6.903809 loss/std=0.031855 dt/mean=0.752166 dt/max=0.773170 dt/min=0.729233 dt/std=0.015648 dtd/mean=0.002163 dtd/max=0.003639 dtd/min=0.001540 dtd/std=0.000643 dtf/mean=0.022767 dtf/max=0.024866 dtf/min=0.021553 dtf/std=0.000826 dto/mean=0.706544 dto/max=0.726773 dto/min=0.683277 dto/std=0.015370 dtb/mean=0.020692 dtb/max=0.021045 dtb/min=0.020339 dtb/std=0.000232
[2025-12-31 12:16:30,208391][I][examples/vit:445:train_fn] iter=85 loss=6.955322 dt=0.724813 dtd=0.001680 dtf=0.023383 dto=0.678959 dtb=0.020791 loss/mean=6.962280 loss/max=7.042480 loss/min=6.893799 loss/std=0.035695 dt/mean=0.751438 dt/max=0.784257 dt/min=0.715721 dt/std=0.020661 dtd/mean=0.001808 dtd/max=0.002161 dtd/min=0.001590 dtd/std=0.000179 dtf/mean=0.022539 dtf/max=0.023851 dtf/min=0.021704 dtf/std=0.000611 dto/mean=0.706313 dto/max=0.738357 dto/min=0.671076 dto/std=0.020862 dtb/mean=0.020778 dtb/max=0.021399 dtb/min=0.020329 dtb/std=0.000294
[2025-12-31 12:16:30,995940][I][examples/vit:445:train_fn] iter=86 loss=7.006348 dt=0.746630 dtd=0.004681 dtf=0.023454 dto=0.697523 dtb=0.020972 loss/mean=6.955139 loss/max=7.006592 loss/min=6.906006 loss/std=0.033146 dt/mean=0.744196 dt/max=0.762628 dt/min=0.727688 dt/std=0.013330 dtd/mean=0.002605 dtd/max=0.004681 dtd/min=0.001628 dtd/std=0.001020 dtf/mean=0.022811 dtf/max=0.024531 dtf/min=0.021475 dtf/std=0.000946 dto/mean=0.697820 dto/max=0.715555 dto/min=0.680953 dto/std=0.012916 dtb/mean=0.020960 dtb/max=0.022034 dtb/min=0.020438 dtb/std=0.000390
[2025-12-31 12:16:31,768262][I][examples/vit:445:train_fn] iter=87 loss=6.986572 dt=0.753831 dtd=0.001649 dtf=0.023473 dto=0.707854 dtb=0.020854 loss/mean=6.973979 loss/max=7.022217 loss/min=6.937012 loss/std=0.024000 dt/mean=0.752923 dt/max=0.785754 dt/min=0.716823 dt/std=0.020745 dtd/mean=0.001765 dtd/max=0.002053 dtd/min=0.001533 dtd/std=0.000135 dtf/mean=0.022395 dtf/max=0.023473 dtf/min=0.021672 dtf/std=0.000459 dto/mean=0.708042 dto/max=0.740092 dto/min=0.672523 dto/std=0.020517 dtb/mean=0.020721 dtb/max=0.021177 dtb/min=0.020375 dtb/std=0.000236
[2025-12-31 12:16:32,573348][I][examples/vit:445:train_fn] iter=88 loss=6.975830 dt=0.744247 dtd=0.001681 dtf=0.024306 dto=0.697233 dtb=0.021027 loss/mean=6.963277 loss/max=7.026367 loss/min=6.892090 loss/std=0.030571 dt/mean=0.742532 dt/max=0.756320 dt/min=0.730731 dt/std=0.010125 dtd/mean=0.002749 dtd/max=0.004906 dtd/min=0.001625 dtd/std=0.001149 dtf/mean=0.022892 dtf/max=0.025352 dtf/min=0.021483 dtf/std=0.001251 dto/mean=0.696179 dto/max=0.709292 dto/min=0.684075 dto/std=0.009735 dtb/mean=0.020712 dtb/max=0.021224 dtb/min=0.020333 dtb/std=0.000239
[2025-12-31 12:16:33,343035][I][examples/vit:445:train_fn] iter=89 loss=6.995605 dt=0.719805 dtd=0.001665 dtf=0.022857 dto=0.674301 dtb=0.020983 loss/mean=6.958171 loss/max=7.008789 loss/min=6.910400 loss/std=0.030195 dt/mean=0.746791 dt/max=0.771666 dt/min=0.719805 dt/std=0.018258 dtd/mean=0.002107 dtd/max=0.003667 dtd/min=0.001627 dtd/std=0.000692 dtf/mean=0.022595 dtf/max=0.024857 dtf/min=0.021425 dtf/std=0.001067 dto/mean=0.701338 dto/max=0.722970 dto/min=0.674301 dto/std=0.017279 dtb/mean=0.020751 dtb/max=0.021155 dtb/min=0.020439 dtb/std=0.000192
[2025-12-31 12:16:34,121240][I][examples/vit:445:train_fn] iter=90 loss=6.995850 dt=0.726353 dtd=0.001734 dtf=0.022452 dto=0.681219 dtb=0.020947 loss/mean=6.958120 loss/max=7.021484 loss/min=6.881348 loss/std=0.034720 dt/mean=0.749065 dt/max=0.770316 dt/min=0.725386 dt/std=0.016169 dtd/mean=0.001976 dtd/max=0.002963 dtd/min=0.001595 dtd/std=0.000374 dtf/mean=0.022522 dtf/max=0.024208 dtf/min=0.021439 dtf/std=0.000793 dto/mean=0.703768 dto/max=0.723543 dto/min=0.680416 dto/std=0.015765 dtb/mean=0.020798 dtb/max=0.021766 dtb/min=0.020314 dtb/std=0.000326
[2025-12-31 12:16:34,889609][I][examples/vit:445:train_fn] iter=91 loss=6.919434 dt=0.725853 dtd=0.004117 dtf=0.022681 dto=0.678346 dtb=0.020710 loss/mean=6.969594 loss/max=7.016113 loss/min=6.892090 loss/std=0.027690 dt/mean=0.750679 dt/max=0.769203 dt/min=0.725032 dt/std=0.015449 dtd/mean=0.002085 dtd/max=0.004117 dtd/min=0.001563 dtd/std=0.000580 dtf/mean=0.022936 dtf/max=0.024109 dtf/min=0.021977 dtf/std=0.000661 dto/mean=0.704853 dto/max=0.723677 dto/min=0.678346 dto/std=0.015343 dtb/mean=0.020805 dtb/max=0.021383 dtb/min=0.020440 dtb/std=0.000236
[2025-12-31 12:16:35,646065][I][examples/vit:445:train_fn] iter=92 loss=6.972900 dt=0.743182 dtd=0.001629 dtf=0.022283 dto=0.698406 dtb=0.020864 loss/mean=6.962301 loss/max=7.013672 loss/min=6.867188 loss/std=0.034444 dt/mean=0.754227 dt/max=0.777483 dt/min=0.731005 dt/std=0.016673 dtd/mean=0.002124 dtd/max=0.003964 dtd/min=0.001629 dtd/std=0.000662 dtf/mean=0.022493 dtf/max=0.024020 dtf/min=0.021623 dtf/std=0.000811 dto/mean=0.708874 dto/max=0.731137 dto/min=0.686752 dto/std=0.015792 dtb/mean=0.020735 dtb/max=0.021157 dtb/min=0.020330 dtb/std=0.000213
[2025-12-31 12:16:36,405894][I][examples/vit:445:train_fn] iter=93 loss=6.955811 dt=0.737651 dtd=0.001657 dtf=0.027067 dto=0.688207 dtb=0.020721 loss/mean=6.956289 loss/max=7.032227 loss/min=6.893311 loss/std=0.029941 dt/mean=0.739291 dt/max=0.745214 dt/min=0.734239 dt/std=0.003010 dtd/mean=0.003118 dtd/max=0.005423 dtd/min=0.001554 dtd/std=0.001424 dtf/mean=0.022985 dtf/max=0.027067 dtf/min=0.022070 dtf/std=0.001058 dto/mean=0.692402 dto/max=0.699386 dto/min=0.688207 dto/std=0.003538 dtb/mean=0.020786 dtb/max=0.022114 dtb/min=0.020341 dtb/std=0.000411
[2025-12-31 12:16:37,197312][I][examples/vit:445:train_fn] iter=94 loss=6.941406 dt=0.748396 dtd=0.001713 dtf=0.022729 dto=0.703060 dtb=0.020894 loss/mean=6.956451 loss/max=7.019287 loss/min=6.906250 loss/std=0.030571 dt/mean=0.753825 dt/max=0.763172 dt/min=0.746897 dt/std=0.006138 dtd/mean=0.003277 dtd/max=0.005392 dtd/min=0.001634 dtd/std=0.001310 dtf/mean=0.022594 dtf/max=0.024555 dtf/min=0.021348 dtf/std=0.001013 dto/mean=0.707138 dto/max=0.716177 dto/min=0.699779 dto/std=0.006010 dtb/mean=0.020816 dtb/max=0.021603 dtb/min=0.020155 dtb/std=0.000312
[2025-12-31 12:16:37,943003][I][examples/vit:445:train_fn] iter=95 loss=6.995361 dt=0.725441 dtd=0.001654 dtf=0.023189 dto=0.679825 dtb=0.020773 loss/mean=6.960592 loss/max=7.019287 loss/min=6.904785 loss/std=0.028771 dt/mean=0.744709 dt/max=0.761425 dt/min=0.725441 dt/std=0.013316 dtd/mean=0.002854 dtd/max=0.004914 dtd/min=0.001627 dtd/std=0.001285 dtf/mean=0.022630 dtf/max=0.024223 dtf/min=0.021440 dtf/std=0.000929 dto/mean=0.698424 dto/max=0.715335 dto/min=0.679825 dto/std=0.012162 dtb/mean=0.020800 dtb/max=0.021414 dtb/min=0.020413 dtb/std=0.000289
[2025-12-31 12:16:38,685541][I][examples/vit:445:train_fn] iter=96 loss=6.924316 dt=0.730290 dtd=0.001679 dtf=0.022138 dto=0.685656 dtb=0.020817 loss/mean=6.952393 loss/max=7.033936 loss/min=6.894531 loss/std=0.028505 dt/mean=0.738270 dt/max=0.744446 dt/min=0.730290 dt/std=0.003330 dtd/mean=0.003186 dtd/max=0.005505 dtd/min=0.001642 dtd/std=0.001371 dtf/mean=0.022534 dtf/max=0.023826 dtf/min=0.021813 dtf/std=0.000534 dto/mean=0.691715 dto/max=0.697518 dto/min=0.685656 dto/std=0.003466 dtb/mean=0.020836 dtb/max=0.021774 dtb/min=0.020360 dtb/std=0.000338
[2025-12-31 12:16:39,454027][I][examples/vit:445:train_fn] iter=97 loss=6.977295 dt=0.747579 dtd=0.001707 dtf=0.026247 dto=0.698855 dtb=0.020769 loss/mean=6.967041 loss/max=7.055420 loss/min=6.875977 loss/std=0.035210 dt/mean=0.745013 dt/max=0.753496 dt/min=0.739475 dt/std=0.005213 dtd/mean=0.002956 dtd/max=0.004715 dtd/min=0.001561 dtd/std=0.000987 dtf/mean=0.022581 dtf/max=0.026247 dtf/min=0.021254 dtf/std=0.001179 dto/mean=0.698686 dto/max=0.706904 dto/min=0.692804 dto/std=0.004965 dtb/mean=0.020791 dtb/max=0.021900 dtb/min=0.020284 dtb/std=0.000342
[2025-12-31 12:16:40,195874][I][examples/vit:445:train_fn] iter=98 loss=6.967773 dt=0.732756 dtd=0.001660 dtf=0.025091 dto=0.685154 dtb=0.020851 loss/mean=6.942464 loss/max=7.022705 loss/min=6.878174 loss/std=0.035641 dt/mean=0.739319 dt/max=0.745367 dt/min=0.732756 dt/std=0.003418 dtd/mean=0.002662 dtd/max=0.003719 dtd/min=0.001660 dtd/std=0.000606 dtf/mean=0.022934 dtf/max=0.025091 dtf/min=0.021985 dtf/std=0.000785 dto/mean=0.692973 dto/max=0.697970 dto/min=0.685154 dto/std=0.002909 dtb/mean=0.020750 dtb/max=0.021213 dtb/min=0.020436 dtb/std=0.000199
[2025-12-31 12:16:40,950324][I][examples/vit:445:train_fn] iter=99 loss=6.977539 dt=0.745598 dtd=0.002238 dtf=0.023939 dto=0.698594 dtb=0.020827 loss/mean=6.956676 loss/max=7.008301 loss/min=6.899658 loss/std=0.022355 dt/mean=0.741473 dt/max=0.748397 dt/min=0.733044 dt/std=0.004455 dtd/mean=0.003068 dtd/max=0.005405 dtd/min=0.001547 dtd/std=0.001378 dtf/mean=0.023009 dtf/max=0.025263 dtf/min=0.021605 dtf/std=0.000825 dto/mean=0.694567 dto/max=0.701266 dto/min=0.687203 dto/std=0.004004 dtb/mean=0.020828 dtb/max=0.021858 dtb/min=0.020438 dtb/std=0.000346
[2025-12-31 12:16:41,771715][I][examples/vit:445:train_fn] iter=100 loss=7.000244 dt=0.760887 dtd=0.001625 dtf=0.026943 dto=0.711486 dtb=0.020833 loss/mean=6.960460 loss/max=7.000244 loss/min=6.919434 loss/std=0.022609 dt/mean=0.746579 dt/max=0.762490 dt/min=0.731703 dt/std=0.012070 dtd/mean=0.002105 dtd/max=0.003019 dtd/min=0.001625 dtd/std=0.000459 dtf/mean=0.022804 dtf/max=0.026943 dtf/min=0.021464 dtf/std=0.001315 dto/mean=0.700953 dto/max=0.715302 dto/min=0.686720 dto/std=0.011352 dtb/mean=0.020716 dtb/max=0.021132 dtb/min=0.020289 dtb/std=0.000214
/lus/tegu/projects/datascience/foremans/projects/saforem2/ezpz/src/ezpz/history.py:2223: UserWarning: Converting a tensor with requires_grad=True to a scalar may lead to unexpected behavior.
Consider using tensor.detach() first. (Triggered internally at /lus/tegu/projects/datasets/software/wheelforge/repositories/pytorch_2p8_rel_07_18_2025/pytorch/torch/csrc/autograd/generated/python_variable_methods.cpp:835.)
x = torch.Tensor(x).numpy(force=True)
[2025-12-31 12:16:42,102625][I][ezpz/history:2385:finalize] Saving plots to /lus/tegu/projects/datascience/foremans/projects/saforem2/ezpz/outputs/2025-12-31-121308/2025-12-31-121642/plots/mplot (matplotlib) and /lus/tegu/projects/datascience/foremans/projects/saforem2/ezpz/outputs/2025-12-31-121308/2025-12-31-121642/plots/tplot (tplot)
train_dt train_dt/min
βββββββββββββββββββββββββββββββββββ ββββββββββββββββββββββββββββββββββ
0.792β€ ββ β0.7469β€-- - -- β
0.777β€ ββ β β0.7304β€--------------------------------β
β ββββ ββ β ββ β β0.7138β€-- - - - --- ---------- β
0.762β€ ββββ βββ β ββ β β β ββ0.6973β€- β
0.747β€ββ ββββββββββββ ββ ββββββββββββββ ββ¬ββββββββ¬ββββββββ¬βββββββ¬ββββββββ¬β
0.732β€ββββββββββββββββββββββββββββββββββ 1.0 23.5 46.0 68.5 91.0
βββββ ββββ βββ βββ ββ βββββββββ βtrain_dt/min iter
0.717β€ββ β ββ ββ β β β train_dt/std
0.702β€β β ββββββββββββββββββββββββββββββββββ
ββ¬ββββββββ¬ββββββββ¬ββββββββ¬ββββββββ¬β0.0225β€* * * *** ******** β
1.0 23.5 46.0 68.5 91.0 0.0188β€** ***** **** *************** β
train_dt iter 0.0113β€***** ** ** * * ******** **** *β
train_dt/mean 0.0075β€***** **** *** ******** ***β
ββββββββββββββββββββββββββββββββββ0.0000β€** * * * β
0.7657β€ Β· Β· β ββ¬ββββββββ¬ββββββββ¬βββββββ¬ββββββββ¬β
0.7588β€ Β·Β· Β·Β· β 1.0 23.5 46.0 68.5 91.0
β Β·Β·Β· Β· Β·Β· Β· Β· Β· Β· βtrain_dt/std iter
0.7519β€ Β·Β·Β·Β·Β·Β·Β· Β·Β· Β·Β·Β· Β· Β·Β·Β·Β·Β·Β·Β·Β·Β· β train_dt/max
0.7450β€ Β·Β·Β·Β·Β·Β·Β·Β·Β· Β·Β·Β·Β· Β·Β·Β·Β·Β·Β·Β·Β·Β· Β·Β·Β·Β· Β·β βββββββββββββββββββββββββββββββββββ
βΒ·Β·Β·Β·Β· Β·Β·Β·Β· Β·Β·Β· Β·Β·Β·Β·Β·Β·Β·Β· Β·Β·Β·Β·Β·Β·β0.793β€ + ++ + ++ ++ β
0.7381β€Β·Β·Β·Β· Β·Β· Β·Β· Β·Β·Β·Β· Β·Β· β0.783β€++ + +++ +++ ++++ ++++ +++++ β
0.7312β€Β·Β·Β· β0.763β€+++++++++ + ++ + ++ ++++++ ++++ +β
βΒ·Β· β0.752β€+++ + ++++ +++ ++++++ ++ +++β
0.7243β€ Β· β0.732β€ ++ β
ββ¬ββββββββ¬ββββββββ¬βββββββ¬ββββββββ¬β ββ¬ββββββββ¬ββββββββ¬ββββββββ¬ββββββββ¬β
1.0 23.5 46.0 68.5 91.0 1.0 23.5 46.0 68.5 91.0
train_dt/mean iter train_dt/max iter
text saved in /lus/tegu/projects/datascience/foremans/projects/saforem2/ezpz/outputs/2025-12-31-121308/2025-12-31-121642/plots/tplot/train_dt.txt
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
0.793β€ ++ train_dt/max + β
β -- train_dt/min ++ + + + β
β Β·Β· train_dt/mean +++ + + + ++ ++ + ++ + + β
β ββ train_dt +++++ ++ ++ ++ + +++ ++++++ β
0.777β€ + ++ +β++ + ++++β +++ ++ + + ++ + +++ ++++++ + β
β ++ ++ +β++++ +++ββ ++ + + ++ ++ +β + +++ + ++++ ++ β
β +++ ++ +β+++β ++ ββ+ + + ++ +++ ββ + +++ + β+++++++ β
β++++ ++ +β+++β +β ββ+ + + ++ +++ ββ + +++ +ββ+++ + β
0.761β€+++ ++ + +ββ ++β +βΒ·ββ+ + + β ++ +++ ββ + ++β +ββ ++ +++ ββ
β+++ ++Β·Β· +ββ ++β +βΒ·ββ+++ + Β·β ++ + ++ββ + +ββ +ββ + + + +ββ
β+++ ++Β·Β· βββ +ββ + βββΒ·ββ+++ +Β· Β· β+ + + ++ββ + Β·ββ+ ββ β Β·Β·+ ++ββ
β+++Β·Β·+Β·Β· βββ Β·ββ β+++βββΒ·ββ +++ +β Β·Β· β+ + +Β·++ββ+βΒ·Β·ββ+Β·ββΒ·ββ Β·Β·Β·Β·+++β β
0.745β€++Β· Β·+Β·Β· βββΒ·Β·ββ+βΒ· βββ ββ Β·Β· ββ Β· ββββ+ Β· ββΒ·β+β Β·ββ+βββΒ·ββΒ· Β·β+βββΒ·β
ββ+β Β·Β· Β·+βββ ββΒ·βΒ·βΒ·βββ ββΒ·Β·βΒ·+ββ ββββ +Β·βββ βββ Β·βββββββΒ·Β·β βββΒ·βββ β
ββ+β Β·β Β·Β·βββ βββΒ·βββ βββ βββββΒ·Β·ββ βββββΒ·ββββ βββ βββββββ β βββ βββ β
ββΒ·β βββ β-ββ β-βββββββββ β ββββ ββ βββ-βββββββ βββ βββββ β β β--ββββ β
ββΒ·βββββββ ββββ ββ β β ββ-β -ββββ β β- ββββββ-- -β βββ- β -β β--ββ-β-β
0.729β€ββββββ--β -ββ β β β-ββ-β β ββ β-- -- ββββ-- -ββ-β - -β--βββ -ββ β
βββββ - -- β - -β ββ β-- -- ββββ-- -β--- ---β--βββ β β
βββββ - - β ββββ-- - βββ - -β--- -----β β
βββ- -β - ββ β - - - β
0.713β€ββ- β β β
βββ β
βββ β
β-β β
0.697β€ - β
ββ¬ββββββββββββββββββ¬ββββββββββββββββββ¬ββββββββββββββββββ¬ββββββββββββββββββ¬β
1.0 23.5 46.0 68.5 91.0
text saved in /lus/tegu/projects/datascience/foremans/projects/saforem2/ezpz/outputs/2025-12-31-121308/2025-12-31-121642/plots/tplot/train_dt_summary.txt
train_dt/mean hist train_dt/max hist
ββββββββββββββββββββββββββββββββββββ ββββββββββββββββββββββββββββββββββββ
22.0β€ ββββ β23.0β€ βββ β
18.3β€ ββββ β19.2β€ βββ β
β ββββββββ β β βββ β
14.7β€ ββββββββββββββ β15.3β€ βββ β
11.0β€ ββββββββββββββ β11.5β€ βββ ββββ ββββ β
β βββββββββββββββββ β β βββ ββββ ββββ β
7.3β€ βββββββββββββββββ β 7.7β€ ββββββββββββββββββββββββ β
3.7β€ ββββββββββββββββββββββββ β 3.8β€ ββββββββββββββββββββββββββββββββ
β ββββββββββββββββββββββββββββ β ββββββββββββββββββββββββββββββββ
0.0β€βββ ββββββββββββββββββββββββββββ 0.0β€βββββββββββββββββββββββββββββββββββ
ββ¬ββββββββ¬βββββββββ¬ββββββββ¬ββββββββ¬β ββ¬ββββββββ¬βββββββββ¬ββββββββ¬ββββββββ¬β
0.722 0.734 0.745 0.756 0.768 0.730 0.746 0.763 0.779 0.795
train_dt/min hist train_dt/std hist
ββββββββββββββββββββββββββββββββββββ ββββββββββββββββββββββββββββββββββββ
29.0β€ βββ β26.0β€ ββββ β
β βββ β β ββββ β
24.2β€ βββββββ β21.7β€ ββββ β
19.3β€ βββββββ β17.3β€ ββββ β
β βββββββ β β ββββ β
14.5β€ βββββββ β13.0β€ ββββ ββββββββ
β ββββ βββββββ β β ββββ ββββββββ
9.7β€ ββββββββββββββ β 8.7β€ ββββ ββββ ββββ βββββββββββ
4.8β€ βββββββββββββββββββββ β 4.3β€ ββββββββββββββββββββββββββββββββ
β βββββββββββββββββββββββββ ββββββββββββββββββββββββββββββββββββ
0.0β€βββ βββββββββββββββββββββββββ 0.0β€βββββββββββββββββββββββββββββββββββ
ββ¬ββββββββ¬βββββββββ¬ββββββββ¬ββββββββ¬β ββ¬ββββββββ¬βββββββββ¬ββββββββ¬βββββββββ
0.695 0.709 0.722 0.736 0.749 -0.0010 0.0051 0.0113 0.0174
text saved in /lus/tegu/projects/datascience/foremans/projects/saforem2/ezpz/outputs/2025-12-31-121308/2025-12-31-121642/plots/tplot/train_dt_hist.txt
train_dtb train_dtb/min
ββββββββββββββββββββββββββββββββ ββββββββββββββββββββββββββββββββ
0.021110β€ β β β0.020440β€ -- - -------β
0.021031β€ β βββ β β0.020336β€ - ------ ------------------β
β β ββββ ββ β0.020233β€----------------- -- -- β
0.020952β€ β β ββ β ββββ βββ β0.020129β€ - -- - - - β
0.020873β€ βββ βββ ββ βββββββββββββββ ββ ββ¬βββββββ¬ββββββββ¬βββββββ¬βββββββ¬β
0.020793β€β βββββββββββββββββββββββ βββββ 1.0 23.5 46.0 68.5 91.0
βββββββ ββ βββββ β ββ β βββ βtrain_dtb/min iter
0.020714β€βββ β β ββ ββ β β ββ β train_dtb/std
0.020635β€ β ββ β β ββββββββββββββββββββββββββββββββ
ββ¬βββββββ¬ββββββββ¬βββββββ¬βββββββ¬β0.000479β€ * * β
1.0 23.5 46.0 68.5 91.0 0.000431β€*** ** *** * * * β
train_dtb iter 0.000335β€*** * *** ******* * * ****β
train_dtb/mean 0.000288β€******************************β
ββββββββββββββββββββββββββββββββ0.000192β€ * ** * ** **β
0.020960β€ Β· β ββ¬βββββββ¬ββββββββ¬βββββββ¬βββββββ¬β
0.020900β€ Β· β 1.0 23.5 46.0 68.5 91.0
β Β·Β·Β· Β· βtrain_dtb/std iter
0.020841β€ Β·Β·Β·Β·Β· Β· Β·Β·β train_dtb/max
0.020781β€ Β· Β· Β· Β·Β·Β·Β·Β·Β·Β· Β· Β· Β·Β·Β·Β·β βββββββββββββββββββββββββββββββββ
βΒ·Β·Β· Β· Β·Β·Β·Β·Β· Β·Β·Β·Β·Β·Β·Β· Β·Β·Β·Β·Β·Β· Β·Β·β0.02236β€ + + β
0.020722β€Β·Β·Β· Β·Β·Β·Β·Β·Β·Β·Β·Β·Β·Β· Β·Β·Β· Β·Β·Β·Β·Β· Β·β0.02214β€ + + +++ ++ + ++ β
0.020662β€ Β· Β·Β·Β·Β· Β·Β·Β·Β·Β· Β· Β· Β· Β·Β·Β· β0.02170β€+++ + + ++ ++++ ++ ++ ++++β
β Β·Β·Β· Β· Β· β0.02148β€++++++++++++++++++ ++++++++++++β
0.020603β€ Β· β0.02104β€++ ++ + + +++ + +++ + +++ +β
ββ¬βββββββ¬ββββββββ¬βββββββ¬βββββββ¬β ββ¬ββββββββ¬βββββββ¬ββββββββ¬βββββββ¬β
1.0 23.5 46.0 68.5 91.0 1.0 23.5 46.0 68.5 91.0
train_dtb/mean iter train_dtb/max iter
text saved in /lus/tegu/projects/datascience/foremans/projects/saforem2/ezpz/outputs/2025-12-31-121308/2025-12-31-121642/plots/tplot/train_dtb.txt
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
0.02236β€ ++ train_dtb/max + β
β -- train_dtb/min ++ β
β Β·Β· train_dtb/mean ++ + β
β ββ train_dtb ++ + ++ + β
0.02199β€ ++++ + ++ + ++ β
β + + ++++++ ++ ++ ++ + β
β + ++ + + ++++++ ++ ++ ++ +++ β
β ++ ++ ++ + + ++++++ ++ ++ + ++ +++ β
0.02162β€ ++ ++ + ++ ++ + + ++ ++ ++ ++ +++ ++ β
β+++ ++ + + + ++ +++++ ++ ++ + ++ ++ +++ ++ β
β+++ ++ ++ ++ + + + + +++++ ++ ++ ++ ++ ++ +++ ++ β
β+++ ++ ++ +++ + + ++ + +++++++ ++ + +++ + +++ + ++ + ++ β
0.02124β€++++ + ++ + +++ + +++ + +++ + + ++ +++ + +++ + ++ ++ β
β++++ + +++++ + + + + + + + + + ++ ++ ++ + ++ β
β + + ++ + + β + ++ β + ++ +β
β β β ββ β + βββ β
β β β ββ β β Β· ββ βββ ββ β ββ β β β
0.02087β€ ββββββ ββββββ ββββββ β ββββββββββββββΒ·βββββ β ββββββββ ββββββββββ
βΒ·ββββββ ββββββββΒ·ββββΒ·ββΒ· Β·ββββ βββ βββΒ·Β· ββ Β·Β·Β·βΒ·βΒ· Β·ββΒ·Β·βΒ·Β· ββββββ Β·β
ββΒ·βΒ· Β· Β·Β·Β·Β·β Β·Β· βΒ· Β·Β· Β·Β·βΒ·ββ β Β· Β· Β· β Β·Β· Β· β β
β Β· β
0.02050β€ β
β - - - ----- - -- -- --- - ---- β
β- - ---------- - ------ - ---- --- - -- - - ---- --- -- -- - -β
β -- -- - - - --- - - ---- - -- - -- ---- - β
0.02013β€ -- - - - - - β
ββ¬ββββββββββββββββββ¬βββββββββββββββββ¬ββββββββββββββββββ¬βββββββββββββββββ¬β
1.0 23.5 46.0 68.5 91.0
text saved in /lus/tegu/projects/datascience/foremans/projects/saforem2/ezpz/outputs/2025-12-31-121308/2025-12-31-121642/plots/tplot/train_dtb_summary.txt
train_dtb/mean hist train_dtb/max hist
ββββββββββββββββββββββββββββββββββββ ββββββββββββββββββββββββββββββββββββββ
23.0β€ βββ β24β€ βββ β
19.2β€ βββ ββββ β20β€ βββββββ β
β βββ ββββ β β βββββββ β
15.3β€ ββββββββββ β16β€βββββββββββ β
11.5β€ ββββββββββββββ β12β€βββββββββββ β
β ββββββββββββββ β ββββββββββββ β
7.7β€ ββββββββββββββββββ β 8β€ββββββββββββββββββ β
3.8β€ ββββββββββββββββββββββββ β 4β€ββββββββββββββββββ ββββ β
β ββββββββββββββββββββββββ β βββββββββββββββββββββββββββββββββ β
0.0β€βββββββββββββββββββββββββββββββββββ 0β€βββββββββββββββββββββββββββββββββββββ
ββ¬ββββββββ¬βββββββββ¬ββββββββ¬βββββββββ ββ¬βββββββββ¬βββββββββ¬ββββββββ¬ββββββββββ
0.02059 0.02068 0.02078 0.02088 0.02098 0.02134 0.02170 0.02206
train_dtb/min hist train_dtb/std hist
ββββββββββββββββββββββββββββββββββββ ββββββββββββββββββββββββββββββββββββ
21.0β€ ββββ β21.0β€ ββββ β
β ββββ β β βββββββ β
17.5β€ ββββ β17.5β€ βββββββββββ β
14.0β€ ββββ ββββ β14.0β€ βββββββββββ β
β ββββ ββββ β β βββββββββββ β
10.5β€ ββββ ββββ β10.5β€ βββββββββββ β
β ββββββββββββββββββββ βββββ ββββββββββββββββββββββ β
7.0β€ ββββββββββββββββββββ βββββ 7.0β€βββββββββββββββββββββ β
3.5β€βββββββββββββββββββββββββββ βββββ 3.5β€ββββββββββββββββββββββββ β
ββββββββββββββββββββββββββββββββββββ βββββββββββββββββββββββββ ββββββββ
0.0β€βββββββββββββββββββββββββββββββββββ 0.0β€βββββββββββββββββββββββββββββββββββ
ββ¬ββββββββ¬βββββββββ¬ββββββββ¬βββββββββ ββ¬ββββββββ¬βββββββββ¬ββββββββ¬βββββββββ
0.02011 0.02020 0.02028 0.02037 0.000179 0.000257 0.000335 0.000413
text saved in /lus/tegu/projects/datascience/foremans/projects/saforem2/ezpz/outputs/2025-12-31-121308/2025-12-31-121642/plots/tplot/train_dtb_hist.txt
train_dtd train_dtd/min
ββββββββββββββββββββββββββββββββββ βββββββββββββββββββββββββββββββββ
0.0122β€ β β0.00312β€- β
0.0104β€ β β0.00259β€- β
β β β0.00206β€-- β
0.0086β€ β β0.00153β€-------------------------------β
0.0069β€ β β ββ¬ββββββββ¬βββββββ¬ββββββββ¬βββββββ¬β
0.0051β€ β β β 1.0 23.5 46.0 68.5 91.0
β β β β β β βtrain_dtd/min iter
0.0034β€β β β β β β β β train_dtd/std
0.0016β€βββββββββββββββββββββββββββββββββ βββββββββββββββββββββββββββββββββ
ββ¬ββββββββ¬ββββββββ¬βββββββ¬ββββββββ¬β0.00207β€ * β
1.0 23.5 46.0 68.5 91.0 0.00174β€ * **** *** * * *** ***β
train_dtd iter 0.00110β€********** **** ***************β
train_dtd/mean 0.00077β€******** **** **** ********* **β
βββββββββββββββββββββββββββββββββ0.00013β€** * ** *** **** ** ***** β
0.00410β€Β·Β· β ββ¬ββββββββ¬βββββββ¬ββββββββ¬βββββββ¬β
0.00371β€Β·Β· β 1.0 23.5 46.0 68.5 91.0
βΒ·Β· Β· Β· βtrain_dtd/std iter
0.00332β€Β·Β·Β·Β·Β· Β·Β·Β· Β·Β· Β·Β· Β· Β· Β· β train_dtd/max
0.00293β€Β·Β·Β·Β·Β·Β·Β·Β·Β·Β· Β·Β· Β·Β·Β· Β·Β·Β·Β· Β·Β·Β·β ββββββββββββββββββββββββββββββββββ
βΒ·Β·Β·Β·Β·Β·Β·Β· Β· Β·Β·Β· Β·Β·Β·Β·Β·Β·Β·Β· Β·Β·Β·Β·Β·β0.0122β€ + β
0.00254β€Β·Β·Β·Β·Β·Β·Β·Β· Β· Β·Β· Β· Β·Β·Β·Β·Β·Β·Β·Β·Β·Β·Β·Β· Β·Β·β0.0105β€ + β
0.00215β€Β·Β·Β·Β·Β·Β·Β· Β·Β·Β·Β· Β·Β·Β·Β·Β·Β·Β·Β·Β·Β·Β·Β·Β·Β· Β·β0.0071β€ + β
βΒ·Β·Β·Β·Β·Β·Β· Β·Β·Β· Β·Β·Β·Β· Β·Β·Β·Β·Β·Β·Β·Β·Β· Β·β0.0054β€+++++++++++++++ ++++++++++++++++β
0.00176β€Β·Β· Β· Β·Β· Β·Β·Β· Β·Β·Β·Β· Β·Β· Β·Β·Β·Β·Β· β0.0020β€++++++++ +++ +++++++++++++++ +β
ββ¬ββββββββ¬βββββββ¬ββββββββ¬βββββββ¬β ββ¬ββββββββ¬ββββββββ¬βββββββ¬ββββββββ¬β
1.0 23.5 46.0 68.5 91.0 1.0 23.5 46.0 68.5 91.0
train_dtd/mean iter train_dtd/max iter
text saved in /lus/tegu/projects/datascience/foremans/projects/saforem2/ezpz/outputs/2025-12-31-121308/2025-12-31-121642/plots/tplot/train_dtd.txt
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
0.0122β€ ++ train_dtd/max ββ β
β -- train_dtd/min ββ β
β Β·Β· train_dtd/mean ββ β
β ββ train_dtd ββ β
0.0104β€ ββ β
β ββ β
β ββ β
β ββ β
0.0086β€ ββ β
β ββ β
β ββ β
β ββ β
0.0068β€ ββ β
β ββ β
β ββ β
β ββ β
β+ + + + + + +++ ββ ++ ++ + +++ + ++ + + ++ + + β
0.0051β€+++ ++ ++ ββ+ +++ + ββ++++ + +++ + ++++ + ++++ + + ++++ β
β+++ +++ ++ ββ+ + ++ + ββ++ + + +++ ++++++ + + + + β++ + ++ β
βΒ·+Β·+ ++ ++ ββ+ + + + ββ++ + +++ ++++++ + + +β+ +β++ β+ ++ β
βΒ·+Β·+ +++ Β· ββ+ + + + ββ++Β· + + +++ ++++++ + + +β+ ++β+++β ++ β
0.0033β€β+Β·+ Β·++Β·Β·+ββΒ·++Β·Β·Β·Β·Β·Β·+++ββ+Β·Β· Β·Β· +++ +Β·+Β·Β·Β·+++Β·+Β·Β· + +β+ ++β+++βΒ·Β· Β· Β·+β
ββΒ·Β·+Β·Β·++Β·Β·+ββΒ·+Β· Β· Β·+++ββΒ·Β· Β· Β·++++ Β·Β·Β· Β·+βΒ·Β·Β· Β· +Β·Β·ββ +ββ+Β·ββΒ· Β· Β·Β· β
ββΒ·Β·+Β·Β·Β·+Β·Β· ββΒ·Β·Β· Β· Β·+Β·Β·ββΒ·Β· Β·+++Β· Β·Β·Β· Β·Β·βΒ·Β·Β· Β· Β· Β·ββ++ββΒ·Β·ββΒ· Β· β
ββΒ·-Β· Β·Β· Β·Β·ββΒ· Β· Β· Β·ββΒ·Β· Β·Β·Β·βΒ·Β· Β· ββΒ· Β· Β·Β· Β·ββΒ·Β·ββΒ·Β·ββΒ· βΒ·β
0.0015β€ βββββββββββββββββββββββββ-ββββββββββββββββββ-ββββββββββββββ-ββββββββββββ
ββ¬ββββββββββββββββββ¬ββββββββββββββββββ¬βββββββββββββββββ¬ββββββββββββββββββ¬β
1.0 23.5 46.0 68.5 91.0
text saved in /lus/tegu/projects/datascience/foremans/projects/saforem2/ezpz/outputs/2025-12-31-121308/2025-12-31-121642/plots/tplot/train_dtd_summary.txt
train_dtd/mean hist train_dtd/max hist
ββββββββββββββββββββββββββββββββββββ ββββββββββββββββββββββββββββββββββββ
27.0β€ββββ β31.0β€ ββββ β
22.5β€ββββ β25.8β€ββββ ββββ β
βββββ β βββββ ββββ β
18.0β€ββββ β20.7β€ββββ ββββ β
13.5β€ββββ β15.5β€ββββ βββββββ β
ββββββββ ββββ β βββββββββββββββ β
9.0β€βββββββ ββββββββββββββ β10.3β€ββββββββββββββ β
4.5β€ββββββββββββββββββββββββ β 5.2β€ββββββββββββββ β
ββββββββββββββββββββββββββββ βββββ βββββββββββββββ β
0.0β€βββββββββββββββββββββββββββββββββββ 0.0β€βββββββββββββ βββββ
ββ¬ββββββββ¬βββββββββ¬ββββββββ¬βββββββββ ββ¬ββββββββ¬βββββββββ¬ββββββββ¬βββββββββ
0.00166 0.00230 0.00293 0.00357 0.0016 0.0043 0.0071 0.0098
train_dtd/min hist train_dtd/std hist
ββββββββββββββββββββββββββββββββββββ ββββββββββββββββββββββββββββββββββββ
88.0β€ββββ β22.0β€ββββ ββββ β
βββββ β βββββ ββββ β
73.3β€ββββ β18.3β€ββββ ββββ β
58.7β€ββββ β14.7β€ββββ ββββ β
βββββ β βββββ βββββββ β
44.0β€ββββ β11.0β€ββββ βββ βββββββ β
βββββ β βββββββββββ βββββββββββ β
29.3β€ββββ β 7.3β€ββββββββββ βββββββββββ β
14.7β€ββββ β 3.7β€ββββββββββββββββββββββββ β
βββββ β βββββββββββββββββββββββββ β
0.0β€βββββββ βββββ 0.0β€ββββββββββββββββββββββββ βββββ
ββ¬ββββββββ¬βββββββββ¬ββββββββ¬βββββββββ ββ¬ββββββββ¬βββββββββ¬ββββββββ¬βββββββββ
0.00146 0.00189 0.00233 0.00276 0.00004 0.00057 0.00110 0.00163
text saved in /lus/tegu/projects/datascience/foremans/projects/saforem2/ezpz/outputs/2025-12-31-121308/2025-12-31-121642/plots/tplot/train_dtd_hist.txt
train_dtf train_dtf/min
βββββββββββββββββββββββββββββββββ βββββββββββββββββββββββββββββββββ
0.02730β€ β β ββ0.02219β€ -- - - - --- β
0.02635β€ β β ββββ0.02173β€ -----------------------------β
β ββ β ββββ0.02128β€-------- ---- --- -------------β
0.02540β€ βββ ββ β β β β β ββββ0.02083β€- - - β
0.02445β€ βββ βββββ β β βββββββ ββββ ββ¬ββββββββ¬βββββββ¬ββββββββ¬βββββββ¬β
0.02350β€ ββββ βββββββ βββββββββββ ββββ 1.0 23.5 46.0 68.5 91.0
βββββββββββββββββββββββββ βββββ βtrain_dtf/min iter
0.02254β€ββββββββββ β βββββββββββ βββ β train_dtf/std
0.02159β€β β β βββββββββββββββββββββββββββββββββ
ββ¬ββββββββ¬βββββββ¬ββββββββ¬βββββββ¬β0.00132β€ ** * * * * **β
1.0 23.5 46.0 68.5 91.0 0.00113β€ *** **** ** ********** ***β
train_dtf iter 0.00075β€ ******************************β
train_dtf/mean 0.00056β€***** *** ****** *** * * * * β
βββββββββββββββββββββββββββββββββ0.00019β€* * β
0.02330β€ Β· β ββ¬ββββββββ¬βββββββ¬ββββββββ¬βββββββ¬β
0.02294β€Β·Β· Β· Β· Β·Β· Β· Β· Β·β 1.0 23.5 46.0 68.5 91.0
βΒ·Β·Β·Β· Β· Β·Β·Β·Β·Β·Β· Β·Β·Β·Β·Β· Β·Β·Β·Β·Β·Β·Β·Β·βtrain_dtf/std iter
0.02258β€Β·Β·Β·Β·Β·Β·Β·Β·Β·Β·Β·Β·Β·Β·Β·Β·Β·Β·Β·Β·Β·Β·Β·Β·Β·Β·Β·Β·Β·Β· β train_dtf/max
0.02222β€Β·Β·Β·Β·Β· Β· Β· Β·Β·Β· Β·Β· Β·Β·Β·Β· Β· β βββββββββββββββββββββββββββββββββ
βΒ·Β·Β· Β· β0.02730β€ + + +β
0.02186β€Β· β0.02635β€ ++ + + ++ + +++++β
0.02150β€Β· β0.02445β€ ++++ ++++++++++++++++++++++++ β
βΒ· β0.02350β€++++ + + +++ + + + + + β
0.02114β€Β· β0.02159β€+ β
ββ¬ββββββββ¬βββββββ¬ββββββββ¬βββββββ¬β ββ¬ββββββββ¬βββββββ¬ββββββββ¬βββββββ¬β
1.0 23.5 46.0 68.5 91.0 1.0 23.5 46.0 68.5 91.0
train_dtf/mean iter train_dtf/max iter
text saved in /lus/tegu/projects/datascience/foremans/projects/saforem2/ezpz/outputs/2025-12-31-121308/2025-12-31-121642/plots/tplot/train_dtf.txt
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
0.0273β€ ++ train_dtf/max β
β -- train_dtf/min ββ ββ
β Β·Β· train_dtf/mean ββ +ββ
β ββ train_dtf ββ +ββ
0.0262β€ β+ β ββ ββ+ββ
β β+ β ββ ββ+ββ
β βββ+ β ββ ββ+ββ
β ββββ+ β + ββ βββββ
0.0251β€ ββββ+ β + + β ++ + ββ βββ β
β βββββ+ ββββ ++ β+ +β β +β+ +β ++ ββ β+β β
β + +βββββ+ β β + β + +ββ+ ++β + β ββ + ββ ++ ββ β β β
β ++++ ++βββββ+ + β β+ ββ++ + ++ββ++ +ββ+ β +β ββ + ββ+++ ββ β β β
0.0241β€ ++ + ++βββββ β+ ββ+β β+ ββ++ β +++ββ++ +ββ+ β+βββ+ββ + ββ+ββ++ββ+β β β
β ++ + +ββββββββ++ββ+β β+ ββ + β++++ββ βββββ+β+β+β+ββ ++ββ+ββ ββ β β β
β ++ + +β ββββββ +ββ β β+ ββ +β+ ++ββββββββββ+ββ+β β βββββ+ββ ββ β β
β +Β· + ++β ββββββ ββ β β+ ββ ββ ββββββββββ ββ+β β ββββββββ ββ β β
β ββ + ββ ββββββ ββββ β+ βββββββ ββ βββββββββ ββ+β β ββ β ββββ β
0.0230β€+ββ ββ ββ ββββββ ββββΒ· Β·ββββββββββ ββ ββ ββββ ββ βββΒ·βΒ· Β· ββΒ·ββββΒ·Β·Β·β
β+ββΒ·ββ ββΒ·ββββββΒ·ββββΒ·Β·Β·Β·βββΒ·Β·βΒ·ββΒ·βΒ· ββΒ·ββββΒ· Β·ββΒ·βββΒ·βΒ· Β·Β·Β·Β·Β·ββββββΒ· β
β+ββΒ·ββββ Β·β ββΒ·βΒ·βΒ· Β·Β· Β·βββ Β·ββ ββ Β·Β·Β·βΒ· ββΒ· Β·Β· Β· βββ β β
ββΒ·Β·β Β·Β· β - - β Β·ββ β β β
0.0219β€β - -- --- - -- - - - -- -- - -- -- β
ββ ----- - - - - --- --- ---- --------- -- - ---- -- --- - ---- β
βΒ·-- - - -- --- - - --- - - - - -- --- - -- - -- -- - -β
βΒ· - - - β
0.0208β€- β
ββ¬ββββββββββββββββββ¬ββββββββββββββββββ¬βββββββββββββββββ¬ββββββββββββββββββ¬β
1.0 23.5 46.0 68.5 91.0
text saved in /lus/tegu/projects/datascience/foremans/projects/saforem2/ezpz/outputs/2025-12-31-121308/2025-12-31-121642/plots/tplot/train_dtf_summary.txt
train_dtf/mean hist train_dtf/max hist
ββββββββββββββββββββββββββββββββββββ ββββββββββββββββββββββββββββββββββββββ
37.0β€ ββββ β24β€ ββββββββ β
30.8β€ ββββ β20β€ ββββββββ β
β ββββ β β βββββββββββ β
24.7β€ ββββ β16β€ βββββββββββ β
18.5β€ βββββββ β12β€ ββββββββββββββ β
β ββββββββββ β β ββββββββββββββ β
12.3β€ ββββββββββββββ β 8β€ ββββββββββββββ β
6.2β€ ββββββββββββββ β 4β€ ββββββββββββββββββ β
β ββββββββββββββββββ β β ββββββββββββββββββ ββββββββ
0.0β€βββ βββββββββββββββββββββββββ 0β€ββββ ββββββββββββββββββββββββββββββ
ββ¬ββββββββ¬βββββββββ¬ββββββββ¬βββββββββ ββ¬βββββββββ¬βββββββββ¬ββββββββ¬βββββββββ¬β
0.02104 0.02163 0.02222 0.02281 0.0213 0.0229 0.0244 0.0260 0.0276
train_dtf/min hist train_dtf/std hist
ββββββββββββββββββββββββββββββββββββ ββββββββββββββββββββββββββββββββββββ
23.0β€ ββββ β19.0β€ ββββ β
β ββββ β β βββββββ β
19.2β€ ββββ β15.8β€ βββββββββββ β
15.3β€ βββββββββββ β12.7β€ βββββββββββ β
β ββββββββββββββ β β βββββββββββ β
11.5β€ ββββββββββββββ β 9.5β€ ββββββββββββββ βββ β
β ββββββββββββββ β β ββββββββββββββ βββ β
7.7β€ ββββββββββββββ ββββ β 6.3β€ ββββββββββββββββββββββββββββ
3.8β€ βββββββββββββββββββββ β 3.2β€ ββββββββββββββββββββββββββββ
β ββββββββββββββββββββββββββββββββ ββββββββββββββββββββββββββββββββββββ
0.0β€βββββββββββββββββββββββββββββββββββ 0.0β€βββββββββββββββββββββββββββββββββββ
ββ¬ββββββββ¬βββββββββ¬ββββββββ¬βββββββββ ββ¬ββββββββ¬βββββββββ¬ββββββββ¬βββββββββ
0.02077 0.02114 0.02151 0.02188 0.00014 0.00044 0.00075 0.00106
text saved in /lus/tegu/projects/datascience/foremans/projects/saforem2/ezpz/outputs/2025-12-31-121308/2025-12-31-121642/plots/tplot/train_dtf_hist.txt
train_dto train_dto/min
βββββββββββββββββββββββββββββββββββ ββββββββββββββββββββββββββββββββββ
0.743β€ ββ β0.6998β€-- - - -- β
0.729β€ ββ β β β0.6840β€--------------------------------β
β ββββ ββ β ββ β β0.6683β€-- - - - --- --- ------ β
0.715β€ ββββ ββ β ββ β β β ββ0.6526β€- β
0.700β€ββ ββββββββββββ ββ ββββββββββββββ ββ¬ββββββββ¬ββββββββ¬βββββββ¬ββββββββ¬β
0.686β€ββββββββββββββββββββββββββββββββββ 1.0 23.5 46.0 68.5 91.0
βββββ ββββ βββ βββ ββ ββββββββ βtrain_dto/min iter
0.672β€β β ββ ββ β β β train_dto/std
0.657β€β β ββββββββββββββββββββββββββββββββββ
ββ¬ββββββββ¬ββββββββ¬ββββββββ¬ββββββββ¬β0.0228β€* * ** ******** β
1.0 23.5 46.0 68.5 91.0 0.0192β€** ***** **** *************** β
train_dto iter 0.0119β€******** ** * * ******** **** *β
train_dto/mean 0.0082β€***** **** *** ******** ****β
ββββββββββββββββββββββββββββββββββ0.0010β€*** * **** ** ****** * ** β
0.7200β€ Β· Β· β ββ¬ββββββββ¬ββββββββ¬βββββββ¬ββββββββ¬β
0.7128β€ Β·Β· Β·Β· β 1.0 23.5 46.0 68.5 91.0
β Β·Β·Β· Β· Β·Β· Β· Β· Β· Β· Β·Β· βtrain_dto/std iter
0.7056β€ Β·Β·Β·Β·Β·Β·Β· Β·Β·Β· Β·Β·Β·Β· Β· Β·Β·Β·Β·Β·Β·Β·Β·Β· β train_dto/max
0.6985β€Β·Β·Β·Β·Β·Β·Β·Β·Β·Β· Β·Β·Β·Β· Β·Β·Β·Β·Β·Β·Β·Β·Β· Β·Β·Β·Β·Β·Β·β βββββββββββββββββββββββββββββββββββ
βΒ·Β·Β·Β·Β· Β·Β·Β·Β· Β·Β·Β· Β·Β·Β·Β·Β·Β·Β·Β· Β·Β·Β·Β·Β·Β·β0.747β€ + +++ + ++ ++ β
0.6913β€Β·Β·Β·Β· Β·Β· Β·Β· Β·Β·Β·Β·Β· Β·Β· β0.737β€++ + +++ +++ ++++ ++++ +++ + β
0.6842β€Β·Β· β0.716β€+++++++++ ++++ + ++ ++++++ ++++ +β
βΒ·Β· β0.706β€+++ + ++++ +++ ++++++ ++ +++β
0.6770β€ Β· β0.685β€ + β
ββ¬ββββββββ¬ββββββββ¬βββββββ¬ββββββββ¬β ββ¬ββββββββ¬ββββββββ¬ββββββββ¬ββββββββ¬β
1.0 23.5 46.0 68.5 91.0 1.0 23.5 46.0 68.5 91.0
train_dto/mean iter train_dto/max iter
text saved in /lus/tegu/projects/datascience/foremans/projects/saforem2/ezpz/outputs/2025-12-31-121308/2025-12-31-121642/plots/tplot/train_dto.txt
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
0.747β€ ++ train_dto/max + β
β -- train_dto/min ++ + + β
β Β·Β· train_dto/mean +++ + + + ++ ++ ++ + + β
β ββ train_dto +++++ ++ ++ ++ + + + ++++++ β
0.731β€ + ++ +β++ + +++ββ +++ ++ + ++ + +++ ++++++ + β
β ++ ++ +β++++ +++ββ ++++ + + ++ +β + +++ + ++++ ++ β
β +++ ++ +β+++β ++ ββ+ + + + ++ +++ ββ + +++ + β+++++++ β
β++++ ++ +β+++β + ββ+ + + ++ +++ ββ + +++ +ββ+++ + β
0.716β€+++ ++ + + β+++β +βΒ·ββ+ + + β ++ +++ ββ + +++ +ββ ++ +++ +β
β+++ ++Β·Β· +ββ ++β +βΒ·ββ+++ + Β·β ++ + ++ββ + +ββ +ββ + + + +ββ
β+++ ++Β·Β· +ββ +ββ + ββΒ·ββ+++ +Β· Β· β+ + + ++ββ +Β·Β·ββ+ ββ β Β· + +ββ
β+++Β·Β·+Β·Β· βββ Β·ββ ++ βββΒ·ββ +++ +Β· Β·Β· β+ + +Β·++ββ+Β· Β·ββ+Β·ββΒ·ββ Β·Β·Β·Β·+ +βββ
0.700β€++Β· Β·+Β·Β· βββΒ·Β·ββ+β+++βββ ββ Β·+ ββ Β· ββββ+ Β·Β·Β·+ββ+β Β·ββ+βββΒ·ββΒ· Β·β+++βΒ·β
ββ+β Β·Β· Β·+βββΒ·Β·ββ+βΒ· Β·βββ ββΒ·Β·Β·Β·+ββ ββββ++Β·βββ βββ Β·ββββββ βββ βββΒ·βββ β
ββ+β Β·β Β·Β·βββ βββΒ·βΒ·β βββ ββ ββΒ· ββ βββββ+Β·βββ βββ βββββββ β βββΒ·βββ β
ββΒ·βββββ β-ββ β-βββββββββ ββ ββ-Β·ββ βββ βββββββ βββ βββββββ β βββββββ β
ββΒ·βββββββ ββ β ββ β ββββ-β -ββββ β β- ββββββ - -β βββ-β β β β-βββββ-β
0.684β€ββββββ -β βββ β β β-ββ-β ββββ β-- -- ββββ-- -ββ-ββ- -β -β β -ββ β
βββββ - -ββ β -β ββ β-- -- ββββ-- -β--β ---β--βββ β β
βββββ -- - β ββββ-- - βββ - -β--- -----β β
βββ- ββ - ββ β - - - β
0.668β€ββ β β β
βββ β
βββ β
βββ β
0.653β€ - β
ββ¬ββββββββββββββββββ¬ββββββββββββββββββ¬ββββββββββββββββββ¬ββββββββββββββββββ¬β
1.0 23.5 46.0 68.5 91.0
text saved in /lus/tegu/projects/datascience/foremans/projects/saforem2/ezpz/outputs/2025-12-31-121308/2025-12-31-121642/plots/tplot/train_dto_summary.txt
train_dto/mean hist train_dto/max hist
ββββββββββββββββββββββββββββββββββββ ββββββββββββββββββββββββββββββββββββ
20.0β€ ββββ β25.0β€ βββ β
16.7β€ ββββ ββββ β20.8β€ βββ β
β ββββ βββββββ β β βββ β
13.3β€ ββββββββββββββ β16.7β€ βββ β
10.0β€ βββββββββββββββββ β12.5β€ βββ ββββ β
β βββββββββββββββββ β β βββ ββββ ββββ ββββ β
6.7β€ βββββββββββββββββββββ β 8.3β€ ββββββββββββββββββββββββ β
3.3β€ βββββββββββββββββββββ β 4.2β€ ββββββββββββββββββββββββββββ
β βββββββββββββββββββββββββ β ββββββββββββββββββββββββββββββββ
0.0β€βββ ββββββββββββββββββββββββββββ 0.0β€βββββββββββββββββββββββββββββββββββ
ββ¬ββββββββ¬βββββββββ¬ββββββββ¬ββββββββ¬β ββ¬ββββββββ¬βββββββββ¬ββββββββ¬ββββββββ¬β
0.675 0.687 0.698 0.710 0.722 0.682 0.699 0.716 0.733 0.750
train_dto/min hist train_dto/std hist
ββββββββββββββββββββββββββββββββββββ ββββββββββββββββββββββββββββββββββββ
28.0β€ βββ β20.0β€ ββββ β
β βββ β β ββββ β
23.3β€ βββ β16.7β€ ββββ β
18.7β€ βββββββ β13.3β€ ββββ β
β βββββββ β β ββββ ββββ β
14.0β€ ββββ βββββββ β10.0β€βββββββ ββββ ββββββββ
β ββββββββββββββββββ β βββββββββββ ββββββββββββββββββ
9.3β€ ββββββββββββββββββ β 6.7β€ββββββββββ ββββββββββββββββββ
4.7β€ ββββββββββββββββββ β 3.3β€βββββββββββββββββββββββββββββββββββ
β βββββββββββββββββββββββββ ββββββββββββββββββββββββββββββββββββ
0.0β€βββ βββββββββββββββββββββββββ 0.0β€βββββββββββββββββββββββββββββββββββ
ββ¬ββββββββ¬βββββββββ¬ββββββββ¬ββββββββ¬β ββ¬ββββββββ¬βββββββββ¬ββββββββ¬βββββββββ
0.650 0.663 0.676 0.689 0.702 -0.0000 0.0059 0.0119 0.0178
text saved in /lus/tegu/projects/datascience/foremans/projects/saforem2/ezpz/outputs/2025-12-31-121308/2025-12-31-121642/plots/tplot/train_dto_hist.txt
train_loss train_loss/min
βββββββββββββββββββββββββββββββββββ βββββββββββββββββββββββββββββββββββ
7.115β€ββ β6.963β€ -- - --- β
7.078β€ββ β6.920β€-- --------------------- ---- - -β
ββββ β β6.876β€ - -- -- ---- -------- β
7.042β€ββββ ββ βββ ββ ββ β6.833β€ - - β
7.006β€ββββββ βββββββββββ βββββ ββββ ββ ββ¬ββββββββ¬ββββββββ¬ββββββββ¬ββββββββ¬β
6.969β€β βββββββββββββββββββββββββββββββ 1.0 23.5 46.0 68.5 91.0
β ββ β β ββββββ ββββββββββ βββ βtrain_loss/min iter
6.933β€ β ββ ββ ββββ βββ β train_loss/std
6.897β€ β β β ββββββββββββββββββββββββββββββββββ
ββ¬ββββββββ¬ββββββββ¬ββββββββ¬ββββββββ¬β0.0517β€* * β
1.0 23.5 46.0 68.5 91.0 0.0465β€**** * * ** β
train_loss iter 0.0362β€ ****************** * * ** * * β
train_loss/mean 0.0310β€ * **** * **************** β
βββββββββββββββββββββββββββββββββββ0.0207β€ * * * * *β
7.039β€Β·Β·Β· β ββ¬ββββββββ¬ββββββββ¬βββββββ¬ββββββββ¬β
7.023β€Β·Β·Β· β 1.0 23.5 46.0 68.5 91.0
βΒ·Β·Β· Β· βtrain_loss/std iter
7.007β€Β·Β· Β·Β·Β· Β· β train_loss/max
6.991β€ Β·Β·Β·Β·Β·Β·Β· Β· β βββββββββββββββββββββββββββββββββββ
β Β·Β·Β·Β·Β·Β·Β·Β· Β·Β·Β·Β· β7.148β€+ β
6.975β€ Β·Β· Β·Β·Β·Β·Β·Β·Β·Β·Β·Β· Β· Β· Β· β7.124β€++++++ β
6.959β€ Β· Β·Β· Β·Β·Β· Β·Β·Β·Β·Β·Β·Β·Β·Β·Β·β7.074β€++ ++++++++++ + + + β
β Β· Β·Β· Β·Β·Β·β7.050β€ + + ++++++++++++++++++++ ++ β
6.942β€ Β· β7.000β€ + +++ + ++++ +β
ββ¬ββββββββ¬ββββββββ¬ββββββββ¬ββββββββ¬β ββ¬ββββββββ¬ββββββββ¬ββββββββ¬ββββββββ¬β
1.0 23.5 46.0 68.5 91.0 1.0 23.5 46.0 68.5 91.0
train_loss/mean iter train_loss/max iter
text saved in /lus/tegu/projects/datascience/foremans/projects/saforem2/ezpz/outputs/2025-12-31-121308/2025-12-31-121642/plots/tplot/train_loss.txt
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
7.148β€ ++ train_loss/max β
β -- train_loss/min β
β Β·Β· train_loss/mean β
β ββ train_loss β
7.096β€β β + +++ β
ββ+β + + ++ + β
ββ β β ++ + + + + + ++ + β
ββ β ββ ++ ++ ++ + + + + ++ +++ ++ ++ + β
7.043β€βββ ββ + +β + + +++ + +++++ ++ + +++ + + ++ β
ββββΒ·βΒ·ββ ββ +ββββ ++ + + ++++++ ++ + +β++ ++++ + ++ β
βΒ·ββ β ββ β β ββββ + ββ β++ ++ +β+ββ++ +++++++ + +++++ + β
β βΒ·ββ Β·β βββ β β ββββββ βββββββ β+ β βββ++ + β + + + β
6.991β€ β βΒ·β β Β·Β·Β·βΒ·β Β·βββββββΒ· ββββββββββ β βββββ β + ββ ββ β ββ
β β ββ ββΒ·ββΒ·βΒ·β ββββΒ·βΒ· βββββββββββ β βββΒ·ββΒ·βββ β β ββ βββ β βββ
β ββ ββββ ββ βΒ·ββΒ·β ββββββ ββΒ·Β·βΒ·βΒ·βββ ββΒ·βββββββΒ·ββ Β·ββββ ββββ β
β -ββ β -β ββ ββββββββ ββ β βΒ·βββ ββ ββββββββ βΒ·Β·Β·ββββββΒ·Β·Β·β
β ---- β- --β -- ββ ββββββ ββ ββββ β ββββββββ β β βΒ· β
6.938β€ -- -- -- -- ---β--- - - ββββ -- ββ ββ -β- ββ β - β β β
β--- --- -------β -- -- ββ-β - -- -β - - - ββ -- β β -β
β -- ----- ---β - - --- -- - --β--- -- -- --- --- -- - β
β - - --- -- -- - -- β - -- - -- -- - -- -- - - β
6.885β€ -- -- -- -- -- -- --- -- β
β -- -- -- - - -- - - β
β -- - - -- β
β -- - β
6.833β€ - β
ββ¬ββββββββββββββββββ¬ββββββββββββββββββ¬ββββββββββββββββββ¬ββββββββββββββββββ¬β
1.0 23.5 46.0 68.5 91.0
text saved in /lus/tegu/projects/datascience/foremans/projects/saforem2/ezpz/outputs/2025-12-31-121308/2025-12-31-121642/plots/tplot/train_loss_summary.txt
train_loss/mean hist train_loss/max hist
ββββββββββββββββββββββββββββββββββββ ββββββββββββββββββββββββββββββββββββββ
22.0β€ βββββββ β24β€ βββ β
18.3β€ βββββββ β20β€ βββ β
β βββββββ β β βββ β
14.7β€ βββββββ β16β€ βββββββββββ β
11.0β€ ββββββββββββββ β12β€ ββββββββββββββ β
β ββββββββββββββ β β ββββββββββββββ β
7.3β€ βββββββββββββββββββββ β 8β€ββββββββββββββββββ β
3.7β€ βββββββββββββββββββββ βββββ 4β€ββββββββββββββββββββββ ββββ β
β ββββββββββββββββββββββββ βββββ βββββββββββββββββββββββββββββββββ β
0.0β€βββββββββββββββββββββββββββββββββββ 0β€βββββββββββββββββββββββββββββββββββββ
ββ¬ββββββββ¬βββββββββ¬ββββββββ¬ββββββββ¬β ββ¬βββββββββ¬βββββββββ¬ββββββββ¬βββββββββ¬β
6.938 6.965 6.991 7.017 7.044 6.994 7.034 7.074 7.115 7.155
train_loss/min hist train_loss/std hist
ββββββββββββββββββββββββββββββββββββββ ββββββββββββββββββββββββββββββββββββ
24β€ ββββ β20.0β€ ββββ β
β ββββ β β βββ ββββ β
20β€ ββββββββ β16.7β€ βββ ββββ β
16β€ ββββββββ β13.3β€ βββ ββββ β
β βββββββββββ β β ββββββββββ ββββ β
12β€ βββββββββββββββ β10.0β€ βββββββββββββββββββββ β
β βββββββββββββββ β β βββββββββββββββββββββ β
8β€ ββββββββββββββββββ β 6.7β€ββββββββββββββββββββββββ β
4β€ βββββββββββββββββββββββββ β 3.3β€βββββββββββββββββββββββββββ β
βββββ ββββββββββββββββββββββββββββββ ββββββββββββββββββββββββββββ βββββ
0β€βββββββββββββββββββββββββββββββββββββ 0.0β€βββββββββββββββββββββββββββ βββββ
ββ¬βββββββββ¬βββββββββ¬ββββββββ¬βββββββββ¬β ββ¬ββββββββ¬βββββββββ¬ββββββββ¬βββββββββ
6.827 6.863 6.898 6.934 6.969 0.0193 0.0277 0.0362 0.0446
text saved in /lus/tegu/projects/datascience/foremans/projects/saforem2/ezpz/outputs/2025-12-31-121308/2025-12-31-121642/plots/tplot/train_loss_hist.txt
[2025-12-31 12:16:47,448186][W][ezpz/history:2320:save_dataset] Unable to save dataset to W&B, skipping!
[2025-12-31 12:16:47,449805][I][utils/__init__:651:dataset_to_h5pyfile] Saving dataset to: /lus/tegu/projects/datascience/foremans/projects/saforem2/ezpz/outputs/2025-12-31-121308/2025-12-31-121642/train_dataset.h5
[2025-12-31 12:16:47,467124][I][ezpz/history:2433:finalize] Saving history report to /lus/tegu/projects/datascience/foremans/projects/saforem2/ezpz/outputs/2025-12-31-121308/2025-12-31-121642/report.md
[2025-12-31 12:16:47,473135][I][examples/vit:463:train_fn] dataset=<xarray.Dataset> Size: 26kB
Dimensions: (draw: 91)
Coordinates:
* draw (draw) int64 728B 0 1 2 3 4 5 6 7 ... 84 85 86 87 88 89 90
Data variables: (12/35)
train_iter (draw) int64 728B 10 11 12 13 14 15 ... 95 96 97 98 99 100
train_loss (draw) float32 364B 7.112 7.033 6.982 ... 6.968 6.978 7.0
train_dt (draw) float64 728B 0.7444 0.7021 0.7178 ... 0.7456 0.7609
train_dtd (draw) float64 728B 0.003141 0.001719 ... 0.002238 0.001625
train_dtf (draw) float64 728B 0.02159 0.02241 ... 0.02394 0.02694
train_dto (draw) float64 728B 0.699 0.6572 0.672 ... 0.6986 0.7115
... ...
train_dto_min (draw) float64 728B 0.6972 0.6526 0.67 ... 0.6872 0.6867
train_dto_std (draw) float64 728B 0.0009612 0.02281 ... 0.004004 0.01135
train_dtb_mean (draw) float64 728B 0.02078 0.02071 ... 0.02083 0.02072
train_dtb_max (draw) float64 728B 0.02156 0.02113 ... 0.02186 0.02113
train_dtb_min (draw) float64 728B 0.02027 0.02019 ... 0.02044 0.02029
train_dtb_std (draw) float64 728B 0.0003202 0.0002283 ... 0.0002136
[2025-12-31 12:16:47,618825][I][examples/vit:544:<module>] Took 218.91 seconds
wandb:
wandb: π View run snowy-hill-239 at:
wandb: Find logs at: ../../../../../../lus/tegu/projects/datascience/foremans/projects/saforem2/ezpz/wandb/run-20251231_121309-g19jy6bl/logs
[2025-12-31 12:16:49,364101][I][ezpz/launch:447:launch] ----[π ezpz.launch][stop][2025-12-31-121649]----
[2025-12-31 12:16:49,364806][I][ezpz/launch:448:launch] Execution finished with 0.
[2025-12-31 12:16:49,365202][I][ezpz/launch:449:launch] Executing finished in 227.18 seconds.
[2025-12-31 12:16:49,365551][I][ezpz/launch:450:launch] Took 227.18 seconds to run. Exiting.