# ezpz on Perlmutter @ NERSC
Submit interactive job on Perlmutter:
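The exact allocation command isn't shown above; a plausible sketch using Slurm's `salloc` on Perlmutter, where the node count, walltime, and `<account>` placeholder are assumptions to adjust for your project:

```shell
# Request an interactive GPU allocation (2 nodes, 1 hour; replace <account>)
salloc \
    --nodes=2 \
    --qos=interactive \
    --time=01:00:00 \
    --constraint=gpu \
    --gpus-per-node=4 \
    --account=<account>
```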
Load modules:
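The module list isn't preserved above; one reasonable choice (an assumption, not confirmed by the source) is NERSC's prebuilt PyTorch module, which also brings in a compatible Python and CUDA stack:

```shell
# Load NERSC's prebuilt PyTorch software stack
module load pytorch
```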
Navigate to `$SCRATCH` and set environment variables:
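The specific environment variables aren't spelled out above; a minimal sketch, assuming the usual pattern of working out of scratch and keeping large Hugging Face caches off `$HOME` (the `HF_HOME` path is an assumption):

```shell
# Work out of the scratch filesystem
cd "$SCRATCH"

# Keep model/dataset caches on scratch rather than $HOME (path is an assumption)
export HF_HOME="$SCRATCH/.cache/huggingface"
```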
Create and activate virtual environment:
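A sketch of the venv step, assuming a venv kept on scratch (the path is an assumption) that can still see the module-provided packages:

```shell
# Create a virtual environment on scratch, inheriting module-provided packages
python3 -m venv "$SCRATCH/venvs/ezpz" --system-site-packages

# Activate it
source "$SCRATCH/venvs/ezpz/bin/activate"
```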
Install `ezpz` (+ `mpi4py`):
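The install commands aren't preserved above; a sketch assuming `ezpz` is installed from its upstream GitHub repository, and `mpi4py` is built from source against Cray MPICH via the `cc` compiler wrapper (the pattern NERSC documents for Perlmutter):

```shell
# Build mpi4py from source against Cray MPICH using the cc wrapper
MPICC="cc -shared" python3 -m pip install --no-cache-dir --no-binary=mpi4py mpi4py

# Install ezpz from GitHub (repository URL is an assumption)
python3 -m pip install "git+https://github.com/saforem2/ezpz"
```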
Run tests:
```bash
# Train MLP on MNIST
ezpz launch python3 -m ezpz.examples.test

# Fine Tune LLM
ezpz launch python3 -m ezpz.examples.hf \
    --dataset_name=eliplutchok/fineweb-small-sample \
    --streaming \
    --model_name_or_path meta-llama/Llama-3.2-1B \
    --bf16=true \
    --do_train=true \
    --do_eval=true \
    --report-to=wandb \
    --logging-steps=1 \
    --include-tokens-per-second=true \
    --max-steps=100 \
    --include-num-input-tokens-seen=true \
    --optim=adamw_torch \
    --logging-first-step \
    --include-for-metrics='inputs,loss' \
    --max-eval-samples=100 \
    --per_device_train_batch_size=1 \
    --per_device_eval_batch_size=1 \
    --block_size=8192 \
    --gradient_checkpointing=true \
    --fsdp=auto_wrap \
    --output_dir=outputs/ezpz.hf_trainer/$(tstamp)
```