📦 Tar Env
Check for (and create if missing) a tarball of the current Python environment for distribution to compute nodes.
ezpz tar-env is a standalone tarball-creation utility. For end-to-end
environment distribution, use ezpz yeet instead: it handles the full
Lustre → /tmp/ copy, path patching, and parallel fan-out to all worker
nodes (with --compress providing compression benefits similar to those of
a separate tar-env step).
What it does
- Detects the active Python environment (conda or virtual environment) from sys.executable.
- Checks for an existing .tar.gz in three locations, in order: <env-parent>/<env-name>.tar.gz → /tmp/<env-name>.tar.gz → <cwd>/<env-name>.tar.gz. If found (and overwrite=False), reuses it.
- If none exist, creates a new gzipped tarball next to the environment (e.g. /path/to/.venv → /path/to/.venv.tar.gz).
- Returns the absolute path to the tarball.
This is the canonical location for the tarball: subsequent ezpz yeet
(no args) invocations will see the same-named .tar.gz next to the
detected venv and print a hint suggesting the tarball-broadcast form,
which scales much better than per-file rsync (see the
scaling section on the yeet
page).
This is particularly useful on HPC systems, where a shared filesystem can become a bottleneck when many nodes read Python packages from it at the same time. Packing the environment into a single tarball and extracting it onto each node's local storage means imports are served from local disk instead, which substantially reduces import times.
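The lookup-then-create behavior described in this section can be sketched with the standard library alone. This is a minimal illustration, not ezpz's actual implementation: the names find_or_create_tarball and guess_env_dir, and the cwd and tmp_dir parameters, are hypothetical and exist only so the search order is easy to follow and test.

```python
import sys
import tarfile
from pathlib import Path


def guess_env_dir():
    """Guess the environment root from sys.executable (illustrative only)."""
    # Assumption: the interpreter lives at <env>/bin/python, so the env root
    # is two levels up. Symlinks are deliberately not resolved, because a
    # venv's python is often a symlink into the base installation.
    return Path(sys.executable).parent.parent


def find_or_create_tarball(env_dir, overwrite=False, cwd=None, tmp_dir="/tmp"):
    """Reuse an existing <env-name>.tar.gz if one is found, else create it.

    Hypothetical stand-in for the flow described above, not ezpz's code.
    """
    env_dir = Path(env_dir).resolve()
    name = f"{env_dir.name}.tar.gz"
    canonical = env_dir.parent / name  # tarball next to the environment

    # Search order: next to the env, then /tmp, then the working directory.
    candidates = [
        canonical,
        Path(tmp_dir) / name,
        Path(cwd) / name if cwd is not None else Path.cwd() / name,
    ]
    if not overwrite:
        for candidate in candidates:
            if candidate.is_file():
                return candidate.resolve()

    # Nothing reusable: write a gzipped tarball at the canonical location,
    # keeping the env directory name as the top-level entry in the archive.
    with tarfile.open(canonical, "w:gz") as tar:
        tar.add(env_dir, arcname=env_dir.name)
    return canonical.resolve()
```

Calling find_or_create_tarball(guess_env_dir()) would archive the running interpreter's environment, which can take a while for large environments; passing overwrite=True skips the lookup and always rebuilds the tarball next to the environment.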
Example
Programmatic usage
The underlying check_for_tarball() function can also be called directly
from Python, where it accepts additional options (for example, overwrite).
See Also
- ezpz yeet: distribute files (envs, models, datasets, etc.) to all worker nodes via parallel rsync (recommended for end-to-end use)
- ezpz.utils.tar_env: Python API reference