Skip to content

✍️ ezpz-generateβš“οΈŽ

/// details | Warning type: warning

Experimental / not well tested. The flow works in simple cases but has not seen broad coverageβ€”treat as best-effort and be ready to fall back to your own HF script if needed.

///

Interactive text generation loop for Hugging Face causal language models.

  • Loads a model and tokenizer via πŸ€— transformers
  • Moves the model to the device detected by ezpz.get_torch_device_type()
  • Prompts you for text and a max length, then streams a single completion

Usageβš“οΈŽ

# direct console script
ezpz-generate --model_name meta-llama/Llama-3.2-1B --dtype bfloat16

# equivalent module form (useful with ezpz launch)
python -m ezpz.examples.generate --model_name TinyLlama/TinyLlama-1.1B-Chat-v1.0
ezpz launch -- python -m ezpz.examples.generate --model_name meta-llama/Llama-3.2-1B

Flagsβš“οΈŽ

  • --model_name (default: meta-llama/Llama-3.2-1B): Hugging Face repo/model to load.
  • --dtype (default: bfloat16, choices: float16|bfloat16|float32): Torch dtype for the model.

At runtime the script will prompt for:

  • prompt: Text to feed the model.
  • max length: Token limit passed to model.generate.

Notesβš“οΈŽ

  • Expects torch and transformers to be installed and a compatible accelerator available (GPU strongly recommended).
  • Tokenizer pad_token is set to eos_token before generation.
  • Type β€œexit” at the prompt or press Ctrl+C to quit.