William: a tiny poetry model in the browser

William is a tiny local language model I trained to write short poems. The model on this page is loaded by the browser and sampled locally, one token at a time. There is no server endpoint behind the button.

The title line is editable. William tokenizes it in the browser before generating the poem.

William is a small decoder-only transformer: 6 layers, a hidden size of 384, 6 attention heads, and a 256-token context window. I trained it locally with MLX (https://github.com/ml-explore/mlx) on Apple Silicon.

Training had two stages. First, the model learned general poem-shaped text from the biglam/gutenberg-poetry-corpus (https://huggingface.co/datasets/biglam/gutenberg-poetry-corpus) line corpus, after filtering out Project Gutenberg boilerplate, headers, editorial apparatus, prose-like blocks, and non-English fragments. Then I fine-tuned it on title/body poem pairs from suayptalha/Poetry-Foundation-Poems (https://huggingface.co/datasets/suayptalha/Poetry-Foundation-Poems), with extra filtering for rows that were too long or too prose-like for the short context window.

I also used prism-ml/Bonsai-8B-mlx-1bit (https://huggingface.co/prism-ml/Bonsai-8B-mlx-1bit) locally as a grading model to help reject low-fitness fine-tuning rows and audit pretraining artifacts.

For this page, the MLX checkpoint was converted to ONNX and dynamically quantized to int8. The page downloads that static model file and runs it with ONNX Runtime Web in your browser; the model asset is around 14 MB.
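
To make "sampled locally, one token at a time" concrete, here is a minimal sketch of an autoregressive sampling loop in TypeScript. It is not William's actual code. The `Forward` type is a hypothetical stand-in for the model call (in the browser that would be an awaited ONNX Runtime Web inference; it is a plain synchronous function here so the sketch stays self-contained). The temperature and top-k strategy, the end-of-poem token, and the sliding-window truncation are all assumptions chosen for illustration; only the 256-token context window comes from the description above.

```typescript
// Hypothetical stand-in for the model: token ids in, next-token logits out.
type Forward = (tokens: number[]) => Float32Array;

interface SampleOptions {
  temperature: number;   // > 0; lower values make sampling more conservative
  topK: number;          // keep only the K highest-scoring tokens
  maxNewTokens: number;  // hard cap on generated tokens
  contextWindow: number; // 256 for William
  eosToken: number;      // assumed end-of-poem token id (hypothetical)
}

// Sample one token id from logits using temperature and top-k filtering.
function sampleToken(
  logits: Float32Array,
  temperature: number,
  topK: number,
  rand: () => number = Math.random,
): number {
  // Indices of the K largest logits, best first.
  const candidates = Array.from(logits, (_, i) => i)
    .sort((a, b) => logits[b] - logits[a])
    .slice(0, Math.max(1, topK));

  // Unnormalized softmax weights, shifted by the max logit for stability.
  const maxLogit = logits[candidates[0]];
  const weights = candidates.map((i) =>
    Math.exp((logits[i] - maxLogit) / temperature),
  );
  const total = weights.reduce((sum, w) => sum + w, 0);

  // Walk the cumulative distribution until it passes a uniform draw.
  let threshold = rand() * total;
  for (let k = 0; k < candidates.length; k++) {
    threshold -= weights[k];
    if (threshold <= 0) return candidates[k];
  }
  return candidates[candidates.length - 1]; // guard against rounding error
}

// Generate tokens one at a time, feeding each sampled token back in.
function generate(
  forward: Forward,
  prompt: number[],
  opts: SampleOptions,
  rand: () => number = Math.random,
): number[] {
  const tokens = prompt.slice();
  for (let step = 0; step < opts.maxNewTokens; step++) {
    // Only the most recent contextWindow tokens are passed to the model.
    const context = tokens.slice(-opts.contextWindow);
    const next = sampleToken(forward(context), opts.temperature, opts.topK, rand);
    if (next === opts.eosToken) break;
    tokens.push(next);
  }
  return tokens;
}
```

In a page like this one, `prompt` would be the tokenized title line and the returned ids would be decoded back to text after each step, so the poem can appear as it is generated. Passing `rand` explicitly makes the loop deterministic when needed; setting `topK` to 1 reduces it to greedy decoding.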