Tiny GPT in Go. Optimised for Understanding. Trained on Jules Verne Books


Source: github.com · Published: 2026-06-02 21:22Z · Topic: machine-learning


A developer released a minimal GPT implementation written entirely in Go, trained on Jules Verne novels. The model generates short text fragments like "Mysterious Island" and takes about 40 minutes to train on an M3 MacBook Air. The project prioritizes educational clarity over performance, removing batch dimensions and external dependencies to serve as a companion to Karpathy's "Neural Networks: Zero to Hero" course.


A simple GPT implementation in pure Go, trained on favourite Jules Verne books.

The kind of output you can expect from the model:

Mysterious Island.
Well.
My days must follow

Or this:

Captain Nemo, in two hundred thousand feet weary in
the existence of the world.

To train the model:

$ go run .

It takes about 40 minutes to train on a MacBook Air M3. The trained weights are saved to the model-1.234M file. If you rerun the program, it picks up the saved weights and continues training. The loss should decrease each time, indicating that the model is learning something useful.

You can train on your own dataset by pointing the data.dataset variable to your text corpus.

To run in chat-only mode once the training is done:

$ go run . -chat

You can use this repository as a companion to the Neural Networks: Zero to Hero course. Use git checkout <tag> to see how the model has evolved over time: naive, bigram, multihead, block, residual, full.

In main_test.go you will find explanations, starting from a basic neuron example:

// Our neuron has 2 inputs and 1 output (number of columns in weight matrix).
// Its goal is to predict next number in the sequence.
input := V{1, 2} // {x1, x2}
weight := M{
    {2}, // how much x1 contributes to the output
    {3}, // how much x2 contributes to the output
}
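To make the neuron excerpt concrete, here is a minimal sketch of the forward pass it sets up. The V and M names follow the excerpt, but their definitions as plain float64 slices and the Forward helper are assumptions for illustration; the repository's own types may differ.

```go
package main

import "fmt"

// V and M mirror the names in the excerpt. Defining them as plain
// float64 slices is an assumption made for this sketch.
type V []float64
type M [][]float64

// Forward multiplies a 1×n input row by an n×m weight matrix,
// producing one output value per weight column.
func Forward(input V, weight M) V {
	out := make(V, len(weight[0]))
	for i, x := range input {
		for j, w := range weight[i] {
			out[j] += x * w
		}
	}
	return out
}

func main() {
	input := V{1, 2} // {x1, x2}
	weight := M{
		{2}, // how much x1 contributes to the output
		{3}, // how much x2 contributes to the output
	}
	// 1*2 + 2*3 = 8
	fmt.Println(Forward(input, weight)) // prints [8]
}
```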

All the way to the self-attention mechanism:

// To calculate the sum of all previous tokens, we can multiply by this triangular matrix:
tril := M{
    {1, 0, 0, 0}, // first token attends only to itself ("cat"), it can't look into the future
    {1, 1, 0, 0}, // second token attends to itself and the previous token ("cat" + ", ")
    {1, 1, 1, 0}, // third token attends to itself and the two previous tokens ("cat" + ", " + "dog")
    {1, 1, 1, 1}, // fourth token attends to itself and all the previous tokens ("cat" + ", " + "dog" + " and")
}.Var()
// So, at this point each embedding is enriched with the information from all the previous tokens.
// That's the crux of self-attention.
enrichedEmbeds := MatMul(tril, inputEmbeds)
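The effect of the triangular matrix is easiest to see with numbers. The sketch below uses plain float64 matrices rather than the repository's variable type (there is no .Var() or autograd here), and the embedding values are made up; M, MatMul, and Tril as defined here are assumptions for illustration.

```go
package main

import "fmt"

// M is a plain 2D matrix, assumed for this sketch.
type M [][]float64

// MatMul returns a×b for an n×k matrix a and a k×m matrix b.
func MatMul(a, b M) M {
	out := make(M, len(a))
	for i := range a {
		out[i] = make([]float64, len(b[0]))
		for k := range b {
			for j := range b[k] {
				out[i][j] += a[i][k] * b[k][j]
			}
		}
	}
	return out
}

// Tril builds an n×n lower-triangular matrix of ones.
func Tril(n int) M {
	t := make(M, n)
	for i := range t {
		t[i] = make([]float64, n)
		for j := 0; j <= i; j++ {
			t[i][j] = 1
		}
	}
	return t
}

func main() {
	// Made-up 2-dimensional embeddings for four tokens.
	inputEmbeds := M{
		{1, 10},
		{2, 20},
		{3, 30},
		{4, 40},
	}
	// Row i of the result is the sum of embeddings 0..i.
	enrichedEmbeds := MatMul(Tril(4), inputEmbeds)
	for _, row := range enrichedEmbeds {
		fmt.Println(row)
	}
	// prints:
	// [1 10]
	// [3 30]
	// [6 60]
	// [10 100]
}
```

Each output row is a running sum, so no token ever receives information from a token that comes after it.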

No batches.

I've given up the complexity of the batch dimension for the sake of better understanding. It's far easier to build intuition with 2D matrices than with 3D tensors. Besides, batches aren't inherent to the transformer architecture. Gradient accumulation was tried as a way to smooth the gradients, but the effect was negligible, so it was removed as well.

Removed gonum

The gonum.matmul gave us a ~30% performance boost, but it brought an additional dependency. We're not striving for maximum efficiency here, but rather for radical simplicity. The current matmul implementation is quite effective, and it's only 40 lines of plain, readable code.
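For a sense of what a dependency-free matrix multiplication can look like, here is a short sketch. It is not the repository's implementation: the flat row-major Matrix type, the NewMatrix and Mul names, and the i-k-j loop order are all choices made for this example.

```go
package main

import "fmt"

// Matrix stores its values row by row in one flat slice.
type Matrix struct {
	Rows, Cols int
	Data       []float64
}

// NewMatrix allocates a zero-filled rows×cols matrix.
func NewMatrix(rows, cols int) *Matrix {
	return &Matrix{Rows: rows, Cols: cols, Data: make([]float64, rows*cols)}
}

// Mul returns a×b, or an error if the inner dimensions differ.
func Mul(a, b *Matrix) (*Matrix, error) {
	if a.Cols != b.Rows {
		return nil, fmt.Errorf("shape mismatch: %dx%d times %dx%d",
			a.Rows, a.Cols, b.Rows, b.Cols)
	}
	out := NewMatrix(a.Rows, b.Cols)
	for i := 0; i < a.Rows; i++ {
		for k := 0; k < a.Cols; k++ {
			aik := a.Data[i*a.Cols+k]
			// The inner loop reads one row of b and writes one row of out
			// sequentially, which keeps memory access contiguous.
			for j := 0; j < b.Cols; j++ {
				out.Data[i*out.Cols+j] += aik * b.Data[k*b.Cols+j]
			}
		}
	}
	return out, nil
}

func main() {
	a := &Matrix{Rows: 2, Cols: 3, Data: []float64{1, 2, 3, 4, 5, 6}}
	b := &Matrix{Rows: 3, Cols: 2, Data: []float64{7, 8, 9, 10, 11, 12}}
	c, err := Mul(a, b)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(c.Data) // prints [58 64 139 154]
}
```

The loop order matters only for speed, not for the result: swapping the k and j loops gives the same product but strides through b column by column.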

You don't need to read these papers to understand the code :)

Attention Is All You Need

Deep Residual Learning

DeepMind WaveNet

Batch Normalization

Deep NN + huge data = breakthrough performance

OpenAI GPT-3 paper

Analyzing the Structure of Attention

Many thanks to Andrej Karpathy for his brilliant Neural Networks: Zero to Hero course.

Original source: github.com/zakirullin/gpt-go