{"slug": "a-game-of-robot-telephone", "title": "A Game of Robot Telephone", "summary": "A developer tested LLM code translation by passing a Go program through 10 languages and back, finding the final Go version grew from 94 to 443 lines while retaining correctness. The experiment used a chain of LLM-generated rewrites through TypeScript, Python, Ruby, C++, Java, Haskell, Common Lisp, Zig, Rust, and back to Go, with each step verified against a test API.", "body_md": "# A Game of Robot Telephone\n\n# Intro\n\nWay back when [AltaVista Babel Fish](https://en.wikipedia.org/wiki/Babel_Fish_(website)) first appeared online, it became a fun game on IRC to take a phrase, translate it through a chain of languages, and then back into English. We all also played the game of telephone as kids, trying to pass a message through the class by whispering in each other's ears.\n\nSometimes the result was surprisingly poetic, but often it was complete gibberish as errors compounded and mutated along the way.\n\nWith the endless stream of LLM-fueled \"rewrite this in X\" posts doing the rounds, I thought it would be fun to try a similar game, but with code:\n\n- Start with a small but non-trivial program in Go\n- Pass it through a chain of LLM-generated rewrites\n- Bring it back to Go\n- See what survived\n\nThe final program produced the right answer, but grew to nearly five times its original size thanks to a grab bag of semantic souvenirs it had picked up from the languages it passed through.\n\n# The Task\n\nThe task the code performs should have enough moving parts to make translation interesting, but be everyday enough to be achievable without heavy frameworks or a multitude of libraries.\n\nThe program I settled on does the following:\n\n- Accepts and validates a URL from the command line\n- Makes an HTTP request to the URL to retrieve a list of TODOs in JSON format\n- Parses the JSON response into values representing TODOs\n- Reads the current local date\n- Parses and validates 
the TODO deadline dates (`YYYY-MM-DD` format)\n- Groups TODOs by user ID\n- Counts completed and overdue TODOs per user\n- Sorts summaries by completed/overdue counts\n- Formats a fixed-width table of results and prints to stdout\n\nThe initial implementation relies entirely on Go's (well-suited) standard library.\n\n## Example API Response\n\n```\n  [\n    {\n      \"userId\": 1,\n      \"id\": 1,\n      \"title\": \"delectus aut autem\",\n      \"completed\": false,\n      \"dueDate\": \"1900-01-01\"\n    },\n    {\n      \"userId\": 1,\n      \"id\": 2,\n      \"title\": \"quis ut nam facilis et officia qui\",\n      \"completed\": true,\n      \"dueDate\": \"2999-12-31\"\n    },\n    ...\n  ]\n```\n\n- The input has twenty TODOs spread across seven users.\n- Due dates range from 1900 to 2999, so the overdue (missed) counts may vary depending on when the program is run.\n- There are no \"poorly formatted\" inputs.\n\n## Example output\n\n```\nUSER  COMPLETED  MISSED\n3     2          2\n2     2          1\n4     2          1\n5     1          3\n1     1          1\n7     0          1\n6     0          0\n```\n\n# Links in the Chain\n\nThe full chain of languages used was:\n\n```\nGo -> TypeScript -> Python -> Ruby -> C++ -> Java -> Haskell -> Common Lisp -> Zig -> Rust -> Go\n```\n\nEvery step was run by a fresh Codex process, and executed in an isolated worktree. The prompt used is given in full below. It specifies what to deliver and how to check the results. It also encourages use of idiomatic code, installation of a toolchain and use of popular libraries. 
I found that Codex was VERY prone to, for example, writing its own JSON parser instead of installing a toolchain.\n\n## The Prompt\n\nEach generated project contains its own build files, dependencies and `run.sh`, a wrapper script that accepts a single argument (the TODOs endpoint) and runs the newly generated language's code.\n\n# An Oracle to Guide Us\n\nThe API also exposed a `POST /conform` endpoint.\n\nCodex could run the newly generated project against `/todos`, gather the stdout output and post it to this endpoint to get feedback as to whether it performed its task correctly.\n\nIf it failed, Codex could inspect the program locally, repair it and try again.\n\nThis was to prevent Codex from being sneaky and peeking at expected outputs or another canonical implementation.\n\nAnd Codex is really sneaky.\n\n# Results\n\nEvery language in the chain was able to generate the expected table.\n\nThe original Go implementation was `94` lines, but the final one grew to `443` lines:\n\n| | Original Go | Final Go |\n|---|---|---|\n| Main implementation | 94 lines | 443 lines |\n| User IDs | integers | arbitrary JSON values |\n| Completed | Boolean | `true`, `\"true\"` or numeric `1` |\n| Fields | read when needed | all required up front |\n| HTTP timeout | 10 seconds | none |\n| Date parsing | Go standard library | hand-written validation |\n| HTTP status descriptions | Go standard library | hand-written lookup table |\n\nNone of the extra machinery changed the fixture output. 
Most of it existed to preserve decisions made by intermediate implementations, and to carry forward the accumulated \"backward compatibility\" behavior of previous steps.\n\n## Truth Gets Complicated\n\nIn the original Go implementation, `completed` was [a simple boolean value](https://github.com/minikomi/semantic_drift/blob/ed8ce179b9ebd278c49706f3828d40fe11fb5603/runs/latest/01-go/project/main.go#L12-L17) within the JSON emitted by the API:\n\n```\nCompleted bool `json:\"completed\"`\n```\n\nImmediately, the TypeScript rewrite changes this behavior subtly.\n\nIn Go, a JSON type mismatch, such as a string where a bool is expected, returns an error. TypeScript with axios, however, trusts the type annotation at compile time but does no runtime validation. Any non-empty string value for `completed`, even `\"false\"`, would register as true.\n\nIn practice, the API never returns strange data here, but it has repercussions down the line.\n\nPython and Ruby supplied their own ideas of truthiness, simply checking `if todo.completed`. 
Java then used Jackson's coercion rules, and by the Haskell stage the behavior had been made explicit: Boolean `true`, the string `\"true\"` and numeric `1` were true.\n\nThat rule survives and becomes encoded in the Common Lisp, Zig and Rust implementations.\n\n## As Boolean\n\n### Lisp\n\n```\n(defun as-boolean (value)\n  (or (eq value 'yason:true)\n      (and (stringp value) (string= value \"true\"))\n      (and (numberp value) (= value 1))))\n```\n\n### Zig\n\n```\nfn asBoolean(value: std.json.Value) bool {\n    return switch (value) {\n        .bool => |b| b,\n        .string => |s| std.mem.eql(u8, s, \"true\"),\n        .integer => |n| n == 1,\n        .float => |n| n == 1.0,\n        .number_string => |s| std.mem.eql(u8, s, \"1\") or std.mem.eql(u8, s, \"1.0\"),\n        else => false,\n    };\n}\n```\n\n### Rust\n\n```\nfn as_boolean(value: &Value) -> bool {\n    match value {\n        Value::Bool(value) => *value,\n        Value::String(value) => value == \"true\",\n        Value::Number(value) => number_is_one(value),\n        _ => false,\n    }\n}\n```\n\nWhen the program returns to Go it looks like this:\n\n```\nfunc asBoolean(value any) bool {\n\tswitch v := value.(type) {\n\tcase bool:\n\t\treturn v\n\tcase string:\n\t\treturn v == \"true\"\n\tcase json.Number:\n\t\treturn numberIsOne(v)\n\tdefault:\n\t\treturn false\n\t}\n}\n```\n\nThe final Go program carefully preserves the semantics of the boolean coercion inherited from the stages before it.\n\nThis is the main pattern in the chain. A language adds an interpretation, the next translation treats it as intentional, and eventually it becomes explicit compatibility code. Because of the oracle, these additions have no effect on the output, but they contribute extra cognitive cruft.\n\n## What's in An ID\n\nThe original program decoded `userId` directly into an `int`.\n\nThe TypeScript annotation looked equally strict but again provided no runtime validation. 
Python and Ruby used dicts keyed by `todo[\"userId\"]`, allowing any type to be used as a hash key. Once we hit the C++ stage, the ambiguity was made official by accepting any JSON value as a user ID.\n\nFrom that point onward, user IDs could be strings, numbers, Booleans, null, arrays or objects. The implementation needed rules for grouping, displaying and sorting all of them.\n\nThe final Go program therefore has:\n\n- `jsonKey` to serialize values for grouping\n- `displayValue` to print arbitrary JSON values\n- `compareUserID` to order mixed types\n- `cloneJSONValue`, which no longer did anything\n\nIt also preserved JSON numbers as text. `1`, `1.0` and `1e0` could become three different grouping keys while all being displayed as `1`.\n\nNone of this was needed by the task - the fixture only contains integer IDs, a fact which was encoded succinctly in the original implementation by the type in the Go structs. Again the ambiguity of the oracle and the passage through dynamic languages caused downstream ambiguities to congregate and gave us a whole bunch of fresh souvenirs.\n\n## Error Handling Fossilizes\n\nThe original Go client got HTTP status text from the standard library. Later languages exposed status text differently, or not at all.\n\nBy the Common Lisp stage the program carried its own table of HTTP reason phrases. Zig copied it. Rust copied it. The final Go translation copied it back into a language which already knew how to produce those strings.\n\nThe final program contains a switch covering status codes from `400 Bad Request` to `505 HTTP Version Not Supported`.\n\nThis is not an LLM inventing random code. It is doing exactly what the setup asked: preserving observable behavior from its source. The problem is that an incidental workaround had quietly become observable behavior. 
This is mainly a failure of the prompt, which too strongly encouraged treating the previous stage's source as the canonical example.\n\n## Useful Behavior Disappears\n\nDrift did not only add things.\n\nThe original Go HTTP client had a ten-second timeout. TypeScript, Python, Ruby, C++, Java, Haskell and Common Lisp all retained some form of timeout.\n\nDuring the Zig translation step, the timeout was dropped. With no signal from there onward, the Rust translation had nothing to carry over, and the final Go program ends up calling `http.Get` directly with no timeout specified.\n\nIn our case, the TODOs API never serves a slow response and all stages pass without a problem. This highlights another side to fixture-driven conformance: behavior tested by the fixture becomes sacred. Behavior outside it can disappear without a trace.\n\n# Side Trips\n\n## Smaller Chains\n\nI also ran three shorter round trips, inspired by failed long-chain experiments.\n\n| Chain | Final Go | Main souvenir |\n|---|---|---|\n| Go -> Bash -> Go | 105 lines | dates compared as strings |\n| Go -> PHP -> Go | 272 lines | explicit emulation of PHP casting and truthiness |\n| Go -> Erlang -> Go | 130 lines | Erlang-style errors and a non-strict sort function |\n\n- The Bash version originally poisoned all implementations downstream by doing fancy `jq` manipulations. It also compared ISO dates lexically instead of using some form of date parsing. This works for valid `YYYY-MM-DD` values, so the returning Go program kept doing it. Invalid or missing dates no longer behaved like the original.\n- The PHP round trip had the clearest example of semantic baggage. The returning Go implementation added helpers for PHP integer conversion, Boolean truthiness, string conversion, associative-array iteration and PHP-shaped date errors.\n- The Erlang round trip stayed fairly close to the original, but introduced a subtle bug in the sort. Erlang's sort predicate uses \"less than or equal to\". 
When translated into Go, that became `<=`. Go's `sort.Slice` needs a strict \"less than\" function, so passing it \"less than or equal to\" is incorrect. The fixture data never triggered the bad case, so the bug went unnoticed.\n\n## Adjusted Prompt Run\n\nAnother run with a [slightly adjusted prompt](https://github.com/minikomi/semantic_drift/blob/b31dc1d865ba151fbe3075054cb730da5ed05c1c/prompts/rewrite-neutral.md) heavily favored DIY solutions to the translation of ambiguous truthiness and dict keys once exiting the archipelago of dynamic languages. A [huge amount of code](https://github.com/minikomi/semantic_drift/blob/b31dc1d865ba151fbe3075054cb730da5ed05c1c/runs/run-20260621-202351/11-go/project/main.go#L16-L369) found its way into the final Go implementation just to parse the JSON TODOs. It seems it [appeared in the C++ implementation](https://github.com/minikomi/semantic_drift/blob/b31dc1d865ba151fbe3075054cb730da5ed05c1c/runs/run-20260621-202351/05-cpp/project/main.cpp#L189-L250) and grew from there.\n\nIt seems that the system prompt still has a lot of sway over how the translation progresses, especially when the task runs in a loop building off of itself. Often this happens in unexpected ways.\n\n# What does it tell us?\n\nCan we take a lot away from this quick experiment? The oracle was deliberately stunted, the task trivial and the prompt underspecified. I could have gone back and made the prompt more deliberately strict about preserving types, timeouts and so on. 
Still, I think there's a kernel of wisdom to be had here when it comes to working with agentic coding setups.\n\nThe final implementation was a suitcase bearing the stickers of each language we visited along the way: JavaScript truthiness, generic JSON values from C++, coercion rules made explicit in Haskell, an HTTP status table from Common Lisp, date-validation rules passed through Zig and Rust, all of it rendered back into Go switches and helper functions.\n\nThe translation Codex performed is conservative, but has its quirks. It preserves what the source does, including workarounds and accidents, because the only way it can check correctness does not distinguish those from the point of the program.\n\nThe key takeaway is that LLM translations relying purely on test conformance are prone to sneaky behavior: satisfying the outcome while cutting corners elsewhere. Knowing this, it is extremely important to get setup, prompting, testing and final review right. LLMs are just as lazy as we are, and considerably better at hiding it.\n\n# Notes\n\n- Bash was originally in the full chain. It introduced enough shell-specific behavior that I replaced it with PHP.\n- PHP then pushed its coercion rules into every downstream translation, so I replaced it with C++ for the full run.\n- The first setup used [libfaketime](https://github.com/wolfcw/libfaketime) to freeze the date. It proved troublesome and exposed details of the harness to generated programs. The current fixture spans enough dates to remain useful while using the real local clock.\n- The oracle calculates its expected output when its server starts. 
Crossing local midnight during a run remains a small race.\n- Step-specific analysis can be found in [CHAIN_ANALYSIS.md](https://github.com/minikomi/semantic_drift/blob/5ff9907f7d0b70d7812296eaf52a5183f57fa453/CHAIN_ANALYSIS.md)", "url": "https://wpnews.pro/news/a-game-of-robot-telephone", "canonical_source": "https://poyo.co/note/20260619T090202/", "published_at": "2026-06-22 06:34:33+00:00", "updated_at": "2026-06-22 06:40:30.869830+00:00", "lang": "en", "topics": ["large-language-models", "generative-ai", "ai-tools", "developer-tools"], "entities": ["AltaVista Babel Fish", "Codex", "Go", "TypeScript", "Python", "Ruby", "C++", "Java"], "alternates": {"html": "https://wpnews.pro/news/a-game-of-robot-telephone", "markdown": "https://wpnews.pro/news/a-game-of-robot-telephone.md", "text": "https://wpnews.pro/news/a-game-of-robot-telephone.txt", "jsonld": "https://wpnews.pro/news/a-game-of-robot-telephone.jsonld"}}