I built a token-level debugger for comparing two LLMs

Same prompt, two models, different outputs. None of the tooling I had actually showed me where they diverged.

So I built tokenflame. It gives you entropy heatmaps, tokenizer diffs, divergence markers, and token-by-token replay. One command, one HTML file.

pip install tokenflame
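The post doesn't show how these views are computed, so here is a minimal Python sketch of the underlying ideas. It is not tokenflame's code or API. It assumes you already have each model's output tokens and, per position, the next-token probability distribution (how you obtain those depends on the model provider). The function names `token_entropy`, `first_divergence`, and `entropy_heatmap_html` are hypothetical, and entropy is measured in bits by choice.

```python
import html
import math
from typing import List, Optional, Sequence


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (bits) of one next-token distribution. Hypothetical helper."""
    # Zero-probability entries contribute nothing, and log2(0) is undefined, so skip them.
    return sum(-p * math.log2(p) for p in probs if p > 0)


def first_divergence(a: Sequence[str], b: Sequence[str]) -> Optional[int]:
    """First index where two token sequences differ, or None if identical. Hypothetical helper."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # One sequence is a strict prefix of the other: they diverge where the shorter one ends.
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


def entropy_heatmap_html(tokens: Sequence[str], entropies: Sequence[float], max_bits: float) -> str:
    """Render tokens as <span>s whose red background grows with entropy. Hypothetical helper."""
    spans: List[str] = []
    for tok, h in zip(tokens, entropies):
        # Normalize entropy into [0, 1] for the background alpha channel.
        alpha = min(h / max_bits, 1.0) if max_bits > 0 else 0.0
        spans.append(
            f'<span style="background: rgba(255,0,0,{alpha:.2f})" title="{h:.2f} bits">'
            f"{html.escape(tok)}</span>"
        )
    return "<div>" + "".join(spans) + "</div>"


if __name__ == "__main__":
    # Made-up example data, purely for illustration.
    tokens_a = ["The", " cat", " sat"]
    tokens_b = ["The", " dog", " sat"]
    print(first_divergence(tokens_a, tokens_b))  # 1

    dists_a = [[1.0], [0.5, 0.5], [0.25, 0.25, 0.25, 0.25]]
    ents_a = [token_entropy(d) for d in dists_a]
    print(ents_a)  # [0.0, 1.0, 2.0]

    # Write `page` to an .html file to view the heatmap in a browser.
    page = entropy_heatmap_html(tokens_a, ents_a, max_bits=2.0)
```

This compares tokens position by position, which only lines up when both models share a tokenizer. When they don't, the same text splits into different token boundaries. That mismatch is exactly what a tokenizer diff is meant to surface before any per-position comparison makes sense.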