{"slug": "a-solver-for-semantle", "title": "A Solver for Semantle", "summary": "Here is a factual summary of the article:\n\nThe article describes a solver for Semantle, a Wordle variant that scores guesses based on semantic similarity rather than letter matching. The solver uses a filtering approach that eliminates candidate words from the entire vocabulary based on whether their cosine similarity to each guess matches the score returned by the game, rather than attempting to directly solve for the target word's embedding vector. This method is highly effective, allowing the solver to reliably find the correct answer in approximately three guesses.", "body_md": "Semantle is a Wordle game variant. But instead of scoring guesses based on lexical similarity like its predecessor, it scores them based on semantic similarity.\nHere’s a screenshot from a game I played recently, ordered by similarity to the correct answer.\nI initially guessed “philosophy,” which was very far from the correct answer (similarity score of 6.02). After several more guesses I got lucky with “biology” (27.55) as my 8th guess, which pointed me toward more science-y words. Eventually I realized the answer had something to do with a hospital setting (yes, yes, I could have gotten there faster after “biology”). I landed on the right word “medical” as my 52nd guess.\nTo be honest, that’s a pretty good round for me. I’ve had games last more than twice that before giving up. If you’ve played before, you know Semantle is hard. But it is solvable, generally by gradually honing in on words that give higher similarity scores, and moving away from words that give lower scores.\nEthan Jantz and I wondered whether we could do better algorithmically. This post describes a simple solver we made while at the Recurse Center that reliably finds the answer in around 3 guesses.\nWhat information does the game give you?\nSemantle uses word embeddings — numerical vector representations of word meanings — to represent words. It uses Google News word2vec, which represents each word as a 300-dimensional vector. It then measures how close your guess is to the target word using the cosine similarity between the guess word embedding () and the target word embedding (). That similarity score is the feedback you get with each guess.\nThe difficulty of playing Semantle comes from how little information a single cosine similarity provides. It effectively tells you whether your guess is “hot” or “cold,” but not which direction you should move. As a result, you have to combine feedback from multiple guesses, mentally “triangulating” where the answer might be in semantic space.\nCan we solve for the embedding of the target word?\nIf you want to skip ahead to how we implemented the solver, you can jump to the next section. But first, I’m going to digress briefly to discuss why solving for the target word directly isn’t practical.\nA natural first idea might be to treat each guess as a clue to the hidden vector and try to combine those clues to recover it directly. In embedding space, that translates to solving for the target vector using a system of linear equations.\nThe similarity score the game returns with each guess is:\nIf we assume embeddings are normalized, this means similarity is equivalent to the dot product, . For each guess, that gives you one linear equation involving all 300 unknown components of .\nIn general, you need at least as many independent equations as unknowns to pin down a unique solution to a system of linear equations. Which means we need at least 300 independent guesses before we could recover and look up the nearest word in the vocabulary to find the target.\nSemantle is a hard game, but 300 guesses isn’t good enough to beat playing it as a human (or at least, I’d like to think so). So instead of trying to solve for the target embedding directly, we took advantage of the geometry of the cosine similarity and used a filtering approach that turned out to be surprisingly effective.\nHow we built the solver\nGeometrically, when we guess a word and get back a similarity score, that means the target word must lie somewhere on a surface of constant cosine similarity to the guess. On the unit sphere of embeddings, this surface corresponds to a ring of points that make the same angle with .\nThat makes each guess a very strong filter. Only words whose similarity to our guess is equal to the returned similarity score could still be the target. We can make use of that fact to build a solver.\nThe elimination approach\nThe solver works like this:\nIt keeps a list of all possible target words. Initially, this list contains every word in the embedding vocabulary for GoogleNews-vectors-negative300 (In theory, this could be restricted further since Semantle says it chooses its target from a list of the 5,000 most popular words in English. But even starting with several million candidates, this solver works just fine.)\nTo run this solver, we also built a clone of Semantle using their same embeddings so we could get immediate similarity scores.\nFor each round, the solver:\n- Picks a random candidate word as a guess\n- Asks Semantle for the similarity score between and\n- Computes the similarity score between and every remaining candidate word\n- Reduces the candidate list to only those that have the reported cosine similarity (within some small tolerance)\n- Repeats 1-4 until only one candidate word is left\nNote 1: For those who are familiar with this topic, we actually implemented this using cosine distance rather than cosine similarity, but the logic is effectively the same.\nNote 2: We chose guesses at random for simplicity, but in principle, you could probably do better than random.\nIn a visual diagram, that can be represented like this:\nIn code, that filtering step looks something like this:\nwhile len(potential_words) > 1:\n# step 1: make a guess\ng = random.choice(potential_words)\n# step 2: ask Semantle for score\ns_target = get_similarity_from_game(g)\n# step 3: similarity from g to every remaining candidate\ndistances = word_vectors.distances(g, other_words=potential_words)\nsimilarities = 1.0 - distances # cosine similarity = 1 - cosine_distance\n# step 4: keep only words w such that sim(g,w) matches s_target\ntemp_potential_words = []\nfor i in range(len(potential_words)):\nw = potential_words[i]\ns = similarities[i]\nif abs(s - s_target) < tolerance:\ntemp_potential_words.append(w)\npotential_words = temp_potential_words\n# one word left\nanswer = potential_words[0]\nWhy does this work so quickly?\nAlthough word embedding space is 300-dimensional, the vocabulary itself is sparse within that space. As a result, each cosine similarity constraint is highly restrictive. After just one or two guesses, the set of words that lie at the right distance shrinks dramatically. That makes the filtering strategy super effective.\nFor example, here’s a sample run: (We filter within a tolerance of 0.0001, which corresponds to the four-decimal-place precision of real Semantle’s cosine similarity scores.)\n── Guess 1: countryside ──\ncosine similarity: 0.023168 (±0.0001)\ncandidates searched: 3,000,000 -> remaining: 3,296\n── Guess 2: levelization ──\ncosine similarity: 0.097055 (±0.0001)\ncandidates searched: 3,296 -> remaining: 3\n── Guess 3: Skrzynski ──\ncosine similarity: 0.005881 (±0.0001)\ncandidates searched: 3 -> remaining: 1\nAnswer: medical\nIt’s worth noting that the guesses don’t need to “trend” toward the correct answer the way a human player’s guesses would. “Countryside,” “levelization,” and “Skrzynski” have nothing to do with “medical.” The solver isn’t walking a recognizable path.\nAnd yet, in a sense, humans and this solver are both “honing in” on the answer, just very differently. In a way, we as humans are doing something gradient-descent-like: every time we make a guess we are nudged towards the target, and we take small and meaningful steps toward it through semantic space. The process is local, intuitive, and guided by meaning.\nThe solver is also honing in, but it’s doing it globally rather than locally. Instead of taking tiny steps toward the target, it takes huge jumps that it knows are exactly right, while slicing away vast swaths of impossible words with each guess. It’s not actually moving in any kind of meaningful way closer to “medical,” it’s just shrinking the universe of possible answers until “medical” is the only possible choice remaining.\nI find that pretty cool: that the same embedding space can be something you navigate by meaning, or something you carve up by geometry. But either way, you get to the same place.", "url": "https://wpnews.pro/news/a-solver-for-semantle", "canonical_source": "http://victoriaritvo.com/blog/semantle-solver/", "published_at": "2026-02-17 12:00:00+00:00", "updated_at": "2026-05-23 18:42:50.243156+00:00", "lang": "en", "topics": ["machine-learning", "artificial-intelligence", "research", "data"], "entities": ["Semantle", "Wordle", "Google News", "word2vec", "Ethan Jantz", "Recurse Center"], "alternates": {"html": "https://wpnews.pro/news/a-solver-for-semantle", "markdown": "https://wpnews.pro/news/a-solver-for-semantle.md", "text": "https://wpnews.pro/news/a-solver-for-semantle.txt", "jsonld": "https://wpnews.pro/news/a-solver-for-semantle.jsonld"}}