Show HN: Language1 – Benchmarking LLM comprehension of vague prompts via Taboo


Source: language1.app · Published: 2026-06-18T14:47Z · Topic: large-language-models

A developer launched Language1, a word game where players guide an LLM to output a target word without using forbidden 'taboo' terms, aiming to build a benchmark dataset for evaluating AI comprehension of vague prompts. The free, ad-free browser game offers single-player and multiplayer modes with multiple LLM models, and uses guardrails to prevent bypasses like letter-spacing or ciphers.

Hi HN,

I built Language1 (https://language1.app), a word game where you play "reverse Taboo" against an LLM.

How it works: You are given a target word (e.g., "Apple") and a list of forbidden "taboo" words (e.g., "fruit", "red", "tree"). Your goal is to write a prompt that guides the LLM to output the exact target word, without using any of the forbidden words.
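The rules above can be sketched in a few lines of Python. This is only an illustration, not the game's actual code: the function names `find_taboo_words` and `is_win` are hypothetical, and the sketch assumes whole-word, case-insensitive matching, that the target word itself is also off limits in the prompt, and that a round is won when the model's reply equals the target once surrounding whitespace and punctuation are ignored.

```python
import re


def find_taboo_words(prompt: str, target: str, taboo: list[str]) -> list[str]:
    """Return the forbidden words that appear in the prompt.

    Assumption: the target word counts as forbidden too, and matching is
    done on whole lowercase words.
    """
    tokens = set(re.findall(r"[a-z0-9']+", prompt.lower()))
    forbidden = [target, *taboo]
    return [word for word in forbidden if word.lower() in tokens]


def is_win(model_output: str, target: str) -> bool:
    """Assumed win condition: the reply is the target word, ignoring case,
    surrounding whitespace, and trailing punctuation or quotes."""
    cleaned = model_output.strip().strip(".!?\"'")
    return cleaned.lower() == target.lower()


taboo = ["fruit", "red", "tree"]
print(find_taboo_words("A crunchy thing that keeps the doctor away", "Apple", taboo))
print(find_taboo_words("A red fruit", "Apple", taboo))
```

With the example from the post, the first prompt passes the check and the second is rejected for containing "fruit" and "red".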

The Benchmark Goal: I am developing this project with the plan of using the gameplay data to build a benchmark dataset. The goal is to test and evaluate LLM capabilities when processing unclear prompts, metaphors, analogies, and vague explanations under semantic constraints.

Game Modes:

Single Player: Play through a pool of challenges to test your prompt precision. You compete against other players globally across attempts, solve time, and token consumption (measured via standard cl100k_base encoding). You can play instantly without registering, or sign in (one-click Google login) to submit scores to the leaderboards.

Multiplayer Races: Real-time lobbies of up to 10 players racing through 3 rounds. Note: Since the game is new, public lobbies might be empty at first, but you can create private lobbies to play with friends.

Available Models:

Anonymous users play with the default Gemma 3 Instruct model. Free registered users can choose between multiple models to test and compare reasoning styles, including Llama 3 8B, Liquid LFM 24B, Amazon Nova Micro, and Ministral 8B.

The Tech & Guardrails: The app is built with a React frontend and a Node.js/AWS Lambda backend. To keep things fair, I built a validation guard that parses input clues to block easy bypasses like letter-spacing (e.g., "A-P-P-L-E"), translations, ciphers, and base64. You have to rely purely on semantic reasoning to guide the model.
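To make the guardrail idea concrete, here is a minimal Python sketch of two of the checks named above: letter-spacing and base64. It is an assumption about how such a guard could work, not the app's real validator (which runs on Node.js), and every function name here is hypothetical. Translations and ciphers are not handled; those would need more than pattern matching.

```python
import base64
import binascii
import re

# Runs of single letters joined by separators, e.g. "A-P-P-L-E" or "a p p l e".
SPELLED_RUN = re.compile(
    r"(?<![A-Za-z])(?:[A-Za-z][^A-Za-z0-9]+){2,}[A-Za-z](?![A-Za-z])"
)
# Candidate base64 chunks: at least four alphabet characters plus optional padding.
BASE64_CHUNK = re.compile(r"[A-Za-z0-9+/]{4,}={0,2}")


def hides_spelled_word(clue: str, word: str) -> bool:
    """True if a separator-spaced run of letters spells out the word."""
    for run in SPELLED_RUN.finditer(clue):
        letters = re.sub(r"[^a-z]", "", run.group().lower())
        if word.lower() in letters:
            return True
    return False


def hides_base64_word(clue: str, word: str) -> bool:
    """True if any chunk of the clue base64-decodes to text containing the word."""
    for chunk in BASE64_CHUNK.findall(clue):
        try:
            raw = base64.b64decode(chunk, validate=True)
        except (binascii.Error, ValueError):
            continue  # not valid base64, so an ordinary word
        if word.lower() in raw.decode("utf-8", errors="ignore").lower():
            return True
    return False


def violates_guard(clue: str, target: str, taboo: list[str]) -> bool:
    """Check the target and every taboo word against both bypass detectors."""
    return any(
        hides_spelled_word(clue, w) or hides_base64_word(clue, w)
        for w in [target, *taboo]
    )


print(violates_guard("Spell it: A-P-P-L-E", "Apple", ["fruit", "red", "tree"]))
```

Decoding every plausible chunk rather than pattern-matching known encodings keeps the base64 check independent of the word list, at the cost of attempting a decode on many ordinary words.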

The game is completely free, has no ads, and is playable instantly in the browser.

I'd love to hear your thoughts on the gameplay and see what creative semantic tricks you use to guide the LLM!

Comments URL: https://news.ycombinator.com/item?id=48586263
