Be Recommended by Inithouse: 4 Mistakes We Made Building an AI Visibility Checker — and the Fixes That Worked

Inithouse, a studio running parallel product experiments, built Be Recommended, a tool that checks brand visibility across AI models like ChatGPT and Perplexity. The team encountered four key technical mistakes: treating rate limits as edge cases, using naive caching, ignoring model version changes, and providing opaque scoring. They implemented fixes including per-provider circuit breakers, weekly cache invalidation with stale-while-revalidate, model version tracking, and decomposed scoring, which improved performance and user retention.

At Inithouse, a studio running parallel product experiments, we built Be Recommended (https://berecommended.com), a tool that checks how visible your brand is across ChatGPT, Perplexity, Claude, and Gemini. The idea sounded simple: query multiple AI models, score the results, show a report. It was not simple. Here are four technical mistakes we made shipping v1, and the fixes that actually survived production.

Mistake 1: we treated rate limits as edge cases. They were not. Every AI provider has different rate-limit headers, different backoff expectations, and different definitions of "too many requests." Our first architecture just retried on 429. That turned a rate limit into a cascade: one provider throttling triggered a retry storm that spread to the others.

The fix: per-provider circuit breakers with exponential backoff. Each provider gets its own state machine. When a circuit opens, we serve cached results for that provider and mark the score as "partial" in the UI. Users see real data, not a spinner that never resolves.

At Audit Vibe Coding (https://auditvibecoding.com), another tool in our portfolio focused on code quality audits, we observed the same pattern in a different domain: external API dependencies need isolation. The lesson transferred directly.

Mistake 2: naive caching. Our first cache key was query + model. That breaks quickly: AI model responses drift over time, and a cached result from two weeks ago is misleading. We also had no invalidation strategy beyond TTL.

The fix: cache by query + model + week number. We pair weekly invalidation with stale-while-revalidate: serve the cached score instantly, trigger a background refresh, and update the display when new data arrives. Users get instant feedback and fresh data within the same session.

We measured the impact across our portfolio: stale-while-revalidate cut perceived load time from 8+ seconds to under 1 second for returning visitors. The background refresh means scores stay current without the user waiting.
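To make the week-keyed cache with stale-while-revalidate concrete, here is a minimal sketch. It is not our production code: the names `cache_key` and `WeeklySWRCache`, the in-memory dict, the ISO week as the "week number," and the choice to treat last week's entry as the stale fallback are all assumptions made for illustration. The background refresh is modeled as a queue of pending callables so the example stays deterministic; a real service would hand these to a worker or task queue.

```python
from datetime import date, timedelta
from typing import Callable, Dict, List, Set, Tuple


def cache_key(query: str, model: str, day: date) -> str:
    """Build a key of the form query|model|ISO-year-Wweek (hypothetical format)."""
    iso_year, iso_week, _ = day.isocalendar()
    return f"{query}|{model}|{iso_year}-W{iso_week:02d}"


class WeeklySWRCache:
    """Illustrative week-keyed cache with stale-while-revalidate."""

    def __init__(self, fetch: Callable[[str, str], int]) -> None:
        self._fetch = fetch                # hypothetical call that scores a query on a model
        self._store: Dict[str, int] = {}   # key -> score
        self._in_flight: Set[str] = set()  # keys that already have a refresh queued
        self.pending: List[Callable[[], None]] = []  # stand-in for a background worker

    def get(self, query: str, model: str, today: date) -> Tuple[int, bool]:
        """Return (score, is_stale)."""
        current = cache_key(query, model, today)
        if current in self._store:
            return self._store[current], False

        # No entry for this week: fall back to last week's entry if there is one.
        previous = cache_key(query, model, today - timedelta(days=7))
        if previous in self._store:
            if current not in self._in_flight:
                self._in_flight.add(current)
                self.pending.append(lambda: self._refresh(query, model, current))
            return self._store[previous], True

        # Cold miss: nothing to serve, so fetch synchronously.
        self._refresh(query, model, current)
        return self._store[current], False

    def _refresh(self, query: str, model: str, key: str) -> None:
        self._store[key] = self._fetch(query, model)
        self._in_flight.discard(key)

    def run_pending(self) -> None:
        """Run queued refreshes (a real system would do this off the request path)."""
        while self.pending:
            self.pending.pop(0)()
```

In this sketch, a returning visitor in a new week gets last week's score immediately with `is_stale=True`, and the next `get` after `run_pending()` returns the refreshed score for the current week.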
Mistake 3: ignoring model version changes. When OpenAI or Anthropic ships a new model version, recommendation patterns shift. We had no way to detect this: scores just quietly changed, and users saw different numbers without understanding why.

The fix: track model versions per query. When a model version changes, flag the score delta in the report: "Score changed from 72 to 65: model updated from GPT-4.1 to GPT-4.5." Transparency here builds trust. Users stop thinking the tool is broken and start understanding the landscape.

We found this matters even more for niche products. In our portfolio, tools like Magical Song (https://magicalsong.com), an AI music generator for personalized gifts, saw wild visibility swings between model versions. Tracking those shifts helped us understand which AI models retain context about smaller brands.

Mistake 4: opaque scoring. Our v1 score was a weighted average. Users saw "Your score is 47/100" and had no idea what that meant or how to improve it. Support tickets were mostly "why is my score low?" and we had no good answer, because the methodology was opaque even to us.

The fix: decomposed scoring with a per-factor breakdown. The report now shows exactly which AI models mention you, in what context, with what sentiment, and for which prompts. Each factor has its own sub-score. Users can see "Claude recommends you for 3 out of 10 test queries, mostly in the 'alternatives to X' category."

This was the single biggest improvement for retention. When users can see exactly what to fix, they come back to measure progress.

Building Be Recommended (https://berecommended.com) taught us that the hardest part of querying AI models is not the querying. It is everything around it: rate limits, caching, version tracking, and making results interpretable. The AI visibility space moves fast, and a tool that worked last month can silently degrade. Across the Inithouse portfolio, we keep finding that transparency compounds, whether in AI visibility reports, code audits, or personalized content. Tools that show their work earn repeat usage.

If you want to check your own AI visibility score, Be Recommended (https://berecommended.com) runs a free instant analysis across multiple AI models with full methodology transparency.