I've noticed that several startups have been switching from leetcode-style assessments to some version of "clone starter code, build feature, submit code".
A key issue with this format seems to be that smarter AI models (like Opus 4.6) spoil the key insights of the problem by giving candidates too much help with system design and ideation.
I set up an assessment platform that acts as a middle-man proxy, recording every request between Claude Code and the Anthropic endpoint.
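To make the proxy idea concrete, here is a minimal Python sketch of the recording step. It is not the platform's actual code: the names `proxy_and_record`, `redact`, and the `forward` callback are hypothetical, the upstream URL is an assumption, and the real HTTP transport is left out so the logic can be read and tested on its own.

```python
import time
from typing import Callable, Dict, List, Tuple

# Assumed upstream; the post only says "the Anthropic endpoint".
UPSTREAM = "https://api.anthropic.com"

# Header names whose values should never be written to the log.
SENSITIVE_HEADERS = {"x-api-key", "authorization"}

# Hypothetical transport signature: (url, headers, body) -> (status, response body).
Forward = Callable[[str, Dict[str, str], bytes], Tuple[int, bytes]]


def redact(headers: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the headers with credential values masked."""
    return {
        name: "<redacted>" if name.lower() in SENSITIVE_HEADERS else value
        for name, value in headers.items()
    }


def proxy_and_record(
    path: str,
    headers: Dict[str, str],
    body: bytes,
    forward: Forward,
    log: List[dict],
) -> Tuple[int, bytes]:
    """Forward one request upstream unchanged and append the exchange to the log."""
    status, response_body = forward(UPSTREAM + path, headers, body)
    log.append(
        {
            "timestamp": time.time(),
            "path": path,
            "request_headers": redact(headers),
            "request_body": body.decode("utf-8", errors="replace"),
            "status": status,
            "response_body": response_body.decode("utf-8", errors="replace"),
        }
    )
    return status, response_body
```

In a real deployment, `forward` would wrap an HTTP client and `log` would be durable storage rather than an in-memory list. Passing the transport in as a callback keeps the recording logic independent of both.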
I've recently been experimenting with a feature that keeps Opus-class models from giving away too much: unless the candidate explicitly challenges it, the LLM's suggestions are steered toward naive, brute-force approaches only. This should keep increasingly intelligent models from erasing the signal that such an assessment would normally provide.
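One way a proxy could do this kind of steering is to append an instruction to the `system` field of each Messages API request before forwarding it. The sketch below is only a guess at the general shape, not how the feature is actually implemented: `steer_request` and the wording of `STEERING_NOTE` are hypothetical. It relies on `system` being either a string or a list of text blocks.

```python
import copy

# Hypothetical wording; the post does not give the actual instruction text.
STEERING_NOTE = (
    "When proposing designs or algorithms, start with the simplest naive or "
    "brute-force approach. Offer more sophisticated approaches only when the "
    "user explicitly questions or challenges the current one."
)


def steer_request(body: dict) -> dict:
    """Return a copy of a Messages API request body with the steering note
    appended to its system prompt. The input body is left untouched."""
    steered = copy.deepcopy(body)
    system = steered.get("system")
    if not system:
        # No system prompt (missing or empty): the note becomes the prompt.
        steered["system"] = STEERING_NOTE
    elif isinstance(system, str):
        # Plain-string system prompt: append the note as a new paragraph.
        steered["system"] = system + "\n\n" + STEERING_NOTE
    else:
        # List of content blocks: append the note as one more text block.
        steered["system"] = list(system) + [{"type": "text", "text": STEERING_NOTE}]
    return steered
```

Appending rather than replacing keeps the client's own system prompt intact, so the tool should behave as usual apart from the added constraint. Whether a prompt-level nudge like this reliably holds a model to naive suggestions is an open question that the recorded sessions would need to answer.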
Live demo: [https://app.gonfire.io](https://app.gonfire.io) (showhn@gonfire.io / Aa123123123123)
Comments URL: [https://news.ycombinator.com/item?id=48300444](https://news.ycombinator.com/item?id=48300444)