Not coding assistants or agents, and not the really hard work where you'd obviously just use the latest frontier model.
I'm talking about the daily AI features users interact with directly: generating content, rewriting things, recommendations, workflow helpers, contextual suggestions, etc.
For my app I mostly use Gemini Flash via OpenRouter because the workloads are fairly structured. It works well enough, but there are now dozens of models available and I'm not sure how most teams are evaluating them. Are people building proper eval suites? Comparing cost and latency? Testing a handful of models and picking the cheapest one that's good enough? I think most people fall in that last bucket, and I want to at least hear what others landed on. Besides Flash, I found that a Qwen or MiniMax model worked fairly well.
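To make that last bucket concrete, here's a rough sketch of what I mean by "cheapest one that's good enough". Everything in it is an assumption for illustration: the `EvalResult` fields, the `pick_cheapest_good_enough` helper, the thresholds, and the numbers (which are made up, not measurements of any real model). It assumes you've already run each candidate over a small set of representative prompts and recorded a pass rate, a p95 latency, and a cost.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class EvalResult:
    """One candidate's summary stats (hypothetical structure)."""
    model: str              # placeholder label, not a real model ID
    pass_rate: float        # fraction of eval cases that passed, 0.0 to 1.0
    p95_latency_s: float    # 95th percentile latency in seconds
    cost_per_1k_usd: float  # cost of 1,000 requests in USD


def pick_cheapest_good_enough(
    results: Sequence[EvalResult],
    min_pass_rate: float,
    max_p95_latency_s: float,
) -> Optional[EvalResult]:
    """Return the cheapest candidate that clears both bars, or None."""
    eligible = [
        r for r in results
        if r.pass_rate >= min_pass_rate and r.p95_latency_s <= max_p95_latency_s
    ]
    # min() with default=None handles the case where nothing qualifies.
    return min(eligible, key=lambda r: r.cost_per_1k_usd, default=None)


# Made-up numbers, for illustration only.
results = [
    EvalResult("model-a", pass_rate=0.97, p95_latency_s=1.8, cost_per_1k_usd=4.00),
    EvalResult("model-b", pass_rate=0.93, p95_latency_s=0.9, cost_per_1k_usd=0.60),
    EvalResult("model-c", pass_rate=0.84, p95_latency_s=0.7, cost_per_1k_usd=0.25),
    EvalResult("model-d", pass_rate=0.95, p95_latency_s=3.5, cost_per_1k_usd=0.40),
]

choice = pick_cheapest_good_enough(results, min_pass_rate=0.90, max_p95_latency_s=2.0)
print(choice.model if choice else "no model meets the bar")
```

With these placeholder numbers, the quality and latency bars act as hard filters and cost only breaks the tie among whatever survives. A proper eval suite would presumably replace the single pass rate with per-task scores.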
Curious what you're running in production and how you chose it.
Comments URL: [https://news.ycombinator.com/item?id=48576393](https://news.ycombinator.com/item?id=48576393)