LLM-Insights, a local demo for people's comments and ideas

LLM-Insights released a local-first testing and optimization tool for iterative content creation. It runs multi-model A/B tests, refines prompts using rubric-based grading, and generates scored synthetic data on the user's own hardware. The system sends each prompt to competing LLMs, grades the responses against configurable rubrics, and automatically rewrites the prompt using grader feedback to improve results across cycles. The tool runs locally by default via Ollama, with optional cloud API support, so no data leaves the machine unless the user chooses a cloud provider.

A local-first testing and optimization harness for iterative content creation: run multi-model A/B tests, refine prompts automatically with rubric-based grading, and generate scored synthetic data. Built for brand content workflows, prompt engineering, and LLM evaluation on your own hardware.

You write a prompt: a piece of brand copy, a product description, a creative brief, or any other content task. The tool sends it to two competing LLMs, grades both answers against a configurable rubric, optionally rewrites the prompt using grader feedback, and repeats the cycle, keeping the best answer each round. Every variable is controlled from the UI: which models compete, what the rubric measures, how categories are weighted, and when the loop stops.

The pipeline runs locally by default using Ollama, with optional cloud API support (Mistral, Google Gemini) for hybrid setups. No data leaves your machine unless you choose a cloud provider.

Each run produces a structured record of prompts, answers, scores, token counts, and model metadata, useful as refined synthetic data, prompt optimization logs, or content quality benchmarks.

Custom Grading Rubrics: Define up to 8 grading categories, each with its own free-text rubric description, dedicated grader model, and weight. The default rubric covers accuracy, clarity, conciseness, creativity, and structure. Save named configurations and switch between them at any time.

Automatic Prompt Optimization: The system rewrites your prompt after each iteration using grader feedback, category weights, and best answers as context. Prompting techniques are applied automatically while preserving the original intent. They include Zero-Shot Prompting, Few-Shot Prompting, Chain-of-Thought (CoT), Self-Consistency, Least-to-Most Prompting, Tree of Thoughts (ToT), Directional Stimulus Prompting, Role Prompting, Generated Knowledge Prompting, Chain-of-Verification (CoVe), and Skeleton-of-Thought.

Multi-Model A/B Testing: Assign a different model to each answering slot and compare their outputs head-to-head. The Advanced panel supports per-iteration model assignments for systematic cross-model comparisons.

Parallel Multi-Category Grading: Layer 3 grades each category in parallel using a thread pool, with categories grouped by grader model. A failed grader falls back to a default score without stopping the pipeline. Retries with backoff are built in.

Synthetic Data Generation: Every run produces structured (prompt, answer, multi-dimensional scores) tuples and (original prompt, improved prompt) pairs. The JSONL ledger records prompts, replies, models, scores, and token counts. Multi-prompt sessions chain the best answer from the previous prompt into the next prompt as context.

Token Tracking: Input, output, and total token counts are recorded per model, per layer, and per iteration, then aggregated by provider. Token usage is visible in the deeper analysis charts.

Tie Detection: When multiple iterations produce the same best score, the system identifies the tied answers, deduplicates them by text similarity, and reports the alternatives.
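
The tie detection step described above can be illustrated with a minimal sketch. It is not the tool's own code: the function name `find_tied_answers`, the `(answer_text, score)` tuple shape, the use of Python's standard `difflib.SequenceMatcher` as the similarity measure, and the 0.9 threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def find_tied_answers(results, similarity_threshold=0.9):
    """Return the distinct answers that share the best score.

    `results` is a list of (answer_text, score) tuples, one per iteration.
    The tuple shape and the default threshold are assumptions for this
    sketch, not values taken from the tool.
    """
    if not results:
        return []

    best_score = max(score for _, score in results)
    tied = [text for text, score in results if score == best_score]

    distinct = []
    for text in tied:
        # Keep an answer only if it is not a near-duplicate of one already kept.
        is_duplicate = any(
            SequenceMatcher(None, text, kept).ratio() >= similarity_threshold
            for kept in distinct
        )
        if not is_duplicate:
            distinct.append(text)
    return distinct


results = [
    ("Fresh coffee, roasted daily.", 8.5),
    ("Fresh coffee, roasted daily!", 8.5),  # near-duplicate of the first
    ("Wake up to bold flavor every morning.", 8.5),
    ("Okay coffee.", 6.0),  # lower score, not part of the tie
]
alternatives = find_tied_answers(results)
```

Here `alternatives` holds the first and third answers: the second is dropped as a near-duplicate and the fourth never ties for the best score. A character-level ratio is a cheap stand-in; a real pipeline might prefer token-level or embedding-based similarity.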

Session Review and Analysis: Browse, load, and analyze past runs with per-prompt iteration stats and score grids. An in-depth analysis modal adds average grade bar charts, radar overlays, per-category score breakdowns, token usage charts, runtime comparisons, and adjustable weight sliders for live what-if recalculation.

REST API: All UI actions are backed by JSON endpoints /iteration, /is-processing, /get_backup_data, /update_weights, /save_advanced_models, /grader_settings, /grader_setting/
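
The what-if recalculation behind the weight sliders in Session Review and Analysis can be pictured as re-scoring stored category grades under new weights. The sketch below assumes the overall score is a weighted mean normalized by the total weight; the source does not state the exact formula, and the function name `weighted_score` and the sample grades are hypothetical.

```python
def weighted_score(category_scores, weights):
    """Weighted mean of per-category scores.

    Assumption for this sketch: the overall score is
    sum(score * weight) / sum(weight). Categories without a weight
    count as weight 0.
    """
    total_weight = sum(weights.get(name, 0.0) for name in category_scores)
    if total_weight == 0:
        raise ValueError("at least one category needs a non-zero weight")
    weighted_sum = sum(
        score * weights.get(name, 0.0)
        for name, score in category_scores.items()
    )
    return weighted_sum / total_weight


# Hypothetical stored grades for two competing answers.
answer_a = {"accuracy": 9.0, "creativity": 5.0}
answer_b = {"accuracy": 6.0, "creativity": 9.0}

# Equal weights: answer B has the higher overall score.
equal = {"accuracy": 1.0, "creativity": 1.0}
a_equal = weighted_score(answer_a, equal)  # (9 + 5) / 2 = 7.0
b_equal = weighted_score(answer_b, equal)  # (6 + 9) / 2 = 7.5

# Moving the accuracy weight to 3 flips the ranking.
accuracy_heavy = {"accuracy": 3.0, "creativity": 1.0}
a_heavy = weighted_score(answer_a, accuracy_heavy)  # (27 + 5) / 4 = 8.0
b_heavy = weighted_score(answer_b, accuracy_heavy)  # (18 + 9) / 4 = 6.75
```

Because only the weights change, the stored per-category grades can be reused without calling any grader model again, which is what makes a live recalculation cheap.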