{"slug": "frontier-language-model-intelligence-over-time", "title": "Frontier Language Model Intelligence, over Time", "summary": "Artificial Analysis released its Frontier Language Model Intelligence index, tracking performance, cost, and execution time of leading AI models over time. The index evaluates models on agentic tasks, coding, reasoning, and knowledge, providing independent benchmarks for model selection.", "body_md": "# Independent analysis of AI\n\nUnderstand the AI landscape to choose the best model and provider for your use case\n\n## Intelligence\n\nIntelligence of leading AI models based on our independent evaluations\n\n### Artificial Analysis Intelligence Index\n\n### Artificial Analysis Intelligence Index by Open Weights / Proprietary\n\n### Intelligence vs. 
Cost to Run Artificial Analysis Intelligence Index\n\nCreate custom visualizations: build your own charts and tables comparing models and providers, save groups of models, and export data. [Go to Data Playground](/data-playground)\n\n### Frontier Language Model Intelligence, Over Time\n\nPerformance, cost, and execution time for leading coding agents on end-to-end software engineering tasks\n\n[Explore Artificial Analysis Coding Agent Index](/agents/coding-agents)\n\n### Artificial Analysis Coding Agent Index\n\n## Image & Video Leaderboards\n\nTop models from our Image Arena and Video Arena leaderboards, with 95% confidence intervals\n\n### Text to Image Leaderboard\n\n[See the full leaderboard here.](/image/leaderboard/text-to-image)\n\n### Intelligence Evaluations\n\n- Agentic real-world work tasks, scored as (Elo - 500) / 2000\n- Agentic coding & terminal use\n- Agentic tool use\n- Long-context reasoning\n- Knowledge\n- 1 - hallucination rate\n- Reasoning & knowledge\n- Scientific reasoning\n- Coding\n- Instruction following\n- Physics reasoning\n- Long-horizon agentic tasks\n- [ITBench-AA](/evaluations/itbench-aa): Kubernetes incident root-cause analysis\n- Visual reasoning\n\nAA-Omniscience is a knowledge and hallucination benchmark that rewards accuracy, penalizes incorrect guesses, and shows which models produce factually reliable outputs across different domains.\n\n### AA-Omniscience Index\n\nGDPval-AA evaluates AI models on real-world, economically valuable tasks across a wide range of occupations.\n\n### GDPval-AA Leaderboard\n\n[ITBench-AA](/evaluations/itbench-aa) evaluates AI agents on Kubernetes incident root-cause analysis using offline incident snapshots.\n\n### ITBench-AA: Average precision at full recall\n\nArtificial Analysis Openness Index assesses how 'open' models are on the basis of their availability and transparency across different components.\n\n### Artificial 
Analysis Openness Index vs. Artificial Analysis Intelligence Index\n\n## Output Tokens\n\nOutput tokens of leading AI models based on our independent evaluations\n\n### Output Tokens Used to Run Artificial Analysis Intelligence Index\n\n## Cost Efficiency\n\nCost of leading AI models based on our independent evaluations\n\n### Cost to Run Artificial Analysis Intelligence Index\n\n## Speed & Latency\n\nComparison of first-party API performance\n\n### Output Speed\n\n## Price\n\nPrice of leading AI models based on our independent evaluations\n\n### Pricing: Cache Hit, Input, and Output\n\n[Hardware Benchmarking](/benchmarks/hardware)\n\nComprehensive benchmarking of GPUs for language model inference\n\n[Video Arena & Leaderboard](/video/arena)\n\nCompare leading text-to-video and image-to-video models\n\n[Image Arena & Leaderboard](/image/arena)\n\nCompare leading image generation and image editing models\n\n[Speech Arena & Leaderboard](/text-to-speech/arena)\n\nCompare leading text-to-speech models", "url": "https://wpnews.pro/news/frontier-language-model-intelligence-over-time", "canonical_source": "https://artificialanalysis.ai/?model-creators=anthropic%2Cmistral#frontier-language-model-intelligence-over-time", "published_at": "2026-06-14 00:04:22+00:00", "updated_at": "2026-06-14 00:31:30.260586+00:00", "lang": "en", "topics": ["large-language-models", "artificial-intelligence", "ai-research", "ai-products", "ai-tools"], "entities": ["Artificial Analysis", "AA-Omniscience", "GDPval-AA", "ITBench-AA", "Kubernetes"], "alternates": {"html": "https://wpnews.pro/news/frontier-language-model-intelligence-over-time", "markdown": "https://wpnews.pro/news/frontier-language-model-intelligence-over-time.md", "text": "https://wpnews.pro/news/frontier-language-model-intelligence-over-time.txt", "jsonld": "https://wpnews.pro/news/frontier-language-model-intelligence-over-time.jsonld"}}