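{"slug": "ai-2026ai", "title": "A Complete Guide to AI Application Observability: Hands-On Production AI Monitoring in 2026", "summary": "In 2026, AI applications are widely deployed in production but present unique challenges such as unstable model outputs, high latency, and unpredictable costs, which traditional application performance monitoring (APM) cannot address. The article introduces core methods for AI application observability, including logging, metrics (token consumption, latency, cost), tracing, and evaluation, and provides Python code examples for tracking AI latency, token usage, and classifying AI-specific errors.", "body_md": "# A Complete Guide to AI Application Observability: Hands-On Production AI Monitoring in 2026\n\n## Introduction\n\nBy 2026, AI applications are widely deployed in production. But AI applications have their own characteristics: model output is inconsistent, latency is high, and costs are hard to predict.\n\nTraditional application performance monitoring (APM) cannot meet the needs of AI monitoring. This article introduces the core methods of AI application observability.\n\n## What Is AI Observability?\n\n### Traditional Monitoring vs. AI Monitoring\n\n| Dimension | Traditional monitoring | AI monitoring |\n|------|---------|---------|\n| Latency | HTTP request duration | API call + model inference time |\n| Error rate | 4xx/5xx status codes | Refusals, hallucinations, malformed output |\n| Cost | Fixed cloud resources | Fluctuating token consumption |\n| Quality | Precisely measurable | Requires separate evaluation |\n\n### The Four Pillars of AI Observability\n\n```\n├── Logging (AI request logs)\n├── Metrics (token consumption, latency, cost)\n├── Tracing (AI call-chain tracing)\n└── Evaluation (output quality assessment)\n```\n\n## Core Metrics\n\n### 1. 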
Latency Metrics\n\n``` python\nimport inspect\nimport time\nfrom functools import wraps\n\n\nclass AILatencyTracker:\n    def __init__(self):\n        self.latencies = []\n\n    def track(self, func):\n        \"\"\"Decorator that records the latency of every call.\"\"\"\n\n        @wraps(func)\n        async def async_wrapper(*args, **kwargs):\n            start = time.perf_counter()\n            try:\n                result = await func(*args, **kwargs)\n            except Exception:\n                # Record failed calls too, then re-raise instead of swallowing the error\n                self.record(\"error\", time.perf_counter() - start)\n                raise\n            self.record(\"success\", time.perf_counter() - start)\n            return result\n\n        @wraps(func)\n        def sync_wrapper(*args, **kwargs):\n            start = time.perf_counter()\n            try:\n                result = func(*args, **kwargs)\n            except Exception:\n                self.record(\"error\", time.perf_counter() - start)\n                raise\n            self.record(\"success\", time.perf_counter() - start)\n            return result\n\n        # Return the wrapper that matches the decorated function\n        if inspect.iscoroutinefunction(func):\n            return async_wrapper\n        return sync_wrapper\n\n    def record(self, status: str, latency: float):\n        self.latencies.append({\n            \"timestamp\": time.time(),\n            \"status\": status,\n            \"latency_ms\": latency * 1000,\n        })\n\n    def get_stats(self) -> dict:\n        \"\"\"Return count, mean, and P50/P95/P99 latency.\"\"\"\n        if not self.latencies:\n            return {}\n        latencies = sorted(l[\"latency_ms\"] for l in self.latencies)\n        n = len(latencies)\n        return {\n            \"count\": n,\n            \"avg_ms\": sum(latencies) / n,\n            \"p50_ms\": latencies[n // 2],\n            \"p95_ms\": latencies[int(n * 0.95)],\n            \"p99_ms\": latencies[int(n * 0.99)],\n        }\n```\n\n### 2. 
Token Consumption Metrics\n\n``` python\nimport time\n\n\nclass TokenTracker:\n    def __init__(self):\n        self.records = []\n        self.total_input_tokens = 0\n        self.total_output_tokens = 0\n\n    def record(self, model: str, input_tokens: int, output_tokens: int, cost: float):\n        \"\"\"Record token consumption for one request.\"\"\"\n        self.total_input_tokens += input_tokens\n        self.total_output_tokens += output_tokens\n        self.records.append({\n            \"timestamp\": time.time(),\n            \"model\": model,\n            \"input_tokens\": input_tokens,\n            \"output_tokens\": output_tokens,\n            \"total_tokens\": input_tokens + output_tokens,\n            \"cost\": cost,\n        })\n\n    def get_daily_cost(self) -> dict:\n        \"\"\"Return cost over the last 24 hours (a rolling window, not the calendar day).\"\"\"\n        cutoff = time.time() - 86400  # 24 hours ago\n        recent = [r for r in self.records if r[\"timestamp\"] > cutoff]\n        total_cost = sum(r[\"cost\"] for r in recent)\n        total_tokens = sum(r[\"total_tokens\"] for r in recent)\n        return {\n            \"cost_today\": total_cost,\n            \"tokens_today\": total_tokens,\n            \"avg_cost_per_request\": total_cost / len(recent) if recent else 0,\n        }\n\n    def get_model_breakdown(self) -> dict:\n        \"\"\"Aggregate cost, tokens, and request count per model.\"\"\"\n        breakdown = {}\n        for r in self.records:\n            model = r[\"model\"]\n            if model not in breakdown:\n                breakdown[model] = {\"cost\": 0, \"tokens\": 0, \"count\": 0}\n            breakdown[model][\"cost\"] += r[\"cost\"]\n            breakdown[model][\"tokens\"] += r[\"total_tokens\"]\n            breakdown[model][\"count\"] += 1\n        return breakdown\n```\n\n### 3. 
Error Classification\n\n``` python\nclass AIErrorClassifier:\n    ERROR_TYPES = {\n        \"rate_limit\": {\"retry\": True, \"severity\": \"medium\"},\n        \"auth_error\": {\"retry\": False, \"severity\": \"high\"},\n        \"model_error\": {\"retry\": True, \"severity\": \"medium\"},\n        \"timeout\": {\"retry\": True, \"severity\": \"low\"},\n        \"invalid_request\": {\"retry\": False, \"severity\": \"high\"},\n        \"content_filtered\": {\"retry\": False, \"severity\": \"medium\"},\n    }\n\n    @classmethod\n    def classify(cls, error: Exception) -> dict:\n        \"\"\"Classify an error by keywords in its message.\n\n        This is a heuristic: if your SDK exposes typed exceptions or\n        HTTP status codes, match on those instead.\n        \"\"\"\n        error_str = str(error).lower()\n\n        if \"429\" in error_str or \"rate_limit\" in error_str:\n            error_type = \"rate_limit\"\n        elif \"401\" in error_str or \"auth\" in error_str:\n            error_type = \"auth_error\"\n        elif \"500\" in error_str or \"internal\" in error_str:\n            error_type = \"model_error\"\n        elif \"timeout\" in error_str:\n            error_type = \"timeout\"\n        # Check content filtering before the generic 400 branch: filtered\n        # requests are often reported as 400 errors and would never reach it\n        elif \"filtered\" in error_str or \"content_filter\" in error_str:\n            error_type = \"content_filtered\"\n        elif \"400\" in error_str or \"invalid\" in error_str:\n            error_type = \"invalid_request\"\n        else:\n            return {\"type\": \"unknown\", \"retry\": False, \"severity\": \"high\"}\n\n        return {\"type\": error_type, **cls.ERROR_TYPES[error_type]}\n\n    @classmethod\n    def should_retry(cls, error: Exception) -> bool:\n        \"\"\"Decide whether the failed call should be retried.\"\"\"\n        return cls.classify(error).get(\"retry\", False)\n```\n\n## Logging\n\n### Structured AI Logs\n\n``` python\nimport json\nimport logging\nfrom datetime import datetime, timezone\n\n\nclass AILogger:\n    def __init__(self, log_file: str = \"ai_logs.jsonl\"):\n        self.log_file = log_file\n        # One logger per file; the guard avoids stacking duplicate handlers\n        # when several AILogger instances write to the same file\n        self.logger = logging.getLogger(f\"ai.{log_file}\")\n        self.logger.setLevel(logging.INFO)\n        if not self.logger.handlers:\n            handler = 
logging.FileHandler(log_file, encoding=\"utf-8\")\n            handler.setFormatter(logging.Formatter(\"%(message)s\"))\n            self.logger.addHandler(handler)\n\n    def log_request(self,\n                    request_id: str,\n                    model: str,\n                    prompt: str,\n                    response: str = None,\n                    latency_ms: float = None,\n                    tokens_used: int = None,\n                    cost: float = None,\n                    error: str = None):\n        \"\"\"Log one AI request (lengths only, not the raw prompt/response text).\"\"\"\n        log_entry = {\n            \"timestamp\": datetime.now(timezone.utc).isoformat(),\n            \"type\": \"ai_request\",\n            \"request_id\": request_id,\n            \"model\": model,\n            \"prompt_length\": len(prompt),\n            \"response_length\": len(response) if response else None,\n            \"latency_ms\": latency_ms,\n            \"tokens_used\": tokens_used,\n            \"cost\": cost,\n            \"error\": error,\n            \"success\": error is None,\n        }\n        self.logger.info(json.dumps(log_entry, ensure_ascii=False))\n\n    def log_evaluation(self, request_id: str, quality_score: float, categories: dict):\n        \"\"\"Log a quality-evaluation result.\"\"\"\n        log_entry = {\n            \"timestamp\": datetime.now(timezone.utc).isoformat(),\n            \"type\": \"quality_evaluation\",\n            \"request_id\": request_id,\n            \"quality_score\": quality_score,\n            \"categories\": categories,\n        }\n        self.logger.info(json.dumps(log_entry, ensure_ascii=False))\n\n\n# Example usage\nai_logger = AILogger(\"ai_production_logs.jsonl\")\nai_logger.log_request(\n    request_id=\"req_001\",\n    model=\"gpt-5.4\",\n    prompt=\"Explain what machine learning is\",\n    response=\"Machine learning is...\",\n    latency_ms=250,\n    tokens_used=1500,\n)\n```\n\n### Log Analysis Queries\n\n``` python\nimport json\nfrom datetime import datetime, timezone\n\n\nclass LogAnalyzer:\n    def __init__(self, log_file: str):\n        self.log_file = log_file\n\n    def load_logs(self, limit: int = None) -> list:\n        logs = []\n        with open(self.log_file, 'r', encoding='utf-8') as f:\n            for i, line in enumerate(f):\n                if limit and i >= limit:\n                    break\n                logs.append(json.loads(line))\n        return logs\n\n    def get_error_rate(self, hours: int = 24) -> float:\n        \"\"\"Compute the error rate over the last `hours` hours.\"\"\"\n        cutoff = datetime.now(timezone.utc).timestamp() - hours * 3600\n        # Only request entries carry a success flag; skip evaluation entries\n        recent = [\n            l for l in self.load_logs()\n            if l.get(\"type\") == \"ai_request\"\n            and datetime.fromisoformat(l[\"timestamp\"]).timestamp() > cutoff\n        ]\n        if not recent:\n            return 0.0\n        errors = sum(1 
for l in recent if not l.get(\"success\", True))\n        return errors / len(recent)\n\n    def get_expensive_requests(self, top_n: int = 10) -> list:\n        \"\"\"Return the most expensive requests.\"\"\"\n        logs = self.load_logs()\n        sorted_logs = sorted(\n            [l for l in logs if l.get(\"cost\")],\n            key=lambda x: x[\"cost\"],\n            reverse=True,\n        )\n        return sorted_logs[:top_n]\n\n    def get_slow_requests(self, threshold_ms: float = 5000) -> list:\n        \"\"\"Return requests slower than the threshold.\"\"\"\n        logs = self.load_logs()\n        # latency_ms may be logged as null, so treat None as 0\n        return [l for l in logs if (l.get(\"latency_ms\") or 0) > threshold_ms]\n```\n\n## Tracing\n\n### LangChain + OpenTelemetry\n\n``` python\nimport time\n\nfrom opentelemetry import trace\nfrom opentelemetry.sdk.trace import TracerProvider\nfrom opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter\n\nprovider = TracerProvider()\nprocessor = BatchSpanProcessor(ConsoleSpanExporter())\nprovider.add_span_processor(processor)\ntrace.set_tracer_provider(provider)\n\ntracer = trace.get_tracer(__name__)\n\n\nclass AIServiceWithTracing:\n    def __init__(self, llm, vector_db):\n        # Placeholders for your own LLM client and vector store; the\n        # generate() and search() methods used below are assumed interfaces\n        self.llm = llm\n        self.vector_db = vector_db\n\n    async def process_request(self, user_input: str, user_id: str):\n        with tracer.start_as_current_span(\"ai_request\") as root:\n            root.set_attribute(\"user_id\", user_id)\n            root.set_attribute(\"input_length\", len(user_input))\n\n            # 1. Retrieve relevant documents\n            with tracer.start_as_current_span(\"retrieve_context\") as span:\n                docs = self.vector_db.search(user_input)\n                span.set_attribute(\"docs_retrieved\", len(docs))\n\n            # 2. 
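Call the LLM\n            with tracer.start_as_current_span(\"llm_call\") as span:\n                span.set_attribute(\"model\", \"gpt-5.4\")\n                start = time.time()\n                try:\n                    response = self.llm.generate(user_input, docs)\n                except Exception as e:\n                    # The span also records the exception and sets an error\n                    # status when it propagates out of the with block\n                    span.set_attribute(\"success\", False)\n                    span.set_attribute(\"error\", str(e))\n                    raise\n                span.set_attribute(\"latency_ms\", (time.time() - start) * 1000)\n                span.set_attribute(\"response_length\", len(response))\n                span.set_attribute(\"success\", True)\n                return response\n```\n\n## Output Quality Evaluation\n\n### Automated Quality Evaluation\n\n``` python\nimport json\n\n\nclass AIOutputEvaluator:\n    def __init__(self, llm):\n        # Placeholder LLM client; generate(prompt) -> str is an assumed interface\n        self.llm = llm\n\n    def evaluate(self, prompt: str, response: str) -> dict:\n        \"\"\"Score output quality using an LLM as the judge.\"\"\"\n        # Literal braces inside the f-string are doubled ({{ }})\n        evaluation_prompt = f\"\"\"\nEvaluate the quality of the following AI output.\n\nUser input: {prompt}\nAI output: {response}\n\nScore each dimension from 1 to 5:\n1. Relevance: is the output relevant to the question?\n2. Accuracy: is the information correct?\n3. Completeness: does it fully answer the question?\n4. Clarity: is it clearly expressed and easy to read?\n5. Safety: does it contain inappropriate content?\n\nReturn JSON only, in this format:\n{{\n    \"relevance\": 4,\n    \"accuracy\": 5,\n    \"completeness\": 4,\n    \"clarity\": 5,\n    \"safety\": 5,\n    \"overall_score\": 4.6,\n    \"issues\": [\"issue 1\", \"issue 2\"],\n    \"suggestions\": [\"suggestion 1\", \"suggestion 2\"]\n}}\n\"\"\"\n        result = self.llm.generate(evaluation_prompt)\n        try:\n            return json.loads(result)\n        except json.JSONDecodeError:\n            return {\"error\": \"failed to parse evaluation\", \"raw\": result}\n\n    def batch_evaluate(self, requests: list) -> list:\n        \"\"\"Evaluate a list of requests, each with id, prompt, and response keys.\"\"\"\n        results = []\n        for req in requests:\n            evaluation = self.evaluate(req[\"prompt\"], req[\"response\"])\n            results.append({\n                \"request_id\": req[\"id\"],\n                **evaluation,\n            })\n        return results\n\n    def detect_hallucination(self, response: str, context: str) -> dict:\n        \"\"\"Check whether the response contains claims the context does not support.\"\"\"\n        detection_prompt = f\"\"\"\nDetermine whether the following answer contains hallucinations (fabricated information).\n\nContext: {context}\nAI answer: {response}\n\nCheck:\n1. Whether there are specific facts (names, dates, numbers) that need verification\n2. Whether those facts appear in the context\n3. 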
Whether there is any obviously fabricated content\n\nReturn JSON only, in this format:\n{{\n    \"has_hallucination\": true/false,\n    \"confidence\": 0.85,\n    \"risky_content\": [\"specific suspicious content\"],\n    \"reason\": \"explanation\"\n}}\n\"\"\"\n        result = self.llm.generate(detection_prompt)\n        try:\n            return json.loads(result)\n        except json.JSONDecodeError:\n            return {\"has_hallucination\": False, \"confidence\": 0}\n```\n\n## Prometheus Monitoring\n\n### Exporting Metrics\n\n``` python\nimport time\n\nfrom fastapi import FastAPI, Request, Response\nfrom prometheus_client import CONTENT_TYPE_LATEST, Counter, Gauge, Histogram, generate_latest\n\napp = FastAPI()\n\nREQUEST_COUNT = Counter(\n    'ai_requests_total',\n    'Total AI requests',\n    ['model', 'status'],\n)\nREQUEST_LATENCY = Histogram(\n    'ai_request_latency_seconds',\n    'AI request latency',\n    ['model'],\n    # The default buckets stop at 10s, so a P95 above 10s could never be\n    # detected; extend them for slow LLM calls\n    buckets=(0.5, 1, 2, 5, 10, 20, 30, 60),\n)\nTOKEN_USAGE = Counter(\n    'ai_tokens_used_total',\n    'Total tokens used',\n    ['model', 'type'],  # type: input/output\n)\nCOST_USAGE = Counter(\n    'ai_cost_total',\n    'Total API cost',\n)\nACTIVE_REQUESTS = Gauge(\n    'ai_active_requests',\n    'Number of active requests',\n    ['model'],\n)\n\n\n@app.middleware(\"http\")\nasync def track_requests(request: Request, call_next):\n    model = request.headers.get(\"X-Model\", \"unknown\")\n    ACTIVE_REQUESTS.labels(model=model).inc()\n    start = time.time()\n    try:\n        response = await call_next(request)\n    finally:\n        # Decrement even if the handler raises\n        ACTIVE_REQUESTS.labels(model=model).dec()\n    REQUEST_LATENCY.labels(model=model).observe(time.time() - start)\n    REQUEST_COUNT.labels(model=model, status=str(response.status_code)).inc()\n    return response\n\n\n@app.get(\"/metrics\")\ndef metrics():\n    return Response(content=generate_latest(), media_type=CONTENT_TYPE_LATEST)\n```\n\n## Alerting\n\n### Key Alert Rules\n\n``` yaml\n# Prometheus alerting rules file (loaded via rule_files; Alertmanager handles routing)\ngroups:\n  - name: ai_application\n    rules:\n      - alert: HighAIErrorRate\n        expr: |\n          sum(rate(ai_requests_total{status=~\"5..\"}[5m]))\n            / sum(rate(ai_requests_total[5m])) > 0.05\n        labels:\n          severity: critical\n        annotations:\n          summary: \"AI request error rate is above 5%\"\n\n      - alert: HighAILatency\n        expr: |\n          histogram_quantile(0.95,\n            sum(rate(ai_request_latency_seconds_bucket[5m])) by (le)\n          ) > 10\n        labels:\n          severity: warning\n        annotations:\n          summary: \"AI request P95 latency is above 10 seconds\"\n\n      - alert: HighAICost\n        expr: increase(ai_cost_total[1h]) > 100\n        labels:\n          severity: 
warning\n        annotations:\n          summary: \"AI spend increased by more than $100 in the past hour\"\n\n      - alert: AIRateLimit\n        expr: increase(ai_requests_total{status=\"429\"}[5m]) > 10\n        labels:\n          severity: warning\n        annotations:\n          summary: \"AI API rate limiting is occurring frequently\"\n```\n\n## Grafana Dashboard\n\n### Key Panels\n\n```\n┌─────────────────────────────────────────────────────────┐\n│  AI Application Dashboard                               │\n├─────────────────────────────────────────────────────────┤\n│                                                         │\n│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐      │\n│  │ Requests    │  │ Error Rate  │  │ Avg Latency │      │\n│  │ 12,345      │  │ 2.3%        │  │ 1.2s        │      │\n│  └─────────────┘  └─────────────┘  └─────────────┘      │\n│                                                         │\n│  ┌───────────────────────────────────────────────────┐  │\n│  │ Token Usage Over Time                             │  │\n│  │ ████████████████░░░░░░░░░░░░░░░░░░                │  │\n│  └───────────────────────────────────────────────────┘  │\n│                                                         │\n│  ┌───────────────────────────────────────────────────┐  │\n│  │ Cost by Model                                     │  │\n│  │ GPT-5.4: $45.2 (67%)                              │  │\n│  │ Claude: $22.1 (33%)                               │  │\n│  └───────────────────────────────────────────────────┘  │\n│                                                         │\n│  ┌───────────────────────────────────────────────────┐  │\n│  │ Quality Score Distribution                        │  │\n│  │ ██████████████████████████░░░░░░░░░░░░            │  │\n│  └───────────────────────────────────────────────────┘  │\n└─────────────────────────────────────────────────────────┘\n```\n\n## Best Practices\n\n### 1. 
Data Sampling\n\n``` python\nclass SamplingLogger:\n    \"\"\"Log only a sample of requests to keep storage costs down.\"\"\"\n\n    SAMPLE_RATE = 0.1  # 10% sampling\n\n    def __init__(self):\n        self.full_logger = AILogger()\n        self.sample_count = 0\n\n    def should_log(self) -> bool:\n        \"\"\"Deterministic 1-in-N sampling: log every N-th request.\"\"\"\n        self.sample_count += 1\n        return self.sample_count % round(1 / self.SAMPLE_RATE) == 0\n\n    def log(self, entry: dict):\n        if self.should_log():\n            self.full_logger.log_request(**entry)\n```\n\n### 2. Cost Alerts\n\n``` python\nclass CostAlert:\n    def __init__(self, token_tracker: TokenTracker, threshold_daily: float = 100):\n        self.threshold_daily = threshold_daily\n        # Pass in the tracker that actually records requests; creating a\n        # fresh TokenTracker() here would always report zero cost\n        self.token_tracker = token_tracker\n\n    def check_and_alert(self) -> dict:\n        \"\"\"Compare spend over the last 24 hours with the threshold.\"\"\"\n        daily = self.token_tracker.get_daily_cost()\n        if daily[\"cost_today\"] > self.threshold_daily:\n            return {\n                \"alert\": True,\n                \"message\": f\"AI cost over the last 24h (${daily['cost_today']:.2f}) exceeds the ${self.threshold_daily} threshold\",\n                \"action\": \"review_recent_requests\",\n            }\n        return {\"alert\": False}\n```\n\n## Summary\n\nObservability is a must-have for AI applications in production:\n\n- **Latency tracking**: P50/P95/P99 latency metrics\n- **Token consumption**: cost analysis by model and over time\n- **Error classification**: distinguish retryable from non-retryable errors\n- **Quality evaluation**: automatically assess output quality and detect hallucinations\n- **Alerting**: alerts on error rate, latency, and cost\n\nWithout observability, there is no production governance for AI applications.\n\n*This article is part of the AI Engineering series.*", "url": "https://wpnews.pro/news/ai-2026ai", "canonical_source": "https://dev.to/zny10289/ai-2026ai-47pd", "published_at": "2026-05-20 23:49:10+00:00", "updated_at": "2026-05-21 00:01:46.566570+00:00", "lang": "en", "topics": ["artificial-intelligence", "machine-learning", "developer-tools", "enterprise-software"], "entities": [], "alternates": {"html": "https://wpnews.pro/news/ai-2026ai", "markdown": "https://wpnews.pro/news/ai-2026ai.md", "text": "https://wpnews.pro/news/ai-2026ai.txt", "jsonld": "https://wpnews.pro/news/ai-2026ai.jsonld"}}