Source: signoz.io

Connect Infrastructure Issues to App Errors with Noz

SigNoz released a new AI-powered tool called Noz that correlates infrastructure issues with application errors, enabling developers to quickly determine whether error spikes are caused by platform problems like pod restarts or code issues. The tool aligns timelines of service errors and Kubernetes metrics, queries resource pressure, and suggests the likely direction of causality.

Published Jun 14, 2026 · 2 min read

Errors spiked on a service and you suspect the platform, not the code. Instead of flipping between the Services view and your Kubernetes dashboards, you ask Noz to put both on the same timeline.

Prerequisites

  • A SigNoz Cloud account with Noz available.
  • Application traces or logs and infrastructure metrics (for example, Kubernetes pod metrics) flowing to SigNoz.

Step 1: Line up the timelines

Open Noz from the top-right header and ask the correlation directly:

Do the Kubernetes pod restarts line up with the error spike on the orders service today?

Noz pulls the service's error rate and the pod restart counts over the same window and tells you whether the two move together.
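To make "move together" concrete, here is a minimal sketch of one way two signals can be compared over the same window: both are summed into identical time buckets and then scored with a Pearson correlation. This is an illustration only, not a description of how Noz works internally; the function names, the 60-second bucket width, and the sample data are all assumptions.

```python
from math import sqrt

def bucket(points, start, step, n):
    """Sum (timestamp_seconds, value) points into n fixed-width buckets."""
    out = [0.0] * n
    for ts, value in points:
        i = int((ts - start) // step)
        if 0 <= i < n:  # ignore points outside the window
            out[i] += value
    return out

def pearson(xs, ys):
    """Pearson correlation of two equal-length series (0.0 if either is flat)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

# Hypothetical sample data: error counts and pod restarts as (timestamp, value).
error_points = [(0, 1), (60, 1), (120, 9), (180, 8), (240, 1)]
restart_points = [(125, 2), (190, 2)]

errors = bucket(error_points, start=0, step=60, n=5)
restarts = bucket(restart_points, start=0, step=60, n=5)
score = pearson(errors, restarts)
print(f"correlation between errors and restarts: {score:.2f}")
```

A score near 1 suggests the two signals rise and fall in the same buckets; it says nothing yet about which one caused the other.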

Step 2: Check resource pressure

Are any pods for this service crash-looping or hitting memory limits right now?

Noz queries the workload's restart reasons and CPU/memory metrics, so you can see whether OOM kills or saturation explain the restarts.
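As a rough sketch of what "explaining" restarts can look like, the snippet below tallies restarts by termination reason and flags pods whose memory usage is close to their limit. `OOMKilled` is the reason Kubernetes reports for a container killed for exceeding its memory limit; everything else here (the record shape, field names, the 90% threshold, the sample pods) is a hypothetical stand-in, not a SigNoz or Noz data format.

```python
from collections import Counter

def summarize_restarts(pods, memory_threshold=0.9):
    """Count restarts per termination reason and list pods near their memory limit."""
    reasons = Counter()
    near_limit = []
    for pod in pods:
        reasons[pod["last_terminated_reason"]] += pod["restarts"]
        limit = pod["memory_limit_bytes"]
        # Pods without a memory limit cannot be "near" one, so skip them.
        if limit and pod["memory_bytes"] / limit >= memory_threshold:
            near_limit.append(pod["name"])
    return reasons, near_limit

# Hypothetical pod records for illustration.
pods = [
    {"name": "orders-a", "restarts": 4, "last_terminated_reason": "OOMKilled",
     "memory_bytes": 500_000_000, "memory_limit_bytes": 512_000_000},
    {"name": "orders-b", "restarts": 1, "last_terminated_reason": "Error",
     "memory_bytes": 200_000_000, "memory_limit_bytes": 512_000_000},
]

reasons, near_limit = summarize_restarts(pods)
print(dict(reasons), near_limit)
```

If most restarts fall under `OOMKilled` and the same pods sit near their limit, memory pressure is a plausible explanation for the restarts.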

Step 3: Establish direction

Use Add Context → Services to attach the service, then ask Noz to reason about cause:

Is the error spike caused by the restarts, or did the errors trigger the restarts?

Noz weighs which signal moved first and explains the likely direction, with Suggested Actions to dig into the failing pods or the application traces next.
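One simple way to reason about which signal moved first is a lagged correlation: shift one series against the other and see which offset lines them up best. The sketch below assumes both series are already bucketed on the same time grid; the function names and sample data are hypothetical, and this is not a claim about Noz's own method. A leading signal is a hint about direction, not proof of causation.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length series (0.0 if either is flat)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def best_lag(a, b, max_lag):
    """Return (lag, r) maximizing the correlation of a[i] with b[i + lag].

    A positive lag means b follows a by that many buckets (a moved first);
    a negative lag means a follows b.
    """
    best = (0, -2.0)  # any real correlation is >= -1, so this is always replaced
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            xs, ys = a[:len(a) - lag], b[lag:]
        else:
            xs, ys = a[-lag:], b[:len(b) + lag]
        if len(xs) < 3:  # too few overlapping points to be meaningful
            continue
        r = pearson(xs, ys)
        if r > best[1]:
            best = (lag, r)
    return best

# Hypothetical per-minute buckets: restarts spike one bucket before errors do.
restarts = [0, 0, 3, 0, 0, 0, 0, 0]
errors = [1, 1, 1, 9, 1, 1, 1, 1]

lag, r = best_lag(restarts, errors, max_lag=3)
print(f"best lag: {lag} bucket(s), correlation {r:.2f}")
```

Here a positive best lag would point toward the platform leading the application errors, and a negative one toward the errors preceding the restarts.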

Tips

  • Explain the restarts, don't just count them. Restart counts alone are weak evidence; ask for the OOM kills and resource limits behind them.
  • Settle which moved first. Whether infra or the app led decides whether you scale the platform or fix the code.

Under the Hood

To answer, Noz works through several agentic steps, visible under Worked through N steps:

Step               | What It Did
Ran builder query  | Aggregated the service's error rate over the window
Ran builder query  | Pulled pod restart counts and CPU/memory for the workload
Reasoned           | Aligned the two timelines and judged which signal led the other

Next Steps

  • Investigate What Changed After a Deploy with Noz - Rule out a release before blaming the platform.
  • Get a Weekly Reliability Report with Noz - See whether this service is trending less reliable.

If you need help with the steps in this topic, please reach out to us on SigNoz Community Slack.

If you are a SigNoz Cloud user, please use the in-product chat support located at the bottom-right corner of your SigNoz instance, or contact us at cloud-support@signoz.io.
