Why your vulnerability dashboard is lying to you (and how to fix it)

The article explains that vulnerability dashboards often display inaccurate data due to the "asset identity problem," where different security tools (e.g., Tenable, CrowdStrike, ServiceNow) use inconsistent identifiers like hostnames or IP addresses for the same asset, causing patched vulnerabilities to appear unresolved. To fix this, the author recommends correlating assets using multiple identifiers with explicit confidence scoring, prioritizing hard IDs (e.g., MAC address) over hostnames and IPs, and avoiding silent merges of ambiguous matches. The author has open-sourced a Python tool called `security-asset-correlator` to automate this correlation process.

You open your vulnerability dashboard on a Monday morning and see 47 critical CVEs across 12 assets. By Thursday, your team has patched 11 of the 12 assets. But the dashboard still shows 40 criticals.

What happened? The assets were patched. The dashboard doesn't know that, because the vulnerability scanner reports against a different record than the one your team was tracking. The same physical server exists in your tools as several unconnected records. When Tenable reports the CVE patched on 10.0.4.22, your dashboard doesn't automatically know that 10.0.4.22 is the same machine as prod-api-07.internal, so it still shows the finding as open on the CrowdStrike record.

This is the asset identity problem. Most security teams have it. Almost nobody talks about it.

The usual answers don't hold up:

"We use the hostname." Hostnames are normalized differently by every tool. Tenable might see prod-api-07, CrowdStrike sees prod-api-07.internal, and ServiceNow has PRODAPI007 from a manual entry made 8 months ago.

"We use the IP address." IPs change. NAT means the scanner sees a different IP than the one the EDR agent reports. A host that was 10.0.4.22 last week might be 10.0.4.31 today.

"We have a CMDB." Great, how fresh is it? Most CMDBs are 30–60% stale within 6 months of implementation. And you still need to write the correlation logic to feed it.

The core insight is that no single identifier is reliable across tools, but combining multiple identifiers with explicit confidence scoring gets you very far. Here's the priority order:

Layer 1: Hard IDs (confidence 0.95–1.0). Match on instanceId, EDR agentId, or MAC address. These are tool-native stable identifiers. If two records share a hard ID, they're the same asset with near-certainty.

Layer 2: Hostname (confidence 0.45–0.85). Normalize first: case-fold, strip .local and .internal, and drop -prod/-dev suffixes. Then match. Confidence scales with how unique the hostname looks.

Layer 3: IP address (confidence 0.60–0.75). Public IPs get higher confidence than private IPs.
Apply a staleness decay: an IP seen 30 days ago is worth less than one seen yesterday. Private IPs in NAT-heavy environments are unreliable and are scored conservatively.

Layer 4: Metadata (confidence up to 0.50). OS family + cloud region + account ID. Useful as a tie-breaker, but not enough on its own.

Combine layers 2 and 3 into a composite score: 0.60 × hostname score + 0.40 × IP score. Merge if the composite is ≥ 0.70, flag it for human review if it falls in 0.50–0.69, and create a new canonical record if it is below 0.50. Because even the best hostname score contributes only 0.60 × 0.85 = 0.51, a hostname match with no supporting IP evidence can reach review but never an automatic merge.

The key design principle: ambiguous matches are never silently merged. A merge made at 50% confidence creates ghost duplicates that are worse than no merge at all.

Once you've matched records, you merge them. But "merge" has a lot of edge cases. What if one tool reports prod-api-07 and the EDR says prod-api-07.internal? Which hostname wins? Answer: authority is per field. The EDR is more authoritative for hostnames; AWS is more authoritative for region. Every field disagreement should be logged with full lineage. Conflicts are data.

I've written this glue layer at multiple companies. Last week I open-sourced it. Install it with `pip install security-asset-correlator`; the source is at https://github.com/apurvtyagi/security-asset-correlator
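The matching, thresholding, and merge rules above can be sketched in a few dozen lines of Python. This is a minimal illustration under stated assumptions, not the internals or API of security-asset-correlator: the `AssetRecord` shape, the length-based hostname "uniqueness" heuristic, the 30-day half-life used for IP staleness, and the `FIELD_AUTHORITY` ranking are all hypothetical choices made for the example. The confidence ranges, the 0.60/0.40 weights, and the 0.70/0.50 thresholds come from the text; Layer 4 metadata is left out for brevity.

```python
import re
from dataclasses import dataclass, field
from datetime import datetime
from ipaddress import ip_address
from typing import Optional


@dataclass
class AssetRecord:
    """Hypothetical normalized record; not the security-asset-correlator schema."""
    source: str                                 # e.g. "tenable", "crowdstrike", "aws"
    hostname: Optional[str] = None
    ip: Optional[str] = None
    ip_last_seen: Optional[datetime] = None     # when this source last saw the IP
    hard_ids: set = field(default_factory=set)  # instanceId, EDR agentId, MAC
    region: Optional[str] = None


def normalize_hostname(name: str) -> str:
    """Case-fold, strip .local/.internal, and drop a trailing -prod/-dev suffix."""
    name = name.strip().lower()
    for suffix in (".local", ".internal"):
        if name.endswith(suffix):
            name = name[: -len(suffix)]
    return re.sub(r"-(prod|dev)$", "", name)


def hostname_score(a: AssetRecord, b: AssetRecord) -> float:
    """Layer 2: 0.45-0.85 on a normalized match. Treating longer names as
    more unique is an assumed heuristic, not the article's formula."""
    if not (a.hostname and b.hostname):
        return 0.0
    na, nb = normalize_hostname(a.hostname), normalize_hostname(b.hostname)
    if na != nb:
        return 0.0
    return 0.45 + 0.40 * min(len(na), 12) / 12


def ip_score(a: AssetRecord, b: AssetRecord, now: datetime) -> float:
    """Layer 3: 0.75 for a shared public IP, 0.60 for a private one, then an
    assumed 30-day half-life decay based on the older of the two sightings."""
    if not (a.ip and b.ip) or a.ip != b.ip:
        return 0.0
    base = 0.60 if ip_address(a.ip).is_private else 0.75
    seen = [t for t in (a.ip_last_seen, b.ip_last_seen) if t is not None]
    age_days = (now - min(seen)).total_seconds() / 86400 if seen else 0.0
    return base * 0.5 ** (age_days / 30)


def decide(a: AssetRecord, b: AssetRecord, now: datetime) -> tuple[str, float]:
    """Return ("merge" | "review" | "new", score). Layer 4 metadata is omitted."""
    if a.hard_ids & b.hard_ids:                 # Layer 1: any shared hard ID
        return "merge", 0.95
    score = 0.60 * hostname_score(a, b) + 0.40 * ip_score(a, b, now)
    if score >= 0.70:
        return "merge", score
    if score >= 0.50:
        return "review", score                  # ambiguous: never merged silently
    return "new", score                         # start a new canonical record


# Hypothetical per-field authority, most trusted source first.
FIELD_AUTHORITY = {
    "hostname": ["crowdstrike", "aws", "tenable", "servicenow"],  # EDR wins
    "region": ["aws", "crowdstrike", "tenable", "servicenow"],    # cloud wins
}


def merge_records(records: list[AssetRecord]) -> tuple[dict, list[dict]]:
    """Take each field from its most authoritative source; log every disagreement."""
    merged: dict = {}
    conflicts: list[dict] = []
    for fname, ranking in FIELD_AUTHORITY.items():
        observed = [(r.source, getattr(r, fname)) for r in records if getattr(r, fname)]
        if not observed:
            continue
        observed.sort(key=lambda obs: ranking.index(obs[0]) if obs[0] in ranking else len(ranking))
        merged[fname] = observed[0][1]
        if len({value for _, value in observed}) > 1:
            # Conflicts are data: keep every (source, value) pair as lineage.
            conflicts.append({"field": fname, "chosen": observed[0], "observed": observed})
    return merged, conflicts
```

With a Tenable record for prod-api-07 and a CrowdStrike record for prod-api-07.internal that both saw 10.0.4.22 today, `decide` returns "merge" with a score of about 0.73; if the Tenable sighting of that IP is 30 days old, the decayed score drops to about 0.61 and the pair goes to review instead. `merge_records` would then keep prod-api-07.internal from the EDR and record the hostname disagreement rather than discarding it.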