{"slug": "ai-powered-root-cause-correlating-file-access-with-apm-via-dynatrace", "title": "AI-Powered Root Cause: Correlating File Access with APM via Dynatrace", "summary": "A serverless Lambda pipeline ships FSx for ONTAP audit logs to Dynatrace via the Log Ingest API v2, enabling Davis AI to automatically correlate file access anomalies with application performance degradation. The system identifies root causes such as \"500 users hitting the same NFS share simultaneously\" causing app slowdowns, with logs visible in the Logs Viewer within one to two minutes. Dynatrace builds a topology map of the entire stack, using time-window correlation and entity connectivity to find causal relationships between storage events and application metrics.", "body_md": "We built a serverless Lambda pipeline that ships FSx for ONTAP audit logs to Dynatrace via the Log Ingest API v2. The real value: Dynatrace's Davis AI can automatically correlate file access anomalies with application performance degradation — answering \"why is the app slow?\" with \"because 500 users hit the same NFS share simultaneously.\"\n\n```\nFSx for ONTAP → S3 Access Point → EventBridge Scheduler → Lambda → Dynatrace Log Ingest API v2\n                                                                         │\n                                                                         ▼\n                                                                    Davis AI\n                                                              ┌───────────────────┐\n                                                              │ Correlates:       │\n                                                              │ • File access     │\n                                                              │   anomalies       │\n                                                              │ • APM metrics     │\n                                                              │ • Infrastructure  │\n                                            
                  │   health          │\n                                                              │                   │\n                                                              │ → Root cause      │\n                                                              │   in seconds      │\n                                                              └───────────────────┘\n```\n\nVerified on Dynatrace SaaS Trial (Tokyo-equivalent region). Logs visible in Logs Viewer within 1-2 minutes.\n\nThis is Part 11 of the [Serverless Observability for FSx for ONTAP](https://dev.to/aws-builders/why-your-fsx-for-ontap-audit-logs-deserve-better-than-ec2-kod) series.\n\nMost observability tools treat storage logs as isolated data. Dynatrace is different — it builds a **topology map** of your entire stack and uses Davis AI to find causal relationships through time-window correlation and entity connectivity:\n\n| Scenario | Without Dynatrace | With Dynatrace |\n|---|---|---|\n| App latency spike | \"Check the logs\" | Davis AI detects temporal correlation: file access to /vol/data/ increased 10x within the same 5-minute window as app response time degradation, connected via topology (app → NFS mount → SVM) |\n| Storage I/O anomaly | Manual investigation | Automatic correlation via shared topology entities — Davis identifies which services are affected based on entity relationships |\n| User reports slow file access | Grep through audit logs | DQL query + topology view showing the full dependency path from user request to storage operation |\n\nThe key differentiator: **Davis AI correlates events across entities that share topology connections within overlapping time windows** — not just keyword matching or manual dashboard correlation.\n\n```\n┌─────────────────────────────────────────────────────────┐\n│ Event Sources                                           │\n├─────────────────────────────────────────────────────────┤\n│                                                      
   │\n│  EventBridge Scheduler                                  │\n│  rate(5 minutes) ──→ Lambda                             │\n│                       │ lists new files via             │\n│                       │ S3 Access Point                 │\n│                       │ (checkpoint in SSM)             │\n│                       ▼                                 │\n│           Dynatrace Log Ingest API v2                   │\n│           (Api-Token auth)                              │\n│                       │                                 │\n│  EMS Webhook          │                                 │\n│  ──→ API GW ──→ Lambda ─────────────┤                   │\n│     (ems_handler)                   │                   │\n│                                     ▼                   │\n│  FPolicy                       Dynatrace                │\n│  ──→ ECS Fargate ──→ SQS      (Logs Viewer,             │\n│  ──→ Bridge Lambda              Davis AI,               │\n│  ──→ EventBridge                DQL,                    │\n│  ──→ Lambda (fpolicy_handler)   Dashboards)             │\n│  ──────────────────────────────────────────────────────┤│\n└─────────────────────────────────────────────────────────┘\n```\n\nWhen you ship FSx for ONTAP logs to Dynatrace alongside your APM data, Davis AI can detect patterns like the scenarios in the table above. This works because Dynatrace maps your FSx for ONTAP SVM as a **custom device entity** in its topology, connecting it to the applications that access it.\n\nTo set up the integration, create a Dynatrace API token with the `logs.ingest` scope. Tokens take the form `dt0c01.<TOKEN_ID>.<TOKEN_SECRET>`. Store the token in AWS Secrets Manager, then deploy the CloudFormation stack:\n\n```\n# 1. Store the Dynatrace API token\naws secretsmanager create-secret \\\n  --name \"dynatrace/fsxn-api-token\" \\\n  --secret-string '{\"api_token\":\"dt0c01.XXXXXXXX.YYYYYYYY\"}' \\\n  --region ap-northeast-1\n\n# 2. Deploy the audit log poller\naws cloudformation deploy \\\n  --template-file integrations/dynatrace/template.yaml \\\n  --stack-name fsxn-dynatrace-integration \\\n  --parameter-overrides \\\n    S3AccessPointArn=arn:aws:s3:ap-northeast-1:123456789012:accesspoint/fsxn-audit-ap \\\n    
DynatraceApiTokenSecretArn=arn:aws:secretsmanager:ap-northeast-1:123456789012:secret:dynatrace/fsxn-api-token-XXXXXX \\\n    DynatraceEnvUrl=https://abc12345.live.dynatrace.com \\\n    S3BucketName=my-fsxn-audit-bucket \\\n  --capabilities CAPABILITY_NAMED_IAM \\\n  --region ap-northeast-1\n```\n\nTo verify ingestion, navigate to **Logs** → **View logs** → **Run query**:\n\n```\nfetch logs\n| filter log.source == \"fsxn-ontap\"\n```\n\nLogs should appear within 1-2 minutes.\n\nEach audit log event is shipped with structured attributes for DQL querying:\n\n```\n{\n  \"content\": \"{\\\"EventID\\\":\\\"4663\\\",\\\"UserName\\\":\\\"admin@corp.local\\\",...}\",\n  \"log.source\": \"fsxn-ontap\",\n  \"dt.source_entity\": \"CUSTOM_DEVICE-fsxn-svm-prod-01\",\n  \"timestamp\": \"2026-01-15T12:00:00Z\",\n  \"severity\": \"info\",\n  \"fsxn.svm\": \"svm-prod-01\",\n  \"fsxn.operation\": \"ReadData\",\n  \"fsxn.user\": \"admin@corp.local\",\n  \"fsxn.path\": \"/vol/data/file.txt\",\n  \"fsxn.s3_key\": \"audit/2026/01/15/audit-001.json\"\n}\n```\n\nThe `dt.source_entity` field links each log to a custom device in Dynatrace's topology, which is what enables Davis AI correlation.\n\nWith these attributes in place, Dynatrace Query Language (DQL) queries can filter and aggregate directly on the `fsxn.*` fields:\n\n```\n// All failed file access attempts\n// (assumes an fsxn.result attribute, which the sample record above does not show)\nfetch logs\n| filter log.source == \"fsxn-ontap\"\n| filter fsxn.result == \"Failure\"\n| summarize count(), by: {fsxn.user, fsxn.path}\n\n// Top operations by volume\nfetch logs\n| filter log.source == \"fsxn-ontap\"\n| summarize ops = count(), by: {fsxn.operation}\n| sort ops desc\n\n// Access timeline for a specific SVM\nfetch logs\n| filter fsxn.svm == \"svm-prod-01\"\n| makeTimeseries count(), interval: 5m\n\n// File access volume vs app response time (side-by-side):\n// place this tile next to a service response time tile in a dashboard\nfetch logs\n| filter log.source == \"fsxn-ontap\"\n| makeTimeseries file_ops = count(), interval: 5m\n\n// Find users causing the 
most I/O during a performance incident\nfetch logs\n| filter log.source == \"fsxn-ontap\"\n| filter timestamp >= now() - 1h\n| summarize ops = count(), by: {fsxn.user}\n| sort ops desc\n| limit 10\n\n// Detect potential ransomware: 1-minute buckets with more than 100 writes/deletes\nfetch logs\n| filter log.source == \"fsxn-ontap\"\n| filter fsxn.operation == \"WriteData\" or fsxn.operation == \"Delete\"\n| summarize write_ops = count(), by: {minute = bin(timestamp, 1m)}\n| filter write_ops > 100\n\n// After-hours access (adjust the hour bounds to your time zone)\nfetch logs\n| filter log.source == \"fsxn-ontap\"\n| filter getHour(timestamp) < 7 or getHour(timestamp) > 19\n| summarize count(), by: {fsxn.user, fsxn.path}\n```\n\n| Deployment | URL Format | Data Location |\n|---|---|---|\n| SaaS | `https://<env-id>.live.dynatrace.com` | Dynatrace-hosted (region-specific) |\n| Managed | `https://<your-domain>/e/<env-id>` | Your infrastructure |\n| ActiveGate | `https://<host>:9999/e/<env-id>` | Your network (proxy) |\n\nFor data sovereignty requirements, Dynatrace Managed keeps all data within your infrastructure. An ActiveGate only proxies traffic through your network, and the data is still stored in the environment it forwards to.\n\nDynatrace pricing is based on Davis Data Units (DDUs):\n\n| Monthly Log Volume | DDUs per Month (est.) | Monthly DDU Cost |\n|---|---|---|\n| 1 GB | ~1 DDU | Minimal (within base allocation) |\n| 10 GB | ~10 DDU | ~$25 (at $2.50/DDU) |\n| 100 GB | ~100 DDU | ~$250 |\n\n| Component | Monthly Cost (10 GB/month) |\n|---|---|\n| Lambda (5-min polling) | ~$3 |\n| EventBridge Scheduler | ~$1 |\n| Secrets Manager | ~$1 |\n| Dynatrace DDU | ~$25 |\n| **Total** | **~$30** |\n\nDDU pricing varies by contract. The 14-day trial includes generous DDU allocation for validation. 
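The arithmetic behind these tables is simple enough to script when sizing other volumes. This is a minimal sketch. It assumes the ~1 DDU per GB ratio and the $2.50/DDU rate from the tables, which are example figures rather than list prices. It holds the AWS-side costs flat at their 10 GB figures and ignores the base allocation that makes the 1 GB row effectively free. The names `monthly_cost` and `AWS_MONTHLY_USD` are hypothetical.

```python
# Rough monthly cost model for the pipeline. All rates are assumptions copied
# from the tables in this post, not Dynatrace or AWS list prices.
DDU_PER_GB = 1.0        # ~1 DDU per GB of logs, as implied by the DDU table
DDU_PRICE_USD = 2.50    # example contract rate per DDU

# AWS-side costs at 10 GB/month, held flat here for simplicity
AWS_MONTHLY_USD = {
    'lambda_5min_polling': 3.0,
    'eventbridge_scheduler': 1.0,
    'secrets_manager': 1.0,
}


def monthly_cost(log_gb: float) -> dict:
    # Estimate DDU consumption and total monthly spend for a given log volume.
    ddu = log_gb * DDU_PER_GB
    dynatrace_usd = ddu * DDU_PRICE_USD
    aws_usd = sum(AWS_MONTHLY_USD.values())
    return {
        'ddu': ddu,
        'dynatrace_usd': dynatrace_usd,
        'aws_usd': aws_usd,
        'total_usd': aws_usd + dynatrace_usd,
    }


if __name__ == '__main__':
    for gb in (10, 100):
        total = monthly_cost(gb)['total_usd']
        print(f'{gb} GB/month: ~{total:.0f} USD')
```

For 10 GB/month the function returns a total of 30.0, matching the ~$30 in the component table. Swap in your own contract rate before relying on the numbers.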
Check your license terms for production estimates.\n\n| # | Discovery | Impact |\n|---|---|---|\n| 1 | API returns HTTP 204 on success (not 200) | Lambda must treat 204 as success |\n| 2 | Trial environment has a 1-2 minute ingestion lag | Wait before checking Logs Viewer |\n| 3 | `logs.ingest` scope is required; `ReadConfig`/`WriteConfig` won't work | Select the correct scope when creating the token |\n| 4 | `logs.read` scope is needed separately for API-based queries | Create a second token for automation |\n| 5 | Log entries older than 24 hours may be rejected | Use current timestamps in test data |\n| 6 | Max 1 MB per request (smallest batch limit in this series) | Lambda splits large batches |\n| 7 | Firehose delivery requires ActiveGate (not direct to SaaS) | Use Lambda direct for simplicity |\n\nTo get the most from Davis AI correlation, all three prerequisites must be in place:\n\n1. **OneAgent on the application hosts**, which supplies the APM side of the topology.\n2. **FSx for ONTAP logs with the `dt.source_entity` field set**, which creates the storage-side topology node.\n3. **A custom device entity for each SVM**, pre-created before the first log ingestion via the Custom Device API (`POST /api/v2/entities/custom`) or the Settings API.\n\nPrerequisites for correlation: Davis AI correlation only activates when all three components are connected in the topology. Without OneAgent on the application hosts, Davis AI cannot establish the causal link between file access patterns and application performance. 
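On the shipper side, both the topology link and the API limits come down to how each record is built and batched. This is a minimal sketch, assuming a Python Lambda and the attribute layout of the sample record above. The helper names and the raw event keys (`Operation`, `ObjectName`) are hypothetical and are not taken from the repository. The sketch does four things:

- It derives `dt.source_entity` from the SVM name in one place.
- It drops entries older than 24 hours (discovery #5).
- It keeps each request under 1 MB (discovery #6).
- It treats HTTP 204 as success (discovery #1).

```python
import json
from datetime import datetime, timedelta, timezone

# Limits taken from the discovery table (verify against your environment).
MAX_BATCH_BYTES = 1_000_000        # stay under the 1 MB request limit
MAX_AGE = timedelta(hours=24)      # older entries may be rejected
COMPACT = (',', ':')               # compact JSON so size estimates match the payload


def entity_id(svm: str) -> str:
    # Single place that defines the dt.source_entity naming convention.
    return f'CUSTOM_DEVICE-fsxn-{svm}'


def build_record(event: dict, svm: str, s3_key: str) -> dict:
    # Raw event keys (timestamp, Operation, UserName, ObjectName) are hypothetical.
    return {
        'content': json.dumps(event, separators=COMPACT),
        'log.source': 'fsxn-ontap',
        'dt.source_entity': entity_id(svm),
        'timestamp': event['timestamp'],
        'severity': 'info',
        'fsxn.svm': svm,
        'fsxn.operation': event.get('Operation', 'unknown'),
        'fsxn.user': event.get('UserName', 'unknown'),
        'fsxn.path': event.get('ObjectName', ''),
        'fsxn.s3_key': s3_key,
    }


def is_fresh(timestamp: str, now: datetime) -> bool:
    # Accepts ISO-8601 with a trailing Z, as in the sample record; now must be tz-aware.
    parsed = datetime.fromisoformat(timestamp.replace('Z', '+00:00'))
    return now - parsed <= MAX_AGE


def split_batches(records: list) -> list:
    # Greedy split: each record costs its compact size plus one separator byte.
    # A single record larger than the limit still ends up alone in a batch.
    batches, current, size = [], [], 2      # 2 bytes for the enclosing [ ]
    for rec in records:
        rec_size = len(json.dumps(rec, separators=COMPACT).encode('utf-8')) + 1
        if current and size + rec_size > MAX_BATCH_BYTES:
            batches.append(current)
            current, size = [], 2
        current.append(rec)
        size += rec_size
    if current:
        batches.append(current)
    return batches


def to_payload(batch: list) -> bytes:
    # Request body for POST <env-url>/api/v2/logs/ingest
    return json.dumps(batch, separators=COMPACT).encode('utf-8')


def is_success(status_code: int) -> bool:
    # Discovery #1: the ingest endpoint answers 204 on success, not 200.
    return status_code == 204
```

Each payload would then be sent to the Log Ingest API v2 with an `Authorization: Api-Token <token>` header.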
The custom device entity must use a consistent naming convention (e.g., `CUSTOM_DEVICE-fsxn-{svm-name}`) across all log entries.\n\n```\nApplication (OneAgent) ──→ NFS/SMB ──→ FSx for ONTAP (SVM)\n       │                                      │\n       │ APM metrics                          │ Audit logs\n       ▼                                      ▼\n                    Dynatrace Davis AI\n                    (automatic correlation)\n```\n\nThis integration follows the project's [Production Readiness Levels](https://github.com/Yoshiki0705/fsxn-observability-integrations#production-readiness-levels--%E6%9C%AC%E7%95%AA%E6%BA%96%E5%82%99%E3%83%AC%E3%83%99%E3%83%AB):\n\n| Level | What You Get | Go/No-Go to Next |\n|---|---|---|\n| Level 1 (this Quick Start) | Audit poller + DLQ | Logs arrive, checkpoint advances, DLQ empty 24h |\n| Level 2 | + DQL dashboards + alerts | SLOs met 7 days, security review done |\n| Level 3 | + DynamoDB ledger + Davis AI correlation | SLOs met 30 days, compliance pack |\n| Level 4 | + OTel Collector + redaction + OneAgent | Multi-backend, PII redaction, full topology |\n\nData classification: Dynatrace receives the `fsxn.user` and `fsxn.path` fields, which are PII/sensitive. Dynatrace SaaS environments are region-specific, so select a region that matches your data residency requirements. With a Managed deployment, data stays in your infrastructure; an ActiveGate only routes traffic through your network to the environment it belongs to. 
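If your classification rules do not allow raw user names to leave AWS, one option is to pseudonymize `fsxn.user` in the Lambda before shipping. This is the kind of redaction step listed under Level 4. This is a minimal sketch, assuming a keyed HMAC-SHA256 with a key you manage yourself. The function names and the `user-` prefix are illustrative, not part of the repository.

```python
import hashlib
import hmac


def pseudonymize(value: str, key: bytes) -> str:
    # Keyed hash: stable per user, not reversible without the key.
    digest = hmac.new(key, value.encode('utf-8'), hashlib.sha256).hexdigest()
    return 'user-' + digest[:16]


def pseudonymize_record(record: dict, key: bytes, fields=('fsxn.user',)) -> dict:
    # Return a copy with the listed attributes replaced; the input is left untouched.
    redacted = dict(record)
    for field in fields:
        if redacted.get(field):
            redacted[field] = pseudonymize(redacted[field], key)
    return redacted
```

Because the hash is keyed and deterministic, DQL group-bys on `fsxn.user` still work, while the names themselves stay out of Dynatrace. The raw `content` field in the sample record also carries `UserName`. For the redaction to be complete, that field would need the same treatment or would have to be dropped.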
See the Data Classification Guide.\n\nFull criteria: [Pipeline SLO Definitions](https://github.com/Yoshiki0705/fsxn-observability-integrations/blob/main/docs/en/pipeline-slo.md) | [DLQ Replay Runbook](https://github.com/Yoshiki0705/fsxn-observability-integrations/blob/main/docs/en/runbooks/dlq-replay.md)\n\n| Template | Purpose | Key Parameters |\n|---|---|---|\n| `template.yaml` | FSx audit log poller | S3AccessPointArn, DynatraceApiTokenSecretArn, DynatraceEnvUrl |\n| `template-ems.yaml` | EMS webhook handler | DynatraceApiTokenSecretArn, DynatraceEnvUrl |\n| `template-fpolicy.yaml` | FPolicy EventBridge handler | DynatraceApiTokenSecretArn, DynatraceEnvUrl, EventBusName |\n\n*Questions about the Dynatrace integration or Davis AI correlation? Drop a comment below.*\n\n**GitHub**: [github.com/Yoshiki0705/fsxn-observability-integrations](https://github.com/Yoshiki0705/fsxn-observability-integrations)", "url": "https://wpnews.pro/news/ai-powered-root-cause-correlating-file-access-with-apm-via-dynatrace", "canonical_source": "https://dev.to/aws-builders/ai-powered-root-cause-correlating-file-access-with-apm-via-dynatrace-4ffl", "published_at": "2026-05-31 00:26:51+00:00", "updated_at": "2026-05-31 00:41:59.122136+00:00", "lang": "en", "topics": ["artificial-intelligence", "ai-products", "ai-tools", "ai-infrastructure", "mlops"], "entities": ["Dynatrace", "Davis AI", "FSx for ONTAP", "AWS Lambda", "EventBridge", "S3 Access Point", "Log Ingest API v2", "Amazon FSx"], "alternates": {"html": "https://wpnews.pro/news/ai-powered-root-cause-correlating-file-access-with-apm-via-dynatrace", "markdown": "https://wpnews.pro/news/ai-powered-root-cause-correlating-file-access-with-apm-via-dynatrace.md", "text": "https://wpnews.pro/news/ai-powered-root-cause-correlating-file-access-with-apm-via-dynatrace.txt", "jsonld": "https://wpnews.pro/news/ai-powered-root-cause-correlating-file-access-with-apm-via-dynatrace.jsonld"}}