Building a Local AI SOC Analyst on an M1 MacBook Pro

The article describes the development of a local AI-powered SOC analyst that runs on an M1 MacBook Pro, designed to assist with daily security operations by triaging and analyzing alerts from existing cloud-native monitoring tools like Datadog, PagerDuty, and Sysdig. The solution uses Ollama to run local models (llama3.2:3b and qwen3:8b) within a Python harness, focusing on summarizing findings, correlating evidence, and producing daily security notes without automating production changes. A key lesson was that the model alone was insufficient; success required combining the right model with controlled prompts, use-case-driven analysis, and realistic hardware expectations.

How I solved a real SOC operations problem across Datadog, AWS, Cloudflare, Sysdig, and PagerDuty with an AI runner, a local AI harness, and a tricky model selection process

Executive Summary

We started with a practical SOC problem: build an AI-based SOC analyst that runs locally on an M1 MacBook Pro and helps with daily security operations across an existing cloud-native monitoring and alerting stack.

The environment already had strong telemetry and alerting coverage:

- AWS CloudTrail
- AWS Security Hub
- Route53 VPC DNS Firewall
- SES
- SNS
- Cloudflare logs
- Application logs
- GitHub audit logs crawler
- Datadog Cloud Security detections
- Datadog monitors for Kubernetes and AWS metrics
- Datadog dashboards covering many SOC use cases
- Sysdig runtime policies for Kubernetes
- PagerDuty alert routing

The problem was not lack of logs or alerts. The real challenge was analyst workflow. The SOC still needed a repeatable way to review alerts, correlate evidence, summarize findings, identify missing context, and produce daily security notes without manually jumping between tools every time.

The working solution became a local AI SOC analyst pattern:

| Component | Role |
|---|---|
| Ollama | Local model runner |
| llama3.2:3b | Stable default model for M1 daily SOC work |
| qwen3:8b | Optional larger model for focused deeper analysis |
| Python harness | SOC workflow, prompts, guardrails, and integrations |
| AI runner CLI | Analyst-facing command-line interface |
| Datadog | Primary log, signal, dashboard, and monitoring source |
| PagerDuty | Alert and incident routing source |
| Sysdig | Separate runtime policy signal source |
| Human analyst | Final decision authority |

The important lesson was that the model alone was not the solution. The working solution came from combining the right model, a controlled harness, bounded prompts, use-case-driven analysis, and realistic expectations about local MacBook hardware.

The Original Problem

The goal was to build a local AI-based SOC analyst on an M1 MacBook Pro.
The main telemetry flow looked like this:

    AWS CloudTrail
    AWS Security Hub
    Route53 VPC DNS Firewall
    SES
    SNS
    Cloudflare logs
    Application logs
    GitHub audit logs crawler
            |
            v
         Datadog
            |
            v
    Datadog Cloud Security rules
    Datadog monitors
    Datadog dashboards
            |
            v
        PagerDuty

Sysdig was separate:

    Kubernetes runtime activity
            |
            v
    Sysdig runtime policies
            |
            v
        PagerDuty

That distinction mattered. Datadog was the central place for logs, detections, monitors, and dashboards. Sysdig was not sending its logs to Datadog, so Sysdig alerts had to be treated as a separate runtime security signal path.

The expected solution was not a generic local chatbot. The expected solution was a repeatable local SOC assistant that could support:

- Daily SOC review
- Alert triage
- CloudTrail analysis
- AWS Security Hub finding review
- Route53 DNS Firewall activity review
- SES and SNS activity review
- Cloudflare security event review
- GitHub audit log review
- Application log review
- PagerDuty incident summarization
- Sysdig runtime alert review
- SOC note drafting
- Recommended follow-up queries

Key Design Decision: AI Should Not Replace Detection

We made one important architectural decision early: the local AI model should not become the detector.

Datadog and Sysdig already perform that role:

- Datadog receives logs and metrics.
- Datadog Cloud Security rules generate security signals.
- Datadog monitors detect operational and Kubernetes-related issues.
- Sysdig runtime policies detect Kubernetes runtime policy violations.
- PagerDuty routes alerts from Datadog and Sysdig.

The local AI should sit above those systems as a triage and analysis layer. That means the AI helps answer:

- What happened?
- Which user, workload, IP, service, account, repository, or API was involved?
- Is this likely malicious, expected change, duplicate, benign true positive, or false positive?
- What evidence is missing?
- Which Datadog queries should be run next?
- Should this be escalated?
- What should the SOC note say?
- Is containment recommended, and does it require human approval?

This keeps the control boundary clean. Detection stays with Datadog and Sysdig. Alerting stays with PagerDuty. The local AI helps the analyst move faster, ask better questions, and document the investigation more consistently.

Final Architecture

The final working architecture was intentionally simple:

    +------------------------------+
    | AWS / Cloudflare / GitHub    |
    | Apps / SES / SNS / DNS FW    |
    +---------------+--------------+
                    |
                    v
               +---------+
               | Datadog |
               | Logs    |
               | Signals |
               | Metrics |
               | Monitors|
               +----+----+
                    |
                    v
               +---------+
               |PagerDuty|
               +----+----+
    +------------------+        +---------+
    | Sysdig Runtime   |------- |PagerDuty|
    | Policies         |        +---------+
    +------------------+
                    |
                    v
    +------------------------------+
    | Local AI SOC Analyst         |
    | M1 MacBook Pro               |
    |                              |
    | Ollama                       |
    | llama3.2:3b / qwen3:8b       |
    | Python SOC Harness           |
    | AI Runner CLI                |
    +------------------------------+

The local AI analyst was designed as read-only first. It can summarize, correlate, recommend, and draft. It should not automatically make production changes.

Human approval should still be required for actions such as:

- Disabling IAM users
- Rotating access keys
- Blocking IPs globally
- Changing Cloudflare WAF behavior
- Muting Datadog monitors
- Resolving PagerDuty incidents
- Changing Sysdig policies
- Quarantining Kubernetes workloads
- Modifying production infrastructure

This matters because a wrong automated containment action can create a larger operational incident than the original alert.

What the AI Runner Does

The AI runner is the analyst-facing command-line interface. It is what we run during daily operations.
Examples:

    python ai_runner.py triage-json samples/sample_cloudtrail_delete_trail.json \
      --use-case UC-006.3-cloudtrail-logging-disabled
    python ai_runner.py security-signals --hours 24
    python ai_runner.py pagerduty --hours 24
    python ai_runner.py daily --hours 24 --out reports/daily_soc_report.md

The runner coordinates the work:

- Pull security data from the configured source.
- Select the right SOC prompt.
- Build a bounded event bundle.
- Send the prompt and evidence to Ollama.
- Receive structured analysis from the local model.
- Print the result or write a report.
- Keep the workflow repeatable.

The runner is not the intelligence layer by itself. Its value is operational discipline. It prevents the analyst from manually copying logs, manually selecting prompts, manually formatting output, and manually saving results every time.

What the Harness Does

The harness is the control layer around the model. This is the difference between a chatbot and a SOC workflow tool.

The harness handles:

- Datadog API access
- PagerDuty API access
- Optional Sysdig API access
- Use-case-specific prompts
- SOC output structure
- Context size limits
- Model timeout configuration
- Evidence-oriented analysis
- Daily report generation
- Read-only operating behavior
- Repeatable command structure

The harness gives the model boundaries. For SOC operations, this is critical. A local AI model should not receive an unbounded pile of logs and be asked, "Is anything bad?" That produces weak output and increases hallucination risk.

Instead, the harness asks focused questions:

- Analyze this CloudTrail event for possible defense evasion.
- Summarize Datadog security signals from the last 24 hours.
- Review PagerDuty incidents for security relevance.
- Draft a daily SOC report from bounded evidence.
- Identify missing evidence and recommended follow-up queries.

The model reasons. The harness controls the task.
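
To make the split between harness and model concrete, here is a minimal sketch of how a harness might bound an event bundle and pair it with one focused question before calling Ollama. It is not the article's actual harness: the function names (`bound_events`, `build_prompt`, `ask_ollama`) and the limits are hypothetical, and it uses the standard library's `urllib` so the example stays self-contained. The endpoint and request fields follow Ollama's `/api/generate` API.

```python
import json
import urllib.request

# Default address of a local Ollama server.
OLLAMA_URL = "http://127.0.0.1:11434/api/generate"


def bound_events(events, max_events=50, max_chars=8000):
    """Keep the newest events that fit both limits (hypothetical defaults).

    Assumes `events` is a list of dicts in chronological order.
    """
    recent = events[-max_events:] if max_events > 0 else []
    kept, used = [], 0
    for event in reversed(recent):  # walk newest first
        size = len(json.dumps(event, sort_keys=True))
        if used + size > max_chars:
            break
        kept.append(event)
        used += size
    kept.reverse()  # restore chronological order
    return kept


def build_prompt(task, events):
    """Pair one focused question with the bounded evidence."""
    evidence = "\n".join(json.dumps(e, sort_keys=True) for e in events)
    return (
        "You are assisting a SOC analyst. Use only the evidence below.\n"
        f"Task: {task}\n"
        "List missing evidence and recommended follow-up queries.\n"
        f"Evidence ({len(events)} events):\n{evidence}"
    )


def ask_ollama(model, prompt, timeout=300):
    """Send one non-streaming generate request to the local Ollama server."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

A call such as `ask_ollama("llama3.2:3b", build_prompt(task, bound_events(events)))` keeps the decision about how much evidence the model sees in the harness rather than in the prompt author's hands.
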

Model Selection Strategy

At first, a larger model such as qwen3:8b looked attractive because the problem involved cloud logs, security reasoning, and structured analysis. That was a reasonable starting point. Larger models can be useful when the event bundle is small and the question requires deeper reasoning.

However, the target machine was an M1 MacBook Pro, not a dedicated GPU workstation. That changed the practical answer.

During testing, the first small triage workflow succeeded, but the machine became sluggish. Later, the heavier daily report failed with a local Ollama timeout:

    ReadTimeout: HTTPConnectionPool(host='127.0.0.1', port=11434): Read timed out. (read timeout=300)

That error was useful because it showed:

- The Python harness was running.
- The harness reached Ollama on localhost.
- Ollama was processing the request.
- The model did not complete within the configured timeout.

So the issue was not the SOC design. The issue was local inference load: model size, prompt size, timeout, and hardware limits.

The model strategy was adjusted:

| Task | Model | Why |
|---|---|---|
| Smoke testing | llama3.2:3b | Fast and stable on M1 |
| Daily SOC report | llama3.2:3b | More reliable for bounded daily reporting |
| Focused deeper investigation | qwen3:8b | Useful when the event bundle is smaller |
| Large multi-source correlation | Avoid on M1 unless carefully limited | Can cause slowdowns or timeouts |

The final default became:

    SOC_MODEL=llama3.2:3b
    SOC_FAST_MODEL=llama3.2:3b

This was the right operational tradeoff. A smaller model that finishes reliably is more useful than a larger model that freezes the analyst workstation or times out during daily operations.

Hardware Constraint: The M1 MacBook Pro Matters

The M1 MacBook Pro can run useful local AI workflows, but the workflow must be tuned.
The main constraints were:

- Local model cold start time
- Memory pressure
- Swap usage
- Large prompt size
- Long generation time
- Ollama timeout
- Large 24-hour log bundles

The fix was not to abandon the local approach. The fix was to make the workflow smaller and more controlled:

- Use a smaller default model.
- Limit daily prompt size.
- Start with 6-hour reports.
- Increase to 24 hours after validation.
- Increase the Ollama timeout where needed.
- Avoid sending excessive raw logs to the model.
- Use focused use-case prompts.

That is what made the solution usable.

Problems We Hit and How We Fixed Them

1. ollama ps Showing Nothing

When checking which model was running, ollama ps returned nothing. That does not always mean something is broken. ollama ps shows models currently loaded in memory. If the model finished and unloaded, it may show nothing.

Useful checks:

- ollama list: shows installed models.
- ollama ps: shows currently loaded models.
- ollama run llama3.2:3b: manually starts a model.

This distinction helped avoid misdiagnosing a normal Ollama state as a failure.

2. Mac Was Freezing

The Mac became sluggish after running the local model. The likely cause was local inference load, especially if a larger model was used.

The fix was to run the smaller model first:

    SOC_MODEL=llama3.2:3b python ai_runner.py triage-json samples/sample_cloudtrail_delete_trail.json \
      --use-case UC-006.3-cloudtrail-logging-disabled

For stability, Ollama can also be limited:

    export OLLAMA_NUM_PARALLEL=1
    export OLLAMA_MAX_LOADED_MODELS=1
    export OLLAMA_KEEP_ALIVE=30m

3. Daily Report Timeout

The daily command failed because the model did not return within the configured timeout:

    ReadTimeout: HTTPConnectionPool(host='127.0.0.1', port=11434): Read timed out. (read timeout=300)

The fix had three parts:

- Use llama3.2:3b for daily reports.
- Reduce the daily prompt size.
- Increase the local model timeout where appropriate.
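
The three-part fix can be sketched as a small settings layer plus a wrapper that caps the prompt and turns a timeout into actionable advice. This is an illustration under assumptions, not the article's code: `SOC_TIMEOUT_SECONDS` and `SOC_MAX_PROMPT_CHARS` are hypothetical variable names, the default values are placeholders, and `send` stands for whatever function the harness uses to call Ollama.

```python
import os

TRUNCATION_NOTICE = "\n[evidence truncated by harness]"


def load_settings(env=None):
    """Read model, timeout, and prompt cap from the environment."""
    env = os.environ if env is None else env
    return {
        "model": env.get("SOC_MODEL", "llama3.2:3b"),
        # The two variables below are hypothetical names for this sketch.
        "timeout": int(env.get("SOC_TIMEOUT_SECONDS", "300")),
        "max_prompt_chars": int(env.get("SOC_MAX_PROMPT_CHARS", "12000")),
    }


def cap_prompt(prompt, max_chars):
    """Trim an oversized prompt and tell the model that evidence was cut."""
    if len(prompt) <= max_chars:
        return prompt
    keep = max(0, max_chars - len(TRUNCATION_NOTICE))
    return prompt[:keep] + TRUNCATION_NOTICE


def run_bounded(send, prompt, settings):
    """Call send(model, prompt, timeout) with a capped prompt.

    Assumes `send` raises the built-in TimeoutError on a read timeout.
    A harness built on the requests library would catch
    requests.exceptions.Timeout here instead.
    """
    bounded = cap_prompt(prompt, settings["max_prompt_chars"])
    try:
        return send(settings["model"], bounded, settings["timeout"])
    except TimeoutError as error:
        raise RuntimeError(
            f"Local model timed out after {settings['timeout']}s; "
            "try a shorter --hours window, a smaller model, or a higher timeout."
        ) from error
```

Keeping the model, timeout, and prompt cap in one place makes it easier to tune all three together when a report starts timing out.
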

A safer first run was:

    SOC_MODEL=llama3.2:3b python ai_runner.py daily --hours 6 --out reports/daily_soc_report.md

Then scale to:

    SOC_MODEL=llama3.2:3b python ai_runner.py daily --hours 24 --out reports/daily_soc_report.md

The lesson: daily reports should summarize bounded evidence, not feed unlimited raw logs into a local model.

First Successful SOC Triage

The first successful test used a sample CloudTrail StopLogging event. That is a meaningful test because attempts to stop CloudTrail logging may indicate defense evasion, unauthorized administrative activity, or compromised credentials.

The AI produced a high-risk SOC-style result similar to:

    {
      "severity": "High",
      "confidence": 85,
      "disposition": "true positive",
      "summary": "Suspicious attempt to stop CloudTrail logging...",
      "suspicious_indicators": [
        "StopLogging event by IAM user 'svc-deploy'",
        "Source IP 203.0.113.45",
        "User agent python-requests/2.32"
      ]
    }

This proved the core workflow:

- Local venv works.
- Dependencies are installed.
- AI runner executes.
- Harness builds the prompt.
- Ollama receives the request.
- Local model returns SOC-style analysis.

The next improvement was to tighten expected output so the model always includes missing evidence and recommended follow-up queries. For production SOC use, those fields matter because they keep the analyst grounded in evidence.

Example SOC Use Cases

CloudTrail Logging Disabled

Use case: UC-006.3-cloudtrail-logging-disabled

Purpose: Investigate possible CloudTrail tampering or defense evasion.

Example command:

    python ai_runner.py datadog-query \
      --query 'source:cloudtrail @evt.name:(StopLogging OR DeleteTrail OR UpdateTrail OR PutEventSelectors)' \
      --hours 24 \
      --use-case UC-006.3-cloudtrail-logging-disabled

Follow-up evidence should include:

- Actor identity
- Source IP
- User agent
- IAM permissions
- Change ticket
- Trail status after the event
- Related IAM changes
- Security Hub findings
- Other Datadog signals for the same account or identity

IAM Privilege Escalation

Use case: UC-007-iam-privilege-escalation

Example command:

    python ai_runner.py datadog-query \
      --query 'source:cloudtrail @evt.name:(AttachUserPolicy OR PutUserPolicy OR CreateAccessKey OR UpdateAssumeRolePolicy OR PassRole)' \
      --hours 24 \
      --use-case UC-007-iam-privilege-escalation

The AI should help determine whether the activity was expected administration, automated deployment behavior, or suspicious privilege escalation.

Cloudflare WAF Activity

Use case: UC-011-cloudflare-waf-attack

Example command:

    python ai_runner.py datadog-query \
      --query 'source:cloudflare (@action:block OR @action:challenge OR @security_action:block)' \
      --hours 24 \
      --use-case UC-011-cloudflare-waf-attack

The AI should summarize source distribution, attacked paths, WAF actions, spike patterns, and whether any traffic bypassed protections.

Route53 DNS Firewall Activity

Use case: UC-010-route53-dns-firewall-blocks

Example command:

    python ai_runner.py datadog-query \
      --query 'source:route53resolverdnsfirewall OR source:route53 @action:block' \
      --hours 24 \
      --use-case UC-010-route53-dns-firewall-blocks

The AI should help identify suspicious domains, affected workloads, recurring clients, and whether the blocked activity suggests malware, misconfiguration, or expected testing.
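
For a high-volume source like DNS Firewall, much of that review can be pre-computed before the model sees anything. The sketch below reduces raw block events to the most frequent domains and clients, so the prompt carries counts rather than thousands of log lines. The function name is hypothetical, and the field names (`firewall_rule_action`, `query_name`, `srcaddr`) are an assumption based on Route 53 Resolver query log fields; the attributes available in a given Datadog pipeline may differ.

```python
from collections import Counter


def summarize_dns_blocks(events, top_n=5):
    """Reduce DNS firewall events to counts of blocked domains and clients.

    Field names are assumptions for this sketch; adjust them to match
    how the logs are parsed in your own pipeline.
    """
    blocks = [
        e for e in events
        if str(e.get("firewall_rule_action", "")).upper() == "BLOCK"
    ]
    domains = Counter(e.get("query_name", "unknown") for e in blocks)
    clients = Counter(e.get("srcaddr", "unknown") for e in blocks)
    return {
        "total_blocks": len(blocks),
        "top_domains": domains.most_common(top_n),
        "top_clients": clients.most_common(top_n),
    }
```

The resulting dictionary is small enough to embed directly in a prompt, which follows the same query, reduce, summarize pattern the daily report relies on.
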

GitHub Audit Risk

Use case: UC-014-github-audit-risk

Example command:

    python ai_runner.py datadog-query \
      --query 'source:github (@action:*deploy_key* OR @action:*repo* OR @action:*workflow* OR @action:*branch_protection*)' \
      --hours 24 \
      --use-case UC-014-github-audit-risk

The AI should focus on risky repository changes, workflow changes, deploy key activity, branch protection changes, and unusual administrative actions.

These are only a few of the possible use cases. The same architecture extends to many more, as long as each one follows the pattern of a focused query, a bounded bundle, and a use-case-specific prompt.

Daily SOC Workflow

The stable workflow became:

1. Start Ollama

    ollama serve

2. Activate the project environment

    cd /Users/tariqual/Documents/local_ai_soc_analyst
    source .venv/bin/activate

3. Confirm model availability

    ollama list

4. Run a smoke test

    python ai_runner.py triage-json samples/sample_cloudtrail_delete_trail.json \
      --use-case UC-006.3-cloudtrail-logging-disabled

5. Run a safe daily report first

    SOC_MODEL=llama3.2:3b python ai_runner.py daily --hours 6 --out reports/daily_soc_report.md

6. Run the full daily report after the safe run works

    SOC_MODEL=llama3.2:3b python ai_runner.py daily --hours 24 --out reports/daily_soc_report.md

7. Review the output as an analyst

The report should be reviewed for:

- P0 and P1 items
- CloudTrail administrative changes
- Security Hub critical or high findings
- Cloudflare attack patterns
- Route53 DNS Firewall blocks
- SES or SNS abuse indicators
- GitHub audit activity
- PagerDuty incidents
- Sysdig runtime alerts
- Missing evidence
- Recommended Datadog queries
- Escalation or containment recommendations

The daily report is an analyst aid. It is not an automatic incident declaration.

Why This Works

The final solution works because it respects both the SOC workflow and the hardware. It does not try to make the local model do everything. It uses the existing security stack correctly:

- Datadog detects and stores telemetry.
- Sysdig detects runtime policy violations.
- PagerDuty routes alerts.
- The local AI harness gathers and structures evidence.
- The model reasons over bounded context.
- The analyst makes the final decision.

That is a realistic AI SOC operating model.

What We Learned

1. The model is only one part of the solution

A strong model without a workflow becomes a chatbot. A smaller model with a strong harness can become a useful SOC assistant.

2. Local hardware must shape the design

The M1 MacBook Pro can support useful local AI workflows, but model size and prompt size must be controlled.

3. Daily SOC reporting needs summarization, not raw log dumping

Large prompts cause slowdowns and timeouts. The better pattern is to query, reduce, summarize, and then report.

4. Read-only first is the right security posture

The AI can recommend containment, but production changes should remain human-approved.

5. Evidence discipline matters

The AI output should separate observed facts, assumptions, missing evidence, and recommended next actions.

6. The harness is the operational control plane

The harness provides repeatability, guardrails, prompts, source integration, and output structure. That is what makes the solution operationally useful.

Final Outcome

We achieved a working local AI SOC analyst solution that fits the original problem set.

The final solution:

- Runs locally on an M1 MacBook Pro.
- Uses Ollama as the local model runner.
- Uses llama3.2:3b as the stable default model.
- Allows qwen3:8b for focused deeper analysis when the machine can handle it.
- Uses a Python harness to control prompts, context, and workflows.
- Uses an AI runner CLI for repeatable SOC commands.
- Works with Datadog, PagerDuty, and optional Sysdig integration.
- Supports CloudTrail, Security Hub, Route53 DNS Firewall, SES, SNS, Cloudflare, GitHub audit, application logs, and Kubernetes-related alert review.
- Produces useful triage output and daily SOC reports.
- Avoids unsafe automation by keeping containment human-approved.

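
One way to keep that output trustworthy is to check the model's JSON before it reaches the analyst. The following is a minimal sketch, not the article's implementation: `check_triage` is a hypothetical helper, the first five field names come from the sample triage result, and `missing_evidence` and `recommended_queries` are assumed names for the two fields the article says the output should always include.

```python
import json

# The last two field names are assumptions for this sketch.
REQUIRED_FIELDS = (
    "severity",
    "confidence",
    "disposition",
    "summary",
    "suspicious_indicators",
    "missing_evidence",
    "recommended_queries",
)


def check_triage(raw):
    """Parse model output and return (parsed, problems).

    An empty problems list means every required field is present and
    the confidence value is a number between 0 and 100.
    """
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError as error:
        return None, [f"not valid JSON: {error.msg}"]
    if not isinstance(parsed, dict):
        return None, ["top-level value is not an object"]

    problems = [
        f"missing field: {name}" for name in REQUIRED_FIELDS if name not in parsed
    ]
    if "confidence" in parsed:
        value = parsed["confidence"]
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or not 0 <= value <= 100:
            problems.append("confidence must be a number from 0 to 100")
    return parsed, problems
```

When `problems` is not empty, the harness could re-prompt the model or flag the result for manual review instead of writing it into the daily report.
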
The biggest success was not just getting a model to run locally. The success was turning local AI into a controlled SOC workflow that works despite hardware limitations. That is the practical path for introducing AI into security operations: start with a real problem, keep the architecture simple, control the blast radius, tune for the hardware, and make the analyst workflow better.