{"slug": "the-counteroffensive-automated-spam-reporting-with-spamhaus", "title": "The Counteroffensive: Automated Spam Reporting with Spamhaus", "summary": "A developer created a Python script that automates spam reporting to Spamhaus by scanning a Junk folder via IMAP, extracting sending IPs, domains, and URLs from authenticated spam messages, and submitting them to Spamhaus's REST API. The script uses a custom IMAP keyword flag to track processed messages and includes a capability test to ensure the server supports custom keywords before running. This automation enables continuous reporting of spam infrastructure that evades standard email authentication filters like SPF, DKIM, and DMARC.", "body_md": "*How to go from finding spam in your inbox to automatically reporting the infrastructure behind it*\n\nIn my [previous article](https://dev.to/battlehardened/why-your-email-is-an-open-door-for-spammers-and-how-to-lock-it-1k1n), I covered how to harden your email domain with SPF, DKIM, and DMARC. The configuration works well. It kills the vast majority of inbound spam before it ever reaches your device.\n\nBut there's a category of spam those tools can't touch: mail from operators who set up authentication correctly, on purpose, specifically to evade your filters. Fully authenticated. Low spam scores. Rotating domains across a dozen TLDs. AI-generated cover text to confuse content classifiers.\n\nThat mail gets through. It lands in your Junk folder, caught by client-side content analysis. And there it sits.\n\nMarking it as junk and deleting it is the wrong response. That spam is coming from infrastructure that Spamhaus may not know about yet — and Spamhaus is how blocklists get built that protect everyone. If you have evidence of an active spam campaign, reporting it is the right move.\n\nThe problem is that reporting spam manually is tedious. 
Spamhaus has a submission portal, but visiting it for each individual message is not a workflow anyone will sustain.\n\nThis article covers how to automate it.\n\nThe result is a Python script that scans a Junk folder over IMAP, extracts the sending IPs, domains, and URLs from each message, and submits them to the Spamhaus REST API.\n\nThe script uses a custom IMAP keyword flag (`$SpamhausProcessed`) to track which messages have already been processed. This means state lives on the mail server itself: no local database or flat files are required.\n\nOn startup, the script runs a functional capability test: it attempts to set and immediately remove a test flag on the first unprocessed message. If your server doesn't support custom keywords, the script aborts cleanly rather than failing silently mid-run.\n\nA message is flagged as processed once it has been examined, regardless of whether individual API submissions succeeded. This is an intentional design choice: it prevents the script from reprocessing the same message indefinitely if a single indicator fails. Spamhaus returns HTTP 208 for already-known indicators, which handles any duplicate submissions across runs gracefully.\n\n**Spamhaus account and API token:**\n\nRegister at [submit.spamhaus.org](https://submit.spamhaus.org), then go to [auth.spamhaus.org/account](https://auth.spamhaus.org/account), scroll to \"API Key Creation\", and create a key. Copy it immediately; it's only shown once.\n\n**Python dependencies:**\n\n```\npip install beautifulsoup4 requests\n```\n\n**Environment variables:**\n\n```\nexport IMAP_SERVER=mail.example.com\nexport IMAP_PORT=993\nexport IMAP_USER=you@example.com\nexport IMAP_PASSWORD=your_imap_password\nexport SPAMHAUS_TOKEN=your_spamhaus_api_token\n```\n\nOptional variables:\n\n```\nexport IMAP_FOLDER=Junk      # folder to watch (default: Junk)\nexport DRY_RUN=1             # parse without submitting or flagging\nexport DELAY=2               # seconds between new API submissions (default: 2)\nexport VERBOSE_LIST=1        # log every submission with its status\n```\n\nSpamhaus exposes a REST API at `https://submit.spamhaus.org/portal/api/v1`. 
All requests require a Bearer token header.\n\nFour submission types are relevant here:\n\n| Endpoint | Threat type | What it submits |\n|---|---|---|\n| `POST submissions/add/ip` | `spam` | Sending IP address |\n| `POST submissions/add/domain` | `spam` | Sending or landing domain |\n| `POST submissions/add/url` | `scam` | Malicious URL from message body |\n| `POST submissions/add/email` | `spam` | Raw email as evidence |\n\n**On threat type codes:** The threat types used here are conservative defaults: `spam` for IPs and domains, `scam` for URLs. Stronger classifications like `bulletproof` or `phish` require evidence beyond what's available from a single message. The API documentation shows example codes that don't always work for your account tier. Verify valid codes first:\n\n```\ncurl -s -H \"Authorization: Bearer $SPAMHAUS_TOKEN\" \\\n  https://submit.spamhaus.org/portal/api/v1/lookup/threats-types\n```\n\nConservative classifications aren't a weakness: Spamhaus has far more context than any individual submitter and will reclassify based on their own intelligence. What carries the most weight is the raw email submission. The full message gives analysts everything: the authentication chain in the headers, the sending infrastructure, the evasion techniques in the HTML, and the campaign fingerprint in the MIME structure. A precise threat type label matters far less than giving Spamhaus the evidence to make that determination themselves.\n\n**Rate limiting:** The API returns HTTP 429 when you exceed your submission rate. The script makes up to 3 attempts, waiting 60 seconds after each rate-limited response.\n\n**HTTP 208 means already reported.** If you submit something Spamhaus already has, they return 208. This is not an error; the script logs it as \"already reported\" and moves on. 
No sleep is applied on 208 responses; the delay only fires on successful new submissions (200) to pace actual API writes.\n\nWhen a message arrives at your mail server, your MTA performs an SPF check and writes a `Received-SPF` header recording the result. That header contains a `client-ip=` field: the IP address of the server that connected to deliver the message. That's the sending IP we want.\n\nThe script reads only the topmost `Received-SPF` header because headers are prepended on arrival: the topmost one was written by your server when the message came in, and is the only one you can trust. Lower headers could have been injected by the spammer before sending, forged to make the mail look like it came from somewhere legitimate.\n\n```\nspf_headers = msg.get_all('Received-SPF') or []\nif spf_headers:\n    match = re.search(r'client-ip=([0-9a-fA-F.:]+)', str(spf_headers[0]))\n```\n\nIf no `Received-SPF` header is present, no IP is extracted and the IP submission is skipped. The alternative, walking the `Received` chain, risks reporting a legitimate forwarding service or ESP as the spam source. Domain, URL, and email submissions still proceed regardless.\n\nPrivate, loopback, link-local, and reserved addresses are filtered using Python's `ipaddress` module. Its `is_private` check covers the RFC 1918 and RFC 4193 ranges, but it is deliberately `False` for the RFC 6598 shared address space (`100.64.0.0/10`), so addresses in that range are not filtered by this check.\n\nDomains are extracted from four sources for maximum coverage:\n\n- `From`, `Reply-To`, and `Return-Path` headers using `email.utils.getaddresses` for RFC-compliant address parsing\n- `DKIM-Signature d=` tag, which identifies the signing domain regardless of what `From` claims\n\nThe primary domain (used as the anchor for the raw email submission) prefers DKIM `d=` over `Return-Path`. 
Spammers often separate these deliberately: DKIM signs for the infrastructure domain while `Return-Path` uses a throwaway address.\n\nAll domains are IDNA-normalized before submission to collapse internationalized variants.\n\nURLs are extracted from the HTML body and normalized before deduplication:\n\n- Tracking parameters (`utm_*`, `fbclid`, `gclid`, etc.) are stripped\n- Query parameters are sorted so `?b=2&a=1` and `?a=1&b=2` deduplicate correctly\n- Default ports (`:80`, `:443`) are stripped\n\nUnsubscribe links are skipped. Landing domains are extracted from each URL and submitted as domain indicators alongside the full URL; in spam campaigns, the URL domain is often the highest-value IOC.\n\nSPF, DKIM, and DMARC results are parsed from the topmost `Authentication-Results` header. Line folding is stripped before parsing so compound headers on multiple lines are read correctly. Results inform the submission reason string but do not change the threat type, because authentication success alone doesn't imply intent.\n\nFor each sending IP that passes the deduplication check, the script queries RIPE Stat (which aggregates all five RIRs globally) to get the network name, organization, and country. This enriches the submission reason with real infrastructure data:\n\n```\nSpam source. RIR: netname=EXAMPLE-NET org=Example Hosting Ltd country=XX.\nAuth: spf=pass dkim=pass dmarc=pass (p=none). Found in Junk folder.\n```\n\nResults are cached using `lru_cache(maxsize=2048)`. Each IP lookup makes an HTTP request to RIPE Stat; without caching, a batch of 50 messages from the same sending IP would trigger 50 identical network requests. With caching, the first call for a given IP hits the network and stores the result; every subsequent call with the same IP returns the stored result instantly.\n\nThe `maxsize=2048` cap prevents unbounded memory growth in daemon mode. Without a limit, the cache accumulates one entry per unique IP seen since the script started, a slow memory leak over weeks of continuous operation. 
Once 2048 entries are cached, the least recently used are evicted to make room for new ones. For a personal inbox this limit is effectively never reached, but it's the right engineering choice regardless.\n\nThe lookup is deferred until after the deduplication check, so there is no network I/O for IPs already seen in the current run.\n\nThree layers work together:\n\n- Within a single run, a `state_tracker` dict holds sets of already-seen IPs, domains, URLs, and email domains. The same indicator is only submitted once per run regardless of how many messages contain it.\n- Across runs, the IMAP flag on each message means already-processed messages are skipped entirely on the next run.\n- At the Spamhaus level, HTTP 208 handles any indicators that slip through: the API is idempotent.\n\nSave this as `spam-monitor.py` (the latest version is on [GitHub](https://github.com/Sageth/spamhaus-reporting)):\n\n```python\n#!/usr/bin/env python3\n\"\"\"\nspam-monitor.py — Automated spam analysis and Spamhaus submission\n\nMonitors an IMAP Junk folder for spam, extracts infrastructure indicators,\nand submits them to the Spamhaus API. Uses a custom IMAP flag for state\ntracking — no local database or flat files required.\n\nRequired environment variables:\n    IMAP_SERVER      — e.g. mail.example.com\n    IMAP_PORT        — e.g. 
993 (default)\n    IMAP_USER        — your full email address\n    IMAP_PASSWORD    — your IMAP password\n    SPAMHAUS_TOKEN   — your Spamhaus submission API token\n\nOptional environment variables:\n    IMAP_FOLDER      — folder to watch (default: Junk)\n    DRY_RUN          — set to \"1\" to parse without submitting (default: 0)\n    DELAY            — seconds between API calls (default: 2)\n    VERBOSE_LIST     — set to \"1\" to log every submission with its status (default: 0)\n\nUsage:\n    python3 spam-monitor.py             # run once\n    python3 spam-monitor.py --daemon    # run continuously\n    DRY_RUN=1 python3 spam-monitor.py   # dry run\n\"\"\"\n\nimport imaplib\nimport email\nimport email.policy\nimport os\nimport re\nimport sys\nimport json\nimport time\nimport logging\nimport argparse\nimport socket\nimport ipaddress\nimport urllib.request\nimport requests\nfrom collections import defaultdict\nfrom email.utils import getaddresses\nfrom functools import lru_cache\nfrom urllib.parse import urlparse, urlencode, parse_qsl, urlunparse\nfrom bs4 import BeautifulSoup\n\n# ─────────────────────────────────────────────\n# CONFIGURATION FROM ENVIRONMENT\n# ─────────────────────────────────────────────\n\nIMAP_SERVER    = os.environ.get('IMAP_SERVER', '')\nIMAP_PORT      = int(os.environ.get('IMAP_PORT', 993))\nIMAP_USER      = os.environ.get('IMAP_USER', '')\nIMAP_PASSWORD  = os.environ.get('IMAP_PASSWORD', '')\nSPAMHAUS_TOKEN = os.environ.get('SPAMHAUS_TOKEN', '')\nIMAP_FOLDER    = os.environ.get('IMAP_FOLDER', 'Junk')\nDRY_RUN        = os.environ.get('DRY_RUN', '0').strip() == '1'\nDELAY          = float(os.environ.get('DELAY', '2'))\nVERBOSE_LIST   = os.environ.get('VERBOSE_LIST', '0').strip() == '1'\n\nSPAMHAUS_API    = 'https://submit.spamhaus.org/portal/api/v1'\nRIR_API         = 'https://stat.ripe.net/data/whois/data.json'\n\nPROCESSED_FLAG  = '$SpamhausProcessed'\nCAPABILITY_FLAG = '$SpamhausCapabilityTest'\n\n_TRACKING_PARAMS = frozenset({\n    
'utm_source', 'utm_medium', 'utm_campaign', 'utm_term', 'utm_content',\n    'fbclid', 'gclid', 'msclkid', 'mc_eid', 'mc_cid',\n})\n\nsocket.setdefaulttimeout(60)\n\n# ─────────────────────────────────────────────\n# LOGGING\n# ─────────────────────────────────────────────\n\nlogging.basicConfig(\n    level=logging.INFO,\n    format='%(asctime)s %(levelname)s %(message)s',\n    datefmt='%Y-%m-%d %H:%M:%S'\n)\nlog = logging.getLogger(__name__)\n\n# ─────────────────────────────────────────────\n# UTILITIES\n# ─────────────────────────────────────────────\n\ndef _normalize_domain(domain):\n    if not domain:\n        return ''\n    try:\n        return domain.strip().encode('idna').decode('ascii').lower()\n    except Exception:\n        return domain.strip().lower()\n\ndef _is_internal_ip(ip):\n    try:\n        return _is_internal_addr(ipaddress.ip_address(ip))\n    except ValueError:\n        return True\n\ndef _is_internal_addr(addr):\n    return (addr.is_private or addr.is_loopback or\n            addr.is_link_local or addr.is_reserved)\n\n# ─────────────────────────────────────────────\n# EMAIL PARSING\n# ─────────────────────────────────────────────\n\ndef extract_sending_ip(msg):\n    spf_headers = msg.get_all('Received-SPF') or []\n    if spf_headers:\n        match = re.search(r'client-ip=([0-9a-fA-F.:]+)', str(spf_headers[0]))\n        if match:\n            ip = match.group(1).strip()\n            if not _is_internal_ip(ip):\n                return ip\n    return None\n\ndef extract_envelope_domains(msg):\n    domains = set()\n    for field in ('From', 'Reply-To', 'Return-Path'):\n        headers_raw = [str(h) for h in (msg.get_all(field) or [])]\n        for _, addr in getaddresses(headers_raw):\n            if '@' in addr:\n                domain = _normalize_domain(addr.rsplit('@', 1)[1])\n                if domain:\n                    domains.add(domain)\n    for dkim_header in msg.get_all('DKIM-Signature') or []:\n        flat = re.sub(r'\\s+', '', 
str(dkim_header))\n        match = re.search(r'\\bd=([a-zA-Z0-9.-]+\\.[a-zA-Z]{2,})', flat, re.IGNORECASE)\n        if match:\n            domains.add(_normalize_domain(match.group(1)))\n    return domains\n\ndef extract_primary_domain(msg):\n    for dkim_header in msg.get_all('DKIM-Signature') or []:\n        flat = re.sub(r'\\s+', '', str(dkim_header))\n        match = re.search(r'\\bd=([a-zA-Z0-9.-]+\\.[a-zA-Z]{2,})', flat, re.IGNORECASE)\n        if match:\n            return _normalize_domain(match.group(1))\n    headers_raw = [str(h) for h in (msg.get_all('Return-Path') or [])]\n    for _, addr in getaddresses(headers_raw):\n        if '@' in addr:\n            return _normalize_domain(addr.rsplit('@', 1)[1])\n    return None\n\ndef extract_auth_results(msg):\n    auth_headers = msg.get_all('Authentication-Results') or []\n    if not auth_headers:\n        return {'spf': 'unknown', 'dkim': 'unknown', 'dmarc': 'unknown', 'dmarc_policy': 'unknown'}\n    auth = re.sub(r'\\s+', ' ', str(auth_headers[0]))\n    def extract(pattern):\n        m = re.search(pattern, auth, re.IGNORECASE)\n        return m.group(1).lower() if m else 'unknown'\n    spf          = extract(r'\\bspf=(pass|fail|softfail|neutral|none|permerror|temperror)\\b')\n    dkim         = extract(r'\\bdkim=(pass|fail|none|policy|neutral|temperror|permerror)\\b')\n    dmarc        = extract(r'\\bdmarc=(pass|fail|none|bestguesspass|temperror|permerror)\\b')\n    dmarc_policy = extract(r'\\b(?:policy\\.[A-Za-z_-]*|p)=([A-Za-z]+)')\n    return {'spf': spf, 'dkim': dkim, 'dmarc': dmarc, 'dmarc_policy': dmarc_policy}\n\ndef normalize_url(href):\n    try:\n        parsed = urlparse(href)\n        port = parsed.port\n        clean_params = sorted(\n            (k, v) for k, v in parse_qsl(parsed.query)\n            if k.lower() not in _TRACKING_PARAMS\n        )\n        hostname = _normalize_domain(parsed.hostname or '')\n        if not hostname:\n            return None\n        if (parsed.scheme == 'https' 
and port == 443) or (parsed.scheme == 'http' and port == 80):\n            port = None\n        netloc = hostname if port is None else f'{hostname}:{port}'\n        return urlunparse(parsed._replace(netloc=netloc, query=urlencode(clean_params)))\n    except Exception:\n        return None\n\ndef extract_cta_urls(msg):\n    urls = set()\n    for part in msg.walk():\n        if part.get_content_type() == 'text/html':\n            soup = None\n            try:\n                html = part.get_payload(decode=True).decode('utf-8', errors='ignore')\n                soup = BeautifulSoup(html, 'html.parser')\n                for a in soup.find_all('a', href=True):\n                    href = a['href'].strip()\n                    if not href.startswith(('http://', 'https://')):\n                        continue\n                    if any(s in href.lower() for s in ('unsub', 'optout', 'opt-out', 'remove', 'list-unsubscribe')):\n                        continue\n                    normalized = normalize_url(href)\n                    if normalized:\n                        urls.add(normalized)\n            except Exception as e:\n                log.debug(f'URL extraction error: {e}')\n            finally:\n                if soup:\n                    soup.decompose()\n    return list(urls)\n\n@lru_cache(maxsize=2048)\ndef rir_lookup(ip):\n    if not ip:\n        return {}\n    try:\n        url = f'{RIR_API}?resource={ip}'\n        req = urllib.request.Request(url, headers={'Accept': 'application/json'})\n        with urllib.request.urlopen(req, timeout=8) as resp:\n            data = json.loads(resp.read())\n        records = data.get('data', {}).get('records', [])\n        result = {}\n        for group in records:\n            for record in group:\n                key = record.get('key', '').lower()\n                if key in ('netname', 'org', 'country', 'descr'):\n                    result[key] = record.get('value', '')\n        return result\n    except Exception 
as e:\n        log.debug(f'RIR lookup failed for {ip}: {e}')\n        return {}\n\ndef parse_message(raw_bytes):\n    msg = email.message_from_bytes(raw_bytes, policy=email.policy.default)\n    return {\n        'ip':               extract_sending_ip(msg),\n        'primary_domain':   extract_primary_domain(msg),\n        'envelope_domains': extract_envelope_domains(msg),\n        'urls':             extract_cta_urls(msg),\n        'auth':             extract_auth_results(msg),\n        'subject':          str(msg.get('Subject', '')),\n        'rspamd':           str(msg.get('X-Rspamd-Score', 'N/A')),\n    }\n\n# ─────────────────────────────────────────────\n# SPAMHAUS API\n# ─────────────────────────────────────────────\n\nTHREAT_IP     = 'spam'\nTHREAT_DOMAIN = 'spam'\nTHREAT_URL    = 'scam'\nTHREAT_EMAIL  = 'spam'\n\nREASON_IP = lambda ripe, auth: (\n    f'Spam source. RIR: netname={ripe.get(\"netname\",\"unknown\")} '\n    f'org={ripe.get(\"org\", ripe.get(\"descr\",\"unknown\"))} '\n    f'country={ripe.get(\"country\",\"unknown\")}. '\n    f'Auth: spf={auth.get(\"spf\")} dkim={auth.get(\"dkim\")} '\n    f'dmarc={auth.get(\"dmarc\")} (p={auth.get(\"dmarc_policy\",\"unknown\")}). 
'\n    f'Found in Junk folder.'\n)\nREASON_DOMAIN = 'Spam domain found in Junk folder.'\nREASON_URL    = 'Scam URL extracted from spam email body.'\nREASON_EMAIL  = 'Spam email found in Junk folder.'\n\ndef spamhaus_request(endpoint, payload=None, method='POST', retries=3):\n    url     = f'{SPAMHAUS_API}/{endpoint}'\n    headers = {'Authorization': f'Bearer {SPAMHAUS_TOKEN}'}\n    for attempt in range(1, retries + 1):\n        try:\n            resp = requests.request(\n                method, url,\n                headers=headers,\n                json=payload if payload is not None else None,\n                timeout=30\n            )\n            if resp.status_code == 429:\n                log.warning(f'Rate limited — waiting 60s (attempt {attempt}/{retries})')\n                time.sleep(60)\n                continue\n            elif resp.status_code == 208:\n                return 208, resp.json() if resp.text else {}\n            elif not resp.ok:\n                try:\n                    err_payload = resp.json()\n                except Exception:\n                    err_payload = {'error': resp.text}\n                log.error(f'HTTP {resp.status_code}: {err_payload}')\n                return resp.status_code, err_payload\n            return resp.status_code, resp.json() if resp.text else {}\n        except Exception as e:\n            log.error(f'Request error: {e}')\n            return 0, {}\n    return 429, {'message': 'rate limit retries exhausted'}\n\ndef submit(submission_type, key, object_value, threat_type, reason):\n    label = key.replace('email:', '') if submission_type == 'email' else key\n    if DRY_RUN:\n        log.info(f'  [DRY RUN] Would submit {submission_type.upper()}: {label}')\n        return\n    status, body = spamhaus_request(f'submissions/add/{submission_type}', {\n        'threat_type': threat_type,\n        'reason': reason,\n        'source': {'object': object_value}\n    })\n    if status in (200, 208):\n        log.info(f' 
 {submission_type.upper()} {label} — {\"OK\" if status == 200 else \"already reported\"}')\n        if status == 200:\n            time.sleep(DELAY)\n    else:\n        log.warning(f'  {submission_type.upper()} {label} — failed ({status}): {body}')\n\ndef check_submission_count():\n    status, data = spamhaus_request('submissions/count', method='GET')\n    if status != 200:\n        log.warning(f'Could not fetch submission count: HTTP {status}')\n        return\n    total       = data.get('total', 0)\n    matched     = data.get('matched', 0)\n    new         = total - matched\n    pct_matched = int(matched / total * 100) if total else 0\n    pct_new     = int(new / total * 100) if total else 0\n    log.info(\n        f'Spamhaus totals (30 days): {total} submitted — '\n        f'{matched} corroborated ({pct_matched}%), '\n        f'{new} new intelligence ({pct_new}%)'\n    )\n    status, items = spamhaus_request('submissions/list?items=10000', method='GET')\n    if status != 200:\n        log.warning(f'Could not fetch submissions list: HTTP {status}')\n        return\n    groups = defaultdict(lambda: {'listed': 0, 'checked': 0, 'pending': 0})\n    for item in items:\n        t = item.get('submission_type', 'unknown')\n        if item.get('listed'):\n            groups[t]['listed'] += 1\n        elif item.get('last_check'):\n            groups[t]['checked'] += 1\n        else:\n            groups[t]['pending'] += 1\n    for t, counts in sorted(groups.items()):\n        log.info(\n            f'  {t.upper()}: {counts[\"listed\"]} listed, '\n            f'{counts[\"checked\"]} checked/not listed, '\n            f'{counts[\"pending\"]} pending'\n        )\n    if VERBOSE_LIST:\n        log.info('--- Verbose submission list ---')\n        for item in items:\n            stype = item.get('submission_type', '?')\n            if stype == 'email':\n                obj = item.get('attributes', {}).get('subject', '(no subject)')\n            else:\n                obj = 
item.get('source', {}).get('object', '?')\n            listed = item.get('listed')\n            if listed:\n                status_str = f'listed: {\", \".join(listed)}'\n            elif item.get('last_check'):\n                status_str = 'checked, not listed'\n            else:\n                status_str = 'pending review'\n            log.info(f'  {stype.upper()} {obj} — {status_str}')\n\n# ─────────────────────────────────────────────\n# PROCESSING\n# ─────────────────────────────────────────────\n\ndef process_message(raw_bytes, state_tracker):\n    parsed = parse_message(raw_bytes)\n    auth   = parsed['auth']\n\n    log.info(f'  IP={parsed[\"ip\"]} primary_domain={parsed[\"primary_domain\"]}')\n    log.info(f'  Subject: {parsed[\"subject\"]}')\n    log.info(f'  Rspamd: {parsed[\"rspamd\"]}')\n    log.info(f'  Auth: spf={auth.get(\"spf\")} dkim={auth.get(\"dkim\")} dmarc={auth.get(\"dmarc\")} (p={auth.get(\"dmarc_policy\")})')\n\n    if parsed['ip'] and parsed['ip'] not in state_tracker['ips']:\n        state_tracker['ips'].add(parsed['ip'])\n        ripe = rir_lookup(parsed['ip'])\n        if ripe:\n            log.info(f'  RIR: netname={ripe.get(\"netname\")} country={ripe.get(\"country\")}')\n        submit('ip', parsed['ip'], parsed['ip'], THREAT_IP, REASON_IP(ripe, auth))\n\n    for domain in parsed['envelope_domains']:\n        if domain not in state_tracker['domains']:\n            state_tracker['domains'].add(domain)\n            submit('domain', domain, domain, THREAT_DOMAIN, REASON_DOMAIN)\n\n    if parsed['primary_domain'] and parsed['primary_domain'] not in state_tracker['emails']:\n        state_tracker['emails'].add(parsed['primary_domain'])\n        key = f'email:{parsed[\"primary_domain\"]}'\n        MAX_EMAIL_BYTES = 1024 * 1024\n        email_sample = raw_bytes[:MAX_EMAIL_BYTES].decode('utf-8', errors='replace')\n        submit('email', key, email_sample, THREAT_EMAIL, REASON_EMAIL)\n\n    for url in parsed['urls']:\n        if url not in 
state_tracker['urls']:\n            state_tracker['urls'].add(url)\n            submit('url', url, url, THREAT_URL, REASON_URL)\n        try:\n            hostname = _normalize_domain(urlparse(url).hostname or '')\n            if hostname and hostname not in parsed['envelope_domains'] and hostname not in state_tracker['domains']:\n                state_tracker['domains'].add(hostname)\n                submit('domain', hostname, hostname, THREAT_DOMAIN,\n                       f'Landing domain extracted from spam URL. {REASON_DOMAIN}')\n        except Exception as e:\n            log.debug(f'Could not extract landing domain from URL: {e}')\n\n# ─────────────────────────────────────────────\n# IMAP\n# ─────────────────────────────────────────────\n\ndef connect_imap():\n    conn = imaplib.IMAP4_SSL(IMAP_SERVER, IMAP_PORT, timeout=60)\n    conn.login(IMAP_USER, IMAP_PASSWORD)\n    log.info(f'Connected to {IMAP_SERVER}:{IMAP_PORT} as {IMAP_USER}')\n    return conn\n\ndef run_once():\n    if not all([IMAP_SERVER, IMAP_USER, IMAP_PASSWORD, SPAMHAUS_TOKEN]):\n        log.error('Missing required environment variables.')\n        sys.exit(1)\n\n    if DRY_RUN:\n        log.info('*** DRY RUN mode — no submissions or flags will be applied ***')\n\n    conn = None\n    total_processed = 0\n\n    try:\n        conn = connect_imap()\n\n        if conn.select(f'\"{IMAP_FOLDER}\"', readonly=False)[0] != 'OK':\n            log.error(f'Could not select folder: {IMAP_FOLDER}')\n            return\n\n        status, data = conn.uid('search', None, f'NOT KEYWORD {PROCESSED_FLAG}')\n        if status != 'OK' or not data[0]:\n            log.info(f'Folder {IMAP_FOLDER}: No unprocessed messages.')\n            return\n\n        uids = data[0].split()\n        log.info(f'Folder {IMAP_FOLDER}: {len(uids)} unprocessed message(s)')\n\n        if not DRY_RUN:\n            test_status, _ = conn.uid('store', uids[0], '+FLAGS', CAPABILITY_FLAG)\n            if test_status != 'OK':\n               
 log.critical('IMAP server rejected custom keyword flags — cannot track state. Aborting.')\n                return\n            try:\n                conn.uid('store', uids[0], '-FLAGS', CAPABILITY_FLAG)\n            except Exception:\n                pass\n\n        state_tracker = {'ips': set(), 'domains': set(), 'urls': set(), 'emails': set()}\n\n        for uid in uids:\n            status, msg_data = conn.uid('fetch', uid, '(RFC822)')\n            if status != 'OK' or not msg_data or not msg_data[0]:\n                continue\n\n            raw_bytes = msg_data[0][1]\n            log.info(f'Processing message UID {uid.decode()}')\n\n            try:\n                process_message(raw_bytes, state_tracker)\n                total_processed += 1\n                if not DRY_RUN:\n                    conn.uid('store', uid, '+FLAGS', PROCESSED_FLAG)\n                    log.info(f'  Flagged message UID {uid.decode()} as processed')\n            except Exception as e:\n                log.error(f'  Failed to process message UID {uid.decode()}: {e}')\n\n    finally:\n        log.info(f'Done. 
{total_processed} message(s) processed.')\n        if conn:\n            if total_processed:\n                try:\n                    check_submission_count()\n                except Exception as e:\n                    log.error(f'Could not fetch submission count: {e}')\n            try:\n                conn.logout()\n            except Exception:\n                pass\n\ndef run_daemon(interval=300):\n    log.info(f'Daemon mode — checking every {interval}s')\n    while True:\n        try:\n            run_once()\n        except Exception as e:\n            log.error(f'Error in run loop: {e}')\n        log.info(f'Sleeping {interval}s...')\n        time.sleep(interval)\n\n# ─────────────────────────────────────────────\n# ENTRY POINT\n# ─────────────────────────────────────────────\n\nif __name__ == '__main__':\n    parser = argparse.ArgumentParser(description='Spam monitor and Spamhaus submitter')\n    parser.add_argument('--daemon', action='store_true', help='Run continuously')\n    parser.add_argument('--interval', type=int, default=300,\n                        help='Daemon check interval in seconds (default: 300)')\n    args = parser.parse_args()\n\n    if args.daemon:\n        run_daemon(args.interval)\n    else:\n        run_once()\n```\n\n**Always dry run first:**\n\n```\nDRY_RUN=1 python3 spam-monitor.py\n```\n\nThis parses every message and logs what would be submitted without touching the API or setting any flags. 
Check the output carefully before running live.\n\n**Single run:**\n\n```\npython3 spam-monitor.py\n```\n\n**Daemon mode (checks every 5 minutes):**\n\n```\npython3 spam-monitor.py --daemon --interval 300\n```\n\n**Cron job (every 10 minutes):**\n\n```\n*/10 * * * * cd /path/to/script && python3 spam-monitor.py\n```\n\n**Full submission detail:**\n\n```\nVERBOSE_LIST=1 python3 spam-monitor.py\n```\n\nAfter each run that processes at least one message, the script logs a summary of your Spamhaus submissions for the past 30 days, broken down by type and listing status:\n\n```\nSpamhaus totals (30 days): 312 submitted — 187 corroborated (59%), 125 new intelligence (40%)\n  DOMAIN: 84 listed, 12 checked/not listed, 7 pending\n  EMAIL: 41 listed, 8 checked/not listed, 3 pending\n  IP: 73 listed, 11 checked/not listed, 5 pending\n  URL: 35 listed, 6 checked/not listed, 4 pending\n```\n\nThis script is designed for personal use: a single inbox, running periodically, low submission volume. It works well in that context. A few things to be aware of:\n\n**IP extraction requires Received-SPF.** The script only extracts IPs from the topmost `Received-SPF` header, which your MTA writes on arrival. If that header is absent (unusual on modern mail providers but possible on misconfigured servers), no IP is submitted. The `Received` chain is not used as a fallback because it risks reporting legitimate forwarding infrastructure.\n\n**Domains from envelope headers may include spoofed legitimate domains.** If a message spoofs `paypal.com` in the `From` header and your server doesn't drop it, the script will attempt to report it. Spamhaus's analyst review process handles false positives, but it's worth monitoring your submission acceptance rate.\n\n**URL landing domains may be legitimate redirectors.** CDN hostnames, link shorteners, and ESP tracking domains sometimes appear in spam. 
The script submits them — whether that's useful depends on the campaign.\n\n**State is tied to IMAP keyword support.** Most modern IMAP servers support custom keywords (Dovecot, Cyrus, Gmail). Some hosted providers don't. The script tests for support at startup and aborts if the server rejects the flag.\n\n**This is a personal-use tool, not an enterprise pipeline.** SQLite-backed state, archive-instead-of-delete retention, and multi-account support are the natural next steps if you outgrow it. Follow [github.com/Sageth/spamhaus-reporting](https://github.com/Sageth/spamhaus-reporting) for updates.\n\nYour submissions are breadcrumbs, not the whole map. Spamhaus has vastly better detection infrastructure than any individual submitter. What you're providing is timing — you're reporting infrastructure while it's actively sending, not after the campaign has ended.\n\nThe RIR enrichment matters more than the reason text. Spamhaus analysts can see that a netname maps to a specific hosting provider known for bulletproof services. That infrastructure context is more useful than a paragraph of narrative about what the spam said.\n\nSubmitting the raw email alongside the IP and domain gives analysts the full picture — headers showing the authentication chain, HTML showing the evasion techniques, MIME structure showing the hidden text.\n\n**What the blocklists do:**\n\nThe Spamhaus Blocklist (SBL) lists IP addresses observed sending spam or hosting malicious infrastructure. Mail servers that check Spamhaus reject connections from listed IPs at the SMTP level — before a message is even accepted. A listed IP can't deliver mail to anyone using Spamhaus-backed filtering until the operator cleans up their act and gets delisted.\n\nThe Domain Blocklist (DBL) lists domains observed in spam campaigns — sending domains, hosting domains, and URLs found in message bodies. DBL listings propagate into DNS firewalls, email security products, and browser filters. 
A listed domain gets blocked across every product that queries Spamhaus, not just email.\n\nThe Hash Blocklist (HBL) lists cryptographic hashes of malicious content — email addresses, file hashes, cryptocurrency wallet addresses. Less visible in day-to-day reporting, but your raw email submissions contribute to it.\n\n**What the submission statuses mean:**\n\nAfter submission, Spamhaus reviews each indicator and assigns it one of three statuses, which the summary log reports as listed, checked/not listed, or pending.\n\nThe corroboration percentage in the summary log shows how many of your submissions matched intelligence Spamhaus already had. A high corroboration rate means you're seeing the same infrastructure they're already tracking, and your timing confirmation is still useful. A high new intelligence rate means you're getting there first.\n\nIf an IP range gets listed on SBL, the operator has to spin up new infrastructure, acquire new IP space, reconfigure sending, and rebuild sending reputation from scratch. If a domain gets listed on DBL, they need new domains, new DNS, new authentication records. 
That costs time and money, and it's the friction that makes automated spam campaigns expensive to sustain.\n\n*See also: Why Your Email Is an Open Door for Spammers — And How to Lock It*", "url": "https://wpnews.pro/news/the-counteroffensive-automated-spam-reporting-with-spamhaus", "canonical_source": "https://dev.to/battlehardened/the-counteroffensive-automated-spam-reporting-with-spamhaus-j6e", "published_at": "2026-06-04 01:03:39+00:00", "updated_at": "2026-06-04 01:42:20.313247+00:00", "lang": "en", "topics": ["ai-tools"], "entities": ["Spamhaus", "Python"], "alternates": {"html": "https://wpnews.pro/news/the-counteroffensive-automated-spam-reporting-with-spamhaus", "markdown": "https://wpnews.pro/news/the-counteroffensive-automated-spam-reporting-with-spamhaus.md", "text": "https://wpnews.pro/news/the-counteroffensive-automated-spam-reporting-with-spamhaus.txt", "jsonld": "https://wpnews.pro/news/the-counteroffensive-automated-spam-reporting-with-spamhaus.jsonld"}}