{"slug": "building-a-daily-google-news-api-monitor-in-python", "title": "Building a Daily Google News API Monitor in Python", "summary": "The article describes a Python-based tool that monitors Google News for brand mentions using the SearchApi.io API. Each morning, the tool fetches news articles for specified keywords, enriches them with sentiment analysis and summaries via OpenAI, stores the data in a SQLite database, and sends results through Slack or a web dashboard. The project consists of approximately 1,000 lines of Python code across ten files, structured as a Flask app with a single HTML dashboard.", "body_md": "I wanted a small, local tool that would search the news for brand mentions. I didn't want to pay over $100 a month, so I decided to build my own.\n\nWhat I built is a tool that queries a **Google News API** every morning for a list of keywords, runs each result through an LLM for sentiment and a one-sentence summary, saves everything to SQLite, and pings me the results on Slack or in a web app.\n\nThe whole project came out to about 1,000 lines of Python across ten files. It is a Flask app with a SQLite database and a single HTML dashboard.\n\nHere's how it's wired together, with the code that matters from each layer.\n\nRepo: [google-news-monitor on GitHub]. Install instructions at the bottom.\n\n## The pipeline\n\nThe whole tool is one pipeline:\n\n```\nkeyword → Google News API → OpenAI enrichment → SQLite → (dashboard | REST | CLI | Slack)\n```\n\nEvery interface (the dashboard form, the REST API, the CLI, the daily cron) ends up calling the same `process_keyword()` function. 
Here is the entire core loop, from `monitor/pipeline.py`:\n\n``` python\ndef process_keyword(keyword, num=30, when=\"1d\", gl=None):\n    keyword = keyword.strip()\n    fetched = search.fetch_google_news(keyword, num=num, when=when, gl=gl)\n\n    new_count = 0\n    for art in fetched:\n        # Skip results without a URL: the URL is part of the dedup key.\n        if not art.get(\"url\"):\n            continue\n        title = art.get(\"title\") or \"\"  # title is NOT NULL in the schema\n        ai_result = ai.enrich_article(keyword, title,\n                                      art.get(\"snippet\") or \"\")\n        row = {\n            \"keyword\": keyword,\n            \"title\": title,\n            \"url\": art[\"url\"],\n            \"source\": art.get(\"source\"),\n            \"snippet\": art.get(\"snippet\"),\n            \"published_at\": art.get(\"published_at\"),\n            \"sentiment\": ai_result[\"sentiment\"],\n            \"ai_summary\": ai_result[\"summary\"],\n        }\n        article_id = db.save_article(row)\n        # Only rows that were actually saved count as new and trigger alerts.\n        if article_id is not None:\n            new_count += 1\n            alerts.check_article(keyword, row, article_id)\n\n    return {\"keyword\": keyword, \"fetched\": len(fetched), \"new\": new_count}\n```\n\nFetch, enrich, save, alert. That is the whole tool, minus the interfaces wrapped around it.\n\n## Fetching from the Google News API\n\nI'm using [SearchApi.io](https://www.searchapi.io) as the entry point to the Google News API.\n\nOne issue I ran into: Google News matching is loose. Search for \"niche company 1\" and half the results are about the similarly named niche company 2, which has nothing to do with you. 
So there's a per-article flag for whether your keyword actually appears in the article body, with a toggle to hide everything else.\n\nFrom `monitor/search.py`:\n\n``` python\nimport os\n\nimport requests\n\nSEARCHAPI_URL = \"https://www.searchapi.io/api/v1/search\"\n\ndef fetch_google_news(keyword, num=30, when=None, gl=None):\n    params = {\n        \"engine\": \"google_news\",\n        \"q\": keyword.strip(),\n        \"nfpr\": 1,                 # turn off \"did you mean...\"\n        \"num\": num,\n        \"api_key\": os.environ[\"SEARCHAPI_KEY\"],\n    }\n    if when:\n        params[\"when\"] = when      # 1h, 1d, 7d, 1m, 1y\n    if gl:\n        params[\"gl\"] = gl.lower()  # 2-letter country code\n\n    resp = requests.get(SEARCHAPI_URL, params=params, timeout=30)\n    resp.raise_for_status()\n    data = resp.json()\n\n    articles = []\n    for item in data.get(\"organic_results\", []) or []:\n        articles.append({\n            \"title\": item.get(\"title\"),\n            \"url\": item.get(\"link\"),\n            # \"source\" may be missing or null, so fall back to an empty dict\n            \"source\": (item.get(\"source\") or {}).get(\"name\"),\n            \"snippet\": item.get(\"snippet\"),\n            \"published_at\": parse_date(item.get(\"date\")),\n        })\n    return articles\n```\n\nOne thing the Google News API will trip you up on: the `date` field arrives as free-form strings like `\"1 week ago\"`, `\"May 30, 2023\"`, or `\"Yesterday\"`, not ISO timestamps. If you store those verbatim, your SQL filters will silently break and your charts will sort `\"May 30\"` alphabetically next to `\"2026-05-14\"`. I wrote a small parser (`monitor/dates.py`) that normalizes everything to `YYYY-MM-DD` on the way in.\n\n## OpenAI enrichment with a JSON guardrail\n\nFor every article I want two things: a sentiment label (positive, negative, neutral) and a one-sentence summary. 
The trick is to force OpenAI to return parseable JSON so the database ingestion never sees free-form text.\n\nFrom `monitor/ai.py`:\n\n``` python\nimport json\nimport os\n\nARTICLE_SYSTEM_PROMPT = (\n    \"You are a media-monitoring analyst. For each article you receive, \"\n    \"classify the sentiment toward the tracked brand/keyword and write a \"\n    \"one-sentence summary. You MUST return a single JSON object - no prose, \"\n    \"no markdown, no code fences. \"\n    'Schema: {\"sentiment\": \"positive\"|\"negative\"|\"neutral\", '\n    '\"summary\": \"<one sentence>\"}. Never include any other keys.\"\n)\n\ndef enrich_article(keyword, title, snippet):\n    resp = openai_client().chat.completions.create(\n        model=os.environ.get(\"OPENAI_MODEL\", \"gpt-4o-mini\"),\n        response_format={\"type\": \"json_object\"},   # enforce JSON\n        messages=[\n            {\"role\": \"system\", \"content\": ARTICLE_SYSTEM_PROMPT},\n            {\"role\": \"user\", \"content\":\n                f\"Tracked keyword: {keyword}\\n\"\n                f\"Article title: {title}\\n\"\n                f\"Article snippet: {snippet or '(no snippet)'}\"},\n        ],\n        temperature=0.2,\n    )\n    data = json.loads(resp.choices[0].message.content or \"{}\")\n    # Anything outside the three allowed labels is stored as neutral.\n    sentiment = (data.get(\"sentiment\") or \"neutral\").lower()\n    if sentiment not in {\"positive\", \"negative\", \"neutral\"}:\n        sentiment = \"neutral\"\n    return {\"sentiment\": sentiment, \"summary\": (data.get(\"summary\") or \"\").strip()}\n```\n\n`response_format={\"type\": \"json_object\"}` forces the model to emit valid JSON. The system prompt also redundantly says \"no prose, no markdown, no code fences\" because models sometimes ignore the format flag anyway. Belt and braces.\n\nThere's also a `summarize_period(keyword, articles)` function that takes the full article list for a time window and writes a paragraph-long narrative summary. Same JSON-only pattern. 
That's what powers the \"AI period summary\" block at the top of the dashboard report.\n\n## SQLite with self-healing schema\n\nI wanted zero setup steps. No `flask db upgrade`, no SQL files to apply by hand. So the database creates itself on first connection:\n\n``` python\nimport os\nimport sqlite3\nfrom contextlib import contextmanager\n\nSCHEMA = \"\"\"\nCREATE TABLE IF NOT EXISTS articles (\n    id INTEGER PRIMARY KEY AUTOINCREMENT,\n    keyword TEXT NOT NULL,\n    title TEXT NOT NULL,\n    url TEXT NOT NULL,\n    source TEXT,\n    snippet TEXT,\n    published_at TEXT,\n    sentiment TEXT,\n    ai_summary TEXT,\n    fetched_at TEXT NOT NULL DEFAULT (datetime('now')),\n    UNIQUE(keyword, url)\n);\n-- ...more tables for keywords, alerts, period summaries\n\"\"\"\n\n@contextmanager\ndef connect(path=DB_PATH):\n    # Create the schema only when the database file does not exist yet.\n    first_run = not os.path.exists(path)\n    conn = sqlite3.connect(path)\n    conn.row_factory = sqlite3.Row\n    if first_run:\n        conn.executescript(SCHEMA)\n        conn.commit()\n    try:\n        yield conn\n        conn.commit()\n    finally:\n        conn.close()\n```\n\n`UNIQUE(keyword, url)` does the deduplication. If you re-search the same keyword, articles you already have don't get re-saved. One caveat: in the loop as shown above, `ai.enrich_article()` runs before `db.save_article()`, so skipping the OpenAI call for a duplicate would need an existence check ahead of enrichment.\n\n## Three interfaces, one core\n\nThe dashboard, the REST API, and the CLI all sit on top of the same `pipeline.process_keyword()`. 
Each one is small.\n\nThe REST blueprint lives in `monitor/api.py`:\n\n``` python\n@bp.post(\"/api/search\")\ndef run_search():\n    data = request.get_json(silent=True) or {}\n    keyword = (data.get(\"keyword\") or \"\").strip()\n    if not keyword:\n        return jsonify({\"status\": \"error\", \"message\": \"keyword required\"}), 400\n    result = pipeline.process_keyword(keyword,\n                                       num=int(data.get(\"num\") or 30),\n                                       when=data.get(\"when\") or None,\n                                       gl=data.get(\"gl\") or None)\n    return jsonify({\"status\": \"ok\", \"result\": result})\n```\n\nThe CLI is in `cli.py`:\n\n``` python\n@cli.command()\n@click.argument(\"keyword\")\n@click.option(\"--when\", default=\"1d\")\n@click.option(\"--num\", default=50)\n@click.option(\"--gl\", default=None)\ndef search(keyword, when, num, gl):\n    result = pipeline.process_keyword(keyword, num=num, when=when, gl=gl)\n    click.echo(json.dumps(result, indent=2))\n```\n\nThe dashboard is one HTML file (`templates/dashboard.html`) using Tailwind via CDN and Chart.js for the volume chart. There is no build step. Forms POST to the same endpoints.\n\nThe GET endpoints auto-fetch when the DB is empty for that keyword. A single URL is enough to spin up a fresh monitor for a new term, which makes it easy to plug into your own scripts or hand to an AI agent that needs to know what the press is saying about a brand.\n\n## REST API\n\nAll endpoints return JSON. 
The server runs on `127.0.0.1:5000` by default.\n\n| Method | Path | Purpose |\n|---|---|---|\n| GET | `/healthz` | Liveness check |\n| GET | `/api/keywords` | List tracked keywords |\n| POST | `/api/keywords` | Add a keyword |\n| DELETE | `/api/keywords/<keyword>` | Stop tracking |\n| POST | `/api/search` | Run the pipeline once (fetch + enrich + save) |\n| POST | `/api/cron/run` | Run the daily job immediately |\n| GET | `/api/report/<keyword>` | Full report, every article in the period |\n| GET | `/api/matches/<keyword>` | Same payload, filtered to keyword matches only |\n| GET | `/api/analytics/<keyword>` | Sentiment totals plus bucketed volume for the chart |\n| GET | `/api/alerts` | Recent breaking-news alerts |\n| GET | `/api/settings` | Current settings (keys are masked) |\n| POST | `/api/settings` | Update keys, model, Slack webhook |\n| POST | `/api/settings/test-slack` | Send a test Slack message |\n\nThe report and matches endpoints auto-fetch on first use. If there is nothing in the database for that keyword yet, the pipeline runs first and the report comes back populated. Subsequent calls are instant. 
Pass `?fetch=true` to force a refresh.\n\n**Query parameters** for `/api/report` and `/api/matches`:\n\n- `period`: `daily`, `weekly`, `monthly`, or `all` (default: `all` for matches, `weekly` for report)\n- `fetch`: `true` to force a fresh fetch even when data already exists\n- `num`: max results from Google News when auto-fetching (default 50)\n- `when`: `1h`, `1d`, `7d`, `1m`, `1y` (default: any time)\n- `gl`: 2-letter country code, e.g. `us`, `gb`, `de`\n\n**Example.** A single URL is enough to spin up a fresh monitor for a brand new keyword:\n\n```\nGET http://127.0.0.1:5000/api/matches/n8n?period=all\n```\n\n## Daily cron, in-process\n\nAPScheduler runs the daily job inside the same Flask process, so there is no system cron to configure and no separate worker to deploy.\n\n``` python\nimport os\n\nfrom apscheduler.schedulers.background import BackgroundScheduler\n\ndef start_scheduler():\n    # Run time is configurable via environment variables (default 08:00).\n    hour = int(os.environ.get(\"DAILY_CRON_HOUR\", \"8\"))\n    minute = int(os.environ.get(\"DAILY_CRON_MINUTE\", \"0\"))\n    sched = BackgroundScheduler(daemon=True)\n    sched.add_job(pipeline.run_all_monitored, trigger=\"cron\",\n                  hour=hour, minute=minute, id=\"daily_monitor\",\n                  replace_existing=True)\n    sched.start()\n```\n\n`run_all_monitored()` walks the list of keywords flagged with `monitored=1` in the database and runs the full pipeline for each.\n\n## Alerts\n\nThe alerts module checks every freshly saved article against a list of risk phrases (`lawsuit`, `breach`, `outage`, `scandal`, …), checks whether OpenAI returned `negative` sentiment, and posts a formatted message to Slack if either trips. 
It also tracks a 14-day rolling baseline and fires a separate alert if today's article volume is 3x that baseline.\n\n``` python\nRISK_PHRASES = [\"lawsuit\", \"sued\", \"investigation\", \"breach\", \"hack\",\n                \"outage\", \"scandal\", \"fired\", \"resigns\", \"bankruptcy\", \"recall\"]\n\ndef check_article(keyword, article, article_id):\n    # Values can be None (not just missing), so fall back with `or`.\n    title = article.get(\"title\") or \"\"\n    snippet = article.get(\"snippet\") or \"\"\n    haystack = (title + \" \" + snippet).lower()\n    matched = [p for p in RISK_PHRASES if p in haystack]\n    reasons = []\n    if matched:\n        reasons.append(f\"risk phrase: {', '.join(matched)}\")\n    if (article.get(\"sentiment\") or \"\").lower() == \"negative\":\n        reasons.append(\"negative sentiment\")\n    if reasons:\n        db.save_alert(keyword, article_id, \"; \".join(reasons))\n        send_slack(format_slack(keyword, reasons, article))\n```\n\n## Install it\n\n```\ngit clone https://github.com/SamJale/Google-News-Monitor-API.git\ncd Google-News-Monitor-API\npip install -r requirements.txt\npython app.py\n```\n\nA browser tab opens at `http://127.0.0.1:5000/`. Add your SearchApi.io and OpenAI keys through the Settings button in the UI (or edit `.env` directly).\n\nIf you want to drive it from the terminal:\n\n```\npython cli.py add \"anthropic\"                       # start tracking\npython cli.py search \"anthropic\" --when 7d --num 50 # one-shot\npython cli.py report \"anthropic\" --period weekly    # see the saved data\npython cli.py cron                                  # run the daily job now\n```\n\n## Things I would change if I were building it again\n\n- The OpenAI enrichment runs serially. For a keyword that returns 50 articles, that's 50 sequential API calls. Easy win: parallelize with `asyncio` or a thread pool.\n- The `data.db` file lives in the project root. It should probably default to `~/.google-news-monitor/data.db` for cleaner installs.\n- No retry logic on transient API errors. SearchApi.io and OpenAI both occasionally 500. 
Add exponential backoff.\n\nIf you build any of these, send a PR.\n\nIf you want to see what the running app looks like, screenshots are on the GitHub README. It is MIT licensed, runs entirely on your machine, and has no telemetry or sign-up.\n\n**Disclosure:** I work at SearchApi.io, which is the Google News data source this tool uses. Worth saying upfront before anyone digs.", "url": "https://wpnews.pro/news/building-a-daily-google-news-api-monitor-in-python", "canonical_source": "https://dev.to/sam_gale_376efd5d2fd14112/building-a-daily-google-news-api-monitor-in-python-l4k", "published_at": "2026-05-22 11:51:23+00:00", "updated_at": "2026-05-22 12:12:05.351007+00:00", "lang": "en", "topics": ["developer-tools", "artificial-intelligence", "open-source"], "entities": ["Google News API", "OpenAI", "SQLite", "Flask", "Slack", "GitHub"], "alternates": {"html": "https://wpnews.pro/news/building-a-daily-google-news-api-monitor-in-python", "markdown": "https://wpnews.pro/news/building-a-daily-google-news-api-monitor-in-python.md", "text": "https://wpnews.pro/news/building-a-daily-google-news-api-monitor-in-python.txt", "jsonld": "https://wpnews.pro/news/building-a-daily-google-news-api-monitor-in-python.jsonld"}}