# Building a Daily Google News API Monitor in Python

> Source: <https://dev.to/sam_gale_376efd5d2fd14112/building-a-daily-google-news-api-monitor-in-python-l4k>
> Published: 2026-05-22 11:51:23+00:00

I wanted a small, local tool that would search the news for brand mentions. I didn't want to pay over $100 a month for one, so I decided to build my own.

What I built searches the news through a **Google News API** every morning for a list of keywords, runs each result through an LLM for sentiment and a one-sentence summary, saves everything to SQLite, and sends me the results via Slack and a web dashboard.

The whole project came out to about 1,000 lines of Python across ten files. It is a Flask app with a SQLite database and a single HTML dashboard.

Here's how it's wired together, with the code that matters from each layer.

Repo: [google-news-monitor on GitHub](https://github.com/SamJale/Google-News-Monitor-API). Install instructions at the bottom.

## The pipeline

The whole tool is one pipeline:

```
keyword → Google News API → OpenAI enrichment → SQLite → (dashboard | REST | CLI | Slack)
```

Every interface (the dashboard form, the REST API, the CLI, the daily cron) ends up calling the same `process_keyword()` function. Here is the entire core loop, from `monitor/pipeline.py`:

``` python
def process_keyword(keyword, num=30, when="1d", gl=None):
    keyword = keyword.strip()
    fetched = search.fetch_google_news(keyword, num=num, when=when, gl=gl)

    new_count = 0
    for art in fetched:
        if not art.get("url"):
            continue
        ai_result = ai.enrich_article(keyword, art.get("title") or "",
                                      art.get("snippet") or "")
        row = {
            "keyword": keyword,
            "title": art["title"],
            "url": art["url"],
            "source": art.get("source"),
            "snippet": art.get("snippet"),
            "published_at": art.get("published_at"),
            "sentiment": ai_result["sentiment"],
            "ai_summary": ai_result["summary"],
        }
        article_id = db.save_article(row)
        if article_id is not None:
            new_count += 1
            alerts.check_article(keyword, row, article_id)

    return {"keyword": keyword, "fetched": len(fetched), "new": new_count}
```

Fetch, enrich, save, alert. That is the whole tool, minus the interfaces wrapped around it.

## Fetching from the Google News API

I'm using [SearchApi.io](https://www.searchapi.io) as the entry point to the Google News API.

One issue I ran into: Google News matching is loose. Search for "niche company 1" and half the results are about the similarly named "niche company 2", which has nothing to do with you. So there's a per-article flag for whether your keyword actually appears in the article body, with a toggle to hide everything else.
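The post does not show how that flag is computed. Here is a minimal sketch of one way to do it, assuming the check runs against whatever text is available (the title and snippet, since those are the fields the fetch code stores). The name `keyword_matches` is hypothetical and not taken from the repo.

```python
import re


def keyword_matches(keyword, *texts):
    """Return True if `keyword` appears as a whole phrase in any of `texts`.

    Hypothetical helper: case-insensitive, and bounded by non-word characters
    so that "niche company 1" does not match "niche company 12".
    """
    needle = keyword.strip()
    if not needle:
        return False
    pattern = re.compile(
        r"(?<!\w)" + re.escape(needle) + r"(?!\w)", re.IGNORECASE
    )
    # Skip missing fields (None or empty strings) instead of failing on them.
    return any(pattern.search(text) for text in texts if text)
```

A substring test (`keyword.lower() in text.lower()`) would be simpler, but it would also flag partial-word hits, which is the kind of loose match the flag is meant to filter out.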

From `monitor/search.py`:

``` python
import os

import requests

# parse_date() is the date normalizer that lives in monitor/dates.py.
SEARCHAPI_URL = "https://www.searchapi.io/api/v1/search"

def fetch_google_news(keyword, num=30, when=None, gl=None):
    params = {
        "engine": "google_news",
        "q": keyword.strip(),
        "nfpr": 1,                 # turn off "did you mean..."
        "num": num,
        "api_key": os.environ["SEARCHAPI_KEY"],
    }
    if when:
        params["when"] = when      # 1h, 1d, 7d, 1m, 1y
    if gl:
        params["gl"] = gl.lower()  # 2-letter country code

    resp = requests.get(SEARCHAPI_URL, params=params, timeout=30)
    resp.raise_for_status()
    data = resp.json()

    articles = []
    for item in data.get("organic_results", []) or []:
        articles.append({
            "title": item.get("title"),
            "url": item.get("link"),
            "source": item.get("source", {}).get("name"),
            "snippet": item.get("snippet"),
            "published_at": parse_date(item.get("date")),
        })
    return articles
```

One thing the Google News API will trip you up on: the `date` field arrives as free-form strings like `"1 week ago"`, `"May 30, 2023"`, or `"Yesterday"`. Not ISO timestamps. If you store those verbatim, your SQL filters will silently break and your charts will sort `"May 30"` alphabetically next to `"2026-05-14"`. I wrote a small parser (`monitor/dates.py`) that normalizes everything to `YYYY-MM-DD` on the way in.
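The parser itself isn't reproduced in the post. As a rough sketch of what such a normalizer could look like, assuming only the three shapes mentioned above (relative phrases, `"Yesterday"`, and English month-name dates), something like the following would work. The approximations (a month as 30 days, sub-day units mapped to today) are my assumptions, not the repo's behavior.

```python
import re
from datetime import date, datetime, timedelta

# Approximate day counts per unit; minutes and hours are treated as "today".
_UNIT_DAYS = {"minute": 0, "hour": 0, "day": 1, "week": 7, "month": 30, "year": 365}
_RELATIVE = re.compile(r"^(\d+|an?)\s+(minute|hour|day|week|month|year)s?\s+ago$")
# Month names are parsed with strptime, so this assumes an English locale.
_ABSOLUTE_FORMATS = ("%b %d, %Y", "%B %d, %Y", "%Y-%m-%d")


def parse_date(raw, today=None):
    """Normalize a free-form date string to YYYY-MM-DD, or None if unrecognized."""
    if not raw:
        return None
    today = today or date.today()
    text = raw.strip().lower()
    if text == "today":
        return today.isoformat()
    if text == "yesterday":
        return (today - timedelta(days=1)).isoformat()
    match = _RELATIVE.match(text)
    if match:
        count = 1 if match.group(1) in ("a", "an") else int(match.group(1))
        days = count * _UNIT_DAYS[match.group(2)]
        return (today - timedelta(days=days)).isoformat()
    for fmt in _ABSOLUTE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None
```

Returning `None` for anything unrecognized keeps bad strings out of the `published_at` column, at the cost of losing the date for that article.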

## OpenAI enrichment with a JSON guardrail

For every article I want two things: a sentiment label (positive, negative, neutral) and a one-sentence summary. The trick is to force OpenAI to return parseable JSON so the database ingestion never sees free-form text.

From `monitor/ai.py`:

``` python
import json
import os

ARTICLE_SYSTEM_PROMPT = (
    "You are a media-monitoring analyst. For each article you receive, "
    "classify the sentiment toward the tracked brand/keyword and write a "
    "one-sentence summary. You MUST return a single JSON object - no prose, "
    "no markdown, no code fences. "
    'Schema: {"sentiment": "positive"|"negative"|"neutral", '
    '"summary": "<one sentence>"}. Never include any other keys."
)

def enrich_article(keyword, title, snippet):
    resp = openai_client().chat.completions.create(
        model=os.environ.get("OPENAI_MODEL", "gpt-4o-mini"),
        response_format={"type": "json_object"},   # enforce JSON
        messages=[
            {"role": "system", "content": ARTICLE_SYSTEM_PROMPT},
            {"role": "user", "content":
                f"Tracked keyword: {keyword}\n"
                f"Article title: {title}\n"
                f"Article snippet: {snippet or '(no snippet)'}"},
        ],
        temperature=0.2,
    )
    data = json.loads(resp.choices[0].message.content or "{}")
    sentiment = (data.get("sentiment") or "neutral").lower()
    if sentiment not in {"positive", "negative", "neutral"}:
        sentiment = "neutral"
    return {"sentiment": sentiment, "summary": (data.get("summary") or "").strip()}
```

`response_format={"type": "json_object"}` forces the model to emit valid JSON. The system prompt also redundantly says "no prose, no markdown, no code fences" because models sometimes ignore the format flag anyway. Belt and braces.
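If the model does ignore the flag, the bare `json.loads()` call in `enrich_article()` would raise and abort the pipeline for that keyword. A defensive parser is one way to add a third layer. This is only a sketch, and `parse_enrichment` is a hypothetical name rather than something from the repo:

```python
import json

ALLOWED_SENTIMENTS = {"positive", "negative", "neutral"}
FENCE = "`" * 3  # a Markdown code-fence marker


def parse_enrichment(raw):
    """Turn the model's reply into {"sentiment", "summary"} without ever raising."""
    text = (raw or "").strip()
    # Unwrap a Markdown code fence if the model added one despite the prompt.
    if text.startswith(FENCE):
        text = text.strip("`").strip()
        if text.lower().startswith("json"):
            text = text[4:].strip()
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    sentiment = str(data.get("sentiment") or "neutral").lower()
    if sentiment not in ALLOWED_SENTIMENTS:
        sentiment = "neutral"
    summary = str(data.get("summary") or "").strip()
    return {"sentiment": sentiment, "summary": summary}
```

Falling back to `neutral` with an empty summary means a malformed reply still produces a row, so the article is saved and can be re-enriched later if needed.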

There's also a `summarize_period(keyword, articles)` function that takes the full article list for a time window and writes a paragraph-long narrative summary. Same JSON-only pattern. That's what powers the "AI period summary" block at the top of the dashboard report.

## SQLite with self-healing schema

I wanted zero setup steps. No `flask db upgrade`, no SQL files to apply by hand. So the database creates itself on first connection:

``` python
import os
import sqlite3
from contextlib import contextmanager

SCHEMA = """
CREATE TABLE IF NOT EXISTS articles (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    keyword TEXT NOT NULL,
    title TEXT NOT NULL,
    url TEXT NOT NULL,
    source TEXT,
    snippet TEXT,
    published_at TEXT,
    sentiment TEXT,
    ai_summary TEXT,
    fetched_at TEXT NOT NULL DEFAULT (datetime('now')),
    UNIQUE(keyword, url)
);
-- ...more tables for keywords, alerts, period summaries
"""

@contextmanager
def connect(path=DB_PATH):
    first_run = not os.path.exists(path)
    conn = sqlite3.connect(path)
    conn.row_factory = sqlite3.Row
    if first_run:
        conn.executescript(SCHEMA)
        conn.commit()
    try:
        yield conn
        conn.commit()
    finally:
        conn.close()
```

`UNIQUE(keyword, url)` does the deduplication. If you re-search the same keyword, articles you already have don't get re-saved and don't get re-billed for OpenAI calls.
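`db.save_article()` is not shown in the post. Given that `process_keyword()` treats a `None` return as "already stored", one plausible shape is an `INSERT OR IGNORE` against that unique constraint. Note also that in the core loop shown earlier, `ai.enrich_article()` runs before the save, so skipping the OpenAI call for known articles would need an existence check ahead of enrichment. Both pieces are sketched below. The function names and the explicit `conn` argument are assumptions made for illustration.

```python
import sqlite3


def article_exists(conn, keyword, url):
    """True if this (keyword, url) pair is already stored; check before enriching."""
    row = conn.execute(
        "SELECT 1 FROM articles WHERE keyword = ? AND url = ? LIMIT 1",
        (keyword, url),
    ).fetchone()
    return row is not None


def save_article(conn, row):
    """Insert one article; return its id, or None if UNIQUE(keyword, url) rejected it."""
    cur = conn.execute(
        """
        INSERT OR IGNORE INTO articles
            (keyword, title, url, source, snippet,
             published_at, sentiment, ai_summary)
        VALUES
            (:keyword, :title, :url, :source, :snippet,
             :published_at, :sentiment, :ai_summary)
        """,
        row,
    )
    # rowcount is 0 when the insert was ignored as a duplicate.
    return cur.lastrowid if cur.rowcount == 1 else None
```

In the loop, `if article_exists(conn, keyword, art["url"]): continue` placed before the enrichment call would be enough to avoid paying for duplicates.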

## Three interfaces, one core

The dashboard, the REST API, and the CLI all sit on top of the same `pipeline.process_keyword()`

. Each one is small.

The REST blueprint lives in `monitor/api.py`

:

``` python
@bp.post("/api/search")
def run_search():
    data = request.get_json(silent=True) or {}
    keyword = (data.get("keyword") or "").strip()
    if not keyword:
        return jsonify({"status": "error", "message": "keyword required"}), 400
    result = pipeline.process_keyword(keyword,
                                       num=int(data.get("num") or 30),
                                       when=data.get("when") or None,
                                       gl=data.get("gl") or None)
    return jsonify({"status": "ok", "result": result})
```

The CLI is in `cli.py`:

``` python
@cli.command()
@click.argument("keyword")
@click.option("--when", default="1d")
@click.option("--num", default=50)
@click.option("--gl", default=None)
def search(keyword, when, num, gl):
    result = pipeline.process_keyword(keyword, num=num, when=when, gl=gl)
    click.echo(json.dumps(result, indent=2))
```

The dashboard is one HTML file (`templates/dashboard.html`) using Tailwind via CDN and Chart.js for the volume chart. There is no build step. Forms POST to the same endpoints.

The GET endpoints auto-fetch when the DB is empty for that keyword. A single URL is enough to spin up a fresh monitor for a new term, which makes it easy to plug into your own scripts or hand to an AI agent that needs to know what the press is saying about a brand.

## REST API

All endpoints return JSON. The server runs on `127.0.0.1:5000` by default.

| Method | Path | Purpose |
|---|---|---|
| GET | `/healthz` | Liveness check |
| GET | `/api/keywords` | List tracked keywords |
| POST | `/api/keywords` | Add a keyword |
| DELETE | `/api/keywords/<keyword>` | Stop tracking |
| POST | `/api/search` | Run the pipeline once (fetch + enrich + save) |
| POST | `/api/cron/run` | Run the daily job immediately |
| GET | `/api/report/<keyword>` | Full report, every article in the period |
| GET | `/api/matches/<keyword>` | Same payload, filtered to keyword matches only |
| GET | `/api/analytics/<keyword>` | Sentiment totals plus bucketed volume for the chart |
| GET | `/api/alerts` | Recent breaking-news alerts |
| GET | `/api/settings` | Current settings (keys are masked) |
| POST | `/api/settings` | Update keys, model, Slack webhook |
| POST | `/api/settings/test-slack` | Send a test Slack message |

The report and matches endpoints auto-fetch on first use. If there is nothing in the database for that keyword yet, the pipeline runs first and the report comes back populated. Subsequent calls are instant. Pass `?fetch=true` to force a refresh.

**Query parameters** for `/api/report` and `/api/matches`:

- `period`: `daily`, `weekly`, `monthly`, or `all` (default: `all` for matches, `weekly` for report)
- `fetch`: `true` to force a fresh fetch even when data already exists
- `num`: max results from Google News when auto-fetching (default 50)
- `when`: `1h`, `1d`, `7d`, `1m`, `1y` (default: any time)
- `gl`: 2-letter country code, e.g. `us`, `gb`, `de`

**Example.** A single URL is enough to spin up a fresh monitor for a brand new keyword:

```
GET http://127.0.0.1:5000/api/matches/n8n?period=all
```
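For calling these endpoints from your own scripts, a small client helper might look like the sketch below. It uses only the standard library. The names `report_url` and `fetch_report` are made up for this example, and the generous timeout is an assumption to allow for the auto-fetch on first use.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

BASE_URL = "http://127.0.0.1:5000"  # default address from the section above


def report_url(keyword, endpoint="matches", **params):
    """Build a /api/<endpoint>/<keyword> URL, dropping params that are None."""
    query = urlencode({k: v for k, v in params.items() if v is not None})
    url = f"{BASE_URL}/api/{endpoint}/{quote(keyword, safe='')}"
    return f"{url}?{query}" if query else url


def fetch_report(keyword, **params):
    """GET the report as parsed JSON; the first call for a keyword may be slow."""
    with urlopen(report_url(keyword, **params), timeout=120) as resp:
        return json.load(resp)
```

With the server running, `fetch_report("n8n", period="all")` requests the same URL as the example above.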

## Daily cron, in-process

APScheduler runs the daily job inside the same Flask process, so there is no system cron to configure and no separate worker to deploy.

``` python
def start_scheduler():
    hour = int(os.environ.get("DAILY_CRON_HOUR", "8"))
    minute = int(os.environ.get("DAILY_CRON_MINUTE", "0"))
    sched = BackgroundScheduler(daemon=True)
    sched.add_job(pipeline.run_all_monitored, trigger="cron",
                  hour=hour, minute=minute, id="daily_monitor",
                  replace_existing=True)
    sched.start()
```

`run_all_monitored()` walks the list of keywords flagged with `monitored=1` in the database and runs the full pipeline for each.

## Alerts

The alerts module checks every freshly-saved article against a list of risk phrases (`lawsuit`, `breach`, `outage`, `scandal`, …), checks if OpenAI returned `negative` sentiment, and posts a formatted message to Slack if either trips. It also tracks a 14-day rolling baseline and fires a separate alert if today's article volume is 3x that baseline.

``` python
RISK_PHRASES = ["lawsuit", "sued", "investigation", "breach", "hack",
                "outage", "scandal", "fired", "resigns", "bankruptcy", "recall"]

def check_article(keyword, article, article_id):
    # `or ""` guards against keys that are present but None (e.g. no snippet).
    haystack = ((article.get("title") or "") + " " + (article.get("snippet") or "")).lower()
    matched = [p for p in RISK_PHRASES if p in haystack]
    reasons = []
    if matched:
        reasons.append(f"risk phrase: {', '.join(matched)}")
    if article.get("sentiment", "").lower() == "negative":
        reasons.append("negative sentiment")
    if reasons:
        db.save_alert(keyword, article_id, "; ".join(reasons))
        send_slack(format_slack(keyword, reasons, article))
```
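The volume check mentioned above isn't included in the excerpt. Here is a minimal sketch of the idea, assuming the baseline is the mean daily article count over the previous 14 days. The function name, the input shape, and the `min_baseline` floor are all my assumptions.

```python
from datetime import date, timedelta


def is_volume_spike(daily_counts, today, window_days=14, multiplier=3.0,
                    min_baseline=1.0):
    """Return True if today's count is at least `multiplier` times the baseline.

    `daily_counts` maps a date to the number of articles saved that day.
    Days with no entry count as zero.
    """
    window = [
        daily_counts.get(today - timedelta(days=offset), 0)
        for offset in range(1, window_days + 1)
    ]
    baseline = sum(window) / window_days
    # The floor keeps a brand-new keyword (baseline 0) from alerting on
    # its first one or two articles.
    return daily_counts.get(today, 0) >= multiplier * max(baseline, min_baseline)
```

A median would be more robust to a single earlier spike inflating the baseline; the mean is used here only because it is the simplest reading of "3x that baseline".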

## Install it

```
git clone https://github.com/SamJale/Google-News-Monitor-API.git
cd Google-News-Monitor-API
pip install -r requirements.txt
python app.py
```

A browser tab opens at `http://127.0.0.1:5000/`. Add your SearchApi.io and OpenAI keys through the Settings button in the UI (or edit `.env` directly).

If you want to drive it from the terminal:

```
python cli.py add "anthropic"                       # start tracking
python cli.py search "anthropic" --when 7d --num 50 # one-shot
python cli.py report "anthropic" --period weekly    # see the saved data
python cli.py cron                                  # run the daily job now
```

## Things I would change if I were building it again

- The OpenAI enrichment runs serially. For a keyword that returns 50 articles, that's 50 sequential API calls. Easy win: parallelize with `asyncio` or a thread pool.
- The `data.db` file lives in the project root. Probably should default to `~/.google-news-monitor/data.db` for cleaner installs.
- No retry logic on transient API errors. SearchApi.io and OpenAI both occasionally 500. Add exponential backoff.
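The first and third items could be tackled together. As a rough sketch (not code from the repo), the helpers below retry a call with exponential backoff and fan enrichment out over a thread pool. `with_backoff` and `enrich_all` are hypothetical names, and catching every `Exception` is a simplification: real code should retry only on transient errors such as timeouts and 5xx responses.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor


def with_backoff(fn, *args, retries=4, base_delay=1.0, sleep=time.sleep, **kwargs):
    """Call fn(*args, **kwargs), retrying failures with exponential backoff."""
    for attempt in range(retries + 1):
        try:
            return fn(*args, **kwargs)
        except Exception:
            if attempt == retries:
                raise  # out of retries: surface the last error
            # 1s, 2s, 4s, ... plus jitter so parallel workers don't retry in step.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))


def enrich_all(enrich, keyword, articles, max_workers=8):
    """Run enrich(keyword, title, snippet) per article concurrently, in input order."""
    def one(art):
        return with_backoff(
            enrich, keyword, art.get("title") or "", art.get("snippet") or ""
        )

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(one, articles))
```

Only the network-bound enrichment runs in the pool here. Keeping the SQLite writes on the calling thread avoids sharing a connection across threads.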

If you build any of these, send a PR.

If you want to see what the running app looks like, screenshots are on the GitHub README. It is MIT licensed, runs entirely on your machine, and has no telemetry or sign-up.

**Disclosure:** I work at SearchApi.io, which is the Google News data source this tool uses. Worth saying upfront before anyone digs.
