# Building a Real-Time Financial Sentiment API: Handling Noise and LLM Hallucinations

> Source: <https://dev.to/jp1/building-a-real-time-financial-sentiment-api-handling-noise-and-llm-hallucinations-3306>
> Published: 2026-05-30 23:25:14+00:00

Financial markets move faster than human cognition. A geopolitical headline can trigger automated oil liquidations within milliseconds. A single earnings report can wipe out a company’s valuation before a retail trader finishes reading the first paragraph.

I set out to build a production-grade system that could automatically ingest unstructured global financial news feeds, parse the entities affected, determine the sentiment polarity, and expose the results as machine-readable market signals.

This post details the technical architecture of the Market Sentiment API, the data engineering pipeline, and how I solved critical edge cases like LLM cost optimisation and ticker hallucinations.

The program processes incoming data through a pipeline designed to minimise LLM token overhead and optimise latency.

The core data pipeline consists of four distinct phases:

Getting information: Every 5 minutes, news is pulled from RSS feeds (Bloomberg, Reuters, Financial Times, CNBC, BBC, Al Jazeera).

Filtering news: Relevant news is selected by checking whether each article falls into one of six categories (company, war, policy, commodity, tech, disaster), using keywords commonly found in each category.

Sentiment extraction: An LLM extracts the affected tickers, a sentiment score for each, and a contextual summary.

State aggregation and momentum tracking: Related articles are grouped together, and an LLM produces an overall sentiment, a momentum direction, and a confidence rating.
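As a rough illustration of the first phase, the sketch below parses RSS 2.0 items using only Python's standard library. The function name parse_rss_items, the output field names, and the sample feed are assumptions made for this example; the fetching and scheduling code of the actual pipeline is not shown in this post.

```python
import xml.etree.ElementTree as ET


def parse_rss_items(xml_text: str) -> list[dict]:
    """Extract title, description, link and pubDate from RSS 2.0 <item> elements."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            # findtext returns None when a child element is missing
            "title": (item.findtext("title") or "").strip(),
            "description": (item.findtext("description") or "").strip(),
            "url": (item.findtext("link") or "").strip(),
            "published_at": (item.findtext("pubDate") or "").strip(),
        })
    return items


# Hypothetical feed body; a real run would download this from each source.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0"><channel>
  <title>Example feed</title>
  <item>
    <title>Oil prices surge after Iran conflict escalates</title>
    <description>Markets fear supply disruptions in the Middle East</description>
    <link>https://example.com/oil</link>
    <pubDate>Sat, 30 May 2026 19:00:00 GMT</pubDate>
  </item>
</channel></rss>"""

articles = parse_rss_items(SAMPLE_FEED)
```

Each resulting dictionary carries the title and description that the later filtering and extraction phases work on.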

Passing every raw RSS headline directly to an LLM creates astronomical token costs and introduces latency. More than 70% of standard business news lacks immediate market-moving impact.

To solve this at zero token cost, the ingestion engine runs incoming headlines through a local keyword matcher with word-boundary checks before the data ever touches an LLM.

The program loads domain-specific keywords from external text files (companies.txt, war.txt, policy.txt, etc.) into memory as Python sets at startup. Each headline is then scanned against every keyword in the relevant set, so the check is linear in the number of keywords rather than a constant-time set lookup. Strict regex word boundaries (\b) prevent false-positive partial matches (e.g., the keyword "gas" matches "natural gas" but not "gasoline").

``` python
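import re

# get_set (defined elsewhere in the project) loads a keyword file into a set.
companies_set = get_set("companies.txt")

def match_set(title: str, keyword_set: set[str]) -> bool:
    """Return True if any keyword appears in the title as a whole word."""
    title = title.lower()  # keywords are assumed to be stored in lowercase
    return any(
        re.search(rf"\b{re.escape(k)}\b", title) for k in keyword_set
    )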

def company_news(title: str) -> bool:
    return match_set(title, companies_set)
```
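One possible refinement, sketched here rather than taken from the production code, is to compile each keyword set into a single alternation pattern once at startup, so that every headline costs one regex search instead of one per keyword. The names load_keywords, compile_keywords, and is_commodity_news, along with the sample keywords, are hypothetical.

```python
import re
from pathlib import Path


def load_keywords(path: str) -> set[str]:
    """Read one keyword per line, lowercased, skipping blank lines."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return {line.strip().lower() for line in lines if line.strip()}


def compile_keywords(keywords: set[str]) -> re.Pattern[str]:
    """Build one word-bounded alternation, trying longer keywords first."""
    if not keywords:
        raise ValueError("keyword set must not be empty")
    ordered = sorted(keywords, key=len, reverse=True)
    alternation = "|".join(re.escape(k) for k in ordered)
    return re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)


# Hypothetical commodity keywords; a real run would call
# load_keywords() on the corresponding asset file instead.
commodity_pattern = compile_keywords({"gas", "crude oil", "gold"})


def is_commodity_news(title: str) -> bool:
    """Return True if the title contains any keyword as a whole word."""
    return commodity_pattern.search(title) is not None
```

With this pattern, "Natural gas futures jump" passes the filter while "Gasoline demand slows" and "Las Vegas casinos report record revenue" do not. Note that \b only behaves as expected when a keyword starts and ends with a letter, digit, or underscore.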

Once an article passes the initial keyword filter, it reaches the first LLM layer. The goal here is to take the raw headline and description and transform it into structured financial output.

However, using a standard, unconstrained text prompt introduces a major failure mode: **ticker hallucination**. Out-of-the-box models frequently look at context clues and deduce tickers that are not explicitly mentioned (such as adding NVDA to a generic article about semiconductor logistics) or map companies to completely wrong asset symbols.

To keep the output format consistent and discourage invented tickers, the following worked example is added to the LLM instructions:

```
Input:
Title: Oil prices surge after Iran conflict escalates
Description: Markets fear supply disruptions in the Middle East

Output:
{
  "signals": [
    {"asset": "CL=F", "signal_score": 0.9},
    {"asset": "XLE", "signal_score": 0.7},
    {"asset": "SPY", "signal_score": -0.3}
  ],
  "summary": "Escalating Middle East tensions boosted oil prices and energy stocks while pressuring broader equities."
}
```

I advise including several example responses in the prompt to enforce the same output format every time.
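Prompt examples reduce format drift but cannot guarantee it, so a defensive check on the model's reply is a natural complement. The sketch below is an assumption on my part and not something the pipeline is stated to do: it parses the reply, drops any asset that is not on an allow-list, and discards scores outside an assumed range of -1 to 1. ALLOWED_ASSETS and parse_extraction are hypothetical names.

```python
import json

# Hypothetical allow-list; in practice this would come from a maintained symbol table.
ALLOWED_ASSETS = {"CL=F", "XLE", "SPY", "WBD", "PARA"}


def parse_extraction(raw: str, allowed: set[str] = ALLOWED_ASSETS) -> dict:
    """Parse the model's JSON reply and keep only well-formed, allow-listed signals."""
    data = json.loads(raw)  # raises json.JSONDecodeError on malformed replies
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    clean = []
    for signal in data.get("signals", []):
        if not isinstance(signal, dict):
            continue
        asset = str(signal.get("asset", "")).upper()
        score = signal.get("signal_score")
        if asset not in allowed:
            continue  # drop tickers that are not on the allow-list
        if isinstance(score, bool) or not isinstance(score, (int, float)):
            continue  # drop missing or non-numeric scores
        if not -1.0 <= score <= 1.0:
            continue  # assumed valid range, inferred from the example above
        clean.append({"asset": asset, "signal_score": float(score)})
    return {"signals": clean, "summary": str(data.get("summary", ""))}
```

A reply containing an unlisted ticker such as NVDA would come back with that entry removed, leaving the remaining signals untouched.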

A single asset can appear across multiple news sources within the same extraction window, often with conflicting sentiment readings. If the BBC prints a mildly bearish note on a ticker while the Financial Times breaks a highly bullish exclusive twenty minutes later, looking at individual articles in isolation provides an incomplete picture.

To resolve this, the system pulls all signals captured over a rolling window and groups them by ticker symbol. These pooled inputs are then passed through a second LLM state-aggregation layer.

Instead of computing a simple mathematical average, the LLM is instructed to weigh each article's sentiment against the hours since it was published and to return three values: overall sentiment, confidence, and momentum.
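The grouping step that feeds this second layer could look roughly like the sketch below. It assumes each stored article keeps the published_at and signals fields shown in the examples in this post; the function name group_by_ticker and the 24-hour default window are assumptions, since the actual window length is not stated here.

```python
from collections import defaultdict
from datetime import datetime, timezone


def group_by_ticker(
    articles: list[dict], now: datetime, window_hours: float = 24.0
) -> dict[str, list[dict]]:
    """Group per-article signals by asset, keeping only articles inside the window."""
    grouped = defaultdict(list)
    for article in articles:
        # Timestamps are assumed to be ISO 8601 with a trailing "Z" for UTC.
        published = datetime.fromisoformat(
            article["published_at"].replace("Z", "+00:00")
        )
        age_hr = (now - published).total_seconds() / 3600
        if age_hr < 0 or age_hr > window_hours:
            continue  # skip future-dated or stale articles
        for signal in article["signals"]:
            grouped[signal["asset"]].append({
                "title": article["title"],
                "signal_score": signal["signal_score"],
                "since_published_hr": round(age_hr, 2),
            })
    return dict(grouped)


# Hypothetical stored articles for illustration.
stored = [
    {"title": "A", "published_at": "2026-05-30T19:00:00Z",
     "signals": [{"asset": "WBD", "signal_score": 0.6},
                 {"asset": "PARA", "signal_score": 0.5}]},
    {"title": "B", "published_at": "2026-05-28T19:00:00Z",
     "signals": [{"asset": "PARA", "signal_score": -0.2}]},
]
now = datetime(2026, 5, 30, 23, 0, tzinfo=timezone.utc)
by_ticker = group_by_ticker(stored, now)
```

Each per-ticker list, with its scores and article ages, is then the input that the aggregation prompt would receive.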

The final response wraps the top-level aggregated metrics alongside an array of the source articles that fed the consensus:

```
{
  "ticker": "para",
  "overall_sentiment": "neutral",
  "overall_sentiment_score": 0.5,
  "overall_confidence": 0.65,
  "sentiment_momentum": "neutral",
  "articles_analysed": 1,
  "summary": "Recent positive sentiment from a buyout attempt indicates potential, but overall confidence remains low due to limited coverage.",
  "signals": [
    {
      "title": "Paramount Is Pulling Every Lever to Sell LBO Debt",
      "summary": "Paramount's aggressive leveraged buyout attempt for Warner Bros. Discovery generated positive sentiment for both companies, suggesting potential growth and strategic consolidation in the media sector.",
      "signals": [
        {
          "asset": "WBD",
          "signal_score": 0.6
        },
        {
          "asset": "PARA",
          "signal_score": 0.5
        }
      ],
      "description": "Paramount Skydance Corp. stretched, then stretched, then stretched again in its audacious $110 billion takeover bid for Warner Bros. Discovery Inc.",
      "published_at": "2026-05-30T19:00:00Z",
      "since_published_hr": 4.220252061944445,
      "source": "Bloomberg News",
      "url": "https://www.bloomberg.com/news/articles/2026-05-30/paramount-is-pulling-every-lever-to-sell-lbo-debt-credit-weekly"
    }
  ]
}
```

Explore the contract shapes: Check out the [interactive Swagger UI](https://jp1v.github.io/market_sentiment_openapi/) documentation to run mock requests and inspect the exact JSON payloads.

Integrate via RapidAPI: Grab a free-tier developer token on [RapidAPI](https://rapidapi.com/JP1V/api/market-sentiment1) to start feeding live macro-sentiment signals into your algorithmic models, quantitative trading bots, or custom terminal dashboards.
