{"slug": "building-a-real-time-financial-sentiment-api-handling-noise-and-llm", "title": "Building a Real-Time Financial Sentiment API: Handling Noise and LLM Hallucinations", "summary": "A developer built a production-grade Market Sentiment API that ingests unstructured global financial news feeds, parses entities, and determines sentiment polarity to expose machine-readable market signals. The system processes data through a four-phase pipeline (news ingestion, keyword filtering, LLM-based sentiment extraction, and state aggregation) while solving critical edge cases like LLM cost optimization and ticker hallucinations. To prevent false-positive matches and reduce token costs, the ingestion engine uses localized string boundary matching with domain-specific keyword sets before any data reaches the LLM.", "body_md": "Financial markets move faster than human cognition. A geopolitical headline can trigger automated oil liquidations within milliseconds. A single earnings report can wipe out a company’s valuation before a retail trader finishes reading the first paragraph.\n\nI set out to build a production-grade system that could automatically ingest unstructured global financial news feeds, parse the entities affected, determine the sentiment polarity, and expose the results as machine-readable market signals.\n\nThis post details the technical architecture of the Market Sentiment API, the data engineering pipeline, and how I solved critical edge cases like LLM cost optimisation and ticker hallucinations.\n\nThe program processes incoming data through a pipeline designed to minimise LLM token overhead and optimise latency.\n\nThe core data pipeline consists of four distinct phases:\n\nGetting information: Every 5 minutes, news is pulled from RSS feeds (Bloomberg, Reuters, Financial Times, CNBC, BBC, Al Jazeera).\n\nFiltering news: Relevant news is selected by checking whether each article falls into one of 6 sections (company, war, policy, commodity, tech, disaster) using keywords 
commonly found in each section.\n\nSentiment extraction: An LLM extracts tickers, sentiment, and a contextual summary.\n\nState Aggregation & Momentum Tracking: Related articles are grouped together and an LLM derives the overall sentiment, momentum direction, and confidence rating.\n\nPassing every raw RSS headline directly to an LLM creates astronomical token costs and introduces latency. More than 70% of standard business news lacks immediate market-moving impact.\n\nTo solve this at zero token cost, the ingestion engine passes incoming headlines through a localised string boundary matcher before the data ever touches an LLM.\n\nThe program dynamically loads domain-specific keywords from external text asset files (companies.txt, war.txt, policy.txt, etc.) into memory as Python sets (note that the matcher iterates over the set, so each headline costs one regex search per keyword rather than a single O(1) lookup). It then uses strict regex word boundaries (\\b) to prevent false-positive partial matches (e.g., the keyword \"gas\" matches as a standalone word but not inside an unrelated word such as \"vegas\").\n\n``` python\nimport re\n\n# get_set is the author's helper (not shown) that loads a keyword file into a set\ncompanies_set = get_set(\"companies.txt\")\n\ndef match_set(title, keyword_set):\n    # Keywords are assumed to be stored lowercase, since the title is lowercased\n    title = title.lower()\n    for k in keyword_set:\n        # \\b anchors the keyword to word boundaries; re.escape guards special characters\n        if re.search(rf\"\\b{re.escape(k)}\\b\", title):\n            return True\n\n    return False\n\ndef company_news(title: str) -> bool:\n    return match_set(title, companies_set)\n```\n\nOnce an article passes the initial keyword filter, it reaches the first LLM layer. The goal here is to take the raw headline and description and transform them into structured financial output.\n\nHowever, using a standard, unconstrained text prompt introduces a major failure mode: **ticker hallucination**. 
Out-of-the-box models frequently look at context clues and deduce tickers that are not explicitly mentioned (such as adding NVDA to a generic article about semiconductor logistics) or map companies to completely wrong asset symbols.\n\nTo constrain the output, the following worked example is added to the LLM instructions:\n\n```\nInput:\nTitle: Oil prices surge after Iran conflict escalates\nDescription: Markets fear supply disruptions in the Middle East\n\nOutput:\n{\n  \"signals\": [\n    {\"asset\": \"CL=F\", \"signal_score\": 0.9},\n    {\"asset\": \"XLE\", \"signal_score\": 0.7},\n    {\"asset\": \"SPY\", \"signal_score\": -0.3}\n  ],\n  \"summary\": \"Escalating Middle East tensions boosted oil prices and energy stocks while pressuring broader equities.\"\n}\n```\n\nI advise including multiple example responses to enforce a consistent output format.\n\nA single asset can appear across multiple news sources within the same extraction window, often yielding conflicting sentiment readings. If the BBC prints a mildly bearish note on a ticker while the Financial Times breaks a highly bullish exclusive twenty minutes later, looking at individual articles in isolation provides an incomplete picture.\n\nTo resolve this, the system pulls all historical data captured over a rolling window and groups it by ticker symbol. 
These pooled source inputs are then passed through a second LLM state-aggregation layer.\n\nInstead of computing a simple mathematical average, the LLM is instructed to use each article's sentiment and the hours since publication to produce three outputs: overall sentiment, confidence, and momentum.\n\nThe final output wraps the top-level aggregated metrics alongside an array containing the exact source articles that built the consensus:\n\n```\n{\n  \"ticker\": \"para\",\n  \"overall_sentiment\": \"neutral\",\n  \"overall_sentiment_score\": 0.5,\n  \"overall_confidence\": 0.65,\n  \"sentiment_momentum\": \"neutral\",\n  \"articles_analysed\": 1,\n  \"summary\": \"Recent positive sentiment from a buyout attempt indicates potential, but overall confidence remains low due to limited coverage.\",\n  \"signals\": [\n    {\n      \"title\": \"Paramount Is Pulling Every Lever to Sell LBO Debt\",\n      \"summary\": \"Paramount's aggressive leveraged buyout attempt for Warner Bros. Discovery generated positive sentiment for both companies, suggesting potential growth and strategic consolidation in the media sector.\",\n      \"signals\": [\n        {\n          \"asset\": \"WBD\",\n          \"signal_score\": 0.6\n        },\n        {\n          \"asset\": \"PARA\",\n          \"signal_score\": 0.5\n        }\n      ],\n      \"description\": \"Paramount Skydance Corp. stretched, then stretched, then stretched again in its audacious $110 billion takeover bid for Warner Bros. 
Discovery Inc.\",\n      \"published_at\": \"2026-05-30T19:00:00Z\",\n      \"since_published_hr\": 4.220252061944445,\n      \"source\": \"Bloomberg News\",\n      \"url\": \"https://www.bloomberg.com/news/articles/2026-05-30/paramount-is-pulling-every-lever-to-sell-lbo-debt-credit-weekly\"\n    }\n  ]\n}\n```\n\nExplore the Contract Shapes: Check out our [interactive Swagger UI](https://jp1v.github.io/market_sentiment_openapi/) Documentation to run mock requests and map out the exact JSON payloads.\n\nIntegrate via RapidAPI: Grab a free tier developer token on [RapidAPI](https://rapidapi.com/JP1V/api/market-sentiment1) to begin injecting live macro-sentiment triggers directly into your automated algorithmic models, quantitative trading bots, or custom terminal dashboards.", "url": "https://wpnews.pro/news/building-a-real-time-financial-sentiment-api-handling-noise-and-llm", "canonical_source": "https://dev.to/jp1/building-a-real-time-financial-sentiment-api-handling-noise-and-llm-hallucinations-3306", "published_at": "2026-05-30 23:25:14+00:00", "updated_at": "2026-05-30 23:42:10.022990+00:00", "lang": "en", "topics": ["large-language-models", "natural-language-processing", "ai-infrastructure", "ai-products", "ai-tools"], "entities": ["Bloomberg", "Reuters", "Financial Times", "CNBC", "BBC", "Al Jazeera"], "alternates": {"html": "https://wpnews.pro/news/building-a-real-time-financial-sentiment-api-handling-noise-and-llm", "markdown": "https://wpnews.pro/news/building-a-real-time-financial-sentiment-api-handling-noise-and-llm.md", "text": "https://wpnews.pro/news/building-a-real-time-financial-sentiment-api-handling-noise-and-llm.txt", "jsonld": "https://wpnews.pro/news/building-a-real-time-financial-sentiment-api-handling-noise-and-llm.jsonld"}}