# Track YC Demo Day Companies in Real Time (with code)

This article provides a technical guide for tracking Y Combinator Demo Day companies in real time by querying YC's publicly accessible Algolia search API. It includes Python code to fetch the active batch roster, monitor for new companies as they appear during presentations, and automatically score companies based on a fund's investment thesis. The author emphasizes that data ingestion is straightforward, while the real challenge lies in efficiently prioritizing and filtering the ~250 companies that present.

Y Combinator Demo Day is the single most concentrated VC sourcing event of the year. Twice annually, ~250 companies present back-to-back over 1-2 days. Within 48 hours, the top 50 have term sheets. Within 7 days, the next 50 have term sheets. By day 14, the remaining 150 are either oversubscribed or starting to struggle.

The associate's job at a multi-stage fund during Demo Day is roughly:

- Within 6 hours: scrape every company's details from the YC site
- Within 12 hours: triage to the 30-50 worth investigating
- Within 24 hours: book first calls with the top 15
- Within 48 hours: close on the top 5

The bottleneck is step 1, and it is entirely solvable. YC's company directory updates in real time as Demo Day progresses. The Algolia-indexed search behind the YC site is publicly queryable. With 50 lines of Python you can pull the full active-batch roster in under 5 seconds, refresh every 90 seconds, and have a live feed during Demo Day itself.

This post is the working code, the join logic, and the prioritization framework. The [NexGenData YC Companies Directory](https://apify.com/nexgendata/yc-companies-directory-scraper) actor wraps this if you want a hosted version.

## The YC Algolia Endpoint

YC's [company list page](https://www.ycombinator.com/companies) is fully client-rendered.
The page bundle includes a hardcoded Algolia application ID and a public read-only API key. Both are visible in the network tab of the browser dev tools.

```python
import httpx

# Application ID and search-only key as they appear in the YC page bundle
YC_ALGOLIA_URL = "https://45bwzj1sgc-dsn.algolia.net/1/indexes/YCCompany_production/query"
YC_ALGOLIA_HEADERS = {
    "X-Algolia-Application-Id": "45BWZJ1SGC",
    "X-Algolia-API-Key": "Y2VkOWQyMTJlYjZkZjE3MDRkY2YyNjBmYmIzMjVhMzA1ZmRlYTQ4OTUyZjEyZjRiNzc0OWQ4MjRmMzVlYmUxN3RhZ0ZpbHRlcnM9JTViJTIyJTVEJmZpbHRlcnM9aXNIaXJpbmclM0F0cnVl",
    "Content-Type": "application/json",
}


async def fetch_yc_batch(batch: str = "S26") -> list[dict]:
    """Fetch all companies in a specific YC batch."""
    payload = {
        "query": "",                        # empty query matches every record
        "hitsPerPage": 1000,                # one page is enough for a single batch
        "facetFilters": [f"batch:{batch}"],
    }
    async with httpx.AsyncClient(headers=YC_ALGOLIA_HEADERS, timeout=20) as client:
        r = await client.post(YC_ALGOLIA_URL, json=payload)
        r.raise_for_status()
        return r.json().get("hits", [])
```

A response hit looks like:

```json
{
  "name": "ExampleCo",
  "slug": "exampleco",
  "batch": "S26",
  "industry": "B2B",
  "subindustry": "DevTools",
  "team_size": 4,
  "regions": ["United States of America"],
  "isHiring": true,
  "stage": "Active",
  "tags": ["api", "developer-tools"],
  "description": "ExampleCo lets developers...",
  "website": "https://exampleco.com",
  "long_description": "ExampleCo is the missing layer between..."
}
```

For a full active batch, expect 200-280 hits. Batches are populated gradually over the 90-day program; by Demo Day itself, all companies are publicly searchable.

## Polling for Real-Time Updates

During Demo Day, YC's batch index updates in waves as companies present.
To get a live feed:

```python
import asyncio
from datetime import datetime


async def live_demo_day_tracker(batch: str, interval: int = 90):
    """Poll the batch roster and print each company the first time it appears."""
    seen_slugs = set()
    while True:
        try:
            # fetch_yc_batch is the coroutine defined in the previous snippet
            companies = await fetch_yc_batch(batch)
            new = [c for c in companies if c["slug"] not in seen_slugs]
            for c in new:
                # guard against a missing or null description
                blurb = (c.get("description") or "")[:80]
                print(f"[{datetime.now().isoformat()}] NEW: {c['name']} - {blurb}")
                seen_slugs.add(c["slug"])
        except Exception as e:
            # keep the loop alive through transient network or API errors
            print(f"poll error: {e}")
        await asyncio.sleep(interval)
```

Polling every 90 seconds is gentle on YC's Algolia backend and keeps you within ~2 minutes of the actual update. Run it in a tmux session during Demo Day; pipe output to Slack via a webhook for team-wide visibility.

## Triage Logic: From 250 Companies to 30 Worth Investigating

The hard part of Demo Day isn't ingest; it's prioritization. The naive approach (read all 250 descriptions) burns 3-4 hours and produces lukewarm shortlists. The better approach: pre-define your fund's thesis filters and score each company automatically.

A simple scoring model:

```python
def score_yc_company(c: dict, thesis: dict) -> int:
    score = 0

    # Industry alignment (0-30 points)
    if c.get("industry") in thesis["target_industries"]:
        score += 30
    elif c.get("industry") in thesis["adjacent_industries"]:
        score += 15

    # Team size sweet spot (0-15 points)
    team = c.get("team_size") or 0
    if thesis["min_team"] <= team <= thesis["max_team"]:
        score += 15
    elif team < thesis["min_team"]:
        score += 5  # too early but not disqualifying

    # Hiring signal (0-10 points)
    if c.get("isHiring"):
        score += 10

    # Geography (0-10 points)
    regions = c.get("regions") or []
    if any(r in thesis["target_regions"] for r in regions):
        score += 10

    # Tag overlap (0-20 points, 5/tag up to 4)
    tag_overlap = set(c.get("tags") or []) & set(thesis["target_tags"])
    score += min(20, len(tag_overlap) * 5)

    # Description-based filter: keyword presence (0-15 points)
    # Substring match, so "api" also matches words like "rapid"
    desc = ((c.get("description") or "") + " " + (c.get("long_description") or "")).lower()
    keyword_hits = sum(1 for kw in thesis["target_keywords"] if kw in desc)
    score += min(15, keyword_hits * 5)

    return score
```
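When a company lands higher or lower than expected, it helps to see which components contributed. The sketch below mirrors the point bands of the scoring model above but returns the per-component points instead of a single total. `score_breakdown` is a hypothetical helper, not part of the original pipeline, and it assumes null fields in a hit should be treated as empty.

```python
def score_breakdown(c: dict, thesis: dict) -> dict[str, int]:
    """Per-component points for one company; sum(values) is the total score.

    Hypothetical helper: same point bands as the scoring model, and
    null or missing fields are treated as empty (an assumption).
    """
    industry = c.get("industry")
    if industry in thesis["target_industries"]:
        industry_pts = 30
    elif industry in thesis["adjacent_industries"]:
        industry_pts = 15
    else:
        industry_pts = 0

    team = c.get("team_size") or 0
    if thesis["min_team"] <= team <= thesis["max_team"]:
        team_pts = 15
    elif team < thesis["min_team"]:
        team_pts = 5
    else:
        team_pts = 0

    regions = c.get("regions") or []
    tag_overlap = set(c.get("tags") or []) & set(thesis["target_tags"])
    text = f"{c.get('description') or ''} {c.get('long_description') or ''}".lower()
    keyword_hits = sum(1 for kw in thesis["target_keywords"] if kw in text)

    return {
        "industry": industry_pts,
        "team_size": team_pts,
        "hiring": 10 if c.get("isHiring") else 0,
        "geography": 10 if any(r in thesis["target_regions"] for r in regions) else 0,
        "tags": min(20, len(tag_overlap) * 5),
        "keywords": min(15, keyword_hits * 5),
    }
```

The component caps add up to 100 (30 + 15 + 10 + 10 + 20 + 15), so the total can be read as a rough percentage fit against the thesis.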
Sample thesis config for a B2B-SaaS-focused pre-seed fund:

```python
B2B_SAAS_PRESEED = {
    "target_industries": ["B2B"],
    "adjacent_industries": ["Fintech", "Healthcare"],
    "min_team": 2,
    "max_team": 8,
    "target_regions": ["United States of America", "Canada"],
    "target_tags": ["api", "developer-tools", "saas", "infrastructure",
                    "automation", "analytics", "data"],
    "target_keywords": ["api", "platform", "automation", "developer",
                        "dashboard", "analytics", "infrastructure"],
}
```

Run all 250 companies through the scoring function and sort by score descending. The top 30 are your day-1 outbound list. The next 80 are your day-2/3 follow-up list. The bottom 140 you ignore unless something specific surfaces in a peer-investor conversation.

This whole pipeline (fetch + score + sort) runs in under 8 seconds on a laptop. By contrast, manual triage of the same 250 companies takes 3-4 hours and is biased by reading order.

## Cross-Referencing With External Signals

The real edge during Demo Day comes from cross-referencing YC's company data with external signals you've been tracking. Three sources that meaningfully sharpen the YC list:

**LinkedIn founder signal.** For each YC company, look up the founder LinkedIn profiles. Founders with prior senior IC roles at brand-name companies (FAANG, Stripe, Datadog, Snowflake, etc.) score 1.5-2x on conversion vs first-time founders without that pedigree. Auto-adding a "founder pedigree" multiplier pulls the right companies forward without manual triage.

**Hacker News engagement.** YC companies whose CEO has an HN account with at least 500 karma and recent post history are statistically more articulate, more likely to be making something engineers want to talk about, and more likely to convert on a thoughtful cold email. The [NexGenData Hacker News Scraper](https://apify.com/nexgendata/hacker-news-scraper) actor pulls user metadata including karma and post counts.

**Show HN history.**
A YC company whose founder previously launched a Show HN post (even for a different project) is, statistically, in the top quartile of Demo Day quality. Show HN selects for builders. Pull this with the [NexGenData Show HN Tracker](https://apify.com/nexgendata/hn-show-hn-tracker) actor.

## The Cost of Not Automating This

Most VC sourcing teams I've worked with at Series A firms don't have a real Demo Day automation pipeline. They send 1-2 associates to the live event, take notes, and triage by hand over the following week. By the time their shortlist is ready, the top 30 companies have already had calls with 5-10 funds and are in active term-sheet negotiations.

The cost isn't the data; YC publishes everything for free. The cost is the speed delta. A team running this pipeline can triage on the day of Demo Day and book first calls within 48 hours. A team triaging by hand books first calls 5-10 days later, by which time the deal is set.

Cost of building it yourself: ~1 day of engineering, then ~$5/month of compute. Cost of using the [YC Companies Directory actor](https://apify.com/nexgendata/yc-companies-directory-scraper): $0.01/company × ~270 companies/batch = ~$2.70/batch, runnable on demand.

NexGenData publishes 195+ [actors](https://apify.com/nexgendata?fpr=2ayu9b) covering startup-stage signals: YC alumni, Show HN, Product Hunt, Delaware C-corp formations, SEC Form D, and more. All pay-per-result.
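As a closing sketch of the triage step described earlier (score, sort descending, then split into a day-1 list of 30, a follow-up list of the next 80, and everything else), something like the following could work. The names `rank_and_tier` and `pedigree_adjusted` are hypothetical, as are the `founder_pedigree` field (which you would have to populate from your own founder research) and the 1.5 default, which is simply the low end of the 1.5-2x range mentioned above.

```python
from typing import Callable

ScoreFn = Callable[[dict], float]


def pedigree_adjusted(score_fn: ScoreFn, multiplier: float = 1.5) -> ScoreFn:
    """Wrap a scoring function so flagged companies get a multiplier.

    Assumes each company dict may carry a boolean "founder_pedigree"
    field added by your own enrichment step (hypothetical field name).
    """
    def adjusted(c: dict) -> float:
        base = score_fn(c)
        return base * multiplier if c.get("founder_pedigree") else base
    return adjusted


def rank_and_tier(
    companies: list[dict],
    score_fn: ScoreFn,
    day1_size: int = 30,
    followup_size: int = 80,
) -> dict[str, list[tuple[float, dict]]]:
    """Sort companies by score (descending) and split them into three tiers."""
    scored = [(score_fn(c), c) for c in companies]
    # Sort on the score only, so ties never fall back to comparing dicts;
    # the sort is stable, so tied companies keep their roster order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    cut = day1_size + followup_size
    return {
        "day1": scored[:day1_size],
        "followup": scored[day1_size:cut],
        "ignore": scored[cut:],
    }
```

In practice `score_fn` would be something like `lambda c: score_yc_company(c, B2B_SAAS_PRESEED)`, optionally wrapped in `pedigree_adjusted` once founder data is available.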