Why I built a CLI to automate web research instead of relying on browser tabs

A developer built a modular CLI tool called 'research-loop' to automate web research, addressing the inefficiency of manually collecting and synthesizing information from browser tabs. The tool runs a complete research loop (search, scrape, clean, synthesize, and report) without requiring a GPU or database, and supports scheduled monitoring with notifications via Discord or Telegram. It uses a five-file architecture for separation of concerns and includes fallback parsing strategies for non-article pages.

A few months ago I noticed something annoying about how I worked: I was spending more time collecting information than actually thinking about it.

The pattern was always the same. Open a search engine, open a dozen tabs, skim past the SEO filler and cookie banners, copy the paragraphs that actually mattered into a doc, paste the whole mess into an LLM and ask it to make sense of things. Then, a week later, do it again because whatever I was tracking had changed.

At some point I stopped asking "how do I do this faster" and started asking why I was doing it by hand at all.

ChatGPT and Perplexity are fine for a single question. They're worse at the part I actually needed help with, which was repetition: running the same research loop on a schedule, keeping a record of what changed, and getting a notification when it did. Neither tool is built to sit in the background and check on a topic for you.

Plain scraping scripts have the opposite problem. They get you raw HTML, not understanding. You still have to strip out nav bars and footers by hand, and the moment you point one at a list-style page like Hacker News instead of a blog post, it falls apart.

And bookmarking is just deferring the problem. A folder of forty saved links isn't research, it's homework you haven't done yet.

I wanted something in between: automated enough to skip the tab-hoarding, but still producing something I could read and trust, not just a black-box answer.

The result is research-loop, a modular CLI that runs the whole research loop (search, scrape, clean, synthesize, report) on its own, and stays lightweight enough to run on a laptop with no GPU and no database.

A single run looks like this: you give it a topic and a focus area (what you specifically want answered), it searches the web, pulls and cleans the pages, synthesizes a report, and writes it to disk. There's also a loop mode, so the same query can re-run every few hours and ping you on Discord or Telegram if you want to monitor something over time instead of researching it once.

I deliberately didn't build this as one big script. It's five files, each doing one job, called in sequence:

main.py → terminal UI and orchestration
scraper.py → search + concurrent crawling + HTML parsing
analyzer.py → synthesis (AI or offline)
notifier.py → saving reports, sending alerts
config manager → reading/writing settings

main.py doesn't know anything about how scraping works internally, and scraper.py doesn't know anything about Discord webhooks. That separation made it much easier to add the offline summarizer later without touching the scraping code at all, and it's the kind of decision that only pays off once you try to change something six weeks in.

Getting clean text out of arbitrary HTML.

readability-lxml is good at finding "the article" inside a page, but it assumes the page is an article. Point it at a Hacker News thread or a GitHub repo listing and it often returns almost nothing, because there's no single article body to extract. The fix was to treat readability as the first attempt, not the only one: if it returns under 200 characters of usable text, the code falls back to a structural BeautifulSoup pass that looks for