{"slug": "linkloom-aiwebreader", "title": "Linkloom - AIWebReader", "summary": "LinkLoom, a new TypeScript/Bun toolkit, provides a unified API, CLI, and MCP server for converting web pages, PDFs, and iframes into clean markdown. The tool handles JavaScript-heavy pages through a stealth headless browser, extracts content from nested iframes, and converts PDFs to structured text or plain text without external dependencies.", "body_md": "A web scraping and content extraction toolkit for TypeScript/Bun.\n\nPass a URL, get clean markdown. That's the core. But LinkLoom also handles the cases that break simple scrapers: JavaScript-heavy pages rendered through a stealth browser, PDFs parsed into structured text, iframes pulled from nested frames, HTML tables converted to markdown tables, links extracted and classified. It exposes a library API, a CLI, and an MCP server — so you can use it from code, from the terminal, or from an AI client like Claude Desktop or Cursor.\n\nThe full list: URL-to-markdown conversion, HTML-to-markdown via Readability + Turndown, PDF-to-markdown via pdf.js, headless browser rendering through Camoufox (stealth Firefox on Playwright), iframe extraction with configurable wait strategies, link extraction and classification, table scraping, text embeddings via OpenAI or Gemini, a CLI for every feature, and an MCP server for AI tool-use workflows.\n\nBuilt with Bun, Camoufox, JSDOM, Readability, Turndown, and pdf.js-extract. Optional embedding support through LangChain.\n\n``` js\nimport { convertLinkToMarkdown } from \"@boris.barac/linkloom\";\n\nconst markdown = await convertLinkToMarkdown(\"https://example.com\");\n```\n\nThat's it. One import, one call. The function auto-detects whether the URL points to an HTML page or a PDF and routes it to the right converter. 
You get back a string of clean markdown — no boilerplate, no configuration objects, no setup ceremony.\n\nThe CLI equivalent:\n\n```\nbunx @boris.barac/linkloom scrape https://example.com\n```\n\nSame result, different interface. Pipe it, redirect it, or pass `-o output.md` to write to a file.\n\nBut plenty of pages don't hand you their content on the first request. They render everything with JavaScript — SPAs, dashboards, dynamically loaded articles. A simple fetch returns an empty shell. LinkLoom handles this through headless browser rendering via Camoufox, a stealth Firefox build on Playwright designed to evade bot detection.\n\n``` js\nimport { renderers } from \"@boris.barac/linkloom\";\n\nconst url = \"https://example.com\";\nconst browser = await renderers.puppeterRendered.initialize();\nlet result;\ntry {\n  result = await renderers.puppeterRendered.renderPage(browser, url, {\n    timeout: 15000, // page timeout (ms)\n    waitUntil: \"networkidle\", // wait for the network to settle\n    viewport: { width: 1920, height: 1080 },\n    frames: { enabled: true, timeout: 5000 }, // iframe extraction, with its own timeout\n  });\n} finally {\n  // Release the browser even if rendering throws\n  await browser.close();\n}\n```\n\nThe `renderPage` function loads the URL in a real browser, waits for the network to settle (or for a specific event), and returns the rendered HTML. The `frames` option tells it to also extract content from nested iframes — with its own timeout, because iframes load on their own schedule and you don't want one slow frame to block everything.\n\nThe CLI version:\n\n```\nbunx @boris.barac/linkloom render https://example.com --wait-until networkidle --timeout 15000\n```\n\nAdd `--selector \"table.stats\"` to extract only a specific element instead of the full page. Useful when you know exactly what you're after.\n\nThen there are PDFs. Research papers, technical reports, product documentation — a surprising amount of the web's useful content lives in PDFs, not HTML pages. 
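Exactly how LinkLoom tells the two apart isn't spelled out here. A minimal sketch of one common approach, assuming the router can see the response's `Content-Type` header and its first few bytes: trust the declared media type, and fall back to sniffing the `%PDF-` header for servers that label PDFs as `application/octet-stream`. `detectContentKind` is a hypothetical helper, not part of LinkLoom's API.

```ts
// Hypothetical router for a one-call URL converter: picks the PDF parser or the
// HTML pipeline for a fetched resource. A sketch, not LinkLoom's implementation.
type ContentKind = 'pdf' | 'html';

// Bytes of the '%PDF-' header that PDF files normally begin with.
const PDF_MAGIC = [0x25, 0x50, 0x44, 0x46, 0x2d];

function detectContentKind(contentType: string | null, firstBytes: Uint8Array): ContentKind {
  // Trust the declared media type first, ignoring parameters such as charset.
  const mediaType = ((contentType ?? '').split(';')[0] ?? '').trim().toLowerCase();
  if (mediaType === 'application/pdf') return 'pdf';

  // Servers sometimes mislabel PDFs (e.g. as application/octet-stream),
  // so fall back to sniffing the leading bytes. Anything else is treated as HTML.
  const hasMagic = PDF_MAGIC.every((byte, i) => firstBytes[i] === byte);
  return hasMagic ? 'pdf' : 'html';
}

// Example: a mislabeled download is still routed to the PDF converter.
const sample = new TextEncoder().encode('%PDF-1.7');
console.log(detectContentKind('application/octet-stream', sample)); // pdf
```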
The same `convertLinkToMarkdown` call handles both, but you can also convert PDFs directly:\n\n``` js\nimport { pdfConverter } from \"@boris.barac/linkloom\";\nimport { readFile } from \"node:fs/promises\";\n\nconst buffer = await readFile(\"document.pdf\");\nconst markdown = await pdfConverter.convertPdfToMarkdown(buffer);\nconst text = await pdfConverter.convertPdfToText(buffer);\n```\n\nTwo output modes: `convertPdfToMarkdown` preserves structure (headings, lists, formatting), while `convertPdfToText` strips everything down to plain text. Pick whichever fits your pipeline.\n\nThe CLI:\n\n```\nbunx @boris.barac/linkloom pdf document.pdf -o output.md\n```\n\nUnder the hood it uses pdf.js-extract to parse the binary, so there's no external dependency on system tools like `pdftotext`. It works out of the box.\n\nContent conversion is half the job. The other half is pulling structured data out of pages — links, tables, the things that aren't prose.\n\n**Link extraction** finds and classifies URLs in plain text or HTML. Feed it a string and it returns every link, tagged as a PDF or a regular page:\n\n``` js\nimport { linkExtraction } from \"@boris.barac/linkloom\";\n\n// Plain text: find URLs and classify each one\nconst links = linkExtraction.extractLinks(\"check https://example.com/doc.pdf\");\n\n// HTML: collect links to downloadable files from a fetched page\nconst htmlContent = await (await fetch(\"https://example.com\")).text();\nconst pdfLinks = await linkExtraction.extractDownloadLinksFromHtml(htmlContent);\n```\n\n`extractLinks` works on raw text — it finds URLs and classifies them. `extractDownloadLinksFromHtml` parses an HTML document and pulls out links that point to downloadable files (PDFs, mostly). 
Useful when you're crawling a page and want to know which links lead to documents worth converting.\n\n**Table extraction** renders a page in the headless browser and pulls out HTML tables as structured data:\n\n``` js\nimport { tableExtraction, renderers } from \"@boris.barac/linkloom\";\n\nconst url = \"https://example.com/data\";\nconst browser = await renderers.puppeterRendered.initialize();\nlet md;\ntry {\n  const data = await tableExtraction.extractTableData(browser, url, \"table\");\n  md = tableExtraction.tableDataToMarkdownTable(data);\n} finally {\n  // Release the browser even if extraction throws\n  await browser.close();\n}\n```\n\nThe third argument to `extractTableData` is a CSS selector: pass `\"table\"` for all tables, or `\"table.stats\"` for a specific one. `tableDataToMarkdownTable` then turns the extracted data into a markdown table string, ready to drop into a document.\n\nThe CLI shortcuts:\n\n```\nbunx @boris.barac/linkloom links https://example.com\nbunx @boris.barac/linkloom tables https://example.com/data --selector \"table.stats\"\n```\n\nAll of this is also available as an MCP server. If you use Claude Desktop, Cursor, or any MCP-compatible client, you can expose LinkLoom's tools without writing code — the AI calls them directly.\n\nSix tools: `scrape`, `html_to_markdown`, `pdf_to_markdown`, `render_page`, `extract_links`, and `extract_tables`. Same capabilities as the library and CLI, but surfaced as tool calls an AI agent can use autonomously.\n\nConfiguration is a few lines of JSON. For Claude Desktop on macOS, edit `~/Library/Application Support/Claude/claude_desktop_config.json`:\n\n```\n{\n  \"mcpServers\": {\n    \"linkloom\": {\n      \"command\": \"bun\",\n      \"args\": [\"x\", \"@boris.barac/linkloom\", \"mcp\"]\n    }\n  }\n}\n```\n\nFor Cursor, add the same block to `.cursor/mcp.json` in your project or `~/.cursor/mcp.json` globally. For any other MCP client, point it at the command `bun x @boris.barac/linkloom mcp`.\n\nThe server communicates over stdio. It reads JSON-RPC from stdin and writes responses to stdout. You don't run it directly; MCP clients spawn it as a child process. 
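To make that framing concrete, here is a minimal sketch of the client's side of the pipe, assuming the newline-delimited JSON-RPC framing of the MCP stdio transport and the `2024-11-05` protocol revision. `buildHandshake`, `frame`, and `parseFrames` are hypothetical helpers, not part of LinkLoom; they only build and split messages and spawn nothing.

```ts
// Sketch of the messages an MCP client exchanges with a stdio server.
// Message shapes follow the MCP JSON-RPC handshake; the protocol version is an assumption.
type JsonRpcMessage = {
  jsonrpc: '2.0';
  id?: number;
  method?: string;
  params?: Record<string, unknown>;
  result?: unknown;
  error?: unknown;
};

// LF, the delimiter between messages on the stdio transport.
const NEWLINE = String.fromCharCode(10);

// In order: initialize request, initialized notification, tools/list request.
// A real client sends the last two only after the initialize response arrives.
function buildHandshake(): JsonRpcMessage[] {
  return [
    {
      jsonrpc: '2.0',
      id: 1,
      method: 'initialize',
      params: {
        protocolVersion: '2024-11-05',
        capabilities: {},
        clientInfo: { name: 'sketch-client', version: '0.0.1' },
      },
    },
    // Notifications carry no id and get no response.
    { jsonrpc: '2.0', method: 'notifications/initialized' },
    { jsonrpc: '2.0', id: 2, method: 'tools/list' },
  ];
}

// One message per line: JSON.stringify without indentation never emits
// raw newlines, so the delimiter stays unambiguous.
function frame(messages: JsonRpcMessage[]): string {
  return messages.map((m) => JSON.stringify(m) + NEWLINE).join('');
}

// Split whatever the server wrote to stdout back into messages, skipping blank lines.
function parseFrames(output: string): JsonRpcMessage[] {
  return output
    .split(NEWLINE)
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as JsonRpcMessage);
}
```

A real client would write the `initialize` frame to the stdin of a spawned `bunx @boris.barac/linkloom mcp` process, wait for its result on stdout (split with `parseFrames`), and only then send the other two messages, since the MCP lifecycle asks clients not to send further requests before initialization completes.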
If you want to test it interactively, there's the MCP Inspector:\n\n```\nbunx @modelcontextprotocol/inspector bunx @boris.barac/linkloom mcp\n```\n\nThat opens a web UI where you can browse the available tools, call them with custom parameters, and inspect the JSON-RPC messages going back and forth.\n\nTo install LinkLoom as a project dependency:\n\n```\nbun add @boris.barac/linkloom\n```\n\nOr skip the install and run it directly:\n\n```\nbunx @boris.barac/linkloom scrape https://example.com\n```\n\nNo API keys are needed for the core scraping pipeline. Only the optional text embedding feature requires an OpenAI or Gemini key.", "url": "https://wpnews.pro/news/linkloom-aiwebreader", "canonical_source": "https://dev.to/boris9027/linkloom-aiwebreader-7gl", "published_at": "2026-06-12 16:24:17+00:00", "updated_at": "2026-06-12 16:41:05.627520+00:00", "lang": "en", "topics": ["ai-tools", "ai-infrastructure"], "entities": ["Linkloom", "AIWebReader", "Bun", "Camoufox", "Playwright", "JSDOM", "Readability", "Turndown"], "alternates": {"html": "https://wpnews.pro/news/linkloom-aiwebreader", "markdown": "https://wpnews.pro/news/linkloom-aiwebreader.md", "text": "https://wpnews.pro/news/linkloom-aiwebreader.txt", "jsonld": "https://wpnews.pro/news/linkloom-aiwebreader.jsonld"}}