{"slug": "show-hn-dotdotduck-open-source-web-agent-sdk", "title": "Show HN: Dotdotduck – open-source Web Agent SDK", "summary": "Dotdotduck, an open-source Web Agent SDK that turns existing websites into AI-native sites by operating the DOM, was released on Hacker News. The SDK offers a palette UI, proactive offers, dwell-based interaction, immersive translation, and voice support, aiming to deflect customer-service tickets and reduce reliance on multiple vendors.", "body_md": "**Turn your existing site into an AI-native site.**\n\nAn embedded AI SDK that lives inside your page and operates the DOM — not a chatbot bolted to the corner.\n\nSeveral physical entry points to send context into dddk. No new vocabulary to learn.\n\n- **Most customer-service tickets are page-solvable.** \"How do I X\" / \"where do I Y\" / \"track my order\" / \"change my plan\" — the answers all live on your site already; the gap is discoverability. A DOM-grounded agent that *operates the page* closes that gap. Deflect the easy 70% before they reach a human queue.\n- **Proactive offers convert.** Watching scroll · Dwell · time-on-page · last interaction lets the agent ask *\"want me to pull the tracking?\"* / *\"want a recommendation based on what you're looking at?\"* before the user thinks to ask. Subtitle-bar yes/no resolves in one keystroke — friction is as low as physically possible. Same surface for cross-sell and upsell plays.\n- **The palette is a UI surface, not just a text list.** Each row's detail pane (and PanelSkills inside the palette) can render any **Pieces** tree — charts, tables, forms, mini-dashboards. 
That makes the palette a real productivity surface, not just a launcher:\n  - **Finance** — `AAPL` in the palette pulls a live price card + sparkline alongside the row.\n  - **Customer service** — type a question; the palette shows the matching FAQ entry with a formatted answer inline, not a link to click.\n  - **Tool-type SaaS** — pack utilities (regex tester, JSON formatter, unit converter, internal lookup) straight into the palette so users never tab out.\n\n  Same `Ctrl+K`, different verbs per product.\n\n- **Long-press beats \"screenshot + describe\".** With Dwell, the user holds an element, the agent gets selector + auto-screenshot in one gesture — chart, dashboard panel, table row, whatever. Users stop interrupting themselves to take a screenshot, paste it into chat, and write a paragraph explaining what they meant. Intent flows straight from finger to LLM.\n- **Break the language wall with one palette command.** Built-in immersive translate renders every paragraph of the current page bilingually side by side — one keystroke turns your English-only docs / knowledge base / product copy into a Chinese / Japanese / Korean / Spanish-readable surface. Batched into a handful of LLM calls per page (a 200-paragraph article costs ~7 calls). For cross-border SaaS, content platforms, or any product serving multiple regions, that's one fewer translation-engineering project on the roadmap.\n- **One SDK instead of stitching six vendors.** Palette + agent + inline AI + voice + Dwell + proactive + analytics + immersive translate ship as one install. The conventional alternative is Algolia for search, Intercom for chat, Mixpanel for analytics, Whisper for voice, plus the brittle glue code between them. dddk is one dependency, one theme system, one intent stream.\n- **Yes / no / multi-choice = free RL labels.** Every Space-accept and double-Space-reject is a clean, intentional signal — what the user actually wanted vs didn't, said by the user, recorded with the original prompt. 
No more inferring from clickstream noise. The training set for whatever you fine-tune or evaluate next is already collected.\n- **Voice doesn't stop at the browser.** The same `Voice` + `utility` LLM shape powers IoT panels, kiosk terminals, service machines, and accessibility-first surfaces for elderly users or anyone who'd rather not type. One mental model across every device that has a microphone.\n\nArchitectural rework of the webagent core. One breaking change (`coreActions` is the default install, not all 12 builtin actions). Full notes: [release-notes.md](/PerhapxinLab/dotdotduck/blob/main/docs/v0.2.0/dddk/release-notes.md).\n\n**Cost validation — done.** `gpt-5.4-nano` runs the full monolithic webagent loop with the same task-success rate as `gpt-5.4-mini` at roughly an order of magnitude lower cost. That's the new default for `webagent` + `plan` roles on [dddk.perhapxin.com](https://dddk.perhapxin.com).\n\n**Highlights:**\n\n- ✅ **TaskAgent** — third agent kind alongside WebAgent + InlineAgent. Conversation + host-defined tool calling, no DOM, plain protocol. `ask()` / `streamAsk()`. Same `AgentSession` shape so multiple TaskAgents share conversation history when wired to the same session.\n- ✅ **WebAgent multi-instance + shared sessions** — `dddk.sessions` named-session registry + `dddk.agents` named-instance registry. Inject the same `AgentSession` into different WebAgents (one persona per route) and `dddk.agents.setActive(name)` on route change.\n- ✅ **Opt-in action bundles** — default install is `coreActions` (5: narrate / navigate / click / border / scroll_to). Pass `formActions` / `flowActions` / `extraActions` to opt in. `builtinActions` kept as union for back-compat. (Breaking change.)\n- ✅ **New actions** — `hold_key`, `double_click`, `long_press`, `drag`, `press_key` extended with `modifiers`. `narrate` promoted from CoT-only primitive to first-class action in the registry. 
\n- ✅ **Cursor on every action** — `cursorTrail: true` now covers click / border / highlight / fill_input / scroll_to / narrate-with-about. `scroll_to` swaps cursor glyph to a mouse-wheel icon mid-scroll. New API: `moveCursorTo(el)`, `cursorPulse()`, `setCursorMode('pointer' | 'scroll' | 'reading')`.\n- ✅ **Planner sees the DOM** — every planning call now receives a current-page snapshot in `hostContext`, so the planner can spot routes / links visible on the page even when the briefed sitemap missed them. Cap via `plannerDomMaxLength` (default 8000).\n- ✅ **Navigate path validation** — `navigate` rejects paths not in the sitemap and returns the valid path list to the LLM for retry. Stops the loop from chasing hallucinated paths into 404s.\n- ✅ **Streaming envelope parser** — scanner-based incremental JSON parser. Each action dispatches the moment its tool-args `{ }` balances, instead of waiting for the outer envelope to close. Opt in via `enableStreamingEnvelope: true`.\n- ✅ **Live registry** — `webagent.registerTool(def) → ToolHandle` and `webagent.registerContextProvider(role, fn) → ContextProviderHandle`. Handle's `remove()` unregisters; context-provider remove restores the SDK default rather than emptying the slot.\n- ✅ **Context providers split** — six slots (`url`, `page_summary`, `dom`, `screenshot`, `history`, `selection`) with default providers SDK-installed in the WebAgent constructor.\n- ✅ **InlineAgent scoping** — `inlineAgent.attachScope(selector, config)` for per-region action sets. Innermost-wins on the selection's anchor element; callback fallback via `setScopeResolver`.\n- ✅ **`onLoopEnd` hook** — agent-loop closure UI: `silent` / `text` / `feedback` (Space accepts · double-tap rejects · Esc nulls) / `ask_user` (closing question with options).\n- ✅ **`agent_tool_failed` intent event** — emitted whenever a tool handler returns `{ ok: false }` or throws. 
\n- ✅ **Inline palette + rich rows** — `dddk.palette.mountInline(host, opts?)` persistently embeds the palette inside a host element (no backdrop). Ctrl/⌘+K raises the modal on top; closing it restores the inline palette. New `PaletteItem.lines: string[]` + `image: string` + `submitButton: boolean`.\n- ✅ **Self-hosted analytics layer** (`@perhapxin/dddk/analytics`) — IndexedDB-backed `EventStore` + `toCSV` / `toNDJSON` / `toSQL` exporters + function-based `SqlSchemaMapper`. Canonical `dddk_events` DDL ships for SQLite / Postgres / MySQL.\n- ✅ **Mini dashboard** (`@perhapxin/dddk/analytics/dashboard`) — `renderDashboard(container, store)` mounts six vanilla-SVG charts. EN / zh-TW labels, optional auto-refresh.\n- ✅ **Session-lifecycle hardening** — hard reload (F5 / Ctrl+R / Ctrl+Shift+R) always clears session regardless of `sessionContinuityMs`; default `sessionContinuityMs` flipped from `5 * 60 * 1000` to `0` (each ask is its own session unless host opts in).\n- ✅ **Subtitle click/tap = Space** — single click on the subtitle surface accepts; double-click rejects. Mouse / touch / pen all work.\n\nItems consciously deferred from v0.2:\n\n- **Cross-type session sharing with full re-serialization** — TaskAgent reading WebAgent's session already works (CoT `agent_step` turns are silently skipped); the reverse (WebAgent reading TaskAgent's plain-chat turns and re-wrapping them as CoT envelope shape) is more work.\n- **Multi-agent delegation** — a TaskAgent calling a WebAgent (or vice versa) via a tool. Workable; introduces orchestrator-routing complexity that wants real use-case validation first.\n- **buildMessages migration through provider registry** — `url` / `page_summary` / `history` / `selection` / `screenshot` are consulted via providers; `dom` is still inline because `currentIndexMap` for selector resolution is coupled to the call site. 
Untangling that is a mechanical but careful refactor.\n- **TaskAgent tool-args incremental streaming** — `streamAsk` already streams text deltas and toolCallStart / toolCallEnd markers. Streaming the tool arguments as the LLM types them is on the roadmap.\n- **Cross-tab session share for TaskAgent** — WebAgent already shares sessions across tabs; TaskAgent doesn't yet.\n\nv0.1.x bug fixes continue to ship on the `v0.1.x` branch.\n\ndotdotduck is in active development. It works, but expect rough edges. A few things up front:\n\n- **Clone the repo to evaluate properly.** The bundled docs are useful as a map, but the source is the source of truth. `git clone https://github.com/PerhapxinLab/dotdotduck` into your project directory and read the code alongside the [online docs](https://dddk.perhapxin.com/docs) — that's the recommended way to understand what's actually implemented.\n- **The docs are AI-drafted.** They're written and maintained with Claude Code. They stay close to the code by convention, but if something looks wrong, grep the repo before assuming the docs are right.\n- **Found a bug or unclear behaviour?** Open an issue at [github.com/PerhapxinLab/dotdotduck/issues](https://github.com/PerhapxinLab/dotdotduck/issues) — one-liners help shape the roadmap.\n\n[dddk.perhapxin.com](https://dddk.perhapxin.com) doubles as dotdotduck's official landing page AND as the real-world test bed for the package — every release ships first to this site and gets exercised end-to-end before being tagged. The standing challenge: serve the demo well using the **smallest viable model** at each role, so the same recipe holds up when other teams adopt dddk on a cost budget. 
Expect the model picks below to keep shifting as smaller checkpoints catch up.\n\nCurrent stack:\n\n- **4-axis LLM router** (`webagent` / `vision` / `utility` / `plan`) — host configures one model per role; the bundled demo runs OpenAI `gpt-5.4-nano` for the main agent loop and planner, `gpt-5.4-mini` for InlineAgent + voice cleanup.\n- **Speech-to-text** → the browser's Web Speech API (the SDK default; fine for demo, no SLA — production hosts wire `transcribe` with Whisper / Deepgram / etc.)\n\nNone of this is baked into `@perhapxin/dddk`. The package itself ships LLM provider adapters (OpenAI / Google / proxy, plus any OpenAI-compatible vendor via `baseURL` — e.g. DeepSeek, Qwen, OpenRouter) and a `transcribe(audio)` extension point. Bring your own keys, models, and ASR vendor — the SDK doesn't lock you in.\n\n- **What's new in v0.2.0** → [release notes](https://dddk.perhapxin.com/docs/v0.2.0/dddk/release-notes) · [migration guide](https://dddk.perhapxin.com/docs/v0.2.0/dddk/migrating)\n- **Full docs** → [dddk.perhapxin.com/docs](https://dddk.perhapxin.com/docs/v0.2.0/dddk/overview)\n- **Agent** (DOM-grounded loop + InlineAgent + sitemap + Memory) → [/dddk/agent](https://dddk.perhapxin.com/docs/v0.2.0/dddk/agent/overview)\n- **LLM** providers + router + adapter registry → [/dddk/llm](https://dddk.perhapxin.com/docs/v0.2.0/dddk/llm/providers)\n- **Skills** system + evals → [/dddk/skills](https://dddk.perhapxin.com/docs/v0.2.0/dddk/skills/overview)\n- **Modules** (voice / Dwell / inline / immersive translate / proactive / analytics) → [/dddk/modules](https://dddk.perhapxin.com/docs/v0.2.0/dddk/modules/overview)\n- **Toolbox** (search + recommend) → [/dddk/toolbox](https://dddk.perhapxin.com/docs/v0.2.0/dddk/toolbox/overview)\n- **Theming** → [/dddk/theming](https://dddk.perhapxin.com/docs/v0.2.0/dddk/theming)\n\n```\npnpm add @perhapxin/dddk\n# or: npm i @perhapxin/dddk\n```\n\n```js\nimport { DotDotDuck, OpenAIProvider } from '@perhapxin/dddk';\nimport '@perhapxin/dddk/styles.css';\n\nconst dddk = 
new DotDotDuck({\n  llm: new OpenAIProvider({\n    apiKey: import.meta.env.VITE_OPENAI_KEY,\n    model: 'gpt-5.4-mini',\n  }),\n  siteName: 'YourSaaS',\n  skills: [\n    {\n      id: 'introduce',\n      type: 'script',\n      name: 'Tour the app',\n      steps: [\n        { subtitle: 'Welcome!', action: (t) => t.spotlight('.hero') },\n        { subtitle: 'Here is pricing.', action: (t) => t.highlight('.pricing'), waitForUser: true },\n      ],\n    },\n  ],\n});\n\ndddk.mount();\n```\n\nPress `Ctrl/⌘+K`, type `/introduce`, watch it run. The full [quickstart guide](https://dddk.perhapxin.com/docs/v0.2.0/dddk/quickstart-frameworks) covers React / Vue / Svelte / Solid wiring.\n\nEverything visual reads from CSS custom properties — `--dddk-bg`, `--dddk-accent`, `--dddk-radius`, `--dddk-font`, and friends. Override at `:root` or scope inside any wrapper.\n\n```\n:root {\n  --dddk-accent: #6366f1;       /* your brand colour */\n  --dddk-radius: 10px;\n  --dddk-font: 'Inter', system-ui, sans-serif;\n}\n```\n\nDark mode is automatic: `[data-theme=\"dark\"]` anywhere up the tree, OR `@media (prefers-color-scheme: dark)` — whichever fires first. Custom modes (sepia, high-contrast, brand-specific) work by overriding the same variables under a new selector.\n\nAGPL-3.0-or-later. 
See [LICENSE](/PerhapxinLab/dotdotduck/blob/main/LICENSE) for the full text.\n\nBuilt by Perhapxin Team", "url": "https://wpnews.pro/news/show-hn-dotdotduck-open-source-web-agent-sdk", "canonical_source": "https://github.com/PerhapxinLab/dotdotduck", "published_at": "2026-06-29 06:38:38+00:00", "updated_at": "2026-06-29 06:58:50.481054+00:00", "lang": "en", "topics": ["ai-agents", "ai-tools", "developer-tools", "large-language-models", "generative-ai"], "entities": ["Dotdotduck", "Hacker News", "Algolia", "Intercom", "Mixpanel", "Whisper", "gpt-5.4-nano", "gpt-5.4-mini"], "alternates": {"html": "https://wpnews.pro/news/show-hn-dotdotduck-open-source-web-agent-sdk", "markdown": "https://wpnews.pro/news/show-hn-dotdotduck-open-source-web-agent-sdk.md", "text": "https://wpnews.pro/news/show-hn-dotdotduck-open-source-web-agent-sdk.txt", "jsonld": "https://wpnews.pro/news/show-hn-dotdotduck-open-source-web-agent-sdk.jsonld"}}