Show HN: Dotdotduck – open-source Web Agent SDK

Dotdotduck, an open-source Web Agent SDK that turns existing websites into AI-native sites by operating the DOM, was released on Hacker News. The SDK offers a palette UI, proactive offers, dwell-based interaction, immersive translation, and voice support, aiming to deflect customer-service tickets and reduce reliance on multiple vendors.

Turn your existing site into an AI-native site. An embedded AI SDK that lives inside your page and operates the DOM, not a chatbot bolted to the corner.

Several physical entry points to send context into dddk. No new vocabulary to learn.

- Most customer-service tickets are page-solvable. "How do I X" / "where do I Y" / "track my order" / "change my plan": the answers all live on your site already; the gap is discoverability. A DOM-grounded agent that operates the page closes that gap. Deflect the easy 70% before they reach a human queue.
- Proactive offers convert. Watching scroll · Dwell · time-on-page · last interaction lets the agent ask "want me to pull the tracking?" / "want a recommendation based on what you're looking at?" before the user thinks to ask. A subtitle-bar yes/no resolves in one keystroke, so friction is about as low as it can physically get. Same surface for cross-sell and upsell plays.
- The palette is a UI surface, not just a text list. Each row's detail pane and the PanelSkills inside the palette can render any Pieces tree: charts, tables, forms, mini-dashboards. That makes the palette a real productivity surface, not just a launcher:
  - Finance: AAPL in the palette pulls a live price card + sparkline alongside the row.
  - Customer service: type a question; the palette shows the matching FAQ entry with the formatted answer inline, not a link to click.
  - Tool-type SaaS: pack utilities (regex tester, JSON formatter, unit converter, internal lookup) straight into the palette so users never tab out.

  Same `Ctrl+K`, different verbs per product.
- Long-press beats "screenshot + describe". With Dwell, the user holds an element and the agent gets selector + auto-screenshot in one gesture: chart, dashboard panel, table row, whatever. Users stop interrupting themselves to take a screenshot, paste it into chat, and write a paragraph explaining what they meant. Intent flows straight from finger to LLM.
- Break the language wall with one palette command. Built-in immersive translate renders every paragraph of the current page bilingually, side by side. One keystroke turns your English-only docs / knowledge base / product copy into a Chinese / Japanese / Korean / Spanish-readable surface. Paragraphs are batched into a handful of LLM calls per page (a 200-paragraph article costs ~7 calls). For cross-border SaaS, content platforms, or any product serving multiple regions, that's one fewer translation-engineering project on the roadmap.
- One SDK instead of stitching six vendors. Palette + agent + inline AI + voice + Dwell + proactive + analytics + immersive translate ship as one install. The conventional alternative is Algolia for search, Intercom for chat, Mixpanel for analytics, Whisper for voice, plus the brittle glue code between them. dddk is one dependency, one theme system, one intent stream.
- Yes / no / multi-choice = free RL labels. Every Space-accept and double-Space-reject is a clean, intentional signal of what the user actually wanted vs didn't, said by the user and recorded with the original prompt. No more inferring from clickstream noise. The training set for whatever you fine-tune or evaluate next is already collected.
- Voice doesn't stop at the browser. The same Voice + utility LLM shape powers IoT panels, kiosk terminals, service machines, and accessibility-first surfaces for elderly users or anyone who'd rather not type. One mental model across every device that has a microphone.

Architectural rework of the webagent core. One breaking change (`coreActions` is the default install, not all 12 builtin actions).
Full notes: release-notes.md (/PerhapxinLab/dotdotduck/blob/main/docs/v0.2.0/dddk/release-notes.md).

Cost validation: done. gpt-5.4-nano runs the full monolithic webagent loop with the same task-success rate as gpt-5.4-mini at roughly an order of magnitude lower cost. That's the new default for the webagent + plan roles on dddk.perhapxin.com (https://dddk.perhapxin.com).

Highlights:

- ✅ TaskAgent: third agent kind alongside WebAgent + InlineAgent. Conversation + host-defined tool calling, no DOM, plain protocol. `ask` / `streamAsk`. Same `AgentSession` shape, so multiple TaskAgents share conversation history when wired to the same session.
- ✅ WebAgent multi-instance + shared sessions: `dddk.sessions` named-session registry + `dddk.agents` named-instance registry. Inject the same `AgentSession` into different WebAgents (one persona per route) and call `dddk.agents.setActive(name)` on route change.
- ✅ Opt-in action bundles: default install is `coreActions` (5: narrate / navigate / click / border / scroll to). Pass `formActions` / `flowActions` / `extraActions` to opt in. `builtinActions` kept as the union for back-compat. Breaking change.
- ✅ New actions: hold key, double click, long press, drag, press key (extended with modifiers). narrate promoted from CoT-only primitive to first-class action in the registry.
- ✅ Cursor on every action: `cursorTrail: true` now covers click / border / highlight / fill input / scroll to / narrate-with-about. scroll to swaps the cursor glyph to a mouse-wheel icon mid-scroll. New API: `moveCursorTo(el)`, `cursorPulse()`, `setCursorMode('pointer' | 'scroll' | 'reading')`.
- ✅ Planner sees the DOM: every planning call now receives a current-page snapshot in `hostContext`, so the planner can spot routes / links visible on the page even when the briefed sitemap missed them. Cap via `plannerDomMaxLength` (default 8000).
- ✅ Navigate path validation: navigate rejects paths not in the sitemap and returns the valid path list to the LLM for retry.
  Stops the loop from chasing hallucinated paths into 404s.
- ✅ Streaming envelope parser: scanner-based incremental JSON parser. Each action dispatches the moment its tool-args `{ }` balances, instead of waiting for the outer envelope to close. Opt in via `enableStreamingEnvelope: true`.
- ✅ Live registry: `webagent.registerTool(def) → ToolHandle` and `webagent.registerContextProvider(role, fn) → ContextProviderHandle`. A handle's `remove()` unregisters; a context-provider `remove()` restores the SDK default rather than emptying the slot.
- ✅ Context providers split: six slots (url, page summary, dom, screenshot, history, selection) with default providers installed by the SDK in the WebAgent constructor.
- ✅ InlineAgent scoping: `inlineAgent.attachScope(selector, config)` for per-region action sets. Innermost wins on the selection's anchor element; callback fallback via `setScopeResolver`.
- ✅ Agent-loop closure UI: `onLoopEnd` hook with silent / text / feedback (Space accepts · double-tap rejects · Esc nulls) / ask user (closing question with options).
- ✅ Agent tool failed intent event: emitted whenever a tool handler returns `{ ok: false }` or throws.
- ✅ Inline palette + rich rows: `dddk.palette.mountInline(host, opts?)` persistently embeds the palette inside a host element (no backdrop). Ctrl/⌘+K raises the modal on top; closing it restores the inline palette. New `PaletteItem.lines: string` + `image: string` + `submitButton: boolean`.
- ✅ Self-hosted analytics layer (`@perhapxin/dddk/analytics`): IndexedDB-backed `EventStore` + `toCSV` / `toNDJSON` / `toSQL` exporters + function-based `SqlSchemaMapper`. Canonical dddk events DDL ships for SQLite / Postgres / MySQL.
- ✅ Mini dashboard (`@perhapxin/dddk/analytics/dashboard`): `renderDashboard(container, store)` mounts six vanilla-SVG charts. EN / zh-TW labels, optional auto-refresh.
- ✅ Session-lifecycle hardening: a hard reload (F5 / Ctrl+R / Ctrl+Shift+R) always clears the session regardless of `sessionContinuityMs`; the default `sessionContinuityMs` flipped from `5 * 60 * 1000` to `0` (each ask is its own session unless the host opts in).
- ✅ Subtitle click/tap = Space: a single click on the subtitle surface accepts; a double-click rejects. Mouse / touch / pen all work.

Items consciously deferred from v0.2:

- Cross-type session sharing with full re-serialization: TaskAgent reading WebAgent's session already works (CoT agent step turns are silently skipped); the reverse (WebAgent reading TaskAgent's plain-chat turns and re-wrapping them as CoT envelope shape) is more work.
- Multi-agent delegation: a TaskAgent calling a WebAgent, or vice versa, via a tool. Workable, but it introduces orchestrator-routing complexity that wants real use-case validation first.
- `buildMessages` migration through the provider registry: url / page summary / history / selection / screenshot are consulted via providers; dom is still inline because `currentIndexMap` for selector resolution is coupled to the call site. Untangling that is a mechanical but careful refactor.
- TaskAgent tool-args incremental streaming: `streamAsk` already streams text deltas and `toolCallStart` / `toolCallEnd` markers. Streaming the tool arguments as the LLM types them is on the roadmap.
- Cross-tab session share for TaskAgent: WebAgent already works cross-tab; TaskAgent doesn't yet.

v0.1.x bug fixes continue to ship on the v0.1.x branch.

dotdotduck is in active development. It works, but expect rough edges. A few things up front:

- Clone the repo to evaluate properly. The bundled docs are useful as a map, but the source is the source of truth. Run `git clone https://github.com/PerhapxinLab/dotdotduck` into your project directory and read the code alongside the online docs (https://dddk.perhapxin.com/docs). That's the recommended way to understand what's actually implemented.
- The docs are AI-drafted. They're written and maintained with Claude Code.
  They stay close to the code by convention, but if something looks wrong, grep the repo before assuming the docs are right.
- Found a bug or unclear behaviour? Open an issue at github.com/PerhapxinLab/dotdotduck/issues (https://github.com/PerhapxinLab/dotdotduck/issues). One-liners help shape the roadmap.

dddk.perhapxin.com (https://dddk.perhapxin.com) doubles as dotdotduck's official landing page AND as the real-world test bed for the package: every release ships first to this site and gets exercised end-to-end before being tagged.

The standing challenge: serve the demo well using the smallest viable model at each role, so the same recipe holds up when other teams adopt dddk on a cost budget. Expect the model picks below to keep shifting as smaller checkpoints catch up.

Current stack:

- 4-axis LLM router (webagent / vision / utility / plan): the host configures one model per role; the bundled demo runs OpenAI gpt-5.4-nano for the main agent loop and planner, and gpt-5.4-mini for InlineAgent + voice cleanup.
- Speech-to-text → the browser's Web Speech API (the SDK default; fine for a demo, no SLA). Production hosts wire transcribe with Whisper / Deepgram / etc.

None of this is baked into `@perhapxin/dddk`. The package itself ships LLM provider adapters (OpenAI / Google / proxy, plus any OpenAI-compatible vendor via `baseURL`, e.g. DeepSeek, Qwen, OpenRouter) and a transcribe audio extension point. Bring your own keys, models, and ASR vendor; the SDK doesn't lock you in.
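To make the one-model-per-role idea above concrete, here is a minimal standalone sketch of a role-to-model resolver. It does not use the `@perhapxin/dddk` API: `createModelRouter`, `modelFor`, and the fallback behaviour are hypothetical names and choices made for illustration. Mapping InlineAgent and voice cleanup to the `utility` role is also an assumption, since the text does not say which role they run under.

```javascript
// Hypothetical sketch: these names are illustrative, not the @perhapxin/dddk API.
const ROLES = ['webagent', 'vision', 'utility', 'plan'];

// Build a resolver from a partial role -> model map.
// Roles without an explicit pick fall back to the model of `fallbackRole`.
function createModelRouter(models, fallbackRole = 'webagent') {
  for (const role of Object.keys(models)) {
    if (!ROLES.includes(role)) {
      throw new Error(`Unknown role in config: ${role}`);
    }
  }
  if (!models[fallbackRole]) {
    throw new Error(`Fallback role "${fallbackRole}" needs a model`);
  }
  return function modelFor(role) {
    if (!ROLES.includes(role)) {
      throw new Error(`Unknown role: ${role}`);
    }
    return models[role] ?? models[fallbackRole];
  };
}

// Picks mirror the bundled demo as described above. Putting InlineAgent and
// voice cleanup under "utility" is an assumption; "vision" is left unset on
// purpose to show the fallback.
const modelFor = createModelRouter({
  webagent: 'gpt-5.4-nano',
  plan: 'gpt-5.4-nano',
  utility: 'gpt-5.4-mini',
});

console.log(modelFor('plan'));    // gpt-5.4-nano
console.log(modelFor('utility')); // gpt-5.4-mini
console.log(modelFor('vision'));  // gpt-5.4-nano (falls back to webagent)
```

Keeping the picks in one plain object makes it cheap to swap a single role to a smaller checkpoint and re-run the same tasks, which is the kind of per-role cost tuning the demo site is described as doing.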
What's new in v0.2.0 → release notes (https://dddk.perhapxin.com/docs/v0.2.0/dddk/release-notes) · migration guide (https://dddk.perhapxin.com/docs/v0.2.0/dddk/migrating)

Full docs → dddk.perhapxin.com/docs (https://dddk.perhapxin.com/docs/v0.2.0/dddk/overview)

- Agent (DOM-grounded loop + InlineAgent + sitemap + Memory) → /dddk/agent (https://dddk.perhapxin.com/docs/v0.2.0/dddk/agent/overview)
- LLM providers + router + adapter registry → /dddk/llm (https://dddk.perhapxin.com/docs/v0.2.0/dddk/llm/providers)
- Skills system + evals → /dddk/skills (https://dddk.perhapxin.com/docs/v0.2.0/dddk/skills/overview)
- Modules (voice / Dwell / inline / immersive translate / proactive / analytics) → /dddk/modules (https://dddk.perhapxin.com/docs/v0.2.0/dddk/modules/overview)
- Toolbox (search + recommend) → /dddk/toolbox (https://dddk.perhapxin.com/docs/v0.2.0/dddk/toolbox/overview)
- Theming → /dddk/theming (https://dddk.perhapxin.com/docs/v0.2.0/dddk/theming)

```bash
pnpm add @perhapxin/dddk
# or: npm i @perhapxin/dddk
```

```js
import { DotDotDuck, OpenAIProvider } from '@perhapxin/dddk';
import '@perhapxin/dddk/styles.css';

const dddk = new DotDotDuck({
  // LLM provider; the API key is read from a Vite env variable.
  llm: new OpenAIProvider({
    apiKey: import.meta.env.VITE_OPENAI_KEY,
    model: 'gpt-5.4-mini',
  }),
  siteName: 'YourSaaS',
  skills: [
    {
      // Scripted skill, run from the palette as /introduce.
      id: 'introduce',
      type: 'script',
      name: 'Tour the app',
      steps: [
        { subtitle: 'Welcome', action: (t) => t.spotlight('.hero') },
        // waitForUser pauses on this step until the user responds.
        { subtitle: 'Here is pricing.', action: (t) => t.highlight('.pricing'), waitForUser: true },
      ],
    },
  ],
});

dddk.mount();
```

Press Ctrl/⌘+K, type `/introduce`, watch it run. The full quickstart guide (https://dddk.perhapxin.com/docs/v0.2.0/dddk/quickstart-frameworks) covers React / Vue / Svelte / Solid wiring.

Everything visual reads from CSS custom properties: `--dddk-bg`, `--dddk-accent`, `--dddk-radius`, `--dddk-font`, and friends. Override at `:root` or scope inside any wrapper.
```css
:root {
  --dddk-accent: #6366f1; /* your brand colour */
  --dddk-radius: 10px;
  --dddk-font: 'Inter', system-ui, sans-serif;
}
```

Dark mode is automatic: `data-theme="dark"` anywhere up the tree, OR `@media (prefers-color-scheme: dark)`, whichever fires first. Custom modes (sepia, high-contrast, brand-specific) work by overriding the same variables under a new selector.

AGPL-3.0-or-later. See LICENSE (/PerhapxinLab/dotdotduck/blob/main/LICENSE) for the full text.

Built by Perhapxin Team