{"slug": "build-a-minimal-webmcp-agent-with-playwright-and-gemini", "title": "Build a Minimal WebMCP Agent with Playwright and Gemini", "summary": "A developer built a minimal WebMCP agent using Playwright and the Gemini API to test WebMCP tools with stronger AI models. The agent uses Playwright to control a real Chrome browser, enabling tool discovery and execution beyond the limitations of the Model Context Tool Inspector. The project demonstrates how to wire Gemini's tool-calling capabilities with WebMCP in a Node.js environment.", "body_md": "WebMCP lets a web page expose tools that AI agents can discover and execute inside the browser. That sounds simple until you want to test those tools with a model outside the [Model Context Tool Inspector](https://chromewebstore.google.com/detail/webmcp-model-context-tool/gbpdfapgefenggkahomfgkhfehlcenpd) Chrome extension.\n\nA while ago, I built a [small puzzle game](https://dev.to/gramli/tower-before-dusk-i-built-a-puzzle-game-for-humans-and-ai-oao) that exposes WebMCP tools. I tested and debugged those tools using the Model Context Tool Inspector, which is great for quick experiments. Its limitation is that it only gives access to a small set of lightweight Gemini models, and I wanted to test the same WebMCP tools with stronger ones.\n\nMy first idea was to build another Chrome extension, but that felt like overkill. WebMCP tools need a real browser context: the browser must open the page directly, discover the tools and execute them inside the page. So instead of building another extension, I looked for something that could simply open Chrome and control the page.\n\nAnd that is where Playwright fits nicely.\n\nSo in this article, I will show how to create a simple agent that wires up the Gemini API with WebMCP through Playwright. 
Gemini requests a tool call and Playwright executes the matching WebMCP tool inside a real Chrome browser.\n\nFor this example, you need:\n\nThe first thing we need to do is [enable WebMCP in Chrome](https://developer.chrome.com/docs/ai/webmcp). WebMCP is still experimental, so for local development it must be enabled through a Chrome flag:\n\n```\nOpen Chrome and navigate to chrome://flags/#enable-webmcp-testing\nSet the flag to Enabled.\nRelaunch Chrome to apply the changes.\n```\n\nAfter that, we can create a small Node.js project:\n\n```\nmkdir custom-agent\ncd custom-agent\nnpm init -y\n```\n\nNext, install Playwright as a development dependency. I also use `tsx` to run TypeScript files directly and `dotenv` to read environment variables from a `.env` file:\n\n```\nnpm install -D playwright tsx dotenv typescript @types/node\n```\n\nThis gives us everything we need to run TypeScript code, open Chrome and access environment variables.\n\nBecause the agent will also call an AI model, we need to install the Gemini SDK. For this example, I use `@google/genai`:\n\n```\nnpm install @google/genai\n```\n\nThe last preparation step is to add a script to `package.json`:\n\n```\n{\n  \"scripts\": {\n    \"agent\": \"tsx agent.ts\"\n  }\n}\n```\n\nThis command will run the `agent.ts` file, where we will put the main logic.\n\nNow that the project is prepared, let’s create the first version of `agent.ts`. At this stage, I only want to check whether `modelContext` is available inside the browser page.\n\n``` js\nimport { chromium } from \"playwright\";\n\nconst gameUrl = process.argv[2] ?? 
\"http://localhost:5173\";\n\nasync function main() {\n  const context = await chromium.launchPersistentContext(\n    \"./.chrome-agent-profile\",\n    {\n      channel: \"chrome\",\n      headless: false,\n      args: [\"--enable-experimental-web-platform-features\"],\n    },\n  );\n\n  const page = await context.newPage();\n\n  await page.goto(gameUrl, { waitUntil: \"networkidle\" });\n\n  const result = await page.evaluate(() => ({\n    userAgent: navigator.userAgent,\n    hasNavigatorModelContext: \"modelContext\" in navigator,\n    hasDocumentModelContext: \"modelContext\" in document,\n  }));\n\n  console.log(result);\n}\n\nmain().catch((error) => {\n  console.error(error);\n  process.exit(1);\n});\n```\n\nThis code opens Chrome, navigates to the game page, and checks if `modelContext` exists on `navigator` or `document`.\n\nOne important detail is that I am not using the bundled Chromium from Playwright. Instead, I am opening the real Chrome installed on my machine by using `launchPersistentContext` with `channel: \"chrome\"`. This matters because WebMCP is still experimental. In my case, the isolated Chromium browser did not discover the WebMCP tools correctly, while real Chrome with the enabled flag worked.\n\nNote: Because `launchPersistentContext` creates a local Chrome profile, do not forget to add this folder to `.gitignore`:\n\n```\n.chrome-agent-profile/\n```\n\nThe profile can contain local browser data such as cache, cookies, and other Chrome state. It should not be committed to the repository.\n\nThe first check only tells us whether `modelContext` exists. 
The next step is to read the tools exposed by the page.\n\nWe can do that by calling `modelContext.getTools()` inside the `page.evaluate()` method:\n\n``` js\n  const result = await page.evaluate(async () => {\n    const modelContext = navigator.modelContext;\n\n    if (!modelContext) {\n      return {\n        hasModelContext: false,\n        tools: [],\n      };\n    }\n\n    const tools = await modelContext.getTools();\n\n    return {\n      hasModelContext: true,\n      tools: tools.map((tool) => ({\n        name: tool.name,\n        description: tool.description,\n        inputSchema: tool.inputSchema,\n        origin: tool.origin,\n      })),\n    };\n  });\n```\n\nThis code returns the list of tools exposed by the current page. For each tool, I print basic metadata such as the name, description, input schema and origin.\n\nAt this point, it is useful to print the result as formatted JSON:\n\n```\nconsole.log(JSON.stringify(result, null, 2));\n```\n\nThis makes it easier to verify that Chrome discovered the WebMCP tools correctly.\n\nReading tools is useful, but the real goal is to execute them. In my game, one of the exposed tools is called `getGameState`. It returns the current state of the puzzle, including the map, remaining moves and collected wood. 
For the first test, I can find this tool by name and execute it directly:\n\n``` ts\nconst gameState = await page.evaluate(async () => {\n  const modelContext = (navigator as any).modelContext;\n\n  if (!modelContext) {\n    throw new Error(\"modelContext is empty\");\n  }\n\n  const tools = await modelContext.getTools();\n\n  const getGameStateTool = tools.find((tool: any) => tool.name === \"getGameState\");\n\n  if (!getGameStateTool) {\n    throw new Error(\"getGameState tool not found\");\n  }\n\n  return await modelContext.executeTool(getGameStateTool, \"{}\");\n});\n```\n\nThis proves that Playwright can open the page, access `modelContext`, find a WebMCP tool and execute it inside the browser context.\n\nHowever, hardcoding the tool execution like this is not ideal. The agent should be able to execute any tool by name, so I extracted the logic into a reusable helper function:\n\n``` ts\nimport type { Page } from \"playwright\";\n\nexport async function executeWebMcpTool<T>(\n  page: Page,\n  toolName: string,\n  args: unknown,\n): Promise<T> {\n  return await page.evaluate(\n    async ({ toolName, args }) => {\n      const modelContext =\n        (document as any).modelContext ?? (navigator as any).modelContext;\n      if (!modelContext) {\n        throw new Error(\"Model Context API is not available\");\n      }\n\n      const tools = await modelContext.getTools();\n\n      const tool = tools.find((tool: any) => tool.name === toolName);\n\n      if (!tool) {\n        throw new Error(`Tool not found: ${toolName}`);\n      }\n\n      const result = await modelContext.executeTool(tool, JSON.stringify(args));\n\n      return result;\n    },\n    { toolName, args },\n  );\n}\n```\n\nThis function receives a Playwright `Page`, the tool name and arguments. It then evaluates code inside the browser page, finds the matching WebMCP tool, serializes the arguments and executes the tool. 
With this helper, the Node.js code does not need to know the internal implementation of the page. It only needs the tool name and arguments.\n\nThat is the important bridge: Playwright controls Chrome, Chrome sees the WebMCP tools and our Node.js code can execute them.\n\nNote: In my setup, `navigator.modelContext` worked reliably, but WebMCP is still experimental, so in the reusable helper I check both `document.modelContext` and `navigator.modelContext`.\n\nNow we can connect the WebMCP tool execution with an AI model.\n\nFor this article, I want to keep the example small. The goal is not to build the full game-playing agent here. The goal is to prove the basic flow:\n\nThe full agent can build on top of this by sending the tool result back to the model and continuing the loop.\n\nFor this example, I use the `@google/genai` package. We already installed it earlier, so now we can create a small service for communicating with Gemini.\n\nCreate a new file called `genai.service.ts`:\n\n``` ts\nimport \"dotenv/config\";\nimport {\n  GoogleGenAI,\n  type Content,\n  type GenerateContentConfig,\n  type GenerateContentResponse,\n} from \"@google/genai\";\n\nexport type GenerateRequest = {\n  contents: Content[];\n  config?: GenerateContentConfig;\n};\n\nexport class GenaiService {\n  private readonly ai: GoogleGenAI;\n  private readonly model: string;\n\n  constructor(model: string = \"gemini-2.5-flash-lite\") {\n    this.model = model;\n    const apiKey = process.env.GEMINI_API_KEY;\n    if (!apiKey) {\n      throw new Error(\"Missing GEMINI_API_KEY in .env\");\n    }\n\n    this.ai = new GoogleGenAI({ apiKey });\n  }\n\n  public async generateContentAsync(\n    request: GenerateRequest,\n  ): Promise<GenerateContentResponse> {\n    const response = await this.ai.models.generateContent({\n      model: this.model,\n      contents: request.contents,\n      config: request.config,\n    });\n\n    return response;\n  }\n}\n```\n\nThe implementation is 
straightforward. The service reads `GEMINI_API_KEY` from the `.env` file, creates an instance of `GoogleGenAI` and exposes one method called `generateContentAsync`.\n\nI also created a small `GenerateRequest` type. The reason is simple: I only want to expose the properties that this example needs. The original SDK request type contains more options and for this proof of concept that would make the code harder to read.\n\nYou also need to create a `.env` file:\n\n```\nGEMINI_API_KEY=your-api-key\n```\n\nDo not forget to add the `.env` file to `.gitignore`, so you do not commit your API key to the repository.\n\nNow we can put everything together in `agent.ts`.\n\nIn this example, the tool definition is hardcoded. That keeps the proof of concept simple and easier to understand. In a more generic version, we could read WebMCP tools from the page and map them into Gemini tool declarations automatically. But that would add more code and I want this article to stay focused on the core idea.\n\n``` ts\nimport { chromium, type Page } from \"playwright\";\nimport { GenaiService } from \"./genai.service\";\nimport {\n  FunctionCallingConfigMode,\n  type Content,\n  type Tool,\n} from \"@google/genai\";\n\nexport const tools: Tool[] = [\n  {\n    functionDeclarations: [\n      {\n        name: \"getGameState\",\n        description:\n          \"Get the current board. visibleMap rows run top-to-bottom; each character is x=0 onward. P=player, .=land, W=tree, ~=water, B=bridge, R=rock, and G=goal.\",\n        responseJsonSchema: {\n          type: \"object\",\n          properties: {\n            remainingMoves: { type: \"number\" },\n            wood: { type: \"number\" },\n            visibleMap: {\n              type: \"array\",\n              items: { type: \"string\" },\n            },\n          },\n          required: [\"remainingMoves\", \"wood\", \"visibleMap\"],\n        },\n      },\n    ],\n  },\n];\n\nconst gameUrl = process.argv[2] ?? 
\"https://tower-before-dusk.gramli.workers.dev\";\n\nasync function main() {\n  const aiService = new GenaiService();\n  const context = await chromium.launchPersistentContext(\n    \"./.chrome-agent-profile\",\n    {\n      channel: \"chrome\",\n      headless: false,\n      args: [\"--enable-experimental-web-platform-features\"],\n    },\n  );\n\n  const page = await context.newPage();\n  await page.goto(gameUrl, { waitUntil: \"networkidle\" });\n\n  const contents: Content[] = [\n    {\n      role: \"user\",\n      parts: [\n        {\n          text: \"Inspect the current Tower Before Dusk game state.\",\n        },\n      ],\n    },\n  ];\n\n  const response = await aiService.generateContentAsync({\n    contents,\n    config: {\n      tools,\n      toolConfig: {\n        functionCallingConfig: {\n          mode: FunctionCallingConfigMode.ANY,\n          allowedFunctionNames: [\"getGameState\"],\n        },\n      },\n    },\n  });\n\n  const functionCall = response.functionCalls?.[0];\n  if (!functionCall?.name) {\n    throw new Error(\"Gemini did not return a tool call\");\n  }\n  if (functionCall.name !== \"getGameState\") {\n    throw new Error(`Gemini requested an unknown tool: ${functionCall.name}`);\n  }\n\n  console.log(\"Gemini tool call:\", functionCall);\n\n  const gameState = await executeWebMcpTool(\n    page,\n    functionCall.name,\n    functionCall.args ?? {},\n  );\n\n  console.log(\"Tool result:\", gameState);\n}\n\nmain().catch((error) => {\n  console.error(error);\n  process.exitCode = 1;\n});\n\nexport async function executeWebMcpTool<T>(\n  page: Page,\n  toolName: string,\n  args: unknown,\n): Promise<T> {\n  return await page.evaluate(\n    async ({ toolName, args }) => {\n      const modelContext =\n        (document as any).modelContext ?? 
(navigator as any).modelContext;\n      if (!modelContext) {\n        throw new Error(\"Model Context API is not available\");\n      }\n\n      const tools = await modelContext.getTools();\n\n      const tool = tools.find((tool: any) => tool.name === toolName);\n\n      if (!tool) {\n        throw new Error(`Tool not found: ${toolName}`);\n      }\n\n      const result = await modelContext.executeTool(tool, JSON.stringify(args));\n\n      return result;\n    },\n    { toolName, args },\n  );\n}\n```\n\nThe flow is simple:\n\nFirst, the script opens Chrome and navigates to the game page. Then it sends a prompt to Gemini together with the available tool definition. In this example, Gemini is allowed to call only one function: `getGameState`.\n\nAfter Gemini returns a function call, the script validates that the requested function is really `getGameState`. This is important because the application should never blindly execute arbitrary tool names returned by the model. Then the script passes the function name and arguments to `executeWebMcpTool`. The tool is executed inside the browser page through WebMCP and the result is printed to the console.\n\nAnd that is the proof of concept.\n\nOur Node.js script does not call the game directly. It opens the game in Chrome, lets Chrome discover the WebMCP tools, lets Gemini request a function call and then executes that function call against the page.\n\nThis small example proves that Playwright can be used as a bridge between an AI model and WebMCP tools.\n\nThe browser still owns the WebMCP context. The page still exposes the tools, but our external Node.js process can orchestrate the flow and connect those tools to a stronger model.\n\nThis is useful when the existing browser-based tooling is too limited, or when you want to experiment with your own agent loop.\n\nThe example in this article only executes one tool call. 
A real agent would need a loop:\n\nThat full implementation would make this article much longer, so I kept the article focused on the proof of concept.\n\nYou can find the source code here:\n\nIn this article, I showed how to use Playwright to create a custom proof of concept agent for WebMCP. First, I checked whether `modelContext` is available, then I discovered the exposed tools, executed one of them and finally connected the flow with Gemini function calling.\n\nOf course, this is not a fully autonomous agent yet, but it is the foundation for one.\n\nWebMCP is still experimental and the Model Context Tool Inspector is great for debugging. However, the available models can feel limiting for some types of web apps. I hope this approach can help others test WebMCP tools with stronger models without the need to create another Chrome extension.", "url": "https://wpnews.pro/news/build-a-minimal-webmcp-agent-with-playwright-and-gemini", "canonical_source": "https://dev.to/gramli/build-a-minimal-webmcp-agent-with-playwright-and-gemini-24fh", "published_at": "2026-07-01 06:58:36+00:00", "updated_at": "2026-07-01 07:18:38.767045+00:00", "lang": "en", "topics": ["ai-agents", "developer-tools", "large-language-models", "generative-ai"], "entities": ["Playwright", "Gemini", "WebMCP", "Model Context Tool Inspector", "Chrome", "Node.js", "Google"], "alternates": {"html": "https://wpnews.pro/news/build-a-minimal-webmcp-agent-with-playwright-and-gemini", "markdown": "https://wpnews.pro/news/build-a-minimal-webmcp-agent-with-playwright-and-gemini.md", "text": "https://wpnews.pro/news/build-a-minimal-webmcp-agent-with-playwright-and-gemini.txt", "jsonld": "https://wpnews.pro/news/build-a-minimal-webmcp-agent-with-playwright-and-gemini.jsonld"}}