How to Automate the ChatGPT & Gemini Web UIs Without an API Key

A developer built a library called Hermex to automate the free web UIs of ChatGPT and Gemini without using an API key. The library handles challenges like sending messages character by character and uploading files by manipulating hidden input elements. It uses Selenium with undetected-chromedriver to drive the single-page apps.

You've got a folder of a few hundred screenshots and you want the text out of each one. Or you want to generate a batch of images for a side project. Or you just want to drop a single "summarize this" call into a script you're writing on a Sunday afternoon.

So you open the pricing page for the official API, do the math on per-token billing, add the hassle of setting up keys and a payment method, and find it hard to justify, because the exact same model will do the exact same thing for free in a browser tab.

There are really two ways to get a model like ChatGPT or Gemini to do work for you. The web UI is free (or already covered by a subscription you're paying for anyway), but you drive it by hand. The API is scriptable, but you pay by the token.

Most of the time that trade-off is fine. But for a whole category of work, like hobby projects, throwaway scripts, research, or anything that doesn't need production-grade reliability, you're stuck choosing between "free but manual" and "automated but paid."

Which raises the obvious question: why not automate the free web UI? It's just a webpage. You open it, type in the box, click send.

As it turns out, that simplicity hides a few fiddly problems. I ran into them enough times that I eventually built a small library, Hermex (https://github.com/pseudo-usama/hermex), to handle them.

In this article we'll work through what it takes to automate these UIs, and at the end I'll show how little code it all comes down to.
A single round trip with ChatGPT or Gemini breaks down into four jobs:

Every one of these is harder than it sounds, because the page is a modern single-page app that was never built to be driven by a script. We'll use Selenium with undetected-chromedriver, and for now assume the browser is already open (we'll get to launching it in the next section). To keep the code readable, I'll show whichever of the two platforms makes each problem clearest and mention the other where it differs.

The first surprise is that the input isn't a normal text field you can drop a string into. On ChatGPT it's a `contenteditable` div, and on Gemini it's a custom `rich-textarea` element. You can still send keystrokes to it, but two things will trip you up. First, a plain Enter submits the message, so any newline inside your prompt has to go in as Shift+Enter. Second, emoji and other characters outside the Basic Multilingual Plane quietly break `send_keys`, so those need to be inserted through JavaScript instead. That pushes you toward sending the message one character at a time:

```python
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

# Assumes `driver` is an already-open Selenium session on the ChatGPT page
# and `message` holds the prompt text.
box = driver.find_element(By.CSS_SELECTOR, 'div[contenteditable="true"]')
box.click()  # focus the input before typing

for char in message:
    if char == "\n":
        # A plain Enter would send the message early
        box.send_keys(Keys.SHIFT, Keys.ENTER)
    else:
        box.send_keys(char)
```

Gemini works the same way, just against the `rich-textarea` element instead of the `contenteditable` div.

This is where it gets interesting. The file