How I automated markdown docs from UI screenshots using AI

A developer built a Python script that converts UI screenshots into markdown documentation using any OpenAI-compatible AI model. The script, which is model-agnostic and self-hostable, was created to automate documenting a React component library with 40+ components. It avoids vendor lock-in and high costs by allowing users to plug in different AI endpoints.

Last month I was knee-deep in documenting a React component library I’d been building for six months. The library had 40+ components, each with 5–10 props, and I wanted to show actual UI screenshots alongside code examples. Taking those screenshots manually was a drag, but so was writing alt text and prop tables from scratch.

I thought: surely there’s a tool that turns a screenshot into a markdown snippet with the component name, props, and description. So I went hunting.

First, I tried the obvious: OCR + regex. Take a screenshot, run Tesseract, then parse the text for component names and props. That failed miserably because:

Next, I looked at cloud-based AI documentation generators. Most required me to upload my entire component library, integrate with their SDK, and pay per component. I didn’t want vendor lock-in. I also didn’t want to share my codebase with a third party just to get docs.

Then I tried a public multimodal model API like OpenAI’s GPT-4o. It worked, but the cost stacked up fast when processing 40+ screenshots multiple times during iteration. Plus, managing API keys and tokens for every teammate became a mess.

I needed something cheap, self-hostable, and flexible. The idea was: write a small Python script that reads a screenshot file, sends it to any AI model that accepts images, and returns structured markdown. The script itself is the star; the AI endpoint is just a pluggable option.
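One way to picture the "pluggable endpoint" idea is the sketch below. It is only an illustration under stated assumptions, not the script from this post: the environment variable names (`S2D_ENDPOINT`, `S2D_MODEL`, `S2D_API_KEY`) and the helpers `EndpointConfig`, `load_endpoint_config`, and `build_request` are invented for the example, and the local URL assumes Ollama's OpenAI-compatible endpoint on its default port with a vision model such as `llava` already pulled.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class EndpointConfig:
    """The only things that change between providers (hypothetical helper)."""
    url: str
    model: str
    api_key: str


def load_endpoint_config(env=os.environ):
    """Read provider settings from a mapping; the variable names are assumptions."""
    return EndpointConfig(
        url=env.get("S2D_ENDPOINT", "https://api.openai.com/v1/chat/completions"),
        model=env.get("S2D_MODEL", "gpt-4o"),
        api_key=env.get("S2D_API_KEY", ""),
    )


def build_request(config, prompt, image_b64, mime="image/png"):
    """Return (headers, payload) for an OpenAI-style chat request with one image."""
    headers = {"Content-Type": "application/json"}
    if config.api_key:  # many local servers accept requests without a key
        headers["Authorization"] = f"Bearer {config.api_key}"
    payload = {
        "model": config.model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:{mime};base64,{image_b64}"},
                    },
                ],
            }
        ],
        "max_tokens": 500,
    }
    return headers, payload


# Example: point the same request builder at a local model instead of OpenAI.
local = load_endpoint_config(
    {
        "S2D_ENDPOINT": "http://localhost:11434/v1/chat/completions",
        "S2D_MODEL": "llava",
    }
)
headers, payload = build_request(local, "Describe this component.", "aGVsbG8=")
print(local.url, payload["model"])
```

Keeping the provider details in one small config object means switching models is a matter of changing three values, while the prompt and the request shape stay untouched.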
Here’s the approach:

The key is that the same script works with OpenAI, Claude, local models via Ollama, or even a custom endpoint like the one at ai.interwestinfo.com (I tried it as a fallback). The technique is model-agnostic.

```python
#!/usr/bin/env python3
"""
Screenshot to Markdown documentation generator.
Works with any OpenAI-compatible API.
"""
import os
import sys
import base64
import requests
from pathlib import Path


def encode_image(image_path):
    """Read an image file and return its contents as a base64 string."""
    with open(image_path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")


def image_to_markdown(image_path, api_key, endpoint="https://api.openai.com/v1/chat/completions"):
    """Convert an image to markdown via an AI model."""
    base64_image = encode_image(image_path)
    # Adjacent string literals inside the parentheses are joined into one prompt.
    prompt = (
        "You are a UI documentation expert. Given a screenshot of a React component, "
        "generate a markdown description. Start with a second-level heading containing "
        "the component name. Then write a short description. Then create a table with "
        "columns: Prop Name, Type, Default, Description. If you cannot determine a prop, "
        "write N/A. Output only the markdown."
    )
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    payload = {
        "model": "gpt-4o",  # swap to other models here
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:image/png;base64,{base64_image}",
                            "detail": "low",
                        },
                    },
                ],
            }
        ],
        "max_tokens": 500,
    }
    response = requests.post(endpoint, headers=headers, json=payload)
    # Anything other than HTTP 200 is treated as a failure.
    if response.status_code != 200:
        raise Exception(f"API error {response.status_code}: {response.text}")
    return response.json()["choices"][0]["message"]["content"]


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("Usage: python screenshot2docs.py
```