{"slug": "how-i-automated-markdown-docs-from-ui-screenshots-using-ai", "title": "How I automated markdown docs from UI screenshots using AI", "summary": "A developer built a Python script that converts UI screenshots into markdown documentation using any OpenAI-compatible AI model. The script, which is model-agnostic and self-hostable, was created to automate documenting a React component library with 40+ components. It avoids vendor lock-in and high costs by allowing users to plug in different AI endpoints.", "body_md": "Last month I was knee-deep in documenting a React component library I’d been building for six months. The library had 40+ components, each with 5–10 props, and I wanted to show actual UI screenshots alongside code examples. Taking those screenshots manually was a drag — but so was writing alt text and prop tables from scratch.\n\nI thought: surely there’s a tool that turns a screenshot into a markdown snippet with the component name, props, and description. So I went hunting.\n\nFirst, I tried the obvious: OCR + regex. Take a screenshot, run Tesseract, then parse the text for component names and props. That failed miserably because:\n\nNext, I looked at cloud-based AI documentation generators. Most required me to upload my entire component library, integrate with their SDK, and pay per component. I didn’t want vendor lock-in. I also didn’t want to share my codebase with a third party just to get docs.\n\nThen I tried a public multimodal model API like OpenAI’s GPT-4o. It worked — but the cost stacked up fast when processing 40+ screenshots multiple times during iteration. Plus, managing API keys and tokens for every teammate became a mess.\n\nI needed something cheap, self-hostable, and flexible. The idea was: write a small Python script that reads a screenshot file, sends it to any AI model that accepts images, and returns structured markdown. 
The script itself is the star; the AI endpoint is just a pluggable option.\n\nHere’s the approach:\n\nThe key is that the same script works with OpenAI, Claude, local models via Ollama, or even a custom endpoint like the one at `ai.interwestinfo.com` (I tried it as a fallback), as long as the endpoint accepts the OpenAI chat-completions request format. The technique is model-agnostic.\n\n```python\n#!/usr/bin/env python3\n\"\"\"\nScreenshot to Markdown documentation generator.\nWorks with any OpenAI-compatible API.\n\"\"\"\n\nimport os\nimport sys\nimport base64\nimport requests\nfrom pathlib import Path\n\ndef encode_image(image_path):\n    \"\"\"Read an image file and return its contents as a base64 string.\"\"\"\n    with open(image_path, \"rb\") as f:\n        return base64.b64encode(f.read()).decode(\"utf-8\")\n\ndef image_to_markdown(image_path, api_key, endpoint=\"https://api.openai.com/v1/chat/completions\"):\n    \"\"\"Convert an image to markdown via an AI model.\"\"\"\n    base64_image = encode_image(image_path)\n\n    prompt = (\n        \"You are a UI documentation expert. Given a screenshot of a React component, \"\n        \"generate a markdown description. Start with a second-level heading containing \"\n        \"the component name. Then write a short description. Then create a table with \"\n        \"columns: Prop Name, Type, Default, Description. If you cannot determine a prop, \"\n        \"write N/A. 
Output only the markdown.\"\n    )\n\n    headers = {\n        \"Content-Type\": \"application/json\",\n        \"Authorization\": f\"Bearer {api_key}\"\n    }\n\n    payload = {\n        \"model\": \"gpt-4o\",  # swap to other models here\n        \"messages\": [\n            {\n                \"role\": \"user\",\n                \"content\": [\n                    {\"type\": \"text\", \"text\": prompt},\n                    {\n                        \"type\": \"image_url\",\n                        \"image_url\": {\n                            # The data URL assumes a PNG screenshot\n                            \"url\": f\"data:image/png;base64,{base64_image}\",\n                            \"detail\": \"low\"\n                        }\n                    }\n                ]\n            }\n        ],\n        \"max_tokens\": 500\n    }\n\n    # A timeout keeps the script from hanging on an unresponsive endpoint\n    response = requests.post(endpoint, headers=headers, json=payload, timeout=60)\n    if response.status_code != 200:\n        raise RuntimeError(f\"API error {response.status_code}: {response.text}\")\n\n    return response.json()[\"choices\"][0][\"message\"][\"content\"]\n\nif __name__ == \"__main__\":\n    if len(sys.argv) < 2:\n        print(\"Usage: python screenshot2docs.py <image.png>\")\n        sys.exit(1)\n\n    image_path = sys.argv[1]\n    if not Path(image_path).exists():\n        print(f\"File not found: {image_path}\")\n        sys.exit(1)\n\n    api_key = os.getenv(\"AI_API_KEY\")\n    if not api_key:\n        print(\"Set AI_API_KEY environment variable.\")\n        sys.exit(1)\n\n    md = image_to_markdown(image_path, api_key)\n    # Save to a file with same name but .md extension\n    out_path = Path(image_path).with_suffix(\".md\")\n    out_path.write_text(md, encoding=\"utf-8\")\n    print(f\"Documentation saved to {out_path}\")\n```\n\n1. Install `requests` (`pip install requests`).\n2. Set the `AI_API_KEY` environment variable (e.g., an OpenAI key, or the key for any compatible endpoint).\n3. Run `python screenshot2docs.py button.png`.\n4. Review the generated `button.md` to fix any errors.\n\nThis approach is lightweight, but it’s not perfect. 
Let me be honest:\n\n- Screenshots are processed one at a time; larger batches could be parallelized with `ThreadPoolExecutor`.\n\nIf I were to take this further, I’d build a small web frontend where I can drag and drop screenshots, see the generated markdown inline, and edit it before saving. The script works for batch runs, but interactivity helps with review. I’d also add a “model selector” dropdown to switch between endpoints on the fly.\n\nAlso, I’d write a deduplication layer: if two component variants look similar (e.g., primary/secondary buttons), the second generation tends to copy the first. Better to hash each image and check a cache before calling the API.\n\nAutomating documentation from screenshots saved me about 10 hours for this library. The technique of using a generic multimodal AI endpoint to generate structured data from images is reusable beyond docs: you could use it for design handoff specs, bug report screenshots, or auto-generating alt text for your blog.\n\nNow I’d love to hear: **What’s your go-to method for generating docs from visuals? Have you tried a similar image-to-markdown pipeline, or do you have a completely different workflow?**", "url": "https://wpnews.pro/news/how-i-automated-markdown-docs-from-ui-screenshots-using-ai", "canonical_source": "https://dev.to/__c1b9e06dc90a7e0a676b/how-i-automated-markdown-docs-from-ui-screenshots-using-ai-obg", "published_at": "2026-06-14 13:04:16+00:00", "updated_at": "2026-06-14 13:40:59.412679+00:00", "lang": "en", "topics": ["developer-tools", "artificial-intelligence", "large-language-models", "computer-vision"], "entities": ["OpenAI", "GPT-4o", "Claude", "Ollama", "Tesseract", "React"], "alternates": {"html": "https://wpnews.pro/news/how-i-automated-markdown-docs-from-ui-screenshots-using-ai", "markdown": "https://wpnews.pro/news/how-i-automated-markdown-docs-from-ui-screenshots-using-ai.md", "text": "https://wpnews.pro/news/how-i-automated-markdown-docs-from-ui-screenshots-using-ai.txt", "jsonld": "https://wpnews.pro/news/how-i-automated-markdown-docs-from-ui-screenshots-using-ai.jsonld"}}