Stop Reading Documentation. Start Reading GitHub Issues.

A developer has created a checklist for mining GitHub issues to extract UX research findings from engineering conversations. The framework involves coding each issue with metadata such as user persona, product stage, and challenge type, turning raw anecdotes into a defensible dataset. The approach claims to reveal patterns invisible to casual reading, such as identifying the highest-priority UX problems by mapping multiple issues to the same stage and challenge type.

Reading the docs isn't enough. The most valuable developer feedback lives inside GitHub issues, bug reports, and feature discussions. In this article, I share the checklist and coding framework I use to mine repositories, uncover real developer pain points, and turn engineering conversations into meaningful UX research findings.

Reading GitHub issues without a coding framework produces impressions. Coding them produces findings. When you code an issue, you are making an explicit analytical decision about what type of signal it contains. You are saying: "This issue is evidence of a feedback gap at Stage 4 (model loading), filed by an ML engineer at the new-user experience level, in KServe v0.11, and it directly answers my research question about what information engineers need during model loading that the product currently does not provide."

That sentence, assembled from your coding decisions, is a finding. Multiply it across 100 issues and you have a research study. Without coding, you have 100 anecdotes. With coding, you have a dataset.

Coding makes patterns visible. When you code 20 issues and notice that 14 of them map to the same stage and the same challenge type, you have found your highest-priority UX problem. You would never see that pattern by reading casually.

It makes your findings defensible. "Engineers struggle with KServe" is an opinion. "18 issues across v0.10–v0.13 filed by engineers in their first deployment show identical feedback gap patterns at Stage 4, with an average of 14 comments per issue" is a finding.

It separates your role from the engineer's role. Engineers read GitHub issues as bug reports. You read them as evidence of design decisions. The coding framework is the analytical lens that makes your UX reading possible.

Before any analysis, capture the foundational data for every issue. This establishes the quantitative baseline for your research report.
Issue Metadata Checklist
─────────────────────────────────────────────
Issue URL / Number: e.g. 1234
Issue creation date
Issue category: Bug | Question | Feature Request | Docs
Labels: e.g. InferenceService, Control Plane, Kubernetes
Current state: Open | Closed | Stale | Merged
Resolution type: Code Fix | Docs Update | Workaround | No resolution
Total comment count
Unique commenter count
Ping-pong count: back-and-forth before root cause found
Brief issue summary

The "ping-pong count" (the number of back-and-forth diagnostic comments between the user and maintainers before the root cause was identified) is a particularly powerful metric. High ping-pong means the product gave the user no diagnostic signal. That is a UX failure in the product itself.

Understanding user demographics is essential in any research project because it tells you whose problems to prioritise. In interviews or surveys, you simply ask. In GitHub mining, you have to infer from signals embedded in the issue itself. In the context of KServe, we are primarily trying to distinguish between three main personas. The language engineers use tells you their world immediately.
| Persona | Typical keywords | What they focus on |
|---|---|---|
| Data Scientist / ML Engineer | PyTorch, TensorFlow, HuggingFace, weights, predictor, artifact, S3, inputs/outputs | The model itself: getting a Python script to serve predictions |
| Platform / DevOps Engineer | CRDs, Istio, Knative, ingress, RBAC, service account, Helm, multi-cluster, HPA | Infrastructure, networking, security, cluster stability |
| Application Developer | REST API, gRPC, JSON payload, curl, SDK, timeout, 503 error, endpoint | Consuming the model: integrating the endpoint into a larger application |

KServe issues typically fall into three abstraction layers. Issues built around kubectl describe outputs, cluster events, Helm values files, or Istio configurations come from engineers who speak in Kubernetes YAML.

Quick triage question: Is this person treating KServe as an infrastructure component or as a model-delivery tool? Infrastructure → Platform/DevOps. Model-delivery → Data Scientist/ML Engineer.

Demographics Checklist
─────────────────────────────────────────────
Inferred user persona
  Data Scientist / ML Engineer
  Platform / DevOps Engineer
  Application Developer
  ML-expert / K8s-novice (the hardest edge case)
  Unclear
Experience level
  New user: first issues, plain English, no version info
  Experienced: provides full env, logs, rules out causes
  Unclear
Deployment environment
  Local: Kind / Minikube
  Cloud managed: EKS / GKE / AKS
  On-premises / bare metal
Deployment scale: Single-cluster | Multi-cluster | Multi-tenant
Deployment method: Helm | Kustomize | ArgoCD / GitOps | Direct kubectl

The hardest edge case: ML-experienced / Kubernetes-novice engineers. They write technically confident issues about model formats or serving runtimes, but are completely confused about Istio or Knative. Always code these as a separate category, because they reveal a completely different class of UX failure.

The key insight for non-technical UX researchers: you do not need to understand the technical content of an issue to identify its UX signal.
The signals are in the language, not the configuration. Here is how to read any issue in under two minutes.

Scan for time words before you read anything else.

| Time word | Severity | What it means |
|---|---|---|
| "minutes" | Low | Minor gap, quickly resolved |
| "hours" | Medium | Significant friction, real work lost |
| "days" | High | Severe friction, deadline impact |
| "weeks" / "months" | Critical | Product-level failure |
| "gave up" / "switching to X" | Abandonment | User is leaving |

Real example: "I've spent the last three days trying to figure out why my model stays in Unknown status." You do not need to know what Unknown status means. "Three days" tells you this is a high-severity finding.

Every friction-revealing issue has the word "but" at a specific moment. Everything before "but" is what the engineer did correctly. Everything after is where the product failed them.

Real example: "I followed the quickstart exactly but the webhook never became Ready." Before "but": the user followed instructions. After "but": the product gave instructions that led to failure. That is a documentation UX finding, not a technical bug.

Real example: "The status shows Ready but every curl request returns 503." This is the "misleading success" friction type: the product declared success when the user's actual goal was completely unmet. It is one of the most trust-destroying UX failures possible.

Phrases such as "I thought", "I assumed", and "I expected" directly reveal a mental model gap: the engineer built an incorrect picture of how the system works, and reality contradicted it.

Real example: "I thought Transformer meant a language model component like BERT. Turns out it's just data preprocessing." This is not a bug. It is a naming failure. "Transformer" means attention-based neural architecture in the ML world. KServe uses it to mean "a component that preprocesses data before the model." Every ML engineer who encounters this name builds the wrong mental model from it.
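The time-word scan and the "but" split described above are mechanical enough to rough out in code as a first pass before you read. The following is a minimal sketch, not the tooling used in this study: the function names and regular expressions are assumptions made for illustration, and only the severity labels come from the table above. A keyword match will misfire on sentences such as "I upgraded three days ago", so treat the output as a prompt for closer reading, not as a coding decision.

```python
import re

# Severity labels follow the time-word table; the patterns themselves are
# illustrative assumptions. Ordered from most to least severe so the
# strongest signal in an issue wins.
TIME_WORD_SEVERITY = [
    (r"\bgave up\b|\bswitching to\b", "Abandonment"),
    (r"\bweeks?\b|\bmonths?\b", "Critical"),
    (r"\bdays?\b", "High"),
    (r"\bhours?\b", "Medium"),
    (r"\bminutes?\b", "Low"),
]


def time_word_severity(text):
    """Return the most severe time-word label found in the text, or None."""
    lowered = text.lower()
    for pattern, label in TIME_WORD_SEVERITY:
        if re.search(pattern, lowered):
            return label
    return None


def split_on_but(sentence):
    """Split a sentence at its first standalone "but".

    Returns (what the engineer did, where the product failed them),
    or None when the sentence contains no standalone "but".
    """
    parts = re.split(r"\bbut\b", sentence, maxsplit=1, flags=re.IGNORECASE)
    if len(parts) != 2:
        return None
    before, after = (part.strip(" ,.") for part in parts)
    return before, after


quote = "I followed the quickstart exactly but the webhook never became Ready."
print(split_on_but(quote))
```

Keeping the two halves of the "but" sentence as separate fields makes it easier to record, per issue, what the user did correctly next to where the product failed them.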
Real example: "I assumed KServe would track model versions automatically, like a proper ML serving platform should." This is scope confusion: the engineer's expectation of what KServe is does not match what it actually is. That mismatch is a design communication failure, not a user error.

Each "I had to" signals a missing workflow step, something the engineer needed that the product should have provided.

Real example: "I had to write a polling script to check when the InferenceService became Ready, because there's no built-in wait command." Count the "I had to" chains in an issue. An issue with four of them put four separate manual burdens on an engineer for a task that should have been automated.

Scan for capitalized tool names: Knative, Istio, Prometheus, MLflow, Argo, Triton, HuggingFace. Count them.

When an engineer writes "sorry if this is a basic question", that is not politeness. That is evidence the product made a competent person feel responsible for the product's communication failure.

Once you have your basic metadata and demographics, here is the full coding structure to apply to every issue. Map every issue to one of 8 stages. This tells you where in the journey the product is losing engineers.

Deployment Stage Checklist
─────────────────────────────────────────────
Stage 1 · Setup: Installing KServe and dependencies
  Signals: "webhook not ready" · "quickstart fails"
Stage 2 · Storage: Getting the trained model accessible
  Signals: "access denied" · "model not found" · "storageUri"
Stage 3 · Configuration: Writing the InferenceService YAML spec
  Signals: "minimum config?" · "deprecated field" · "required fields?"
Stage 4 · Loading: Applying config and waiting for model to load
  Signals: "Unknown status" · "how long?" · "no logs" · "OOMKilled"
Stage 5 · Network: Reaching the deployed endpoint
  Signals: "Ready but 503" · "connection refused" · "EXTERNAL-IP pending"
Stage 6 · Inference: Sending requests and getting predictions back
  Signals: "400 error" · "what format?" · "V1 vs V2 protocol"
Stage 7 · Hardening: Making the deployment production-reliable
  Signals: "zero downtime update" · "autoscaling conflict" · "SLA"
Stage 8 · Day-2 ops: Updating, monitoring, governing over time
  Signals: "rollback" · "update my model" · "60 models across teams"
Cross-stage? Root cause in Stage X, discovered at Stage Y (delayed discovery = highest severity finding)

The most important finding: Stages 4 and 5 consistently produce the highest issue volume in KServe. The product is completely silent at the two moments when engineers are most anxious and most blind.

Usability Challenge Checklist
─────────────────────────────────────────────
U1 · Learnability breakdown: Cannot figure out how to do the task the first time
  Signal: "how do I" · already answered in docs · confused by concepts
U2 · Error recovery failure: Hits error, can't understand it, doesn't know which log to check
  Signal: pastes cryptic error, "stuck for days", tries random things
U3 · Feedback & visibility gap: System gives no signal (Unknown, Pending, complete silence)
  Signal: "nothing happens" · "how long should this take?" · "no logs"
U4 · Configuration complexity: Too many fields, unclear defaults, no minimum viable spec
  Signal: "is all of this needed?" · "which fields are required?"
U5 · Mental model mismatch: Expectation contradicts how system actually works
  Signal: "I expected" · "I thought" · "this makes no sense"
U6 · Workaround proliferation: User invented their own solution to fill a product gap
  Signal: "I wrote a script" · "I had to" · shared snippets in comments

Developer Friction Checklist
─────────────────────────────────────────────
F1 · Invisible wall: System silent, nothing to debug
F2 · Misleading success: "Ready" but goal completely unmet
F3 · Hidden prerequisite: Required knowledge never communicated until failure
F4 · Terminology confusion: Word means something different in this context
F5 · Broken feedback loop: Can't tell if a change had any effect
F6 · Forced context switch: Must configure Istio/Knative to complete one KServe task
F7 · Documentation gap: Knows what they want, can't find how to do it
F8 · Accumulated friction: 5–6 small frictions in sequence → abandonment signal

System Challenge Checklist
─────────────────────────────────────────────
Ownership ambiguity: KServe says "that's Istio", Istio says "that's KServe"
Abstraction leakage: InferenceService was meant to hide Knative/Istio; it doesn't
Observability gap: Logs scattered across 4+ components; no unified view
Role boundary collision: ML engineer task structurally requires platform engineer action
Upgrade path fragility: Every version upgrade risks production breakage

Environmental Challenge Checklist
─────────────────────────────────────────────
Managed K8s divergence: EKS, GKE Autopilot, OpenShift behave differently
Corporate proxy / air-gap: No public internet; private registry; air-gapped
GPU & hardware: OOMKilled, VRAM insufficient, driver mismatch
Org security policy: OPA, Gatekeeper, PodSecurityAdmission blocking KServe
On-premises / hybrid: No managed LoadBalancer, NFS storage, bare metal
Regulated / compliance: HIPAA, SOC2, GDPR, data residency requirements

Version tracking is what transforms your research from a snapshot into a longitudinal UX health report.
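Once every issue carries a stage code and a usability challenge code, the "14 of 20 issues map to the same stage and challenge type" observation becomes a simple count. Here is a minimal sketch of that tally. The `CodedIssue` record and the `top_pattern` helper are hypothetical names invented for this illustration, and the sample rows are made-up data, not findings from the study.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class CodedIssue:
    """One coded issue; the fields mirror a subset of the checklists above."""
    number: int
    stage: int        # 1-8, from the Deployment Stage Checklist
    challenge: str    # "U1".."U6", from the Usability Challenge Checklist
    comments: int     # total comment count from the metadata checklist


def top_pattern(issues):
    """Return the most frequent (stage, challenge) pair with supporting counts."""
    if not issues:
        return None
    counts = Counter((issue.stage, issue.challenge) for issue in issues)
    (stage, challenge), count = counts.most_common(1)[0]
    matching = [i for i in issues if (i.stage, i.challenge) == (stage, challenge)]
    return {
        "stage": stage,
        "challenge": challenge,
        "count": count,
        "total": len(issues),
        "avg_comments": sum(i.comments for i in matching) / count,
    }


# Invented sample rows, for illustration only.
sample = [
    CodedIssue(number=1, stage=4, challenge="U3", comments=10),
    CodedIssue(number=2, stage=4, challenge="U3", comments=20),
    CodedIssue(number=3, stage=4, challenge="U3", comments=12),
    CodedIssue(number=4, stage=5, challenge="U3", comments=7),
    CodedIssue(number=5, stage=2, challenge="U2", comments=3),
]
print(top_pattern(sample))
```

The returned counts are the raw material for a finding statement of the form "N of M issues show challenge U at Stage S, with an average of C comments per issue". In practice the same record would also carry persona, friction type, and version fields so the tally can be filtered by any of them.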
Version Tracking Checklist
─────────────────────────────────────────────
KServe version (exact): e.g. 0.11.2
Previous version (if upgrade): e.g. 0.10 → 0.11
Kubernetes version: e.g. 1.27
Cloud provider: EKS / GKE / AKS / On-prem / Local
Version stated by: User upfront | Maintainer had to ask | Never provided
Upgrade experience: Better | Same | Worse | New regression introduced
Chronic pain signal: Same issue present in prior version? Yes / No

The chronic problem list (friction points that appear in the top 3 across three or more versions) is your most powerful finding. A problem that survived three release cycles is not a bug. It is an architectural decision.

LLM Inference Checklist
─────────────────────────────────────────────
Is this an LLM issue? Yes | No | Hybrid
LLM model family: Llama | Mistral | Qwen | Gemma | Custom
LLM runtime: vLLM | TGI | OpenAI-compatible | Custom
Capability attempted:
  Basic inference
  Streaming tokens (SSE)
  Multi-GPU / tensor parallelism
  LoRA / adapter serving
  HuggingFace Hub authentication
  OpenAI API compatibility
  Quantisation (GPTQ, AWQ)
LLM-specific challenge:
  GPU OOM / VRAM insufficient
  Model loading with no progress signal
  Streaming failure through gateway
  HuggingFace auth failing in cluster
  Runtime version lag behind vLLM/TGI ecosystem
  No LLM-specific metrics (token throughput, TTFT)
Innovation lag signal:
  Date of capability request:
  Date KServe released support:
  Gap (days):

For each issue, record which research question it provides evidence for. This anchors your mining to your study goals.
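Two of the version-tracking calculations above reduce to a few lines of arithmetic: the chronic problem list (friction types in the top 3 for three or more versions) and the innovation lag gap in days. The sketch below is one possible way to compute them from coded issues. The function names and the `(version_band, friction_label)` input shape are assumptions made for illustration, and the dates in the usage line are invented.

```python
from collections import Counter, defaultdict
from datetime import date


def chronic_problems(coded, top_n=3, min_versions=3):
    """Return friction labels ranked in the top_n for at least min_versions bands.

    coded is an iterable of (version_band, friction_label) pairs, one per issue.
    Ties at the top_n cutoff are broken by first appearance, so review close
    calls by hand.
    """
    per_version = defaultdict(Counter)
    for version, friction in coded:
        per_version[version][friction] += 1

    appearances = Counter()
    for counts in per_version.values():
        for friction, _ in counts.most_common(top_n):
            appearances[friction] += 1

    return sorted(f for f, n in appearances.items() if n >= min_versions)


def innovation_lag_days(requested, released):
    """Days between a capability request and the release that supported it."""
    return (released - requested).days


# Invented dates, for illustration only.
print(innovation_lag_days(date(2023, 1, 10), date(2023, 4, 10)))
```

Computing the top 3 per version band first, and only then counting appearances across bands, keeps a single noisy release from dominating the chronic list.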
Research Question Mapping
─────────────────────────────────────────────
Current-state questions (what is broken today)
  RQ1 · First deployment challenges across roles and experience levels
  RQ2 · Workflow gaps between deployed model and reliable production
  RQ3 · Observability and debugging challenges by stage
  RQ4 · LLM deployment challenges vs classical ML serving
  RQ5 · Environmental factors shaping deployment experience
  RQ6 · How challenges evolved across versions (your unique longitudinal contribution)
  RQ7 · Design changes that would most reduce friction
UX improvement questions (what should be designed differently)
  UX1 · Time-to-first-inference reduction
  UX3 · Model loading progress visibility (highest volume finding)
  UX4 · Self-service diagnostic experience
  UX9 · LLM mental model bridge (vLLM/HuggingFace → KServe)
  UX11 · Environment validator / dependency pre-flight checker

Once you spot a pattern across multiple issues, record it here. One template per pattern, not per issue.

UX Finding Template
─────────────────────────────────────────────────────────────
Finding type:
Affected users: Role · Experience level · Version band
Deployment stage:
Evidence: N issues · Date range · e.g. "14 issues, 2022–2024"
Best quote: Under 25 words (your strongest evidence)
─────────────────────────────────────────────────────────────
UX finding statement:
"Engineers doing [X] cannot accomplish [Y] because [design gap Z], which means [impact on time / confidence / adoption]."
─────────────────────────────────────────────────────────────
Severity: Low | Medium | High | Critical
Chronic? Present across versions
Design recommendation:
Research question answered:

Mining Session Completion Checklist
─────────────────────────────────────────────
Issue coverage
  50+ general deployment issues coded across all 8 stages
  30+ LLM inference issues coded and version-tracked
  10+ issues per major version band (v0.10, v0.11, v0.12, v0.13+)
  15+ upgrade issues with before/after UX delta recorded
  Top 30 most-commented issues reviewed (sort:comments-desc)
  10+ abandoned issues (open 6+ months, last message unanswered)
  15+ success cases (quickly resolved; positive signal baseline)
  All enhancement/feature-request labels reviewed
  All competitive tool mentions captured (BentoML, Seldon, Ray Serve)
Baseline measurement
  Metrics table filled per version band: Issue count | Avg comments | 7-day resolution rate | Emotional language %
  Top-3 friction points per version; chronic problem list built
  LLM innovation lag calculated for 5+ capabilities
  Version reporting rate: % of issues that include version upfront
Quality
  15% of issues coded by a second researcher (inter-rater reliability)
  Every research question has at least 3 issues as evidence
  One finding statement written per significant pattern found

The biggest barrier to studying developer tools as a UX researcher is the assumption that you need to understand the code to understand the problem. This framework removes that barrier entirely. When you code an issue, you are not evaluating the correctness of someone's Kubernetes configuration. You are recording what the issue reveals about the human experience of using the product. "Status shows Unknown for 20 minutes" tells you everything you need regardless of whether you understand what Unknown means technically. The product left a user without feedback during its most critical operation. That is a UX finding independent of any technical knowledge.

The coding sheet converts anecdote into pattern.
Instead of "users seem to struggle with deployment," you can say "14 of 20 issues sampled from v0.11 show feedback gap failures at Stage 4, with an average of 18 comments per issue, suggesting loading state communication is the highest-priority improvement target for this version band." That is a product roadmap argument. The coding sheet built it.

The research questions embedded in the coding sheet map directly to design recommendations with evidence behind them. A contributor who wants to make a meaningful impact on user experience now has specific, evidence-backed targets: not "improve docs" but "add granular status conditions per loading phase that distinguish between 10 failure modes currently all reporting as Unknown."

When researchers share findings publicly in CNCF blogs, KubeCon talks, or articles like this one, the coding framework makes the research reproducible. Other researchers can apply the same checklist to a different version or a different tool and compare results. That cumulative body of evidence is what eventually changes product direction.

GitHub issues are not a bug tracker. They are a longitudinal, naturalistic record of where real engineers encounter the gap between what a product promises and what it delivers. A coding sheet is the analytical framework that transforms that record into research. Without it, you are reading. With it, you are studying.

The framework I built for KServe (covering demographics, deployment stages, usability challenges, friction types, mental model gaps, system challenges, environmental barriers, version tracking, and LLM inference) did not emerge from theory. It emerged from reading hundreds of issues and asking the same question every time: what is the UX researcher's reading of this, beyond what the engineer sees? The answer is always the same: engineers see symptoms. The coding sheet helps you see the design decisions that caused them.

Start with the "but" sentence. Work backwards to the design failure. Code it. Repeat 100 times. Then write the research report.

This coding framework was developed as part of a UX research study on ML model deployment in KServe. If you are working on similar research in the cloud-native or MLOps space, I would love to hear your thoughts.