Stop Reading Documentation. Start Reading GitHub Issues.

wpnews.pro

Reading the docs isn't enough. The most valuable developer feedback lives inside GitHub issues, bug reports, and feature discussions. In this article, I share the checklist and coding framework I use to mine repositories for issues, uncover real developer pain points, and turn engineering conversations into meaningful UX research findings.

Reading GitHub issues without a coding framework produces impressions. Coding them produces findings.

When you code an issue, you are making an explicit analytical decision about what type of signal it contains. You are saying: "This issue is evidence of a feedback gap at Stage 4 (model loading), filed by an ML engineer at the new-user experience level, in KServe v0.11, and it directly answers my research question about what information engineers need during model loading that the product currently does not provide."

That sentence assembled from your coding decisions is a finding. Multiply it across 100 issues and you have a research study.

Without coding, you have 100 anecdotes. With coding, you have a dataset.

It makes patterns visible. When you code 20 issues and notice that 14 of them map to the same stage and the same challenge type, you have found your highest-priority UX problem. You would never see that pattern by reading casually.

It makes your findings defensible. "Engineers struggle with KServe" is an opinion. "18 issues across v0.10–v0.13 filed by engineers in their first deployment show identical feedback gap patterns at Stage 4, with an average of 14 comments per issue" is a finding.

It separates your role from the engineer's role. Engineers read GitHub issues as bug reports. You read them as evidence of design decisions. The coding framework is the analytical lens that makes your UX reading possible.
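
The pattern-spotting step can be sketched in a few lines of Python. This is a minimal illustration, assuming each coded issue is stored as an (issue number, stage, challenge type) tuple; the `coded_issues` data and the `top_patterns` helper are hypothetical placeholders, not real KServe data.

```python
from collections import Counter

# Hypothetical coded issues: (issue number, stage, challenge type).
# The values are illustrative placeholders, not real KServe data.
coded_issues = [
    (101, "Stage 4", "Feedback gap"),
    (102, "Stage 4", "Feedback gap"),
    (103, "Stage 5", "Misleading success"),
    (104, "Stage 4", "Feedback gap"),
    (105, "Stage 2", "Error recovery"),
]


def top_patterns(issues, n=3):
    """Count how often each (stage, challenge) pair occurs, most common first."""
    pairs = Counter((stage, challenge) for _, stage, challenge in issues)
    return pairs.most_common(n)


for (stage, challenge), count in top_patterns(coded_issues):
    print(f"{stage} / {challenge}: {count} of {len(coded_issues)} issues")
```

The pair at the top of that list is the candidate for your highest-priority UX problem; the counts are what let you state it as a finding rather than an impression.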

Before any analysis, capture the foundational data for every issue. This establishes the quantitative baseline for your research report.

Issue Metadata Checklist
─────────────────────────────────────────────
[ ] Issue URL / Number        e.g. #1234
[ ] Issue creation date
[ ] Issue category            Bug | Question | Feature Request | Docs
[ ] Labels                    e.g. InferenceService, Control Plane, Kubernetes
[ ] Current state             Open | Closed | Stale | Merged
[ ] Resolution type           Code Fix | Docs Update | Workaround | No resolution
[ ] Total comment count
[ ] Unique commenter count
[ ] Ping-pong count           Back-and-forth before root cause found
[ ] Brief issue summary
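
If you keep the coding sheet in code rather than a spreadsheet, the checklist above maps naturally onto a small record type. The sketch below is one possible shape, not a prescribed schema: the class name `IssueRecord`, its field names, and the validation rules are my assumptions; only the allowed category and state values come from the checklist.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date

# Allowed values copied from the metadata checklist.
CATEGORIES = {"Bug", "Question", "Feature Request", "Docs"}
STATES = {"Open", "Closed", "Stale", "Merged"}


@dataclass
class IssueRecord:
    """One row of the coding sheet (hypothetical field names)."""

    number: int
    created: date
    category: str
    state: str
    resolution: str
    total_comments: int
    unique_commenters: int
    ping_pong_count: int
    summary: str
    labels: list[str] = field(default_factory=list)

    def __post_init__(self):
        # Reject values that are not on the checklist so typos surface early.
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category!r}")
        if self.state not in STATES:
            raise ValueError(f"unknown state: {self.state!r}")
        if self.unique_commenters > self.total_comments:
            raise ValueError("unique commenters cannot exceed total comments")


# Illustrative values only.
example = IssueRecord(
    number=1234,
    created=date(2024, 1, 5),
    category="Bug",
    state="Closed",
    resolution="Docs Update",
    total_comments=14,
    unique_commenters=4,
    ping_pong_count=6,
    summary="Model stays in Unknown status after apply",
)
print(example.category, example.ping_pong_count)
```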

The "ping-pong count" — the number of back-and-forth diagnostic comments between the user and maintainers before the root cause was identified — is a particularly powerful metric. High ping-pong means the product gave the user no diagnostic signal. That is a UX failure in the product itself.
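
One way to make the ping-pong count repeatable is to count how many times the conversation changes hands before the root cause appears. The sketch below assumes you have already labelled each comment as "user" or "maintainer" and marked the comment where the root cause was identified; counting speaker switches is my operationalisation of "back-and-forth", and the function name and inputs are hypothetical.

```python
def ping_pong_count(comment_roles, root_cause_index):
    """Count speaker switches up to and including the root-cause comment.

    comment_roles: list of "user" or "maintainer", one per comment, in thread order.
    root_cause_index: position of the comment where the root cause was identified.
    """
    relevant = comment_roles[: root_cause_index + 1]
    # Compare each comment with the one before it and count the changes of speaker.
    return sum(1 for prev, cur in zip(relevant, relevant[1:]) if prev != cur)


# Illustrative thread: the root cause is found in the last comment.
thread = ["maintainer", "user", "maintainer", "user", "user", "maintainer"]
print(ping_pong_count(thread, root_cause_index=5))
```

Two consecutive comments from the same person count as one turn, which keeps long follow-up posts from inflating the number.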

Understanding user demographics is essential in any research project because it tells you whose problems to prioritise. In interviews or surveys, you simply ask. In GitHub mining, you have to infer from signals embedded in the issue itself.

In the context of KServe, we are primarily trying to distinguish between three main personas.

The language engineers use tells you their world immediately.

Persona	Typical keywords	What they focus on
Data Scientist / ML Engineer	PyTorch, TensorFlow, HuggingFace, weights, predictor, artifact, S3, inputs/outputs	The model itself: getting a Python script to serve predictions
Platform / DevOps Engineer	CRDs, Istio, Knative, ingress, RBAC, service account, Helm, multi-cluster, HPA	Infrastructure, networking, security, cluster stability
Application Developer	REST API, gRPC, JSON payload, curl, SDK, timeout, 503 error, endpoint	Consuming the model: integrating the endpoint into a larger application
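
A keyword tally can give you a first guess at the persona before you read the issue closely. The sketch below is a rough heuristic under stated assumptions: the keyword lists are the table's keywords lowercased (with "inputs/outputs" split in two and "503 error" shortened to "503"), and the `infer_persona` function and its tie rule are hypothetical. Treat the output as a suggestion to confirm by reading, not as a classification.

```python
import re

# Keywords taken from the persona table, lowercased for matching.
PERSONA_KEYWORDS = {
    "Data Scientist / ML Engineer": [
        "pytorch", "tensorflow", "huggingface", "weights", "predictor",
        "artifact", "s3", "inputs", "outputs",
    ],
    "Platform / DevOps Engineer": [
        "crds", "istio", "knative", "ingress", "rbac", "service account",
        "helm", "multi-cluster", "hpa",
    ],
    "Application Developer": [
        "rest api", "grpc", "json payload", "curl", "sdk", "timeout",
        "503", "endpoint",
    ],
}


def infer_persona(issue_text):
    """Return the persona with the most keyword hits, or "Unclear" on a tie or no hits."""
    text = issue_text.lower()
    scores = {
        persona: sum(
            len(re.findall(r"\b" + re.escape(kw) + r"\b", text)) for kw in keywords
        )
        for persona, keywords in PERSONA_KEYWORDS.items()
    }
    best = max(scores.values())
    winners = [persona for persona, score in scores.items() if score == best]
    if best == 0 or len(winners) > 1:
        return "Unclear"
    return winners[0]


print(infer_persona("My PyTorch model weights are in S3 but the predictor pod never starts."))
```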

KServe issues typically fall into three abstraction layers:

kubectl describe outputs, cluster events, Helm values files, or Istio configurations: they speak in Kubernetes YAML

Quick triage question: Is this person treating KServe as an infrastructure component or as a model-delivery tool? Infrastructure → Platform/DevOps. Model-delivery → Data Scientist/ML Engineer.

Demographics Checklist
─────────────────────────────────────────────
[ ] Inferred user persona
      Data Scientist / ML Engineer
      Platform / DevOps Engineer
      Application Developer
      ML-expert / K8s-novice  ← the hardest edge case
      Unclear

[ ] Experience level
      New user (first issues, plain English, no version info)
      Experienced (provides full env, logs, rules out causes)
      Unclear

[ ] Deployment environment
      Local (Kind / Minikube)
      Cloud managed (EKS / GKE / AKS)
      On-premises / bare metal

[ ] Deployment scale
      Single-cluster
      Multi-cluster
      Multi-tenant

[ ] Deployment method
      Helm | Kustomize | ArgoCD / GitOps | Direct kubectl

The hardest edge case: ML-experienced / Kubernetes-novice engineers. They write technically confident issues about model formats or serving runtimes but are completely confused about Istio or Knative. Always code these as a separate category, because they reveal a completely different class of UX failure.

The key insight for non-technical UX researchers: you do not need to understand the technical content of an issue to identify its UX signal. The signals are in the language, not the configuration.

Here is how to read any issue in under two minutes.

Scan for time words before you read anything else.

Time word	Severity	What it means
"minutes"	Low	Minor gap, quickly resolved
"hours"	Medium	Significant friction, real work lost
"days"	High	Severe friction, deadline impact
"weeks" / "months"	Critical	Product-level failure
"gave up" / "switching to X"	Abandonment	User is leaving

Real example: "I've spent the last three days trying to figure out why my model stays in Unknown status."

You do not need to know what Unknown status means. "Three days" tells you this is a high-severity finding.
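
The time-word scan is simple enough to automate as a first pass. The sketch below assumes the severity labels from the table above and checks the most severe signals first, so "gave up after two days" is coded as Abandonment rather than High; that ordering, the regular expressions, and the function name are my assumptions.

```python
import re

# Ordered from most to least severe so the strongest signal wins.
SEVERITY_RULES = [
    ("Abandonment", r"\bgave up\b|\bswitching to\b"),
    ("Critical", r"\b(?:weeks?|months?)\b"),
    ("High", r"\bdays?\b"),
    ("Medium", r"\bhours?\b"),
    ("Low", r"\bminutes?\b"),
]


def time_word_severity(text):
    """Return the severity label of the strongest time word found, or None."""
    lowered = text.lower()
    for label, pattern in SEVERITY_RULES:
        if re.search(pattern, lowered):
            return label
    return None


quote = "I've spent the last three days trying to figure out why my model stays in Unknown status."
print(time_word_severity(quote))
```

A match only tells you a time word is present, not that the time was lost to friction ("I upgraded a few days ago" would also match), so read the sentence before you record the severity.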

Every friction-revealing issue has the word "but" at a specific moment. Everything before "but" is what the engineer did correctly. Everything after is where the product failed them.

Real example: "I followed the quickstart exactly but the webhook never became Ready."

Before "but" = user followed instructions. After "but" = product gave instructions that led to failure. That is a documentation UX finding, not a technical bug.

Real example: "The status shows Ready but every curl request returns 503."

This is the "misleading success" friction type: the product declared success when the user's actual goal was completely unmet. It is one of the most trust-destroying UX failures possible.
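
Splitting a sentence at "but" can be sketched as follows. This is a minimal helper, assuming one sentence at a time and splitting only at the first standalone "but"; the function name is hypothetical.

```python
import re


def split_on_but(sentence):
    """Split a sentence at its first standalone "but".

    Returns (what_the_user_did, where_it_failed), or None if there is no "but".
    """
    parts = re.split(r"\bbut\b", sentence, maxsplit=1, flags=re.IGNORECASE)
    if len(parts) != 2:
        return None
    # Trim surrounding spaces and punctuation from both halves.
    before, after = (part.strip(" ,.") for part in parts)
    return before, after


did, failed = split_on_but("I followed the quickstart exactly but the webhook never became Ready.")
print("User did:  ", did)
print("Failed at: ", failed)
```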

These phrases directly reveal a mental model gap — the engineer built an incorrect picture of how the system works, and reality contradicted it.

Real example: "I thought Transformer meant a language model component like BERT. Turns out it's just data preprocessing."

This is not a bug. It is a naming failure. "Transformer" means an attention-based neural architecture in the ML world. KServe uses it to mean "a component that preprocesses data before the model." Every ML engineer who encounters this name builds the wrong mental model from it.

Real example: "I assumed KServe would track model versions automatically — like a proper ML serving platform should."

This is scope confusion — the engineer's expectation of what KServe is does not match what it actually is. That mismatch is a design communication failure, not a user error.

Each "I had to" signals a missing workflow step — something the engineer needed that the product should have provided.

Real example: "I had to write a polling script to check when the InferenceService became Ready, because there's no built-in wait command."

Count the "I had to" chains in an issue. An issue with four of them put four separate manual burdens on an engineer for a task that should have been automated.
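
Counting the "I had to" chains, and pulling out what each one forced the engineer to do, might look like this. It is a sketch under simple assumptions: each burden is taken to run from "I had to" up to the next punctuation mark, and the second sentence in the example text is invented for illustration.

```python
import re


def had_to_burdens(text):
    """Return the clause following each "I had to", up to the next punctuation mark."""
    matches = re.findall(r"\bI had to\s+([^.,;!?]+)", text, flags=re.IGNORECASE)
    return [match.strip() for match in matches]


# First sentence is the quote above; the second is a hypothetical follow-up.
issue_text = (
    "I had to write a polling script to check when the InferenceService became Ready, "
    "because there's no built-in wait command. Then I had to patch the ConfigMap by hand."
)
burdens = had_to_burdens(issue_text)
print(len(burdens), "manual burdens:")
for burden in burdens:
    print(" -", burden)
```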

Scan for capitalized tool names: Knative, Istio, Prometheus, MLflow, Argo, Triton, HuggingFace. Count them.
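
The tool-name count can be sketched the same way. The list below is the one given in the sentence above; the function name is hypothetical, and matching whole words means "ArgoCD" is deliberately not counted as "Argo".

```python
import re

TOOLS = ["Knative", "Istio", "Prometheus", "MLflow", "Argo", "Triton", "HuggingFace"]


def tools_mentioned(text):
    """Return the distinct tools from TOOLS that appear in the text as whole words."""
    lowered = text.lower()
    return [
        tool for tool in TOOLS
        if re.search(r"\b" + re.escape(tool.lower()) + r"\b", lowered)
    ]


# Hypothetical issue text.
found = tools_mentioned("After installing Istio and Knative I still can't get Prometheus metrics.")
print(len(found), found)
```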

When an engineer writes "sorry if this is a basic question", that is not politeness. That is evidence the product made a competent person feel responsible for the product's communication failure.

Once you have your basic metadata and demographics, here is the full coding structure to apply to every issue.

Map every issue to one of 8 stages. This tells you where in the journey the product is losing engineers.

Deployment Stage Checklist
─────────────────────────────────────────────
[ ] Stage 1 · Setup         Installing KServe and dependencies
                             Signals: "webhook not ready" · "quickstart fails"

[ ] Stage 2 · Storage       Getting the trained model accessible
                             Signals: "access denied" · "model not found" · "storageUri"

[ ] Stage 3 · Configuration Writing the InferenceService YAML spec
                             Signals: "minimum config?" · "deprecated field" · "required fields?"

[ ] Stage 4 · Loading       Applying config and waiting for model to load
                             Signals: "Unknown status" · "how long?" · "no logs" · "OOMKilled"

[ ] Stage 5 · Network       Reaching the deployed endpoint
                             Signals: "Ready but 503" · "connection refused" · "EXTERNAL-IP pending"

[ ] Stage 6 · Inference     Sending requests and getting predictions back
                             Signals: "400 error" · "what format?" · "V1 vs V2 protocol"

[ ] Stage 7 · Hardening     Making the deployment production-reliable
                             Signals: "zero downtime update" · "autoscaling conflict" · "SLA"

[ ] Stage 8 · Day-2 ops     Updating, monitoring, governing over time
                             Signals: "rollback" · "update my model" · "60 models across teams"

[ ] Cross-stage?            Root cause in Stage X, discovered at Stage Y
                             (delayed discovery = highest severity finding)

The most important finding: Stages 4 and 5 consistently produce the highest issue volume in KServe. The product is completely silent at the two moments when engineers are most anxious and most blind.
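
The signal phrases in the stage checklist can double as a rough pre-sorter. The sketch below uses a subset of those phrases that work as literal substrings, keyed by stage number; the selection, the matching rule, and the function name are my assumptions, and an issue that matches several stages is a prompt to check for a cross-stage root cause rather than an answer.

```python
# A subset of the checklist's signal phrases, lowercased, keyed by stage number.
STAGE_SIGNALS = {
    1: ["webhook not ready", "quickstart fails"],
    2: ["access denied", "model not found", "storageuri"],
    3: ["deprecated field", "required fields"],
    4: ["unknown status", "how long", "no logs", "oomkilled"],
    5: ["503", "connection refused", "external-ip pending"],
    6: ["400 error", "what format", "v1 vs v2"],
    7: ["zero downtime", "autoscaling"],
    8: ["rollback", "update my model"],
}


def candidate_stages(text):
    """Return the stage numbers whose signal phrases appear in the text."""
    lowered = text.lower()
    return [
        stage for stage, signals in STAGE_SIGNALS.items()
        if any(signal in lowered for signal in signals)
    ]


print(candidate_stages("The status shows Ready but every curl request returns 503."))
print(candidate_stages("My model stays in Unknown status and there are no logs."))
```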

Usability Challenge Checklist
─────────────────────────────────────────────
[ ] U1 · Learnability breakdown
         Cannot figure out how to do the task the first time
         Signal: "how do I" · already answered in docs · confused by concepts

[ ] U2 · Error recovery failure
         Hits error, can't understand it, doesn't know which log to check
         Signal: pastes cryptic error, "stuck for days", tries random things

[ ] U3 · Feedback & visibility gap
         System gives no signal — Unknown, Pending, complete silence
         Signal: "nothing happens" · "how long should this take?" · "no logs"

[ ] U4 · Configuration complexity
         Too many fields, unclear defaults, no minimum viable spec
         Signal: "is all of this needed?" · "which fields are required?"

[ ] U5 · Mental model mismatch
         Expectation contradicts how system actually works
         Signal: "I expected" · "I thought" · "this makes no sense"

[ ] U6 · Workaround proliferation
         User invented their own solution to fill a product gap
         Signal: "I wrote a script" · "I had to" · shared snippets in comments

Developer Friction Checklist
─────────────────────────────────────────────
[ ] F1 · Invisible wall        System silent — nothing to debug
[ ] F2 · Misleading success    "Ready" but goal completely unmet
[ ] F3 · Hidden prerequisite   Required knowledge never communicated until failure
[ ] F4 · Terminology confusion Word means something different in this context
[ ] F5 · Broken feedback loop  Can't tell if a change had any effect
[ ] F6 · Forced context switch Must configure Istio/Knative to complete one KServe task
[ ] F7 · Documentation gap     Knows what they want, can't find how to do it
[ ] F8 · Accumulated friction  5–6 small frictions in sequence → abandonment signal

System Challenge Checklist
─────────────────────────────────────────────
[ ] Ownership ambiguity    KServe says "that's Istio", Istio says "that's KServe"
[ ] Abstraction leakage    InferenceService was meant to hide Knative/Istio; it doesn't
[ ] Observability gap      Logs scattered across 4+ components; no unified view
[ ] Role boundary collision ML engineer task structurally requires platform engineer action
[ ] Upgrade path fragility  Every version upgrade risks production breakage

Environmental Challenge Checklist
─────────────────────────────────────────────
[ ] Managed K8s divergence   EKS, GKE Autopilot, OpenShift behave differently
[ ] Corporate proxy / air-gap No public internet; private registry; air-gapped
[ ] GPU & hardware           OOMKilled, VRAM insufficient, driver mismatch
[ ] Org security policy      OPA, Gatekeeper, PodSecurityAdmission blocking KServe
[ ] On-premises / hybrid     No managed LoadBalancer, NFS storage, bare metal
[ ] Regulated / compliance   HIPAA, SOC2, GDPR, data residency requirements

This is what transforms your research from a snapshot into a longitudinal UX health report.

Version Tracking Checklist
─────────────────────────────────────────────
[ ] KServe version (exact)         e.g. 0.11.2
[ ] Previous version (if upgrade)  e.g. 0.10 → 0.11
[ ] Kubernetes version             e.g. 1.27
[ ] Cloud provider                 EKS / GKE / AKS / On-prem / Local
[ ] Version stated by:             User upfront | Maintainer had to ask | Never provided
[ ] Upgrade experience:            Better | Same | Worse | New regression introduced
[ ] Chronic pain signal:           Same issue present in prior version? Yes / No

The chronic problem list (friction points that appear in the top 3 across three or more versions) is your most powerful finding. A problem that survived three release cycles is not a bug. It is an architectural decision.
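
Building the chronic problem list from coded issues can be sketched as below, assuming each coded issue is reduced to a (version band, friction code) pair. The sample data and the function name are hypothetical; the thresholds follow the definition above (top 3, three or more versions).

```python
from collections import Counter, defaultdict


def chronic_problems(coded, top_n=3, min_versions=3):
    """Return friction codes that rank in the top_n of at least min_versions version bands.

    coded: iterable of (version_band, friction_code) pairs, one per coded issue.
    """
    per_version = defaultdict(Counter)
    for version, friction in coded:
        per_version[version][friction] += 1

    appearances = Counter()
    for counts in per_version.values():
        # Ties at the cut-off are broken by first appearance in the data.
        for friction, _ in counts.most_common(top_n):
            appearances[friction] += 1

    return sorted(f for f, n in appearances.items() if n >= min_versions)


# Illustrative placeholder data, not real KServe counts.
coded = (
    [("v0.10", "F1")] * 3 + [("v0.10", "F2")] * 2 + [("v0.10", "F7")]
    + [("v0.11", "F1")] * 2 + [("v0.11", "F2"), ("v0.11", "F6")]
    + [("v0.12", "F1")] * 2 + [("v0.12", "F7"), ("v0.12", "F3")]
)
print(chronic_problems(coded))
```

In this sample only F1 makes the top 3 in all three version bands; F2 and F7 each appear in two, so they are recurring but not yet chronic by this definition.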

LLM Inference Checklist
─────────────────────────────────────────────
[ ] Is this an LLM issue?          Yes | No | Hybrid
[ ] LLM model family               Llama | Mistral | Qwen | Gemma | Custom
[ ] LLM runtime                    vLLM | TGI | OpenAI-compatible | Custom
[ ] Capability attempted:
      Basic inference
      Streaming tokens (SSE)
      Multi-GPU / tensor parallelism
      LoRA / adapter serving
      HuggingFace Hub authentication
      OpenAI API compatibility
      Quantisation (GPTQ, AWQ)

[ ] LLM-specific challenge:
      GPU OOM / VRAM insufficient
      Model loading with no progress signal
      Streaming failure through gateway
      HuggingFace auth failing in cluster
      Runtime version lag behind vLLM/TGI ecosystem
      No LLM-specific metrics (token throughput, TTFT)

[ ] Innovation lag signal:
      Date of capability request: ___________
      Date KServe released support: ___________
      Gap (days): ___________
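
The innovation lag is a plain date subtraction. A minimal sketch, with hypothetical dates and a hypothetical function name:

```python
from datetime import date


def innovation_lag_days(requested, released):
    """Days between the first capability request and the release that supported it."""
    if released < requested:
        raise ValueError("release date precedes request date")
    return (released - requested).days


# Hypothetical dates for illustration only.
print(innovation_lag_days(date(2023, 1, 10), date(2023, 4, 10)))
```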

For each issue, record which research question it provides evidence for. This anchors your mining to your study goals.

Research Question Mapping
─────────────────────────────────────────────
Current-state questions (what is broken today)
[ ] RQ1 · First deployment challenges across roles and experience levels
[ ] RQ2 · Workflow gaps between deployed model and reliable production
[ ] RQ3 · Observability and debugging challenges by stage
[ ] RQ4 · LLM deployment challenges vs classical ML serving
[ ] RQ5 · Environmental factors shaping deployment experience
[ ] RQ6 · How challenges evolved across versions (your unique longitudinal contribution)
[ ] RQ7 · Design changes that would most reduce friction

UX improvement questions (what should be designed differently)
[ ] UX1 · Time-to-first-inference reduction
[ ] UX3 · Model loading progress visibility (highest volume finding)
[ ] UX4 · Self-service diagnostic experience
[ ] UX9 · LLM mental model bridge (vLLM/HuggingFace → KServe)
[ ] UX11 · Environment validator / dependency pre-flight checker

Once you spot a pattern across multiple issues, record it here. One template per pattern — not per issue.

UX Finding Template
─────────────────────────────────────────────────────────────
Finding type:       ___________________________________________
Affected users:     Role · Experience level · Version band
Deployment stage:   ___________________________________________
Evidence:           N issues · Date range · e.g. "14 issues, 2022–2024"
Best quote:         Under 25 words — your strongest evidence
─────────────────────────────────────────────────────────────
UX finding statement:
"Engineers [doing X] cannot [accomplish Y] because [design gap Z],
 which means [impact on time / confidence / adoption]."
─────────────────────────────────────────────────────────────
Severity:           Low | Medium | High | Critical
Chronic?            Present across ___ versions
Design recommendation: ________________________________________
Research question answered: ___________________________________

Mining Session Completion Checklist
─────────────────────────────────────────────
Issue coverage
[ ] 50+ general deployment issues coded across all 8 stages
[ ] 30+ LLM inference issues coded and version-tracked
[ ] 10+ issues per major version band (v0.10, v0.11, v0.12, v0.13+)
[ ] 15+ upgrade issues with before/after UX delta recorded
[ ] Top 30 most-commented issues reviewed (sort:comments-desc)
[ ] 10+ abandoned issues (open 6+ months, last message unanswered)
[ ] 15+ success cases (quickly-resolved — positive signal baseline)
[ ] All enhancement/feature-request labels reviewed
[ ] All competitive tool mentions captured (BentoML, Seldon, Ray Serve)

Baseline measurement
[ ] Metrics table filled per version band:
      Issue count | Avg comments | 7-day resolution rate | Emotional language %
[ ] Top-3 friction points per version — chronic problem list built
[ ] LLM innovation lag calculated for 5+ capabilities
[ ] Version reporting rate: % of issues that include version upfront

Quality
[ ] 15% of issues coded by a second researcher (inter-rater reliability)
[ ] Every research question has at least 3 issues as evidence
[ ] One finding statement written per significant pattern found
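
The checklist asks for a second coder but does not name an agreement statistic. One common choice is Cohen's kappa, which corrects raw agreement for the agreement you would expect by chance. The sketch below is my suggestion rather than part of the original framework; the labels are the usability codes from the checklist above, and the sample codings are invented.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who labelled the same issues in the same order."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(coder_a)
    # Share of issues where both coders chose the same label.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Agreement expected by chance, from each coder's label frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (observed - expected) / (1 - expected)


# Invented codings for six issues.
first_coder = ["U3", "U3", "U2", "U5", "U3", "U1"]
second_coder = ["U3", "U3", "U2", "U3", "U3", "U1"]
print(round(cohens_kappa(first_coder, second_coder), 2))
```

The two coders agree on five of six issues, but because both use U3 so often, part of that agreement is expected by chance, which is why kappa comes out lower than the raw agreement rate.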

The biggest barrier to studying developer tools as a UX researcher is the assumption that you need to understand the code to understand the problem. This framework removes that barrier entirely.

When you code an issue, you are not evaluating the correctness of someone's Kubernetes configuration. You are recording what the issue reveals about the human experience of using the product. "Status shows Unknown for 20 minutes" tells you everything you need regardless of whether you understand what Unknown means technically. The product left a user without feedback during its most critical operation. That is a UX finding independent of any technical knowledge.

The coding sheet converts anecdote into pattern. Instead of "users seem to struggle with deployment," you can say "14 of 20 issues sampled from v0.11 show feedback gap failures at Stage 4, with an average of 18 comments per issue, suggesting state communication is the highest-priority improvement target for this version band."

That is a product roadmap argument. The coding sheet built it.

The research questions embedded in the coding sheet map directly to design recommendations with evidence behind them. A contributor who wants to make a meaningful impact on user experience now has specific, evidence-backed targets — not "improve docs" but "add granular status conditions per phase that distinguish between 10 failure modes currently all reporting as Unknown."

When researchers share findings publicly, in CNCF blogs, KubeCon talks, or articles like this one, the coding framework makes the research reproducible. Other researchers can apply the same checklist to a different version or a different tool and compare results. That cumulative body of evidence is what eventually changes product direction.

GitHub issues are not a bug tracker. They are a longitudinal, naturalistic record of where real engineers encounter the gap between what a product promises and what it delivers.

A coding sheet is the analytical framework that transforms that record into research. Without it, you are reading. With it, you are studying.

The framework I built for KServe — covering demographics, deployment stages, usability challenges, friction types, mental model gaps, system challenges, environmental barriers, version tracking, and LLM inference — did not emerge from theory. It emerged from reading hundreds of issues and asking the same question every time: what is the UX researcher's reading of this, beyond what the engineer sees?

The answer is always the same: engineers see symptoms. The coding sheet helps you see the design decisions that caused them.

Start with the "but" sentence. Work backwards to the design failure. Code it. Repeat 100 times. Then write the research report.

This coding framework was developed as part of a UX research study on ML model deployment in KServe. If you are working on similar research in the cloud-native or MLOps space, I would love to hear your thoughts.

source & further reading

dev.to (original article)
