# Jenkins Continues Development of AI Chatbot for Resources

> Source: <https://letsdatascience.com/news/jenkins-continues-development-of-ai-chatbot-for-resources-2674fe2b>
> Published: 2026-05-26 21:49:48.868783+00:00


Mallikarjun G D's Jenkins blog post (May 26, 2026) reports a GSoC 2026 continuation of an AI chatbot plugin embedded in the Jenkins UI, extending the project with three core features: an **LLM-as-a-Judge** evaluation pipeline using a curated golden dataset and DeepEval metrics, a **GraphRAG** layer implemented with NetworkX for plugin-dependency queries, and a Build Failure Diagnosis Agent that strips PII with Presidio before passing sanitized logs to the LLM. Daniele Caldarigi's Jenkins blog post (May 26, 2026) describes a complementary GSoC plugin focused on guiding user workflow, with a React+Vite sidebar, a Jenkins Controller, a FastAPI backend using LangGraph, ChromaDB for vectors, and a choice of a local LLM via Ollama or an external API. Industry context: these posts show community-driven experimentation with RAG, evaluation pipelines, and on-prem/local LLM options within a mature CI/CD tool.

### What happened

Mallikarjun G D's Jenkins blog post (May 26, 2026) documents a GSoC 2026 continuation of an AI chatbot plugin embedded in the **Jenkins** UI, with three stated feature areas: an **LLM-as-a-Judge** evaluation pipeline using a curated golden dataset and **DeepEval** metrics, a **GraphRAG** layer built with **NetworkX** to traverse plugin dependency relationships, and a Build Failure Diagnosis Agent that sanitizes logs with Presidio before sending context to an LLM. Daniele Caldarigi's Jenkins blog post (May 26, 2026) outlines a related GSoC plugin to guide user workflows, describing a frontend implemented with **React+Vite**, a Jenkins Controller, a **FastAPI** backend, LangGraph for agent reasoning, ChromaDB as the vector store, and a configurable LLM hosted locally with Ollama or via an external API.
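The GraphRAG idea of answering dependency questions by traversal can be illustrated with a short sketch. It assumes a hypothetical in-memory map from plugin IDs to their direct dependencies (the `demo-*` names are invented, not real Jenkins plugins), and it uses a standard-library breadth-first search instead of NetworkX. It shows the shape of the technique, not the project's code. The rendered text is the kind of graph-derived context that could be added to an LLM prompt.

```python
from collections import deque

# Hypothetical plugin graph: each invented plugin ID maps to its direct dependencies.
PLUGIN_DEPS: dict[str, list[str]] = {
    "demo-pipeline": ["demo-scm", "demo-credentials"],
    "demo-scm": ["demo-api"],
    "demo-credentials": ["demo-api"],
    "demo-api": [],
    "demo-ui": ["demo-api"],
}


def transitive_dependencies(graph: dict[str, list[str]], plugin: str) -> list[str]:
    """Return every plugin reachable from `plugin`, in breadth-first order."""
    seen = {plugin}  # the start node is marked so a cycle cannot re-add it
    order: list[str] = []
    queue = deque([plugin])
    while queue:
        current = queue.popleft()
        for dep in graph.get(current, []):
            if dep not in seen:
                seen.add(dep)
                order.append(dep)
                queue.append(dep)
    return order


def dependents_of(graph: dict[str, list[str]], plugin: str) -> list[str]:
    """Return, sorted, the plugins that need `plugin` directly or transitively."""
    return sorted(
        other for other in graph
        if other != plugin and plugin in transitive_dependencies(graph, other)
    )


def graph_context(graph: dict[str, list[str]], plugin: str) -> str:
    """Render graph facts as plain text that an LLM prompt could include."""
    needs = ", ".join(transitive_dependencies(graph, plugin)) or "nothing"
    needed_by = ", ".join(dependents_of(graph, plugin)) or "nothing"
    return f"{plugin} requires: {needs}\n{plugin} is required by: {needed_by}"
```

With NetworkX and edges pointing from a plugin to its dependencies, the same two questions map to `nx.descendants(G, plugin)` and `nx.ancestors(G, plugin)`, both of which return sets.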

### Technical details

Editorial analysis (technical context): The combination of a judge-style evaluation pipeline, explicit GraphRAG for dependency-aware retrieval, and a log-diagnosis agent addresses three distinct concerns practitioners track when embedding LLMs into developer tooling: answer quality, retrieval over structured relationships, and exposure of sensitive data. Scoring answers with a judge model and **DeepEval** metrics against a golden dataset creates repeatable benchmarks for retrieval and answer quality, which helps catch regressions as embeddings, prompt templates, and retrieval strategies change. Graph traversal with **NetworkX** is a practical approach for dependency queries, but it raises operational questions about graph size, update cadence, and the cost of traversing the graph at query time. Integrating Presidio for PII stripping shows attention to data hygiene; practitioners will want to validate redaction effectiveness across the varied formats found in build logs.
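A regression gate over a golden dataset can be sketched in a few lines. This is a minimal sketch that does not use DeepEval's API: the `GoldenCase` schema, the `judge_score` function (a keyword-overlap stand-in for a real judge-model call), and the 0.7 threshold are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable


@dataclass(frozen=True)
class GoldenCase:
    """A curated question plus facts a correct answer must mention (hypothetical schema)."""

    question: str
    required_facts: tuple[str, ...]


def judge_score(answer: str, case: GoldenCase) -> float:
    """Stand-in for an LLM judge: the fraction of required facts the answer mentions.

    A real pipeline would send the question, answer, and a rubric to a judge model.
    """
    if not case.required_facts:
        raise ValueError(f"golden case has no required facts: {case.question!r}")
    text = answer.lower()
    hits = sum(1 for fact in case.required_facts if fact.lower() in text)
    return hits / len(case.required_facts)


def evaluate(
    chatbot: Callable[[str], str],
    dataset: list[GoldenCase],
    threshold: float = 0.7,
) -> tuple[float, list[str]]:
    """Score the chatbot on every golden case.

    Returns the mean score and the questions whose score fell below `threshold`.
    """
    if not dataset:
        raise ValueError("golden dataset is empty")
    scores: list[float] = []
    failures: list[str] = []
    for case in dataset:
        score = judge_score(chatbot(case.question), case)
        scores.append(score)
        if score < threshold:
            failures.append(case.question)
    return mean(scores), failures
```

Storing the mean score from each run and comparing it with the next lets CI flag a prompt, embedding, or retrieval change that lowers quality before it ships; swapping `judge_score` for a real judge model leaves the gate itself unchanged.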

### Context and significance

Industry context: Community-driven projects in major engineering tools increasingly combine RAG, local LLM hosting, and evaluation pipelines to balance privacy, latency, and cost. The modular architecture described in Daniele's post, which separates the frontend, a controller for auth, and a FastAPI backend, mirrors common patterns that let operators choose where to host ChromaDB and their LLM. For open-source CI/CD ecosystems, these choices matter because they affect deployability in air-gapped or enterprise environments and influence the maintenance burden for plugin authors.
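The "choose where to host the LLM" point can be made concrete with a small, configuration-driven request builder. This is a sketch under stated assumptions: all `CHATBOT_*` setting names are hypothetical, the `llama3` default is only a placeholder model tag, and the external endpoint and payload shape are invented because each vendor defines its own schema. The Ollama branch targets Ollama's local `/api/generate` REST endpoint on its default port 11434. The function only builds the request and sends nothing.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class LLMRequest:
    """An HTTP request description the backend could hand to any HTTP client."""

    url: str
    headers: dict[str, str]
    body: bytes


def build_request(prompt: str, settings: dict[str, str]) -> LLMRequest:
    """Build, but do not send, a completion request from operator settings.

    All CHATBOT_* setting names are hypothetical, chosen for this sketch.
    """
    provider = settings.get("CHATBOT_LLM_PROVIDER", "ollama")
    model = settings.get("CHATBOT_LLM_MODEL", "llama3")  # placeholder model tag
    headers = {"Content-Type": "application/json"}

    if provider == "ollama":
        # Local Ollama server: POST /api/generate, port 11434 by default.
        base = settings.get("CHATBOT_OLLAMA_URL", "http://localhost:11434")
        payload = {"model": model, "prompt": prompt, "stream": False}
        return LLMRequest(f"{base}/api/generate", headers, json.dumps(payload).encode())

    if provider == "external":
        # Hypothetical hosted endpoint; the real payload schema depends on the vendor.
        headers["Authorization"] = f"Bearer {settings['CHATBOT_API_KEY']}"
        payload = {"model": model, "prompt": prompt}
        return LLMRequest(settings["CHATBOT_API_URL"], headers, json.dumps(payload).encode())

    raise ValueError(f"unknown LLM provider: {provider!r}")
```

A caller could pass `dict(os.environ)` as `settings` and send the result with `urllib.request.Request(req.url, data=req.body, headers=req.headers)`. Keeping the choice in one function means the rest of the backend does not change when an operator switches hosts, and an air-gapped install simply never selects `external`.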

### What to watch

- Evaluation: which judge model and **DeepEval** metrics the contributors settle on, and whether runs are reproducible across hardware.
- GraphRAG scale: how the NetworkX graph is populated and updated as plugin metadata evolves.
- Data governance: effectiveness of Presidio redaction and policies for indexing external forums (Discourse, Reddit).
- LLM hosting trade-offs: adoption of local Ollama-hosted models versus third-party APIs, and the operational implications for latency and cost.
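Checking redaction effectiveness can start as an ordinary unit test. The sketch below uses a few regular expressions as a stand-in for Presidio; it does not call Presidio's analyzer or anonymizer, and the patterns, placeholder labels, and sample log lines are hypothetical. A leak check then reports any known sensitive value that survives redaction.

```python
import re

# Hypothetical patterns standing in for Presidio's recognizers (not Presidio's API).
PATTERNS: dict[str, re.Pattern[str]] = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "IPV4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "SECRET": re.compile(r"(?i)\b(?:token|password|secret)=\S+"),
}


def redact(log: str) -> str:
    """Replace every pattern match with a placeholder such as <EMAIL>."""
    for label, pattern in PATTERNS.items():
        log = pattern.sub(f"<{label}>", log)
    return log


def leaked_values(known_sensitive: list[str], redacted_log: str) -> list[str]:
    """Return the known sensitive values that still appear after redaction."""
    return [value for value in known_sensitive if value in redacted_log]
```

Running `leaked_values` over a corpus of access-controlled sample logs, each paired with its known sensitive strings, gives a simple recall check. A phone number, for example, passes through these patterns untouched and would be reported, which is exactly the kind of gap the check is meant to surface before logs reach an LLM.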

## Scoring Rationale

This is a notable open-source engineering effort showing practical integration patterns (GraphRAG, evaluation pipelines, PII stripping) relevant to practitioners embedding LLMs in developer tools, but it is not a frontier model or industry-shaking release.

