PrologMCP: A Standardized Prolog Tool Interface for LLM Agents

Researchers have introduced PrologMCP, an open-source server that exposes Prolog as a stateful tool through the Model Context Protocol (MCP), letting LLM agents delegate deductive reasoning to a symbolic solver. In evaluations on two subsets of the PARARULE-Plus dataset, a formalizer agent using PrologMCP was compared against Claude Sonnet 4.6, GPT-4.1, and o4-mini. On the more challenging subset it stayed near-perfect (accuracy 1.00 / 0.99) while reasoning LLMs dropped to 0.95 / 0.94; on the general sample, its largest gains were over standard models (0.762 for GPT-4.1). The authors present the approach as a robust, inspectable alternative to extended natural-language reasoning for complex deductive tasks.

arXiv:2606.14935v1

Abstract: Frontier reasoning-tuned language models still fail on deductive tasks at depth, and the cost of improved performance through extended internal reasoning scales poorly. Symbolic delegation offers a complementary route: a language model translates the problem, while a solver performs the inference. However, current autoformalization pipelines for logic programming are typically bespoke integrations tied to particular tasks or agents. We introduce PrologMCP, a task-agnostic, open-source server that exposes Prolog as a stateful tool through the Model Context Protocol (MCP). Its compact tool interface, structured error reporting, and per-session isolation make the translate-run-inspect-repair loop a reusable primitive for MCP-capable agents. We evaluate a formalizer agent enhanced with PrologMCP against standard and reasoning LLMs (Claude Sonnet 4.6, GPT-4.1, and o4-mini) on two subsets of PARARULE-Plus: a general-purpose sample and a more challenging one targeting a specific failure mode of natural-language reasoning. On the general sample, the formalizer matches or exceeds reasoning LLMs (accuracy 1.00 vs. 1.00 / 0.998), with the largest gains over standard models (0.762 for GPT-4.1). On the challenging subset, the formalizer remains near-perfect (1.00 / 0.99) while reasoning LLMs drop to 0.95 / 0.94. These results suggest that delegating inference to Prolog via MCP is a robust and inspectable alternative to extended natural-language reasoning.
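
The translate-run-inspect-repair loop mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the PrologMCP API: `ToySession`, `LoadResult`, `solve`, and `stub_translate` are hypothetical names invented for illustration. A toy forward-chaining evaluator over ground Horn clauses stands in for a real Prolog engine, and a stub function stands in for the LLM translator. The sketch only shows the general shape of the idea: each attempt gets a fresh, isolated session, and a structured load error is fed back to the translator for repair.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class LoadResult:
    ok: bool
    error: Optional[str] = None  # structured feedback for the translator


class ToySession:
    """Hypothetical isolated session; a toy stand-in for a Prolog engine."""

    def __init__(self) -> None:
        self.facts: set[str] = set()
        self.rules: list[tuple[list[str], str]] = []  # (body atoms, head)

    def load(self, clauses: list[str]) -> LoadResult:
        # Parse everything first so a bad clause leaves the session unchanged.
        facts, rules = set(), []
        for i, clause in enumerate(clauses, start=1):
            text = clause.strip()
            if not text.endswith("."):
                return LoadResult(False, f"clause {i}: missing terminating '.'")
            head, sep, body = text[:-1].partition(":-")
            if sep:
                rules.append(([b.strip() for b in body.split(",")], head.strip()))
            else:
                facts.add(head.strip())
        self.facts |= facts
        self.rules += rules
        return LoadResult(True)

    def query(self, goal: str) -> bool:
        # Naive forward chaining to a fixpoint over ground atoms only.
        derived = set(self.facts)
        changed = True
        while changed:
            changed = False
            for body, head in self.rules:
                if head not in derived and all(b in derived for b in body):
                    derived.add(head)
                    changed = True
        return goal in derived


def solve(translate: Callable[[Optional[str]], list[str]], goal: str,
          max_attempts: int = 3) -> tuple[bool, int]:
    """Translate, run, inspect the result, and repair on error."""
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        session = ToySession()  # fresh session per attempt (isolation)
        result = session.load(translate(feedback))
        if result.ok:
            return session.query(goal), attempt
        feedback = result.error  # hand the error back to the translator
    raise RuntimeError(f"no valid program after {max_attempts} attempts: {feedback}")


def stub_translate(feedback: Optional[str]) -> list[str]:
    """Stand-in for the LLM: the first draft is broken, the repair is valid."""
    if feedback is None:
        return ["kind(anne).", "nice(anne) :- kind(anne)"]  # missing '.'
    return ["kind(anne).", "nice(anne) :- kind(anne)."]


answer, attempts = solve(stub_translate, "nice(anne)")
print(answer, attempts)
```

In this sketch the first draft fails to load, the error message is passed back, and the second draft succeeds, so the inference itself is never performed by the translator. A real deployment would replace `ToySession` with calls to the actual server and `stub_translate` with a model call; the evaluator here handles only ground atoms without variables or commas inside arguments.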