{"slug": "rlhf-for-pkms", "title": "RLHF for PKMs", "summary": "Marco Porcellato proposes applying Reinforcement Learning from Human Feedback (RLHF) to the knowledge graphs of Personal Knowledge Management systems (PKMs) to dynamically weight nodes based on user interactions. The system, being developed for Matryca Brain, uses implicit signals like transclusion and focus mode, plus explicit upvotes/downvotes, to surface high-value ideas and decay obsolete ones. A reference implementation in Python demonstrates updating node weights and applying temporal decay.", "body_md": "# RFC: Implicit and Explicit Reinforcement Learning from Human Feedback (RLHF) for Personal Knowledge Graphs, PKMs and BKMs (Business Knowledge Management systems)\n\n**Date:** June 21, 2026 (Summer Solstice)\n**Status:** Request for Comments / Conceptual Framework\n**Author:** [Marco Porcellato / MarcoPorcellato]\n**Context:** This architecture is currently being researched and developed for the core engine of **Matryca Brain**, but is hereby released to the open-source community to foster experimentation in the Personal and Business Knowledge Management (PKM and BKM) space.\n\nCurrent-generation Personal Knowledge Management (PKM) tools like Obsidian, Logseq, and Roam Research are built on \"flat\" graphs. Every node (note or block) holds a static weight of `1.0`. Search algorithms rely entirely on text frequency (BM25) or static semantic proximity (vector embeddings).\n\nHowever, human memory and cognition do not work this way. Some ideas are foundational pillars; others are fleeting thoughts or deprecated drafts.\n\nThis document proposes a new paradigm: **Applying Reinforcement Learning from Human Feedback (RLHF) to Personal Knowledge Graphs**. By treating nodes like neurons and interactions as synaptic reinforcements, the PKM learns to surface high-value thoughts and let obsolete ones decay naturally, without forcing the user to manually rate their notes.\n\nIn this proposed architecture, every block or node in the graph database receives a dynamic `weight` (salience) attribute.\n\nThe system passively listens to the user's natural workflow to increase node weights:\n\n- **Transclusion / Block References:** If Node A is embedded into Node B (e.g., `((uuid))`), it strongly signals that Node A is foundational. (+0.5 weight)\n- **Focus Mode / Zooming:** Clicking into a block to isolate its sub-tree indicates active work/review. (+0.1 weight)\n- **AI Context Usage:** A block that is repeatedly selected to feed the context window of a local LLM agent gains weight. (+0.2 weight)\n\nTwo further signals lower a node's weight:\n\n- **Search Scroll-past:** If a user searches for \"Machine Learning\", the system returns 5 results, and the user clicks the 4th, the 3 unclicked results above it receive a micro-penalty for that specific context.\n- **Temporal Decay (Forgetting Curve):** Nodes that have not been read, modified, or linked in X months undergo a logarithmic weight decay, mimicking human \"retrieval-induced forgetting\". 
They are never deleted, but they sink to the bottom of global searches.\n\nAs with LLM outputs, users can explicitly upvote or downvote specific blocks in the UI, marking them as \"Core\" or \"Deprecated/Erratum\", which instantly alters the retrieval heuristic.\n\nTo ground this concept, here is the reference implementation logic we use when applying this to a generic graph database or relational index.\n\n``` python\nimport math\nimport time\n\n\nclass KnowledgeGraphRLHF:\n    def __init__(self, db_connection):\n        # Generic Graph or SQLite FTS bridge; this sketch assumes a\n        # sqlite3-style connection exposing execute() and commit().\n        self.db = db_connection\n\n    def apply_interaction_reward(self, node_uuid: str, interaction_type: str):\n        \"\"\"Updates the synaptic weight of a node from implicit or explicit feedback.\"\"\"\n        rewards = {\n            \"transclusion\": 0.50,\n            \"focus_zoom\": 0.10,\n            \"ai_context_inclusion\": 0.20,\n            \"explicit_upvote\": 1.00,\n            \"explicit_downvote\": -2.00\n        }\n\n        # Unknown interaction types carry no reward and are ignored\n        reward = rewards.get(interaction_type, 0.0)\n        if reward == 0:\n            return\n\n        # Update logic in a generic Graph/Relational structure\n        query = \"\"\"\n            UPDATE nodes\n            SET weight = weight + ?, last_interacted_at = ?\n            WHERE uuid = ?\n        \"\"\"\n        self.db.execute(query, (reward, time.time(), node_uuid))\n        self.db.commit()  # Persist the change (sqlite3-style connection assumed)\n\n    def calculate_decay(self, node_uuid: str, current_time: float) -> float:\n        \"\"\"Returns the weight of a node after logarithmic temporal decay (no write-back).\"\"\"\n        row = self.db.execute(\n            \"SELECT weight, last_interacted_at FROM nodes WHERE uuid = ?\", (node_uuid,)\n        ).fetchone()\n        if row is None:\n            raise KeyError(f\"Unknown node: {node_uuid}\")\n        weight, last_interacted_at = row\n        days_since_interaction = (current_time - last_interacted_at) / 86400\n\n        # Nodes touched within the last 30 days do not decay\n        if days_since_interaction < 30:\n            return weight\n\n        # Logarithmic decay formula\n        decay_factor = math.log10(days_since_interaction / 30 + 1)\n        # Floor at 0.1, but never raise a weight that is already below the floor\n        new_weight = min(weight, max(0.1, weight - decay_factor))\n        return new_weight\n```\n\nWhen querying the knowledge base (e.g., via Cmd+K global search), the ranking algorithm is no longer pure text matching. It becomes a composite score:\n\n`Final_Score = (BM25_Text_Rank OR Vector_Similarity) * Node_Weight`\n\nThis ensures that heavily used nodes naturally float to the top of the user's workflow.\n\nThis conceptual framework, architecture, and accompanying pseudo-code are released under the **Apache License 2.0**.\nMy goal is to push the boundaries of the PKM ecosystem. You are free to implement this Implicit RLHF paradigm in your own tools, Obsidian plugins, Logseq forks, or standalone applications.\n**Requirement (NOTICE):** If you incorporate this architecture or core logic into your software, the Apache 2.0 license requires you to include the accompanying NOTICE file in your repository, providing clear attribution to this original RFC and referencing its origin from the **Matryca Brain** research initiative.", "url": "https://wpnews.pro/news/rlhf-for-pkms", "canonical_source": "https://gist.github.com/MarcoPorcellato/9e5226408c56048b16957771f9056e28", "published_at": "2026-06-21 11:54:39+00:00", "updated_at": "2026-06-26 07:33:20.324047+00:00", "lang": "en", "topics": ["machine-learning", "artificial-intelligence", "developer-tools"], "entities": ["Marco Porcellato", "Matryca Brain", "Obsidian", "Logseq", "Roam Research"], "alternates": {"html": "https://wpnews.pro/news/rlhf-for-pkms", "markdown": "https://wpnews.pro/news/rlhf-for-pkms.md", "text": "https://wpnews.pro/news/rlhf-for-pkms.txt", "jsonld": "https://wpnews.pro/news/rlhf-for-pkms.jsonld"}}