Knowledge and Memory Management v0.0.2: Portable Knowledge Collection

The v0.0.2 release of the Knowledge and Memory Management system introduces portable knowledge collection and structured memory management. The system standardizes on the $AGENT_HOME environment variable to eliminate hardcoded paths, enabling easy sharing across environments. It features source-specific collectors for web, video, and article content, and a hybrid vector-store and key-value index for memory management.

The v0.0.2 release of the Knowledge and Memory Management system marks a clear shift toward portability and clean separation of concerns. All personal paths have been replaced with the $AGENT_HOME environment variable, eliminating the hardcoded directory assumptions that plagued v0.0.1. This release focuses on two core pillars: knowledge collection from diverse sources and structured memory management for long-term retention.

Why $AGENT_HOME Matters

Previous versions required manual path configuration per deployment, leading to brittle setups. By standardizing on $AGENT_HOME, the system now resolves storage roots at runtime. This makes it trivial to share agent configurations across team members, CI pipelines, and containerized environments. The memory manager, knowledge collectors, and indexing engines all respect this base directory, so everything from raw downloads to vector data stays under one portable root.

Knowledge Collection: Web, Video, Articles

The collector modules are source-specific but export a consistent interface. For web content, the HTML scraper strips scripts, downloads images with size limits, and extracts readable text using a configurable parser. Video collection relies on transcript APIs (YouTube, Vimeo, and local files), with optional frame extraction for slide-heavy content. Article ingestion handles RSS feeds and direct URLs, applying automatic summarization for long-form pieces.

Each collector normalizes metadata (title, source URL, timestamp, language) and passes the payload to a staging queue. Deduplication runs at the source and memory levels: identical URLs or hashed content are flagged before storage. The collectors also support custom filters, for example ignoring articles below a word count or videos shorter than 60 seconds.

Memory Management: Indexing and Retrieval

Memory in v0.0.2 is built on a hybrid vector-store and key-value index.
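
To make the idea of a hybrid index concrete, here is a minimal sketch under stated assumptions, not the project's implementation. The `HybridIndex` class, its `add` and `search` methods, and the two-dimensional toy vectors are all hypothetical. It uses brute-force cosine similarity where the release uses an HNSW-based database, and it shows only the division of labor: a key-value side that filters on metadata and a vector side that ranks what remains.

```python
import math
from dataclasses import dataclass, field


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class HybridIndex:
    """Hypothetical toy index: brute-force vectors plus a metadata dict."""
    vectors: dict = field(default_factory=dict)   # entry_id -> embedding
    metadata: dict = field(default_factory=dict)  # entry_id -> metadata dict

    def add(self, entry_id, embedding, **meta):
        self.vectors[entry_id] = embedding
        self.metadata[entry_id] = meta

    def search(self, query_vec, source=None, top_k=3):
        # Key-value side: keep only entries whose metadata matches the filter.
        candidates = [
            eid for eid, meta in self.metadata.items()
            if source is None or meta.get("source") == source
        ]
        # Vector side: rank the remaining entries by cosine similarity.
        ranked = sorted(
            candidates,
            key=lambda eid: cosine(query_vec, self.vectors[eid]),
            reverse=True,
        )
        return ranked[:top_k]


index = HybridIndex()
index.add("a", [1.0, 0.0], source="web", title="Intro")
index.add("b", [0.9, 0.1], source="video", title="Talk")
index.add("c", [0.0, 1.0], source="web", title="Other")

print(index.search([1.0, 0.0]))                # ['a', 'b', 'c']
print(index.search([1.0, 0.0], source="web"))  # ['a', 'c']
```

Filtering before ranking keeps the similarity computation limited to entries that can actually be returned. A real approximate-nearest-neighbor index may need to apply the filter differently.
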
Knowledge entries are chunked, embedded (default model: all-MiniLM-L6-v2), and inserted into an HNSW-based vector database. The key-value index stores metadata and relationships, enabling graph traversal across related items. When a new piece of knowledge arrives, the memory manager checks for semantic similarity with existing entries. If a duplicate is detected, the new data can update the existing entry's timestamps and references instead of creating a second entry.

Retrieval supports both dense vector search and keyword-based filtering. The retrieve() method accepts a query string, an optional source filter (web, video, article), and a recency window. Results are ranked by cosine similarity and weighted by source freshness.

Code Example: Collection Setup

The following demonstrates how to configure and use the collectors with $AGENT_HOME:

```python
import os

from knowledge_collector import WebCollector, VideoCollector, ArticleCollector
from memory_manager import MemoryManager

# Resolve the storage root at runtime; fall back to ./data if unset.
agent_home = os.getenv('AGENT_HOME', './data')
memory = MemoryManager(base_path=agent_home)

# All collectors share the same memory manager.
web = WebCollector(memory=memory, dedup=True)
video = VideoCollector(memory=memory, transcript=True)
article = ArticleCollector(memory=memory, min_words=300)

web.collect('https://example.com/deep-learning-guide')
video.collect('https://youtube.com/watch?v=dQw4w9WgXcQ')
article.collect('https://blog.example.com/feed')

memory.commit()  # flush embeddings and indexes to disk
```

The collect() methods validate URLs, download content, normalize it, and push it to memory. commit() writes all pending vector and key-value updates to the $AGENT_HOME storage tree.

Performance and Scalability

Memory operations are batched: by default, 100 entries trigger an automatic commit, or you can call commit() explicitly. The vector database uses mmap for large indexes, so memory overhead stays predictable even with 500k+ entries. The collectors are I/O-bound by design: they use $AGENT_HOME to cache downloads and avoid redundant network requests.
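
The batching behavior described above can be illustrated with a small sketch. This is not the MemoryManager's actual code: `BatchedWriter`, `flush_fn`, and `batch_size` are hypothetical names chosen for illustration. Only the default of 100 entries and the option of an explicit commit come from the release description.

```python
class BatchedWriter:
    """Hypothetical helper that buffers entries and flushes them in batches."""

    def __init__(self, flush_fn, batch_size=100):
        self.flush_fn = flush_fn      # called with the list of pending entries
        self.batch_size = batch_size  # 100 mirrors the default described above
        self.pending = []

    def add(self, entry):
        self.pending.append(entry)
        if len(self.pending) >= self.batch_size:
            self.commit()  # automatic commit once the batch is full

    def commit(self):
        if not self.pending:
            return  # nothing to write
        self.flush_fn(self.pending)
        self.pending = []


flushed = []  # records the size of each flushed batch
writer = BatchedWriter(flush_fn=lambda batch: flushed.append(len(batch)))

for i in range(250):
    writer.add({"id": i})
writer.commit()  # explicit commit for the 50 leftover entries

print(flushed)  # [100, 100, 50]
```

Without the final explicit commit, the last 50 entries would stay in the buffer. This is why the collection example ends with a call to commit().
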

Looking Ahead

v0.0.2 is a clean foundation. The next minor version will introduce cross-source merging (e.g., linking a video transcript to its accompanying article) and incremental garbage collection for stale entries. For now, the focus on portable paths and separated collection/memory layers makes this release suitable for production agents that need to learn from the web without environment-specific hacks.