{"slug": "your-django-app-has-years-of-data-here-s-how-to-make-ai-agents-actually-use-it", "title": "Your Django App Has Years of Data. Here's How to Make AI Agents Actually Use It.", "summary": "A developer built `django-graph-search`, a library that enables AI agents to query live Django ORM data without custom ETL pipelines or separate vector stores. The tool traverses relational database graphs to create semantically rich vector documents from existing models, configured through a single `settings.py` file with automatic index updates via `post_save` signals.", "body_md": "You have a Django app with years of data: products, articles, orders, users. Your users type natural language queries into a search box and get either nothing or keyword-matched garbage.\n\nWorse: you want to connect an AI agent that can answer questions about this data. But everything is locked inside relational tables. To feed it to an LLM, you either dump the database, write custom ETL pipelines, or stand up a separate vector store and manually keep it in sync with your ORM.\n\nI ran into exactly this problem. So I built a library that solves it with one config file.\n\nClassic Django search solutions solve *one* problem: search. But they all come with a price:\n\n| Solution | What You Need | Why It Hurts |\n|---|---|---|\n| Haystack + Elasticsearch | Separate server, manual field mapping | Mountains of boilerplate, no semantics |\n| `django.contrib.postgres.search` | PostgreSQL only | Keyword matching, no meaning |\n| Custom RAG pipeline | Export scripts, pandas, numpy | Data goes stale, no ORM connection |\n\nNone of them answer the real question: **How do I make an AI agent work with live data from my Django app without rewriting the architecture?**\n\n`django-graph-search` doesn't just vectorize individual model fields. 
It **traverses your ORM relation graph** and builds a rich document that captures the full context of an object.\n\nTake a simple `Product` model:\n\n```\nProduct(pk=42)\n├── name: \"Pixel 8\"\n├── description: \"Camera-first Android phone with Tensor G3\"\n├── category → Category.name: \"Smartphones\"          ← FK\n├── tags → [Tag.name: \"android\", \"5G\", \"camera\"]     ← M2M\n└── brand → Brand.description: \"Google hardware...\"  ← FK depth=2\n         └── country → Country.name: \"USA\"           ← depth=2\n```\n\nAll of this gets merged into **one text document**, which is passed to the embedding model. The resulting vector is semantically rich: it carries information not just about the object itself, but about its entire relational context.\n\nThis is done by `GraphResolver`, which recursively walks `_meta.get_fields()`, handles FK, M2M, and reverse relations, and tracks cycles via a `visited` set.\n\nThe most important part: all of this is added on top of your **existing** Django application. 
You don't change models, don't create new tables, don't touch your views.\n\nEverything lives in `settings.py`:\n\n```\nINSTALLED_APPS = [\n    ...,\n    \"django_graph_search\",\n]\n\nGRAPH_SEARCH = {\n    \"MODELS\": [\n        {\n            \"model\": \"shop.Product\",\n            \"fields\": [\"name\", \"description\", \"category__name\", \"tags__name\"],\n            \"follow_relations\": True,\n            \"relation_depth\": 2,\n        },\n    ],\n    \"VECTOR_STORE\": {\n        \"BACKEND\": \"django_graph_search.backends.ChromaDBBackend\",\n        \"OPTIONS\": {\"persist_directory\": \"vector_db\"},\n    },\n    \"EMBEDDINGS\": {\n        \"default\": {\n            \"BACKEND\": \"django_graph_search.embeddings.SentenceTransformerBackend\",\n            # Multilingual: works with Russian, English, German, etc.\n            \"MODEL_NAME\": \"sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2\",\n        }\n    },\n    \"DELTA_INDEXING\": True,\n}\n```\n\nOne command and your vector index is built:\n\n```\npython manage.py build_search_index\n```\n\nYour Django app runs exactly as before. Data stays in PostgreSQL. Vectors live alongside it in ChromaDB. `post_save` signals keep the index updated automatically whenever objects change.\n\nNot everything in your database should end up in a vector. Technical fields like `slug`, `created_at`, and internal admin notes are noise that degrades search quality.\n\n`weight_fields` gives you precise control:\n\n```\n\"weight_fields\": {\n    \"title\": 2.0,         # repeated twice, so the embedding \"remembers\" it more\n    \"description\": 1.0,   # standard weight\n    \"internal_note\": 0.0, # weight 0.0 = completely excluded from the vector\n    \"slug\": 0.0,\n}\n```\n\nThis isn't just filtering. Under the hood, the `GraphResolver._apply_weight()` method repeats text fragments in the document proportionally to their weight. 
A field with weight `2.0` appears twice in the concatenated string, shifting the embedding centroid toward that concept in vector space.\n\nNow for the main point. Here's how this connects to AI agents.\n\nThe standard RAG pattern (Retrieval-Augmented Generation) requires: get user query → retrieve relevant context → pass to LLM. Normally \"retrieve context\" means separate infrastructure. With `django-graph-search`, it's a few lines:\n\n```python\nfrom django_graph_search import search\n\ndef build_llm_context(user_question: str) -> str:\n    results = search(\n        user_question,\n        models=[\"shop.Product\", \"blog.Article\"],\n        limit=5\n    )\n    # Each result contains \"text\" (the full indexed document)\n    # and \"score\" (cosine similarity from 0.0 to 1.0)\n    context = \"\\n\\n\".join(\n        r[\"text\"] for r in results if r[\"score\"] > 0.7\n    )\n    return context\n\n# Pass context to the system prompt of any LLM\n```\n\nKey detail: `results[*][\"text\"]` is not just `str(instance)`. It's the full text document used to build the vector, with all related fields and all weights applied. The LLM receives **rich, relational context**, not just an object name.\n\nFor more demanding use cases, the library ships an optional LangGraph pipeline. It adds query expansion and reranking, enabled through settings:\n\n```\nGRAPH_SEARCH = {\n    \"LANGGRAPH\": {\n        \"ENABLED\": True,\n        \"QUERY_EXPANSION\": True,\n        \"RERANKING\": True,\n        \"MAX_EXPANDED_QUERIES\": 3,\n        \"FALLBACK_ON_ERROR\": True,  # if LLM fails, falls back to pure vector search\n        \"LLM\": {\n            \"BACKEND\": \"myapp.llm.OllamaBackend\",  # plug in your own Ollama/vLLM/OpenAI\n        },\n    },\n}\n```\n\nImportant: `langgraph` is an **optional dependency**. If the package is not installed, the pipeline automatically falls back to a built-in `_FallbackGraph` with identical behavior. 
Your application code doesn't change.\n\nThe conversational endpoint adds stateful memory between requests:\n\n```\nUser:  \"show me smartphones under $500\"\nAgent: [returns list]\nUser:  \"only Samsung\"\nAgent: [understands previous context, interpreted_query = \"Samsung smartphones under $500\"]\n```\n\nMigrating between vector backends takes a one-line change:\n\n```\nDev:        ChromaDB  → local, no server, files on disk\nStaging:    FAISS     → fast, CPU-only, everything in memory\nProduction: pgvector  → if you already have PostgreSQL, no new server at all\n            Qdrant    → if you need filtering and horizontal scaling\n```\n\nEmbedding models are swappable the same way: local `sentence-transformers`, OpenAI, Cohere, all through `EMBEDDINGS.BACKEND`. Zero code changes.\n\n`django-graph-search` is not just another search library. It's a **vector layer on top of your existing Django ORM**. Install it with:\n\n```\npip install django-graph-search[chromadb]\n```\n\nYour data is already there. 
It just needs a vector layer.\n\n*GitHub: svalench/django_graph_search · PyPI: django-graph-search*", "url": "https://wpnews.pro/news/your-django-app-has-years-of-data-here-s-how-to-make-ai-agents-actually-use-it", "canonical_source": "https://dev.to/valenchits/your-django-app-has-years-of-data-heres-how-to-make-ai-agents-actually-use-it-200d", "published_at": "2026-06-06 20:19:00+00:00", "updated_at": "2026-06-06 20:41:55.257400+00:00", "lang": "en", "topics": ["ai-agents", "ai-tools", "ai-products", "natural-language-processing", "large-language-models"], "entities": ["Django", "Haystack", "Elasticsearch", "PostgreSQL", "LLM", "Pixel 8", "Tensor G3"], "alternates": {"html": "https://wpnews.pro/news/your-django-app-has-years-of-data-here-s-how-to-make-ai-agents-actually-use-it", "markdown": "https://wpnews.pro/news/your-django-app-has-years-of-data-here-s-how-to-make-ai-agents-actually-use-it.md", "text": "https://wpnews.pro/news/your-django-app-has-years-of-data-here-s-how-to-make-ai-agents-actually-use-it.txt", "jsonld": "https://wpnews.pro/news/your-django-app-has-years-of-data-here-s-how-to-make-ai-agents-actually-use-it.jsonld"}}