{"slug": "how-to-set-up-codebase-indexing-in-kilo-code", "title": "How to Set Up Codebase Indexing in Kilo Code", "summary": "Kilo Code has reintroduced codebase indexing for semantic code search, requiring users to configure an embedding provider and vector store before enabling the feature. The indexing system parses code locally with Tree-sitter, chunks it into semantic blocks, and stores vectors for conceptual searches like \"where do we validate customer identity?\" Users must explicitly turn on indexing globally or per project through VS Code settings, as adding API keys alone does not start the process.", "body_md": "# How to Set Up Codebase Indexing in Kilo Code\n\n### Configure providers, vector stores, file filters, tuning, and status checks for semantic code search.\n\n[Codebase indexing is back](https://blog.kilo.ai/p/codebase-indexing-is-back-in-kilo)—here’s how to set it up.\n\nThis is the practical companion to the launch post. The launch post covers why indexing matters. This guide covers the mechanics: providers, vector stores, enablement scope, tuning, file filters, and verification.\n\nThe main rule: provider configuration is not enablement. You can add API keys and model settings without indexing anything. Kilo starts indexing only after you turn it on globally or for the current project.\n\n## What you need before you start\n\nCodebase indexing needs two pieces:\n\nan embedding provider, which turns code chunks into vectors\n\na vector store, which saves those vectors and lets Kilo search them later\n\nKilo parses code locally with Tree-sitter, chunks it into semantic blocks like functions, classes, and methods, embeds those chunks, and stores the vectors. 
Once the index is ready, Kilo can use the `semantic_search` tool to answer conceptual questions like “where do we validate customer identity?” or “find the retry logic for failed API calls.”\n\nThis guide follows the current [Kilo Code codebase indexing docs](https://kilo.ai/docs/customize/context/codebase-indexing). Where the public docs do not document a config shape, this guide does not invent one.\n\n## Start in the VS Code settings UI\n\nFor most users, the settings UI is the safest first path. This assumes you already have a vector store ready; if not, follow the “Choose a vector store” section below first.\n\n1. Open Kilo Code in VS Code.\n2. Go to **Kilo Code Settings → Indexing**.\n3. Turn on **Global Enable** or **Enable for This Project**.\n4. Choose an embedding provider.\n5. Choose a vector store: **Qdrant** or **LanceDB**.\n6. Adjust tuning only if you need to.\n7. Save, then watch the indexing status indicator in the prompt input panel.\n\nYou can also click the indexing indicator at the bottom of the prompt input panel to open indexing setup.\n\nThe statuses are:\n\n- **Disabled**: indexing is off or not configured.\n- **Initializing**: indexing is starting up.\n- **Standby**: indexing is configured but not currently processing files.\n- **In Progress**: Kilo is scanning, chunking, embedding, or storing files. The UI shows progress (e.g. `Indexed 123 / 250 files (54%)`).\n- **Complete**: the index is up to date and ready for semantic search.\n- **Error**: indexing failed. Check the error message, provider credentials, and vector store connection.\n\nDo not skip this check. An API key in config does not mean the repo has been indexed.\n\n## Project-level vs. global enablement\n\nKilo has two enablement scopes.\n\nUse **Enabled Globally** when you want Kilo to index every workspace you open, using your global indexing defaults.\n\nUse **Enable for this project** when you want indexing only for the current repo. 
This is usually the better first test, especially for a large codebase or a hosted embedding provider.\n\nThe config shape is the same either way:\n\n```\n{\n  \"indexing\": {\n    \"enabled\": true\n  }\n}\n```\n\nThe path determines the scope:\n\n- Global config: `~/.config/kilo/kilo.jsonc`\n- Project config: `./kilo.jsonc` in the repo\n\nUse the global file for defaults you want everywhere. Use the project file when a repo needs its own provider, vector store, file filters, or tuning.\n\nAgain: setting `provider`, `model`, or an API key does not start indexing. `indexing.enabled` must be `true` at the scope you intend.\n\n## Path 1: Kilo Gateway users\n\nIf you already use Kilo tokens, check **Kilo Code Settings → Indexing** first. The current public indexing docs list supported embedding providers and provider config keys, but they do not currently document a Kilo Gateway-specific indexing provider or a Gateway embeddings endpoint.\n\nThat matters because `provider` is not a display label. It is a config key Kilo uses to load the provider. Do not guess a key like `kilo-gateway` unless your installed Kilo Code build writes it for you.\n\nIf your build shows Kilo Gateway as an embedding option, use the UI and let Kilo write the provider shape. A guarded example looks like this:\n\n```\n{\n  \"indexing\": {\n    \"enabled\": true,\n    // Use the provider key written by your installed Kilo Code build.\n    // The current public indexing docs do not document a Kilo Gateway\n    // embedding provider key, so do not hand-write one from memory.\n    \"provider\": \"<kilo-gateway-provider-from-ui>\",\n    \"model\": \"<embedding-model-from-ui>\",\n    \"vectorStore\": \"lancedb\",\n    \"lancedb\": {}\n  }\n}\n```\n\nIf the UI does not show Kilo Gateway for embeddings, use one of the documented direct provider paths below. The Gateway docs confirm that Kilo’s OpenAI-compatible Gateway covers chat, FIM, model listing, and provider listing. 
The indexing docs are the source of truth for indexing embedding providers.\n\n## Path 2: Mistral BYOK\n\nUse Mistral BYOK when you want to bring a Mistral API key from [La Plateforme](https://mistral.ai/news/la-plateforme/).\n\nIn the UI:\n\n1. Open **Kilo Code Settings → Indexing**.\n2. Enable indexing globally or for this project.\n3. Choose `mistral` as the embedding provider.\n4. Paste your Mistral API key.\n5. Choose Qdrant or LanceDB.\n6. Save.\n\nThe docs call out one easy mistake: Codestral-specific keys from the Mistral autocomplete setup guide are not interchangeable with regular Mistral API keys for indexing. Use an API key from La Plateforme.\n\nMinimal Mistral BYOK with LanceDB:\n\n```\n{\n  \"indexing\": {\n    \"enabled\": true,\n    \"provider\": \"mistral\",\n    \"model\": \"\",\n    \"vectorStore\": \"lancedb\",\n    \"mistral\": {\n      \"apiKey\": \"<MISTRAL_API_KEY_FROM_LA_PLATEFORME>\"\n    },\n    \"lancedb\": {}\n  }\n}\n```\n\nLeave `model` empty (as shown) or unset if you want the provider default. Set it only when you have a specific Mistral embedding model you want Kilo to use.\n\n## Path 3: Ollama + LanceDB for fully local indexing\n\nUse Ollama + LanceDB when you do not want indexing data to leave your machine.\n\nOllama runs the embedding model locally. LanceDB is embedded and file-based, so there is no vector database server to run. With this setup, parsing, embedding, and vector storage all happen locally.\n\nInstall and start Ollama, then pull an embedding model. 
The indexing docs list `mxbai-embed-large`, `nomic-embed-text`, and `all-minilm` as Ollama options.\n\n```\nollama pull nomic-embed-text\n```\n\nThen configure Kilo:\n\n```\n{\n  \"indexing\": {\n    \"enabled\": true,\n    \"provider\": \"ollama\",\n    \"model\": \"nomic-embed-text\",\n    \"vectorStore\": \"lancedb\",\n    \"ollama\": {\n      \"baseUrl\": \"http://localhost:11434\"\n    },\n    \"lancedb\": {}\n  }\n}\n```\n\nThis is the simplest fully local setup: no hosted embedding API, no external vector database, and no external calls for indexing. If status moves to Error, confirm that Ollama is running, the model was pulled successfully, and `baseUrl` matches your local Ollama server.\n\n## Path 4: OpenAI\n\nUse OpenAI when you want a hosted embedding model with a small config surface.\n\nThe docs list `text-embedding-3-small` as the default, `text-embedding-3-large` for higher accuracy, and `text-embedding-ada-002` as legacy.\n\n```\n{\n  \"indexing\": {\n    \"enabled\": true,\n    \"provider\": \"openai\",\n    \"model\": \"text-embedding-3-small\",\n    \"vectorStore\": \"lancedb\",\n    \"openai\": {\n      \"apiKey\": \"<OPENAI_API_KEY>\"\n    },\n    \"lancedb\": {}\n  }\n}\n```\n\nIf you see rate-limit or batch errors during indexing, lower `embeddingBatchSize` before changing providers.\n\n## Other direct providers\n\nKilo also supports these direct embedding provider shapes. 
These examples are intentionally brief: use them when you already know which provider and embedding model you want.\n\nOpenAI-compatible endpoint:\n\n```\n{\n  \"indexing\": {\n    \"provider\": \"openai-compatible\",\n    \"model\": \"<embedding-model>\",\n    \"openai-compatible\": {\n      \"baseUrl\": \"https://...\",\n      \"apiKey\": \"...\"\n    }\n  }\n}\n```\n\nGemini:\n\n```\n{\n  \"indexing\": {\n    \"provider\": \"gemini\",\n    \"model\": \"<embedding-model>\",\n    \"gemini\": {\n      \"apiKey\": \"...\"\n    }\n  }\n}\n```\n\nVercel AI Gateway:\n\n```\n{\n  \"indexing\": {\n    \"provider\": \"vercel-ai-gateway\",\n    \"model\": \"<embedding-model>\",\n    \"vercel-ai-gateway\": {\n      \"apiKey\": \"...\"\n    }\n  }\n}\n```\n\nAWS Bedrock:\n\n```\n{\n  \"indexing\": {\n    \"provider\": \"bedrock\",\n    \"model\": \"<embedding-model>\",\n    \"bedrock\": {\n      \"region\": \"us-east-1\",\n      \"profile\": \"default\"\n    }\n  }\n}\n```\n\nOpenRouter:\n\n```\n{\n  \"indexing\": {\n    \"provider\": \"openrouter\",\n    \"model\": \"<embedding-model>\",\n    \"openrouter\": {\n      \"apiKey\": \"...\",\n      \"specificProvider\": \"...\"\n    }\n  }\n}\n```\n\nVoyage:\n\n```\n{\n  \"indexing\": {\n    \"provider\": \"voyage\",\n    \"model\": \"voyage-code-3\",\n    \"voyage\": {\n      \"apiKey\": \"...\"\n    }\n  }\n}\n```\n\nFor any of these, add `enabled`, `vectorStore`, and vector store settings when you want indexing to start:\n\n```\n{\n  \"indexing\": {\n    \"enabled\": true,\n    \"provider\": \"voyage\",\n    \"model\": \"voyage-code-3\",\n    \"vectorStore\": \"lancedb\",\n    \"voyage\": {\n      \"apiKey\": \"...\"\n    },\n    \"lancedb\": {}\n  }\n}\n```\n\n## Choose a vector store: Qdrant or LanceDB\n\nThe vector store is where Kilo saves embeddings after it chunks your code.\n\nUse **LanceDB** when you want the fewest moving parts. 
It is embedded and file-based. You do not need Docker, a server process, or a network connection. If you omit a directory, Kilo stores LanceDB data under the Kilo data directory by default.\n\n```\n{\n  \"indexing\": {\n    \"vectorStore\": \"lancedb\",\n    \"lancedb\": {}\n  }\n}\n```\n\nUse **Qdrant** when you want an external vector database server. The docs list Qdrant as the default vector store and recommend it for larger codebases and team deployments. For production, use authentication.\n\nStart Qdrant locally with Docker:\n\n```\ndocker run -p 6333:6333 qdrant/qdrant\n```\n\nThen configure Kilo:\n\n```\n{\n  \"indexing\": {\n    \"vectorStore\": \"qdrant\",\n    \"qdrant\": {\n      \"url\": \"http://localhost:6333\",\n      \"apiKey\": \"\"\n    }\n  }\n}\n```\n\nIf indexing fails with Qdrant selected, check that the server is running, the URL is reachable from Kilo, and the API key matches your Qdrant deployment.\n\n## Configure indexing from the CLI\n\nThe Kilo CLI includes an interactive indexing command when the indexing plugin is installed.\n\nOpen a Kilo TUI session in your repo and run:\n\n```\n/indexing\n```\n\nAliases also work:\n\n```\n/index\n\n/embedding\n```\n\nThe dialog can toggle indexing, choose an embedding provider, set provider credentials, choose a model, set vector dimensions, choose Qdrant or LanceDB, configure vector store settings, and adjust tuning parameters. 
Changes are written to `kilo.jsonc` and take effect immediately.\n\nA complete CLI-style config can look like this:\n\n```\n{\n  \"indexing\": {\n    \"enabled\": true,\n    \"provider\": \"voyage\",\n    \"model\": \"voyage-code-3\",\n    \"dimension\": 1024,\n    \"vectorStore\": \"qdrant\",\n    \"voyage\": {\n      \"apiKey\": \"pa-...\"\n    },\n    \"qdrant\": {\n      \"url\": \"http://localhost:6333\",\n      \"apiKey\": \"\"\n    },\n    \"searchMinScore\": 0.4,\n    \"searchMaxResults\": 50,\n    \"embeddingBatchSize\": 60,\n    \"scannerMaxBatchRetries\": 3\n  }\n}\n```\n\nWhen indexing is enabled, the CLI shows an `IDX` badge at the bottom of the TUI: `IDX In Progress 40% 120/300`, `IDX Complete`, `IDX Standby`, or `IDX Error <message>`.\n\n## Tune the defaults only when you have a reason\n\nThe defaults are a good starting point. Change them when you are solving a specific failure mode.\n\n`searchMinScore` controls the minimum similarity score for returned results. The default is `0.4`. Raise it if searches return too much loosely related code. Lower it if searches miss relevant results.\n\n`searchMaxResults` controls how many results semantic search can return. The default is `50`. Lower it if the agent receives too much context. Raise it if you are working in a large repo and relevant matches are being cut off.\n\n`embeddingBatchSize` controls how many code segments Kilo sends to the embedding provider per batch. The default is `60`. Lower it if a hosted provider rate-limits you or if local embedding runs out of memory.\n\n`scannerMaxBatchRetries` controls how many times Kilo retries a failed embedding batch. The default is `3`. 
Raise it only if failures are transient and retrying is actually helping.\n\nExample conservative hosted-provider tuning:\n\n```\n{\n  \"indexing\": {\n    \"searchMinScore\": 0.45,\n    \"searchMaxResults\": 30,\n    \"embeddingBatchSize\": 25,\n    \"scannerMaxBatchRetries\": 3\n  }\n}\n```\n\n## Control what gets indexed\n\nKilo does not blindly embed every file in your repo.\n\nBy default, it excludes:\n\n- binary files and images\n- files larger than 1MB\n- `.git` directories\n- dependency folders such as `node_modules` and `vendor`\n- files ignored by `.gitignore`\n- files ignored by `.kilocodeignore`\n\nUse `.kilocodeignore` when you want indexing-specific exclusions without changing Git ignore rules. Common examples include generated clients, build output, large snapshots, vendored SDKs, or private test fixtures that should not be sent to a hosted embedding provider.\n\nExample `.kilocodeignore`:\n\n```\n# Generated code\ngenerated/\n**/*.generated.ts\n\n# Large fixtures\nfixtures/snapshots/\n\n# Local secrets and scratch files\n.env*\n.local-notes/\n```\n\nIf you are using a hosted provider, remember the privacy model: parsing happens locally, and Kilo sends small code snippets for embedding, not whole files. If nothing should leave the machine, use Ollama + LanceDB.\n\n## Verify semantic search works\n\nWait for status to reach Complete. 
Then ask Kilo a conceptual question that would be annoying to grep for:\n\n```\nWhere do we validate user permissions before deleting a resource?\n```\n\nA healthy semantic search result should include relevant snippets, file paths, line numbers, similarity scores, and enough surrounding context for the agent to decide what to read next.\n\nIf status shows `Error`:\n\n1. Read the error message in the UI or `IDX Error` badge.\n2. Confirm `indexing.enabled` is true at the scope you intended.\n3. Check provider credentials.\n4. If using Qdrant, confirm the server is running and reachable.\n5. If using Ollama, confirm the embedding model is pulled and Ollama is listening on the configured `baseUrl`.\n6. Lower `embeddingBatchSize` if the provider is rate-limiting or local embedding is failing under load.\n7. Check `.kilocodeignore` and `.gitignore` if files you expected to see are missing.\n\nFor local embedding failures involving batch or micro-batch settings, align the embedding model batch size and micro-batch size, restart the local server, and try again.\n\n## The setup checklist\n\nA working indexing setup has all of these pieces:\n\n- `indexing.enabled` is true globally or for the current project.\n- The embedding provider is documented and has valid credentials or a reachable local endpoint.\n- The model is an embedding model, not a chat-only model.\n- The vector store is configured and reachable.\n- File filters exclude the files you do not want indexed.\n- Status reaches `Complete`.\n- Kilo can answer a conceptual code question using semantic search.\n\nStart with the simplest path that matches your constraints: Mistral BYOK or OpenAI for hosted embeddings, Ollama + LanceDB for fully local indexing, LanceDB when you do not want a database server, and Qdrant when you need an external vector store.\n\nOnce status is `Complete`, indexing is no longer a setup task. 
It becomes part of the agent workflow: ask Kilo for the concept, inspect the files it finds, and then make the change with the right context loaded.", "url": "https://wpnews.pro/news/how-to-set-up-codebase-indexing-in-kilo-code", "canonical_source": "https://blog.kilo.ai/p/how-to-set-up-codebase-indexing-in", "published_at": "2026-06-06 15:53:51+00:00", "updated_at": "2026-06-06 16:41:05.132536+00:00", "lang": "en", "topics": ["ai-tools", "ai-products", "ai-infrastructure"], "entities": ["Kilo Code", "Tree-sitter", "Kilo AI", "VS Code"], "alternates": {"html": "https://wpnews.pro/news/how-to-set-up-codebase-indexing-in-kilo-code", "markdown": "https://wpnews.pro/news/how-to-set-up-codebase-indexing-in-kilo-code.md", "text": "https://wpnews.pro/news/how-to-set-up-codebase-indexing-in-kilo-code.txt", "jsonld": "https://wpnews.pro/news/how-to-set-up-codebase-indexing-in-kilo-code.jsonld"}}