{"slug": "semanticsourcecode-local-semantic-code-search-with-ollama-and-sqlite", "title": "SemanticSourceCode – Local semantic code search with Ollama and SQLite", "summary": "A new open-source C# tool, SemanticSourceCode, enables local semantic code search using Ollama or LM Studio for embeddings and SQLite for vector storage. The tool analyzes C# code structure, supports hybrid search combining semantic and keyword matching, and offers features like watch mode, MCP server integration, and adaptive similarity thresholds.", "body_md": "A C# tool for semantic code search with local embeddings. Search your codebase by meaning, not just keywords.\n\n- 🔍 **Semantic Chunking**: Analyzes C# classes, methods, properties, constructors and fields separately\n- 🧠 **Local Embeddings**: Uses Ollama or LM Studio locally, with no cloud dependency and no data leakage\n- 💾 **SQLite Vector Database**: Simple embedded database with cosine similarity search\n- 🔎 **Semantic Search**: Find code based on meaning, not just keywords\n- 👀 **Watch Mode**: Live incremental re-indexing on file changes (500 ms debounce, Ctrl+C to stop)\n- 🔌 **MCP Server**: Exposes the search as a Model Context Protocol tool\n
- 📜 **Scriptable Search**: Non-interactive one-shot mode with `--query` for pipes, scripts and agentic use\n- ⚡ **Multiple Providers**: Switch between Ollama and LM Studio via configuration\n- 🚀 **Enhanced Search Quality**: Content boosting and query expansion for better results\n- 🏷️ **Framework Detection**: Automatic detection of ASP.NET Controllers, Services and Middleware\n- 📊 **Call Graph Analysis**: Track method calls and dependencies between code chunks\n\nThe search engine combines semantic similarity with keyword matching:\n\n- **Semantic Score**: Cosine similarity of embeddings (weight: 0.7)\n- **Keyword Score**: Matches in class names, member names, and content (weight: 0.3)\n- **Combined**: `hybrid_score = 0.7 * semantic + 0.3 * keyword`\n\nThis ensures that exact keyword matches (e.g., `class DatabaseService`) are not overshadowed by semantically similar but structurally irrelevant results.\n\nNarrow down search results with structural filters:\n\n```\n# Only search in controllers\n./SemanticSourceCode --mode search --namespace Api.Controllers --http-method GET\n\n# Only search in a specific class\n./SemanticSourceCode --mode search --class DatabaseService\n\n# File path pattern\n./SemanticSourceCode --mode search --file-pattern \"*/Controllers/*\"\n```\n\nAvailable filters:\n\n| Filter | CLI Flag | Description |\n|---|---|---|\n| Namespace | `--namespace` | Match namespace name (exact or partial) |\n| Class | `--class` | Match class name |\n| HTTP Method | `--http-method` | Match HTTP method (GET, POST, etc.) |\n| File Pattern | `--file-pattern` | Match file path (glob pattern) |\n\nWhen no strong matches are found, the engine suggests alternative queries based on the Levenshtein distance to known class and member names:\n\n```\n> DataBase\nDo you mean: DatabaseService?\n```\n\nSuggestions are computed from the indexed codebase and require no external dependencies.\n\nThe similarity threshold adjusts automatically based on:\n\n- **Score Distribution**: Percentile-based analysis of result scores\n- **Gap Detection**: Elbow method to find natural cutoffs\n- **Query Specificity**: Shorter (more generic) queries get lower thresholds; longer (more specific) queries get higher thresholds\n\nConfigure it in `appsettings.json`:\n\n```\n{\n  \"Search\": {\n    \"AdaptiveThreshold\": {\n      \"Enabled\": true,\n      \"FloorThreshold\": 0.30,\n      \"CeilingThreshold\": 0.85,\n      \"Percentile\": 70\n    }\n  }\n}\n```\n\nResults are re-ranked using structural boosts:\n\n| Signal | Boost | Description |\n|---|---|---|\n| ClassName Match | ×1.3 | Query matches class name |\n| MemberName Match | ×1.0 | Query matches member name |\n| Controller | ×1.1 | ASP.NET Controller detected |\n| Service | ×1.1 | Service class detected |\n| Middleware | ×1.1 | Middleware class detected |\n| Documentation | ×1.05 | Has XML documentation |\n| Small File | ×0.9 | Penalty for very small files (often helpers) |\n\nAll search features can be configured in `appsettings.json`:\n\n```\n{\n  \"Search\": {\n    \"MinimumSimilarity\": 0.35,\n    \"TopK\": 20,\n    \"DisplayCount\": 5,\n    \"WeakMatchThreshold\": 0.30,\n    \"Hybrid\": {\n      \"SemanticWeight\": 0.7,\n      \"KeywordWeight\": 0.3\n    },\n    \"AdaptiveThreshold\": {\n      \"Enabled\": true,\n      \"FloorThreshold\": 0.30,\n      \"CeilingThreshold\": 0.85,\n      \"Percentile\": 70\n    },\n    \"ReRanking\": {\n      \"ClassNameBoost\": 1.3,\n      \"MemberNameBoost\": 1.0,\n      \"ControllerBoost\": 1.1,\n      \"DocumentationBoost\": 1.05\n  
  }\n  }\n}\n```\n\n```\n┌─────────────────┐      ┌──────────────────┐\n│  C# Files       │ ───> │   CodeAnalyzer   │ (Roslyn)\n└─────────────────┘      └────────┬─────────┘\n                                 │ CodeChunks\n                                 v\n                        ┌──────────────────┐\n                        │ EmbeddingProvider│ (Ollama/LM Studio)\n                        └────────┬─────────┘\n                                 │ float[]\n                                 v\n                        ┌──────────────────┐\n                        │ SqliteVssDatabase│ (vec0)\n                        └────────┬─────────┘\n                                 │\n                                 v\n                        ┌──────────────────┐\n                        │ SearchEngine     │ (Cosine Sim)\n                        └──────────────────┘\n```\n\n| Component | Responsibility | File |\n|---|---|---|\n| CodeAnalyzer | Roslyn-based code decomposition | Services/CodeAnalyzer.cs |\n| IEmbeddingService | Provider abstraction | Services/IEmbeddingService.cs |\n| EmbeddingServiceFactory | Auto-detect provider | Services/EmbeddingServiceFactory.cs |\n| IVectorDatabase | Vector storage with cosine similarity | Services/IVectorDatabase.cs |\n| SqliteVssDatabase | SQLite + vec0 implementation | Services/SqliteVssDatabase.cs |\n| HybridSearchService | Combines semantic + keyword search | Search/HybridSearchService.cs |\n| ResultRanker | Re-ranking with structural signals | Search/ResultRanker.cs |\n| QuerySuggester | Levenshtein-based suggestions | Search/QuerySuggester.cs |\n| AdaptiveThreshold | Dynamic similarity threshold | Search/AdaptiveThreshold.cs |\n| SearchFilter | Context filters (namespace, class, etc.) | Search/SearchFilter.cs |\n| QueryExpander | Synonym expansion | Search/QueryExpander.cs |\n| CodeChunk | Data model | Models/CodeChunk.cs |\n\nPrerequisites:\n\n- [.NET 10.0 SDK](https://dotnet.microsoft.com/download/dotnet/10.0)\n- Either [Ollama](https://ollama.com) or [LM Studio](https://lmstudio.ai), installed locally\n\n```\n# Using the dotnet-install script\ncurl -sSL https://dot.net/v1/dotnet-install.sh | bash /dev/stdin --channel 10.0 --install-dir ~/.dotnet\n\n# Add to PATH\nexport PATH=\"$HOME/.dotnet:$PATH\"\n\n# Verify version\ndotnet --version  # Should print 10.0.x\n```\n\nFor other installation methods (Windows, package managers), see the [official .NET 10 documentation](https://learn.microsoft.com/en-us/dotnet/core/install/).\n\n```\n# Install Ollama (Linux/macOS)\ncurl -fsSL https://ollama.com/install.sh | sh\n\n# Pull an embedding model\nollama pull nomic-embed-text\n```\n\nDefault Ollama endpoint: `http://localhost:11434`\n\n- Download and install [LM Studio](https://lmstudio.ai) for your platform\n- Open LM Studio and go to the **Developer** tab\n- Start the local server (default port: 1234)\n- Load an embedding model, e.g.:\n  - `nomic-ai/nomic-embed-text-v1.5`\n  - `sentence-transformers/all-MiniLM-L6-v2`\n\nDefault LM Studio endpoint: `http://localhost:1234`\n\nBy default, the app uses **auto-detect**, so you don't need to configure anything. The corresponding setting is:\n\n```\n{\n  \"Embedding\": {\n    \"Provider\": \"auto\"\n  }\n}\n```\n\nThe app will automatically:\n\n- **Check LM Studio first** (faster, local UI) on port 1234\n- **Fall back to Ollama** on port 11434\n- **Pick whichever is available** with a loaded embedding model\n\nThis gives you:\n\n- **Zero-config out of the box**: Install either LM Studio or Ollama and the app just works\n- **Respect for an explicit choice**: Set `\"ollama\"` or `\"lmstudio\"` to pin a provider (the fallback still works if that one is down)\n- **Transparent logging**: The console tells you exactly which provider was chosen and why\n\n| Config | First Try | Fallback |\n|---|---|---|\n| `auto` | LM Studio | Ollama |\n| `lmstudio` | LM Studio | Ollama |\n| `ollama` | Ollama | LM Studio |\n\nIf neither provider is reachable, you'll get a clear error with installation instructions for both.\n\n```\ndotnet restore\ndotnet build\ndotnet test        # All 109 tests should pass\ndotnet publish -c Release\n\n# Index C# files in a directory\n./SemanticSourceCode --mode index --path ./src\n\n# Example with absolute path\n./SemanticSourceCode --mode index --path /home/user/projects/MyApp\n\n# Start watch mode on a directory\n./SemanticSourceCode --mode watch --path ./src\n```\n\nWatch mode runs an initial full index, then keeps the process running and re-indexes the affected file automatically whenever a `*.cs` file is created, changed, deleted, or renamed. The updated file becomes searchable once the 500 ms debounce window has elapsed and its chunks have been re-embedded, so searches in another shell see the latest code.\n\n- **Debounce**: Multiple rapid saves to the same file are coalesced into a single re-index (default: 500 ms)\n- **Excluded directories**: `bin/`, `obj/`, `.git/`, `.vs/`, `.idea/`, `node_modules/`, `dist/`, and `build/` are ignored automatically\n- **Stop**: Press `Ctrl+C` to stop watching.\n
The watcher exits cleanly, with no leftover background tasks.\n\nExample workflow:\n\n```\n# Terminal 1: start watching\n./SemanticSourceCode --mode watch --path ./src\n\n# Terminal 2: edit a file\nvim ./src/Services/MyService.cs   # → re-indexes automatically\n\n# Terminal 3: search while watching\n./SemanticSourceCode --mode search\n```\n\n**Interactive mode:**\n\n```\n# Start interactive search mode\n./SemanticSourceCode --mode search\n```\n\nExample queries:\n\n- \"How do I find all files in a directory?\"\n- \"Database connection handling\"\n- \"Async HTTP client\"\n- \"User authentication\"\n\n**Non-interactive (one-shot) mode:**\n\n```\n# Default (text format): prints human-readable results, then exits\n./SemanticSourceCode --mode search --query \"arithmetic calculation\"\n\n# JSON output: for piping into jq, scripts, or other tools\n./SemanticSourceCode --mode search --query \"arithmetic calculation\" --format json\n\n# Quiet output: only the top-1 result, on one line\n./SemanticSourceCode --mode search --query \"Add\" --quiet\n\n# Short flags\n./SemanticSourceCode --mode search -q \"Add\" -f json -l 2\n\n# With structural filter\n./SemanticSourceCode --mode search -q \"Query\" --namespace MyApp.Data\n```\n\nOne-shot mode is well suited to scripts and agentic use. It accepts these flags:\n\n| Flag | Description |\n|---|---|\n| `--query`, `-q` | The search query (triggers non-interactive mode) |\n| `--format`, `-f` | `text` (default), `json`, or `quiet` |\n| `--limit`, `-l` | Maximum number of results to display |\n| `--quiet` | Shorthand for `--format quiet` |\n| `--namespace` | Filter to chunks in this namespace |\n| `--class` | Filter to chunks in this class |\n| `--http-method` | Filter to controller methods with this HTTP verb |\n| `--file-pattern` | Filter to files matching this glob |\n\n**Exit codes** (non-interactive only):\n\n- `0`: At least one result found\n- `1`: No results, validation error, or database not initialized\n\n**MCP server:**\n\n```\n# Start the MCP server over stdio\n./SemanticSourceCode --mode mcp\n```\n\nThe server speaks **JSON-RPC 2.0** over **stdin/stdout** (the MCP stdio transport). It exposes two tools that AI agents can call directly:\n\n| Tool | Description |\n|---|---|\n| `search_code` | Semantic search with optional `namespace`, `class`, `filePattern`, and `limit` filters |\n| `get_chunk_by_id` | Fetch a single indexed chunk by its semantic ID |\n\nStatus messages go to **stderr**, so the JSON-RPC channel on **stdout** stays clean for client parsing.\n\n**Example (project-local `.mcp.json`):**\n\n```\n{\n  \"mcpServers\": {\n    \"semantic-source-code\": {\n      \"command\": \"SemanticSourceCode\",\n      \"args\": [\"--mode\", \"mcp\"]\n    }\n  }\n}\n```\n\nThe `command` value must resolve to the published binary, either through your `PATH` or as an absolute path. After a restart, the agent can call `search_code` and `get_chunk_by_id` directly in its tool-using workflow.\n\nEdit `appsettings.json` to switch providers. Use `\"auto\"` (the default) for zero-config behavior, or pin a provider explicitly.\n\n**Auto-detect (default):**\n\n```\n{\n  \"Embedding\": {\n    \"Provider\": \"auto\"\n  },\n  \"Ollama\": {\n    \"BaseUrl\": \"http://localhost:11434\",\n    \"EmbeddingModel\": \"nomic-embed-text\"\n  },\n  \"LMStudio\": {\n    \"BaseUrl\": \"http://localhost:1234\",\n    \"EmbeddingModel\": \"text-embedding-nomic-embed-text-v1.5\"\n  },\n  \"Database\": {\n    \"Path\": \"codechunks.db\"\n  }\n}\n```\n\n**Pinned to Ollama:**\n\n```\n{\n  \"Embedding\": {\n    \"Provider\": \"ollama\"\n  },\n  \"Ollama\": {\n    \"BaseUrl\": \"http://localhost:11434\",\n    \"EmbeddingModel\": \"nomic-embed-text\"\n  },\n  \"Database\": {\n    \"Path\": \"codechunks.db\"\n  }\n}\n```\n\n**Pinned to LM Studio:**\n\n```\n{\n  \"Embedding\": {\n    \"Provider\": \"lmstudio\"\n  },\n  \"LMStudio\": {\n    \"BaseUrl\": \"http://localhost:1234\",\n    \"EmbeddingModel\": \"text-embedding-nomic-embed-text-v1.5\"\n  },\n  \"Database\": {\n    \"Path\": \"codechunks.db\"\n  }\n}\n```\n\n| Section | Key | Default | Description |\n|---|---|---|---|\n| `Embedding` | `Provider` | `auto` | Provider: `auto`, `ollama`, or `lmstudio` |\n| `Ollama` | `BaseUrl` | `http://localhost:11434` | Ollama API endpoint |\n| `Ollama` | `EmbeddingModel` | `nomic-embed-text` | Model name in Ollama |\n| `LMStudio` | `BaseUrl` | `http://localhost:1234` | LM Studio API endpoint |\n| `LMStudio` | `EmbeddingModel` | `text-embedding-nomic-embed-text-v1.5` | Model identifier for LM Studio |\n| `Database` | `Path` | `codechunks.db` | SQLite database file path |\n| `Chunking` | `MaxChunkSize` | `1000` | Maximum tokens per chunk |\n| `Chunking` | `OverlapTokens` | `100` | Overlap between chunks |\n\nSearch queries are automatically expanded with synonyms and related terms. You can customize the expansions in `appsettings.json`:\n\n```\n{\n  \"QueryExpansion\": {\n    \"db\": \"database,sql,entity framework\",\n    \"http\": \"web,api,rest,endpoint\",\n    \"async\": \"asynchronous,task,background\"\n  }\n}\n```\n\nEach C# class is split into separate chunks:\n\n- **Methods**: With signature, body and XML documentation\n- **Properties**: Including getter/setter logic\n- **Constructors**: Separate initialization logic\n- **Fields**: With type and initialization\n\nTo improve search quality, the tool applies two techniques: content boosting and query expansion.\n\n**Content boosting:**\n\nEach code chunk is enhanced with additional metadata to improve search relevance:\n\n- **Class Name Boosting**: Class names are repeated to increase their weight\n- **Member Name Boosting**: Member names are emphasized for better matching\n- **Framework Metadata**: Framework-specific terms are added for ASP.NET components\n\n**Query expansion:**\n\nExpansions include:\n\n- `db` → `database`, `sql`, `entity framework`\n- `http` → `web`, `api`, `rest`, `endpoint`\n- `async` → `asynchronous`, `task`, `background`\n- `sensor` → `ultrasonic`, `distance`, `color`, `gyro`\n- `file` → `io`, `read`, `write`, `stream`\n\n**Ollama provider:**\n\n- Uses the Ollama HTTP API (`/api/embeddings`)\n- Compatible with all Ollama embedding models\n- Default: `nomic-embed-text` (768 dimensions)\n- Alternatives: `mxbai-embed-large`, `all-minilm`\n\n**LM Studio provider:**\n\n- Uses the OpenAI-compatible HTTP API (`/v1/embeddings`)\n- Works with any embedding model loaded in LM Studio\n- Default: `text-embedding-nomic-embed-text-v1.5`\n- Supports models from Hugging Face (e.g., in GGUF format)\n\nThe semantic score is the cosine similarity between the query embedding A and a chunk embedding B:\n\n```\nsimilarity = (A · B) / (||A|| × ||B||)\n```\n\nIf you see:\n\n```\nNo embedding provider available.\n```\n\nMake sure at least one of these is running:\n\n**LM Studio:**\n\n- Open LM Studio and go to the **Developer** tab\n- Start the local server (the toggle should be green)\n- Load an embedding model (e.g. `nomic-embed-text-v1.5`)\n- Verify: `curl http://localhost:1234/v1/models`\n\n**Ollama:**\n\n```\n# Pull an embedding model\nollama pull nomic-embed-text\n\n# Ensure Ollama is running\nollama serve\n\n# Verify\ncurl http://localhost:11434/api/tags\n```\n\nThe app is set to `auto` by default, so it will pick whichever is available.\n\nIf you see:\n\n```\nLM Studio erreichbar, aber kein Modell geladen.\n```\n\nThis message is German for \"LM Studio reachable, but no model loaded.\" Go to the **Developer** tab in LM Studio, load an embedding model, and make sure the server is started.\n\n**Ollama not reachable:**\n\n```\n# Check if Ollama is running\ncurl http://localhost:11434/api/tags\n\n# Start Ollama\nollama serve\n```\n\n**LM Studio not reachable:**\n\n- Open LM Studio and go to the **Developer** tab\n- Ensure the server is started (the toggle should be green)\n- Verify that the port in `appsettings.json` matches the displayed port\n- Test with: `curl http://localhost:1234/v1/models`\n\n**No or poor search results:**\n\n- Make sure indexing completed successfully\n- Check that `codechunks.db` is not empty (file size > 0 bytes)\n- Use more specific search terms\n- Verify that your embedding provider is running and the model is loaded\n\n**Performance:**\n\n- Embedding generation is CPU-intensive; expect slower performance on a Raspberry Pi or other low-power devices\n- The tool processes chunks sequentially (batch size: 1)\n- Consider using a machine with GPU support for faster embedding generation\n\nWe welcome contributions!\n
Please see [CONTRIBUTING.md](https://github.com/TheEifelYeti/SemanticSourceCode/blob/main/CONTRIBUTING.md) for details.\n\n- Report bugs via [GitHub Issues](https://github.com/TheEifelYeti/SemanticSourceCode/issues)\n- Request features via [GitHub Discussions](https://github.com/TheEifelYeti/SemanticSourceCode/discussions)\n- Submit pull requests following our PR template\n\nLicense: MIT", "url": "https://wpnews.pro/news/semanticsourcecode-local-semantic-code-search-with-ollama-and-sqlite", "canonical_source": "https://github.com/TheEifelYeti/SemanticSourceCode", "published_at": "2026-06-14 08:05:18+00:00", "updated_at": "2026-06-14 08:31:27.541957+00:00", "lang": "en", "topics": ["developer-tools", "machine-learning", "natural-language-processing", "ai-tools"], "entities": ["SemanticSourceCode", "Ollama", "LM Studio", "SQLite", "ASP.NET"], "alternates": {"html": "https://wpnews.pro/news/semanticsourcecode-local-semantic-code-search-with-ollama-and-sqlite", "markdown": "https://wpnews.pro/news/semanticsourcecode-local-semantic-code-search-with-ollama-and-sqlite.md", "text": "https://wpnews.pro/news/semanticsourcecode-local-semantic-code-search-with-ollama-and-sqlite.txt", "jsonld": "https://wpnews.pro/news/semanticsourcecode-local-semantic-code-search-with-ollama-and-sqlite.jsonld"}}