SemanticSourceCode – Local semantic code search with Ollama and SQLite

A new open-source C# tool, SemanticSourceCode, enables local semantic code search using Ollama or LM Studio for embeddings and SQLite for vector storage. The tool analyzes C# code structure, supports hybrid search combining semantic and keyword matching, and offers features like watch mode, MCP server integration, and adaptive similarity thresholds.

A C tool for semantic code search with local embeddings. Search your codebase by meaning, not just keywords. - 🔍 Semantic Chunking — Analyzes C classes, methods, properties, constructors and fields separately - 🧠 Local Embeddings — Uses Ollama or LM Studio locally, no cloud dependency, no data leakage - 💾 SQLite Vector Database — Simple embedded database with cosine similarity search - 🔎 Semantic Search — Find code based on meaning, not just keywords - 👀 Watch Mode — Live incremental re-indexing on file changes 500 ms debounce, Ctrl+C to stop - 🔌 MCP Server — Expose the search as a Model Context Protocol tool. - 📜 Scriptable Search — Non-interactive one-shot mode with --query for pipes, scripts and agentic use - ⚡ Multiple Providers — Switch between Ollama and LM Studio via configuration - 🚀 Enhanced Search Quality — Content boosting and query expansion for better results - 🏷️ Framework Detection — Automatic detection of ASP.NET Controllers, Services and Middleware - 📊 Call Graph Analysis — Track method calls and dependencies between code chunks The search engine combines semantic similarity with keyword matching: Semantic Score — Cosine similarity of embeddings weight: 0.7 Keyword Score — Matches in class names, member names, and content weight: 0.3 Combined — hybrid score = 0.7 semantic + 0.3 keyword This ensures that exact keyword matches e.g., class DatabaseService are not overshadowed by semantically similar but structurally irrelevant results. Narrow down search results with structural filters: Only search in controllers ./SemanticSourceCode --mode search --namespace Api.Controllers --http-method GET Only search in specific class ./SemanticSourceCode --mode search --class DatabaseService File path pattern ./SemanticSourceCode --mode search --file-pattern " /Controllers/ " Available filters: | Filter | CLI Flag | Description | |---|---|---| | Namespace | --namespace | Match namespace name exact or partial | | Class | --class | Match class name | | HTTP Method | --http-method | Match HTTP method GET, POST, etc. | | File Pattern | --file-pattern | Match file path glob pattern | When no strong matches are found, the engine suggests alternative queries based on Levenshtein distance to known class and member names: DataBase Do you mean: DatabaseService? Suggestions are computed from the indexed codebase and require no external dependencies. The similarity threshold adjusts automatically based on: Score Distribution — Percentile-based analysis of result scores Gap Detection — Elbow method to find natural cutoffs Query Specificity — Shorter queries get lower thresholds generic , longer queries get higher thresholds specific Configure in appsettings.json : { "Search": { "AdaptiveThreshold": { "Enabled": true, "FloorThreshold": 0.30, "CeilingThreshold": 0.85, "Percentile": 70 } } } Results are re-ranked using structural boosts: | Signal | Boost | Description | |---|---|---| | ClassName Match | ×1.3 | Query matches class name | | MemberName Match | ×1.0 | Query matches member name | | Controller | ×1.1 | ASP.NET Controller detected | | Service | ×1.1 | Service class detected | | Middleware | ×1.1 | Middleware class detected | | Documentation | ×1.05 | Has XML documentation | | Small File | ×0.9 | Penalty for very small files often helpers | All search features can be configured in appsettings.json : { "Search": { "MinimumSimilarity": 0.35, "TopK": 20, "DisplayCount": 5, "WeakMatchThreshold": 0.30, "Hybrid": { "SemanticWeight": 0.7, "KeywordWeight": 0.3 }, "AdaptiveThreshold": { "Enabled": true, "FloorThreshold": 0.30, "CeilingThreshold": 0.85, "Percentile": 70 }, "ReRanking": { "ClassNameBoost": 1.3, "MemberNameBoost": 1.0, "ControllerBoost": 1.1, "DocumentationBoost": 1.05 } } } ┌─────────────────┐ ┌──────────────────┐ │ C Files │ ─── │ CodeAnalyzer │ Roslyn └─────────────────┘ └────────┬─────────┘ │ CodeChunks v ┌──────────────────┐ │ EmbeddingProvider│ Ollama/LM Studio └────────┬─────────┘ │ float v ┌──────────────────┐ │ SqliteVssDatabase│ vec0 └────────┬─────────┘ │ v ┌──────────────────┐ │ SearchEngine │ Cosine Sim └──────────────────┘ | Components | Responsibility | File | |---|---|---| | CodeAnalyzer | Roslyn-based code decomposition | Services/CodeAnalyzer.cs | | IEmbeddingService | Provider abstraction | Services/IEmbeddingService.cs | | EmbeddingServiceFactory | Auto-detect provider | Services/EmbeddingServiceFactory.cs | | IVectorDatabase | Vector storage with cosine similarity | Services/IVectorDatabase.cs | | SqliteVssDatabase | SQLite + vec0 implementation | Services/SqliteVssDatabase.cs | | HybridSearchService | Combines semantic + keyword search | Search/HybridSearchService.cs | | ResultRanker | Re-ranking with structural signals | Search/ResultRanker.cs | | QuerySuggester | Levenshtein-based suggestions | Search/QuerySuggester.cs | | AdaptiveThreshold | Dynamic similarity threshold | Search/AdaptiveThreshold.cs | | SearchFilter | Context filters namespace, class, etc. | Search/SearchFilter.cs | | QueryExpander | Synonym expansion | Search/QueryExpander.cs | | CodeChunk | Data model | Models/CodeChunk.cs | .NET 10.0 SDK https://dotnet.microsoft.com/download/dotnet/10.0 - Either Ollama https://ollama.com or LM Studio https://lmstudio.ai locally installed Using the dotnet-install script curl -sSL https://dot.net/v1/dotnet-install.sh | bash /dev/stdin --channel 10.0 --install-dir ~/.dotnet Add to PATH export PATH="$HOME/.dotnet:$PATH" Verify version dotnet --version Should print 10.0.x For other installation methods Windows, package managers , see the official .NET 10 documentation https://learn.microsoft.com/en-us/dotnet/core/install/ . Install Ollama Linux/macOS curl -fsSL https://ollama.com/install.sh | sh Pull an embedding model ollama pull nomic-embed-text Default Ollama endpoint: http://localhost:11434 - Download and install LM Studio https://lmstudio.ai for your platform - Open LM Studio and go to the Developer tab - Start the local server default port: 1234 - Load an embedding model, e.g.: nomic-ai/nomic-embed-text-v1.5 sentence-transformers/all-MiniLM-L6-v2 Default LM Studio endpoint: http://localhost:1234 By default, the app uses auto-detect — you don't need to configure anything. Just set: { "Embedding": { "Provider": "auto" } } The app will automatically: Check LM Studio first faster, local UI — port 1234 Fall back to Ollama — port 11434 Pick whichever is available with a loaded embedding model Zero-config out-of-the-box — Install either LM Studio or Ollama, the app just works Respects explicit choice — Set "ollama" or "lmstudio" to pin a provider fallback still works if that one is down Transparent logging — The console tells you exactly which provider was chosen and why | Config | First Try | Fallback | |---|---|---| auto | LM Studio | Ollama | lmstudio | LM Studio | Ollama | ollama | Ollama | LM Studio | If neither provider is reachable, you'll get a clear error with installation instructions for both. dotnet restore dotnet build dotnet test All 109 tests should pass dotnet publish -c Release Index C files in a directory ./SemanticSourceCode --mode index --path ./src Example with absolute path ./SemanticSourceCode --mode index --path /home/user/projects/MyApp Start watch mode on a directory ./SemanticSourceCode --mode watch --path ./src Watch mode runs an initial full index, then keeps the process running and re-indexes the affected file automatically whenever a .cs file is created, changed, deleted, or renamed. The index stays fresh within ~500 ms of an edit, so searches in another shell always see the latest code. Debounce — Multiple rapid saves to the same file are coalesced into a single re-index default: 500 ms . Excluded directories — bin/ , obj/ , .git/ , .vs/ , .idea/ , node modules/ , dist/ , build/ are ignored automatically. Stop — Press Ctrl+C to stop watching. The watcher exits cleanly, no leftover background tasks. Example workflow: Terminal 1: start watching ./SemanticSourceCode --mode watch --path ./src Terminal 2: edit a file vim ./src/Services/MyService.cs → re-indexes automatically Terminal 3: search while watching ./SemanticSourceCode --mode search Interactive mode: Start interactive search mode ./SemanticSourceCode --mode search Example queries: - "How do I find all files in a directory?" - "Database connection handling" - "Async HTTP client" - "User authentication" Non-interactive one-shot mode: Default text format — prints human-readable results, exits ./SemanticSourceCode --mode search --query "arithmetic calculation" JSON output — for piping into jq, scripts, or other tools ./SemanticSourceCode --mode search --query "arithmetic calculation" --format json Quiet output — only the top-1 result, one line ./SemanticSourceCode --mode search --query "Add" --quiet Short flags ./SemanticSourceCode --mode search -q "Add" -f json -l 2 With structural filter ./SemanticSourceCode --mode search -q "Query" --namespace MyApp.Data The one-shot mode is perfect for scripts and agentic use: | Flag | Description | |---|---| --query, -q | The search query triggers non-interactive mode | --format, -f | text default , json , or quiet | --limit, -l | Max results to display | --quiet | Shorthand for --format quiet | --namespace | Filter to chunks in this namespace | --class | Filter to chunks in this class | --http-method | Filter to controller methods with this verb | --file-pattern | Filter to files matching this glob | Exit codes non-interactive only : 0 — at least one result found 1 — no results, validation error, or DB not initialized Start the MCP server over stdio ./SemanticSourceCode --mode mcp The server speaks JSON-RPC 2.0 over stdin/stdout MCP standard . It exposes two tools that AI agents can call directly: | Tool | Description | |---|---| search code | Semantic search with optional namespace , class , filePattern , limit filters | get chunk by id | Fetch a single indexed chunk by its semantic ID | Status messages go to stderr so the JSON-RPC channel on stdout stays clean for client parsing. Example: project-local .mcp.json : { "mcpServers": { "semantic-source-code": { "command": "SemanticSourceCode", "args": "--mode", "mcp" } } } After restarting the agent can call search code and get chunk by id directly in its tool-using workflow. Edit appsettings.json to switch providers. Use "auto" default for zero-config behavior, or explicitly pin a provider. { "Embedding": { "Provider": "auto" }, "Ollama": { "BaseUrl": "http://localhost:11434", "EmbeddingModel": "nomic-embed-text" }, "LMStudio": { "BaseUrl": "http://localhost:1234", "EmbeddingModel": "text-embedding-nomic-embed-text-v1.5" }, "Database": { "Path": "codechunks.db" } } { "Embedding": { "Provider": "ollama" }, "Ollama": { "BaseUrl": "http://localhost:11434", "EmbeddingModel": "nomic-embed-text" }, "Database": { "Path": "codechunks.db" } } { "Embedding": { "Provider": "lmstudio" }, "LMStudio": { "BaseUrl": "http://localhost:1234", "EmbeddingModel": "text-embedding-nomic-embed-text-v1.5" }, "Database": { "Path": "codechunks.db" } } | Section | Key | Default | Description | |---|---|---|---| Embedding | Provider | auto | Provider: auto , ollama , or lmstudio | Ollama | BaseUrl | http://localhost:11434 | Ollama API endpoint | Ollama | EmbeddingModel | nomic-embed-text | Model name in Ollama | LMStudio | BaseUrl | http://localhost:1234 | LM Studio API endpoint | LMStudio | EmbeddingModel | text-embedding-nomic-embed-text-v1.5 | Model identifier for LM Studio | Database | Path | codechunks.db | SQLite database file path | Chunking | MaxChunkSize | 1000 | Maximum tokens per chunk | Chunking | OverlapTokens | 100 | Overlap between chunks | Search queries are automatically expanded with synonyms and related terms. You can customize this in appsettings.json : { "QueryExpansion": { "db": "database,sql,entity framework", "http": "web,api,rest,endpoint", "async": "asynchronous,task,background" } } Each C class is split into separate chunks: Methods — With signature, body and XML documentation Properties — Including getter/setter logic Constructors — Separate initialization logic Fields — With type and initialization To improve search quality, the tool implements several techniques: Each code chunk is enhanced with additional metadata to improve search relevance: Class Name Boosting — Class names are repeated to increase their weight Member Name Boosting — Member names are emphasized for better matching Framework Metadata — Framework-specific terms are added for ASP.NET components Search queries are automatically expanded with synonyms and related terms: db → database , sql , entity framework http → web , api , rest , endpoint async → asynchronous , task , background sensor → ultrasonic , distance , color , gyro file → io , read , write , stream - Uses the Ollama HTTP API /api/embeddings - Compatible with all Ollama embedding models - Default: nomic-embed-text 768 dimensions - Alternatives: mxbai-embed-large , all-minilm - Uses the OpenAI-compatible HTTP API /v1/embeddings - Works with any model loaded in LM Studio - Default: text-embedding-nomic-embed-text-v1.5 - Supports models from HuggingFace, GGUF, etc. Cosine similarity implementation: similarity = A · B / ||A|| × ||B|| If you see: No embedding provider available. Make sure at least one of these is running: LM Studio: - Open LM Studio and go to the Developer tab - Start the local server toggle should be green - Load an embedding model e.g. nomic-embed-text-v1.5 - Verify: curl http://localhost:1234/v1/models Ollama: Pull an embedding model ollama pull nomic-embed-text Ensure Ollama is running ollama serve Verify curl http://localhost:11434/api/tags The app is set to auto by default, so it will pick whichever is available. If you see: LM Studio erreichbar, aber kein Modell geladen. Go to the Developer tab in LM Studio, load an embedding model, and make sure the server is started. Check if Ollama is running curl http://localhost:11434/api/tags Start Ollama ollama serve - Open LM Studio and go to the Developer tab - Ensure the server is started toggle should be green - Verify the port in appsettings.json matches the displayed port - Test with: curl http://localhost:1234/v1/models - Make sure indexing completed successfully - Check codechunks.db file size should be 0 bytes - Use more specific search terms - Verify your embedding provider is running and the model is loaded - Embedding generation is CPU-intensive — expect slower performance on Raspberry Pi or low-power devices - The tool processes chunks sequentially batch size: 1 - Consider using a machine with GPU support for faster embedding generation We welcome contributions Please see CONTRIBUTING.md /TheEifelYeti/SemanticSourceCode/blob/main/CONTRIBUTING.md for details. - Report bugs via GitHub Issues https://github.com/YOUR USERNAME/SemanticSourceCode/issues - Request features via GitHub Discussions https://github.com/YOUR USERNAME/SemanticSourceCode/discussions - Submit pull requests following our PR template MIT