SemanticSourceCode – Local semantic code search with Ollama and SQLite

wpnews.pro

A C# tool for semantic code search with local embeddings. Search your codebase by meaning, not just keywords.

- 🔍 Semantic Chunking: Analyzes C# classes, methods, properties, constructors and fields separately
- 🧠 Local Embeddings: Uses Ollama or LM Studio locally; no cloud dependency, no data leakage
- 💾 SQLite Vector Database: Simple embedded database with cosine similarity search
- 🔎 Semantic Search: Find code based on meaning, not just keywords
- 👀 Watch Mode: Live incremental re-indexing on file changes (500 ms debounce, Ctrl+C to stop)
- 🔌 MCP Server: Expose the search as a Model Context Protocol tool
- 📜 Scriptable Search: Non-interactive one-shot mode with --query for pipes, scripts and agentic use
- ⚡ Multiple Providers: Switch between Ollama and LM Studio via configuration
- 🚀 Enhanced Search Quality: Content boosting and query expansion for better results
- 🏷️ Framework Detection: Automatic detection of ASP.NET Controllers, Services and Middleware
- 📊 Call Graph Analysis: Track method calls and dependencies between code chunks

The search engine combines semantic similarity with keyword matching:

- Semantic Score: Cosine similarity of embeddings (weight: 0.7)
- Keyword Score: Matches in class names, member names, and content (weight: 0.3)
- Combined: hybrid_score = 0.7 * semantic + 0.3 * keyword

This ensures that exact keyword matches (e.g., class DatabaseService) are not overshadowed by semantically similar but structurally irrelevant results.
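A minimal Python sketch of the weighted combination, assuming embeddings are plain lists of floats. The keyword score here is a hypothetical stand-in (the share of query terms found in the class name, member name, or content); the tool's actual keyword scoring may be computed differently.

```python
import math
import re


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_score(query: str, class_name: str, member_name: str, content: str) -> float:
    """Hypothetical keyword score: fraction of query terms found anywhere in the chunk."""
    terms = re.findall(r"\w+", query.lower())
    if not terms:
        return 0.0
    haystack = f"{class_name} {member_name} {content}".lower()
    hits = sum(1 for term in terms if term in haystack)
    return hits / len(terms)


def hybrid_score(semantic: float, keyword: float,
                 semantic_weight: float = 0.7, keyword_weight: float = 0.3) -> float:
    """hybrid_score = 0.7 * semantic + 0.3 * keyword (weights as in the Hybrid config section)."""
    return semantic_weight * semantic + keyword_weight * keyword
```

With this weighting, a chunk with a perfect keyword hit but only moderate semantic similarity (0.5) still reaches 0.65, which is how exact names stay near the top.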

Narrow down search results with structural filters:

./SemanticSourceCode --mode search --namespace Api.Controllers --http-method GET

./SemanticSourceCode --mode search --class DatabaseService

./SemanticSourceCode --mode search --file-pattern "*/Controllers/*"

Available filters:

Filter	CLI Flag	Description
Namespace	`--namespace`	Match namespace name (exact or partial)
Class	`--class`	Match class name
HTTP Method	`--http-method`	Match HTTP method (GET, POST, etc.)
File Pattern	`--file-pattern`	Match file path (glob pattern)
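A sketch of how the four filters could be applied to chunk metadata, assuming each chunk carries `namespace`, `class_name`, `http_method` and `file_path` fields (hypothetical names). "Exact or partial" namespace matching is modeled as a case-insensitive substring test and the glob via Python's `fnmatch`; the tool's own matching rules may differ.

```python
from dataclasses import dataclass
from fnmatch import fnmatch
from typing import Optional


@dataclass
class Chunk:
    # Hypothetical metadata fields for an indexed code chunk
    namespace: str
    class_name: str
    http_method: Optional[str]  # only set for controller actions
    file_path: str


@dataclass
class SearchFilter:
    namespace: Optional[str] = None
    class_name: Optional[str] = None
    http_method: Optional[str] = None
    file_pattern: Optional[str] = None

    def matches(self, chunk: Chunk) -> bool:
        # Namespace: exact or partial (substring) match, case-insensitive
        if self.namespace and self.namespace.lower() not in chunk.namespace.lower():
            return False
        # Class: assumed to be an exact, case-insensitive name match
        if self.class_name and self.class_name.lower() != chunk.class_name.lower():
            return False
        # HTTP method: chunks without a verb never match a verb filter
        if self.http_method and (chunk.http_method or "").upper() != self.http_method.upper():
            return False
        # File pattern: shell-style glob; fnmatch's * also matches "/"
        if self.file_pattern and not fnmatch(chunk.file_path, self.file_pattern):
            return False
        return True
```

Unset filters are skipped, so an empty `SearchFilter()` lets every chunk through and the flags combine with AND semantics.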

When no strong matches are found, the engine suggests alternative queries based on Levenshtein distance to known class and member names:

> DataBase
Do you mean: DatabaseService?

Suggestions are computed from the indexed codebase and require no external dependencies.
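A minimal sketch of Levenshtein-based suggestions. The case-insensitive comparison, the extra comparison against a same-length prefix (so that "DataBase" can match "DatabaseService"), and the cutoff of a third of the query length are assumptions for illustration, not documented rules of the tool.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def suggest(query: str, known_names: list[str], max_results: int = 3) -> list[str]:
    """Return indexed class/member names close to the query, closest first."""
    q = query.lower()
    cutoff = max(1, len(q) // 3)  # assumed tolerance
    scored = []
    for name in known_names:
        n = name.lower()
        # Compare with the whole name and with its prefix of the query's length
        dist = min(levenshtein(q, n), levenshtein(q, n[:len(q)]))
        if dist <= cutoff:
            scored.append((dist, name))
    scored.sort()
    return [name for _, name in scored[:max_results]]
```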

The similarity threshold adjusts automatically based on:

- Score Distribution: Percentile-based analysis of result scores
- Gap Detection: Elbow method to find natural cutoffs
- Query Specificity: Shorter (generic) queries get lower thresholds; longer (specific) queries get higher thresholds

Configure in appsettings.json:

{
  "Search": {
    "AdaptiveThreshold": {
      "Enabled": true,
      "FloorThreshold": 0.30,
      "CeilingThreshold": 0.85,
      "Percentile": 70
    }
  }
}
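A sketch of how the three signals might be combined, with `floor`, `ceiling` and `pct` mirroring FloorThreshold, CeilingThreshold and Percentile. The minimum gap size (0.10), the ±0.05 specificity adjustment, the word-count boundaries, and the rule that a clear gap overrides the percentile are all assumptions, not the tool's documented behavior.

```python
def percentile(values: list[float], pct: float) -> float:
    """Percentile (0-100) with linear interpolation between neighbouring ranks."""
    ordered = sorted(values)
    if len(ordered) == 1:
        return ordered[0]
    rank = (pct / 100) * (len(ordered) - 1)
    low = int(rank)
    high = min(low + 1, len(ordered) - 1)
    return ordered[low] + (ordered[high] - ordered[low]) * (rank - low)


def adaptive_threshold(scores: list[float], query: str,
                       floor: float = 0.30, ceiling: float = 0.85,
                       pct: float = 70, min_gap: float = 0.10) -> float:
    """Pick a similarity cutoff from the score distribution, clamped to [floor, ceiling]."""
    if not scores:
        return floor
    # 1. Score distribution: start from the configured percentile
    threshold = percentile(scores, pct)
    # 2. Gap detection: a pronounced drop between neighbours marks a natural cutoff
    ordered = sorted(scores, reverse=True)
    gaps = [(ordered[i] - ordered[i + 1], i) for i in range(len(ordered) - 1)]
    if gaps:
        size, idx = max(gaps)
        if size >= min_gap:
            threshold = ordered[idx]  # keep everything above the drop
    # 3. Query specificity: short (generic) queries loosen, long (specific) ones tighten
    terms = len(query.split())
    if terms <= 2:
        threshold -= 0.05
    elif terms >= 5:
        threshold += 0.05
    return max(floor, min(ceiling, threshold))
```

For scores 0.82, 0.80, 0.78, 0.45, 0.40, 0.38 the largest drop sits between 0.78 and 0.45, so the three-word query "database connection handling" keeps only the top cluster.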

Results are re-ranked using structural boosts:

Signal	Boost	Description
ClassName Match	×1.3	Query matches class name
MemberName Match	×1.0	Query matches member name
Controller	×1.1	ASP.NET Controller detected
Service	×1.1	Service class detected
Middleware	×1.1	Middleware class detected
Documentation	×1.05	Has XML documentation
Small File	×0.9	Penalty for very small files (often helpers)
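A sketch of multiplicative re-ranking with the multipliers from the table. The result fields, the name-matching rule (whitespace-stripped query contained in the name or vice versa), and the 20-line cutoff for a "very small" file are hypothetical; stacking boosts by multiplication is also an assumption.

```python
from dataclasses import dataclass, replace

# Multipliers from the re-ranking table
BOOSTS = {
    "class_name_match": 1.3,
    "member_name_match": 1.0,
    "controller": 1.1,
    "service": 1.1,
    "middleware": 1.1,
    "documentation": 1.05,
    "small_file": 0.9,
}
SMALL_FILE_LINES = 20  # hypothetical cutoff for "very small" files


@dataclass
class Result:
    # Hypothetical search result shape
    class_name: str
    member_name: str
    kind: str  # "controller", "service", "middleware" or "other"
    has_xml_doc: bool
    file_lines: int
    score: float


def _matches(query: str, name: str) -> bool:
    q = "".join(query.lower().split())
    n = name.lower()
    return bool(q) and bool(n) and (q in n or n in q)


def rerank(results: list[Result], query: str) -> list[Result]:
    """Apply structural boosts to each score and sort best first."""
    reranked = []
    for r in results:
        factor = 1.0
        if _matches(query, r.class_name):
            factor *= BOOSTS["class_name_match"]
        if _matches(query, r.member_name):
            factor *= BOOSTS["member_name_match"]
        if r.kind in ("controller", "service", "middleware"):
            factor *= BOOSTS[r.kind]
        if r.has_xml_doc:
            factor *= BOOSTS["documentation"]
        if r.file_lines < SMALL_FILE_LINES:
            factor *= BOOSTS["small_file"]
        reranked.append(replace(r, score=r.score * factor))
    return sorted(reranked, key=lambda r: r.score, reverse=True)
```

Here a documented `DatabaseService` chunk at 0.60 overtakes a small helper at 0.65 for the query "database service".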

All search features can be configured in appsettings.json:

{
  "Search": {
    "MinimumSimilarity": 0.35,
    "TopK": 20,
    "DisplayCount": 5,
    "WeakMatchThreshold": 0.30,
    "Hybrid": {
      "SemanticWeight": 0.7,
      "KeywordWeight": 0.3
    },
    "AdaptiveThreshold": {
      "Enabled": true,
      "FloorThreshold": 0.30,
      "CeilingThreshold": 0.85,
      "Percentile": 70
    },
    "ReRanking": {
      "ClassNameBoost": 1.3,
      "MemberNameBoost": 1.0,
      "ControllerBoost": 1.1,
      "DocumentationBoost": 1.05
    }
  }
}
┌─────────────────┐      ┌──────────────────┐
│  C# Files       │ ───> │   CodeAnalyzer   │ (Roslyn)
└─────────────────┘      └────────┬─────────┘
                                 │ CodeChunks
                                 v
                        ┌──────────────────┐
                        │ EmbeddingProvider│ (Ollama/LM Studio)
                        └────────┬─────────┘
                                 │ float[]
                                 v
                        ┌──────────────────┐
                        │ SqliteVssDatabase│ (vec0)
                        └────────┬─────────┘
                                 │
                                 v
                        ┌──────────────────┐
                        │ SearchEngine     │ (Cosine Sim)
                        └──────────────────┘

Components	Responsibility	File
CodeAnalyzer	Roslyn-based code decomposition	Services/CodeAnalyzer.cs
IEmbeddingService	Provider abstraction	Services/IEmbeddingService.cs
EmbeddingServiceFactory	Auto-detect provider	Services/EmbeddingServiceFactory.cs
IVectorDatabase	Vector storage with cosine similarity	Services/IVectorDatabase.cs
SqliteVssDatabase	SQLite + vec0 implementation	Services/SqliteVssDatabase.cs
HybridSearchService	Combines semantic + keyword search	Search/HybridSearchService.cs
ResultRanker	Re-ranking with structural signals	Search/ResultRanker.cs
QuerySuggester	Levenshtein-based suggestions	Search/QuerySuggester.cs
AdaptiveThreshold	Dynamic similarity threshold	Search/AdaptiveThreshold.cs
SearchFilter	Context filters (namespace, class, etc.)	Search/SearchFilter.cs
QueryExpander	Synonym expansion	Search/QueryExpander.cs
CodeChunk	Data model	Models/CodeChunk.cs
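The vec0 virtual table used by SqliteVssDatabase comes from a loadable SQLite extension. As a dependency-free illustration of the same storage idea, this sketch keeps each embedding as a little-endian float32 BLOB in an ordinary table and scores candidates with cosine similarity in Python (a brute-force scan). The table and column names are hypothetical.

```python
import math
import sqlite3
import struct


def to_blob(vec: list[float]) -> bytes:
    return struct.pack(f"<{len(vec)}f", *vec)


def from_blob(blob: bytes) -> list[float]:
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class VectorStore:
    def __init__(self, path: str = ":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS chunks ("
            " id TEXT PRIMARY KEY, content TEXT NOT NULL, embedding BLOB NOT NULL)"
        )

    def upsert(self, chunk_id: str, content: str, embedding: list[float]) -> None:
        # INSERT OR REPLACE makes re-indexing a changed chunk idempotent
        self.conn.execute(
            "INSERT OR REPLACE INTO chunks (id, content, embedding) VALUES (?, ?, ?)",
            (chunk_id, content, to_blob(embedding)),
        )
        self.conn.commit()

    def search(self, query_vec: list[float], top_k: int = 20) -> list[tuple[str, float]]:
        # Brute force: score every stored vector, then keep the best top_k
        rows = self.conn.execute("SELECT id, embedding FROM chunks").fetchall()
        scored = [(cid, cosine(query_vec, from_blob(blob))) for cid, blob in rows]
        scored.sort(key=lambda item: item[1], reverse=True)
        return scored[:top_k]
```

A full scan is adequate for small codebases; pushing the distance computation into SQLite, as a vector extension does, avoids loading every vector into the application.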

Prerequisites:
- .NET 10.0 SDK
- Either Ollama or LM Studio (installed locally)

curl -sSL https://dot.net/v1/dotnet-install.sh | bash /dev/stdin --channel 10.0 --install-dir ~/.dotnet

export PATH="$HOME/.dotnet:$PATH"

dotnet --version  # Should print 10.0.x

For other installation methods (Windows, package managers), see the official .NET 10 documentation.

curl -fsSL https://ollama.com/install.sh | sh

ollama pull nomic-embed-text

Default Ollama endpoint: http://localhost:11434

- Download and install LM Studio for your platform
- Open LM Studio and go to the Developer tab
- Start the local server (default port: 1234)
- Load an embedding model, e.g. nomic-ai/nomic-embed-text-v1.5 or sentence-transformers/all-MiniLM-L6-v2

Default LM Studio endpoint: http://localhost:1234

By default, the app uses auto-detect — you don't need to configure anything.

Just set:

{
  "Embedding": {
    "Provider": "auto"
  }
}

The app will automatically:

- Check LM Studio first (faster, local UI) on port 1234
- Fall back to Ollama on port 11434
- Pick whichever is available with a loaded embedding model

- Zero-config out of the box: Install either LM Studio or Ollama and the app just works
- Respects explicit choice: Set "ollama" or "lmstudio" to pin a provider (the fallback still works if that one is down)
- Transparent logging: The console tells you exactly which provider was chosen and why

Config	First Try	Fallback
`auto`	LM Studio	Ollama
`lmstudio`	LM Studio	Ollama
`ollama`	Ollama	LM Studio

If neither provider is reachable, you'll get a clear error with installation instructions for both.
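A sketch of the fallback order from the table, with the availability check passed in as a function so the selection logic can be tested without a running server. The probe URLs are the ones used in the troubleshooting steps; unlike the real app, this probe does not verify that an embedding model is actually loaded.

```python
import urllib.request
from typing import Callable

FALLBACK_ORDER = {
    "auto": ["lmstudio", "ollama"],
    "lmstudio": ["lmstudio", "ollama"],
    "ollama": ["ollama", "lmstudio"],
}

PROBE_URLS = {
    "lmstudio": "http://localhost:1234/v1/models",
    "ollama": "http://localhost:11434/api/tags",
}


def http_probe(provider: str, timeout: float = 1.0) -> bool:
    """True if the provider's endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(PROBE_URLS[provider], timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError, HTTPError and timeouts all derive from OSError
        return False


def select_provider(configured: str, is_available: Callable[[str], bool] = http_probe) -> str:
    order = FALLBACK_ORDER.get(configured.lower())
    if order is None:
        raise ValueError(f"Unknown provider: {configured!r}")
    for provider in order:
        if is_available(provider):
            return provider
    raise RuntimeError("No embedding provider available. Start LM Studio or Ollama.")
```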

dotnet restore
dotnet build
dotnet test        # All 109 tests should pass
dotnet publish -c Release
./SemanticSourceCode --mode index --path ./src

./SemanticSourceCode --mode index --path /home/user/projects/MyApp
./SemanticSourceCode --mode watch --path ./src

Watch mode runs an initial full index, then keeps the process running and re-indexes the affected file automatically whenever a *.cs file is created, changed, deleted, or renamed. The index stays fresh within ~500 ms of an edit, so searches in another shell always see the latest code.

- Debounce: Multiple rapid saves to the same file are coalesced into a single re-index (default: 500 ms).
- Excluded directories: bin/, obj/, .git/, .vs/, .idea/, node_modules/, dist/, build/ are ignored automatically.
- Stop: Press Ctrl+C to stop watching. The watcher exits cleanly, with no leftover background tasks.
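A sketch of per-file debouncing plus the directory exclusions, using an injected clock (timestamps in seconds) so it runs without real file-system events. In the actual tool the events would come from a file-system watcher; the `Debouncer` class and its methods are hypothetical.

```python
from pathlib import PurePath

EXCLUDED_DIRS = {"bin", "obj", ".git", ".vs", ".idea", "node_modules", "dist", "build"}


def is_relevant(path: str) -> bool:
    """Only *.cs files outside the excluded directories trigger re-indexing."""
    p = PurePath(path)
    return p.suffix == ".cs" and not any(part in EXCLUDED_DIRS for part in p.parts[:-1])


class Debouncer:
    def __init__(self, delay: float = 0.5):
        self.delay = delay
        self._last_event: dict[str, float] = {}

    def record(self, path: str, now: float) -> None:
        if is_relevant(path):
            self._last_event[path] = now  # a later save overwrites the earlier timestamp

    def due(self, now: float) -> list[str]:
        """Files that have been quiet for at least `delay` seconds; each is returned once."""
        ready = [p for p, t in self._last_event.items() if now - t >= self.delay]
        for p in ready:
            del self._last_event[p]
        return sorted(ready)
```

Two saves at 0.0 s and 0.2 s produce a single re-index once 0.5 s have passed since the second save.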

Example workflow:

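./SemanticSourceCode --mode watch --path ./src   # terminal 1: start the watcher

vim ./src/Services/MyService.cs   # terminal 2: edit a file → re-indexes automatically

./SemanticSourceCode --mode search   # terminal 2: searches see the updated index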

Interactive mode:

./SemanticSourceCode --mode search

Example queries:

"How do I find all files in a directory?"
"Database connection handling"
"Async HTTP client"
"User authentication"

Non-interactive (one-shot) mode:

./SemanticSourceCode --mode search --query "arithmetic calculation"

./SemanticSourceCode --mode search --query "arithmetic calculation" --format json

./SemanticSourceCode --mode search --query "Add" --quiet

./SemanticSourceCode --mode search -q "Add" -f json -l 2

./SemanticSourceCode --mode search -q "Query" --namespace MyApp.Data

The one-shot mode is well suited to scripts and agentic use:

Flag	Description
`--query, -q`	The search query (triggers non-interactive mode)
`--format, -f`	`text` (default), `json`, or `quiet`
`--limit, -l`	Max results to display
`--quiet`	Shorthand for `--format quiet`
`--namespace`	Filter to chunks in this namespace
`--class`	Filter to chunks in this class
`--http-method`	Filter to controller methods with this verb
`--file-pattern`	Filter to files matching this glob

Exit codes (non-interactive only):

- 0: at least one result found
- 1: no results, validation error, or DB not initialized
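A sketch of calling the one-shot mode from a script and branching on the exit code, using only the flags listed above. The output is returned as raw text because the JSON schema is not documented here; `build_search_args` and `run_search` are hypothetical helper names.

```python
import subprocess
from typing import Optional


def build_search_args(query: str, fmt: str = "json", limit: Optional[int] = None,
                      namespace: Optional[str] = None,
                      exe: str = "./SemanticSourceCode") -> list[str]:
    args = [exe, "--mode", "search", "--query", query, "--format", fmt]
    if limit is not None:
        args += ["--limit", str(limit)]
    if namespace:
        args += ["--namespace", namespace]
    return args


def run_search(query: str, **kwargs) -> Optional[str]:
    """Return stdout on exit code 0; None on exit code 1.

    Exit code 1 also covers validation errors and an uninitialized database,
    so callers that need to tell these apart should inspect stderr.
    """
    proc = subprocess.run(build_search_args(query, **kwargs), capture_output=True, text=True)
    if proc.returncode == 0:
        return proc.stdout
    if proc.returncode == 1:
        return None
    raise RuntimeError(f"unexpected exit code {proc.returncode}: {proc.stderr}")
```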

./SemanticSourceCode --mode mcp

The server speaks JSON-RPC 2.0 over stdin/stdout (MCP standard). It exposes two tools that AI agents can call directly:

Tool	Description
`search_code`	Semantic search with optional `namespace`, `class`, `filePattern`, `limit` filters
`get_chunk_by_id`	Fetch a single indexed chunk by its semantic ID

Status messages go to stderr so the JSON-RPC channel on stdout stays clean for client parsing.

Example (project-local .mcp.json):

{
  "mcpServers": {
    "semantic-source-code": {
      "command": "SemanticSourceCode",
      "args": ["--mode", "mcp"]
    }
  }
}

After restarting, the agent can call search_code and get_chunk_by_id directly in its tool-using workflow.
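A sketch of the messages an MCP client would write to the server's stdin. On the MCP stdio transport each JSON-RPC message is one line of JSON; a client first sends `initialize`, then the `notifications/initialized` notification, and can then call tools via `tools/call`. The `query` argument name is an assumption (only the filter arguments are listed above), and the protocol version string depends on what the server supports.

```python
import json
from typing import Optional


def jsonrpc(method: str, params: dict, msg_id: Optional[int] = None) -> str:
    msg = {"jsonrpc": "2.0", "method": method, "params": params}
    if msg_id is not None:
        msg["id"] = msg_id  # notifications carry no id
    return json.dumps(msg)  # compact, single-line JSON as the stdio transport requires


def mcp_session_messages(query: str, limit: int = 5) -> list[str]:
    return [
        jsonrpc("initialize", {
            "protocolVersion": "2024-11-05",
            "capabilities": {},
            "clientInfo": {"name": "example-client", "version": "0.1"},
        }, msg_id=1),
        jsonrpc("notifications/initialized", {}),
        jsonrpc("tools/call", {
            "name": "search_code",
            # "query" is an assumed argument name; limit is a documented filter
            "arguments": {"query": query, "limit": limit},
        }, msg_id=2),
    ]
```

Each string would be written to `SemanticSourceCode --mode mcp` followed by a newline, with responses read line by line from its stdout.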

Edit appsettings.json to switch providers. Use "auto" (default) for zero-config behavior, or explicitly pin a provider.

Auto-detect (default):

{
  "Embedding": {
    "Provider": "auto"
  },
  "Ollama": {
    "BaseUrl": "http://localhost:11434",
    "EmbeddingModel": "nomic-embed-text"
  },
  "LMStudio": {
    "BaseUrl": "http://localhost:1234",
    "EmbeddingModel": "text-embedding-nomic-embed-text-v1.5"
  },
  "Database": {
    "Path": "codechunks.db"
  }
}

Pinned to Ollama:

{
  "Embedding": {
    "Provider": "ollama"
  },
  "Ollama": {
    "BaseUrl": "http://localhost:11434",
    "EmbeddingModel": "nomic-embed-text"
  },
  "Database": {
    "Path": "codechunks.db"
  }
}

Pinned to LM Studio:

{
  "Embedding": {
    "Provider": "lmstudio"
  },
  "LMStudio": {
    "BaseUrl": "http://localhost:1234",
    "EmbeddingModel": "text-embedding-nomic-embed-text-v1.5"
  },
  "Database": {
    "Path": "codechunks.db"
  }
}

Section	Key	Default	Description
`Embedding`	`Provider`	`auto`	Provider: `auto`, `ollama`, or `lmstudio`
`Ollama`	`BaseUrl`	`http://localhost:11434`	Ollama API endpoint
`Ollama`	`EmbeddingModel`	`nomic-embed-text`	Model name in Ollama
`LMStudio`	`BaseUrl`	`http://localhost:1234`	LM Studio API endpoint
`LMStudio`	`EmbeddingModel`	`text-embedding-nomic-embed-text-v1.5`	Model identifier for LM Studio
`Database`	`Path`	`codechunks.db`	SQLite database file path
`Chunking`	`MaxChunkSize`	`1000`	Maximum tokens per chunk
`Chunking`	`OverlapTokens`	`100`	Overlap between chunks
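One plausible reading of MaxChunkSize and OverlapTokens is a sliding window over an over-long piece of code, where consecutive windows share `overlap` tokens so context is not lost at the boundary. This is a sketch under that assumption; the token list stands in for whatever tokenizer the tool uses.

```python
def split_with_overlap(tokens: list[str], max_size: int = 1000,
                       overlap: int = 100) -> list[list[str]]:
    """Split tokens into windows of at most max_size, each sharing `overlap` tokens with the previous one."""
    if overlap >= max_size:
        raise ValueError("overlap must be smaller than max_size")
    if not tokens:
        return []
    step = max_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_size])
        if start + max_size >= len(tokens):
            break  # the last window already reaches the end
    return chunks
```

With the defaults, text of up to 1000 tokens stays a single chunk, and each further window advances by 900 tokens.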

Search queries are automatically expanded with synonyms and related terms. You can customize this in appsettings.json:

{
  "QueryExpansion": {
    "db": "database,sql,entity framework",
    "http": "web,api,rest,endpoint",
    "async": "asynchronous,task,background"
  }
}
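A sketch of applying such a QueryExpansion map: each query term that has an entry contributes its comma-separated synonyms, which are appended to the original query. Appending (rather than, say, issuing separate searches) and the hard-coded dictionary mirroring the JSON above are assumptions for illustration.

```python
import re

# Mirrors the QueryExpansion section above
EXPANSIONS = {
    "db": "database,sql,entity framework",
    "http": "web,api,rest,endpoint",
    "async": "asynchronous,task,background",
}


def expand_query(query: str, expansions: dict[str, str] = EXPANSIONS) -> str:
    terms = re.findall(r"\w+", query.lower())
    extra: list[str] = []
    for term in terms:
        for synonym in expansions.get(term, "").split(","):
            synonym = synonym.strip()
            # Skip empty entries and anything already present
            if synonym and synonym not in terms and synonym not in extra:
                extra.append(synonym)
    return " ".join([query] + extra)
```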

Each C# class is split into separate chunks:

- Methods: With signature, body and XML documentation
- Properties: Including getter/setter logic
- Constructors: Separate initialization logic
- Fields: With type and initialization

To improve search quality, the tool implements several techniques:

Each code chunk is enhanced with additional metadata to improve search relevance:

- Class Name Boosting: Class names are repeated to increase their weight
- Member Name Boosting: Member names are emphasized for better matching
- Framework Metadata: Framework-specific terms are added for ASP.NET components
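A sketch of building the text that gets embedded for a chunk. The repeat counts, the `Class:`/`Member:`/`Framework:` prefixes, and the framework term lists are hypothetical; the tool's actual enrichment format is not documented here.

```python
from typing import Optional

# Hypothetical framework vocabulary appended for detected ASP.NET components
FRAMEWORK_TERMS = {
    "controller": "ASP.NET controller web api endpoint",
    "service": "ASP.NET service dependency injection",
    "middleware": "ASP.NET middleware request pipeline",
}
CLASS_REPEAT = 2   # assumed
MEMBER_REPEAT = 2  # assumed


def enrich_chunk_text(class_name: str, member_name: str, code: str,
                      framework_kind: Optional[str] = None) -> str:
    """Prefix the code with repeated names and framework hints before embedding."""
    lines = [f"Class: {class_name}"] * CLASS_REPEAT
    lines += [f"Member: {member_name}"] * MEMBER_REPEAT
    if framework_kind in FRAMEWORK_TERMS:
        lines.append(f"Framework: {FRAMEWORK_TERMS[framework_kind]}")
    lines.append(code)
    return "\n".join(lines)
```

Only the embedding input is enriched this way; the code shown to the user can remain the original chunk text.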

Search queries are automatically expanded with synonyms and related terms:

- db → database, sql, entity framework
- http → web, api, rest, endpoint
- async → asynchronous, task, background
- sensor → ultrasonic, distance, color, gyro
- file → io, read, write, stream

Ollama provider:
- Uses the Ollama HTTP API (/api/embeddings)
- Compatible with all Ollama embedding models
- Default: nomic-embed-text (768 dimensions)
- Alternatives: mxbai-embed-large, all-minilm

LM Studio provider:
- Uses the OpenAI-compatible HTTP API (/v1/embeddings)
- Works with any model loaded in LM Studio
- Default: text-embedding-nomic-embed-text-v1.5
- Supports models from Hugging Face (GGUF and other formats)
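A sketch of the two HTTP calls using only the Python standard library. Ollama's /api/embeddings takes `model` and `prompt` and returns `embedding`; the OpenAI-compatible /v1/embeddings takes `model` and `input` and returns the vector under `data[0].embedding`. The helper function names are hypothetical; the defaults match the configuration table.

```python
import json
import urllib.request

HEADERS = {"Content-Type": "application/json"}


def ollama_embedding_request(text: str, base_url: str = "http://localhost:11434",
                             model: str = "nomic-embed-text") -> urllib.request.Request:
    body = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    return urllib.request.Request(f"{base_url}/api/embeddings", data=body,
                                  headers=HEADERS, method="POST")


def lmstudio_embedding_request(text: str, base_url: str = "http://localhost:1234",
                               model: str = "text-embedding-nomic-embed-text-v1.5"
                               ) -> urllib.request.Request:
    body = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(f"{base_url}/v1/embeddings", data=body,
                                  headers=HEADERS, method="POST")


def parse_ollama_embedding(raw: bytes) -> list[float]:
    return json.loads(raw)["embedding"]


def parse_openai_embedding(raw: bytes) -> list[float]:
    return json.loads(raw)["data"][0]["embedding"]


def send(request: urllib.request.Request, timeout: float = 30.0) -> bytes:
    """Perform the request against a running server and return the raw body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.read()
```

Against a running Ollama instance, `parse_ollama_embedding(send(ollama_embedding_request("async http client")))` would return the 768-dimensional vector for nomic-embed-text.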

Cosine similarity implementation:

similarity = (A · B) / (||A|| × ||B||)
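A quick worked example with two small, made-up three-dimensional vectors (real nomic-embed-text embeddings have 768 dimensions):

$$
\mathbf{A}=(1,2,2),\quad \mathbf{B}=(2,1,2):\qquad
\frac{\mathbf{A}\cdot\mathbf{B}}{\lVert\mathbf{A}\rVert\,\lVert\mathbf{B}\rVert}
=\frac{1\cdot2+2\cdot1+2\cdot2}{\sqrt{1+4+4}\,\sqrt{4+1+4}}
=\frac{8}{3\cdot3}\approx0.889
$$

A value close to 1 means the two embeddings point in nearly the same direction; a value of 0 means they are orthogonal.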

If you see:

No embedding provider available.

Make sure at least one of these is running:

LM Studio:

- Open LM Studio and go to the Developer tab
- Start the local server (toggle should be green)
- Load an embedding model (e.g. nomic-embed-text-v1.5)
- Verify: curl http://localhost:1234/v1/models

Ollama:

ollama pull nomic-embed-text

ollama serve

curl http://localhost:11434/api/tags

The app is set to auto by default, so it will pick whichever is available.

If you see:

LM Studio erreichbar, aber kein Modell geladen. (English: "LM Studio reachable, but no model loaded.")

Go to the Developer tab in LM Studio, load an embedding model, and make sure the server is started.

If Ollama is not reachable, check that it responds and start it if needed:

curl http://localhost:11434/api/tags

ollama serve

If LM Studio is not reachable:
- Open LM Studio and go to the Developer tab
- Ensure the server is started (toggle should be green)
- Verify that the port in appsettings.json matches the displayed port
- Test with: curl http://localhost:1234/v1/models

If searches return no or poor results:
- Make sure indexing completed successfully
- Check that the codechunks.db file size is greater than 0 bytes
- Use more specific search terms

If indexing is slow:
- Verify your embedding provider is running and the model is loaded
- Embedding generation is CPU-intensive; expect slower performance on a Raspberry Pi or other low-power devices
- The tool processes chunks sequentially (batch size: 1)
- Consider using a machine with GPU support for faster embedding generation

We welcome contributions! Please see CONTRIBUTING.md for details.

- Report bugs via GitHub Issues
- Request features via GitHub Discussions
- Submit pull requests following our PR template

MIT

source & further reading

github.com — original article
