Source: github.com · Topic: developer-tools

SemanticSourceCode – Local semantic code search with Ollama and SQLite

A new open-source C# tool, SemanticSourceCode, enables local semantic code search using Ollama or LM Studio for embeddings and SQLite for vector storage. The tool analyzes C# code structure, supports hybrid search combining semantic and keyword matching, and offers features like watch mode, MCP server integration, and adaptive similarity thresholds.

11 min read · Published Jun 14, 2026

A C# tool for semantic code search with local embeddings. Search your codebase by meaning, not just keywords.

  • 🔍 Semantic Chunking: Analyzes C# classes, methods, properties, constructors and fields separately
  • 🧠 Local Embeddings: Uses Ollama or LM Studio locally; no cloud dependency, no data leakage
  • 💾 SQLite Vector Database: Simple embedded database with cosine similarity search
  • 🔎 Semantic Search: Find code based on meaning, not just keywords
  • 👀 Watch Mode: Live incremental re-indexing on file changes (500 ms debounce, Ctrl+C to stop)
  • 🔌 MCP Server: Expose the search as a Model Context Protocol tool
  • 📜 Scriptable Search: Non-interactive one-shot mode with --query for pipes, scripts and agentic use
  • ⚡ Multiple Providers: Switch between Ollama and LM Studio via configuration
  • 🚀 Enhanced Search Quality: Content boosting and query expansion for better results
  • 🏷️ Framework Detection: Automatic detection of ASP.NET Controllers, Services and Middleware
  • 📊 Call Graph Analysis: Track method calls and dependencies between code chunks

The search engine combines semantic similarity with keyword matching:

  • Semantic Score: Cosine similarity of embeddings (weight: 0.7)
  • Keyword Score: Matches in class names, member names, and content (weight: 0.3)
  • Combined: hybrid_score = 0.7 * semantic + 0.3 * keyword

This ensures that exact keyword matches (e.g., class DatabaseService) are not overshadowed by semantically similar but structurally irrelevant results.
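A minimal Python sketch of the hybrid score. It assumes a simple keyword score (the fraction of query terms found anywhere in the class name, member name, or content); the tool's actual keyword scoring is not documented here, and the function names are hypothetical.

```python
import re

SEMANTIC_WEIGHT = 0.7  # default "Hybrid.SemanticWeight"
KEYWORD_WEIGHT = 0.3   # default "Hybrid.KeywordWeight"


def keyword_score(query: str, class_name: str, member_name: str, content: str) -> float:
    """Fraction of query terms that occur in the class name, member name or content.

    This scoring rule is an illustrative assumption, not the tool's implementation.
    """
    terms = re.findall(r"\w+", query.lower())
    if not terms:
        return 0.0
    haystack = " ".join([class_name, member_name, content]).lower()
    hits = sum(1 for term in terms if term in haystack)
    return hits / len(terms)


def hybrid_score(semantic: float, keyword: float,
                 w_sem: float = SEMANTIC_WEIGHT, w_kw: float = KEYWORD_WEIGHT) -> float:
    """hybrid_score = 0.7 * semantic + 0.3 * keyword with the default weights."""
    return w_sem * semantic + w_kw * keyword
```

With these weights, a chunk with moderate semantic similarity (0.5) but a full keyword hit scores 0.65, ahead of a chunk at 0.8 semantic similarity with no keyword hit (0.56). That is the effect the paragraph above describes.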

Narrow down search results with structural filters:

./SemanticSourceCode --mode search --namespace Api.Controllers --http-method GET

./SemanticSourceCode --mode search --class DatabaseService

./SemanticSourceCode --mode search --file-pattern "*/Controllers/*"

Available filters:

Filter         CLI Flag         Description
Namespace      --namespace      Match namespace name (exact or partial)
Class          --class          Match class name
HTTP Method    --http-method    Match HTTP method (GET, POST, etc.)
File Pattern   --file-pattern   Match file path (glob pattern)
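The filters can be pictured as a predicate applied to each chunk. The Python sketch below is hypothetical: the `Chunk` fields are a trimmed-down stand-in for the tool's CodeChunk model, and treating the class filter as a case-insensitive exact match is an assumption (the table only says "Match class name").

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional


@dataclass
class Chunk:
    # Hypothetical stand-in for the tool's CodeChunk model.
    namespace: str
    class_name: str
    file_path: str
    http_method: Optional[str] = None  # set only for controller actions


@dataclass
class SearchFilter:
    namespace: Optional[str] = None
    class_name: Optional[str] = None
    http_method: Optional[str] = None
    file_pattern: Optional[str] = None

    def matches(self, chunk: Chunk) -> bool:
        # Namespace: exact or partial (substring) match, case-insensitive.
        if self.namespace and self.namespace.lower() not in chunk.namespace.lower():
            return False
        # Class: assumed case-insensitive exact match.
        if self.class_name and self.class_name.lower() != chunk.class_name.lower():
            return False
        if self.http_method and (chunk.http_method or "").upper() != self.http_method.upper():
            return False
        # fnmatch's "*" also matches "/", so "*/Controllers/*" matches nested paths.
        if self.file_pattern and not fnmatchcase(chunk.file_path, self.file_pattern):
            return False
        return True
```

Because unset filters are skipped, flags combine as a logical AND, like `--namespace Api.Controllers --http-method GET` in the example above.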

When no strong matches are found, the engine suggests alternative queries based on Levenshtein distance to known class and member names:

> DataBase
Do you mean: DatabaseService?

Suggestions are computed from the indexed codebase and require no external dependencies.
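A Python sketch of Levenshtein-based suggestions. The distance function is the standard dynamic-programming algorithm. The ranking rule is an assumption: comparing the query against a same-length prefix of each name is added here so that a partial name like "DataBase" can surface "DatabaseService", since plain edit distance between those two strings is large. `max_distance` is a hypothetical parameter.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via two-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or keep)
        prev = curr
    return prev[-1]


def suggest(query: str, known_names: list[str], max_distance: int = 2) -> list[str]:
    """Known class/member names within max_distance edits of the query, closest first."""
    q = query.lower()
    scored = []
    for name in known_names:
        n = name.lower()
        # Assumption: also compare against a same-length prefix of the name.
        d = min(levenshtein(q, n), levenshtein(q, n[:len(q)]))
        if d <= max_distance:
            scored.append((d, name))
    return [name for _, name in sorted(scored)]
```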

The similarity threshold adjusts automatically based on:

  • Score Distribution: Percentile-based analysis of result scores
  • Gap Detection: Elbow method to find natural cutoffs
  • Query Specificity: Shorter (generic) queries get lower thresholds; longer (specific) queries get higher thresholds

Configure it in appsettings.json:

{
  "Search": {
    "AdaptiveThreshold": {
      "Enabled": true,
      "FloorThreshold": 0.30,
      "CeilingThreshold": 0.85,
      "Percentile": 70
    }
  }
}
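A Python sketch of how the three configured values could combine, assuming the threshold is the 70th-percentile score, lowered to the largest-gap cutoff so the top cluster is never split, and finally clamped to [FloorThreshold, CeilingThreshold]. The combination rule is an assumption, and the query-specificity adjustment is omitted.

```python
import math


def percentile(values: list[float], p: float) -> float:
    """Linear-interpolation percentile (NumPy's default method)."""
    s = sorted(values)
    if not s:
        raise ValueError("need at least one score")
    k = (len(s) - 1) * p / 100
    lo, hi = math.floor(k), math.ceil(k)
    return s[lo] + (s[hi] - s[lo]) * (k - lo)


def gap_cutoff(scores: list[float]):
    """Elbow heuristic: the lowest score above the largest drop between neighbours."""
    s = sorted(scores, reverse=True)
    if len(s) < 2:
        return None
    _, i = max((s[i] - s[i + 1], i) for i in range(len(s) - 1))
    return s[i]


def adaptive_threshold(scores: list[float], floor: float = 0.30,
                       ceiling: float = 0.85, pct: float = 70) -> float:
    t = percentile(scores, pct)
    cut = gap_cutoff(scores)
    if cut is not None:
        # Assumption: never cut inside the top cluster found by gap detection.
        t = min(t, cut)
    return max(floor, min(ceiling, t))
```

For scores [0.9, 0.88, 0.86, 0.5, 0.45, 0.4], the 70th percentile is 0.87 and the largest gap sits below 0.86. The threshold is 0.86 before clamping and 0.85 with the default ceiling.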

Results are re-ranked using structural boosts:

Signal             Boost   Description
ClassName Match    ×1.3    Query matches class name
MemberName Match   ×1.0    Query matches member name
Controller         ×1.1    ASP.NET Controller detected
Service            ×1.1    Service class detected
Middleware         ×1.1    Middleware class detected
Documentation      ×1.05   Has XML documentation
Small File         ×0.9    Penalty for very small files (often helpers)
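A Python sketch of re-ranking with the multipliers from the table. The signal identifiers and the `Hit` type are hypothetical. Stacking several boosts by multiplying them together is an assumption; the table only gives each factor individually.

```python
from dataclasses import dataclass, field, replace

# Multipliers from the table above; the dictionary keys are hypothetical names.
BOOSTS = {
    "class_name_match": 1.3,
    "member_name_match": 1.0,
    "controller": 1.1,
    "service": 1.1,
    "middleware": 1.1,
    "documentation": 1.05,
    "small_file": 0.9,  # penalty
}


@dataclass(frozen=True)
class Hit:
    name: str
    score: float
    signals: frozenset = field(default_factory=frozenset)


def rerank(hits: list[Hit], boosts: dict = BOOSTS) -> list[Hit]:
    """Multiply each hit's score by every boost whose signal it carries, then sort."""
    boosted = []
    for hit in hits:
        factor = 1.0
        for signal in hit.signals:
            factor *= boosts.get(signal, 1.0)
        boosted.append(replace(hit, score=hit.score * factor))
    return sorted(boosted, key=lambda h: h.score, reverse=True)
```

Here a class-name match on a service (0.60 × 1.3 × 1.1 ≈ 0.86) overtakes a small helper file that started higher (0.70 × 0.9 = 0.63).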

All search features can be configured in appsettings.json:

{
  "Search": {
    "MinimumSimilarity": 0.35,
    "TopK": 20,
    "DisplayCount": 5,
    "WeakMatchThreshold": 0.30,
    "Hybrid": {
      "SemanticWeight": 0.7,
      "KeywordWeight": 0.3
    },
    "AdaptiveThreshold": {
      "Enabled": true,
      "FloorThreshold": 0.30,
      "CeilingThreshold": 0.85,
      "Percentile": 70
    },
    "ReRanking": {
      "ClassNameBoost": 1.3,
      "MemberNameBoost": 1.0,
      "ControllerBoost": 1.1,
      "DocumentationBoost": 1.05
    }
  }
}
┌─────────────────┐      ┌──────────────────┐
│  C# Files       │ ───> │   CodeAnalyzer   │ (Roslyn)
└─────────────────┘      └────────┬─────────┘
                                 │ CodeChunks
                                 v
                        ┌──────────────────┐
                        │ EmbeddingProvider│ (Ollama/LM Studio)
                        └────────┬─────────┘
                                 │ float[]
                                 v
                        ┌──────────────────┐
                        │ SqliteVssDatabase│ (vec0)
                        └────────┬─────────┘
                                 │
                                 v
                        ┌──────────────────┐
                        │ SearchEngine     │ (Cosine Sim)
                        └──────────────────┘
Component                 Responsibility                              File
CodeAnalyzer              Roslyn-based code decomposition             Services/CodeAnalyzer.cs
IEmbeddingService         Provider abstraction                        Services/IEmbeddingService.cs
EmbeddingServiceFactory   Auto-detect provider                        Services/EmbeddingServiceFactory.cs
IVectorDatabase           Vector storage with cosine similarity       Services/IVectorDatabase.cs
SqliteVssDatabase         SQLite + vec0 implementation                Services/SqliteVssDatabase.cs
HybridSearchService       Combines semantic + keyword search          Search/HybridSearchService.cs
ResultRanker              Re-ranking with structural signals          Search/ResultRanker.cs
QuerySuggester            Levenshtein-based suggestions               Search/QuerySuggester.cs
AdaptiveThreshold         Dynamic similarity threshold                Search/AdaptiveThreshold.cs
SearchFilter              Context filters (namespace, class, etc.)    Search/SearchFilter.cs
QueryExpander             Synonym expansion                           Search/QueryExpander.cs
CodeChunk                 Data model                                  Models/CodeChunk.cs
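The storage and search end of the pipeline can be approximated with Python's built-in sqlite3 module. This sketch does not use the vec0 virtual table. It stores embeddings as float32 BLOBs in a plain table and computes cosine similarity in Python. The table and column names are hypothetical.

```python
import math
import sqlite3
import struct


def to_blob(vec: list[float]) -> bytes:
    """Pack a vector as little-endian float32."""
    return struct.pack(f"<{len(vec)}f", *vec)


def from_blob(blob: bytes) -> list[float]:
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


def cosine(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class VectorStore:
    """Brute-force stand-in for the vec0-backed store; the schema is hypothetical."""

    def __init__(self, path: str = ":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS chunks ("
            " id TEXT PRIMARY KEY, content TEXT NOT NULL, embedding BLOB NOT NULL)"
        )

    def upsert(self, chunk_id: str, content: str, embedding: list[float]) -> None:
        # Re-indexing a changed chunk simply replaces its row.
        self.db.execute(
            "INSERT OR REPLACE INTO chunks (id, content, embedding) VALUES (?, ?, ?)",
            (chunk_id, content, to_blob(embedding)),
        )
        self.db.commit()

    def search(self, query_vec: list[float], top_k: int = 20) -> list[tuple[str, float]]:
        rows = self.db.execute("SELECT id, embedding FROM chunks").fetchall()
        scored = [(cid, cosine(query_vec, from_blob(blob))) for cid, blob in rows]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]
```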

Requirements: .NET 10.0 SDK and either Ollama or LM Studio (installed locally).

curl -sSL https://dot.net/v1/dotnet-install.sh | bash /dev/stdin --channel 10.0 --install-dir ~/.dotnet

export PATH="$HOME/.dotnet:$PATH"

dotnet --version  # Should print 10.0.x

For other installation methods (Windows, package managers), see the official .NET 10 documentation.

curl -fsSL https://ollama.com/install.sh | sh

ollama pull nomic-embed-text

Default Ollama endpoint: http://localhost:11434

  • Download and install LM Studio for your platform
  • Open LM Studio and go to the Developer tab
  • Start the local server (default port: 1234)
  • Load an embedding model, e.g. nomic-ai/nomic-embed-text-v1.5 or sentence-transformers/all-MiniLM-L6-v2

Default LM Studio endpoint: http://localhost:1234

By default, the app uses auto-detect — you don't need to configure anything.

Just set:

{
  "Embedding": {
    "Provider": "auto"
  }
}

The app will automatically:

  • Check LM Studio first (faster, local UI): port 1234
  • Fall back to Ollama: port 11434
  • Pick whichever is available with a loaded embedding model

  • Zero-config out of the box: Install either LM Studio or Ollama and the app just works
  • Respects explicit choice: Set "ollama" or "lmstudio" to pin a provider (fallback still works if that one is down)
  • Transparent logging: The console tells you exactly which provider was chosen and why

Config     First Try   Fallback
auto       LM Studio   Ollama
lmstudio   LM Studio   Ollama
ollama     Ollama      LM Studio

If neither provider is reachable, you'll get a clear error with installation instructions for both.
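A Python sketch of the probe order from the table. The probe only checks that each server answers on the endpoints used in the troubleshooting section (/v1/models and /api/tags). The real tool additionally requires a loaded embedding model, which this sketch does not check. The availability check is injectable so the selection logic can be exercised offline.

```python
import urllib.error
import urllib.request
from typing import Callable

# Probe order per configured value, from the table above.
PROVIDER_ORDER = {
    "auto": ["lmstudio", "ollama"],
    "lmstudio": ["lmstudio", "ollama"],
    "ollama": ["ollama", "lmstudio"],
}

# Listing endpoints used in the troubleshooting section.
PROBE_URLS = {
    "lmstudio": "http://localhost:1234/v1/models",
    "ollama": "http://localhost:11434/api/tags",
}


def http_probe(provider: str, timeout: float = 1.0) -> bool:
    """True if the provider's listing endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(PROBE_URLS[provider], timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def choose_provider(configured: str, is_available: Callable[[str], bool] = http_probe) -> str:
    order = PROVIDER_ORDER.get(configured.lower())
    if order is None:
        raise ValueError(f"unknown provider: {configured!r}")
    for provider in order:
        if is_available(provider):
            return provider
    raise RuntimeError("No embedding provider available.")
```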

dotnet restore
dotnet build
dotnet test        # All 109 tests should pass
dotnet publish -c Release
./SemanticSourceCode --mode index --path ./src

./SemanticSourceCode --mode index --path /home/user/projects/MyApp
./SemanticSourceCode --mode watch --path ./src

Watch mode runs an initial full index, then keeps the process running and automatically re-indexes the affected file whenever a *.cs file is created, changed, deleted, or renamed. The index stays fresh within ~500 ms of an edit, so searches in another shell always see the latest code.

  • Debounce: Multiple rapid saves to the same file are coalesced into a single re-index (default: 500 ms).
  • Excluded directories: bin/, obj/, .git/, .vs/, .idea/, node_modules/, dist/ and build/ are ignored automatically.
  • Stop: Press Ctrl+C to stop watching. The watcher exits cleanly, with no leftover background tasks.
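The debounce and exclusion rules can be simulated in Python over a list of timestamped change events instead of real file-system notifications. Function names are hypothetical. Timestamps are integer milliseconds to keep the arithmetic exact.

```python
from pathlib import PurePosixPath

EXCLUDED_DIRS = {"bin", "obj", ".git", ".vs", ".idea", "node_modules", "dist", "build"}


def is_relevant(path: str) -> bool:
    """*.cs files that are not inside an excluded directory."""
    p = PurePosixPath(path)
    return p.suffix == ".cs" and not any(part in EXCLUDED_DIRS for part in p.parts[:-1])


def coalesce(events, window_ms: int = 500):
    """Simulate per-file debouncing.

    events: (timestamp_ms, path) pairs in time order.
    Returns (fire_time_ms, path) pairs: one re-index per burst of edits to a file,
    fired window_ms after the burst's last event.
    """
    last_seen = {}  # path -> timestamp of the latest event in its open burst
    fired = []
    for t, path in events:
        if not is_relevant(path):
            continue
        prev = last_seen.get(path)
        if prev is not None and t - prev >= window_ms:
            fired.append((prev + window_ms, path))  # the earlier burst had gone quiet
        last_seen[path] = t
    fired.extend((t + window_ms, path) for path, t in last_seen.items())
    return sorted(fired)
```

Three saves to A.cs within 400 ms produce a single re-index at 900 ms. A later save at 2000 ms starts a new burst, and the write under obj/ is ignored.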

Example workflow:

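# Terminal 1: index once, then keep watching
./SemanticSourceCode --mode watch --path ./src

# Terminal 2: edit a file; it is re-indexed automatically
vim ./src/Services/MyService.cs

# Terminal 2: search; results reflect the edit
./SemanticSourceCode --mode search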

Interactive mode:

./SemanticSourceCode --mode search

Example queries:

  • "How do I find all files in a directory?"
  • "Database connection handling"
  • "Async HTTP client"
  • "User authentication"

Non-interactive (one-shot) mode:

./SemanticSourceCode --mode search --query "arithmetic calculation"

./SemanticSourceCode --mode search --query "arithmetic calculation" --format json

./SemanticSourceCode --mode search --query "Add" --quiet

./SemanticSourceCode --mode search -q "Add" -f json -l 2

./SemanticSourceCode --mode search -q "Query" --namespace MyApp.Data

The one-shot mode is perfect for scripts and agentic use:

Flag             Description
--query, -q      The search query (triggers non-interactive mode)
--format, -f     text (default), json, or quiet
--limit, -l      Max results to display
--quiet          Shorthand for --format quiet
--namespace      Filter to chunks in this namespace
--class          Filter to chunks in this class
--http-method    Filter to controller methods with this verb
--file-pattern   Filter to files matching this glob

Exit codes (non-interactive only):

  • 0: at least one result found
  • 1: no results, validation error, or DB not initialized
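A Python wrapper sketch for scripted use, built only from the flags and exit codes listed above. The JSON output schema is not documented here, so `run_search` returns the parsed JSON unchanged. The binary path and function names are assumptions.

```python
import json
import subprocess
from typing import Optional


def build_search_args(query: str, limit: Optional[int] = None, namespace: Optional[str] = None,
                      binary: str = "./SemanticSourceCode") -> list[str]:
    """argv for a one-shot JSON search, using only flags from the table above."""
    args = [binary, "--mode", "search", "--query", query, "--format", "json"]
    if limit is not None:
        args += ["--limit", str(limit)]
    if namespace:
        args += ["--namespace", namespace]
    return args


def run_search(query: str, **options):
    """Parsed JSON output on exit code 0; None on exit code 1 (no results or an error)."""
    proc = subprocess.run(build_search_args(query, **options), capture_output=True, text=True)
    if proc.returncode == 1:
        return None  # proc.stderr may explain a validation or database error
    proc.check_returncode()  # any other non-zero code is unexpected
    return json.loads(proc.stdout)
```

Exit code 1 covers both "no results" and real errors, so a caller that needs to tell them apart has to inspect `proc.stderr`.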

./SemanticSourceCode --mode mcp

The server speaks JSON-RPC 2.0 over stdin/stdout (MCP standard). It exposes two tools that AI agents can call directly:

Tool              Description
search_code       Semantic search with optional namespace, class, filePattern, and limit filters
get_chunk_by_id   Fetch a single indexed chunk by its semantic ID

Status messages go to stderr so the JSON-RPC channel on stdout stays clean for client parsing.
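A Python sketch of the newline-delimited JSON-RPC messages a client would write to the server's stdin for one search_code call. It follows the MCP lifecycle: `initialize`, then the `notifications/initialized` notification, then `tools/call`. The name of the query argument ("query") is an assumption, because only the optional filters are listed above. A real client also reads the initialize response before sending the remaining messages.

```python
import json
from typing import Optional


def request(msg_id: int, method: str, params: dict) -> dict:
    return {"jsonrpc": "2.0", "id": msg_id, "method": method, "params": params}


def notification(method: str) -> dict:
    # JSON-RPC notifications carry no "id" and receive no response.
    return {"jsonrpc": "2.0", "method": method}


def frame(message: dict) -> str:
    # The MCP stdio transport exchanges one JSON object per line.
    return json.dumps(message) + "\n"


def search_code_session(query: str, namespace: Optional[str] = None, limit: int = 5) -> list[str]:
    """Lines a client writes to the server's stdin to run one search_code call."""
    arguments = {"query": query, "limit": limit}  # "query" key name is an assumption
    if namespace:
        arguments["namespace"] = namespace
    return [
        frame(request(1, "initialize", {
            "protocolVersion": "2024-11-05",
            "capabilities": {},
            "clientInfo": {"name": "example-client", "version": "0.1.0"},
        })),
        frame(notification("notifications/initialized")),
        frame(request(2, "tools/call", {"name": "search_code", "arguments": arguments})),
    ]
```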

Example (project-local .mcp.json):

{
  "mcpServers": {
    "semantic-source-code": {
      "command": "SemanticSourceCode",
      "args": ["--mode", "mcp"]
    }
  }
}

After a restart, the agent can call search_code and get_chunk_by_id directly in its tool-using workflow.

Edit appsettings.json to switch providers. Use "auto" (default) for zero-config behavior, or explicitly pin a provider.

{
  "Embedding": {
    "Provider": "auto"
  },
  "Ollama": {
    "BaseUrl": "http://localhost:11434",
    "EmbeddingModel": "nomic-embed-text"
  },
  "LMStudio": {
    "BaseUrl": "http://localhost:1234",
    "EmbeddingModel": "text-embedding-nomic-embed-text-v1.5"
  },
  "Database": {
    "Path": "codechunks.db"
  }
}

Pinned to Ollama:

{
  "Embedding": {
    "Provider": "ollama"
  },
  "Ollama": {
    "BaseUrl": "http://localhost:11434",
    "EmbeddingModel": "nomic-embed-text"
  },
  "Database": {
    "Path": "codechunks.db"
  }
}

Pinned to LM Studio:

{
  "Embedding": {
    "Provider": "lmstudio"
  },
  "LMStudio": {
    "BaseUrl": "http://localhost:1234",
    "EmbeddingModel": "text-embedding-nomic-embed-text-v1.5"
  },
  "Database": {
    "Path": "codechunks.db"
  }
}
Section     Key              Default                                 Description
Embedding   Provider         auto                                    Provider: auto, ollama, or lmstudio
Ollama      BaseUrl          http://localhost:11434                  Ollama API endpoint
Ollama      EmbeddingModel   nomic-embed-text                        Model name in Ollama
LMStudio    BaseUrl          http://localhost:1234                   LM Studio API endpoint
LMStudio    EmbeddingModel   text-embedding-nomic-embed-text-v1.5    Model identifier for LM Studio
Database    Path             codechunks.db                           SQLite database file path
Chunking    MaxChunkSize     1000                                    Maximum tokens per chunk
Chunking    OverlapTokens    100                                     Overlap between chunks (tokens)

Search queries are automatically expanded with synonyms and related terms. You can customize this in appsettings.json:

{
  "QueryExpansion": {
    "db": "database,sql,entity framework",
    "http": "web,api,rest,endpoint",
    "async": "asynchronous,task,background"
  }
}
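A Python sketch that parses the comma-separated QueryExpansion entries and appends related terms to a query. Appending expansions to the query text is an assumption about how the expanded query is used, and the function names are hypothetical.

```python
import re


def load_expansions(config: dict) -> dict:
    """{"db": "database,sql,entity framework"} -> {"db": ["database", "sql", "entity framework"]}"""
    return {
        key.lower(): [term.strip() for term in value.split(",") if term.strip()]
        for key, value in config.items()
    }


def expand_query(query: str, expansions: dict) -> str:
    """Append related terms for every query word that has an expansion entry."""
    words = re.findall(r"\w+", query.lower())
    extra = []
    for word in words:
        for term in expansions.get(word, []):
            if term not in words and term not in extra:
                extra.append(term)
    return " ".join([query] + extra)
```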

Each C# class is split into separate chunks:

  • Methods: With signature, body and XML documentation
  • Properties: Including getter/setter logic
  • Constructors: Separate initialization logic
  • Fields: With type and initialization

To improve search quality, the tool implements several techniques:

Each code chunk is enhanced with additional metadata to improve search relevance:

  • Class Name Boosting: Class names are repeated to increase their weight
  • Member Name Boosting: Member names are emphasized for better matching
  • Framework Metadata: Framework-specific terms are added for ASP.NET components
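A Python sketch of content boosting: assembling the text that is sent to the embedding model for one chunk. The header layout, the repeat count, and the wording of framework tags are all hypothetical. Only the idea (repeat the class name, emphasize the member name, add framework terms) comes from the list above.

```python
from typing import Iterable


def build_embedding_text(class_name: str, member_kind: str, member_name: str, code: str,
                         framework_tags: Iterable[str] = (), class_repeat: int = 2) -> str:
    """Compose the text that gets embedded for one chunk (layout is hypothetical)."""
    lines = [f"Class: {class_name}"] * class_repeat              # class name boosting
    lines.append(f"{member_kind}: {member_name}")                # member name emphasis
    lines.extend(f"Framework: {tag}" for tag in framework_tags)  # e.g. ASP.NET metadata
    lines.append(code)
    return "\n".join(lines)
```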

Search queries are automatically expanded with synonyms and related terms:

Term     Expansions
db       database, sql, entity framework
http     web, api, rest, endpoint
async    asynchronous, task, background
sensor   ultrasonic, distance, color, gyro
file     io, read, write, stream

Ollama:

  • Uses the Ollama HTTP API (/api/embeddings); compatible with all Ollama embedding models
  • Default: nomic-embed-text (768 dimensions); alternatives: mxbai-embed-large, all-minilm

LM Studio:

  • Uses the OpenAI-compatible HTTP API (/v1/embeddings); works with any model loaded in LM Studio
  • Default: text-embedding-nomic-embed-text-v1.5
  • Supports models from Hugging Face, GGUF, etc.
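A Python sketch of one embedding call per provider, using only the standard library. The request bodies follow Ollama's /api/embeddings format (`model`, `prompt`, returning `embedding`) and the OpenAI-style /v1/embeddings format (`model`, `input`, returning `data[0].embedding`). The function names are hypothetical, and the tool's own HTTP code may differ.

```python
import json
import urllib.request


def _post_json(url: str, body: dict) -> urllib.request.Request:
    return urllib.request.Request(url, data=json.dumps(body).encode(),
                                  headers={"Content-Type": "application/json"}, method="POST")


def ollama_request(text: str, model: str = "nomic-embed-text",
                   base_url: str = "http://localhost:11434") -> urllib.request.Request:
    return _post_json(f"{base_url}/api/embeddings", {"model": model, "prompt": text})


def lmstudio_request(text: str, model: str = "text-embedding-nomic-embed-text-v1.5",
                     base_url: str = "http://localhost:1234") -> urllib.request.Request:
    return _post_json(f"{base_url}/v1/embeddings", {"model": model, "input": text})


def parse_ollama(payload: dict) -> list:
    return payload["embedding"]              # {"embedding": [...]}


def parse_openai(payload: dict) -> list:
    return payload["data"][0]["embedding"]   # {"data": [{"embedding": [...]}], ...}


def embed(req: urllib.request.Request, parse, timeout: float = 30.0) -> list:
    """Send the request and extract the vector (needs a running server)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse(json.load(resp))
```

Example: `embed(ollama_request("open a database connection"), parse_ollama)` should return a 768-element list with the default nomic-embed-text model.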

Cosine similarity implementation:

similarity = (A · B) / (||A|| × ||B||)
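A worked example with two small, made-up 3-dimensional vectors (real nomic-embed-text vectors have 768 dimensions):

```latex
\begin{aligned}
A &= (1, 2, 2), \quad B = (2, 0, 1) \\
A \cdot B &= 1 \cdot 2 + 2 \cdot 0 + 2 \cdot 1 = 4 \\
\lVert A \rVert &= \sqrt{1^2 + 2^2 + 2^2} = 3, \quad \lVert B \rVert = \sqrt{2^2 + 0^2 + 1^2} = \sqrt{5} \\
\text{similarity} &= \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert} = \frac{4}{3\sqrt{5}} \approx 0.596
\end{aligned}
```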

If you see:

No embedding provider available.

Make sure at least one of these is running:

LM Studio:

  • Open LM Studio and go to the Developer tab
  • Start the local server (toggle should be green)
  • Load an embedding model (e.g. nomic-embed-text-v1.5)
  • Verify: curl http://localhost:1234/v1/models

Ollama:

ollama pull nomic-embed-text

ollama serve

curl http://localhost:11434/api/tags

The app is set to auto by default, so it will pick whichever is available.

If you see:

LM Studio erreichbar, aber kein Modell geladen. ("LM Studio is reachable, but no model is loaded.")

Go to the Developer tab in LM Studio, load an embedding model, and make sure the server is started.

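If Ollama does not respond, check whether it is running and start it if needed:

curl http://localhost:11434/api/tags

ollama serve

If LM Studio does not respond:
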
  • Open LM Studio and go to the Developer tab
  • Ensure the server is started (toggle should be green)
  • Verify that the port in appsettings.json matches the displayed port
  • Test with: curl http://localhost:1234/v1/models

If searches return no results:

  • Make sure indexing completed successfully
  • Check that the codechunks.db file is larger than 0 bytes
  • Use more specific search terms
  • Verify that your embedding provider is running and the model is loaded

Performance:

  • Embedding generation is CPU-intensive; expect slower performance on a Raspberry Pi or other low-power devices
  • The tool processes chunks sequentially (batch size: 1)
  • Consider using a machine with GPU support for faster embedding generation

We welcome contributions! Please see CONTRIBUTING.md for details.

License: MIT
