# SemanticSourceCode – Local semantic code search with Ollama and SQLite

> Source: <https://github.com/TheEifelYeti/SemanticSourceCode>
> Published: 2026-06-14 08:05:18+00:00

A C# tool for semantic code search with local embeddings. Search your codebase by meaning, not just keywords.

- 🔍 **Semantic Chunking**: Analyzes C# classes, methods, properties, constructors and fields separately
- 🧠 **Local Embeddings**: Uses Ollama or LM Studio locally, no cloud dependency, no data leakage
- 💾 **SQLite Vector Database**: Simple embedded database with cosine similarity search
- 🔎 **Semantic Search**: Find code based on meaning, not just keywords
- 👀 **Watch Mode**: Live incremental re-indexing on file changes (500 ms debounce, Ctrl+C to stop)
- 🔌 **MCP Server**: Expose the search as a Model Context Protocol tool
- 📜 **Scriptable Search**: Non-interactive one-shot mode with `--query` for pipes, scripts and agentic use
- ⚡ **Multiple Providers**: Switch between Ollama and LM Studio via configuration
- 🚀 **Enhanced Search Quality**: Content boosting and query expansion for better results
- 🏷️ **Framework Detection**: Automatic detection of ASP.NET Controllers, Services and Middleware
- 📊 **Call Graph Analysis**: Track method calls and dependencies between code chunks

The search engine combines semantic similarity with keyword matching:

- **Semantic Score**: Cosine similarity of embeddings (weight: 0.7)
- **Keyword Score**: Matches in class names, member names, and content (weight: 0.3)
- **Combined**: `hybrid_score = 0.7 * semantic + 0.3 * keyword`

This ensures that exact keyword matches (e.g., `class DatabaseService`) are not overshadowed by semantically similar but structurally irrelevant results.
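The weighting above can be sketched in a few lines of Python. This is an illustration only: the tool itself is written in C#, the function name `hybrid_score` is hypothetical, and the sketch assumes both input scores are already normalised to the range 0 to 1.

```python
def hybrid_score(semantic: float, keyword: float,
                 semantic_weight: float = 0.7,
                 keyword_weight: float = 0.3) -> float:
    """Weighted sum of the two scores, using the documented default weights."""
    return semantic_weight * semantic + keyword_weight * keyword


# A chunk with moderate semantic similarity but an exact keyword hit ...
exact_match = hybrid_score(semantic=0.55, keyword=1.0)
# ... is compared with a semantically closer chunk that has no keyword overlap.
fuzzy_match = hybrid_score(semantic=0.80, keyword=0.0)
print(exact_match > fuzzy_match)
```

With these example numbers the exact keyword hit ends up ranked first, which is the behaviour the keyword weight is meant to provide.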

Narrow down search results with structural filters:

```
# Only search in controllers
./SemanticSourceCode --mode search --namespace Api.Controllers --http-method GET

# Only search in specific class
./SemanticSourceCode --mode search --class DatabaseService

# File path pattern
./SemanticSourceCode --mode search --file-pattern "*/Controllers/*"
```

Available filters:

| Filter | CLI Flag | Description |
|---|---|---|
| Namespace | `--namespace` | Match namespace name (exact or partial) |
| Class | `--class` | Match class name |
| HTTP Method | `--http-method` | Match HTTP method (GET, POST, etc.) |
| File Pattern | `--file-pattern` | Match file path (glob pattern) |
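As a rough mental model of how these filters combine, here is a small Python sketch. The `Chunk` type and the `matches` function are hypothetical, and the matching rules (case-insensitive substring for namespaces, exact match for class names, `fnmatch`-style globs for paths) are assumptions rather than the tool's confirmed behaviour.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional


@dataclass
class Chunk:
    """Hypothetical subset of the metadata stored per indexed chunk."""
    namespace: str
    class_name: str
    file_path: str
    http_method: Optional[str] = None


def matches(chunk: Chunk, namespace: Optional[str] = None,
            class_name: Optional[str] = None,
            http_method: Optional[str] = None,
            file_pattern: Optional[str] = None) -> bool:
    """Return True only if the chunk satisfies every filter that was given."""
    if namespace is not None and namespace.lower() not in chunk.namespace.lower():
        return False
    if class_name is not None and class_name != chunk.class_name:
        return False
    if http_method is not None and (chunk.http_method or "").upper() != http_method.upper():
        return False
    if file_pattern is not None and not fnmatchcase(chunk.file_path, file_pattern):
        return False
    return True
```

Filters that are not supplied are simply skipped, so combining `--namespace` with `--http-method` narrows the result set to chunks that pass both checks.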

When no strong matches are found, the engine suggests alternative queries based on Levenshtein distance to known class and member names:

```
> DataBase
Do you mean: DatabaseService?
```

Suggestions are computed from the indexed codebase and require no external dependencies.
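For readers unfamiliar with Levenshtein distance, the following Python sketch shows the general idea. It is not the project's `QuerySuggester`: the function names are hypothetical, and it simply returns the closest known name, whereas a real implementation would probably also cap the allowed distance.

```python
from typing import List, Optional


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def suggest(query: str, known_names: List[str]) -> Optional[str]:
    """Return the known name with the smallest case-insensitive distance."""
    if not known_names:
        return None
    return min(known_names,
               key=lambda name: levenshtein(query.lower(), name.lower()))


print(suggest("DataBase", ["DatabaseService", "HttpClientFactory", "UserController"]))
```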

The similarity threshold adjusts automatically based on:

- **Score Distribution**: Percentile-based analysis of result scores
- **Gap Detection**: Elbow method to find natural cutoffs
- **Query Specificity**: Shorter queries get lower thresholds (generic), longer queries get higher thresholds (specific)

Configure in `appsettings.json`:

```
{
  "Search": {
    "AdaptiveThreshold": {
      "Enabled": true,
      "FloorThreshold": 0.30,
      "CeilingThreshold": 0.85,
      "Percentile": 70
    }
  }
}
```
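To make the role of `FloorThreshold`, `CeilingThreshold` and `Percentile` more concrete, here is a simplified Python sketch. It covers only the percentile signal and the clamping; gap detection and query specificity are left out, and the nearest-rank percentile used here is an assumption about how `Percentile` is interpreted.

```python
def adaptive_threshold(scores, floor=0.30, ceiling=0.85, percentile=70):
    """Percentile of the result scores, clamped to the range [floor, ceiling]."""
    if not scores:
        return floor
    ordered = sorted(scores)
    # Nearest-rank percentile using integer ceiling division to avoid float rounding.
    rank = max(1, (percentile * len(ordered) + 99) // 100)
    candidate = ordered[rank - 1]
    return min(ceiling, max(floor, candidate))
```

With uniformly weak scores the result is pinned to the floor, and with uniformly strong scores it is pinned to the ceiling, so the threshold can never drift outside the configured range.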

Results are re-ranked using structural boosts:

| Signal | Boost | Description |
|---|---|---|
| ClassName Match | ×1.3 | Query matches class name |
| MemberName Match | ×1.0 | Query matches member name |
| Controller | ×1.1 | ASP.NET Controller detected |
| Service | ×1.1 | Service class detected |
| Middleware | ×1.1 | Middleware class detected |
| Documentation | ×1.05 | Has XML documentation |
| Small File | ×0.9 | Penalty for very small files (often helpers) |
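One plausible reading of the table is that each matching signal multiplies the score by its factor. The Python sketch below assumes exactly that, including that several boosts stack multiplicatively; the signal keys and the `rerank_score` function are hypothetical names used for illustration.

```python
# Boost factors copied from the table above; the dictionary keys are made up.
BOOSTS = {
    "class_name_match": 1.3,
    "member_name_match": 1.0,
    "controller": 1.1,
    "service": 1.1,
    "middleware": 1.1,
    "documentation": 1.05,
    "small_file": 0.9,
}


def rerank_score(base_score: float, signals) -> float:
    """Multiply the base score by the factor of every signal that applies."""
    score = base_score
    for signal in signals:
        score *= BOOSTS[signal]
    return score


# A documented controller whose class name matches the query.
print(rerank_score(0.5, ["class_name_match", "controller", "documentation"]))
```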

All search features can be configured in `appsettings.json`:

```
{
  "Search": {
    "MinimumSimilarity": 0.35,
    "TopK": 20,
    "DisplayCount": 5,
    "WeakMatchThreshold": 0.30,
    "Hybrid": {
      "SemanticWeight": 0.7,
      "KeywordWeight": 0.3
    },
    "AdaptiveThreshold": {
      "Enabled": true,
      "FloorThreshold": 0.30,
      "CeilingThreshold": 0.85,
      "Percentile": 70
    },
    "ReRanking": {
      "ClassNameBoost": 1.3,
      "MemberNameBoost": 1.0,
      "ControllerBoost": 1.1,
      "DocumentationBoost": 1.05
    }
  }
}
```

```
┌─────────────────┐      ┌──────────────────┐
│  C# Files       │ ───> │   CodeAnalyzer   │ (Roslyn)
└─────────────────┘      └────────┬─────────┘
                                 │ CodeChunks
                                 v
                        ┌──────────────────┐
                        │ EmbeddingProvider│ (Ollama/LM Studio)
                        └────────┬─────────┘
                                 │ float[]
                                 v
                        ┌──────────────────┐
                        │ SqliteVssDatabase│ (vec0)
                        └────────┬─────────┘
                                 │
                                 v
                        ┌──────────────────┐
                        │ SearchEngine     │ (Cosine Sim)
                        └──────────────────┘
```

| Component | Responsibility | File |
|---|---|---|
| CodeAnalyzer | Roslyn-based code decomposition | Services/CodeAnalyzer.cs |
| IEmbeddingService | Provider abstraction | Services/IEmbeddingService.cs |
| EmbeddingServiceFactory | Auto-detect provider | Services/EmbeddingServiceFactory.cs |
| IVectorDatabase | Vector storage with cosine similarity | Services/IVectorDatabase.cs |
| SqliteVssDatabase | SQLite + vec0 implementation | Services/SqliteVssDatabase.cs |
| HybridSearchService | Combines semantic + keyword search | Search/HybridSearchService.cs |
| ResultRanker | Re-ranking with structural signals | Search/ResultRanker.cs |
| QuerySuggester | Levenshtein-based suggestions | Search/QuerySuggester.cs |
| AdaptiveThreshold | Dynamic similarity threshold | Search/AdaptiveThreshold.cs |
| SearchFilter | Context filters (namespace, class, etc.) | Search/SearchFilter.cs |
| QueryExpander | Synonym expansion | Search/QueryExpander.cs |
| CodeChunk | Data model | Models/CodeChunk.cs |

- [.NET 10.0 SDK](https://dotnet.microsoft.com/download/dotnet/10.0)
- Either [Ollama](https://ollama.com) or [LM Studio](https://lmstudio.ai) (locally installed)

```
# Using the dotnet-install script
curl -sSL https://dot.net/v1/dotnet-install.sh | bash /dev/stdin --channel 10.0 --install-dir ~/.dotnet

# Add to PATH
export PATH="$HOME/.dotnet:$PATH"

# Verify version
dotnet --version  # Should print 10.0.x
```

For other installation methods (Windows, package managers), see the [official .NET 10 documentation](https://learn.microsoft.com/en-us/dotnet/core/install/).

```
# Install Ollama (Linux/macOS)
curl -fsSL https://ollama.com/install.sh | sh

# Pull an embedding model
ollama pull nomic-embed-text
```

Default Ollama endpoint: `http://localhost:11434`

- Download and install [LM Studio](https://lmstudio.ai) for your platform
- Open LM Studio and go to the **Developer** tab
- Start the local server (default port: 1234)
- Load an embedding model, e.g.:
  - `nomic-ai/nomic-embed-text-v1.5`
  - `sentence-transformers/all-MiniLM-L6-v2`

Default LM Studio endpoint: `http://localhost:1234`

By default, the app uses **auto-detect**, so no configuration is required.

The equivalent explicit setting is:

```
{
  "Embedding": {
    "Provider": "auto"
  }
}
```

The app will automatically:

1. **Check LM Studio first** (faster, local UI): port 1234
2. **Fall back to Ollama**: port 11434
3. **Pick whichever is available** with a loaded embedding model

- **Zero-config out-of-the-box**: Install either LM Studio or Ollama and the app just works
- **Respects explicit choice**: Set `"ollama"` or `"lmstudio"` to pin a provider (fallback still works if that one is down)
- **Transparent logging**: The console tells you exactly which provider was chosen and why

| Config | First Try | Fallback |
|---|---|---|
| `auto` | LM Studio | Ollama |
| `lmstudio` | LM Studio | Ollama |
| `ollama` | Ollama | LM Studio |

If neither provider is reachable, you'll get a clear error with installation instructions for both.
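The selection order in the table can be expressed as a short Python sketch. This is not the project's `EmbeddingServiceFactory`: the function names are hypothetical, and the availability check is injected as a callable so the example stays independent of any real HTTP probing.

```python
from typing import Callable, List, Optional


def provider_order(configured: str = "auto") -> List[str]:
    """Return the providers to try, first choice first, as in the table above."""
    if configured == "ollama":
        return ["ollama", "lmstudio"]
    if configured in ("auto", "lmstudio"):
        return ["lmstudio", "ollama"]
    raise ValueError(f"unknown provider setting: {configured!r}")


def pick_provider(configured: str,
                  is_available: Callable[[str], bool]) -> Optional[str]:
    """Return the first reachable provider, or None if neither responds."""
    for name in provider_order(configured):
        if is_available(name):
            return name
    return None


# Example: Ollama is pinned but down, so the fallback is chosen.
print(pick_provider("ollama", lambda name: name == "lmstudio"))
```

A `None` result corresponds to the case where neither provider is reachable.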

```
dotnet restore
dotnet build
dotnet test        # All 109 tests should pass
dotnet publish -c Release
```

```
# Index C# files in a directory
./SemanticSourceCode --mode index --path ./src

# Example with absolute path
./SemanticSourceCode --mode index --path /home/user/projects/MyApp
```

```
# Start watch mode on a directory
./SemanticSourceCode --mode watch --path ./src
```

Watch mode runs an initial full index, then keeps the process running and re-indexes the affected file automatically whenever a `*.cs` file is created, changed, deleted, or renamed. The index stays fresh within ~500 ms of an edit, so searches in another shell always see the latest code.

- **Debounce**: Multiple rapid saves to the same file are coalesced into a single re-index (default: 500 ms).
- **Excluded directories**: `bin/`, `obj/`, `.git/`, `.vs/`, `.idea/`, `node_modules/`, `dist/`, `build/` are ignored automatically.
- **Stop**: Press `Ctrl+C` to stop watching. The watcher exits cleanly, with no leftover background tasks.
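The debounce behaviour described above can be illustrated with a small Python sketch. It is a simplified model rather than the tool's actual watcher: the `Debouncer` class is hypothetical, and timestamps are passed in explicitly so the logic is easy to follow and test.

```python
class Debouncer:
    """Coalesces repeated change events per file until the file has been quiet."""

    def __init__(self, delay_seconds: float = 0.5):
        self.delay = delay_seconds
        self._last_event = {}  # path -> timestamp of the most recent change

    def notify(self, path: str, now: float) -> None:
        """Record a change; a newer event replaces any pending one for the path."""
        self._last_event[path] = now

    def pop_due(self, now: float):
        """Return and forget the paths that have been quiet for at least `delay`."""
        due = [p for p, t in self._last_event.items() if now - t >= self.delay]
        for path in due:
            del self._last_event[path]
        return due
```

Three saves of the same file within a few hundred milliseconds therefore produce a single entry from `pop_due`, and only once the file has been unchanged for the full delay.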

Example workflow:

```
# Terminal 1: start watching
./SemanticSourceCode --mode watch --path ./src

# Terminal 2: edit a file
vim ./src/Services/MyService.cs   # → re-indexes automatically

# Terminal 3: search while watching
./SemanticSourceCode --mode search
```

**Interactive mode:**

```
# Start interactive search mode
./SemanticSourceCode --mode search
```

Example queries:

- "How do I find all files in a directory?"
- "Database connection handling"
- "Async HTTP client"
- "User authentication"

**Non-interactive (one-shot) mode:**

```
# Default (text format) — prints human-readable results, exits
./SemanticSourceCode --mode search --query "arithmetic calculation"

# JSON output — for piping into jq, scripts, or other tools
./SemanticSourceCode --mode search --query "arithmetic calculation" --format json

# Quiet output — only the top-1 result, one line
./SemanticSourceCode --mode search --query "Add" --quiet

# Short flags
./SemanticSourceCode --mode search -q "Add" -f json -l 2

# With structural filter
./SemanticSourceCode --mode search -q "Query" --namespace MyApp.Data
```

One-shot mode is intended for scripts and agentic use. It accepts the following flags:

| Flag | Description |
|---|---|
| `--query`, `-q` | The search query (triggers non-interactive mode) |
| `--format`, `-f` | `text` (default), `json`, or `quiet` |
| `--limit`, `-l` | Max results to display |
| `--quiet` | Shorthand for `--format quiet` |
| `--namespace` | Filter to chunks in this namespace |
| `--class` | Filter to chunks in this class |
| `--http-method` | Filter to controller methods with this verb |
| `--file-pattern` | Filter to files matching this glob |

**Exit codes** (non-interactive only):

- `0`: at least one result found
- `1`: no results, validation error, or DB not initialized
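A script can branch on these exit codes. The Python wrapper below is a sketch under a few assumptions: the binary is at `./SemanticSourceCode`, `--format json` writes a single JSON document to stdout, and the structure of that JSON is not specified here, so it is returned as parsed without further interpretation. The `search` function and its `run` parameter are hypothetical.

```python
import json
import subprocess


def search(query, binary="./SemanticSourceCode", run=subprocess.run):
    """Run a one-shot search; return parsed JSON, or None on exit code 1."""
    completed = run(
        [binary, "--mode", "search", "--query", query, "--format", "json"],
        capture_output=True,
        text=True,
    )
    if completed.returncode != 0:
        # Exit code 1 covers no results, validation errors and a missing DB.
        return None
    return json.loads(completed.stdout)
```

Because exit code `1` covers several situations, a caller that needs to tell them apart would have to inspect stderr as well.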

```
# Start the MCP server over stdio
./SemanticSourceCode --mode mcp
```

The server speaks **JSON-RPC 2.0** over **stdin/stdout** (MCP standard). It
exposes two tools that AI agents can
call directly:

| Tool | Description |
|---|---|
| `search_code` | Semantic search with optional `namespace`, `class`, `filePattern`, `limit` filters |
| `get_chunk_by_id` | Fetch a single indexed chunk by its semantic ID |

Status messages go to **stderr** so the JSON-RPC channel on **stdout**
stays clean for client parsing.
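For orientation, this is roughly what a tool invocation looks like on the wire. The Python sketch builds a JSON-RPC 2.0 `tools/call` request as defined by MCP; the argument key `query` is an assumption (only the filter names are documented above), and a real client must complete the MCP `initialize` handshake before sending it.

```python
import json


def tools_call_request(request_id: int, tool: str, arguments: dict) -> str:
    """Serialise one MCP `tools/call` request as a single line of JSON."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })


# "query" is an assumed parameter name; "namespace" and "limit" are documented filters.
print(tools_call_request(1, "search_code",
                         {"query": "database connection", "namespace": "MyApp.Data", "limit": 3}))
```

In practice an MCP client library generates these messages, so the sketch is mainly useful for debugging the stdio channel by hand.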

**Example (project-local `.mcp.json`):**

```
{
  "mcpServers": {
    "semantic-source-code": {
      "command": "SemanticSourceCode",
      "args": ["--mode", "mcp"]
    }
  }
}
```

After a restart, the agent can call `search_code` and `get_chunk_by_id` directly in its tool-using workflow.

Edit `appsettings.json` to switch providers. Use `"auto"` (default) for zero-config behavior, or explicitly pin a provider.

```
{
  "Embedding": {
    "Provider": "auto"
  },
  "Ollama": {
    "BaseUrl": "http://localhost:11434",
    "EmbeddingModel": "nomic-embed-text"
  },
  "LMStudio": {
    "BaseUrl": "http://localhost:1234",
    "EmbeddingModel": "text-embedding-nomic-embed-text-v1.5"
  },
  "Database": {
    "Path": "codechunks.db"
  }
}
```

Pin Ollama:

```
{
  "Embedding": {
    "Provider": "ollama"
  },
  "Ollama": {
    "BaseUrl": "http://localhost:11434",
    "EmbeddingModel": "nomic-embed-text"
  },
  "Database": {
    "Path": "codechunks.db"
  }
}
```

Pin LM Studio:

```
{
  "Embedding": {
    "Provider": "lmstudio"
  },
  "LMStudio": {
    "BaseUrl": "http://localhost:1234",
    "EmbeddingModel": "text-embedding-nomic-embed-text-v1.5"
  },
  "Database": {
    "Path": "codechunks.db"
  }
}
```

| Section | Key | Default | Description |
|---|---|---|---|
| `Embedding` | `Provider` | `auto` | Provider: `auto`, `ollama`, or `lmstudio` |
| `Ollama` | `BaseUrl` | `http://localhost:11434` | Ollama API endpoint |
| `Ollama` | `EmbeddingModel` | `nomic-embed-text` | Model name in Ollama |
| `LMStudio` | `BaseUrl` | `http://localhost:1234` | LM Studio API endpoint |
| `LMStudio` | `EmbeddingModel` | `text-embedding-nomic-embed-text-v1.5` | Model identifier for LM Studio |
| `Database` | `Path` | `codechunks.db` | SQLite database file path |
| `Chunking` | `MaxChunkSize` | `1000` | Maximum tokens per chunk |
| `Chunking` | `OverlapTokens` | `100` | Overlap between chunks |

Search queries are automatically expanded with synonyms and related terms. You can customize this in `appsettings.json`:

```
{
  "QueryExpansion": {
    "db": "database,sql,entity framework",
    "http": "web,api,rest,endpoint",
    "async": "asynchronous,task,background"
  }
}
```
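How such a mapping might be applied to a query is sketched below in Python. This is not the project's `QueryExpander`: the function name is hypothetical, and appending the synonyms to the end of the original query is an assumption about the expansion strategy.

```python
def expand_query(query: str, expansions: dict) -> str:
    """Append the configured synonyms for every query term that has an entry."""
    terms = query.split()
    expanded = list(terms)
    for term in terms:
        extra = expansions.get(term.lower())
        if extra:
            # Values are comma-separated lists, as in the config above.
            expanded.extend(part.strip() for part in extra.split(","))
    return " ".join(expanded)


config = {"db": "database,sql,entity framework"}
print(expand_query("db connection", config))
```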

Each C# class is split into separate chunks:

- **Methods**: With signature, body and XML documentation
- **Properties**: Including getter/setter logic
- **Constructors**: Separate initialization logic
- **Fields**: With type and initialization

To improve search quality, the tool implements several techniques:

Each code chunk is enhanced with additional metadata to improve search relevance:

- **Class Name Boosting**: Class names are repeated to increase their weight
- **Member Name Boosting**: Member names are emphasized for better matching
- **Framework Metadata**: Framework-specific terms are added for ASP.NET components

Search queries are automatically expanded with synonyms and related terms:

- `db` → `database`, `sql`, `entity framework`
- `http` → `web`, `api`, `rest`, `endpoint`
- `async` → `asynchronous`, `task`, `background`
- `sensor` → `ultrasonic`, `distance`, `color`, `gyro`
- `file` → `io`, `read`, `write`, `stream`

- Uses the Ollama HTTP API (`/api/embeddings`)
- Compatible with all Ollama embedding models
- Default: `nomic-embed-text` (768 dimensions)
- Alternatives: `mxbai-embed-large`, `all-minilm`

- Uses the OpenAI-compatible HTTP API (`/v1/embeddings`)
- Works with any model loaded in LM Studio
- Default: `text-embedding-nomic-embed-text-v1.5`
- Supports models from HuggingFace, GGUF, etc.

Cosine similarity implementation:

```
similarity = (A · B) / (||A|| × ||B||)
```
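The formula translates directly into code. The Python sketch below is a plain reference implementation for illustration, not the tool's C# code; returning `0.0` for a zero-length vector is a convention chosen here to avoid division by zero.

```python
import math


def cosine_similarity(a, b) -> float:
    """Dot product of a and b divided by the product of their Euclidean norms."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention for degenerate input, see note above
    return dot / (norm_a * norm_b)
```

Vectors pointing in the same direction score close to 1 regardless of their length, and orthogonal vectors score 0.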

If you see:

```
No embedding provider available.
```

Make sure at least one of these is running:

**LM Studio:**

- Open LM Studio and go to the **Developer** tab
- Start the local server (toggle should be green)
- Load an embedding model (e.g. `nomic-embed-text-v1.5`)
- Verify: `curl http://localhost:1234/v1/models`

**Ollama:**

```
# Pull an embedding model
ollama pull nomic-embed-text

# Ensure Ollama is running
ollama serve

# Verify
curl http://localhost:11434/api/tags
```

The app is set to `auto` by default, so it will pick whichever is available.

If you see the following message (German for "LM Studio reachable, but no model loaded"):

```
LM Studio erreichbar, aber kein Modell geladen.
```

Go to the **Developer** tab in LM Studio, load an embedding model, and make sure the server is started.

```
# Check if Ollama is running
curl http://localhost:11434/api/tags

# Start Ollama
ollama serve
```

- Open LM Studio and go to the **Developer** tab
- Ensure the server is started (toggle should be green)
- Verify the port in `appsettings.json` matches the displayed port
- Test with: `curl http://localhost:1234/v1/models`

- Make sure indexing completed successfully
- Check `codechunks.db` file size (should be > 0 bytes)
- Use more specific search terms
- Verify your embedding provider is running and the model is loaded

- Embedding generation is CPU-intensive — expect slower performance on Raspberry Pi or low-power devices
- The tool processes chunks sequentially (batch size: 1)
- Consider using a machine with GPU support for faster embedding generation

We welcome contributions! Please see [CONTRIBUTING.md](https://github.com/TheEifelYeti/SemanticSourceCode/blob/main/CONTRIBUTING.md) for details.

- Report bugs via [GitHub Issues](https://github.com/YOUR_USERNAME/SemanticSourceCode/issues)
- Request features via [GitHub Discussions](https://github.com/YOUR_USERNAME/SemanticSourceCode/discussions)
- Submit pull requests following our PR template

MIT