Using Model Context Protocol (MCP), dynamic schema discovery, and a secure SQL validation layer to query enterprise identity data with natural language.
Over the last few months, I’ve been experimenting with ways to make enterprise data more accessible through AI.
One challenge I kept running into was querying and reporting. In many enterprise identity platforms, valuable operational data is stored inside relational databases, but accessing that data typically requires knowledge of the schema, SQL expertise, and familiarity with the platform itself.
A seemingly simple question like: How many failed provisioning transactions occurred last week? can require searching through documentation, identifying the correct tables, understanding status values, and writing a custom SQL query.
** For example**, an operational team investigating provisioning failures, or a GRC team gathering evidence for an audit, would need to know that status data lives in a column with inconsistent values like “
Sometimes, going through the UI can take a significant amount of time to load when there are millions of records at the enterprise level.
At the same time, allowing an AI model unrestricted database access is not something most organizations would be comfortable with. This led me to explore a different approach: combining Model Context Protocol (MCP) with an enterprise identity database to enable natural language reporting while maintaining strict security controls.
The result was an MCP server capable of understanding a live database schema, generating read-only SQL queries, and returning meaningful answers through a conversational interface.
Model Context Protocol (MCP) has quickly become one of the most interesting developments in the AI ecosystem.
Before MCP, every AI application needed a custom integration for every external tool or data source.
If you had:
you quickly ended up maintaining dozens of integrations.
MCP changes this model by introducing a standard interface between AI systems and external resources.
Instead of building separate integrations for every combination, tools can expose capabilities through a common protocol and AI models can consume them consistently.
In this project, MCP became the bridge between Claude and an enterprise identity database.
The goal was straightforward:
Allow users to ask questions such as:
without requiring any SQL knowledge.
At the same time, the solution needed to satisfy three requirements:
The architecture consists of four major components:
The MCP server acts as the orchestration layer.
It exposes tools that Claude can invoke whenever it needs information from the database.
Examples include:
One of the biggest challenges with AI-generated SQL is schema awareness.
Most demonstrations rely on hardcoded table definitions.
That works initially, but quickly becomes difficult to maintain.
Instead, I implemented automatic schema discovery.
At startup, the server:
This allows the AI to work with the current database structure instead of a manually maintained copy.
Security was a non-negotiable requirement.
Every generated query passes through a validation layer before reaching the database.
Allowed:
SELECT ...WITH ...
Blocked:
INSERT ...UPDATE ...DELETE ...DROP ...ALTER ...TRUNCATE ...
Beyond SQL validation, every HTTP and webhook endpoint requires a separate authentication token, distinct from the AI provider’s API key. This ensures that even if someone discovers the server’s address, they cannot invoke any tool without proper authorization.
The validation layer also enforces row limits to prevent accidental large-scale queries.
The adapter layer abstracts database connectivity.
Instead of coupling the implementation to a single database technology, adapters provide a consistent interface.
This makes it possible to support:
through configuration rather than code changes.
One feature that proved particularly valuable was schema discovery.
Traditional AI-to-database integrations often fail because the model has incomplete knowledge of the schema.
For example, a user might ask: Show failed provisioning transactions.
The AI has no way of knowing:
unless that information is provided.
To solve this, the MCP server automatically inspects the database during startup.
Database |Read Tables |Read Columns |Read Data Types |Read Distinct Values |Generate AI Context
The generated metadata becomes part of the prompt supplied to the AI model.
As a result, the model understands:
before generating SQL.
This significantly improves accuracy.
When a user asks a question, several steps occur behind the scenes.
User Question | vMCP Tool | vSchema Context | vClaude Generates SQL | vSafety Validation | vDatabase Execution | vResponse Formatting
Let’s walk through an example.
User question: How many failed provisioning transactions occurred during the last seven days?
The MCP tool builds a prompt containing:
Claude then generates SQL.
The safety layer validates the query.
The database executes the query.
Results are formatted and returned to the user.
The entire process feels conversational while still leveraging structured database operations.
One aspect I wanted to preserve was flexibility.
The same MCP tools can be consumed through multiple transport mechanisms.
Using STDIO transport, Claude Desktop can communicate directly with the MCP server.
This provides a seamless conversational experience.
import { Server } from '@modelcontextprotocol/sdk/server/index.js';import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';import { CallToolRequestSchema, ListToolsRequestSchema,} from '@modelcontextprotocol/sdk/types.js';import { config } from '../config/config.js';import { createAdapter } from './adapters/adapter-factory.js';import { createAI } from './adapters/ai-factory.js';import { SchemaDiscovery } from './discovery/schema-discovery.js';import { SafetyGuard } from './discovery/safety.js';import { ToolRegistry } from './tools/registry.js';async function main() { console.error(`\n MCP DB Server starting.......`); console.error(` DB: ${config.database.type} / ${config.database.name || config.database.file}`); console.error(` AI: ${config.ai.provider} / ${config.ai.model}`); console.error(` Tools: ${config.tools.enabled.join(', ')}\n`); // Connect to database const adapter = await createAdapter(config.database); // Discover schema const schema = new SchemaDiscovery(adapter, config.schema); if (config.schema.autoDiscover) { await schema.build(); console.error(`[Schema] ${schema.getSummary()}`); } // Initialize AI provider const ai = await createAI(config.ai); // Safety guard const safety = new SafetyGuard(config.safety); // Shared context passed to all tools const context = { adapter, schema, ai, safety, safetyConfig: config.safety }; // Load tools const registry = new ToolRegistry(); await registry.load(config.tools.enabled, context); console.error(`[Tools] Active: ${registry.list().join(', ')}`); // Create MCP server const server = new Server( { name: config.server.name, version: config.server.version }, { capabilities: { tools: {} } } ); // List tools server.setRequestHandler(ListToolsRequestSchema, async () => ({ tools: registry.getDefinitions(), })); // Execute tool server.setRequestHandler(CallToolRequestSchema, async (request) => { const { name, arguments: args } = request.params; console.error(`[Tool] Calling: ${name}`, JSON.stringify(args)); try { return await registry.execute(name, args || {}); } catch (e) { console.error(`[Tool] Error in ${name}:`, e.message); return { content: [{ type: 'text', text: `Tool error: ${e.message}` }], isError: true, }; } }); // Connect transport const transport = new StdioServerTransport(); await server.connect(transport); console.error(`\n MCP server ready. Listening on stdio.\n`);}main().catch(e => { console.error('Fatal error:', e); process.exit(1);});
The MCP tools can also be exposed through HTTP endpoints.
This enables integration with:
Webhook support allows external systems to trigger database-driven AI analysis automatically.
In the current implementation, the triggering system either supplies the question directly in the payload, or the server applies simple pattern matching against known event shapes (such as a GitHub pull request payload) to infer one. A more mature version of this would let administrators define question templates per event type, rather than relying on inferred defaults.
For users who prefer a browser-based experience, the same functionality can be exposed through a lightweight web interface.
Because all transports use the same underlying tools, the business logic remains consistent across every access method.
import 'dotenv/config';import express from 'express';import cors from 'cors';import { dirname, join } from 'path';import { fileURLToPath } from 'url';import { createAdapter } from './adapters/adapter-factory.js';import { createAI } from './adapters/ai-factory.js';import { SchemaDiscovery } from './discovery/schema-discovery.js';import { SafetyGuard } from './discovery/safety.js';import { ToolRegistry } from './tools/registry.js';const __dirname = dirname(fileURLToPath(import.meta.url));// Build config directly from process.envfunction getConfig() { return { database: { type: process.env.DB_TYPE || 'mysql', host: process.env.DB_HOST || 'localhost', port: parseInt(process.env.DB_PORT || '3306'), name: process.env.DB_NAME || 'identityiq', user: process.env.DB_USER || 'root', password: process.env.DB_PASSWORD || '', file: process.env.DB_FILE || './data.db', tablePrefix: process.env.DB_TABLE_PREFIX || '', connectionLimit: 10, }, ai: { provider: process.env.AI_PROVIDER || 'anthropic', model: process.env.AI_MODEL || 'claude-sonnet-4-5', apiKey: process.env.ANTHROPIC_API_KEY || process.env.OPENAI_API_KEY || '', maxTokens: parseInt(process.env.AI_MAX_TOKENS || '2048'), }, tools: { enabled: (process.env.MCP_TOOLS || 'query_database,describe_schema,sample_data,search_records') .split(',').map(t => t.trim()), }, schema: { autoDiscover: process.env.SCHEMA_AUTO_DISCOVER !== 'false', maxTablesInContext: parseInt(process.env.SCHEMA_MAX_TABLES || '50'), enumColumnNames: (process.env.SCHEMA_ENUM_COLUMNS || 'status,state,type,phase,action,op,operation,source,level,completion_status,result,severity,outcome').split(','), maxEnumValues: parseInt(process.env.SCHEMA_MAX_ENUM_VALUES || '15'), skipColumns: (process.env.SCHEMA_SKIP_COLUMNS || 'attributes,xml,extended_attributes,scorecard,preferences,arguments,config').split(','), }, safety: { allowedStatements: ['SELECT', 'WITH'], blockedKeywords: ['INSERT', 'UPDATE', 'DELETE', 'DROP', 'ALTER', 'CREATE', 'TRUNCATE'], maxRowsReturned: parseInt(process.env.MAX_ROWS || '500'), }, };}// Authconst API_TOKEN = process.env.API_TOKEN || 'XXXXXXXX';function requireAuth(req, res, next) { const header = req.headers['authorization'] || ''; const token = header.startsWith('Bearer ') ? header.slice(7) : req.query.token; if (!token || token !== API_TOKEN) { return res.status(401).json({ error: 'Unauthorized. Pass: Authorization: Bearer <API_TOKEN>' }); } next();}// Webhook verifierfunction verifyWebhook(req, res, next) { const secret = process.env.WEBHOOK_SECRET; if (!secret) return next(); const sig = req.headers['x-hub-signature-256'] || req.headers['x-webhook-signature'] || ''; if (!sig) return res.status(400).json({ error: 'Missing webhook signature' }); import('crypto').then(({ createHmac }) => { const expected = 'sha256=' + createHmac('sha256', secret).update(JSON.stringify(req.body)).digest('hex'); if (sig !== expected) return res.status(403).json({ error: 'Invalid webhook signature' }); next(); });}// Webhook question extractorfunction extractQuestionFromWebhook(payload) { if (payload.pull_request) return `Show recent provisioning activity for user ${payload.pull_request.user?.login}`; if (payload.issue) return `Show recent failed provisioning transactions`; return payload.question || payload.text || payload.message || payload.query || null;}// Mainasync function startHTTPServer() { console.log('\n MCP HTTP Server starting...'); const config = getConfig(); // Verify key is loaded console.log(`[ENV] ANTHROPIC_API_KEY loaded: ${config.ai.apiKey ? 'YES (' + config.ai.apiKey.slice(0,15) + '...)' : 'NO ← CHECK YOUR .env FILE'}`); console.log(`[ENV] DB_PASSWORD loaded: ${config.database.password ? 'YES' : 'NO (empty)'}`); const adapter = await createAdapter(config.database); const schema = new SchemaDiscovery(adapter, config.schema); await schema.build(); const ai = await createAI(config.ai); const safety = new SafetyGuard(config.safety); const context = { adapter, schema, ai, safety, safetyConfig: config.safety }; const registry = new ToolRegistry(); await registry.load(config.tools.enabled, context); console.log(`[Tools] Active: ${registry.list().join(', ')}`); const app = express(); app.use(cors()); app.use(express.json()); // Serve UI app.use(express.static(join(__dirname, '../ui'))); app.get('/', (req, res) => res.sendFile(join(__dirname, '../ui/index.html'))); // Health check app.get('/health', async (req, res) => { try { await adapter.ping(); res.json({ status: 'ok', db: `${config.database.type}/${config.database.name}`, ai: `${config.ai.provider}/${config.ai.model}`, tools: registry.list(), schema: schema.getSummary() }); } catch (e) { res.status(500).json({ status: 'error', message: e.message }); } }); // List tools app.get('/api/tools', requireAuth, (req, res) => res.json({ tools: registry.getDefinitions() })); // Asking in the plain English app.post('/api/ask', requireAuth, async (req, res) => { const { question, explain = false } = req.body; if (!question?.trim()) return res.status(400).json({ error: 'question is required' }); try { const result = await registry.execute('query_database', { question, explain }); res.json({ answer: result.content?.[0]?.text || '', isError: result.isError || false }); } catch (e) { res.status(500).json({ error: e.message }); } }); // Call any tool directly app.post('/api/tools/:toolName', requireAuth, async (req, res) => { const { toolName } = req.params; if (!registry.has(toolName)) return res.status(404).json({ error: `Tool "${toolName}" not found`, available: registry.list() }); try { const result = await registry.execute(toolName, req.body || {}); res.json({ result: result.content?.[0]?.text, isError: result.isError || false }); } catch (e) { res.status(500).json({ error: e.message }); } }); // Schema app.get('/api/schema', requireAuth, async (req, res) => { const result = await registry.execute('describe_schema', { summary: req.query.summary === 'true' }); res.json({ schema: result.content?.[0]?.text }); }); // Webhooks app.post('/api/webhook', verifyWebhook, async (req, res) => { const question = extractQuestionFromWebhook(req.body); if (!question) return res.status(400).json({ error: 'Add a "question" field to your webhook payload.' }); console.log(`[Webhook] ${question}`); try { const result = await registry.execute('query_database', { question }); res.json({ question, answer: result.content?.[0]?.text || '', timestamp: new Date().toISOString() }); } catch (e) { res.status(500).json({ error: e.message }); } }); const PORT = process.env.HTTP_PORT || 3344; app.listen(PORT, () => { console.log(`\n HTTP server ready`); console.log(`\n Open UI: http://localhost:${PORT}`); console.log(` Token: ${API_TOKEN}`); console.log(`\n POST /api/ask → { "question": "how many identities?" }`); });}startHTTPServer().catch(e => { console.error('Fatal:', e.message); process.exit(1); });
Whenever AI interacts with enterprise data, security becomes a critical topic.
Several safeguards were included.
Only SELECT statements are allowed.
All modification commands are blocked.
Generated SQL is inspected before execution.
Automatic limits prevent excessive data retrieval.
The AI can only interact with capabilities explicitly exposed through MCP tools.
This significantly reduces risk compared to granting unrestricted database access.
Generated SQL | vValidation Layer | +--> Dangerous Command? | | | Yes | | | Reject | vExecute Query
Calling MySQL database from Claude Desktop
Calling MySQL database using MCP from External UI
Several lessons emerged during development.
Providing accurate schema information had a greater impact on query quality than switching between models.
AI-generated SQL must always be validated before execution. Trusting generated queries blindly is not a viable enterprise strategy.
The protocol provides a clean separation between AI models and enterprise systems. This makes the overall architecture easier to maintain and extend.
Automatically learning the schema eliminates a significant maintenance burden. As databases evolve, the AI remains aligned with reality.
This project started as an experiment in natural language reporting, but it opens several interesting possibilities.
Future enhancements could include:
The foundation already exists. The next step is expanding the intelligence available to the AI while preserving security and governance controls.
Connecting AI systems to enterprise databases is not a new idea.
Doing it safely, accurately, and in a way that remains maintainable over time is the real challenge.
By combining MCP, automatic schema discovery, a tool-based architecture, and a strict SQL validation layer, it becomes possible to create a natural language interface that can answer meaningful operational questions without exposing the underlying complexity of the database.
For me, the most interesting takeaway was not that AI could generate SQL.
It was that, with the right context and guardrails, AI could make enterprise data significantly more accessible while still respecting the security requirements that these environments demand.
How I Connected Claude MCP to an Enterprise Identity Database for Natural Language Querying &… was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.