{"slug": "show-hn-model-due-diligence", "title": "Show HN: Model Due Diligence", "summary": "A developer released model-due-diligence, an open-source Python CLI tool that performs static supply-chain security checks on local AI model files and repositories before they are imported into runtimes like Ollama or llama.cpp. The tool scans for unsafe serialization, suspicious content, exposed secrets, and weak provenance, generating reports to help users identify obvious risks without loading or executing the models.", "body_md": "`model-due-diligence` is a Python command-line tool for performing **static supply-chain due diligence** on local AI model files and cloned model repositories before they are imported into runtimes such as Ollama, llama.cpp, LM Studio or Transformers.\n\nIt is designed to help answer one practical question:\n\n“Is there obvious static evidence that this model artefact or repository should not be trusted, loaded or run without further review?”\n\nIt reduces practical risk from unsafe serialisation, suspicious repository content, weak provenance, exposed secrets, unexpected binaries, unsafe dependency files and malformed model metadata.\n\nIt does **not** prove that a model is safe.\n\nA clean report means only that this tool did not identify the specific static artefact risks it is designed to detect. 
It must not be treated as proof that model weights, repository content, runtime behaviour or downstream use are benign.\n\n- [What the tool does](#what-the-tool-does)\n- [What the tool does not do](#what-the-tool-does-not-do)\n- [Architecture](#architecture)\n- [Scanner coverage](#scanner-coverage)\n- [Risk scoring](#risk-scoring)\n- [Install](#install)\n- [Quick start](#quick-start)\n- [CLI reference](#cli-reference)\n- [Example workflows](#example-workflows)\n- [Reports and outputs](#reports-and-outputs)\n- [Recommended operating model](#recommended-operating-model)\n- [Development workflow](#development-workflow)\n- [Testing and quality gates](#testing-and-quality-gates)\n- [Repository structure](#repository-structure)\n- [Security posture](#security-posture)\n- [Standards alignment](#standards-alignment)\n- [Limitations](#limitations)\n- [Roadmap](#roadmap)\n- [Contributing](#contributing)\n- [Licence](#licence)\n\n`model-due-diligence` statically inspects a local path and generates reviewable evidence.\n\nIt checks:\n\n- file inventory, SHA-256 hashes, permissions and symlinks;\n- high-risk serialisation formats such as pickle, `.pt`, `.pth`, `.bin`, `.joblib` and H5;\n- lower-risk model formats such as `.gguf`, `.safetensors` and `.onnx`;\n- GGUF magic bytes and version metadata;\n- safetensors header metadata;\n- suspicious text and binary strings;\n- Python AST indicators such as `eval`, `exec`, `compile`, `pickle.loads`, `os.system` and `subprocess`;\n- `trust_remote_code=True` usage in Python and text files;\n- risky pickle-like byte markers in high-risk serialisation formats;\n- high-entropy non-model files;\n- Git provenance, origin remote, current commit, dirty worktree and Git LFS listing where available;\n- external scanner output from ModelScan, Semgrep, Bandit, pip-audit and detect-secrets;\n- optional quality self-checks using Ruff, Pyright and mypy.\n\nThe tool produces:\n\n- a human-readable Markdown report;\n- a deterministic JSON report for automation;\n- an optional SARIF report 
for code-scanning workflows;\n- raw external scanner outputs where external tools are run.\n\nThe tool is intentionally static. During normal scanning it does **not**:\n\n- load model weights;\n- import untrusted repository code;\n- execute model-specific scripts;\n- run model inference;\n- send artefacts to external services;\n- require network access for local scanning;\n- decide automatically that a model is safe.\n\nStatic scanning cannot reliably detect:\n\n- malicious behaviour encoded directly into model weights;\n- sleeper-agent or trigger-based backdoors;\n- training-data poisoning;\n- benchmark-specific manipulation;\n- malicious behaviour that appears only after fine-tuning;\n- malicious behaviour that appears only after tools are connected;\n- prompt-injection obedience in downstream RAG or agent workflows;\n- data exfiltration behaviour that only appears at runtime;\n- vulnerabilities in local model runtimes;\n- all unsafe deserialisation evasions.\n\nUse it as a **risk-reduction gate**, not as a trust oracle.\n\nThe project uses a **modular monolith** architecture. 
This keeps installation and local execution simple while maintaining clear internal boundaries between CLI, orchestration, scanners, risk scoring and reports.\n\n```mermaid\nflowchart LR\n    user[User / CI] --> cli[CLI]\n    cli --> app[Application Orchestrator]\n    app --> inventory[File Inventory]\n    app --> native[Native Static Scanners]\n    app --> external[External Scanner Adapters]\n    app --> risk[Risk Scorer]\n    risk --> report_model[Audit Report Model]\n    app --> report_model\n    report_model --> markdown[Markdown Report]\n    report_model --> json[JSON Report]\n    report_model --> sarif[SARIF Report]\n\n    native --> text[Text Patterns]\n    native --> ast[Python AST]\n    native --> binary[Binary Strings]\n    native --> entropy[Entropy]\n    native --> metadata[Model Metadata]\n    native --> pickle[Pickle Heuristics]\n    native --> git[Git Provenance]\n\n    external --> modelscan[ModelScan]\n    external --> semgrep[Semgrep]\n    external --> bandit[Bandit]\n    external --> pipaudit[pip-audit]\n    external --> secrets[detect-secrets]\n    external --> quality[Quality Self-Checks]\n```\n\n```mermaid\nsequenceDiagram\n    participant U as User / CI\n    participant C as CLI\n    participant A as App\n    participant I as Inventory\n    participant N as Native Scanners\n    participant E as External Scanners\n    participant R as Risk Scorer\n    participant W as Report Writers\n\n    U->>C: mdd <target> --out <dir>\n    C->>C: Parse arguments and build ScanContext\n    C->>A: Run scan\n    A->>I: Build file inventory and hashes\n    I-->>A: FileRecord[] + Finding[]\n    A->>N: Run static native scanners\n    N-->>A: Finding[] + ModelMetadata[]\n    A->>E: Run optional external scanners\n    E-->>A: CommandResult[] + Finding[]\n    A->>R: Score findings and tool outcomes\n    R-->>A: Risk score + risk level\n    A-->>C: AuditReport\n    C->>W: Write Markdown / JSON / SARIF\n    C-->>U: Print risk score, risk level and report paths\n```\n\nDependencies should 
flow in one direction:\n\n```text\ncli -> app -> domain\napp -> inventory\napp -> scanners\napp -> external\napp -> reporting\nscanners -> domain/config/utils\nexternal -> domain/config/command_runner\nreporting -> domain/config\n```\n\nRules:\n\n- scanners must not import `app`;\n- reporters must not run scanners;\n- external adapters must not write final project reports directly;\n- domain models must not depend on filesystem, subprocess or reporting modules;\n- native scanners must not execute model artefacts or repository code.\n\n| Coverage area | Native support | External support | Status |\n|---|---|---|---|\n| File inventory, hashes and permissions | Yes | No | Covered |\n| Symlink detection | Yes | No | Covered |\n| Executable/script detection | Yes | Semgrep / Bandit | Covered |\n| High-risk serialisation detection | Yes | ModelScan | Covered |\n| Pickle heuristic indicators | Yes | ModelScan | Covered |\n| GGUF header inspection | Yes | No | Basic coverage |\n| Safetensors header inspection | Yes | No | Basic coverage |\n| Suspicious text/code patterns | Yes | Semgrep / Bandit | Covered |\n| Python AST dangerous-call detection | Yes | Bandit / CodeQL | Covered |\n| Binary string indicators | Yes | No | Basic coverage |\n| High-entropy anomaly detection | Yes | No | Basic coverage |\n| Secrets detection | Yes | detect-secrets | Covered |\n| Dependency vulnerability checks | No | pip-audit / Dependabot | Covered for `requirements.txt` |\n| Git provenance checks | Yes | No | Basic coverage |\n| Project code quality | No | Ruff / Pyright / mypy / pytest | Covered |\n| Repository semantic security analysis | No | CodeQL | Covered in GitHub Actions |\n| SARIF output | Yes | CodeQL native SARIF | Partial |\n| SBOM generation | No | No | Planned |\n| Sigstore / SLSA provenance | No | No | Planned |\n| Licence compatibility checks | No | No | Planned |\n| Model-card quality checks | No | No | Planned |\n| Weight-level backdoor detection | No | No | Not 
reliably detectable |\n| Runtime behavioural testing | No | No | Planned separately |\n\nFindings are normalised into severities and converted into a bounded score from `0` to `100`.\n\n| Severity | Current score contribution |\n|---|---|\n| INFO | 0 |\n| LOW | 3 |\n| MEDIUM | 10 |\n| HIGH | 30 |\n| CRITICAL | 60 |\n\nExternal scanner non-zero exits can also contribute to the score when the tool was available and produced reviewable signals.\n\n| Risk level | Score range | Meaning | Recommended action |\n|---|---|---|---|\n| LOW | 0-29 | No obvious supported static artefact risks were found | Acceptable for sandboxed first run |\n| MEDIUM | 30-69 | Reviewable findings exist | Do not import until findings are understood |\n| HIGH | 70-89 | Material risk indicators exist | Do not load unless every finding is justified |\n| CRITICAL | 90-100 | Severe or multiple high-risk indicators exist | Treat as unsafe by default |\n\nThe score is intentionally conservative. It is a decision aid, not an automated trust verdict.\n\nRequirements:\n\n- Python 3.11 or later;\n- Git;\n- a Unix-like shell for the provided scripts;\n- optional external scanner CLIs if you want full coverage.\n\n```\npython3 -m venv .venv\nsource .venv/bin/activate\npython -m pip install --upgrade pip\npython -m pip install -e \".[dev,scanners]\"\n```\n\nOr use the setup script:\n\n```\n./scripts/dev-setup.sh\n```\n\nFor a lighter install without optional scanner integrations:\n\n```\n./scripts/dev-setup.sh --no-scanners\n```\n\nConfirm that the entry points are available:\n\n```\nmdd --help\nmdd-ollama --help\nmodel-due-diligence --help\npython -m model_due_diligence --help\n```\n\nScan a cloned model repository:\n\n```\nmdd ./downloaded-model --out ./audit\n```\n\nScan a local GGUF file:\n\n```\nmdd ~/models/qwen.gguf --out ./audit-qwen\n```\n\nScan an installed Ollama model by name:\n\n```\nmdd-ollama qwen3:4b --out ./audit-qwen3-ollama\n```\n\nFail the command when the risk level is medium or above:\n\n```\nmdd ./downloaded-model --out ./audit --fail-on 
medium\n```\n\nRun a fast smoke scan without optional external tools:\n\n```\nmdd tests/fixtures/safe_repo \\\n  --out ./audit-smoke \\\n  --fail-on critical \\\n  --skip-external\n```\n\nGenerate only JSON output:\n\n```\nmdd ./downloaded-model \\\n  --out ./audit-json \\\n  --format json\n```\n\n```\nusage: model-due-diligence [-h] [--out OUT] [--timeout TIMEOUT]\n                           [--format FORMATS] [--skip-external]\n                           [--skip-modelscan] [--skip-semgrep]\n                           [--skip-bandit] [--skip-pip-audit]\n                           [--skip-detect-secrets]\n                           [--skip-quality-self-check]\n                           [--quality-self-check]\n                           [--fail-on {low,medium,high,critical}]\n                           [--version]\n                           target\n```\n\n| Argument | Description |\n|---|---|\n| `target` | Path to a model file or model directory |\n| `--out` | Output report directory |\n| `--timeout` | Per-tool timeout in seconds |\n| `--format` | Comma-separated report formats: `markdown,json,sarif` |\n| `--skip-external` | Skip all optional external scanner tools |\n| `--skip-modelscan` | Skip ModelScan only |\n| `--skip-semgrep` | Skip Semgrep only |\n| `--skip-bandit` | Skip Bandit only |\n| `--skip-pip-audit` | Skip pip-audit only |\n| `--skip-detect-secrets` | Skip detect-secrets only |\n| `--quality-self-check` | Run Ruff, Pyright and mypy against this project as optional self-checks |\n| `--skip-quality-self-check` | Skip quality self-checks |\n| `--fail-on` | Return non-zero when risk is at or above the selected level |\n| `--version` | Print package version |\n\n`mdd-ollama` resolves an installed Ollama model from the local `OLLAMA_MODELS` store, stages scan-friendly filenames in a temporary directory, and then runs the normal static due-diligence flow on that staged directory.\n\nIt does not require the Ollama server to be running as long as the local manifest and blob 
store is present.\n\n```\nusage: mdd-ollama [-h] [--ollama-models-dir OLLAMA_MODELS_DIR] [--out OUT]\n                  [--timeout TIMEOUT] [--format FORMATS] [--skip-external]\n                  [--skip-modelscan] [--skip-semgrep] [--skip-bandit]\n                  [--skip-pip-audit] [--skip-detect-secrets]\n                  [--skip-quality-self-check] [--quality-self-check]\n                  [--fail-on {low,medium,high,critical}] [--keep-staged]\n                  model\n```\n\nTypical usage:\n\n```\nmdd-ollama llama3:8b --out ./audit-llama3\n```\n\nFor an uninstalled checkout, run it with:\n\n```\nPYTHONPATH=src python3 -m model_due_diligence.ollama_cli llama3:8b --out ./audit-llama3\n```\n\nUse the helper script:\n\n```\n./examples/audit-huggingface-clone.sh \\\n  https://huggingface.co/Qwen/Qwen3-8B-GGUF \\\n  ./audit-qwen3\n```\n\nThe script clones into a temporary directory, runs the scanner, writes reports to the output directory and removes the temporary clone afterwards.\n\n```\n./examples/audit-local-gguf.sh \\\n  ~/models/qwen3-8b-q4_k_m.gguf \\\n  ./audit-qwen3-gguf\n./examples/audit-installed-ollama.sh \\\n  qwen3:4b \\\n  ./audit-qwen3-ollama\n```\n\nA conservative CI smoke gate can run without optional external scanners:\n\n```\nmdd tests/fixtures/safe_repo \\\n  --out ./audit-smoke \\\n  --fail-on critical \\\n  --skip-external\n```\n\nA fuller CI gate can install scanner extras and run:\n\n```\nmdd ./downloaded-model \\\n  --out ./audit \\\n  --fail-on high\n```\n\nA normal run can produce:\n\n```\naudit/\n├── model_due_diligence_report.md\n├── model_due_diligence_report.json\n├── model_due_diligence_report.sarif\n├── modelscan.json\n├── semgrep.json\n├── bandit.json\n├── pip-audit-<hash>.json\n└── detect-secrets.json\n```\n\n| File | Purpose |\n|---|---|\n| `model_due_diligence_report.md` | Human-readable review report |\n| `model_due_diligence_report.json` | Machine-readable deterministic report |\n| `model_due_diligence_report.sarif` | Static-analysis output suitable for code-scanning workflows |\n| `modelscan.json` | Raw ModelScan output |\n| `semgrep.json` | Raw Semgrep output |\n| `bandit.json` | Raw Bandit output |\n| `pip-audit-<hash>.json` | Raw pip-audit output per requirements file |\n| `detect-secrets.json` | Raw detect-secrets output |\n\nGenerated audit outputs may contain local paths, hashes, snippets and scanner evidence. Do not commit them unless you have reviewed them for sensitive content.\n\nUse `model-due-diligence` as one control in a broader model supply-chain process:\n\n```\nOfficial or reputable source\n+ pinned commit or hash\n+ static due-diligence scan\n+ first run in a no-network sandbox\n+ no credentials mounted\n+ restricted filesystem access\n+ adversarial behavioural test suite\n+ runtime monitoring\n+ human review\n= reasonable practical risk reduction\n```\n\nRecommended practice:\n\n- Prefer official publisher repositories or reputable quantisers.\n- Avoid floating tags such as `latest` for operational use.\n- Pin exact Git revisions and record SHA-256 hashes.\n- Run `model-due-diligence` before importing or loading artefacts.\n
- Review all HIGH and CRITICAL findings manually.\n- Run first inference in a network-disabled container or VM.\n- Do not mount API keys, SSH keys, cloud credentials or client data.\n- Test prompt-injection and tool-use behaviour before RAG or agent deployment.\n- Keep reports and accepted hashes for reproducibility.\n\nSet up the environment:\n\n```\n./scripts/dev-setup.sh\nsource .venv/bin/activate\n```\n\nRun quality gates:\n\n```\n./scripts/run-quality.sh\n```\n\nRun tests:\n\n```\n./scripts/run-tests.sh\n```\n\nBuild the package:\n\n```\n./scripts/build-package.sh\n```\n\nBuild without running local checks first:\n\n```\n./scripts/build-package.sh --skip-checks\n```\n\nThe expected local quality gates are:\n\n```\nruff format --check src tests\nruff check src tests\npyright\nmypy src tests\npytest\nmdd tests/fixtures/safe_repo --out ./audit-smoke --fail-on critical --skip-external\n```\n\nThe helper script runs the same pattern:\n\n```\n./scripts/run-quality.sh\n```\n\nUse fix mode for Ruff formatting and safe lint fixes:\n\n```\n./scripts/run-quality.sh --fix\n```\n\nRun unit tests only:\n\n```\n./scripts/run-tests.sh --unit\n```\n\nRun integration tests only:\n\n```\n./scripts/run-tests.sh --integration\n```\n\nRun with coverage:\n\n```\n./scripts/run-tests.sh --coverage\n```\n\n```\nmodel-due-diligence/\n├── .github/\n│   ├── workflows/\n│   │   ├── ci.yml\n│   │   ├── codeql.yml\n│   │   └── release.yml\n│   ├── dependabot.yml\n│   └── pull_request_template.md\n├── docs/\n│   ├── architecture.md\n│   ├── contribution-guide.md\n│   ├── limitations.md\n│   ├── scanner-coverage.md\n│   ├── standards-alignment.md\n│   └── threat-model.md\n├── examples/\n│   ├── audit-installed-ollama.sh\n│   ├── audit-huggingface-clone.sh\n│   ├── audit-local-gguf.sh\n│   └── sample-report.md\n├── scripts/\n│   ├── build-package.sh\n│   ├── dev-setup.sh\n│   ├── run-quality.sh\n│   └── run-tests.sh\n├── src/model_due_diligence/\n│   ├── cli.py\n│   ├── app.py\n│   ├── config/\n│   ├── 
domain/\n│   ├── external/\n│   ├── inventory/\n│   ├── ollama.py\n│   ├── ollama_cli.py\n│   ├── reporting/\n│   ├── scanners/\n│   └── utils.py\n├── tests/\n│   ├── fixtures/\n│   ├── integration/\n│   └── unit/\n├── .env.example\n├── .gitattributes\n├── .gitignore\n├── .python-version\n├── LICENSE\n├── pyproject.toml\n└── README.md\n```\n\nThe project follows these design rules:\n\n- static by default;\n- no model execution during scanning;\n- no untrusted repository code execution during scanning;\n- no shell invocation for external scanner commands;\n- external tool failures are visible in reports;\n- findings include severity, category, file, message, evidence where available and recommendation;\n- missing scanners are reported rather than silently ignored;\n- generated reports are ignored by Git by default;\n- real model artefacts are ignored by Git by default;\n- dependency updates are managed through Dependabot;\n- CodeQL runs through GitHub Actions;\n- releases build source and wheel distributions and validate metadata before publishing.\n\nAn explicit control mapping for relevant NIST, MITRE, and OWASP guidance is in\n[docs/standards-alignment.md](/mmccalla/model-due-diligence/blob/main/docs/standards-alignment.md).\n\nA clean report does not mean a model is safe.\n\nStatic checks cannot reliably detect:\n\n- subtle weight-level backdoors;\n- sleeper-agent behaviour;\n- poisoned training data;\n- malicious behaviour activated by rare prompts;\n- malicious behaviour activated only through tool use;\n- all deserialisation evasions;\n- all obfuscated payloads;\n- prompt-injection obedience in downstream RAG or agent workflows;\n- runtime exfiltration behaviour;\n- vulnerabilities in Ollama, llama.cpp, LM Studio, Transformers or other runtimes.\n\nThis tool should not be the sole approval mechanism for regulated production deployment, client-data processing, internet-connected agentic systems, autonomous coding agents with write access, or systems with 
access to secrets or privileged infrastructure.\n\nPlanned or candidate improvements:\n\n- fuller GGUF metadata validation;\n- safetensors tensor offset and shape validation;\n- Hugging Face metadata retrieval using pinned revisions;\n- SBOM generation;\n- Sigstore or SLSA provenance checks;\n- licence compatibility checks;\n- model-card quality scoring;\n- SARIF upload workflow;\n- sandboxed behavioural test harness for local inference;\n- prompt-injection and tool-use behavioural tests for RAG and agent workloads.\n\nSee [docs/contribution-guide.md](/mmccalla/model-due-diligence/blob/main/docs/contribution-guide.md).\n\nBefore opening a pull request, run:\n\n```\n./scripts/run-quality.sh\n```\n\nContributions should preserve the project’s core boundary: scanning must remain static by default and must not execute untrusted model artefacts or repository code.\n\nLicensed under the Apache License, Version 2.0. See [LICENSE](/mmccalla/model-due-diligence/blob/main/LICENSE).", "url": "https://wpnews.pro/news/show-hn-model-due-diligence", "canonical_source": "https://github.com/mmccalla/model-due-diligence", "published_at": "2026-06-13 10:10:45+00:00", "updated_at": "2026-06-13 10:19:57.948217+00:00", "lang": "en", "topics": ["ai-safety", "developer-tools", "artificial-intelligence", "machine-learning", "mlops"], "entities": ["model-due-diligence", "Ollama", "llama.cpp", "LM Studio", "Transformers", "ModelScan", "Semgrep", "Bandit"], "alternates": {"html": "https://wpnews.pro/news/show-hn-model-due-diligence", "markdown": "https://wpnews.pro/news/show-hn-model-due-diligence.md", "text": "https://wpnews.pro/news/show-hn-model-due-diligence.txt", "jsonld": "https://wpnews.pro/news/show-hn-model-due-diligence.jsonld"}}