AI Smart Contract Review: The Finding Is Not the Audit

AI smart contract review fails when teams treat a model's output as a completed audit, according to a developer's analysis. The useful version of AI-assisted review is narrower: a large language model can flag suspicious code, but that finding must survive tool evidence, execution path verification, standard requirement checks, and human review before it qualifies as an audit result. The developer argues that the strongest pattern pairs model inference with static analysis to confirm or filter claims, rather than leaving the model as the sole authority.

Disclosure: AI tools were used for source collection and editorial review. The article was written by a human author, who checked the facts, code, and conclusions.

Crypto risk disclosure: This article is a technical explanation, not investment advice. It is not a recommendation to buy, sell, or hold any cryptoasset.

AI Smart Contract Review fails when a team treats a model sentence as an audit conclusion. The useful version of AI Smart Contract Review is narrower: the model can point at suspicious code, but the finding has to survive tool evidence, an execution path, a standard requirement, and human review before anyone calls it an audit result.

The practical trap is not that models are always wrong. Papers such as [GPTScan](https://arxiv.org/abs/2308.03314), [iAudit](https://arxiv.org/abs/2403.16073), and [Smart-LLaMA](https://arxiv.org/abs/2411.06221) all support some model-assisted value. The problem is that useful triage is not the same claim as complete security review.

The first boundary in AI Smart Contract Review sits between "the model noticed something" and "the contract has an exploitable issue." That boundary matters because a model can explain a familiar vulnerability pattern while missing the deployment context, external call path, storage layout, or economic condition that makes the issue real.

[Ince et al.'s 2025 survey](https://arxiv.org/abs/2504.04685) is a good starting constraint because the survey treats large-language-model vulnerability detection as promising but not ready to replace traditional tools. AI Smart Contract Review should inherit that caution: a model finding is a lead, not a sign-off.

The useful version of AI Smart Contract Review records how the finding failed. The artifact below is deliberately small because the audit decision needs a compact place to separate a model claim, tool evidence, missed context, and human review.

| Review aid | What it can catch | False positive shape | False negative shape | Human audit decision |
|---|---|---|---|---|
| LLM review | Familiar vulnerability pattern, suspicious control flow, missing check explanation | Model labels unreachable or mitigated code as exploitable | Model misses business logic, protocol economics, or hidden state coupling | Confirm exploit path, impact, and remediation before treating it as a finding |

This table is the article's main artifact. AI Smart Contract Review protects review time when the table forces every model claim into "confirmed," "false positive," "missed by tool," or "needs manual threat-model review."

The strongest AI Smart Contract Review pattern does not leave the model alone. [GPTScan](https://arxiv.org/abs/2308.03314) supports the hybrid idea: use a model to infer likely scenarios, then use static analysis to help confirm or filter the claim.

That hybrid design is useful precisely because it weakens the model's authority. AI Smart Contract Review should say "the model proposed this, and static evidence confirmed part of it," not "the model audited the contract."

A second AI Smart Contract Review boundary separates a correct label from a correct reason. [iAudit](https://arxiv.org/abs/2403.16073) is useful here because the reviewer's research summary noted a gap between headline metrics and reason agreement, including low agreement of reasons against the authors' reference.

That limitation changes the workflow. AI Smart Contract Review should not accept a model's vulnerability name unless the reason names the code path, attacker capability, state precondition, and asset impact that a reviewer can check.

    model claim:
      label: reentrancy
      reason: external call before balance update
    audit record:
      execution path: pending
      affected asset: pending
      attacker capability: pending
      tool evidence: slither reentrancy warning
      standard requirement: SCSVS-ARCH
      decision: needs human review

This record is intentionally boring.
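The same record can be kept as a small data structure so that a decision cannot silently move past "needs human review" while fields are still pending. The following is a minimal Python sketch, not a prescribed format: the class name `AuditRecord`, the `human_confirmed` flag, and the rule that all three pending fields must be filled are assumptions made for illustration. It prints `needs human review` followed by the three pending field names.

```python
from dataclasses import dataclass
from typing import List, Optional

PENDING = "pending"


@dataclass
class AuditRecord:
    # Model claim: what the model said, kept apart from what was verified.
    label: str
    reason: str
    # Audit fields stay "pending" until a reviewer fills them in.
    execution_path: str = PENDING
    affected_asset: str = PENDING
    attacker_capability: str = PENDING
    tool_evidence: Optional[str] = None
    standard_requirement: Optional[str] = None
    # Hypothetical flag: set only after a human reviewer signs off.
    human_confirmed: bool = False

    def open_items(self) -> List[str]:
        """Return the audit fields that nobody has verified yet."""
        checks = {
            "execution path": self.execution_path,
            "affected asset": self.affected_asset,
            "attacker capability": self.attacker_capability,
        }
        return [name for name, value in checks.items() if value == PENDING]

    def decision(self) -> str:
        """A record is confirmed only with no open items and a human sign-off."""
        if self.open_items() or not self.human_confirmed:
            return "needs human review"
        return "confirmed"


record = AuditRecord(
    label="reentrancy",
    reason="external call before balance update",
    tool_evidence="slither reentrancy warning",
    standard_requirement="SCSVS-ARCH",
)
print(record.decision())
print(record.open_items())
```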
AI Smart Contract Review should make uncertainty visible instead of letting a confident model paragraph become a security decision.

Older tools still matter inside AI Smart Contract Review. [Slither](https://github.com/crytic/slither) describes itself as a static-analysis framework for Solidity and Vyper, with vulnerability detectors, confidence/impact categories, CI integration, and checklist output.

That makes Slither useful evidence, not a final verdict. AI Smart Contract Review should treat a Slither hit as a concrete signal to inspect: where is the condition, is the path reachable, what value is affected, and did the model explain the same thing or only match the vulnerability name?

Symbolic execution gives AI Smart Contract Review another boundary, not a magic proof. [Mythril](https://github.com/ConsenSysDiligence/mythril) is valuable because symbolic execution can expose common EVM vulnerability classes, but bounded execution still lives inside assumptions about time, path depth, environment, and state space.

That limit is useful for the table. If Mythril finds a path that the model missed, the model produced a false negative. If the model claims an exploit that symbolic execution and manual review cannot reproduce, the model produced a likely false positive, not an audit finding.

Upgrade risk is easy for AI Smart Contract Review to flatten because upgrade safety is not just "does the function look dangerous." [OpenZeppelin Upgrades](https://docs.openzeppelin.com/upgrades-plugins) focuses on checks such as storage-layout compatibility and upgrade-safety validation, which depend on project configuration and reference contracts.

That boundary is a good example of why audits are broader than model review. AI Smart Contract Review can point at a proxy pattern, but the review still needs storage diff, initializer behavior, disabled checks, and deployment history before the team can judge upgrade risk.

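To make "storage diff" concrete, here is a deliberately simplified Python sketch of an append-only layout comparison. It is not the OpenZeppelin Upgrades algorithm: the `Slot` type, the `layout_issues` function, and the rule that each variable occupies exactly one position are assumptions for illustration, and real layouts also involve slot packing, inheritance order, and reserved gaps. The point is that this check needs the previous layout as input, which a model reading only the new source does not have.

```python
from typing import List, NamedTuple


class Slot(NamedTuple):
    """One state variable in declaration order (hypothetical, simplified)."""
    name: str
    type: str


def layout_issues(old: List[Slot], new: List[Slot]) -> List[str]:
    """Report positions where the new layout no longer matches the old one.

    Appending variables after the existing ones is treated as safe;
    removing, reordering, or retyping an existing variable is reported.
    """
    issues = []
    for index, old_slot in enumerate(old):
        if index >= len(new):
            issues.append(f"position {index}: '{old_slot.name}' was removed")
        elif new[index] != old_slot:
            found = new[index]
            issues.append(
                f"position {index}: expected {old_slot.type} {old_slot.name}, "
                f"found {found.type} {found.name}"
            )
    return issues


v1 = [Slot("owner", "address"), Slot("totalDeposits", "uint256")]
v2_appended = v1 + [Slot("paused", "bool")]
v2_inserted = [Slot("paused", "bool")] + v1

print(layout_issues(v1, v2_appended))  # appended variable: no issues reported
for issue in layout_issues(v1, v2_inserted):
    print(issue)  # inserted variable shifts both existing positions
```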
Standards are the target for AI Smart Contract Review, not marketing proof. [OWASP SCSVS](https://scs.owasp.org/SCSVS/) and [EEA EthTrust Security Levels](https://entethalliance.org/specs/ethtrust-sl/) help frame what a serious review should cover, while the [SWC Registry](https://swcregistry.io/) must be handled carefully because the registry says it is not actively maintained, incomplete, and may contain errors.

That separation prevents a common shortcut. AI Smart Contract Review should not say "the model found an SWC, therefore this is audited." A better record says which requirement or weakness category is relevant, what code evidence supports it, and what the reviewer still has to verify.

Model output belongs in AI Smart Contract Review, but only with a label. [LLM4Vuln](https://arxiv.org/abs/2401.16185) supports a useful distinction between model reasoning, model knowledge, supplied context, and prompting effects; that distinction is exactly what smart-contract teams need when the model sounds certain.

The practical rule is simple: AI Smart Contract Review can write the first hypothesis. The audit record needs the second layer: source-linked evidence, tool confirmation or contradiction, and a human decision about exploitability and impact.

AI Smart Contract Review is not a verdict; it is a queue. The model can move a code path into the queue, a static analyzer can strengthen or weaken the suspicion, a symbolic executor can test a path, and a standard can name the review obligation.

The audit starts after that queue exists. That is the point of the false-positive/false-negative table: it lets teams use models without pretending the model already did the part that still belongs to security review.
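
The queue idea can be sketched as a small triage function that assigns each item one of the table's statuses. This is an illustrative Python sketch under stated assumptions: the `Outcome` enum, the `triage` function, and its three inputs are hypothetical names, and the "missed by model" status is added here to cover the false-negative case described for symbolic execution. Nothing is labeled confirmed until a human has reproduced it.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    CONFIRMED = "confirmed"
    FALSE_POSITIVE = "false positive"
    MISSED_BY_TOOL = "missed by tool"
    MISSED_BY_MODEL = "missed by model"  # assumption: the model's false negative
    NEEDS_MANUAL_REVIEW = "needs manual threat-model review"


def triage(model_flagged: bool, tool_flagged: bool,
           reproduced: Optional[bool]) -> Outcome:
    """Bucket a queue item; `reproduced` is None until a human has tried."""
    if reproduced is None:
        # No human decision yet: the item stays in the queue.
        return Outcome.NEEDS_MANUAL_REVIEW
    if not reproduced:
        # Claimed by a model or tool, but nobody could reproduce it.
        return Outcome.FALSE_POSITIVE
    if not model_flagged:
        # A real issue the model never raised.
        return Outcome.MISSED_BY_MODEL
    if not tool_flagged:
        # A real issue the model raised but the tool did not.
        return Outcome.MISSED_BY_TOOL
    return Outcome.CONFIRMED


print(triage(model_flagged=True, tool_flagged=True, reproduced=None).value)
print(triage(model_flagged=True, tool_flagged=False, reproduced=False).value)
print(triage(model_flagged=False, tool_flagged=True, reproduced=True).value)
```

The design choice worth noting is the first branch: agreement between the model and a tool does not change the status on its own, which keeps the human decision as the only path to "confirmed."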