Prompt Physics: Building a Cognitive Steering Layer for Gemma 4

Explaind**, a "cognitive steering layer" for Google's Gemma 4 model that functions as a structured prompt-assembly harness. Rather than being a chatbot or agent system, it is designed to explicitly engineer and control *how* Gemma 4 reasons by targeting the model's documented failure modes, such as weak system prompt adherence and overconfidence. The system demonstrates its effectiveness by producing distinct reasoning trajectories (e.g., skeptical, causal, adversarial) and genuine self-critique from the same model and question.

This is a submission for the Gemma 4 Challenge: Build with Gemma 4 explaind is a local-first cognitive steering layer for Gemma 4. It is not a chatbot wrapper, not an agent system, and not a RAG tool. It is a structured prompt-assembly harness that shapes how Gemma 4 reasons — not just what it says. The core thesis: for instruction-tuned models like Gemma 4, prompt structure is part of the system design. The harness, the template, and the injection positions are not presentation — they are the engineering. explaind makes that engineering explicit, inspectable, and testable. The system is built around Gemma 4's documented failure modes rather than pretending they do not exist. Weak system prompt adherence, overconfidence, preference for parametric knowledge over injected context, and stochastic output variance are not problems to work around — they are the design brief. What it does: Every design decision maps to a documented Gemma 4 failure mode: Run explaind --full-demo to see the full walkthrough live. Demo 1 — Same question, three reasoning trajectories "Was the 2008 financial crisis preventable?" Skeptical interrogates the question's framing before engaging its content: Surfaced Assumptions Embedded in the Question's Framing: 1. Assumption of Actionability: The term "preventable" implies that there exists a clear, identifiable intervention point... 2. Assumption of Linear Causality: The framing suggests a simple cause-and-effect relationship... 3. Assumption of Moral Culpability: The question implicitly seeks a judgment on whether actors should have acted differently... Null Hypothesis Test: The null hypothesis which the skeptical analysis must test is that the crisis was inevitable... Causal traces the mechanism backward from outcome to root condition: Chain Trace Working Backward : Proximate Cause <- Failure of Liquidity/Solvency Failure of Liquidity <- Excessive Leverage and Under-Capitalization Excessive Leverage <- Lax Risk Management and Regulatory Arbitrage Lax Risk Management <- Structural Flaws Root Conditions Trigger vs. Root Separation: Root Conditions: deregulation, complex financial instruments, failure of regulatory bodies to enforce adequate capital requirements Triggering Condition: collapse in the U.S. subprime mortgage market Devil constructs the strongest opposing case: The strongest genuine counterargument is that the 2008 crisis was not preventable — an inevitable systemic consequence arising from inherent structural flaws, complexity, and interconnectedness of the global financial architecture. The strongest version of the opposing case: The 2008 crisis was not preventable because it was an emergent property of a highly complex, interconnected, and inadequately regulated system. Three clearly distinct reasoning trajectories. Same model. Same question. Different prompt physics. Demo 2 — Self-critique honest mode "AI will eliminate most jobs within 10 years." Initial response balanced ability acknowledges the claim and preserves uncertainty. Self-critique skeptical audit then interrogates it: Surfaced Assumptions Embedded in the Claim: 1. Linearity of Technological Trajectory: The claim assumes the current pace of AI will continue without inflection points... 2. Negligible Adaptation Rate: It assumes workforce capacity for reskilling will be insufficient... 3. Stable Definition of "Job": The claim implicitly assumes the concept of a "job" remains relatively stable... Evidence Gap Analysis: - No evidence detailing mechanisms by which AI leads to elimination rather than transformation - No longitudinal data on workforce adaptation programs - "Most jobs" is undefined — the claim cannot be empirically tested The self-critique is substantively different from the initial response — not a restatement, a genuine adversarial audit. Demo 3 — Calibrated epistemic reasoning "Is the scientific consensus on climate change settled?" HIGH confidence A broad scientific consensus exists regarding the fundamental physics of the greenhouse effect and the role of anthropogenic emissions in driving current global warming trends. MEDIUM confidence The consensus is robust regarding the existence of human-caused warming, but not entirely "settled" regarding all future projections or the precise magnitude of future impacts. Falsification Conditions: 1. The consensus would be overturned if high-quality independent research definitively demonstrated that primary drivers of warming are not anthropogenic... Unknown Inventory What is NOT known : 1. The precise, non-linear tipping points for climate feedback loops 2. The exact socio-economic consequences of various climate scenarios 3. The precise weighting of uncertainty across scientific disciplines Explicit confidence markers. Named assumptions. Falsification conditions. Unknown inventory. This is calibrated reasoning, not performative hedging. Full source code and README: brendanddev/explaind python3.11 -m venv .venv && source .venv/bin/activate pip install -e . ollama pull gemma4-e2b q4 k m:latest explaind --full-demo full narrative walkthrough explaind --demo three curated live demos --consensus N Model choice: gemma4-e2b q4 k m quantized E2B I chose the E2B variant specifically for the edge deployment story — it runs on 8GB unified memory, which means the entire system works on a MacBook Air with no cloud dependency. The E2B is small enough to iterate with but capable enough to show real reasoning differentiation across abilities. More importantly, E2B's sensitivity to structured prompts is what makes the prompt physics approach work. A less instruction-sensitive model would ignore the BIAS FIELD. A model with perfect instruction following wouldn't need it. The architecture: The assembled prompt follows a strict layer order: SYSTEM PROMPT <- primacy anchor injected here GEMMA.md <- universal invariant layer <- periodic refresh 1 ABILITY <- structured bias vector <- periodic refresh 2 CONTEXT WINDOW <- scratchpad + context injection COGNITIVE SCAFFOLD <- optional, --chain --scaffold only BIAS FIELD <- recency position, strongest signal <user input Every layer has a job. The BIAS FIELD appears in three positions because transformer attention research shows position matters for instruction persistence across long contexts. The primacy anchor sets the initial interpretive frame. The periodic refreshes fight drift in long prompts. The recency field is the final forceful instruction before user input. What Gemma 4 specifically enabled: The <|think| token is Gemma 4-specific. When injected at the correct position — as the first token of <start of turn model — it activates Gemma 4's native thinking mode. This was a key discovery during development: placing it anywhere else in the prompt has no effect. What I discovered empirically: Building against a real running model produced findings that no static analysis would have caught: 1. The implicit thinking channel finding. At temperature=0.0, Gemma 4's analytical abilities route reasoning entirely through implicit <|channel thought blocks even without <|think| . The model reasons correctly but the output layer suppresses it — only the input question echoes back. Explicit <|think| activation makes that reasoning visible. 2. The Ollama double-wrapping issue. Without raw: true in the Ollama API call, hand-assembled chat template markers get wrapped inside Ollama's own template application — producing nested broken turn structure. Every claim about the chat template's effectiveness was contingent on this flag being set correctly. 3. The scaffold drift pattern. The cognitive scaffold works when the model complies with the JSON update instruction. Passes 2-3 of a chain showed turn token contamination </start of turn bleeding into the JSON output and corrupting the parse. Graceful degradation handles this — drift detected is set and output is preserved — but it is a real failure mode. The failure documentation: The README has an explicit "Where explaind Fails" section. Ability steering changes reasoning style more reliably than factual accuracy. Consensus at N 3 is slow on 8GB hardware. The three-position BIAS FIELD improves consistency but cannot guarantee instruction following. These are not marketing-friendly statements. They are part of the project.