# The Real Moat Isn't Software

> Source: <https://dev.to/joshadler/the-real-moat-isnt-software-5eoo>
> Published: 2026-05-30 02:00:39+00:00

Last month I ripped five 64MP cameras out of a wall-mounted sensor network and replaced them with 12MP ones. Downgrade on paper. Best decision I made all year.

Your AI knows what you type. That's it. Every piece of context, every preference, every behavioral pattern your AI has about you came through a text box. You manually told it, during a conversation you chose to have, about a topic you remembered to bring up.

Meanwhile the stuff that actually defines your behavior is invisible to you. You don't notice that you pace when you're anxious. You don't track how long you actually sit at your desk versus how long you think you do. You tell your AI you work out four times a week when you go twice.

The models are smart enough. The input layer is broken.

Five nodes. Each one is a Raspberry Pi Zero 2W ($15), an ArduCam IMX708 12MP camera with 120-degree FOV, and a WM8960 audio HAT for microphone capture. About $100 per node, $500 total.

Each node runs a custom Python daemon.

Inference runs on an RTX 5090 on my local network. The whole thing fits on a desk. Well, kind of, minus the cameras on the walls.
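Because inference happens on a separate machine, each node needs some way to decide which frames are worth sending over the network. Below is a minimal sketch of one such filter. It assumes simple frame differencing on downscaled grayscale frames, represented as flat lists of 0-255 values. `MotionGate`, `changed_fraction`, and both thresholds are hypothetical and are not part of the author's daemon.

```python
"""Hypothetical node-side motion gate using simple frame differencing.

Frames are modeled as flat lists of 8-bit grayscale values (e.g. a downscaled
preview stream). The thresholds are illustrative assumptions, not values from
the original build.
"""
from typing import Optional, Sequence


def changed_fraction(prev: Sequence[int], curr: Sequence[int], pixel_delta: int = 25) -> float:
    """Fraction of pixels whose absolute brightness change exceeds pixel_delta."""
    if len(prev) != len(curr) or not curr:
        raise ValueError("frames must be non-empty and the same size")
    changed = sum(1 for a, b in zip(prev, curr) if abs(a - b) > pixel_delta)
    return changed / len(curr)


class MotionGate:
    """Decides whether a frame is worth shipping to the inference host."""

    def __init__(self, min_changed: float = 0.02, pixel_delta: int = 25) -> None:
        self.min_changed = min_changed  # fraction of pixels that must change
        self.pixel_delta = pixel_delta  # per-pixel brightness change that counts
        self._prev: Optional[Sequence[int]] = None

    def should_send(self, frame: Sequence[int]) -> bool:
        # The first frame only primes the reference; there is nothing to compare yet.
        if self._prev is None:
            self._prev = frame
            return False
        moved = changed_fraction(self._prev, frame, self.pixel_delta) >= self.min_changed
        self._prev = frame  # compare each frame against the one before it
        return moved


if __name__ == "__main__":
    gate = MotionGate()
    still = [100] * 100
    moving = [100] * 90 + [200] * 10  # 10% of pixels jump in brightness
    print(gate.should_send(still))   # False (primes the reference frame)
    print(gate.should_send(still))   # False (no change)
    print(gate.should_send(moving))  # True
```

Comparing each frame only to the previous one is the cheapest option and suits a Pi Zero 2W. A slowly updated background average is a common alternative that catches slower movement.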

I started with OwlSight 64MP sensors using the `ov64a40` driver. On paper, incredible. In practice, a nightmare.

The Pi Zero 2W would thermal-throttle within twenty minutes. I'm talking 80°C+ temps on a board that draws 4W under camera load. The dtoverlay configuration needed a specific `link-frequency` parameter (`link-frequency=360000000`) that I spent entire nights debugging. One node would initialize fine; an identical SD card image on the next node would fail. The answer was always something dumb: a loose ribbon cable, a kernel version mismatch, a PSU that couldn't sustain the current draw.
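Thermal throttling and a weak PSU both show up in the Pi firmware's throttle flags, which can shorten some of those debugging nights. Below is a minimal sketch, assuming the standard `vcgencmd` tool is available on the node. The bit meanings follow the Raspberry Pi documentation for `get_throttled`. The function names and the 80°C warning threshold are illustrative choices, not part of the original build.

```python
"""Decode `vcgencmd get_throttled` / `measure_temp` output on a Raspberry Pi.

Bit meanings follow the Raspberry Pi documentation for get_throttled.
The 80 C warning threshold is an illustrative choice, not from the post.
"""
import re
import subprocess

THROTTLE_BITS = {
    0: "under-voltage detected",
    1: "arm frequency capped",
    2: "currently throttled",
    3: "soft temperature limit active",
    16: "under-voltage has occurred",
    17: "arm frequency capping has occurred",
    18: "throttling has occurred",
    19: "soft temperature limit has occurred",
}


def decode_throttled(output: str) -> list[str]:
    """Turn e.g. 'throttled=0x50005' into a list of human-readable flags."""
    value = int(output.strip().split("=", 1)[1], 16)  # int() accepts the 0x prefix
    return [name for bit, name in sorted(THROTTLE_BITS.items()) if value & (1 << bit)]


def parse_temp(output: str) -> float:
    """Turn e.g. "temp=81.1'C" into 81.1."""
    match = re.search(r"temp=([\d.]+)", output)
    if match is None:
        raise ValueError(f"unexpected measure_temp output: {output!r}")
    return float(match.group(1))


def _vcgencmd(*args: str) -> str:
    """Run vcgencmd and return its stdout (only works on a Raspberry Pi)."""
    return subprocess.run(["vcgencmd", *args], capture_output=True, text=True, check=True).stdout


def check_node(warn_at_c: float = 80.0) -> list[str]:
    """Collect throttle flags plus a warning if the SoC is running hot."""
    warnings = decode_throttled(_vcgencmd("get_throttled"))
    temp = parse_temp(_vcgencmd("measure_temp"))
    if temp >= warn_at_c:
        warnings.append(f"SoC at {temp:.1f} C")
    return warnings
```

For example, `throttled=0x50005` decodes to under-voltage and throttling right now (bits 0 and 2), plus the sticky bits recording that both have happened since boot (bits 16 and 18). That pattern points at the power supply as much as at heat.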

I eventually switched everything to the IMX708 with a simple `dtoverlay=imx708` config. Less flashy specs, dramatically more stable. The boring choice was the right choice.

If you're building hardware: optimize for "does it actually work at 3am when nobody's watching," not for the spec sheet.

Within the first week, the system captured patterns I never would have typed into a chat window. Movement patterns through my apartment, actual sleep schedule versus what I'd report, real desk time versus perceived desk time. One hour of physical observation generates more behavioral data than a year of chat transcripts.
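Comparing real and perceived desk time comes down to turning a stream of detections into sessions. Below is a minimal sketch, assuming the vision model emits a timestamp whenever it sees someone in a "desk" zone. `sessions`, `total_hours`, and the five-minute gap tolerance are hypothetical.

```python
"""Turn raw 'person seen at desk' timestamps into desk sessions.

Assumptions (hypothetical, not from the post): detections arrive as
timestamps in seconds for one zone, and a gap longer than max_gap_s
ends a session.
"""
from typing import Iterable


def sessions(timestamps: Iterable[float], max_gap_s: float = 300.0) -> list[tuple[float, float]]:
    """Merge detections closer than max_gap_s into (start, end) sessions."""
    result: list[tuple[float, float]] = []
    for t in sorted(timestamps):
        if result and t - result[-1][1] <= max_gap_s:
            result[-1] = (result[-1][0], t)  # extend the current session
        else:
            result.append((t, t))  # start a new session
    return result


def total_hours(spans: list[tuple[float, float]]) -> float:
    """Sum session lengths and convert seconds to hours."""
    return sum(end - start for start, end in spans) / 3600.0


if __name__ == "__main__":
    # Seen every minute 09:00-10:00, then 10:30-11:00 (seconds since midnight).
    seen = [9 * 3600 + 60 * m for m in range(61)] + [10.5 * 3600 + 60 * m for m in range(31)]
    spans = sessions(seen)
    print(len(spans), total_hours(spans))  # 2 1.5
```

The gap tolerance absorbs missed detections, such as someone leaning out of frame, and it is the main knob. Too small, and one sitting fragments into many sessions. Too large, and a coffee break gets counted as desk time.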

That's not an exaggeration. That's the gap.

Here's the framework that I keep coming back to:

**Layer 1: Observation.** Getting data from the physical world into a format AI can process. Cameras, microphones, sensors, wearables. This is what Paradox does.

**Layer 2: Memory.** Taking raw observational data plus conversational data and encoding it intelligently. Deciding what matters, letting stale information decay, surfacing the right context at the right time. This is what I built [TrueMemory](https://truememory.net) to solve. The architecture is in my [arXiv paper](https://arxiv.org/abs/2605.04897), and it's based on how biological memory actually works: encoding gates, salience scoring, temporal decay.
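To make the three memory ideas concrete, here is a toy store. It gates what gets encoded, scores each item by salience, and decays that score exponentially with age. This is not TrueMemory's architecture; the linked paper describes that. `MemoryStore`, the one-week half-life, the gate threshold, and the salience values are all hypothetical.

```python
"""Toy memory store: an encoding gate, salience scores, and exponential decay.

NOT TrueMemory's architecture; a generic illustration of the three ideas.
The half-life, gate threshold, and salience values are hypothetical.
"""
import math
from dataclasses import dataclass, field


@dataclass
class Memory:
    text: str
    salience: float    # 0..1, how important the item looked when encoded
    created_at: float  # seconds


@dataclass
class MemoryStore:
    gate: float = 0.3                # encoding gate: items below this are never stored
    half_life_s: float = 7 * 86400   # decayed strength halves every week
    items: list[Memory] = field(default_factory=list)

    def encode(self, text: str, salience: float, now: float) -> bool:
        """Store the item only if it passes the encoding gate."""
        if salience < self.gate:
            return False
        self.items.append(Memory(text, salience, now))
        return True

    def strength(self, m: Memory, now: float) -> float:
        # Exponential temporal decay: salience * 0.5 ** (age / half_life).
        age = max(0.0, now - m.created_at)
        return m.salience * math.pow(0.5, age / self.half_life_s)

    def recall(self, now: float, k: int = 3) -> list[str]:
        """Return the k items with the highest decayed strength."""
        ranked = sorted(self.items, key=lambda m: self.strength(m, now), reverse=True)
        return [m.text for m in ranked[:k]]

    def forget(self, now: float, floor: float = 0.05) -> None:
        # Let stale items fall out once their decayed strength is negligible.
        self.items = [m for m in self.items if self.strength(m, now) >= floor]


if __name__ == "__main__":
    day = 86400
    store = MemoryStore()
    store.encode("ate lunch", 0.1, now=0)                   # rejected by the gate
    store.encode("paced before the meeting", 0.6, now=0)
    store.encode("went to the gym", 0.5, now=6 * day)
    print(store.recall(now=7 * day, k=2))  # ['went to the gym', 'paced before the meeting']
```

With a one-week half-life, an item encoded a week ago keeps half its salience. That lets a fresher but less salient memory outrank it, which is "letting stale information decay" in its simplest form.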

**Layer 3: Reasoning.** The LLM. Claude, GPT, whatever comes next.

Right now, billions of dollars are flowing into Layer 3. Anthropic, OpenAI, Google, all building better reasoning engines. And they're getting incredible. But Layer 3 is reasoning on top of almost nothing because Layers 1 and 2 barely exist.

It's like building the most powerful engine in the world and putting it in a car with no windows.

Software wrappers get replicated in a weekend. A better RAG pipeline, a smarter reranking algorithm, a novel encoding gate: those are all real innovations, but they're also all just code. Somebody reads your paper, understands the approach, ships their own version.

Hardware can't be replicated like that. The physical deployment, the sensor calibration, the months of debugging driver conflicts, thermal issues, and network topology: that's a different kind of moat entirely.

If you're looking for an interesting project, here it is: the observation layer is the missing piece in AI. Everyone is building smarter reasoning on top of the same garbage input. Nobody is fixing the input.

The Pi Zero 2W draws about 1.5W idle but spikes to nearly 4W under camera load. Battery operation is not realistic. These need to be plugged in.
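The arithmetic behind "battery operation is not realistic" is short. The 1.5W and ~4W figures come from the build above. The 10,000 mAh pack, 3.7V nominal cell voltage, and 85% conversion efficiency are hypothetical example values.

```python
"""Back-of-the-envelope battery runtime for one node.

The 1.5 W idle / ~4 W camera-load figures come from the post. The battery
size and conversion efficiency are hypothetical example values.
"""


def runtime_hours(battery_wh: float, load_w: float, efficiency: float = 0.85) -> float:
    """Hours of runtime from usable energy after boost-converter losses."""
    return battery_wh * efficiency / load_w


if __name__ == "__main__":
    pack_wh = 10.0 * 3.7  # 10,000 mAh (10 Ah) at 3.7 V nominal = 37 Wh
    print(f"idle:        {runtime_hours(pack_wh, 1.5):.1f} h")  # idle:        21.0 h
    print(f"camera load: {runtime_hours(pack_wh, 4.0):.1f} h")  # camera load: 7.9 h
```

Even a fairly large power bank gives under eight hours of camera load per node. Running continuously, each node would need recharging more than three times a day.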

Five cameras at 15fps generate a lot of data. Even with motion-triggered recording, my NAS fills up faster than I'd like. I spent a week building a cleanup pipeline just to keep storage from overflowing.
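A cleanup pipeline can start as simply as enforcing a size quota on the recordings directory. Below is a minimal sketch, assuming recordings are plain files under one directory on the NAS and that the oldest clips are the most expendable. `enforce_quota` and the quota value are hypothetical; this is not the author's pipeline.

```python
"""Keep a recordings directory under a size quota by deleting the oldest files.

Hypothetical sketch: the directory layout, quota, and oldest-first policy
are assumptions, not details of the author's pipeline.
"""
import os
import tempfile
from pathlib import Path


def enforce_quota(root: Path, max_bytes: int) -> list[Path]:
    """Delete the oldest files (by mtime) under root until total size <= max_bytes."""
    files = [p for p in root.rglob("*") if p.is_file()]
    files.sort(key=lambda p: p.stat().st_mtime)  # oldest first
    total = sum(p.stat().st_size for p in files)
    deleted: list[Path] = []
    for p in files:
        if total <= max_bytes:
            break
        size = p.stat().st_size
        p.unlink()
        total -= size
        deleted.append(p)
    return deleted


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for i in range(5):
            clip = root / f"clip{i}.h264"
            clip.write_bytes(b"\0" * 1000)
            os.utime(clip, (i, i))  # clip0 is oldest, clip4 newest
        removed = enforce_quota(root, max_bytes=2500)
        print(sorted(p.name for p in removed))  # ['clip0.h264', 'clip1.h264', 'clip2.h264']
```

Deleting strictly by age is the bluntest policy. A natural refinement is to skip clips that downstream processing has already flagged as worth keeping.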

And there's the social cost. My girlfriend didn't talk to me for two days after I installed the cameras. We worked it out: there are zones now, rooms where the cameras don't run. But social acceptability is a constraint as hard as any engineering limitation. You can't debug your way out of it.

Nobody is going to win the AI race by building a better chat interface. The chat interface is a temporary artifact of the fact that we haven't figured out how to get AI into the room with you.

I don't have this figured out. I have five cameras generating data I'm still learning to process, a NAS that fills up too fast, and a lot of 2am debugging sessions behind me. But I know the moat isn't who builds the cleverest wrapper. It's who gets AI into the physical world first.

That's a hardware problem. And it's a lot harder than fine-tuning a prompt template.

*Josh Adler is a researcher at TrueMemory, a Sauron company. Research: arXiv:2605.04897. More at joshadler.com.*
