# Harness use research, is codex better?

> Source: <https://research.tamarillo.ai/coding-harness-inspection/>
> Published: 2026-05-29 20:39:42+00:00

Tamarillo — AI Coding Harness Adoption
`tamarillo`: coding harness inspection

In the past 2 years, coding harnesses (and even the term itself) have become ubiquitous.
At Tamarillo, one goal is to systematize the use of coding harnesses (that is why `theta-spec` and `theta` were created).
~400K public GitHub repositories containing configuration files for AI coding
assistants (harnesses) were fetched. That was the count at the time of collection,
after an exhaustive search of public GitHub repos[[†]](#fn-forks).
The process to get the data is pretty straightforward:
- `filter criteria`: `PATTERNS` per harness were defined (explained in detail in [appendix A](#appendix-a-search-patterns-per-harness))
- `repo search`: Code searches on GitHub's REST API filtered against harness configuration files
- `enriching stage`: GitHub's GraphQL API was used to enrich files with commit count, file bytes, creation date, etc.
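As a rough illustration of the `filter criteria` and `repo search` steps, the sketch below builds one code-search URL per pattern. The `PATTERNS` entries and the `build_search_urls` helper are hypothetical placeholders (the real patterns are the ones listed in appendix A); only the `https://api.github.com/search/code` endpoint and its `filename:` qualifier come from GitHub's REST API. No request is sent here, since code search requires an authenticated client.

```python
from urllib.parse import urlencode

# Hypothetical, illustrative patterns; the actual PATTERNS are in appendix A.
PATTERNS = {
    "cursor": [".cursorrules"],
    "claude-code": ["CLAUDE.md"],
    "codex": ["AGENTS.md"],
}

SEARCH_URL = "https://api.github.com/search/code"


def build_search_urls(patterns, per_page=100):
    """Return (harness, url) pairs, one per configuration filename."""
    urls = []
    for harness, filenames in patterns.items():
        for name in filenames:
            # `filename:` restricts the code search to files with this name.
            query = urlencode({"q": f"filename:{name}", "per_page": per_page})
            urls.append((harness, f"{SEARCH_URL}?{query}"))
    return urls


for harness, url in build_search_urls(PATTERNS):
    print(harness, url)
```

Keeping the patterns in a plain dictionary makes it easy to add a harness without touching the query logic.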

This document covers market share and adoption dynamics, configuration surface
anatomy (what files exist, how big they are, how often they are touched), multi-harness co-occurrence,
repo demographics by stars/language/owner type, and other odds and ends. It was created with the intent of
being a DIY thermometer for a **slice** of a domain. Although some suspicions were confirmed (some of them arguably obvious),
the reader is strongly encouraged to read the [limitations and methodology](#methodology--limitations) section of this document.
Only public repositories were fetched. The dataset reflects configuration intentions
(i.e. a repo that has a `.cursorrules` file signals that someone set it up, not necessarily that Cursor is
used daily). This is a lower bound on harness adoption.
[†] The number of repositories fetched includes a negligible number of forks of already captured repos (<0.1%). These were excluded from the analysis.
# Harness adoption: market share

At the time of writing, the C4H (Claude Code, Codex, Copilot, Cursor, Hermes) account for ~80% of the public repositories that expose harness configurations. This is the Pareto principle in action, an expected outcome given that power laws [are usually at play in software](https://www.spinellis.gr/pubs/jrnl/2008-TOSEM-PowerLaws/html/LSV08.html) and software tools. Of ~20 harnesses, 5 dominate.
## C4H domination

## Rolling share — new harness adoptions

Over the past months and years, the landscape has changed. In the beginning, market share was largely dominated by Cursor.
Unsurprisingly, the data confirms this, along with the rise of Claude Code and, shortly after, Codex and Hermes.
Below, two charts measure the following signals: recent adoption dynamics, and the current state of market share[[†]](#fn-mortals).
- Rolling market share chart, showing the signal of recent adoption. That is, popularity calculated as the share of new repositories created in an `x`-day window, filtered by the date each harness config format became publicly available.
- Evolution of market share in absolute terms over the past ~2 years.
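The rolling share in the first bullet can be sketched as follows. This is a minimal illustration, not the analysis code: the `rolling_share` function, the toy repositories, and the release dates are all made up, and it assumes that "filtered by release date" means dropping repos created before their harness's config format was public.

```python
from collections import Counter
from datetime import date, timedelta


def rolling_share(repos, release_dates, end, window_days):
    """Share of new repos per harness in the window (end - window_days, end].

    repos: iterable of (created: date, harness: str)
    release_dates: dict mapping harness -> date its config format went public
    """
    start = end - timedelta(days=window_days)
    counts = Counter(
        harness
        for created, harness in repos
        # Keep repos inside the window and created on/after the release date.
        if start < created <= end and created >= release_dates[harness]
    )
    total = sum(counts.values())
    return {h: n / total for h, n in counts.items()} if total else {}


# Toy data only; these are not real repositories or real release dates.
repos = [
    (date(2025, 1, 10), "cursor"),
    (date(2025, 1, 12), "cursor"),
    (date(2025, 1, 12), "claude-code"),  # predates the toy release date below
    (date(2025, 2, 20), "claude-code"),
    (date(2025, 2, 25), "cursor"),
    (date(2025, 2, 27), "claude-code"),
]
release_dates = {"cursor": date(2024, 1, 1), "claude-code": date(2025, 2, 1)}

print(rolling_share(repos, release_dates, end=date(2025, 2, 28), window_days=14))
```

Sliding `end` forward one day at a time yields the rolling series; a shorter window reacts faster but is noisier.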

The dates used as filters correspond to harness-specific events: the release date of the configuration files deemed mandatory for a coding harness and/or the product launch date. Details and sources for each date can be found in [appendix A](#appendix-a-search-patterns-per-harness).
[†] This analysis is constrained to what GitHub's public APIs expose. Private repos, enterprise installations, and home-directory config never reach the index, so the curves here are ballpark estimates that can't be confirmed at the population level.
As stated above, the evolution of market share in absolute terms over the past ~2 years is filtered
by release date and excludes retroactively configured repos. The results displayed below are
the cumulative share for each harness in the selected group. The cumulative adoption over time, filtered by harness release date, shows that:
- All harnesses show exponential growth.
- Harnesses experience rate discontinuities, e.g. Cursor ~2025-07 and Copilot ~2025-11 (a subtle change in the growth rate, but visible by eye; the cause is open to speculation).
- Even though harnesses like Claude Code or Codex were released later, they have already caught up in terms of repo footprint[[†]](#fn-repo-footprint).

## Codex & Claude Code

There is colossal competition between frontier labs in many domains, including harnesses such as Codex and Claude Code. The market share ratio
[was discussed in this post](https://www.wired.com/story/openai-codex-race-claude-code/),
which puts Codex at **~**5% of Claude Code's usage in Sep 2025 and **~**40% by Jan 2026
(emphasis on the approximation operator ~).
The public-repo measurement here sits above the ~5% figure and quite close to the ~40% figure. Two effects MAY compound to produce the gap:
- WIRED reports "5 percent as much **use**" (Sep 2025) and "40 percent of Claude Code's **user base**" (Jan 2026), attributed to "people with direct knowledge of the matter": anonymous internal sources, no methodology disclosed. This chart, by contrast, counts public repos with a config file committed.
- Configs MAY be committed weeks or months after the user actually adopted the tool.

The proxy is more credible for cumulative adoption than for short-run growth. A 14-day rolling window is noisy and there is no public data or ballpark estimate to contrast that rate-of-change series against.
## Multi-harness adoption

If each repo independently decides to adopt one more harness with
constant probability $p$, the count at $k$ follows a geometric
distribution: $N(k) \propto p^k$.
It MAY be a bit of a stretch, but the temptation to fit a line
on a $\log(y)$ vs $x$ chart and interpret what that constant probability means is hard to resist. $x$
indexes the number of harness configurations co-occurring in a repo, so $p$
is the probability of adopting harness $i+1$ after already having $i$. Note that it
does not matter which harness is adopted (e.g. maybe it is always Claude Code because
it is popular); what matters is that there is a new one. Under this model, the decision to add harness
$k+1$ does not depend on how many are already configured. The regression was done for harness counts
$k \in [1, 5]$ because above that the counts drop to single digits.
$\hat{p}$ is obtained via OLS on $\log N(k) = \log C + k \log p$. The 95% CI uses the
[textbook normal-theory slope CI](https://en.wikipedia.org/wiki/Simple_linear_regression#Normality_assumption)
$\hat{\beta} \pm 1.96 \cdot \mathrm{SE}(\hat{\beta})$ and is then pushed through $\hat{p} = e^{\hat{\beta}}$.
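The OLS fit and interval described above can be sketched in plain Python. The `fit_geometric` helper and the counts are illustrative assumptions (synthetic values that follow $N(k) = 10^5 \cdot 0.1^k$ exactly, not the measured data); the 1.96 multiplier is the one stated in the text.

```python
import math


def fit_geometric(counts, z=1.96):
    """OLS of log N(k) on k. Returns (p_hat, (ci_low, ci_high)).

    counts: dict mapping k -> N(k), with at least 3 distinct k values.
    """
    ks = sorted(counts)
    ys = [math.log(counts[k]) for k in ks]
    n = len(ks)
    k_mean = sum(ks) / n
    y_mean = sum(ys) / n
    sxx = sum((k - k_mean) ** 2 for k in ks)
    sxy = sum((k - k_mean) * (y - y_mean) for k, y in zip(ks, ys))
    beta = sxy / sxx                 # slope, i.e. log p
    alpha = y_mean - beta * k_mean   # intercept, i.e. log C
    rss = sum((y - alpha - beta * k) ** 2 for k, y in zip(ks, ys))
    se = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    # The interval is built on the log scale and then exponentiated.
    return math.exp(beta), (math.exp(beta - z * se), math.exp(beta + z * se))


# Synthetic counts lying exactly on a geometric curve with p = 0.1.
counts = {1: 10000, 2: 1000, 3: 100, 4: 10, 5: 1}
p_hat, ci = fit_geometric(counts)
print(round(p_hat, 6))  # 0.1
```

Because the synthetic counts sit exactly on the line, the interval collapses around the estimate; real counts would produce a wider one.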
The regression on $\log N$ is
heteroscedastic (small counts are noisier), so the reported CI is slightly optimistic. A
[Poisson GLM](https://en.wikipedia.org/wiki/Generalized_linear_model#Count_data)
fitting $N(k) \sim \mathrm{Poisson}(\mu_k)$, $\log \mu_k = \log C + k \log p$, would be the more rigorous approach.[[†]](#fn-2-repos)
[†] If only 2 harnesses are selected, the linear regression will trivially be perfect, since two points always define a line.
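For comparison, the Poisson GLM mentioned above can be fitted with a few lines of iteratively reweighted least squares. This is a sketch under stated assumptions, not the analysis code: `fit_poisson_geometric` is a hypothetical helper, the counts are the same kind of synthetic values (not the measured data), and the interval is a Wald interval from the Fisher information. In practice a statistics library would normally be used instead of a hand-rolled solver.

```python
import math


def fit_poisson_geometric(counts, z_crit=1.96, iters=50, tol=1e-10):
    """Fit N(k) ~ Poisson(mu_k), log mu_k = a + b*k, by IRLS.

    Returns (p_hat, (ci_low, ci_high)) where p_hat = exp(b).
    """
    ks = sorted(counts)
    ys = [counts[k] for k in ks]
    etas = [math.log(y + 0.5) for y in ys]  # start near the data, avoids log(0)
    a = b = 0.0
    for _ in range(iters):
        mus = [math.exp(e) for e in etas]
        # Working response for the log link; the IRLS weights are the mus.
        zs = [e + (y - m) / m for e, y, m in zip(etas, ys, mus)]
        sw = sum(mus)
        swk = sum(m * k for m, k in zip(mus, ks))
        swkk = sum(m * k * k for m, k in zip(mus, ks))
        swz = sum(m * zv for m, zv in zip(mus, zs))
        swkz = sum(m * k * zv for m, k, zv in zip(mus, ks, zs))
        det = sw * swkk - swk ** 2
        new_a = (swkk * swz - swk * swkz) / det  # weighted least squares
        new_b = (sw * swkz - swk * swz) / det
        done = abs(new_a - a) + abs(new_b - b) < tol
        a, b = new_a, new_b
        etas = [a + b * k for k in ks]
        if done:
            break
    # Slope variance from the inverse Fisher information at the fitted mus.
    mus = [math.exp(e) for e in etas]
    sw = sum(mus)
    swk = sum(m * k for m, k in zip(mus, ks))
    swkk = sum(m * k * k for m, k in zip(mus, ks))
    se = math.sqrt(sw / (sw * swkk - swk ** 2))
    return math.exp(b), (math.exp(b - z_crit * se), math.exp(b + z_crit * se))


# Synthetic counts lying exactly on a geometric curve with p = 0.1.
p_hat, (lo, hi) = fit_poisson_geometric({1: 10000, 2: 1000, 3: 100, 4: 10, 5: 1})
print(round(p_hat, 6), lo < p_hat < hi)
```

Unlike the OLS interval, this one does not collapse on exact data, because the Poisson model attributes sampling noise to every count and weights the large counts more heavily.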
