
Synaxi-predict: I'm trying to predict token cost before it happens

Synaxi-predict, a new tool from Synaxi, predicts the token cost, turn count, and pass rate of a Claude Code task before execution, enabling users to select the optimal model and avoid wasted tokens. The tool uses an MLP trained on ~53k agent runs and integrates with Claude Code to automatically record actual results for continuous improvement. It is part of the Synaxi ecosystem, which also includes a macOS app that reduces Claude API costs by stripping token waste.

Published Jun 16, 2026 (6 min read)

Predicts the cost, turn count, and pass rate of a Claude Code task before it runs, so you can pick the right model without wasting tokens on a bad fit. Closes the loop by capturing actual results and feeding them back into the model.

Part of the Synaxi ecosystem.

Synaxi is a macOS app that cuts Claude API costs by stripping token waste from every request before it leaves your machine: deduplicating tool schemas, pruning stale conversation history, compressing verbose JSON, and more. Average reduction: 40%+ per request, with no code changes and under 1ms added latency. Free for personal use.

synaxi-predict tackles the complementary problem: picking the right model before the task runs. Together they cover both sides of Claude cost control: less waste per token, and fewer tokens on the wrong model.

/synaxi-predict Fix the failing migration
        │
        ▼
  Model              Est. cost   Turns   Pass
  ─────────────────────────────────────────────
  single-haiku       $    0.35    28.1    8%  ◀ recommended
  single-sonnet      $    0.62    18.4   11%
        │
        ▼  [you pick a model]
        │
        ▼
  Subagent runs the task with the chosen model
        │
        ├─ bin/parse-session reads the subagent's session JSONL
        │   → exact turns, token counts, real cost (not estimated)
        │
        ├─ Eval agent checks git diff + test output → passed: true/false
        │
        └─ bin/record-actual logs prediction vs. actuals
           → feeds back into next training run

Predictions use an MLP trained on ~53k agent runs (SWE-bench, SWE-smith, OpenHands, loong0814, real Claude Code runs). Input features: TF-IDF on task text + tree-sitter code complexity features from the current repo (see Features).
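To make the text-feature step concrete, here is a minimal pure-Python sketch of L2-normalised TF-IDF over the model name prepended to the task text. The tokenizer, the smoothed IDF formula, the toy corpus, and the function names (`tokenize`, `fit_idf`, `tfidf_vector`) are all assumptions for illustration; the project's actual vectoriser and vocabulary are not shown in this article.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (assumed; a real pipeline likely does more)."""
    return text.lower().split()


def fit_idf(documents):
    """Compute a smoothed inverse document frequency for every term in the corpus."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(tokenize(doc)))
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1, so no known term gets zero weight.
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def tfidf_vector(model_name, task_text, idf):
    """Build a sparse, L2-normalised TF-IDF vector for model name + task text."""
    counts = Counter(tokenize(f"{model_name} {task_text}"))
    # Terms never seen when fitting the IDF table are dropped.
    weights = {t: c * idf[t] for t, c in counts.items() if t in idf}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    if norm == 0.0:
        return {}
    return {t: w / norm for t, w in weights.items()}


# Hypothetical toy corpus; the real one would be the training runs.
corpus = [
    "single-haiku fix the failing migration",
    "single-sonnet fix the failing migration",
    "single-haiku add oauth login",
]
idf = fit_idf(corpus)
vec = tfidf_vector("single-haiku", "Fix the failing login migration", idf)
for term, weight in sorted(vec.items(), key=lambda kv: -kv[1]):
    print(f"{term:15s} {weight:.3f}")
```

Because the model name is part of the text, the same task yields a different vector per candidate model, which is one way a single network could produce a separate row for each model in the table.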

Inside any Claude Code session:

/plugin marketplace add BeadW/synaxi-predict
/plugin install synaxi-predict

On the next session start, Claude Code automatically:

  • Installs the Python package (pip install -e)
  • Downloads the model artifact (~190MB) from GitHub Releases into your platform data directory (~/Library/Application Support/synaxi-predict/ on macOS, ~/.local/share/synaxi-predict/ on Linux)

Updates happen the same way: bump version in .claude-plugin/plugin.json, release, and the hook re-runs on next session.
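For reference, the platform data directory used for the model artifact could be resolved along these lines. This is a sketch, not the plugin's actual code: the function name `data_dir` is hypothetical, and treating every non-macOS platform like Linux is an assumption, since only macOS and Linux locations are documented.

```python
import sys
from pathlib import Path

APP_NAME = "synaxi-predict"


def data_dir(platform=None, home=None):
    """Return the documented per-platform data directory for the model artifact."""
    platform = platform or sys.platform
    home = Path(home) if home is not None else Path.home()
    if platform == "darwin":
        # macOS: ~/Library/Application Support/synaxi-predict/
        return home / "Library" / "Application Support" / APP_NAME
    # Linux (and, by assumption, anything else): ~/.local/share/synaxi-predict/
    return home / ".local" / "share" / APP_NAME


print(data_dir())
```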

Copy .env.example to .env and add your ANTHROPIC_API_KEY if you plan to run benchmarks.

git clone https://github.com/BeadW/synaxi-predict ~/synaxi-predict
cd ~/synaxi-predict
git lfs pull        # download trained model (~190MB)
pip install -e .

In any Claude Code session, type:

/synaxi-predict Fix the failing login migration

Claude runs the predictor, shows the table, and asks which model you want. After you pick, it dispatches a subagent with that model, then automatically records the actual cost and turns against the prediction.

Once installed, Claude invokes this skill automatically whenever it decides to spawn a subagent; no explicit command is needed. The prediction table is computed at skill load time via dynamic injection (tree-sitter code features included), so there's no extra tool call overhead.

bin/predict "Add OAuth login" --repo-path /path/to/project

bin/predict "Add OAuth login" --models single --repo-path .

bin/predict --list-models

bin/predict --version

bin/parse-session <agentId> /path/to/project

bin/record-actual <pred_id> --turns 18 --cost 0.42 --passed true

Each prediction combines three input groups:

1. Text features: TF-IDF (L2-normalised) over the model name prepended to the task description. Captures task type, verb, domain keywords.

2. Tree-sitter code features: extracted from the Python files in your repo at prediction time. Requires tree-sitter and tree-sitter-python (included in core dependencies). If unavailable, the model falls back to text features only.

  Feature             Description
  ─────────────────────────────────────────────────────────────────────────
  loc                 Total lines of code across changed/all .py files
  functions           Count of def statements
  classes             Count of class statements
  branches            if + for + while statements
  try_blocks          try/except blocks
  n_files             Number of .py files analysed
  avg_loc             loc / n_files
  branch_density      branches / max(loc, 1)
  has_code_features   1 if extraction succeeded, 0 if it fell back to zeros

3. Model context: per-model average prompt tokens from training data (proxy for context window pressure).
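The code-feature table can be illustrated with a small sketch. Note the swap: this uses Python's standard-library `ast` module as a stand-in for tree-sitter so that it runs without extra dependencies, which means counts may differ slightly from what the real extractor produces. The function name `code_features`, counting `async def` as a function, and guarding `avg_loc` against zero files are assumptions.

```python
import ast

COUNT_KEYS = ("loc", "functions", "classes", "branches", "try_blocks", "n_files")


def code_features(sources):
    """Compute the feature table for a list of Python source strings (one per file)."""
    counts = dict.fromkeys(COUNT_KEYS, 0)
    ok = 1
    try:
        for src in sources:
            tree = ast.parse(src)
            counts["n_files"] += 1
            counts["loc"] += len(src.splitlines())
            for node in ast.walk(tree):
                if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    counts["functions"] += 1
                elif isinstance(node, ast.ClassDef):
                    counts["classes"] += 1
                elif isinstance(node, (ast.If, ast.For, ast.While)):
                    counts["branches"] += 1
                elif isinstance(node, ast.Try):
                    counts["try_blocks"] += 1
    except SyntaxError:
        # Mirror the documented fallback: all zeros, has_code_features = 0.
        counts = dict.fromkeys(COUNT_KEYS, 0)
        ok = 0
    return {
        **counts,
        "avg_loc": counts["loc"] / max(counts["n_files"], 1),  # zero guard is assumed
        "branch_density": counts["branches"] / max(counts["loc"], 1),
        "has_code_features": ok,
    }


sample = '''
class Repo:
    def migrate(self, steps):
        for step in steps:
            if step.pending:
                try:
                    step.run()
                except RuntimeError:
                    return False
        return True
'''
print(code_features([sample]))
```

Keeping `has_code_features` as an explicit flag lets a model distinguish "tiny repo" from "extraction failed", which all-zero features alone could not do.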

Every completed task produces a ground-truth record in data/actuals_live.jsonl, including the tree-sitter snapshot taken at prediction time:

{
  "prediction_id": "c7df172d",
  "model": "single-haiku",
  "pred_cost": 0.504,  "actual_cost": 0.243,
  "pred_turns": 86.8,  "actual_turns": 8,
  "passed": true,
  "code_features": {
    "loc": 9042, "functions": 460, "classes": 94,
    "branches": 623, "try_blocks": 51, "n_files": 30,
    "avg_loc": 301.4, "branch_density": 0.069, "has_code_features": 1
  }
}

The code_features field is what makes contributed actuals valuable for retraining: it lets the model learn from real Claude Code runs on real codebases, not just SWE-bench benchmarks.
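As a rough illustration of how such records can be inspected, the sketch below reads JSONL text and reports how far each prediction was from the actuals. It assumes one JSON object per line on disk (the example above is pretty-printed for readability) and uses only the field names shown in that example; the helper name `prediction_ratios` is hypothetical.

```python
import json


def prediction_ratios(jsonl_text):
    """Return predicted/actual ratios for cost and turns, one dict per record."""
    rows = []
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue
        rec = json.loads(line)
        # Skip records where a ratio would be undefined.
        if rec["actual_cost"] <= 0 or rec["actual_turns"] <= 0:
            continue
        rows.append({
            "prediction_id": rec["prediction_id"],
            "model": rec["model"],
            "cost_ratio": rec["pred_cost"] / rec["actual_cost"],
            "turns_ratio": rec["pred_turns"] / rec["actual_turns"],
            "passed": rec["passed"],
        })
    return rows


# The example record from above, flattened onto one line.
example = (
    '{"prediction_id": "c7df172d", "model": "single-haiku", '
    '"pred_cost": 0.504, "actual_cost": 0.243, '
    '"pred_turns": 86.8, "actual_turns": 8, "passed": true}'
)
for row in prediction_ratios(example):
    print(row)
```

For the example record, the cost was over-predicted by roughly 2.07x and the turn count by 10.85x.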

After accumulating enough actuals, retrain:

python -m predictor.train

  Source                     Records   Notes
  ──────────────────────────────────────────────────────────────────────────────────
  SWE-smith                   25,826   Synthetic SWE tasks, claude-3-7/3-5-sonnet/gpt-4o
  loong0814 mini               9,990   SWE-bench Verified, 5 models
  OpenHands SWE-bench Lite     5,693   19 models, real pass/fail
  loong0814 full               3,639   Real API costs from accumulated_cost
  Claude Code runs               725   HumanEval/MBPP runs, upsampled ~14×

  Target                 R²     MAE          Within 2×
  ─────────────────────────────────────────────────────
  turns                  0.59   10.4 turns   91%
  completion tokens      0.32   2,936 tok    75%
  pass rate (AUC-ROC)    0.91   -            84% acc

Turn predictions are calibrated for SWE-bench-style tasks; real Claude Code runs tend to use fewer turns than predicted. This improves as more actuals are recorded via the synaxi-predict skill.
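The regression metrics reported above can be computed as in the following sketch. The definitions are assumptions: in particular, "within 2×" is taken here to mean the prediction and the actual differ by at most a factor of two in either direction, and whether the project evaluates in raw or log space is not stated. The toy numbers are made up for illustration and are not project results.

```python
def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are identical")
    return 1.0 - ss_res / ss_tot


def within_factor(y_true, y_pred, factor=2.0):
    """Share of predictions within `factor` of the true value, in either direction."""
    hits = sum(
        1 for t, p in zip(y_true, y_pred)
        if t > 0 and p > 0 and max(t / p, p / t) <= factor
    )
    return hits / len(y_true)


# Made-up turn counts, purely to exercise the functions.
actual_turns = [8, 20, 30, 12]
predicted_turns = [10, 18, 70, 12]
print("MAE:", mae(actual_turns, predicted_turns))
print("R^2:", round(r2(actual_turns, predicted_turns), 3))
print("within 2x:", within_factor(actual_turns, predicted_turns))
```

A single large miss (70 predicted against 30 actual) dominates both MAE and R², while the within-2× share only counts it as one failure out of four. That is one reason to report all three numbers side by side.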

bin/                     Executable wrappers (predict, record-actual, parse-session)
predictor/               Core prediction, training, and session-parsing logic
  predict.py             CLI entry point and cost calculation
  train.py               MLP training pipeline
  parse_session.py       Parses Claude Code session JSONL for exact metrics
  record_actual.py       Records actuals against predictions
scripts/                 Dataset importers and eval tools
  import_*.py            Normalise benchmark datasets → data/runs/
  extract_*.py           Compute tree-sitter features for benchmark repos
  eval_holdout.py        R², MAE, within-2× on 20% holdout
  eval_pass_rate.py      AUC-ROC, Brier score, calibration
features/code/           Benchmark task definitions (HumanEval, MBPP, etc.)
data/runs/               Training corpus (JSONL, one record per agent run)
data/models/             Trained model artifact (Git LFS)
data/code_features.json  Tree-sitter features per benchmark instance
.claude-plugin/          Plugin metadata (plugin.json, marketplace.json)
commands/                /synaxi-predict slash command
skills/                  synaxi-predict skill (auto-invoked on subagent spawn)

Each scripts/import_*.py pulls a public benchmark and normalises it into data/runs/. New importers should produce JSONL with:

{
  "task_id":           "source/instance_id",
  "strategy":          "model-id",
  "task_text":         "description of the task",
  "prompt_tokens_raw": 45000,
  "completion_tokens": 8200,
  "num_turns":         32,
  "total_cost_usd":    0.84,
  "passed_criteria":   true,
  "mode":              "multi-turn"
}

prompt_tokens_raw = context size at the final API call. completion_tokens = total across all turns.
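A new importer might validate each record against this schema before writing it out. The sketch below is one way to do that, under the assumption that all nine fields are required and typed as in the example; the names `REQUIRED_FIELDS` and `to_jsonl_line` are hypothetical and not part of the repository.

```python
import json

# Field names and types taken from the example record above.
REQUIRED_FIELDS = {
    "task_id": str,
    "strategy": str,
    "task_text": str,
    "prompt_tokens_raw": int,
    "completion_tokens": int,
    "num_turns": int,
    "total_cost_usd": float,
    "passed_criteria": bool,
    "mode": str,
}


def to_jsonl_line(record):
    """Validate one run record and serialise it as a single JSONL line."""
    missing = [key for key in REQUIRED_FIELDS if key not in record]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    for key, expected in REQUIRED_FIELDS.items():
        value = record[key]
        # bool is a subclass of int in Python, so reject it for numeric fields.
        if expected is not bool and isinstance(value, bool):
            raise TypeError(f"{key} must be {expected.__name__}, got bool")
        # Accept whole-number costs such as 1 where a float is expected.
        accepted = (int, float) if expected is float else expected
        if not isinstance(value, accepted):
            raise TypeError(f"{key} must be {expected.__name__}")
    # Emit only the schema fields, in schema order, on one line.
    return json.dumps({key: record[key] for key in REQUIRED_FIELDS})


run = {
    "task_id": "source/instance_id",
    "strategy": "model-id",
    "task_text": "description of the task",
    "prompt_tokens_raw": 45000,
    "completion_tokens": 8200,
    "num_turns": 32,
    "total_cost_usd": 0.84,
    "passed_criteria": True,
    "mode": "multi-turn",
}
print(to_jsonl_line(run))
```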

Actuals from real Claude Code runs are the most valuable training signal, because benchmark data (SWE-bench etc.) doesn't capture how the model behaves on everyday coding tasks or typical codebases.

Every time the synaxi-predict skill completes a task it writes a record to data/actuals_live.jsonl containing the task text, actual turns/cost, pass result, and the tree-sitter code features of your repo at prediction time. You can share these records to improve the model for everyone:

bin/contribute      # shows uncontributed records, prompts to share via GitHub issue
bin/contribute --all  # non-interactive, contributes everything

Requires the gh CLI to be authenticated (gh auth login). Each record is posted as a GitHub issue with the contribution label and validated before being merged into the training set.

What gets shared: task text, model name, predicted vs actual turns/cost, pass/fail, and the code_features snapshot. No file contents, no diffs, no personal information.
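An allow-list is one simple way to guarantee that only the fields listed above leave your machine. The sketch below is illustrative and does not describe how bin/contribute is actually implemented; the key `task_text` is an assumed field name for the task description, and `repo_path` is a hypothetical local-only field used to show the filtering.

```python
# Fields named in "What gets shared"; everything else is dropped.
SHARED_KEYS = (
    "task_text",      # assumed key name for the task description
    "model",
    "pred_turns",
    "actual_turns",
    "pred_cost",
    "actual_cost",
    "passed",
    "code_features",
)


def shareable_view(record):
    """Return a copy of the record restricted to the allow-listed fields."""
    return {key: record[key] for key in SHARED_KEYS if key in record}


local_record = {
    "prediction_id": "c7df172d",
    "model": "single-haiku",
    "task_text": "Fix the failing login migration",
    "pred_cost": 0.504, "actual_cost": 0.243,
    "pred_turns": 86.8, "actual_turns": 8,
    "passed": True,
    "code_features": {"loc": 9042, "n_files": 30, "has_code_features": 1},
    "repo_path": "/path/to/project",  # hypothetical local-only field
}
print(shareable_view(local_record))
```

Filtering by allow-list rather than deny-list means any field added to the local record later stays private until it is explicitly listed.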

When to contribute: after a few tasks have accumulated. Check with bin/contribute to see what's pending. The more diverse the tasks and codebases, the better the calibration for real Claude Code usage.

See CONTRIBUTING.md. PRs welcome, especially new benchmark importers and actuals data.

MIT: use freely, attribution appreciated.
