cd /news/large-language-models/open-source-toolkit-to-analyze-your-โ€ฆ ยท home โ€บ topics โ€บ large-language-models โ€บ article
[ARTICLE ยท art-14979] src=github.com pub= topic=large-language-models verified=true sentiment=โ†‘ positive

Open source toolkit to analyze your ChatGPT/Claude usage from exports

A new open-source toolkit enables users to analyze their personal ChatGPT and Claude usage data entirely offline, processing official export files to generate model adoption timelines, topic breakdowns, and prompt engineering effectiveness metrics. The tool, available on GitHub, also computes cost efficiency metrics like Productive Output per Dollar (POD) and includes three methodology papers for quantifying practice hours and usage profiles. All analysis runs locally with no data leaving the user's machine, addressing privacy concerns around AI usage tracking.

read4 min publishedMay 27, 2026

Privacy-first tools for analyzing your personal LLM usage from official ChatGPT and Claude exports. Everything runs locally. Nothing leaves your machine.

Turn your raw export data into: model adoption timelines, topic breakdowns, prompt engineering effectiveness metrics, cost efficiency analysis, and formatted reports.

git clone https://github.com/noah-chelednik/llm-export-analytics.git
cd llm-export-analytics
./run_pipeline.sh --sample

This runs the full pipeline against included sample data so you can see what it produces before using your own exports.

Model adoption timeline shows which AI models you used and when you switched:

2025-01  โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆ GPT-4o (100%)
2025-02  โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆ GPT-5 (66%)  โ–ˆโ–ˆโ–ˆโ–ˆ GPT-4o (21%)
2025-03  โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆ GPT-5.2-T (78%)  โ–ˆโ–ˆโ–ˆโ–ˆ GPT-5 (22%)

Prompt technique tracking measures how your prompting style evolves over time:

Technique Adoption Rate First Appeared
Constraint specification 6.5% 2023-08
Context front- 4.6% 2024-02
Code inclusion 2.7% 2023-10
Multi-step instructions 1.7% 2023-11

Cost efficiency computes your Productive Output per Dollar (POD) from subscription costs and total output words. Also computes quality-adjusted POE with sensitivity analysis.

Plus: topic distribution, session dynamics, interaction style breakdown, conversation outcome classification, industry benchmark comparison, and more.

Three methodology papers are included:

: Rigorous methodology for quantifying LLM practice hours from export data. Tiered claims with sensitivity analysis and stress testing.LLM_Practice_Hours_Methodology_GIT.pdf - : Comprehensive data-driven usage profile covering model adoption (21 model versions), domain portfolio, prompt engineering effectiveness, and interaction patterns.Deep_LLM_Usage_Analysis.pdf - : The first standardized methodology for measuring individual LLM cost efficiency. Defines Productive Output per Dollar (POD), Deliverable-Linked Output per Dollar (DLOD), and quality-adjusted Productive Output Efficiency (POE).Productive_Output_Efficiency.pdf

scripts/
  analyze_chatgpt.py              # ChatGPT export normalization and basic stats
  analyze_claude.py               # Claude export normalization and basic stats
  analyze_combined.py             # Combined cross-platform analysis

scripts/deep_analysis/
  extract_chatgpt_metadata.py     # Model versions, tools, branching, reasoning
  classify_and_link.py            # Topic classification + project attribution
  analyze_effectiveness.py        # Prompt techniques, outcomes, interaction patterns
  generate_tables_and_charts.py   # Formatted tables and ASCII charts
  compute_pod.py                  # Productive Output per Dollar
  compute_dlod.py                 # Deliverable-Linked Output per Dollar
  compute_poe.py                  # Quality-adjusted POE with sensitivity analysis
  compare_benchmarks.py           # Industry benchmark comparison

examples/
  sample_chatgpt_export.json      # Synthetic sample data for testing
  sample_claude_export.json       # Synthetic sample data for testing

templates/
  example_projects.json           # Example project config (customize for your projects)
  cost_log_template.json          # Template for subscription cost tracking
  deliverable_inventory_template.json  # Template for verified deliverables
  quality_params.json             # Q coefficient configurations for POE
  benchmarks.json                 # Curated industry benchmark data with citations

docs/
  DEEP_ANALYSIS_GUIDE.md          # Step-by-step guide to the full pipeline
  Deep_LLM_Usage_Analysis.pdf
  Productive_Output_Efficiency.pdf
  Prompting_One_Pager.pdf

results/
  usage_snapshot_2025-12-09.md    # Historical snapshot
  usage_snapshot_2026-05-26.md    # Current snapshot

ChatGPT: Settings > Data controls > Export data. You'll receive a ZIP containing conversations.json

(or multiple shards).

Claude: Settings > Account > Export data. You'll receive a ZIP containing conversations.json

.

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python scripts/analyze_chatgpt.py --input /path/to/conversations.json --out outputs --utc
python scripts/analyze_claude.py --input /path/to/conversations.json --out outputs --utc
python scripts/analyze_combined.py \
  --chatgpt outputs/chatgpt_messages_normalized.csv \
  --claude outputs/claude_messages_normalized.csv --utc

Run the basic scripts with --include-content

first, then see the Deep Analysis Guide for the full pipeline: model tracking, topic classification, prompt effectiveness, cost efficiency, and industry benchmarks.

All processing is local. No network calls. No telemetry.

By default, scripts produce privacy-minimized CSVs (timestamps, IDs, roles only). Content analysis requires the explicit --include-content

flag. Never commit or share CSVs generated with content enabled.

ChatGPT: Reconstructs the primary conversation path from the mapping tree. Walks parent pointers from the current node back to root. Extracts per-message model metadata, tool usage, branching, and reasoning information.

Claude: Extracts text blocks from content arrays. Normalizes roles to match the shared schema. Supports both web/app and Claude Code CLI sessions.

Deep analysis: Adds topic classification via keyword taxonomy with content fallback, optional project attribution from user-provided config, prompt technique detection (10 techniques tracked), conversation outcome classification, interaction style analysis, and cost efficiency computation.

  • Results depend on platform export formats which may change
  • ChatGPT analysis reconstructs primary path only, not full conversation trees
  • Topic classification uses keyword matching (expect 40-50% "Other" for casual conversation titles)
  • Project attribution requires user-provided config and represents a lower bound
  • Cost metrics are designed for subscription pricing, not API pricing
  • Token counts are approximate and encoding-dependent

This repository contains no platform export files and distributes no conversation data. It exists to make long-horizon LLM usage measurable, reproducible, and inspectable without turning private conversations into a dataset.

โ”€โ”€ more in #large-language-models 4 stories ยท sorted by recency
sponsored brought to you by zahid.host 4,200+ EU-deployed projects
reading about agents? ship yours in a single git push.

Run your AI side-project on zahid.host

EU-based hosting, git-push deploys, automatic HTTPS, no cold starts. Free tier with a custom domain โ€” perfect for shipping the agent you just read about.

$git push zahid main
โ†’ Live at https://your-agent.zahid.host โœ“
Get free account โ†’ Pricing
from โ‚ฌ0/mo ยท no card required
LIVE [news/open-source-toolkit-โ€ฆ] indexed:0 read:4min 2026-05-27 ยท โ€”