{"slug": "building-a-stable-fable-5-traces-workflow-in-colab-parsing-tool-calls-auditing", "title": "Building a Stable Fable 5 Traces Workflow in Colab: Parsing Tool Calls, Auditing Data, and Training Baselines", "summary": "A tutorial demonstrates building a stable workflow for the Fable 5 Traces dataset in Google Colab, including parsing tool calls, auditing data for secrets, and training Naive Bayes baselines to predict assistant output types and tool usage from trace context.", "body_md": "In this tutorial, we work with the [Fable 5 Traces dataset](https://huggingface.co/datasets/Glint-Research/Fable-5-traces) from Hugging Face and build a complete workflow around real coding-agent trace data. We start by setting up a lightweight environment that avoids fragile dependencies such as datasets, scikit-learn, and scipy. Then we manually download and parse the merged JSONL file to keep the notebook stable in Colab. From there, we inspect repository files, preview raw trace examples, normalize tool calls and text outputs, audit the dataset structure, detect potential secret-like patterns, and visualize key distributions, including output types, tools, source roots, and text lengths. 
We also create safe no-CoT chat/SFT exports, build a simple keyword-search helper, and train pure-Python Naive Bayes baselines to assess whether trace context can predict the assistant’s output type and tool usage.\n\n**Setting Up the Fable 5 Traces Colab Environment and Helpers**\n\n``` python\nimport os\nimport sys\nimport json\nimport re\nimport math\nimport random\nimport subprocess\nfrom pathlib import Path\nfrom collections import Counter, defaultdict\ndef install_packages():\n   packages = [\n       \"huggingface_hub>=0.23.0\",\n       \"rich>=13.0.0\",\n       \"tqdm>=4.66.0\",\n   ]\n   subprocess.run(\n       [\n           sys.executable,\n           \"-m\",\n           \"pip\",\n           \"install\",\n           \"-q\",\n           \"-U\",\n           \"--upgrade-strategy\",\n           \"only-if-needed\",\n           *packages,\n       ],\n       check=False,\n   )\ninstall_packages()\nimport pandas as pd\nimport matplotlib.pyplot as plt\ntry:\n   import numpy as np\nexcept Exception:\n   np = None\nfrom tqdm.auto import tqdm\nfrom rich import print as rprint\nfrom rich.panel import Panel\nfrom rich.table import Table\nfrom huggingface_hub import HfApi, hf_hub_download\nfrom IPython.display import display\nDATASET_ID = \"Glint-Research/Fable-5-traces\"\nFLAT_JSONL_FILENAME = \"fable5_cot_merged.jsonl\"\nOUT_DIR = Path(\"/content/fable5_traces_tutorial_outputs\")\nOUT_DIR.mkdir(parents=True, exist_ok=True)\nSEED = 42\nrandom.seed(SEED)\nif np is not None:\n   np.random.seed(SEED)\nMAX_PREVIEW_CHARS = 900\nN_AGENT_TRACE_PREVIEWS = 2\nN_SAFE_DATASET_PREVIEWS = 3\nSAVE_COT_RESEARCH_EXPORT = False\nMAX_ROWS_TO_LOAD = None\nrprint(\n   Panel.fit(\n       f\"[bold]Fable 5 Traces Advanced Tutorial[/bold]\\n\"\n       f\"Dataset: {DATASET_ID}\\n\"\n       f\"Output directory: {OUT_DIR}\\n\"\n       f\"Manual JSONL loading: True\\n\"\n       f\"CoT research export enabled: 
{SAVE_COT_RESEARCH_EXPORT}\",\n       title=\"Setup\",\n   )\n)\nSECRET_PATTERNS = [\n   r\"sk-[A-Za-z0-9_\\-]{20,}\",\n   r\"hf_[A-Za-z0-9_\\-]{20,}\",\n   r\"github_pat_[A-Za-z0-9_]{20,}\",\n   r\"ghp_[A-Za-z0-9]{20,}\",\n   r\"xox[baprs]-[A-Za-z0-9\\-]{20,}\",\n   r\"AKIA[0-9A-Z]{16}\",\n   r\"(?i:(api[_-]?key|secret|token|password)\\s*[:=]\\s*['\\\"]?[^'\\\"\\s]{8,})\",\n]\nSECRET_RE = re.compile(\"|\".join(f\"(?:{pattern})\" for pattern in SECRET_PATTERNS))\nTOKEN_RE = re.compile(r\"[A-Za-z_][A-Za-z_0-9]{1,}|[./\\\\-]{2,}|[{}()\\[\\]:=<>]+\")\ndef safe_json_dumps(obj, max_chars=None):\n   try:\n       text = json.dumps(obj, ensure_ascii=False, indent=2, default=str)\n   except Exception:\n       text = str(obj)\n   if max_chars is not None and len(text) > max_chars:\n       return text[:max_chars] + \"\\n... [truncated]\"\n   return text\ndef is_missing_scalar(value):\n   if value is None:\n       return True\n   if isinstance(value, (list, dict, tuple, set)):\n       return False\n   try:\n       return bool(pd.isna(value))\n   except Exception:\n       return False\ndef clean_for_json(value):\n   if is_missing_scalar(value):\n       return None\n   if isinstance(value, dict):\n       return {str(k): clean_for_json(v) for k, v in value.items()}\n   if isinstance(value, list):\n       return [clean_for_json(v) for v in value]\n   if isinstance(value, tuple):\n       return [clean_for_json(v) for v in value]\n   if np is not None:\n       if isinstance(value, np.integer):\n           return int(value)\n       if isinstance(value, np.floating):\n           if math.isnan(float(value)):\n               return None\n           return float(value)\n       if isinstance(value, np.ndarray):\n           return value.tolist()\n   return value\ndef redact_possible_secrets(text):\n   if text is None:\n       return \"\"\n   text = str(text)\n   return SECRET_RE.sub(\"[REDACTED_POSSIBLE_SECRET]\", text)\ndef contains_possible_secret(text):\n   if text is None:\n       
return False\n   return bool(SECRET_RE.search(str(text)))\ndef preview_text(text, max_chars=MAX_PREVIEW_CHARS):\n   text = redact_possible_secrets(text)\n   text = re.sub(r\"\\s+\", \" \", text).strip()\n   if len(text) > max_chars:\n       return text[:max_chars] + \" ... [truncated]\"\n   return text\n```\n\nWe begin by setting up the Colab environment with only the lightweight packages needed for this workflow. We define the dataset path, output directory, random seed, preview limits, and export options so the tutorial behaves consistently. We also create the first set of helper functions for safe JSON formatting, secret redaction, missing-value handling, and clean text previews.\n\n**Building Parsing Utilities for Tool Calls and Text Outputs**\n\n``` python\ndef maybe_parse_json_string(value):\n   if isinstance(value, str):\n       stripped = value.strip()\n       if (stripped.startswith(\"{\") and stripped.endswith(\"}\")) or (\n           stripped.startswith(\"[\") and stripped.endswith(\"]\")\n       ):\n           try:\n               return json.loads(stripped)\n           except Exception:\n               return value\n   return value\ndef normalize_output_obj(value):\n   return maybe_parse_json_string(value)\ndef extract_tool_name(output):\n   output = normalize_output_obj(output)\n   if isinstance(output, dict):\n       direct_keys = [\n           \"name\",\n           \"tool_name\",\n           \"tool\",\n           \"function\",\n           \"command_name\",\n           \"recipient_name\",\n           \"toolName\",\n           \"callee\",\n       ]\n       for key in direct_keys:\n           value = output.get(key)\n           if isinstance(value, str) and value.strip():\n               return value.strip()\n       nested_keys = [\n           \"tool_call\",\n           \"toolCall\",\n           \"function_call\",\n           \"call\",\n           \"action\",\n       ]\n       for nested_key in nested_keys:\n           nested = output.get(nested_key)\n 
          if isinstance(nested, dict):\n               found = extract_tool_name(nested)\n               if found:\n                   return found\n       output_type = output.get(\"type\")\n       if isinstance(output_type, str):\n           output_type = output_type.strip()\n           if output_type and output_type.lower() not in {\"tool_use\", \"text\", \"message\"}:\n               return output_type\n   return \"\"\ndef extract_tool_args(output):\n   output = normalize_output_obj(output)\n   if isinstance(output, dict):\n       direct_arg_keys = [\n           \"input\",\n           \"args\",\n           \"arguments\",\n           \"parameters\",\n           \"kwargs\",\n           \"json\",\n           \"payload\",\n       ]\n       for key in direct_arg_keys:\n           if key in output:\n               return output[key]\n       nested_keys = [\n           \"tool_call\",\n           \"toolCall\",\n           \"function_call\",\n           \"call\",\n           \"action\",\n       ]\n       for nested_key in nested_keys:\n           nested = output.get(nested_key)\n           if isinstance(nested, dict):\n               args = extract_tool_args(nested)\n               if args not in [None, \"\", {}]:\n                   return args\n       ignored = {\n           \"name\",\n           \"tool_name\",\n           \"tool\",\n           \"function\",\n           \"command_name\",\n           \"recipient_name\",\n           \"toolName\",\n           \"callee\",\n           \"type\",\n       }\n       return {key: value for key, value in output.items() if key not in ignored}\n   return {}\ndef extract_text_payload(output):\n   output = normalize_output_obj(output)\n   if isinstance(output, str):\n       return output\n   if isinstance(output, dict):\n       text_keys = [\n           \"text\",\n           \"content\",\n           \"message\",\n           \"output\",\n           \"value\",\n           \"result\",\n       ]\n       for key in text_keys:\n           
value = output.get(key)\n           if isinstance(value, str):\n               return value\n           if isinstance(value, list):\n               return safe_json_dumps(value)\n           if isinstance(value, dict):\n               nested = extract_text_payload(value)\n               if nested:\n                   return nested\n       return safe_json_dumps(output)\n   return str(output)\ndef robust_len(value):\n   if value is None:\n       return 0\n   return len(str(value))\ndef source_root(source_file):\n   source_file = str(source_file or \"\").replace(\"\\\\\", \"/\")\n   if not source_file:\n       return \"unknown\"\n   parts = [part for part in source_file.split(\"/\") if part]\n   for marker in [\"projects\", \"AIArchives\", \"archives\", \"claude\"]:\n       if marker in parts:\n           idx = parts.index(marker)\n           if idx + 1 < len(parts):\n               return parts[idx + 1]\n   if len(parts) >= 2:\n       return parts[-2]\n   if parts:\n       return parts[0]\n   return \"unknown\"\ndef write_jsonl(path, records):\n   path = Path(path)\n   with path.open(\"w\", encoding=\"utf-8\") as file:\n       for record in records:\n           file.write(json.dumps(clean_for_json(record), ensure_ascii=False, default=str) + \"\\n\")\ndef save_plot(path):\n   path = Path(path)\n   plt.tight_layout()\n   plt.savefig(path, dpi=160, bbox_inches=\"tight\")\n   plt.show()\n   plt.close()\n   return path\ndef print_basic_table(title, rows, columns=(\"Metric\", \"Value\")):\n   table = Table(title=title)\n   for column in columns:\n       table.add_column(str(column))\n   for row in rows:\n       table.add_row(*[str(item) for item in row])\n   rprint(table)\ndef tokenize(text, max_chars=12000):\n   text = str(text or \"\")[:max_chars].lower()\n   return TOKEN_RE.findall(text)\ndef load_jsonl_manual(path, max_rows=None):\n   records = []\n   bad_lines = []\n   with open(path, \"r\", encoding=\"utf-8\") as file:\n       for line_number, line in 
tqdm(enumerate(file, start=1), desc=\"Reading JSONL\"):\n           line = line.strip()\n           if not line:\n               continue\n           try:\n               records.append(json.loads(line))\n           except Exception as error:\n               bad_lines.append(\n                   {\n                       \"line_number\": line_number,\n                       \"error\": repr(error),\n                       \"preview\": line[:500],\n                   }\n               )\n           if max_rows is not None and len(records) >= max_rows:\n               break\n   return records, bad_lines\n```\n\nWe build the core parsing utilities that turn raw output fields into usable tool names, tool arguments, and text payloads. We also define helpers for measuring text length, identifying source roots, writing JSONL files, saving plots, and printing clean tables. We finish this snippet by adding tokenization and manual JSONL loading to avoid fragile dataset-loading dependencies.\n\n**Inspecting the Hugging Face Repository and Loading JSONL Traces**\n\n```\nrprint(Panel.fit(\"[bold]Inspecting Hugging Face dataset repository[/bold]\"))\napi = HfApi()\nfiles = api.list_repo_files(repo_id=DATASET_ID, repo_type=\"dataset\")\npi_trace_files = [\n   file for file in files\n   if file.startswith(\"pi-traces/\") and file.endswith(\".jsonl\")\n]\nfile_summary = {\n   \"total_repo_files\": len(files),\n   \"jsonl_files\": sum(file.endswith(\".jsonl\") for file in files),\n   \"pi_trace_files\": len(pi_trace_files),\n   \"claude_files\": sum(file.startswith(\"claude/\") for file in files),\n   \"has_flat_jsonl\": FLAT_JSONL_FILENAME in files,\n}\nprint_basic_table(\n   \"Repository File Summary\",\n   [(key, value) for key, value in file_summary.items()],\n)\nrprint(\"[bold]Sample repository files:[/bold]\")\nfor file in files[:20]:\n   print(\" -\", file)\nrprint(Panel.fit(\"[bold]Manual raw pi-trace preview[/bold]\"))\npi_examples = []\nif pi_trace_files:\n   for trace_file 
in pi_trace_files[:N_AGENT_TRACE_PREVIEWS]:\n       try:\n           local_trace_path = hf_hub_download(\n               repo_id=DATASET_ID,\n               repo_type=\"dataset\",\n               filename=trace_file,\n           )\n           trace_records, trace_bad_lines = load_jsonl_manual(local_trace_path, max_rows=1)\n           if trace_records:\n               example = trace_records[0]\n               pi_examples.append(example)\n               preview_payload = {\n                   \"trace_file\": trace_file,\n                   \"keys\": list(example.keys()),\n                   \"preview\": example,\n               }\n               rprint(\n                   Panel(\n                       safe_json_dumps(preview_payload, max_chars=3000),\n                       title=f\"Raw pi-trace preview: {trace_file}\",\n                   )\n               )\n           if trace_bad_lines:\n               rprint(\n                   f\"[yellow]Bad JSONL lines in {trace_file}: {len(trace_bad_lines)}[/yellow]\"\n               )\n       except Exception as error:\n           rprint(f\"[yellow]Could not preview {trace_file}[/yellow]\")\n           rprint(repr(error))\nelse:\n   rprint(\"[yellow]No pi-traces JSONL files found.[/yellow]\")\nrprint(Panel.fit(\"[bold]Downloading flat merged JSONL from Hugging Face Hub[/bold]\"))\nflat_path = hf_hub_download(\n   repo_id=DATASET_ID,\n   repo_type=\"dataset\",\n   filename=FLAT_JSONL_FILENAME,\n)\nrprint(f\"[green]Downloaded flat file:[/green] {flat_path}\")\nrprint(Panel.fit(\"[bold]Loading flat JSONL manually[/bold]\"))\nrecords, bad_lines = load_jsonl_manual(flat_path, max_rows=MAX_ROWS_TO_LOAD)\nif bad_lines:\n   bad_lines_path = OUT_DIR / \"bad_jsonl_lines.json\"\n   with open(bad_lines_path, \"w\", encoding=\"utf-8\") as file:\n       json.dump(bad_lines, file, ensure_ascii=False, indent=2)\n   rprint(f\"[yellow]Bad JSONL lines found: {len(bad_lines)} -> {bad_lines_path}[/yellow]\")\ndf = 
pd.DataFrame.from_records(records)\nrprint(f\"[green]Loaded rows:[/green] {len(df):,}\")\nrprint(f\"[green]DataFrame shape:[/green] {df.shape}\")\nrprint(\"[bold]Columns:[/bold]\")\nprint(list(df.columns))\ndisplay(df.head(3))\nexpected_cols = [\n   \"uid\",\n   \"source_file\",\n   \"session\",\n   \"model\",\n   \"context\",\n   \"cot\",\n   \"output_type\",\n   \"output\",\n   \"completion\",\n   \"origin\",\n]\nfor column in expected_cols:\n   if column not in df.columns:\n       df[column] = None\ndf[\"output_norm\"] = df[\"output\"].map(normalize_output_obj)\ndf[\"tool_name\"] = df[\"output_norm\"].map(extract_tool_name)\ndf[\"tool_args\"] = df[\"output_norm\"].map(extract_tool_args)\ndf[\"text_payload\"] = df[\"output_norm\"].map(extract_text_payload)\ndf[\"context_chars\"] = df[\"context\"].map(robust_len)\ndf[\"cot_chars\"] = df[\"cot\"].map(robust_len)\ndf[\"completion_chars\"] = df[\"completion\"].map(robust_len)\ndf[\"text_payload_chars\"] = df[\"text_payload\"].map(robust_len)\ndf[\"source_root\"] = df[\"source_file\"].map(source_root)\ndf[\"possible_secret_in_context\"] = df[\"context\"].map(contains_possible_secret)\ndf[\"possible_secret_in_completion\"] = df[\"completion\"].map(contains_possible_secret)\ndf[\"possible_secret_anywhere\"] = (\n   df[\"possible_secret_in_context\"] | df[\"possible_secret_in_completion\"]\n)\n```\n\nWe inspect the Hugging Face dataset repository and summarize the total file count, the number of JSONL, pi-trace, and claude files, and whether the flat merged JSONL file is present. We manually preview a few raw pi-trace files to understand the structure without relying on the datasets library. 
We then download the merged JSONL file, load it into a DataFrame, and normalize key fields for later analysis.\n\n**Auditing Dataset Structure and Visualizing Trace Distributions**\n\n```\naudit_rows = [\n   (\"rows\", len(df)),\n   (\"columns\", len(df.columns)),\n   (\"unique_uid\", df[\"uid\"].nunique(dropna=True)),\n   (\"duplicate_uid_rows\", int(df[\"uid\"].duplicated().sum())),\n   (\"unique_sessions\", df[\"session\"].nunique(dropna=True)),\n   (\"unique_models\", df[\"model\"].nunique(dropna=True)),\n   (\"missing_context\", int(df[\"context\"].isna().sum())),\n   (\"missing_cot\", int(df[\"cot\"].isna().sum())),\n   (\"missing_output\", int(df[\"output\"].isna().sum())),\n   (\"rows_with_possible_secret_pattern\", int(df[\"possible_secret_anywhere\"].sum())),\n   (\"median_context_chars\", round(float(df[\"context_chars\"].median()), 2)),\n   (\"median_cot_chars\", round(float(df[\"cot_chars\"].median()), 2)),\n   (\"median_completion_chars\", round(float(df[\"completion_chars\"].median()), 2)),\n   (\"max_completion_chars\", int(df[\"completion_chars\"].max())),\n]\nprint_basic_table(\"Flat JSONL Audit\", audit_rows)\nrprint(\"\\n[bold]Output type distribution:[/bold]\")\ndisplay(df[\"output_type\"].value_counts(dropna=False).to_frame(\"rows\"))\nrprint(\"\\n[bold]Model distribution:[/bold]\")\ndisplay(df[\"model\"].value_counts(dropna=False).to_frame(\"rows\").head(20))\nrprint(\"\\n[bold]Origin distribution:[/bold]\")\ndisplay(df[\"origin\"].value_counts(dropna=False).to_frame(\"rows\"))\nrprint(\"\\n[bold]Top source roots:[/bold]\")\ndisplay(df[\"source_root\"].value_counts().head(20).to_frame(\"rows\"))\nrprint(\"\\n[bold]Top tool names:[/bold]\")\ndisplay(\n   df.loc[df[\"output_type\"].eq(\"tool_use\"), \"tool_name\"]\n   .replace(\"\", pd.NA)\n   .value_counts(dropna=False)\n   .head(25)\n   .to_frame(\"rows\")\n)\nrprint(\n   Panel.fit(\n       \"[bold]Safe previews[/bold]\\n\"\n       \"These previews redact common secret-like patterns and never 
execute trace commands.\"\n   )\n)\nsample_df = df.sample(\n   n=min(N_SAFE_DATASET_PREVIEWS, len(df)),\n   random_state=SEED,\n).reset_index(drop=True)\nfor index, row in sample_df.iterrows():\n   payload = {\n       \"uid\": row.get(\"uid\"),\n       \"session\": row.get(\"session\"),\n       \"model\": row.get(\"model\"),\n       \"origin\": row.get(\"origin\"),\n       \"output_type\": row.get(\"output_type\"),\n       \"tool_name\": row.get(\"tool_name\"),\n       \"context_preview\": preview_text(row.get(\"context\")),\n       \"cot_preview\": preview_text(row.get(\"cot\")),\n       \"text_or_tool_payload_preview\": preview_text(row.get(\"text_payload\")),\n   }\n   rprint(\n       Panel(\n           safe_json_dumps(payload, max_chars=4000),\n           title=f\"Safe Row Preview {index}\",\n       )\n   )\nrprint(Panel.fit(\"[bold]Creating plots[/bold]\"))\nplot_paths = {}\noutput_counts = df[\"output_type\"].fillna(\"missing\").value_counts()\nplt.figure(figsize=(8, 5))\noutput_counts.plot(kind=\"bar\")\nplt.title(\"Output Type Distribution\")\nplt.xlabel(\"Output Type\")\nplt.ylabel(\"Rows\")\nplt.xticks(rotation=25, ha=\"right\")\nplot_paths[\"output_type_distribution\"] = str(\n   save_plot(OUT_DIR / \"output_type_distribution.png\")\n)\ntool_counts = (\n   df.loc[df[\"output_type\"].eq(\"tool_use\"), \"tool_name\"]\n   .replace(\"\", \"unknown\")\n   .value_counts()\n   .head(20)\n)\nif len(tool_counts) > 0:\n   plt.figure(figsize=(9, 6))\n   tool_counts.sort_values().plot(kind=\"barh\")\n   plt.title(\"Top Tool Names\")\n   plt.xlabel(\"Rows\")\n   plt.ylabel(\"Tool\")\n   plot_paths[\"top_tools\"] = str(save_plot(OUT_DIR / \"top_tools.png\"))\nelse:\n   rprint(\"[yellow]No tool-use rows found for tool plot.[/yellow]\")\nsource_counts = df[\"source_root\"].fillna(\"unknown\").value_counts().head(20)\nplt.figure(figsize=(9, 6))\nsource_counts.sort_values().plot(kind=\"barh\")\nplt.title(\"Top Source Roots\")\nplt.xlabel(\"Rows\")\nplt.ylabel(\"Source 
Root\")\nplot_paths[\"top_source_roots\"] = str(save_plot(OUT_DIR / \"top_source_roots.png\"))\nlength_cols = [\n   \"context_chars\",\n   \"cot_chars\",\n   \"completion_chars\",\n   \"text_payload_chars\",\n]\nfor column in length_cols:\n   plt.figure(figsize=(8, 5))\n   clipped = df[column].clip(upper=df[column].quantile(0.99))\n   plt.hist(clipped, bins=50)\n   plt.title(f\"{column} Distribution, Clipped at P99\")\n   plt.xlabel(\"Characters\")\n   plt.ylabel(\"Rows\")\n   plot_paths[f\"{column}_histogram\"] = str(\n       save_plot(OUT_DIR / f\"{column}_histogram.png\")\n   )\n```\n\nWe audit the dataset by checking row counts, unique sessions, duplicate IDs, missing fields, text lengths, and possible secret-like patterns. We display important distributions across output types, models, origins, source roots, and tool names to understand the data’s shape. We also create safe previews and visual plots so we can inspect the traces without executing any commands.\n\n**Projecting Traces and Exporting Safe No-CoT Chat Datasets**\n\n```\nrprint(Panel.fit(\"[bold]Creating pure NumPy TF-IDF-style projection[/bold]\"))\nif np is not None:\n   try:\n       projection_sample = df.sample(n=min(1000, len(df)), random_state=SEED).copy()\n       projection_texts = projection_sample[\"context\"].fillna(\"\").astype(str).tolist()\n       doc_tokens = [tokenize(text, max_chars=8000) for text in projection_texts]\n       doc_freq = Counter()\n       for tokens in doc_tokens:\n           doc_freq.update(set(tokens))\n       vocab_items = [\n           item for item in doc_freq.items()\n           if item[1] >= 2 and len(item[0]) > 1\n       ]\n       vocab_items = sorted(vocab_items, key=lambda item: item[1], reverse=True)[:1000]\n       vocab = {token: idx for idx, (token, _) in enumerate(vocab_items)}\n       if len(vocab) >= 3 and len(doc_tokens) >= 10:\n           X = np.zeros((len(doc_tokens), len(vocab)), dtype=np.float32)\n           df_counts = np.zeros(len(vocab), 
dtype=np.float32)\n           for row_idx, tokens in enumerate(doc_tokens):\n               counts = Counter(token for token in tokens if token in vocab)\n               for token, count in counts.items():\n                   col_idx = vocab[token]\n                   X[row_idx, col_idx] = float(count)\n               for token in counts.keys():\n                   df_counts[vocab[token]] += 1.0\n           idf = np.log((1.0 + len(doc_tokens)) / (1.0 + df_counts)) + 1.0\n           X = X * idf.reshape(1, -1)\n           row_norms = np.linalg.norm(X, axis=1, keepdims=True)\n           row_norms[row_norms == 0] = 1.0\n           X = X / row_norms\n           X = X - X.mean(axis=0, keepdims=True)\n           U, S, Vt = np.linalg.svd(X, full_matrices=False)\n           coords = U[:, :2] * S[:2]\n           projection_sample[\"svd_x\"] = coords[:, 0]\n           projection_sample[\"svd_y\"] = coords[:, 1]\n           projection_sample[\"plot_label\"] = projection_sample[\"output_type\"].fillna(\"missing\").astype(str)\n           plt.figure(figsize=(8, 6))\n           for label, part in projection_sample.groupby(\"plot_label\"):\n               plt.scatter(\n                   part[\"svd_x\"],\n                   part[\"svd_y\"],\n                   s=12,\n                   alpha=0.65,\n                   label=label,\n               )\n           plt.title(\"Context Projection with Pure NumPy TF-IDF + SVD\")\n           plt.xlabel(\"SVD component 1\")\n           plt.ylabel(\"SVD component 2\")\n           plt.legend()\n           plot_paths[\"tfidf_svd_projection\"] = str(\n               save_plot(OUT_DIR / \"tfidf_svd_projection.png\")\n           )\n           projection_sample[\n               [\n                   \"uid\",\n                   \"output_type\",\n                   \"tool_name\",\n                   \"source_root\",\n                   \"svd_x\",\n                   \"svd_y\",\n               ]\n           ].to_csv(\n               OUT_DIR / 
\"tfidf_svd_projection_points.csv\",\n               index=False,\n           )\n           pd.DataFrame(vocab_items, columns=[\"token\", \"document_frequency\"]).to_csv(\n               OUT_DIR / \"projection_vocabulary.csv\",\n               index=False,\n           )\n       else:\n           rprint(\"[yellow]Skipping projection because vocabulary or row count is too small.[/yellow]\")\n   except Exception as error:\n       rprint(\"[yellow]Projection failed, but the rest of the tutorial will continue.[/yellow]\")\n       rprint(repr(error))\nelse:\n   rprint(\"[yellow]NumPy is not available, so projection is skipped.[/yellow]\")\nrprint(Panel.fit(\"[bold]Creating safe no-CoT chat/SFT exports[/bold]\"))\nSYSTEM_PROMPT = (\n   \"You are a coding agent. Given the user's context and prior transcript, \"\n   \"produce the next assistant action. If a tool call is needed, return a structured tool call JSON. \"\n   \"Do not expose hidden reasoning.\"\n)\ndef make_no_cot_target(row):\n   output_type = str(row.get(\"output_type\") or \"\")\n   if output_type == \"tool_use\":\n       tool_name = row.get(\"tool_name\") or \"unknown_tool\"\n       tool_args = row.get(\"tool_args\")\n       return json.dumps(\n           {\n               \"type\": \"tool_call\",\n               \"tool_name\": tool_name,\n               \"arguments\": tool_args,\n           },\n           ensure_ascii=False,\n           default=str,\n       )\n   payload = row.get(\"text_payload\")\n   if payload is None or str(payload).strip() == \"\":\n       payload = row.get(\"completion\", \"\")\n   return str(payload)\ndef make_chat_record(row, include_cot=False):\n   user_context = redact_possible_secrets(row.get(\"context\", \"\"))\n   target = redact_possible_secrets(make_no_cot_target(row))\n   messages = [\n       {\n           \"role\": \"system\",\n           \"content\": SYSTEM_PROMPT,\n       },\n       {\n           \"role\": \"user\",\n           \"content\": user_context,\n       },\n       
{\n           \"role\": \"assistant\",\n           \"content\": target,\n       },\n   ]\n   record = {\n       \"uid\": row.get(\"uid\"),\n       \"session\": row.get(\"session\"),\n       \"model\": row.get(\"model\"),\n       \"origin\": row.get(\"origin\"),\n       \"output_type\": row.get(\"output_type\"),\n       \"tool_name\": row.get(\"tool_name\"),\n       \"messages\": messages,\n   }\n   if include_cot:\n       record[\"reasoning_trace\"] = redact_possible_secrets(row.get(\"cot\", \"\"))\n   return clean_for_json(record)\nexport_df = df.copy()\nexport_df = export_df.sample(frac=1.0, random_state=SEED).reset_index(drop=True)\nnum_rows = len(export_df)\ntrain_end = int(0.90 * num_rows)\nvalidation_end = int(0.95 * num_rows)\nsplits = {\n   \"train\": export_df.iloc[:train_end],\n   \"validation\": export_df.iloc[train_end:validation_end],\n   \"test\": export_df.iloc[validation_end:],\n}\nfor split_name, split_df in splits.items():\n   records = [\n       make_chat_record(row, include_cot=False)\n       for _, row in split_df.iterrows()\n   ]\n   output_path = OUT_DIR / f\"fable5_no_cot_chat_{split_name}.jsonl\"\n   write_jsonl(output_path, records)\n   rprint(\n       f\"[green]Saved[/green] {split_name}: \"\n       f\"{len(records)} records -> {output_path}\"\n   )\nif SAVE_COT_RESEARCH_EXPORT:\n   cot_records = [\n       make_chat_record(row, include_cot=True)\n       for _, row in export_df.iterrows()\n   ]\n   cot_path = OUT_DIR / \"fable5_cot_research_export.jsonl\"\n   write_jsonl(cot_path, cot_records)\n   rprint(f\"[yellow]Saved CoT-preserving research export:[/yellow] {cot_path}\")\nelse:\n   rprint(\n       \"[cyan]Skipped CoT-preserving export because \"\n       \"SAVE_COT_RESEARCH_EXPORT=False.[/cyan]\"\n   )\nanalysis_cols = [\n   \"uid\",\n   \"session\",\n   \"model\",\n   \"origin\",\n   \"source_file\",\n   \"source_root\",\n   \"output_type\",\n   \"tool_name\",\n   \"context_chars\",\n   \"cot_chars\",\n   \"completion_chars\",\n   
\"text_payload_chars\",\n   \"possible_secret_anywhere\",\n]\nanalysis_df = df[analysis_cols].copy()\nanalysis_df.to_csv(\n   OUT_DIR / \"fable5_analysis_index.csv\",\n   index=False,\n)\nanalysis_df.to_pickle(\n   OUT_DIR / \"fable5_analysis_index.pkl\",\n)\nrprint(f\"[green]Saved analysis CSV:[/green] {OUT_DIR / 'fable5_analysis_index.csv'}\")\nrprint(f\"[green]Saved analysis pickle:[/green] {OUT_DIR / 'fable5_analysis_index.pkl'}\")\n```\n\nWe create a pure NumPy TF-IDF-style projection to visualize trace contexts without using scikit-learn or scipy. We then prepare safe no-CoT chat-style exports that turn each trace into a structured system, user, and assistant message format. We save the shuffled 90/5/5 train, validation, and test splits as JSONL files and write a per-row analysis index as CSV and pickle artifacts, so the dataset is easier to inspect, reuse, and fine-tune on.\n\n**Implementing Pure-Python Naive Bayes Classification Utilities**\n\n``` python\ndef stratified_train_test_indices(labels, test_size=0.2, seed=SEED):\n   rng = random.Random(seed)\n   label_to_indices = defaultdict(list)\n   for idx, label in enumerate(labels):\n       label_to_indices[label].append(idx)\n   train_indices = []\n   test_indices = []\n   for label, indices in label_to_indices.items():\n       indices = indices[:]\n       rng.shuffle(indices)\n       if len(indices) <= 1:\n           train_indices.extend(indices)\n           continue\n       n_test = max(1, int(round(len(indices) * test_size)))\n       if n_test >= len(indices):\n           n_test = len(indices) - 1\n       test_indices.extend(indices[:n_test])\n       train_indices.extend(indices[n_test:])\n   rng.shuffle(train_indices)\n   rng.shuffle(test_indices)\n   return train_indices, test_indices\nclass PureMultinomialNB:\n   def __init__(self, max_features=20000, min_df=2, alpha=1.0):\n       self.max_features = max_features\n       self.min_df = min_df\n       self.alpha = alpha\n       self.vocab = {}\n       self.labels = []\n       self.class_log_prior = {}\n       self.feature_log_prob 
= {}\n       self.class_token_totals = {}\n   def fit(self, texts, labels):\n       texts = list(texts)\n       labels = list(labels)\n       doc_freq = Counter()\n       for text in texts:\n           doc_freq.update(set(tokenize(text)))\n       vocab_items = [\n           item for item in doc_freq.items()\n           if item[1] >= self.min_df\n       ]\n       vocab_items = sorted(vocab_items, key=lambda item: item[1], reverse=True)\n       vocab_items = vocab_items[:self.max_features]\n       self.vocab = {token: idx for idx, (token, _) in enumerate(vocab_items)}\n       self.labels = sorted(set(labels))\n       class_doc_counts = Counter(labels)\n       total_docs = len(labels)\n       num_classes = len(self.labels)\n       token_counts_by_class = {label: Counter() for label in self.labels}\n       token_totals_by_class = {label: 0 for label in self.labels}\n       for text, label in zip(texts, labels):\n           counts = Counter(token for token in tokenize(text) if token in self.vocab)\n           token_counts_by_class[label].update(counts)\n           token_totals_by_class[label] += sum(counts.values())\n       vocab_size = max(len(self.vocab), 1)\n       for label in self.labels:\n           self.class_log_prior[label] = math.log(\n               (class_doc_counts[label] + self.alpha) /\n               (total_docs + self.alpha * num_classes)\n           )\n           denom = token_totals_by_class[label] + self.alpha * vocab_size\n           self.class_token_totals[label] = token_totals_by_class[label]\n           self.feature_log_prob[label] = {}\n           for token in self.vocab:\n               count = token_counts_by_class[label][token]\n               self.feature_log_prob[label][token] = math.log((count + self.alpha) / denom)\n       return self\n   def predict_one(self, text):\n       counts = Counter(token for token in tokenize(text) if token in self.vocab)\n       best_label = None\n       best_score = -float(\"inf\")\n       for label in 
self.labels:\n           score = self.class_log_prior[label]\n           feature_probs = self.feature_log_prob[label]\n           for token, count in counts.items():\n               score += count * feature_probs.get(token, 0.0)\n           if score > best_score:\n               best_score = score\n               best_label = label\n       return best_label\n   def predict(self, texts):\n       return [self.predict_one(text) for text in texts]\n   def top_tokens_for_class(self, label, n=20):\n       if label not in self.feature_log_prob:\n           return []\n       base_scores = self.feature_log_prob[label]\n       other_labels = [item for item in self.labels if item != label]\n       rows = []\n       for token in self.vocab:\n           this_score = base_scores[token]\n           if other_labels:\n               other_score = sum(\n                   self.feature_log_prob[other][token]\n                   for other in other_labels\n               ) / len(other_labels)\n               margin = this_score - other_score\n           else:\n               margin = this_score\n           rows.append((token, margin))\n       rows = sorted(rows, key=lambda item: item[1], reverse=True)\n       return rows[:n]\ndef evaluate_predictions(y_true, y_pred):\n   labels = sorted(set(y_true) | set(y_pred))\n   rows = []\n   total_correct = 0\n   total = len(y_true)\n   for label in labels:\n       tp = sum((true == label and pred == label) for true, pred in zip(y_true, y_pred))\n       fp = sum((true != label and pred == label) for true, pred in zip(y_true, y_pred))\n       fn = sum((true == label and pred != label) for true, pred in zip(y_true, y_pred))\n       support = sum(true == label for true in y_true)\n       precision = tp / (tp + fp) if (tp + fp) else 0.0\n       recall = tp / (tp + fn) if (tp + fn) else 0.0\n       f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0\n       rows.append(\n           {\n               \"label\": label,\n  
             \"precision\": precision,\n               \"recall\": recall,\n               \"f1\": f1,\n               \"support\": support,\n           }\n       )\n       total_correct += tp\n   accuracy = total_correct / total if total else 0.0\n   macro_f1 = sum(row[\"f1\"] for row in rows) / len(rows) if rows else 0.0\n   weighted_f1 = (\n       sum(row[\"f1\"] * row[\"support\"] for row in rows) / total\n       if total\n       else 0.0\n   )\n   report_df = pd.DataFrame(rows)\n   metrics = {\n       \"accuracy\": accuracy,\n       \"macro_f1\": macro_f1,\n       \"weighted_f1\": weighted_f1,\n       \"labels\": labels,\n       \"rows\": rows,\n   }\n   return metrics, report_df\ndef confusion_matrix_df(y_true, y_pred):\n   labels = sorted(set(y_true) | set(y_pred))\n   matrix = pd.DataFrame(\n       0,\n       index=labels,\n       columns=labels,\n       dtype=int,\n   )\n   for true, pred in zip(y_true, y_pred):\n       matrix.loc[true, pred] += 1\n   matrix.index.name = \"actual\"\n   matrix.columns.name = \"predicted\"\n   return matrix\n```\n\nWe define pure-Python classification utilities for stratified train-test splitting, Naive Bayes training, prediction, and evaluation. We implement the classifier from scratch, so the tutorial stays stable even in Colab environments with broken scientific Python binaries. 
We also add reporting tools for precision, recall, F1 score, confusion matrices, and top class-specific tokens.\n\n**Training Naive Bayes Baselines and Keyword Search Over Traces**\n\n``` python\nrprint(Panel.fit(\"[bold]Baseline 1: Predict output_type from context using pure Python Naive Bayes[/bold]\"))\nmodel_artifacts = {}\nclassifier_df = df.dropna(subset=[\"output_type\"]).copy()\nclassifier_df = classifier_df[\n   classifier_df[\"output_type\"].astype(str).str.len() > 0\n].copy()\nif classifier_df[\"output_type\"].nunique() >= 2 and len(classifier_df) >= 30:\n   X_text = (\n       classifier_df[\"context\"]\n       .fillna(\"\")\n       .astype(str)\n       .map(lambda text: text[:12000])\n       .tolist()\n   )\n   y = classifier_df[\"output_type\"].astype(str).tolist()\n   train_indices, test_indices = stratified_train_test_indices(y, test_size=0.2, seed=SEED)\n   X_train = [X_text[i] for i in train_indices]\n   y_train = [y[i] for i in train_indices]\n   X_test = [X_text[i] for i in test_indices]\n   y_test = [y[i] for i in test_indices]\n   output_type_classifier = PureMultinomialNB(\n       max_features=20000,\n       min_df=2,\n       alpha=1.0,\n   )\n   output_type_classifier.fit(X_train, y_train)\n   predictions = output_type_classifier.predict(X_test)\n   output_type_metrics, output_report_df = evaluate_predictions(y_test, predictions)\n   output_matrix_df = confusion_matrix_df(y_test, predictions)\n   output_type_metrics[\"train_rows\"] = len(X_train)\n   output_type_metrics[\"test_rows\"] = len(X_test)\n   output_type_metrics[\"vocab_size\"] = len(output_type_classifier.vocab)\n   rprint(\"[bold]Output type classifier report:[/bold]\")\n   display(output_report_df)\n   display(output_matrix_df)\n   output_report_df.to_csv(OUT_DIR / \"output_type_classifier_report.csv\", index=False)\n   output_matrix_df.to_csv(OUT_DIR / \"output_type_confusion_matrix.csv\")\n   top_token_records = []\n   for label in output_type_classifier.labels:\n       for token, 
margin in output_type_classifier.top_tokens_for_class(label, n=25):\n           top_token_records.append(\n               {\n                   \"label\": label,\n                   \"token\": token,\n                   \"score_margin\": margin,\n               }\n           )\n   pd.DataFrame(top_token_records).to_csv(\n       OUT_DIR / \"output_type_top_tokens.csv\",\n       index=False,\n   )\n   with open(\n       OUT_DIR / \"output_type_classifier_metrics.json\",\n       \"w\",\n       encoding=\"utf-8\",\n   ) as file:\n       json.dump(output_type_metrics, file, ensure_ascii=False, indent=2)\n   model_artifacts[\"output_type_classifier_metrics\"] = str(\n       OUT_DIR / \"output_type_classifier_metrics.json\"\n   )\n   model_artifacts[\"output_type_classifier_report\"] = str(\n       OUT_DIR / \"output_type_classifier_report.csv\"\n   )\n   model_artifacts[\"output_type_confusion_matrix\"] = str(\n       OUT_DIR / \"output_type_confusion_matrix.csv\"\n   )\n   model_artifacts[\"output_type_top_tokens\"] = str(\n       OUT_DIR / \"output_type_top_tokens.csv\"\n   )\nelse:\n   rprint(\n       \"[yellow]Skipping output_type classifier because there are too few \"\n       \"classes or rows.[/yellow]\"\n   )\n   output_type_metrics = {}\nrprint(Panel.fit(\"[bold]Baseline 2: Predict tool_name from context using pure Python Naive Bayes[/bold]\"))\ntool_classifier_df = df[\n   df[\"output_type\"].eq(\"tool_use\")\n   & df[\"tool_name\"].fillna(\"\").astype(str).str.len().gt(0)\n].copy()\nif len(tool_classifier_df) >= 50 and tool_classifier_df[\"tool_name\"].nunique() >= 2:\n   top_tools = tool_classifier_df[\"tool_name\"].value_counts().head(12).index.tolist()\n   tool_classifier_df[\"tool_label\"] = tool_classifier_df[\"tool_name\"].where(\n       tool_classifier_df[\"tool_name\"].isin(top_tools),\n       \"__OTHER__\",\n   )\n   y_tool = tool_classifier_df[\"tool_label\"].astype(str).tolist()\n   X_tool_text = (\n       tool_classifier_df[\"context\"]\n       
.fillna(\"\")\n       .astype(str)\n       .map(lambda text: text[:12000])\n       .tolist()\n   )\n   if len(set(y_tool)) >= 2:\n       train_indices, test_indices = stratified_train_test_indices(y_tool, test_size=0.2, seed=SEED)\n       X_train = [X_tool_text[i] for i in train_indices]\n       y_train = [y_tool[i] for i in train_indices]\n       X_test = [X_tool_text[i] for i in test_indices]\n       y_test = [y_tool[i] for i in test_indices]\n       tool_classifier = PureMultinomialNB(\n           max_features=20000,\n           min_df=2,\n           alpha=1.0,\n       )\n       tool_classifier.fit(X_train, y_train)\n       tool_predictions = tool_classifier.predict(X_test)\n       tool_metrics, tool_report_df = evaluate_predictions(y_test, tool_predictions)\n       tool_matrix_df = confusion_matrix_df(y_test, tool_predictions)\n       tool_metrics[\"train_rows\"] = len(X_train)\n       tool_metrics[\"test_rows\"] = len(X_test)\n       tool_metrics[\"vocab_size\"] = len(tool_classifier.vocab)\n       rprint(\"[bold]Tool classifier report:[/bold]\")\n       display(tool_report_df)\n       display(tool_matrix_df)\n       tool_report_df.to_csv(OUT_DIR / \"tool_name_classifier_report.csv\", index=False)\n       tool_matrix_df.to_csv(OUT_DIR / \"tool_name_confusion_matrix.csv\")\n       top_tool_token_records = []\n       for label in tool_classifier.labels:\n           for token, margin in tool_classifier.top_tokens_for_class(label, n=25):\n               top_tool_token_records.append(\n                   {\n                       \"label\": label,\n                       \"token\": token,\n                       \"score_margin\": margin,\n                   }\n               )\n       pd.DataFrame(top_tool_token_records).to_csv(\n           OUT_DIR / \"tool_name_top_tokens.csv\",\n           index=False,\n       )\n       with open(\n           OUT_DIR / \"tool_name_classifier_metrics.json\",\n           \"w\",\n           encoding=\"utf-8\",\n       ) as file:\n   
        json.dump(tool_metrics, file, ensure_ascii=False, indent=2)\n       model_artifacts[\"tool_name_classifier_metrics\"] = str(\n           OUT_DIR / \"tool_name_classifier_metrics.json\"\n       )\n       model_artifacts[\"tool_name_classifier_report\"] = str(\n           OUT_DIR / \"tool_name_classifier_report.csv\"\n       )\n       model_artifacts[\"tool_name_confusion_matrix\"] = str(\n           OUT_DIR / \"tool_name_confusion_matrix.csv\"\n       )\n       model_artifacts[\"tool_name_top_tokens\"] = str(\n           OUT_DIR / \"tool_name_top_tokens.csv\"\n       )\n   else:\n       rprint(\"[yellow]Skipping tool classifier because labels collapsed to one class.[/yellow]\")\n       tool_metrics = {}\nelse:\n   rprint(\n       \"[yellow]Skipping tool classifier because there are too few tool-use \"\n       \"rows or tool classes.[/yellow]\"\n   )\n   tool_metrics = {}\nrprint(Panel.fit(\"[bold]Building simple keyword search helper[/bold]\"))\ndef search_rows(keyword, limit=5, search_cols=(\"context\", \"cot\", \"completion\", \"text_payload\")):\n   keyword = str(keyword).lower()\n   mask = pd.Series(False, index=df.index)\n   for column in search_cols:\n       mask = mask | (\n           df[column]\n           .fillna(\"\")\n           .astype(str)\n           .str.lower()\n           .str.contains(re.escape(keyword), regex=True)\n       )\n   hits = df[mask].head(limit)\n   results = []\n   for _, row in hits.iterrows():\n       results.append(\n           {\n               \"uid\": row.get(\"uid\"),\n               \"session\": row.get(\"session\"),\n               \"output_type\": row.get(\"output_type\"),\n               \"tool_name\": row.get(\"tool_name\"),\n               \"context_preview\": preview_text(row.get(\"context\"), 400),\n               \"payload_preview\": preview_text(row.get(\"text_payload\"), 400),\n           }\n       )\n   return results\nexample_queries = [\n   \"Bash\",\n   \"Write\",\n   \"browser\",\n   \"test\",\n   
\"README\",\n]\nsearch_demo = {\n   query: search_rows(query, limit=2)\n   for query in example_queries\n}\nwith open(\n   OUT_DIR / \"keyword_search_demo.json\",\n   \"w\",\n   encoding=\"utf-8\",\n) as file:\n   json.dump(search_demo, file, ensure_ascii=False, indent=2)\nrprint(\"[bold]Example keyword search results:[/bold]\")\nrprint(safe_json_dumps(search_demo, max_chars=5000))\nsummary = {\n   \"dataset_id\": DATASET_ID,\n   \"flat_jsonl_filename\": FLAT_JSONL_FILENAME,\n   \"output_directory\": str(OUT_DIR),\n   \"repo_file_summary\": file_summary,\n   \"rows\": int(len(df)),\n   \"columns\": list(df.columns),\n   \"output_type_distribution\": (\n       df[\"output_type\"]\n       .fillna(\"missing\")\n       .value_counts()\n       .to_dict()\n   ),\n   \"top_tools\": (\n       df.loc[df[\"output_type\"].eq(\"tool_use\"), \"tool_name\"]\n       .replace(\"\", \"unknown\")\n       .value_counts()\n       .head(20)\n       .to_dict()\n   ),\n   \"top_source_roots\": (\n       df[\"source_root\"]\n       .fillna(\"unknown\")\n       .value_counts()\n       .head(20)\n       .to_dict()\n   ),\n   \"length_summary\": {\n       column: {\n           \"mean\": float(df[column].mean()),\n           \"median\": float(df[column].median()),\n           \"p90\": float(df[column].quantile(0.90)),\n           \"p95\": float(df[column].quantile(0.95)),\n           \"max\": int(df[column].max()),\n       }\n       for column in [\n           \"context_chars\",\n           \"cot_chars\",\n           \"completion_chars\",\n           \"text_payload_chars\",\n       ]\n   },\n   \"possible_secret_rows\": int(df[\"possible_secret_anywhere\"].sum()),\n   \"plots\": plot_paths,\n   \"model_artifacts\": model_artifacts,\n   \"safe_exports\": {\n       \"train\": str(OUT_DIR / \"fable5_no_cot_chat_train.jsonl\"),\n       \"validation\": str(OUT_DIR / \"fable5_no_cot_chat_validation.jsonl\"),\n       \"test\": str(OUT_DIR / \"fable5_no_cot_chat_test.jsonl\"),\n   },\n   
\"analysis_files\": {\n       \"csv\": str(OUT_DIR / \"fable5_analysis_index.csv\"),\n       \"pickle\": str(OUT_DIR / \"fable5_analysis_index.pkl\"),\n       \"keyword_search_demo\": str(OUT_DIR / \"keyword_search_demo.json\"),\n   },\n}\nwith open(\n   OUT_DIR / \"analysis_summary.json\",\n   \"w\",\n   encoding=\"utf-8\",\n) as file:\n   json.dump(clean_for_json(summary), file, ensure_ascii=False, indent=2, default=str)\nFENCE = chr(96) * 3\nreport_md = (\n   \"# Fable 5 Traces Advanced Tutorial Report\\n\\n\"\n   \"## Dataset\\n\\n\"\n   f\"- Dataset: `{DATASET_ID}`\\n\"\n   f\"- Flat JSONL: `{FLAT_JSONL_FILENAME}`\\n\"\n   f\"- Rows loaded: `{len(df):,}`\\n\"\n   f\"- Unique source sessions: `{df['session'].nunique(dropna=True):,}`\\n\"\n   f\"- Unique models: `{df['model'].nunique(dropna=True):,}`\\n\\n\"\n   \"## Important safety note\\n\\n\"\n   \"This tutorial treats the dataset as agent telemetry. It previews and analyzes commands, \"\n   \"tool calls, file edits, and transcript text, but it never executes commands found inside \"\n   \"the traces.\\n\\n\"\n   f\"Potential secret-like patterns detected: `{int(df['possible_secret_anywhere'].sum()):,}` rows.\\n\"\n   \"Exports redact common API-key/token-like patterns.\\n\\n\"\n   \"## Output type distribution\\n\\n\"\n   f\"{FENCE}json\\n\"\n   f\"{json.dumps(clean_for_json(summary['output_type_distribution']), indent=2, ensure_ascii=False)}\\n\"\n   f\"{FENCE}\\n\\n\"\n   \"## Top tools\\n\\n\"\n   f\"{FENCE}json\\n\"\n   f\"{json.dumps(clean_for_json(summary['top_tools']), indent=2, ensure_ascii=False)}\\n\"\n   f\"{FENCE}\\n\\n\"\n   \"## Saved files\\n\\n\"\n   \"- `analysis_summary.json`\\n\"\n   \"- `fable5_analysis_index.csv`\\n\"\n   \"- `fable5_analysis_index.pkl`\\n\"\n   \"- `fable5_no_cot_chat_train.jsonl`\\n\"\n   \"- `fable5_no_cot_chat_validation.jsonl`\\n\"\n   \"- `fable5_no_cot_chat_test.jsonl`\\n\"\n   \"- plot PNG files\\n\"\n   \"- baseline classifier metrics, when enough rows/classes 
are available\\n\\n\"\n   \"## Recommended next steps\\n\\n\"\n   \"1. Inspect `fable5_no_cot_chat_train.jsonl` before any fine-tuning.\\n\"\n   \"2. Keep the dataset license in mind before model training or redistribution.\\n\"\n   \"3. Avoid training directly on raw terminal outputs without additional privacy and safety filtering.\\n\"\n   \"4. Start with the no-CoT chat export unless your research explicitly requires reasoning-trace supervision.\\n\"\n)\nwith open(\n   OUT_DIR / \"REPORT.md\",\n   \"w\",\n   encoding=\"utf-8\",\n) as file:\n   file.write(report_md)\nrprint(\n   Panel.fit(\n       f\"[bold green]Tutorial complete.[/bold green]\\n\\n\"\n       f\"Artifacts saved in:\\n{OUT_DIR}\\n\\n\"\n       f\"Key files:\\n\"\n       f\"- {OUT_DIR / 'REPORT.md'}\\n\"\n       f\"- {OUT_DIR / 'analysis_summary.json'}\\n\"\n       f\"- {OUT_DIR / 'fable5_no_cot_chat_train.jsonl'}\\n\"\n       f\"- {OUT_DIR / 'fable5_analysis_index.csv'}\",\n       title=\"Done\",\n   )\n)\ndisplay(\n   pd.DataFrame(\n       {\n           \"artifact\": [\n               \"Report\",\n               \"Summary JSON\",\n               \"No-CoT train export\",\n               \"No-CoT validation export\",\n               \"No-CoT test export\",\n               \"Analysis CSV\",\n               \"Analysis pickle\",\n               \"Keyword search demo\",\n           ],\n           \"path\": [\n               str(OUT_DIR / \"REPORT.md\"),\n               str(OUT_DIR / \"analysis_summary.json\"),\n               str(OUT_DIR / \"fable5_no_cot_chat_train.jsonl\"),\n               str(OUT_DIR / \"fable5_no_cot_chat_validation.jsonl\"),\n               str(OUT_DIR / \"fable5_no_cot_chat_test.jsonl\"),\n               str(OUT_DIR / \"fable5_analysis_index.csv\"),\n               str(OUT_DIR / \"fable5_analysis_index.pkl\"),\n               str(OUT_DIR / \"keyword_search_demo.json\"),\n           ],\n       }\n   )\n)\n```\n\nWe train a baseline model to predict whether the assistant’s output 
is text or a tool call based on the trace context. We also train a second baseline that predicts the likely tool name for tool-use rows and save the evaluation artifacts. We finish by adding keyword search, writing the final summary JSON and Markdown report, and displaying the saved tutorial outputs.\n\n**Conclusion**\n\nIn conclusion, we have a practical and reliable workflow for exploring Fable 5 Traces without depending on packages that may break in a Colab runtime. We moved from raw Hugging Face files to structured analysis tables, safe previews, plots, searchable examples, cleaned chat-style exports, and baseline modeling artifacts. We treated the traces as agent telemetry, so we redacted possible secrets, avoided executing any commands from the dataset, and kept the chain of thought out of the default training export.\n\nSana Hassan, a consulting intern at Marktechpost and dual-degree student at IIT Madras, is passionate about applying technology and AI to address real-world challenges. 
With a keen interest in solving practical problems, he brings a fresh perspective to the intersection of AI and real-life solutions.", "url": "https://wpnews.pro/news/building-a-stable-fable-5-traces-workflow-in-colab-parsing-tool-calls-auditing", "canonical_source": "https://www.marktechpost.com/2026/06/28/building-a-stable-fable-5-traces-workflow-in-colab-parsing-tool-calls-auditing-data-and-training-baselines/", "published_at": "2026-06-28 07:02:54+00:00", "updated_at": "2026-06-28 07:12:08.291466+00:00", "lang": "en", "topics": ["ai-agents", "machine-learning", "natural-language-processing", "developer-tools"], "entities": ["Hugging Face", "Fable 5 Traces", "Google Colab", "Naive Bayes"], "alternates": {"html": "https://wpnews.pro/news/building-a-stable-fable-5-traces-workflow-in-colab-parsing-tool-calls-auditing", "markdown": "https://wpnews.pro/news/building-a-stable-fable-5-traces-workflow-in-colab-parsing-tool-calls-auditing.md", "text": "https://wpnews.pro/news/building-a-stable-fable-5-traces-workflow-in-colab-parsing-tool-calls-auditing.txt", "jsonld": "https://wpnews.pro/news/building-a-stable-fable-5-traces-workflow-in-colab-parsing-tool-calls-auditing.jsonld"}}