Building a Stable Fable 5 Traces Workflow in Colab: Parsing Tool Calls, Auditing Data, and Training Baselines

A tutorial demonstrates building a stable workflow for the Fable 5 Traces dataset in Google Colab, including parsing tool calls, auditing data for secrets, and training Naive Bayes baselines to predict assistant output types and tool usage from trace context.

In this tutorial, we work with the Fable 5 Traces dataset (https://huggingface.co/datasets/Glint-Research/Fable-5-traces) from Hugging Face and build a complete workflow around real coding-agent trace data. We start by setting up a lightweight environment that avoids fragile dependencies such as datasets, scikit-learn, and scipy. Then we manually download and parse the merged JSONL file to keep the notebook stable in Colab. From there, we inspect repository files, preview raw trace examples, normalize tool calls and text outputs, audit the dataset structure, detect potential secret-like patterns, and visualize key distributions, including output types, tools, source roots, and text lengths. We also create safe no-CoT chat/SFT exports, build a simple keyword-search helper, and train pure-Python Naive Bayes baselines to assess whether trace context can predict the assistant's output type and tool usage.

Setting Up the Fable 5 Traces Colab Environment and Helpers

```python
import os
import sys
import json
import re
import math
import random
import subprocess
from pathlib import Path
from collections import Counter, defaultdict


def install_packages():
    # Install only the lightweight packages this workflow needs.
    packages = [
        "huggingface_hub>=0.23.0",
        "rich>=13.0.0",
        "tqdm>=4.66.0",
    ]
    subprocess.run(
        [
            sys.executable, "-m", "pip", "install", "-q", "-U",
            "--upgrade-strategy", "only-if-needed",
            *packages,
        ],
        check=False,
    )


install_packages()

import pandas as pd
import matplotlib.pyplot as plt

try:
    import numpy as np
except Exception:
    np = None

from tqdm.auto import tqdm
from rich import print as rprint
from rich.panel import Panel
from rich.table import Table
from huggingface_hub import HfApi, hf_hub_download
from IPython.display import display

DATASET_ID = "Glint-Research/Fable-5-traces"
FLAT_JSONL_FILENAME = "fable5_cot_merged.jsonl"
OUT_DIR = Path("/content/fable5_traces_tutorial_outputs")
OUT_DIR.mkdir(parents=True, exist_ok=True)

SEED = 42
random.seed(SEED)
if np is not None:
    np.random.seed(SEED)

MAX_PREVIEW_CHARS = 900
N_AGENT_TRACE_PREVIEWS = 2
N_SAFE_DATASET_PREVIEWS = 3
SAVE_COT_RESEARCH_EXPORT = False
MAX_ROWS_TO_LOAD = None

rprint(
    Panel.fit(
        f"[bold]Fable 5 Traces Advanced Tutorial[/bold]\n"
        f"Dataset: {DATASET_ID}\n"
        f"Output directory: {OUT_DIR}\n"
        f"Manual JSONL loading: True\n"
        f"CoT research export enabled: {SAVE_COT_RESEARCH_EXPORT}",
        title="Setup",
    )
)

# Common shapes of API keys and tokens; matches are treated as "possible" secrets.
SECRET_PATTERNS = [
    r"sk-[A-Za-z0-9_\-]{20,}",
    r"hf_[A-Za-z0-9_\-]{20,}",
    r"github_pat_[A-Za-z0-9_]{20,}",
    r"ghp_[A-Za-z0-9_]{20,}",
    r"xox[baprs]-[A-Za-z0-9\-]{20,}",
    r"AKIA[0-9A-Z]{16}",
    r"(?i:(api[_-]?key|secret|token|password)\s*[:=]\s*['\"]?[^'\"\s]{8,})",
]
SECRET_RE = re.compile("|".join(f"(?:{pattern})" for pattern in SECRET_PATTERNS))

# Identifiers, path-like separators, and bracket/operator runs.
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z_0-9]{1,}|[./\\-]{2,}|[{}\[\]():=<>]+")


def safe_json_dumps(obj, max_chars=None):
    try:
        text = json.dumps(obj, ensure_ascii=False, indent=2, default=str)
    except Exception:
        text = str(obj)
    if max_chars is not None and len(text) > max_chars:
        return text[:max_chars] + "\n...[truncated]"
    return text


def is_missing_scalar(value):
    if value is None:
        return True
    if isinstance(value, (list, dict, tuple, set)):
        return False
    try:
        return bool(pd.isna(value))
    except Exception:
        return False


def clean_for_json(value):
    # Recursively convert NaN and NumPy types into JSON-friendly values.
    if is_missing_scalar(value):
        return None
    if isinstance(value, dict):
        return {str(k): clean_for_json(v) for k, v in value.items()}
    if isinstance(value, list):
        return [clean_for_json(v) for v in value]
    if isinstance(value, tuple):
        return [clean_for_json(v) for v in value]
    if np is not None:
        if isinstance(value, np.integer):
            return int(value)
        if isinstance(value, np.floating):
            if math.isnan(float(value)):
                return None
            return float(value)
        if isinstance(value, np.ndarray):
            return value.tolist()
    return value


def redact_possible_secrets(text):
    if text is None:
        return ""
    text = str(text)
    return SECRET_RE.sub("[REDACTED_POSSIBLE_SECRET]", text)


def contains_possible_secret(text):
    if text is None:
        return False
    return bool(SECRET_RE.search(str(text)))


def preview_text(text, max_chars=MAX_PREVIEW_CHARS):
    text = redact_possible_secrets(text)
    text = re.sub(r"\s+", " ", text).strip()
    if len(text) > max_chars:
        return text[:max_chars] + " ...[truncated]"
    return text
```

We begin by setting up the Colab environment with only the lightweight packages needed for this workflow. We define the dataset ID, output directory, random seed, preview limits, and export options so the tutorial behaves consistently. We also create the first set of helper functions for safe JSON formatting, secret redaction, missing-value handling, and clean text previews.

Building Parsing Utilities for Tool Calls and Text Outputs

```python
def maybe_parse_json_string(value):
    # Parse strings that look like JSON objects or arrays; leave everything else alone.
    if isinstance(value, str):
        stripped = value.strip()
        if (stripped.startswith("{") and stripped.endswith("}")) or (
            stripped.startswith("[") and stripped.endswith("]")
        ):
            try:
                return json.loads(stripped)
            except Exception:
                return value
    return value


def normalize_output_obj(value):
    return maybe_parse_json_string(value)


def extract_tool_name(output):
    output = normalize_output_obj(output)
    if isinstance(output, dict):
        direct_keys = [
            "name", "tool_name", "tool", "function",
            "command_name", "recipient_name", "toolName", "callee",
        ]
        for key in direct_keys:
            value = output.get(key)
            if isinstance(value, str) and value.strip():
                return value.strip()
        nested_keys = ["tool_call", "toolCall", "function_call", "call", "action"]
        for nested_key in nested_keys:
            nested = output.get(nested_key)
            if isinstance(nested, dict):
                found = extract_tool_name(nested)
                if found:
                    return found
        output_type = output.get("type")
        if isinstance(output_type, str):
            output_type = output_type.strip()
            if output_type and output_type.lower() not in {"tool_use", "text", "message"}:
                return output_type
    return ""


def extract_tool_args(output):
    output = normalize_output_obj(output)
    if isinstance(output, dict):
        direct_arg_keys = [
            "input", "args", "arguments", "parameters", "kwargs", "json", "payload",
        ]
        for key in direct_arg_keys:
            if key in output:
                return output[key]
        nested_keys = ["tool_call", "toolCall", "function_call", "call", "action"]
        for nested_key in nested_keys:
            nested = output.get(nested_key)
            if isinstance(nested, dict):
                args = extract_tool_args(nested)
                if args not in (None, "", {}):
                    return args
        ignored = {
            "name", "tool_name", "tool", "function",
            "command_name", "recipient_name", "toolName", "callee", "type",
        }
        return {key: value for key, value in output.items() if key not in ignored}
    return {}


def extract_text_payload(output):
    output = normalize_output_obj(output)
    # Guard so a missing output does not become the literal string "None" or "nan".
    if is_missing_scalar(output):
        return ""
    if isinstance(output, str):
        return output
    if isinstance(output, dict):
        text_keys = ["text", "content", "message", "output", "value", "result"]
        for key in text_keys:
            value = output.get(key)
            if isinstance(value, str):
                return value
            if isinstance(value, list):
                return safe_json_dumps(value)
            if isinstance(value, dict):
                nested = extract_text_payload(value)
                if nested:
                    return nested
        return safe_json_dumps(output)
    return str(output)


def robust_len(value):
    if value is None:
        return 0
    return len(str(value))


def source_root(source_file):
    # Return the path component that follows a known marker directory.
    source_file = str(source_file or "").replace("\\", "/")
    if not source_file:
        return "unknown"
    parts = [part for part in source_file.split("/") if part]
    for marker in ["projects", "AIArchives", "archives", "claude"]:
        if marker in parts:
            idx = parts.index(marker)
            if idx + 1 < len(parts):
                return parts[idx + 1]
    if len(parts) >= 2:
        return parts[-2]
    if parts:
        return parts[0]
    return "unknown"


def write_jsonl(path, records):
    path = Path(path)
    with path.open("w", encoding="utf-8") as file:
        for record in records:
            file.write(
                json.dumps(clean_for_json(record), ensure_ascii=False, default=str) + "\n"
            )


def save_plot(path):
    path = Path(path)
    plt.tight_layout()
    plt.savefig(path, dpi=160, bbox_inches="tight")
    plt.show()
    plt.close()
    return path


def print_basic_table(title, rows, columns=("Metric", "Value")):
    table = Table(title=title)
    for column in columns:
        table.add_column(str(column))
    for row in rows:
        table.add_row(*[str(item) for item in row])
    rprint(table)


def tokenize(text, max_chars=12000):
    text = str(text or "")[:max_chars].lower()
    return TOKEN_RE.findall(text)


def load_jsonl_manual(path, max_rows=None):
    # Read JSONL line by line and collect unparseable lines instead of failing.
    records = []
    bad_lines = []
    with open(path, "r", encoding="utf-8") as file:
        for line_number, line in tqdm(enumerate(file, start=1), desc="Reading JSONL"):
            line = line.strip()
            if not line:
                continue
            try:
                records.append(json.loads(line))
            except Exception as error:
                bad_lines.append(
                    {
                        "line_number": line_number,
                        "error": repr(error),
                        "preview": line[:500],
                    }
                )
            if max_rows is not None and len(records) >= max_rows:
                break
    return records, bad_lines
```

We build the core parsing utilities that turn raw output fields into usable tool names, tool arguments, and text payloads. We also define helpers for measuring text length, identifying source roots, writing JSONL files, saving plots, and printing clean tables. We finish this snippet by adding tokenization and manual JSONL loading to avoid fragile dataset-loading dependencies.

Inspecting the Hugging Face Repository and Loading JSONL Traces

```python
rprint(Panel.fit("[bold]Inspecting Hugging Face dataset repository[/bold]"))

api = HfApi()
files = api.list_repo_files(repo_id=DATASET_ID, repo_type="dataset")

pi_trace_files = [
    file for file in files
    if file.startswith("pi-traces/") and file.endswith(".jsonl")
]

file_summary = {
    "total_repo_files": len(files),
    "jsonl_files": sum(file.endswith(".jsonl") for file in files),
    "pi_trace_files": len(pi_trace_files),
    "claude_files": sum(file.startswith("claude/") for file in files),
    "has_flat_jsonl": FLAT_JSONL_FILENAME in files,
}

print_basic_table(
    "Repository File Summary",
    [(key, value) for key, value in file_summary.items()],
)

rprint("[bold]Sample repository files:[/bold]")
for file in files[:20]:
    print(" -", file)

rprint(Panel.fit("[bold]Manual raw pi-trace preview[/bold]"))

pi_examples = []

if pi_trace_files:
    for trace_file in pi_trace_files[:N_AGENT_TRACE_PREVIEWS]:
        try:
            local_trace_path = hf_hub_download(
                repo_id=DATASET_ID,
                repo_type="dataset",
                filename=trace_file,
            )
            trace_records, trace_bad_lines = load_jsonl_manual(local_trace_path, max_rows=1)
            if trace_records:
                example = trace_records[0]
                pi_examples.append(example)
                preview_payload = {
                    "trace_file": trace_file,
                    "keys": list(example.keys()),
                    "preview": example,
                }
                rprint(
                    Panel(
                        safe_json_dumps(preview_payload, max_chars=3000),
                        title=f"Raw pi-trace preview: {trace_file}",
                    )
                )
            if trace_bad_lines:
                rprint(f"[yellow]Bad JSONL lines in {trace_file}: {len(trace_bad_lines)}[/yellow]")
        except Exception as error:
            rprint(f"[yellow]Could not preview {trace_file}[/yellow]")
            rprint(repr(error))
else:
    rprint("[yellow]No pi-traces JSONL files found.[/yellow]")

rprint(Panel.fit("[bold]Downloading flat merged JSONL from Hugging Face Hub[/bold]"))

flat_path = hf_hub_download(
    repo_id=DATASET_ID,
    repo_type="dataset",
    filename=FLAT_JSONL_FILENAME,
)
rprint(f"[green]Downloaded flat file:[/green] {flat_path}")

rprint(Panel.fit("[bold]Loading flat JSONL manually[/bold]"))

records, bad_lines = load_jsonl_manual(flat_path, max_rows=MAX_ROWS_TO_LOAD)

if bad_lines:
    bad_lines_path = OUT_DIR / "bad_jsonl_lines.json"
    with open(bad_lines_path, "w", encoding="utf-8") as file:
        json.dump(bad_lines, file, ensure_ascii=False, indent=2)
    rprint(f"[yellow]Bad JSONL lines found: {len(bad_lines)} -> {bad_lines_path}[/yellow]")

df = pd.DataFrame.from_records(records)

rprint(f"[green]Loaded rows:[/green] {len(df):,}")
rprint(f"[green]DataFrame shape:[/green] {df.shape}")
rprint("[bold]Columns:[/bold]")
print(list(df.columns))
display(df.head(3))

# Make sure every expected column exists so later steps never hit a KeyError.
expected_cols = [
    "uid", "source_file", "session", "model", "context",
    "cot", "output_type", "output", "completion", "origin",
]
for column in expected_cols:
    if column not in df.columns:
        df[column] = None

df["output_norm"] = df["output"].map(normalize_output_obj)
df["tool_name"] = df["output_norm"].map(extract_tool_name)
df["tool_args"] = df["output_norm"].map(extract_tool_args)
df["text_payload"] = df["output_norm"].map(extract_text_payload)

df["context_chars"] = df["context"].map(robust_len)
df["cot_chars"] = df["cot"].map(robust_len)
df["completion_chars"] = df["completion"].map(robust_len)
df["text_payload_chars"] = df["text_payload"].map(robust_len)
df["source_root"] = df["source_file"].map(source_root)

df["possible_secret_in_context"] = df["context"].map(contains_possible_secret)
df["possible_secret_in_completion"] = df["completion"].map(contains_possible_secret)
df["possible_secret_anywhere"] = (
    df["possible_secret_in_context"] | df["possible_secret_in_completion"]
)
```

We inspect the Hugging Face dataset repository and summarize the number of files, JSONL traces, and flat merged files available. We manually preview a few raw pi-trace files to understand the structure without relying on the datasets library. We then download the merged JSONL file, load it into a DataFrame, and normalize key fields for later analysis.

Auditing Dataset Structure and Visualizing Trace Distributions

```python
audit_rows = [
    ("rows", len(df)),
    ("columns", len(df.columns)),
    ("unique_uid", df["uid"].nunique(dropna=True)),
    ("duplicate_uid_rows", int(df["uid"].duplicated().sum())),
    ("unique_sessions", df["session"].nunique(dropna=True)),
    ("unique_models", df["model"].nunique(dropna=True)),
    ("missing_context", int(df["context"].isna().sum())),
    ("missing_cot", int(df["cot"].isna().sum())),
    ("missing_output", int(df["output"].isna().sum())),
    ("rows_with_possible_secret_pattern", int(df["possible_secret_anywhere"].sum())),
    ("median_context_chars", round(float(df["context_chars"].median()), 2)),
    ("median_cot_chars", round(float(df["cot_chars"].median()), 2)),
    ("median_completion_chars", round(float(df["completion_chars"].median()), 2)),
    ("max_completion_chars", int(df["completion_chars"].max())),
]
print_basic_table("Flat JSONL Audit", audit_rows)

rprint("\n[bold]Output type distribution:[/bold]")
display(df["output_type"].value_counts(dropna=False).to_frame("rows"))

rprint("\n[bold]Model distribution:[/bold]")
display(df["model"].value_counts(dropna=False).to_frame("rows").head(20))

rprint("\n[bold]Origin distribution:[/bold]")
display(df["origin"].value_counts(dropna=False).to_frame("rows"))

rprint("\n[bold]Top source roots:[/bold]")
display(df["source_root"].value_counts().head(20).to_frame("rows"))

rprint("\n[bold]Top tool names:[/bold]")
display(
    df.loc[df["output_type"].eq("tool_use"), "tool_name"]
    .replace("", pd.NA)
    .value_counts(dropna=False)
    .head(25)
    .to_frame("rows")
)

rprint(
    Panel.fit(
        "[bold]Safe previews[/bold]\n"
        "These previews redact common secret-like patterns and never execute trace commands."
    )
)

sample_df = df.sample(
    n=min(N_SAFE_DATASET_PREVIEWS, len(df)),
    random_state=SEED,
).reset_index(drop=True)

for index, row in sample_df.iterrows():
    payload = {
        "uid": row.get("uid"),
        "session": row.get("session"),
        "model": row.get("model"),
        "origin": row.get("origin"),
        "output_type": row.get("output_type"),
        "tool_name": row.get("tool_name"),
        "context_preview": preview_text(row.get("context")),
        "cot_preview": preview_text(row.get("cot")),
        "text_or_tool_payload_preview": preview_text(row.get("text_payload")),
    }
    rprint(
        Panel(
            safe_json_dumps(payload, max_chars=4000),
            title=f"Safe Row Preview {index}",
        )
    )

rprint(Panel.fit("[bold]Creating plots[/bold]"))

plot_paths = {}

output_counts = df["output_type"].fillna("missing").value_counts()
plt.figure(figsize=(8, 5))
output_counts.plot(kind="bar")
plt.title("Output Type Distribution")
plt.xlabel("Output Type")
plt.ylabel("Rows")
plt.xticks(rotation=25, ha="right")
plot_paths["output_type_distribution"] = str(
    save_plot(OUT_DIR / "output_type_distribution.png")
)

tool_counts = (
    df.loc[df["output_type"].eq("tool_use"), "tool_name"]
    .replace("", "unknown")
    .value_counts()
    .head(20)
)
if len(tool_counts) > 0:
    plt.figure(figsize=(9, 6))
    tool_counts.sort_values().plot(kind="barh")
    plt.title("Top Tool Names")
    plt.xlabel("Rows")
    plt.ylabel("Tool")
    plot_paths["top_tools"] = str(save_plot(OUT_DIR / "top_tools.png"))
else:
    rprint("[yellow]No tool-use rows found for tool plot.[/yellow]")

source_counts = df["source_root"].fillna("unknown").value_counts().head(20)
plt.figure(figsize=(9, 6))
source_counts.sort_values().plot(kind="barh")
plt.title("Top Source Roots")
plt.xlabel("Rows")
plt.ylabel("Source Root")
plot_paths["top_source_roots"] = str(save_plot(OUT_DIR / "top_source_roots.png"))

length_cols = ["context_chars", "cot_chars", "completion_chars", "text_payload_chars"]
for column in length_cols:
    plt.figure(figsize=(8, 5))
    # Clip at the 99th percentile so a few very long rows do not flatten the histogram.
    clipped = df[column].clip(upper=df[column].quantile(0.99))
    plt.hist(clipped, bins=50)
    plt.title(f"{column} Distribution, Clipped at P99")
    plt.xlabel("Characters")
    plt.ylabel("Rows")
    plot_paths[f"{column}_histogram"] = str(save_plot(OUT_DIR / f"{column}_histogram.png"))
```

We audit the dataset by checking row counts, unique sessions, duplicate IDs, missing fields, text lengths, and possible secret-like patterns. We display important distributions across output types, models, origins, source roots, and tool names to understand the data's shape. We also create safe previews and visual plots so we can inspect the traces without executing any commands.
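To make the secret-audit step concrete, here is a minimal, self-contained sketch of pattern-based detection and redaction. The two patterns and the names `DEMO_PATTERNS`, `demo_contains_secret`, and `demo_redact` are illustrative assumptions for this example only; the tutorial's own pattern list is longer, and a regex match indicates only a possible secret, not a confirmed one.

```python
import re

# Illustrative patterns only (an assumption for this sketch, not the full list).
DEMO_PATTERNS = [
    r"sk-[A-Za-z0-9_\-]{20,}",  # "sk-" prefix followed by a long key body
    r"AKIA[0-9A-Z]{16}",        # AWS-style access key ID shape
]
DEMO_SECRET_RE = re.compile("|".join(f"(?:{p})" for p in DEMO_PATTERNS))


def demo_contains_secret(text):
    """Return True when any demo pattern matches the text."""
    return bool(DEMO_SECRET_RE.search(str(text or "")))


def demo_redact(text, placeholder="[REDACTED]"):
    """Replace every match with a fixed placeholder before display or export."""
    return DEMO_SECRET_RE.sub(placeholder, str(text or ""))


# A synthetic string that only has the shape of a key.
sample = "export KEY=sk-" + "a" * 24 + " && pytest"
redacted = demo_redact(sample)
print(redacted)
```

Running detection once to flag rows and redaction again at preview or export time keeps the audit count and the displayed text consistent with each other.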
Projecting Traces and Exporting Safe No-CoT Chat Datasets

```python
rprint(Panel.fit("[bold]Creating pure NumPy TF-IDF-style projection[/bold]"))

if np is not None:
    try:
        projection_sample = df.sample(n=min(1000, len(df)), random_state=SEED).copy()
        projection_texts = projection_sample["context"].fillna("").astype(str).tolist()
        doc_tokens = [tokenize(text, max_chars=8000) for text in projection_texts]

        # Document frequency: the number of documents containing each token.
        doc_freq = Counter()
        for tokens in doc_tokens:
            doc_freq.update(set(tokens))

        vocab_items = [
            item for item in doc_freq.items()
            if item[1] >= 2 and len(item[0]) > 1
        ]
        vocab_items = sorted(vocab_items, key=lambda item: item[1], reverse=True)[:1000]
        vocab = {token: idx for idx, (token, _) in enumerate(vocab_items)}

        if len(vocab) >= 3 and len(doc_tokens) >= 10:
            X = np.zeros((len(doc_tokens), len(vocab)), dtype=np.float32)
            df_counts = np.zeros(len(vocab), dtype=np.float32)

            for row_idx, tokens in enumerate(doc_tokens):
                counts = Counter(token for token in tokens if token in vocab)
                for token, count in counts.items():
                    col_idx = vocab[token]
                    X[row_idx, col_idx] = float(count)
                for token in counts.keys():
                    df_counts[vocab[token]] += 1.0

            # Smoothed IDF, L2 row normalization, then column centering before SVD.
            idf = np.log((1.0 + len(doc_tokens)) / (1.0 + df_counts)) + 1.0
            X = X * idf.reshape(1, -1)

            row_norms = np.linalg.norm(X, axis=1, keepdims=True)
            row_norms[row_norms == 0] = 1.0
            X = X / row_norms
            X = X - X.mean(axis=0, keepdims=True)

            U, S, Vt = np.linalg.svd(X, full_matrices=False)
            coords = U[:, :2] * S[:2]

            projection_sample["svd_x"] = coords[:, 0]
            projection_sample["svd_y"] = coords[:, 1]
            projection_sample["plot_label"] = (
                projection_sample["output_type"].fillna("missing").astype(str)
            )

            plt.figure(figsize=(8, 6))
            for label, part in projection_sample.groupby("plot_label"):
                plt.scatter(
                    part["svd_x"],
                    part["svd_y"],
                    s=12,
                    alpha=0.65,
                    label=label,
                )
            plt.title("Context Projection with Pure NumPy TF-IDF + SVD")
            plt.xlabel("SVD component 1")
            plt.ylabel("SVD component 2")
            plt.legend()
            plot_paths["tfidf_svd_projection"] = str(
                save_plot(OUT_DIR / "tfidf_svd_projection.png")
            )

            projection_sample[
                ["uid", "output_type", "tool_name", "source_root", "svd_x", "svd_y"]
            ].to_csv(
                OUT_DIR / "tfidf_svd_projection_points.csv",
                index=False,
            )
            pd.DataFrame(
                vocab_items,
                columns=["token", "document_frequency"],
            ).to_csv(
                OUT_DIR / "projection_vocabulary.csv",
                index=False,
            )
        else:
            rprint("[yellow]Skipping projection because vocabulary or row count is too small.[/yellow]")
    except Exception as error:
        rprint("[yellow]Projection failed, but the rest of the tutorial will continue.[/yellow]")
        rprint(repr(error))
else:
    rprint("[yellow]NumPy is not available, so projection is skipped.[/yellow]")

rprint(Panel.fit("[bold]Creating safe no-CoT chat/SFT exports[/bold]"))

SYSTEM_PROMPT = (
    "You are a coding agent. Given the user's context and prior transcript, "
    "produce the next assistant action. If a tool call is needed, return a structured tool call JSON. "
    "Do not expose hidden reasoning."
)


def make_no_cot_target(row):
    # Tool-use rows become a structured tool-call JSON; other rows use the text payload.
    output_type = str(row.get("output_type") or "")
    if output_type == "tool_use":
        tool_name = row.get("tool_name") or "unknown_tool"
        tool_args = row.get("tool_args")
        return json.dumps(
            {
                "type": "tool_call",
                "tool_name": tool_name,
                "arguments": tool_args,
            },
            ensure_ascii=False,
            default=str,
        )
    payload = row.get("text_payload")
    if payload is None or str(payload).strip() == "":
        payload = row.get("completion", "")
    return str(payload)


def make_chat_record(row, include_cot=False):
    user_context = redact_possible_secrets(row.get("context", ""))
    target = redact_possible_secrets(make_no_cot_target(row))
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_context},
        {"role": "assistant", "content": target},
    ]
    record = {
        "uid": row.get("uid"),
        "session": row.get("session"),
        "model": row.get("model"),
        "origin": row.get("origin"),
        "output_type": row.get("output_type"),
        "tool_name": row.get("tool_name"),
        "messages": messages,
    }
    if include_cot:
        record["reasoning_trace"] = redact_possible_secrets(row.get("cot", ""))
    return clean_for_json(record)


# Shuffle once, then cut 90% / 5% / 5% into train, validation, and test.
export_df = df.copy()
export_df = export_df.sample(frac=1.0, random_state=SEED).reset_index(drop=True)

num_rows = len(export_df)
train_end = int(0.90 * num_rows)
validation_end = int(0.95 * num_rows)

splits = {
    "train": export_df.iloc[:train_end],
    "validation": export_df.iloc[train_end:validation_end],
    "test": export_df.iloc[validation_end:],
}

for split_name, split_df in splits.items():
    records = [
        make_chat_record(row, include_cot=False)
        for _, row in split_df.iterrows()
    ]
    output_path = OUT_DIR / f"fable5_no_cot_chat_{split_name}.jsonl"
    write_jsonl(output_path, records)
    rprint(
        f"[green]Saved[/green] {split_name}: "
        f"{len(records)} records -> {output_path}"
    )

if SAVE_COT_RESEARCH_EXPORT:
    cot_records = [
        make_chat_record(row, include_cot=True)
        for _, row in export_df.iterrows()
    ]
    cot_path = OUT_DIR / "fable5_cot_research_export.jsonl"
    write_jsonl(cot_path, cot_records)
    rprint(f"[yellow]Saved CoT-preserving research export:[/yellow] {cot_path}")
else:
    rprint(
        "[cyan]Skipped CoT-preserving export because "
        "SAVE_COT_RESEARCH_EXPORT=False.[/cyan]"
    )

analysis_cols = [
    "uid", "session", "model", "origin", "source_file", "source_root",
    "output_type", "tool_name", "context_chars", "cot_chars",
    "completion_chars", "text_payload_chars", "possible_secret_anywhere",
]
analysis_df = df[analysis_cols].copy()
analysis_df.to_csv(
    OUT_DIR / "fable5_analysis_index.csv",
    index=False,
)
analysis_df.to_pickle(
    OUT_DIR / "fable5_analysis_index.pkl",
)

rprint(f"[green]Saved analysis CSV:[/green] {OUT_DIR / 'fable5_analysis_index.csv'}")
rprint(f"[green]Saved analysis pickle:[/green] {OUT_DIR / 'fable5_analysis_index.pkl'}")
```

We create a pure NumPy TF-IDF-style projection to visualize trace contexts without using scikit-learn or scipy. We then prepare safe no-CoT chat-style exports that turn each trace into a structured system, user, and assistant message format. We save the train, validation, and test splits as JSONL files and write an analysis index as CSV and pickle, so the dataset is easier to inspect, reuse, and fine-tune on.
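The export logic described above can be sketched on toy data as follows. This is a simplified illustration under stated assumptions: `demo_chat_record` and `demo_split` are hypothetical names, the toy rows are synthetic, and secret redaction is omitted here for brevity. The sketch shows only two ideas: the reasoning field never enters the record, and a single seeded shuffle is cut 90/5/5.

```python
import json
import random


def demo_chat_record(row, system_prompt="You are a coding agent."):
    """Build a system/user/assistant record that omits the reasoning trace."""
    if row.get("output_type") == "tool_use":
        # Tool-use rows are serialized as a structured tool-call JSON string.
        target = json.dumps(
            {
                "type": "tool_call",
                "tool_name": row.get("tool_name") or "unknown_tool",
                "arguments": row.get("tool_args") or {},
            },
            ensure_ascii=False,
        )
    else:
        target = str(row.get("text_payload") or "")
    return {
        "uid": row.get("uid"),
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": str(row.get("context") or "")},
            {"role": "assistant", "content": target},
        ],
    }


def demo_split(records, seed=42):
    """Shuffle once with a fixed seed, then cut into 90% / 5% / 5%."""
    records = list(records)
    random.Random(seed).shuffle(records)
    train_end = int(0.90 * len(records))
    validation_end = int(0.95 * len(records))
    return {
        "train": records[:train_end],
        "validation": records[train_end:validation_end],
        "test": records[validation_end:],
    }


# Synthetic rows: odd uids are tool calls, even uids are text replies.
toy_rows = [
    {
        "uid": i,
        "context": f"step {i}",
        "output_type": "tool_use" if i % 2 else "text",
        "tool_name": "Bash",
        "tool_args": {"command": "ls"},
        "text_payload": f"reply {i}",
        "cot": "hidden reasoning",
    }
    for i in range(100)
]
splits = demo_split(demo_chat_record(row) for row in toy_rows)
print({name: len(part) for name, part in splits.items()})
```

Because the split is a plain random cut, rows from the same session can land in different splits; grouping by session before splitting would be a stricter alternative if leakage across splits matters for a given experiment.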
Implementing Pure-Python Naive Bayes Classification Utilities

```python
def stratified_train_test_indices(labels, test_size=0.2, seed=SEED):
    # Split indices per label so every class with 2+ rows appears in both sets.
    rng = random.Random(seed)
    label_to_indices = defaultdict(list)
    for idx, label in enumerate(labels):
        label_to_indices[label].append(idx)

    train_indices = []
    test_indices = []
    for label, indices in label_to_indices.items():
        indices = indices[:]
        rng.shuffle(indices)
        if len(indices) <= 1:
            train_indices.extend(indices)
            continue
        n_test = max(1, int(round(len(indices) * test_size)))
        if n_test >= len(indices):
            n_test = len(indices) - 1
        test_indices.extend(indices[:n_test])
        train_indices.extend(indices[n_test:])

    rng.shuffle(train_indices)
    rng.shuffle(test_indices)
    return train_indices, test_indices


class PureMultinomialNB:
    def __init__(self, max_features=20000, min_df=2, alpha=1.0):
        self.max_features = max_features
        self.min_df = min_df
        self.alpha = alpha
        self.vocab = {}
        self.labels = []
        self.class_log_prior = {}
        self.feature_log_prob = {}
        self.class_token_totals = {}

    def fit(self, texts, labels):
        texts = list(texts)
        labels = list(labels)

        # Keep the most frequent tokens that appear in at least min_df documents.
        doc_freq = Counter()
        for text in texts:
            doc_freq.update(set(tokenize(text)))

        vocab_items = [item for item in doc_freq.items() if item[1] >= self.min_df]
        vocab_items = sorted(vocab_items, key=lambda item: item[1], reverse=True)
        vocab_items = vocab_items[:self.max_features]
        self.vocab = {token: idx for idx, (token, _) in enumerate(vocab_items)}

        self.labels = sorted(set(labels))
        class_doc_counts = Counter(labels)
        total_docs = len(labels)
        num_classes = len(self.labels)

        token_counts_by_class = {label: Counter() for label in self.labels}
        token_totals_by_class = {label: 0 for label in self.labels}

        for text, label in zip(texts, labels):
            counts = Counter(token for token in tokenize(text) if token in self.vocab)
            token_counts_by_class[label].update(counts)
            token_totals_by_class[label] += sum(counts.values())

        vocab_size = max(len(self.vocab), 1)

        for label in self.labels:
            # Smoothed class prior and per-token likelihoods, stored as logs.
            self.class_log_prior[label] = math.log(
                (class_doc_counts[label] + self.alpha)
                / (total_docs + self.alpha * num_classes)
            )
            denom = token_totals_by_class[label] + self.alpha * vocab_size
            self.class_token_totals[label] = token_totals_by_class[label]
            self.feature_log_prob[label] = {}
            for token in self.vocab:
                count = token_counts_by_class[label][token]
                self.feature_log_prob[label][token] = math.log(
                    (count + self.alpha) / denom
                )
        return self

    def predict_one(self, text):
        counts = Counter(token for token in tokenize(text) if token in self.vocab)
        best_label = None
        best_score = -float("inf")
        for label in self.labels:
            score = self.class_log_prior[label]
            feature_probs = self.feature_log_prob[label]
            for token, count in counts.items():
                score += count * feature_probs.get(token, 0.0)
            if score > best_score:
                best_score = score
                best_label = label
        return best_label

    def predict(self, texts):
        return [self.predict_one(text) for text in texts]

    def top_tokens_for_class(self, label, n=20):
        # Rank tokens by log-probability margin over the average of the other classes.
        if label not in self.feature_log_prob:
            return []
        base_scores = self.feature_log_prob[label]
        other_labels = [item for item in self.labels if item != label]
        rows = []
        for token in self.vocab:
            this_score = base_scores[token]
            if other_labels:
                other_score = sum(
                    self.feature_log_prob[other][token] for other in other_labels
                ) / len(other_labels)
                margin = this_score - other_score
            else:
                margin = this_score
            rows.append((token, margin))
        rows = sorted(rows, key=lambda item: item[1], reverse=True)
        return rows[:n]


def evaluate_predictions(y_true, y_pred):
    labels = sorted(set(y_true) | set(y_pred))
    rows = []
    total_correct = 0
    total = len(y_true)

    for label in labels:
        tp = sum((true == label and pred == label) for true, pred in zip(y_true, y_pred))
        fp = sum((true != label and pred == label) for true, pred in zip(y_true, y_pred))
        fn = sum((true == label and pred != label) for true, pred in zip(y_true, y_pred))
        support = sum(true == label for true in y_true)

        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0

        rows.append(
            {
                "label": label,
                "precision": precision,
                "recall": recall,
                "f1": f1,
                "support": support,
            }
        )
        total_correct += tp

    accuracy = total_correct / total if total else 0.0
    macro_f1 = sum(row["f1"] for row in rows) / len(rows) if rows else 0.0
    weighted_f1 = (
        sum(row["f1"] * row["support"] for row in rows) / total if total else 0.0
    )

    report_df = pd.DataFrame(rows)
    metrics = {
        "accuracy": accuracy,
        "macro_f1": macro_f1,
        "weighted_f1": weighted_f1,
        "labels": labels,
        "rows": rows,
    }
    return metrics, report_df


def confusion_matrix_df(y_true, y_pred):
    labels = sorted(set(y_true) | set(y_pred))
    matrix = pd.DataFrame(
        0,
        index=labels,
        columns=labels,
        dtype=int,
    )
    for true, pred in zip(y_true, y_pred):
        matrix.loc[true, pred] += 1
    matrix.index.name = "actual"
    matrix.columns.name = "predicted"
    return matrix
```

We define pure-Python classification utilities for stratified train-test splitting, Naive Bayes training, prediction, and evaluation. We implement the classifier from scratch, so the tutorial stays stable even in Colab environments with broken scientific Python binaries. We also add reporting tools for precision, recall, F1 score, confusion matrices, and top class-specific tokens.
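As a compact illustration of the multinomial Naive Bayes idea, the sketch below fits add-one (Laplace) smoothed log-probabilities on a four-sentence toy corpus and scores new text by summing them. The function names `demo_fit` and `demo_predict`, the whitespace tokenizer, and the toy sentences are assumptions made for this example; it is a simplified stand-in, not the tutorial's classifier.

```python
import math
from collections import Counter


def demo_fit(texts, labels, alpha=1.0):
    """Estimate smoothed log priors and log likelihoods from token counts."""
    vocab = sorted({tok for text in texts for tok in text.lower().split()})
    classes = sorted(set(labels))
    doc_counts = Counter(labels)
    token_counts = {c: Counter() for c in classes}
    for text, label in zip(texts, labels):
        token_counts[label].update(text.lower().split())

    model = {"vocab": set(vocab), "classes": classes, "prior": {}, "likelihood": {}}
    for c in classes:
        # Smoothed class prior: (docs in class + alpha) / (docs + alpha * classes).
        model["prior"][c] = math.log(
            (doc_counts[c] + alpha) / (len(labels) + alpha * len(classes))
        )
        # Smoothed token likelihood: (count + alpha) / (class tokens + alpha * |V|).
        denom = sum(token_counts[c].values()) + alpha * len(vocab)
        model["likelihood"][c] = {
            tok: math.log((token_counts[c][tok] + alpha) / denom) for tok in vocab
        }
    return model


def demo_predict(model, text):
    """Return the class with the highest summed log score; unknown tokens are skipped."""
    counts = Counter(tok for tok in text.lower().split() if tok in model["vocab"])
    scores = {
        c: model["prior"][c]
        + sum(n * model["likelihood"][c][tok] for tok, n in counts.items())
        for c in model["classes"]
    }
    return max(scores, key=scores.get)


toy_texts = [
    "run the tests with bash",
    "run bash command now",
    "explain the design choice",
    "summarize the design notes",
]
toy_labels = ["tool_use", "tool_use", "text", "text"]
toy_model = demo_fit(toy_texts, toy_labels)
print(demo_predict(toy_model, "run bash"), demo_predict(toy_model, "design notes"))
```

Working in log space avoids numerical underflow when many token probabilities are multiplied, and the smoothing term keeps tokens unseen in one class from producing a log of zero.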
Training Naive Bayes Baselines and Keyword Search Over Traces

```python
rprint(
    Panel.fit(
        "[bold]Baseline 1: Predict output_type from context using pure Python Naive Bayes[/bold]"
    )
)

model_artifacts = {}

classifier_df = df.dropna(subset=["output_type"]).copy()
classifier_df = classifier_df[
    classifier_df["output_type"].astype(str).str.len() > 0
].copy()

if classifier_df["output_type"].nunique() >= 2 and len(classifier_df) >= 30:
    X_text = (
        classifier_df["context"].fillna("").astype(str)
        .map(lambda text: text[:12000]).tolist()
    )
    y = classifier_df["output_type"].astype(str).tolist()

    train_indices, test_indices = stratified_train_test_indices(y, test_size=0.2, seed=SEED)
    X_train = [X_text[i] for i in train_indices]
    y_train = [y[i] for i in train_indices]
    X_test = [X_text[i] for i in test_indices]
    y_test = [y[i] for i in test_indices]

    output_type_classifier = PureMultinomialNB(
        max_features=20000,
        min_df=2,
        alpha=1.0,
    )
    output_type_classifier.fit(X_train, y_train)
    predictions = output_type_classifier.predict(X_test)

    output_type_metrics, output_report_df = evaluate_predictions(y_test, predictions)
    output_matrix_df = confusion_matrix_df(y_test, predictions)
    output_type_metrics["train_rows"] = len(X_train)
    output_type_metrics["test_rows"] = len(X_test)
    output_type_metrics["vocab_size"] = len(output_type_classifier.vocab)

    rprint("[bold]Output type classifier report:[/bold]")
    display(output_report_df)
    display(output_matrix_df)

    output_report_df.to_csv(OUT_DIR / "output_type_classifier_report.csv", index=False)
    output_matrix_df.to_csv(OUT_DIR / "output_type_confusion_matrix.csv")

    top_token_records = []
    for label in output_type_classifier.labels:
        for token, margin in output_type_classifier.top_tokens_for_class(label, n=25):
            top_token_records.append(
                {"label": label, "token": token, "score_margin": margin}
            )
    pd.DataFrame(top_token_records).to_csv(
        OUT_DIR / "output_type_top_tokens.csv",
        index=False,
    )

    with open(
        OUT_DIR / "output_type_classifier_metrics.json",
        "w",
        encoding="utf-8",
    ) as file:
        json.dump(output_type_metrics, file, ensure_ascii=False, indent=2)

    model_artifacts["output_type_classifier_metrics"] = str(OUT_DIR / "output_type_classifier_metrics.json")
    model_artifacts["output_type_classifier_report"] = str(OUT_DIR / "output_type_classifier_report.csv")
    model_artifacts["output_type_confusion_matrix"] = str(OUT_DIR / "output_type_confusion_matrix.csv")
    model_artifacts["output_type_top_tokens"] = str(OUT_DIR / "output_type_top_tokens.csv")
else:
    rprint(
        "[yellow]Skipping output_type classifier because there are too few "
        "classes or rows.[/yellow]"
    )
    output_type_metrics = {}

rprint(
    Panel.fit(
        "[bold]Baseline 2: Predict tool name from context using pure Python Naive Bayes[/bold]"
    )
)

tool_classifier_df = df[
    df["output_type"].eq("tool_use")
    & df["tool_name"].fillna("").astype(str).str.len().gt(0)
].copy()

if len(tool_classifier_df) >= 50 and tool_classifier_df["tool_name"].nunique() >= 2:
    # Keep the 12 most frequent tools and fold the rest into a single bucket.
    top_tools = tool_classifier_df["tool_name"].value_counts().head(12).index.tolist()
    tool_classifier_df["tool_label"] = tool_classifier_df["tool_name"].where(
        tool_classifier_df["tool_name"].isin(top_tools),
        "__OTHER__",
    )
    y_tool = tool_classifier_df["tool_label"].astype(str).tolist()
    X_tool_text = (
        tool_classifier_df["context"].fillna("").astype(str)
        .map(lambda text: text[:12000]).tolist()
    )

    if len(set(y_tool)) >= 2:
        train_indices, test_indices = stratified_train_test_indices(y_tool, test_size=0.2, seed=SEED)
        X_train = [X_tool_text[i] for i in train_indices]
        y_train = [y_tool[i] for i in train_indices]
        X_test = [X_tool_text[i] for i in test_indices]
        y_test = [y_tool[i] for i in test_indices]

        tool_classifier = PureMultinomialNB(
            max_features=20000,
            min_df=2,
            alpha=1.0,
        )
        tool_classifier.fit(X_train, y_train)
        tool_predictions = tool_classifier.predict(X_test)

        tool_metrics, tool_report_df = evaluate_predictions(y_test, tool_predictions)
        tool_matrix_df = confusion_matrix_df(y_test, tool_predictions)
        tool_metrics["train_rows"] = len(X_train)
        tool_metrics["test_rows"] = len(X_test)
        tool_metrics["vocab_size"] = len(tool_classifier.vocab)

        rprint("[bold]Tool classifier report:[/bold]")
        display(tool_report_df)
        display(tool_matrix_df)

        tool_report_df.to_csv(OUT_DIR / "tool_name_classifier_report.csv", index=False)
        tool_matrix_df.to_csv(OUT_DIR / "tool_name_confusion_matrix.csv")

        top_tool_token_records = []
        for label in tool_classifier.labels:
            for token, margin in tool_classifier.top_tokens_for_class(label, n=25):
                top_tool_token_records.append(
                    {"label": label, "token": token, "score_margin": margin}
                )
        pd.DataFrame(top_tool_token_records).to_csv(
            OUT_DIR / "tool_name_top_tokens.csv",
            index=False,
        )

        with open(
            OUT_DIR / "tool_name_classifier_metrics.json",
            "w",
            encoding="utf-8",
        ) as file:
            json.dump(tool_metrics, file, ensure_ascii=False, indent=2)

        model_artifacts["tool_name_classifier_metrics"] = str(OUT_DIR / "tool_name_classifier_metrics.json")
        model_artifacts["tool_name_classifier_report"] = str(OUT_DIR / "tool_name_classifier_report.csv")
        model_artifacts["tool_name_confusion_matrix"] = str(OUT_DIR / "tool_name_confusion_matrix.csv")
        model_artifacts["tool_name_top_tokens"] = str(OUT_DIR / "tool_name_top_tokens.csv")
    else:
        rprint("[yellow]Skipping tool classifier because labels collapsed to one class.[/yellow]")
        tool_metrics = {}
else:
    rprint(
        "[yellow]Skipping tool classifier because there are too few tool-use "
        "rows or tool classes.[/yellow]"
    )
    tool_metrics = {}

rprint(Panel.fit("[bold]Building simple keyword search helper[/bold]"))


def search_rows(keyword, limit=5, search_cols=("context", "cot", "completion", "text_payload")):
    # Case-insensitive literal match across the chosen text columns.
    keyword = str(keyword).lower()
    mask = pd.Series(False, index=df.index)
    for column in search_cols:
        mask = mask | (
            df[column].fillna("").astype(str).str.lower()
            .str.contains(re.escape(keyword), regex=True)
        )
    hits = df[mask].head(limit)
    results = []
    for _, row in hits.iterrows():
        results.append(
            {
                "uid": row.get("uid"),
                "session": row.get("session"),
                "output_type": row.get("output_type"),
                "tool_name": row.get("tool_name"),
                "context_preview": preview_text(row.get("context"), 400),
                "payload_preview": preview_text(row.get("text_payload"), 400),
            }
        )
    return results


example_queries = ["Bash", "Write", "browser", "test", "README"]
search_demo = {
    query: search_rows(query, limit=2)
    for query in example_queries
}

with open(
    OUT_DIR / "keyword_search_demo.json",
    "w",
    encoding="utf-8",
) as file:
    json.dump(search_demo, file, ensure_ascii=False, indent=2)

rprint("[bold]Example keyword search results:[/bold]")
rprint(safe_json_dumps(search_demo, max_chars=5000))

summary = {
    "dataset_id": DATASET_ID,
    "flat_jsonl_filename": FLAT_JSONL_FILENAME,
    "output_directory": str(OUT_DIR),
    "repo_file_summary": file_summary,
    "rows": int(len(df)),
    "columns": list(df.columns),
    "output_type_distribution": (
        df["output_type"].fillna("missing").value_counts().to_dict()
    ),
    "top_tools": (
        df.loc[df["output_type"].eq("tool_use"), "tool_name"]
        .replace("", "unknown").value_counts().head(20).to_dict()
    ),
    "top_source_roots": (
        df["source_root"].fillna("unknown").value_counts().head(20).to_dict()
    ),
    "length_summary": {
        column: {
            "mean": float(df[column].mean()),
            "median": float(df[column].median()),
            "p90": float(df[column].quantile(0.90)),
            "p95": float(df[column].quantile(0.95)),
            "max": int(df[column].max()),
        }
        for column in [
            "context_chars", "cot_chars", "completion_chars", "text_payload_chars",
        ]
    },
    "possible_secret_rows": int(df["possible_secret_anywhere"].sum()),
    "plots": plot_paths,
    "model_artifacts": model_artifacts,
    "safe_exports": {
        "train": str(OUT_DIR / "fable5_no_cot_chat_train.jsonl"),
        "validation": str(OUT_DIR / "fable5_no_cot_chat_validation.jsonl"),
        "test": str(OUT_DIR / "fable5_no_cot_chat_test.jsonl"),
    },
    "analysis_files": {
        "csv": str(OUT_DIR / "fable5_analysis_index.csv"),
        "pickle": str(OUT_DIR / "fable5_analysis_index.pkl"),
        "keyword_search_demo": str(OUT_DIR / "keyword_search_demo.json"),
    },
}

with open(
    OUT_DIR / "analysis_summary.json",
    "w",
    encoding="utf-8",
) as file:
    json.dump(clean_for_json(summary), file, ensure_ascii=False, indent=2, default=str)

# Build the Markdown code fence at runtime so this source contains no literal fence.
FENCE = chr(96) * 3

report_md = (
    "# Fable 5 Traces Advanced Tutorial Report\n\n"
    "## Dataset\n\n"
    f"- Dataset: `{DATASET_ID}`\n"
    f"- Flat JSONL: `{FLAT_JSONL_FILENAME}`\n"
    f"- Rows loaded: `{len(df):,}`\n"
    f"- Unique source sessions: `{df['session'].nunique(dropna=True):,}`\n"
    f"- Unique models: `{df['model'].nunique(dropna=True):,}`\n\n"
    "## Important safety note\n\n"
    "This tutorial treats the dataset as agent telemetry. It previews and analyzes commands, "
    "tool calls, file edits, and transcript text, but it never executes commands found inside "
    "the traces.\n\n"
    f"Potential secret-like patterns detected: {int(df['possible_secret_anywhere'].sum()):,} rows.\n"
    "Exports redact common API-key/token-like patterns.\n\n"
    "## Output type distribution\n\n"
    f"{FENCE}json\n"
    f"{json.dumps(clean_for_json(summary['output_type_distribution']), indent=2, ensure_ascii=False)}\n"
    f"{FENCE}\n\n"
    "## Top tools\n\n"
    f"{FENCE}json\n"
    f"{json.dumps(clean_for_json(summary['top_tools']), indent=2, ensure_ascii=False)}\n"
    f"{FENCE}\n\n"
    "## Saved files\n\n"
    "- `analysis_summary.json`\n"
    "- `fable5_analysis_index.csv`\n"
    "- `fable5_analysis_index.pkl`\n"
    "- `fable5_no_cot_chat_train.jsonl`\n"
    "- `fable5_no_cot_chat_validation.jsonl`\n"
    "- `fable5_no_cot_chat_test.jsonl`\n"
    "- plot PNG files\n"
    "- baseline classifier metrics, when enough rows/classes are available\n\n"
    "## Recommended next steps\n\n"
    "1. 
```
Inspect fable5 no cot chat train.jsonl before any fine-tuning.\n" "2. Keep the dataset license in mind before model training or redistribution.\n" "3. Avoid training directly on raw terminal outputs without additional privacy and safety filtering.\n" "4. Start with the no-CoT chat export unless your research explicitly requires reasoning-trace supervision.\n" with open OUT DIR / "REPORT.md", "w", encoding="utf-8", as file: file.write report md rprint Panel.fit f" bold green Tutorial complete. /bold green \n\n" f"Artifacts saved in:\n{OUT DIR}\n\n" f"Key files:\n" f"- {OUT DIR / 'REPORT.md'}\n" f"- {OUT DIR / 'analysis summary.json'}\n" f"- {OUT DIR / 'fable5 no cot chat train.jsonl'}\n" f"- {OUT DIR / 'fable5 analysis index.csv'}", title="Done", display pd.DataFrame { "artifact": "Report", "Summary JSON", "No-CoT train export", "No-CoT validation export", "No-CoT test export", "Analysis CSV", "Analysis pickle", "Keyword search demo", , "path": str OUT DIR / "REPORT.md" , str OUT DIR / "analysis summary.json" , str OUT DIR / "fable5 no cot chat train.jsonl" , str OUT DIR / "fable5 no cot chat validation.jsonl" , str OUT DIR / "fable5 no cot chat test.jsonl" , str OUT DIR / "fable5 analysis index.csv" , str OUT DIR / "fable5 analysis index.pkl" , str OUT DIR / "keyword search demo.json" , , } We train a baseline model to predict whether the assistant’s output is text or a tool call based on the trace context. We also train a second baseline that predicts the likely tool name for tool-use rows and save the evaluation artifacts. We finish by adding keyword search, writing the final summary JSON and Markdown report, and displaying the saved tutorial outputs. Conclusion In conclusion, we have a practical and reliable workflow for exploring Fable 5 Traces without depending on packages that may break in a Colab runtime. 
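For readers curious about what a dependency-free Naive Bayes baseline of this kind involves, here is a minimal sketch. It is not the tutorial's `PureMultinomialNB` class: the names `tokenize` and `TinyMultinomialNB`, the whitespace-style tokenizer, and the toy contexts and labels are all assumptions made for illustration. It only shows the core idea of scoring each class as a log prior plus add-alpha smoothed log likelihoods of the context tokens.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word-like tokens."""
    return re.findall(r"[a-z0-9_]+", text.lower())


class TinyMultinomialNB:
    """Hypothetical minimal multinomial Naive Bayes with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        # Count documents per class (priors) and tokens per class (likelihoods).
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.token_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.token_counts.values():
            self.vocab.update(counts)
        self.total_tokens = {
            label: sum(counts.values())
            for label, counts in self.token_counts.items()
        }
        return self

    def log_scores(self, text):
        # Tokens never seen in training are ignored rather than penalized.
        n_docs = sum(self.class_counts.values())
        tokens = [t for t in tokenize(text) if t in self.vocab]
        scores = {}
        for label, doc_count in self.class_counts.items():
            denom = self.total_tokens[label] + self.alpha * len(self.vocab)
            score = math.log(doc_count / n_docs)
            for token in tokens:
                count = self.token_counts[label][token]
                score += math.log((count + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, texts):
        predictions = []
        for text in texts:
            scores = self.log_scores(text)
            predictions.append(max(scores, key=scores.get))
        return predictions


# Toy contexts and labels, invented purely for illustration.
train_texts = [
    "run the test suite with pytest",
    "list files in the repo directory",
    "execute the build command in bash",
    "explain what this function does",
    "summarize the changes for the user",
    "describe the error message in plain words",
]
train_labels = ["tool_use", "tool_use", "tool_use", "text", "text", "text"]

model = TinyMultinomialNB(alpha=1.0).fit(train_texts, train_labels)
predictions = model.predict(["run pytest in bash", "explain the error to the user"])
print(predictions)  # ['tool_use', 'text']
```

Working in log space keeps long contexts from underflowing to zero, and the smoothing term `alpha` prevents a single unseen token from ruling out a class entirely.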
We moved from raw Hugging Face files to structured analysis tables, safe previews, plots, searchable examples, cleaned chat-style exports, and baseline modeling artifacts. We treated the traces as agent telemetry, so we redacted possible secrets, avoided executing any commands from the dataset, and kept the chain of thought out of the default training export.

Sana Hassan, a consulting intern at Marktechpost and dual-degree student at IIT Madras, is passionate about applying technology and AI to address real-world challenges. With a keen interest in solving practical problems, he brings a fresh perspective to the intersection of AI and real-life solutions.