Harness Engineering 101: What Is Swept Under the Rug of Agentic AI

Harness engineering is the discipline of building the scaffolding that wraps an LLM and manages failures so that agents can work in an imperfect world. A developer recounts how an agent searching for condos on a Thai auction site failed because the tool returned 'clicked' before the page finished submitting, requiring verification via browser_snapshot. The harness also handles context window growth by compressing old tool results, and implements adaptive retry strategies that change tactics on each attempt, such as using button.click() instead of form.submit() to avoid CAPTCHA race conditions.

In the previous article we talked about going "from a bare LLM → Agentic AI" in 7 layers. This time, let's look at how each layer works on the inside, and at what can break.

When we use Claude Code, Cursor, or Hermes, we see the AI work in steps: think → call a tool → look at the result → think again → call a tool → done.

What we do not see is everything that breaks along the way, and the people (or code) that are constantly cleaning up after those failures.

That is Harness Engineering: the discipline of building the "frame" that wraps an LLM and handles everything needed for an agent to actually work in a world where nothing is perfect.

This is the loop that every agent runs:

```python
# The core loop, schematically. `done`, `budget_ok`, `llm` and `execute_tool`
# stand for whatever the harness provides.
def agent_loop(messages, tools):
    while not done and budget_ok:
        response = llm.chat(messages, tools)
        if response.has_tool_calls():
            for tool in response.tool_calls:
                result = execute_tool(tool)
                messages.append(result)
        else:
            return response.text
```

At a glance it looks like an ordinary while loop, but this is where everything can break.

Once I asked Hermes to look for condos on the LED website, the asset auction site of Thailand's Legal Execution Department. The AI's plan: open the site → fill in the form → press submit → read the results.

The AI called the tool `browser_click(ref="submit_button")`, and the tool returned "clicked". The AI was pleased ("Done, we have the results") and then tried to read results from a page that had never actually loaded.

What happened? The tool returned "clicked", but the page had not finished submitting: JavaScript was still running, the DOM had not changed, and the CAPTCHA had not been validated.

What the harness has to handle: after `browser_click` there must be a `browser_snapshot` to confirm that the page really changed, and if it did not, the harness must retry or switch strategy.

This is what a harness does: it does not take a tool call at its word. It verifies.

Every time the AI calls a tool, the conversation history that has to fit in the context window grows, because it must keep: user message → assistant tool call → tool result → assistant tool call → tool result → ...
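To make that growth concrete, here is a minimal sketch that appends one assistant tool call and one tool result per round and tracks a rough token estimate. The `estimate_tokens` and `simulate_rounds` helpers, the 4-characters-per-token heuristic, and the 8,000-character tool result are all assumptions for illustration; a real harness would count tokens with the model's own tokenizer.

```python
def estimate_tokens(messages):
    """Very rough estimate: ~4 characters per token (an assumption, not a tokenizer)."""
    return sum(len(m["content"]) for m in messages) // 4


def simulate_rounds(rounds, result_chars=8000):
    """Append one tool call and one tool result per round; return the history."""
    messages = [{"role": "user", "content": "Find condos on the auction site"}]
    for i in range(rounds):
        messages.append({"role": "assistant", "content": f"tool_call #{i}"})
        # Hypothetical tool output, e.g. a page snapshot or a file dump.
        messages.append({"role": "tool", "content": "x" * result_chars})
    return messages


if __name__ == "__main__":
    for rounds in (1, 10, 50):
        history = simulate_rounds(rounds)
        print(rounds, "rounds ->", len(history), "messages,",
              estimate_tokens(history), "tokens (estimated)")
```

With these assumed sizes, nothing is ever dropped, so the estimate grows linearly with the number of rounds and passes 100K tokens at 50 rounds.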
If the AI works for 50 rounds, the context can reach 100K+ tokens. That is the problem the harness has to handle:

```python
if token_count > threshold:
    compress_context()  # cut old tool results down to their gist
```

A story: once, on a team that maintained an agent, the AI had been working for 50+ tool calls and the context had reached 150K tokens. It started going in circles: re-reading files, fixing the same thing over and over, forgetting what it had already done. The harness compressed the context automatically, down to 30K tokens, and the AI came back to its senses and carried on right away.

Tool calls do not always succeed:

- `terminal("git push")` → Permission denied
- `browser_click("submit")` → Page did not change
- `web_search("Go 1.27")` → CAPTCHA blocked
- `read_file("config.yaml")` → File not found

The AI has to know that the tool failed, and it needs a recovery strategy.

Hermes had to search for properties on the LED site. The first time it did the normal thing: fill in the form → press submit → blocked by the CAPTCHA.

The harness pattern actually used:

- Attempt 1: `form.submit()` → CAPTCHA block ❌
- Attempt 2: read the CAPTCHA first → submit → the CAPTCHA changed during submit ❌
- Attempt 3: fill in every form field quietly, using JavaScript to set the input values directly so that the site does not notice anyone typing (no `onChange` events fire, because those events trigger an AJAX call that loads district data, which refreshes the CAPTCHA before submit) → read the CAPTCHA as the very last step → press submit with `button.click()` instead of `form.submit()` → ✅ success

The harness does not just "retry" blindly. It changes strategy on every attempt:

| Attempt | Strategy | Result |
|---|---|---|
| 1 | `form.submit()` | ❌ CAPTCHA |
| 2 | CAPTCHA before submit | ❌ race condition |
| 3 | `button.click()` + CAPTCHA last | ✅ |

This is adaptive retry, not just calling again with the same parameters.

Suppose the AI calls `read_file("main.go")` and gets back 500 lines of code. In the following loops it reads 3 more files, runs the tests, edits code, and runs the tests again, and the context keeps growing.

But the AI does not need to "remember" all 500 lines of main.go at round 10. It may only need to know that "main.go has a function main() that calls RunServer()".

The harness has to trim:

- 📄 main.go (500 lines), round 1: keep everything
- 📄 main.go (500 lines), round 10: keep only a summary, "defines main(), calls RunServer()"
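The trimming above can be sketched roughly as follows. This is a minimal illustration, not Hermes's actual implementation: `compact_tool_results`, `keep_recent`, and the `summarize` callable are hypothetical names, and `first_line_summary` is a trivial stand-in for whatever summarizer a real harness would plug in.

```python
def compact_tool_results(messages, summarize, keep_recent=2):
    """Replace all but the newest `keep_recent` tool results with summaries.

    `summarize` is any callable str -> str (hypothetical stand-in for a real
    summarizer). Other messages are passed through unchanged, and the input
    list is not modified.
    """
    tool_indexes = [i for i, m in enumerate(messages) if m["role"] == "tool"]
    if keep_recent > 0:
        old = set(tool_indexes[:-keep_recent])
    else:
        old = set(tool_indexes)
    compacted = []
    for i, message in enumerate(messages):
        if i in old:
            # Copy the message so the original history stays intact.
            message = {**message, "content": summarize(message["content"])}
        compacted.append(message)
    return compacted


def first_line_summary(text, max_chars=80):
    """Stand-in summarizer: keep only the first line, truncated."""
    first = text.splitlines()[0] if text else ""
    return first[:max_chars]


if __name__ == "__main__":
    history = [
        {"role": "user", "content": "Fix the server"},
        {"role": "tool", "content": "package main\n" + "// ...\n" * 500},
        {"role": "tool", "content": "test output A"},
        {"role": "tool", "content": "test output B"},
    ]
    for m in compact_tool_results(history, first_line_summary):
        print(m["role"], len(m["content"]))
```

Keeping the most recent results verbatim is a design choice in this sketch: the model usually still needs the exact output of what it just did, while older results can be reduced to their gist.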
This technique is called context compaction: the harness uses a small (cheap) LLM to read old tool results → summarize them → keep only the summary.

True story: Hermes has `compression.threshold` in its config, default 0.50 (50% of the context window). Once the context passes the halfway mark it compresses automatically, from 100K → 20K tokens, saving $0.24 per API call (80K fewer input tokens, which matches a price of about $3 per million tokens).

Chat APIs for LLMs commonly enforce a hard rule: messages in a conversation must alternate roles.

- ✅ user → assistant → user → assistant → user
- ❌ user → assistant → assistant → user ← breaks

It sounds easy, but inside an agent loop it is not easy at all:

```
user: "Build an API"
assistant: tool_call(read_file)     ← assistant
tool_result: "file has 200 lines"   ← tool (neither user nor assistant)
assistant: tool_call(write_file)    ← assistant on top of assistant ❌
```

The harness's fix: merge consecutive tool calls into a single assistant message. Instead of sending one tool call at a time, the harness bundles them before sending:

```python
# Schematic message list, not the payload format of any specific API.
messages = [
    {"role": "user", "content": "Build an API"},
    {"role": "assistant", "tool_calls": ["read_file", "write_file", "run_test"]},  # 3 calls bundled
    {"role": "tool", "results": [...]},
    {"role": "assistant", "content": "Done."},
]
```

True story: an earlier version of Hermes had a bug where role alternation broke on /stop. The AI was in the middle of calling a tool when the user pressed stop. The harness did not bundle the pending tool calls → sent two assistant messages in a row → API error. It took 3 hours of debugging to find.

We have a function in Python:

```python
def read_file(path: str, offset: int = 1, limit: int = 500) -> dict:
    """Read a text file with line numbers."""
```

The LLM cannot call a Python function directly, so the harness has to convert it into a JSON schema (simplified here):

```json
{
  "name": "read_file",
  "description": "Read a text file with line numbers.",
  "parameters": {
    "path": {"type": "string", "description": "Path to the file"},
    "offset": {"type": "integer", "default": 1},
    "limit": {"type": "integer", "default": 500}
  }
}
```

And when the LLM replies:

```json
{"name": "read_file", "arguments": {"path": "/home/user/main.go"}}
```

the harness has to convert that back into `read_file(path="/home/user/main.go")` and then actually call it.

It sounds trivial, but with 50 tools, each with 5-10 parameters, the harness manages all of this schema for us without our having to think about it.

The user changes their mind midway: "Stop, I don't want that anymore." The harness then has to deal with whatever is still in flight, for example a terminal that is running a build.

Example: the user issues /stop while the AI is cloning a large repo, and `git clone` is already 80% done. The harness has to kill the process, clean up the partially cloned files, and come back ready to respond, all in under 1 second.

Summary: Harness Engineering is ordinary software engineering, designed to manage an LLM that is anything but ordinary.

| Problem | Fix | Not AI: it's engineering |
|---|---|---|
| Context too long | Compaction | Memory management |
| Tool call fails | Retry + adaptive strategy | Error handling |
| Repeated role | Merge tool calls | Message routing |
| Schema mismatch | Auto-generate JSON schema | Serialization |
| User says stop | Graceful interrupt | Process management |

The LLM is the brain. The harness is the nervous system, the muscles, and the immune system wrapped around that brain.

Want to try building your own harness?

```python
# Minimal harness: 30 lines.
# `llm`, `execute`, `count_tokens` and `compress` are left for you to supply.
def run_agent(user_input, tools, max_rounds=10):
    messages = [{"role": "user", "content": user_input}]
    for _ in range(max_rounds):
        response = llm.chat(messages, tools)
        if not response.has_tool_calls():
            return response.text  # ✅ done

        # Bundle every tool call into one message (role alternation)
        tool_results = []
        for tc in response.tool_calls:
            result = execute(tc.name, tc.args)
            tool_results.append(result)

        messages.append({"role": "assistant", "tool_calls": response.tool_calls})
        messages.append({"role": "tool", "results": tool_results})

        # Token budget: compress past 80% (80,000 tokens implies a 100K window)
        if count_tokens(messages) > 80000:
            messages = compress(messages)

    return "Max rounds exceeded"
```

From these 30 lines you will run into the same problems that Kiro, Antigravity, and Hermes ran into, and that is where Harness Engineering begins.

📚 Read more: