# Harness Engineering 101: What's Under the Rug of Agentic AI

> Source: <https://dev.to/gophernment/harness-engineering-101-singthiiyuuaitphrmkhng-agentic-ai-50nk>
> Published: 2026-07-01 02:12:00+00:00

In the previous post we went "from a bare LLM → Agentic AI" in 7 layers.

This time we look at how each layer works inside, and what can break.

When we use Claude Code, Cursor, or Hermes, we watch the AI work in steps:

```
think → call tool → check result → think again → call tool → done
```

What we don't see is **everything that breaks along the way**, and that someone (or some code) is handling those failures the whole time.

That is **Harness Engineering**: the discipline of building the "frame" that wraps an LLM and handles everything an agent needs to actually work in a world where nothing is perfect.

This is the loop every agent runs:

```
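# Pseudocode: llm, tools, execute_tool, done and budget_ok are placeholders
while not done and budget_ok:
    response = llm.chat(messages, tools)

    if response.has_tool_calls():
        # Record the assistant turn first so each result follows the call that produced it
        messages.append({"role": "assistant", "tool_calls": response.tool_calls})
        for tool in response.tool_calls:
            result = execute_tool(tool)
            messages.append(result)
    else:
        return response.text  # no tool requested: the model is finished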
```

At a glance it looks like an ordinary while loop, but this is where everything can go wrong.

I once asked Hermes to find condos on the LED website (the Legal Execution Department's asset auction site).

The AI's plan: open the site → fill in the form → press submit → read the results.

The AI called the tool `browser_click(ref="submit_button")`, and the tool returned "clicked".

The AI was delighted: "Done! Got the results." Then it tried to read the results from a page that had never actually loaded.

**What happened?** The tool returned "clicked", but the page had not finished submitting: JavaScript was still running, the DOM had not changed, and the CAPTCHA had not been validated.

**What the harness must handle:** after `browser_click` there has to be a `browser_snapshot` to confirm that the page really changed, and if it did not, the harness must retry or switch strategy.

This is what a harness does: **it does not trust a tool call right away, it verifies.**
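A minimal sketch of that verify step, assuming the browser tools are available as two plain callables. The name `click_and_verify`, its parameters, and the idea of comparing snapshots as strings are illustrative assumptions, not Hermes's actual API:

```python
import time
from typing import Callable


def click_and_verify(
    click: Callable[[str], str],
    snapshot: Callable[[], str],
    ref: str,
    timeout_s: float = 5.0,
    poll_s: float = 0.25,
) -> bool:
    """Click, then poll snapshots until the page differs from its pre-click state.

    Returns True if the page changed before the timeout, False otherwise.
    """
    before = snapshot()
    click(ref)  # the tool's own "clicked" reply is deliberately ignored
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if snapshot() != before:
            return True
        time.sleep(poll_s)
    return False
```

A `False` return is the signal for the harness to retry or pick another strategy instead of letting the model read a stale page.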

Every time the AI calls a tool, the context window grows, because it has to keep:

```
[user message] → [assistant tool_call] → [tool result] → [assistant tool_call] → [tool result] → ...
```

If the AI runs for 50 rounds, the context can reach 100K+ tokens.

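**The problem:** the full history is sent and billed again on every call, and the model starts to lose track of what it has already done.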

**What the harness must handle:**

```
if token_count > threshold:
    compress_context()  # drop old tool results, keep only the gist
```

A story: a team looking after one agent once watched it run for 50+ tool calls until the context reached 150K tokens. It started looping: rereading files, fixing the same thing over and over, forgetting what it had already done. The harness compressed the context automatically, down to 30K tokens, and the AI came back to its senses and carried on right away.
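One simple way to implement the "drop old tool results" idea is to stub out the oldest results until the history fits a budget. The sketch below assumes messages are dicts with `role` and `content` keys and uses a crude 4-characters-per-token estimate; both are assumptions for illustration, and a real harness would use the model's tokenizer:

```python
def estimate_tokens(messages: list[dict]) -> int:
    """Crude estimate: about 4 characters per token (an assumption, not a real tokenizer)."""
    return sum(len(str(m.get("content", ""))) for m in messages) // 4


def trim_old_tool_results(messages: list[dict], budget: int, keep_recent: int = 2) -> list[dict]:
    """Stub out older tool results, oldest first, until the estimate fits the budget."""
    if estimate_tokens(messages) <= budget:
        return messages
    tool_idx = [i for i, m in enumerate(messages) if m["role"] == "tool"]
    # The most recent `keep_recent` tool results are never touched
    trimmable = tool_idx[:-keep_recent] if keep_recent else tool_idx
    out = [dict(m) for m in messages]  # copy so the caller's history is not mutated
    for i in trimmable:
        out[i]["content"] = "[tool result trimmed]"
        if estimate_tokens(out) <= budget:
            break
    return out
```

Trimming is cheaper than summarizing but loses information outright, so it suits results the agent is unlikely to need again.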

Tool calls do not always succeed:

```
- terminal("git push"): Permission denied
- browser_click("submit"): Page did not change
- web_search("Go 1.27"): CAPTCHA blocked
- read_file("config.yaml"): File not found
```

The AI has to know that a tool failed, and it needs a **recovery strategy**.

Hermes had to search for assets on the LED website. The first attempt went the usual way: fill in the form → press submit → **CAPTCHA block**.

The harness pattern that worked in practice:

```
Attempt 1: form.submit() → CAPTCHA block ❌
Attempt 2: read the CAPTCHA first → submit → CAPTCHA changed during submit ❌
Attempt 3: fill every form field silently by setting input values directly with JavaScript, without triggering `onChange` (those events fire an AJAX call that loads district data, which refreshes the CAPTCHA before submit) → read the CAPTCHA as the last step → press submit with `button.click()` instead of `form.submit()` → ✅ success!
```

The harness does not just retry blindly. It changes strategy on every attempt:

| Attempt | Strategy | Result |
|---|---|---|
| 1 | form.submit() | ❌ CAPTCHA |
| 2 | CAPTCHA before submit | ❌ race condition |
| 3 | button.click() + CAPTCHA last | ✅ |

This is **adaptive retry**, not just calling again with the same parameters.
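The pattern can be sketched as an ordered list of strategies instead of a retry counter. Everything here (`adaptive_retry`, the strategy tuples, treating `None` as "no result") is a hypothetical shape chosen for illustration, not how Hermes implements it:

```python
from typing import Callable, Optional

Strategy = tuple[str, Callable[[], Optional[str]]]


def adaptive_retry(strategies: list[Strategy]) -> tuple[Optional[str], list[str]]:
    """Try each strategy once, in order.

    Returns the first non-None result plus a log of what happened,
    or (None, log) if every strategy failed.
    """
    log: list[str] = []
    for name, attempt in strategies:
        try:
            result = attempt()
        except Exception as exc:  # a failed tool call is data, not a crash
            log.append(f"{name}: failed ({exc})")
            continue
        if result is not None:
            log.append(f"{name}: ok")
            return result, log
        log.append(f"{name}: no result")
    return None, log
```

The log matters as much as the result: fed back into the context, it tells the model which approaches are already ruled out.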

Suppose the AI calls `read_file("main.go")` and gets back 500 lines of code.

Over the next iterations it reads 3 more files, runs the tests, edits code, and runs the tests again, so the context keeps growing.

But the AI does not need to "remember" all 500 lines of `main.go`. By round 10 it may only need to know that "main.go has a function main that calls RunServer".

**What the harness must trim:**

```
📄 main.go (500 lines), round 1: keep everything
📄 main.go (500 lines), round 10: keep only the summary "defines main(), calls RunServer()"
```

This technique is called **context compaction**: the harness has a small (cheap) LLM read old tool results, summarize them, and keeps only the summaries.
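A rough sketch of compaction, with the summarizer passed in as a plain function so that any cheap model can be plugged in. `compact_tool_results`, `keep_recent`, and `min_chars` are assumed names and defaults, and messages are assumed to be dicts with `role` and `content`:

```python
from typing import Callable


def compact_tool_results(
    messages: list[dict],
    summarize: Callable[[str], str],
    keep_recent: int = 3,
    min_chars: int = 200,
) -> list[dict]:
    """Return a copy where long tool results outside the last `keep_recent`
    messages are replaced by their summaries."""
    cutoff = len(messages) - keep_recent
    compacted = []
    for i, msg in enumerate(messages):
        is_old_tool_result = i < cutoff and msg["role"] == "tool"
        if is_old_tool_result and len(msg["content"]) > min_chars:
            msg = {**msg, "content": summarize(msg["content"])}
        compacted.append(msg)
    return compacted
```

Short results are left alone because summarizing them would cost a model call without saving meaningful space.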

A real example: Hermes has a `compression.threshold` setting in its config, with a default of 0.50 (50% of the context window). Once the context passes the halfway mark it compresses automatically, from 100K → 20K tokens in this case, saving $0.24 per API call.
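The numbers can be checked with a few lines. `should_compress` and `input_cost_saved` are hypothetical helper names, and the $3 per million input tokens is an assumed price, picked because it reproduces the $0.24 figure; the post does not say which model or price was used:

```python
def should_compress(used_tokens: int, window_tokens: int, threshold: float = 0.50) -> bool:
    """True once the context uses more than `threshold` of the window."""
    return used_tokens > window_tokens * threshold


def input_cost_saved(before_tokens: int, after_tokens: int, usd_per_million: float) -> float:
    """Input cost saved on a single call after compaction, in USD."""
    return (before_tokens - after_tokens) * usd_per_million / 1_000_000


# 100K → 20K tokens at an assumed $3 per million input tokens
saved = input_cost_saved(100_000, 20_000, usd_per_million=3.0)
```

Because the whole history is sent on every call, that saving repeats on each remaining round of the loop.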

Many chat APIs enforce a hard rule: **messages in a conversation must alternate roles**

```
✅ user → assistant → user → assistant → user
❌ user → assistant → assistant → user    ← breaks!
```

It sounds simple, but inside an agent loop it is anything but:

```
[user: "Build an API"]
[assistant: tool_call read_file]   ← assistant
[tool result: "file has 200 lines"]  ← tool (neither user nor assistant)
[assistant: tool_call write_file]  ← assistant on top of assistant! ❌
```

**What the harness must fix:** merge consecutive tool calls into a single assistant message

```
# Instead of sending one tool call at a time, the harness batches them first
messages = [
    {"role": "user", "content": "Build an API"},
    {"role": "assistant", "tool_calls": [read_file, write_file, run_test]},  # 3 calls merged
    {"role": "tool", "results": [...]},
    {"role": "assistant", "content": "Done."}
]
```

A real example: an earlier version of Hermes had a bug where role alternation broke on `/stop`. The AI was in the middle of a tool call when the user pressed stop, the harness did not close out the pending tool calls, two assistant messages went out back to back, and the API returned an error. It took 3 hours of debugging to find.
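One way to guard against this class of bug is to repair the history before every send. The sketch below assumes an OpenAI-style message shape (assistant messages carry `tool_calls` with an `id`, tool messages carry a matching `tool_call_id`); Hermes's internal format is not shown in this post, so treat the shape and both function names as assumptions:

```python
def has_adjacent_assistant_turns(messages: list[dict]) -> bool:
    """True if two assistant messages sit back to back."""
    return any(
        a["role"] == b["role"] == "assistant"
        for a, b in zip(messages, messages[1:])
    )


def close_pending_tool_calls(messages: list[dict], note: str = "cancelled by user") -> list[dict]:
    """Insert a placeholder tool result after any tool call that never got one."""
    answered = {m["tool_call_id"] for m in messages if m["role"] == "tool"}
    patched = []
    for msg in messages:
        patched.append(msg)
        for call in msg.get("tool_calls", []):
            if call["id"] not in answered:
                patched.append(
                    {"role": "tool", "tool_call_id": call["id"], "content": note}
                )
    return patched
```

Running `close_pending_tool_calls` in the `/stop` handler means an interrupted call still leaves a well-formed history, and the placeholder tells the model the tool never finished.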

We have a function in Python:

``` python
def read_file(path: str, offset: int = 1, limit: int = 500) -> dict:
    """Read a text file with line numbers."""
    ...  # body omitted
```

The LLM does not understand Python, so the harness has to convert it into a JSON schema:

```
{
  "name": "read_file",
  "description": "Read a text file with line numbers.",
  "parameters": {
    "type": "object",
    "properties": {
      "path": {"type": "string", "description": "Path to the file"},
      "offset": {"type": "integer", "default": 1},
      "limit": {"type": "integer", "default": 500}
    },
    "required": ["path"]
  }
}
```

And when the LLM replies:

```
{"name": "read_file", "arguments": {"path": "/home/user/main.go"}}
```

The harness has to turn that back into `read_file(path="/home/user/main.go")` and actually make the call.

It sounds trivial, but with 50 tools, each taking 5-10 parameters, the harness manages all of those schemas for us without us having to think about it.
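Both directions can be sketched with the standard library's `inspect` module. `function_to_schema` and `dispatch` are illustrative names, the type mapping covers only a few primitives, and the output uses the nested `type: object` / `properties` / `required` layout that JSON Schema based tool APIs commonly expect:

```python
import inspect
from typing import Any, Callable

# Python annotation -> JSON Schema type; anything unknown falls back to "string"
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def function_to_schema(fn: Callable[..., Any]) -> dict:
    """Build a tool schema from a function's signature and docstring."""
    properties: dict[str, dict] = {}
    required: list[str] = []
    for name, param in inspect.signature(fn).parameters.items():
        prop: dict[str, Any] = {"type": _JSON_TYPES.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default means the model must supply it
        else:
            prop["default"] = param.default
        properties[name] = prop
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "parameters": {"type": "object", "properties": properties, "required": required},
    }


def dispatch(registry: dict[str, Callable[..., Any]], call: dict) -> Any:
    """Turn {"name": ..., "arguments": {...}} back into a real function call."""
    return registry[call["name"]](**call["arguments"])
```

A production harness would also validate `arguments` against the schema before calling, since the model can return wrong types or unknown keys.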

The user changes their mind halfway through: "Stop! I don't want that anymore."

The harness must stop whatever is still running (for example a `terminal` call in the middle of a build).

Example: the user sends `/stop` while the AI is cloning a large repo and `git clone` is already 80% done. The harness has to kill the process, clean up the partially cloned files, and come back ready to respond, all in under 1 second.
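A minimal sketch of that interrupt path using `subprocess`, assuming the harness holds the `Popen` handle of the running tool and knows which directory it was writing into. `stop_process` and its parameters are hypothetical names, and the half-second grace period is an arbitrary choice that fits the one-second budget:

```python
import shutil
import subprocess
from pathlib import Path
from typing import Optional


def stop_process(
    proc: subprocess.Popen,
    partial_output: Optional[Path] = None,
    grace_s: float = 0.5,
) -> int:
    """Ask the process to exit, force-kill it after `grace_s`, then remove partial output.

    Returns the process's exit code.
    """
    proc.terminate()  # polite request first (SIGTERM on POSIX)
    try:
        proc.wait(timeout=grace_s)
    except subprocess.TimeoutExpired:
        proc.kill()  # the process ignored the request; force it
        proc.wait()
    if partial_output is not None and partial_output.exists():
        shutil.rmtree(partial_output, ignore_errors=True)
    return proc.returncode
```

Terminating before killing gives tools like `git` a chance to release locks, while the timeout keeps the harness responsive if they do not.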

In short: Harness Engineering is **ordinary software engineering** designed to cope with an LLM that is anything but ordinary.

| Problem | Fix | Not AI, just engineering |
|---|---|---|
| Context too long | Compaction | Memory management |
| Tool call fails | Retry + adaptive strategy | Error handling |
| Repeated roles | Merge tool calls | Message routing |
| Schema mismatch | Auto-generate JSON schema | Serialization |
| User says stop | Graceful interrupt | Process management |

The LLM is the brain. The harness is the nervous system, the muscles, and the immune system wrapped around it.

Want to try building your own harness?

``` python
# Minimal harness, about 30 lines.
# llm, execute, count_tokens and compress are placeholders you supply yourself.
def run_agent(user_input, tools, max_rounds=10):
    messages = [{"role": "user", "content": user_input}]

    for _ in range(max_rounds):
        response = llm.chat(messages, tools)

        if not response.has_tool_calls():
            return response.text  # ✅ done

        # Batch every tool call into one message (role alternation)
        tool_results = []
        for tc in response.tool_calls:
            result = execute(tc.name, tc.args)
            tool_results.append(result)

        messages.append({"role": "assistant", "tool_calls": response.tool_calls})
        messages.append({"role": "tool", "results": tool_results})

        # Token budget: compress past 80K tokens (80% of an assumed 100K window)
        if count_tokens(messages) > 80000:
            messages = compress(messages)

    return "Max rounds exceeded"
```

From these 30 lines you will run into the same problems that Kiro, Antigravity, and Hermes ran into, and that is where Harness Engineering begins.
