{"slug": "agnostic-cluster-refactor-skill-for-antigrafity-cli-building-an-ai-agent-that-to", "title": "Agnostic Cluster Refactor Skill for Antigravity CLI: Building an AI Agent that Migrates Apps from AWS to GKE (Subagents, HITL Gate & Workload Identity)", "summary": "A developer built an AI agent skill for the Antigravity CLI that migrates applications from AWS to GKE. The agent scans cloud dependencies, spawns parallel subagents to refactor code and infrastructure, and validates changes on local Kubernetes before deploying with keyless Workload Identity, with mandatory human oversight before any production mutation.", "body_md": "Have you ever inherited a codebase where `import boto3` appears in 47 different files? Where AWS credentials live in hardcoded environment variables and file storage is a `file.save(\"/tmp/...\")` that will blow up the moment it hits an ephemeral Kubernetes pod?\n\nI did. And instead of refactoring everything by hand, I built an AI agent to do it for me, with mandatory human oversight before any production mutation.\n\nThis article documents what I built: a **skill for the Antigravity CLI** (`agy`) that scans cloud dependencies, spawns parallel subagents to refactor code and infrastructure, and validates everything on local Kubernetes before deploying to GKE with keyless Workload Identity.\n\n`boto3` is the AWS SDK for Python. 
It seems harmless at first:\n\n``` python\n# Innocent on day 1\nimport boto3\ns3 = boto3.client('s3', region_name='us-east-1')\ns3.upload_fileobj(file, bucket_name, filename)\n```\n\nSix months later:\n\n``` python\n# examples/legacy-app/app.py: the real state after it grows\nimport os\nimport boto3\nfrom flask import Flask, request, jsonify\n\napp = Flask(__name__)\n\n# \"Temporary\" hardcoded since 2022\nDB_PASSWORD = os.getenv(\"DB_PASSWORD\", \"default-insecure-password\")\n\nS3_BUCKET = os.getenv(\"AWS_S3_BUCKET_NAME\")\nAWS_REGION = os.getenv(\"AWS_DEFAULT_REGION\", \"us-east-1\")\n\ns3_client = boto3.client(\n    's3',\n    aws_access_key_id=os.getenv(\"AWS_ACCESS_KEY_ID\"),\n    aws_secret_access_key=os.getenv(\"AWS_SECRET_ACCESS_KEY\"),\n    region_name=AWS_REGION\n)\n\n@app.route(\"/upload\", methods=[\"POST\"])\ndef upload_file():\n    file = request.files['file']\n    filename = file.filename\n    if S3_BUCKET:\n        s3_client.upload_fileobj(file, S3_BUCKET, filename)\n        return jsonify({\"message\": f\"Uploaded to AWS S3: {S3_BUCKET}\"})\n    else:\n        # Fallback to local disk; will break in K8s ephemeral pods\n        local_path = os.path.join(\"/tmp\", filename)\n        file.save(local_path)\n        return jsonify({\"message\": f\"Saved locally at {local_path}\"})\n```\n\nThree coupling problems in a single file: proprietary SDK (`boto3`), AWS-specific credentials, and local disk storage that doesn't survive ephemeral Kubernetes pods.\n\nNow multiply that by 10 services.\n\nWhat I built is a **skill** for the Antigravity CLI that adds two commands to the agent chat:\n\n```\n/agnostic-cluster-refactor:scan-deps\n/agnostic-cluster-refactor:spawn-refactor\n```\n\nThe complete flow: scan dependencies, confirm the plan at the HITL gate, refactor with parallel subagents, validate on local Kubernetes, then deploy to GKE.\n\nBut before diving into the code, let me introduce the players.\n\n`agy` is not a script. 
It's an LLM-powered agent: you describe what you want in the chat and it decides how to do it, using a toolset: `read_file`, `write_to_file`, `run_command`, `invoke_subagent`.\n\nThe difference from a web chatbot: `agy` has access to your local filesystem, runs terminal commands, and operates in autonomous loops. It's an engineer working on your machine.\n\n| Script | Agent |\n|---|---|\n| `sed 's/boto3/gcs/g'` across all files | Analyzes the semantic context of each import and replaces it with the correct equivalent API |\n| Fails if the environment changed | Adapts to the current state |\n| Deterministic | Probabilistic + adaptive |\n\nA skill is a `SKILL.md` file with YAML frontmatter that defines when and how the agent uses that capability. The agent reads the `description` field and decides whether the skill is relevant to the current task.\n\n```\n---\nname: scan-deps\ndescription: Scans the project for cloud-provider dependencies and generates\n             dependency-map.json. Use when the user wants to map vendor lock-in\n             before migrating to GKE.\n---\n\n## Steps\n\n1. Ask which directory to scan\n2. Run: python3 .agents/skills/.../scan_deps.py <PATH>\n3. Present the DAG summary\n```\n\n💡 **Key distinction:** skills in `.agents/skills/` are injected silently into context. To appear as a `/command` in autocomplete, you need a **plugin** installed at `~/.gemini/config/plugins/<plugin>/`. More on that in Part 6.\n\nA subagent is a child agent with completely isolated context. 
It doesn't \"see\" the parent's history or the other subagent's, which is exactly what we want: the Backend agent can't get confused by the YAML the Infra agent is writing.\n\n```\n# Pseudocode: how agy orchestrates this\ninvoke_subagent(\n    name=\"backend-engine\",\n    system_prompt=\"You are an expert in migrating boto3 to GCS...\",\n    toolset=[\"read_file\", \"write_to_file\", \"run_command\"],\n    workspace=\"/path/to/shadow-worktree-backend\",\n    message=\"Refactor the files from dependency-map.json\"\n)\n# Subagent B is invoked in parallel, with no blocking\ninvoke_subagent(\n    name=\"infra-engine\",\n    toolset=[\"write_to_file\"],  # write only: principle of least privilege\n    workspace=\"/path/to/shadow-worktree-infra\",\n    message=\"Generate serviceaccount.yaml, deployment.yaml, ingress.yaml for GKE\"\n)\n```\n\nEach subagent operates in an isolated **Git Worktree**: a separate working directory backed by the same repository, checked out on a different branch. If Subagent A introduces a bug, `main` stays untouched.\n\nThe first step is mapping the problem. 
`scan_deps.py` walks the project with `os.walk()`, applies regex patterns by category, and generates a DAG (Directed Acyclic Graph) as JSON.\n\n```\n# scripts/scan_deps.py (excerpt)\nimport os\nimport re\n\npatterns = {\n    \"storage\": [\n        r\"google\\.cloud\\.storage\",\n        r\"boto3.*s3\",         # AWS-coupled\n        r\"aws-sdk.*s3\"\n    ],\n    \"messaging\": [\n        r\"google\\.cloud\\.pubsub\",\n        r\"boto3.*sqs\",        # AWS-coupled\n        r\"kafka-python\",\n    ],\n    \"secrets\": [\n        r\"boto3.*secretsmanager\",\n        r\"python-dotenv\",\n    ],\n    \"databases\": [\n        r\"psycopg2\", r\"pymongo\"\n    ]\n}\n\n# `path` is the directory to scan; in the full script it comes from the CLI argument.\n# `dependencies` is initialized here so the excerpt is self-contained.\ndependencies = {dep_type: [] for dep_type in patterns}\n\nfor root, dirs, files in os.walk(path):\n    # Prune hidden and vendored directories in place so os.walk skips them\n    dirs[:] = [d for d in dirs if not d.startswith('.')\n               and d not in ['venv', 'node_modules', '__pycache__']]\n    for file in files:\n        if not file.endswith(('.py', '.js', '.yaml', '.tf')):\n            continue\n        file_path = os.path.join(root, file)\n        with open(file_path) as f:\n            content = f.read()\n        for dep_type, pattern_list in patterns.items():\n            for pattern in pattern_list:\n                if re.search(pattern, content, re.IGNORECASE):\n                    dependencies[dep_type].append({\n                        \"file\": os.path.relpath(file_path, path),\n                        \"matched_pattern\": pattern\n                    })\n```\n\nThe output is a `dependency-map.json` with the full dependency graph:\n\n```\n{\n  \"dependencies\": {\n    \"storage\": [\n      { \"file\": \"examples/legacy-app/app.py\", \"matched_pattern\": \"boto3.*s3\" },\n      { \"file\": \"examples/legacy-app/api.py\",  \"matched_pattern\": \"boto3.*s3\" }\n    ],\n    \"messaging\": [\n      { \"file\": \"examples/legacy-app/worker.py\", \"matched_pattern\": \"boto3.*sqs\" }\n    ]\n  },\n  \"architectural_dag\": {\n    \"nodes\": [\n      { \"id\": \"application\",   \"type\": \"component\" },\n      { \"id\": \"dep-storage\",   \"files\": 
[\"app.py\", \"api.py\"] },\n      { \"id\": \"provider-aws\",  \"type\": \"cloud-provider\" }\n    ],\n    \"edges\": [\n      { \"source\": \"application\",  \"target\": \"dep-storage\",  \"relation\": \"uses_storage\"   },\n      { \"source\": \"dep-storage\",  \"target\": \"provider-aws\", \"relation\": \"coupled_to_aws\" }\n    ]\n  },\n  \"recommended_action\": \"Execute '/spawn-refactor' targeting GCP GKE\"\n}\n```\n\n❓ **Why a DAG and not a plain list?** The graph reveals transitive relationships: `app.py` and `worker.py` both depend on AWS via `boto3`, so they need to be refactored together. A list would only say \"these files have boto3.\"\n\nThis was the most important design decision: how do I ensure the agent doesn't refactor the wrong file without me seeing what's happening first?\n\nThe answer lives in two places.\n\nThe `.agents/hooks.json` file registers a `PreToolUse` hook: a command that runs **before** any file-writing tool call the agent attempts (`write_to_file` and the two replace variants listed in the matcher):\n\n```\n{\n  \"hitl-production-gate\": {\n    \"enabled\": true,\n    \"PreToolUse\": [\n      {\n        \"matcher\": \"write_to_file|replace_file_content|multi_replace_file_content\",\n        \"hooks\": [\n          {\n            \"type\": \"command\",\n            \"command\": \"python3 .agents/skills/agnostic-cluster-refactor/scripts/scan_deps.py --check-only\",\n            \"timeout\": 5\n          }\n        ]\n      }\n    ]\n  }\n}\n```\n\nThe hook receives a JSON payload via stdin and responds with a decision:\n\n```\n# scan_deps.py, --check-only mode\nimport json\nimport os\nimport sys\n\nSAFE_WRITE_PREFIXES = (\"examples/\", \"terraform/\", \".agents/\")\n\ndef check_only_hook():\n    payload = json.load(sys.stdin)\n    target = payload.get(\"toolCall\", {}).get(\"args\", {}).get(\"TargetFile\", \"\")\n    workspace_root = payload.get(\"workspacePaths\", [\".\"])[0]\n    rel_path = os.path.relpath(target, workspace_root)\n\n    if not any(rel_path.startswith(p) for p in SAFE_WRITE_PREFIXES):\n        
print(json.dumps({\n            \"decision\": \"force_ask\",\n            \"reason\": f\"[HITL Gate] '{rel_path}' is outside safe directories. Confirm before proceeding.\"\n        }))\n    else:\n        print(json.dumps({\"decision\": \"allow\"}))\n```\n\nThree possible decisions the hook can return:\n\n| Decision | Effect |\n|---|---|\n| `\"allow\"` | Agent proceeds automatically |\n| `\"force_ask\"` | agy pauses and asks the human |\n| `\"deny\"` | Completely blocked, no prompt |\n\nTesting it from the command line:\n\n```\n# File OUTSIDE safe directories\necho '{\"toolCall\":{\"name\":\"write_to_file\",\"args\":{\"TargetFile\":\"/project/src/app.py\"}},\n      \"workspacePaths\":[\"/project\"]}' | python3 scripts/scan_deps.py --check-only\n# → {\"decision\": \"force_ask\", \"reason\": \"[HITL Gate] 'src/app.py' is outside safe directories...\"}\n\n# File INSIDE safe directories\necho '{\"toolCall\":{\"name\":\"write_to_file\",\"args\":{\"TargetFile\":\"/project/examples/k8s/deployment.yaml\"}},\n      \"workspacePaths\":[\"/project\"]}' | python3 scripts/scan_deps.py --check-only\n# → {\"decision\": \"allow\"}\n```\n\nBeyond the automatic hook, the `/spawn-refactor` `SKILL.md` instructs the agent to always ask for explicit confirmation before spawning subagents:\n\n```\n## HITL Gate (mandatory before any mutation)\n\nDisplay the list of files that will be changed and ask:\n\n  The following files will be modified:\n    - examples/legacy-app/app.py    (replace boto3 → GCS)\n    - examples/legacy-app/worker.py (replace SQS → Pub/Sub)\n\n  Type YES to confirm or NO to abort.\n\nHalt if the user does not confirm with YES.\n```\n\n🛡️ Two layers of protection: the hook catches any write automatically, and the SKILL.md forces you to see the full plan before anything moves.\n\nAfter Subagent A runs, `app.py` goes from the boto3 mess above to this:\n\n``` python\n# examples/refactored-app/app.py\nimport os\nfrom flask import Flask, request, jsonify\n\napp = 
Flask(__name__)\n\nDB_PASSWORD = os.getenv(\"DB_PASSWORD\")\nif not DB_PASSWORD:\n    raise RuntimeError(\"DB_PASSWORD environment variable is required!\")\n\nGCS_BUCKET_NAME = os.getenv(\"GCS_BUCKET_NAME\", \"local-mock\")\n\n# LOCAL_MOCK=true → bypasses GCS; useful for K8s plumbing tests without real credentials\nLOCAL_MOCK = os.getenv(\"LOCAL_MOCK\", \"false\").lower() == \"true\"\n\nif LOCAL_MOCK:\n    storage_client = None\n    print(\"[LOCAL_MOCK] GCS disabled. Uploads will be simulated.\")\nelse:\n    from google.cloud import storage  # import only when we actually need GCS\n    storage_client = storage.Client()  # zero credentials — ADC via Workload Identity\n\n_mock_store: dict[str, bytes] = {}\n\n@app.route(\"/health\", methods=[\"GET\"])\ndef health():\n    return jsonify({\n        \"status\": \"healthy\",\n        \"platform\": \"local-k8s\" if LOCAL_MOCK else \"gcp-gke\",\n        \"gcs_bucket\": GCS_BUCKET_NAME,\n        \"mock_mode\": LOCAL_MOCK,\n    })\n\n@app.route(\"/upload\", methods=[\"POST\"])\ndef upload_file():\n    file = request.files[\"file\"]\n    filename = file.filename\n\n    if LOCAL_MOCK:\n        data = file.read()\n        _mock_store[filename] = data\n        return jsonify({\n            \"message\": f\"[LOCAL_MOCK] {filename} stored in memory ({len(data)} bytes)\",\n            \"gcs_uri\": f\"gs://local-mock/{filename}\",\n            \"files_in_mock\": list(_mock_store.keys()),\n        })\n\n    bucket = storage_client.bucket(GCS_BUCKET_NAME)\n    blob = bucket.blob(filename)\n    blob.upload_from_file(file)\n    return jsonify({\n        \"message\": f\"Uploaded {filename} to {GCS_BUCKET_NAME}\",\n        \"gcs_uri\": f\"gs://{GCS_BUCKET_NAME}/{filename}\",\n    })\n\n@app.route(\"/files\", methods=[\"GET\"])\ndef list_files():\n    if LOCAL_MOCK:\n        return jsonify({\"files\": list(_mock_store.keys()), \"source\": \"local-mock\"})\n    blobs = storage_client.list_blobs(GCS_BUCKET_NAME)\n    return jsonify({\"files\": 
[b.name for b in blobs], \"source\": f\"gs://{GCS_BUCKET_NAME}\"})\n\nif __name__ == \"__main__\":\n    app.run(host=\"0.0.0.0\", port=8080, debug=LOCAL_MOCK)\n```\n\n**What changed:**\n\n| Before | After |\n|---|---|\n| `import boto3` | `from google.cloud import storage` (conditional) |\n| `boto3.client('s3', aws_access_key_id=...)` | `storage.Client()` (zero credentials) |\n| `file.save(\"/tmp/...\")` | `blob.upload_from_file(file)` |\n| `DB_PASSWORD` with insecure default | `RuntimeError` if missing |\n\n```\n# ❌ Wrong: crashes at startup without GCP credentials\nfrom google.cloud import storage\nstorage_client = storage.Client()   # raises DefaultCredentialsError before any request is handled\n\n# ✅ Correct: import and client creation only happen when we actually need them\nif LOCAL_MOCK:\n    storage_client = None\nelse:\n    from google.cloud import storage   # ← inside the else block\n    storage_client = storage.Client()\n```\n\nModule-level code runs when Python loads the module, before any request is served. The import itself only needs the package to be installed; it is the unconditional `storage.Client()` that fails without GCP credentials (the Application Default Credentials lookup raises `DefaultCredentialsError`), so the app crashes at startup. 
Moving the import and the client construction inside `else` fixes it: with `LOCAL_MOCK=true`, neither one runs.\n\nI wanted to validate the entire K8s stack (Deployment, ConfigMap, Secret, Service, health checks, routing) locally using Docker Desktop, without needing real GCP credentials.\n\nThe solution was `LOCAL_MOCK=true` combined with a workaround for a Docker Desktop quirk that catches a lot of people off guard.\n\nIn my setup, Docker Desktop ran **two completely separate runtimes** that don't share images (whether yours does depends on the Docker Desktop version and how its Kubernetes cluster is provisioned):\n\n```\n┌──────────────────────────────────────┐\n│  Docker daemon                       │  ← docker build, docker images\n│  (images here are NOT visible to K8s)│\n└──────────────────────────────────────┘\n\n┌──────────────────────────────────────┐\n│  containerd                          │  ← used by the Kubernetes cluster\n│  (separate namespace)                │\n└──────────────────────────────────────┘\n```\n\nWhen you run `docker build -t my-image .`, the image exists in the Docker daemon but **not** in containerd. 
With `imagePullPolicy: Never`, K8s looks in containerd and fails:\n\n```\nFailed to pull image \"my-image:local\": ErrImageNeverPull\n```\n\nThe fix: a **local registry** as the bridge between both runtimes.\n\n```\n# registry:2 on port 5001 (port 5000 is taken by macOS AirPlay)\ndocker run -d -p 5001:5000 --restart=always --name local-registry registry:2\n```\n\nNow the flow works end-to-end:\n\n```\ndocker build → Docker daemon\n      ↓\ndocker tag + push → localhost:5001 → registry:2\n      ↓\ncontainerd pulls from registry:2 ← K8s Pod starts successfully\n```\n\nThe `Makefile` wraps all of this in `make` targets:\n\n```\nREGISTRY       = localhost:5001\nREGISTRY_IMAGE = $(REGISTRY)/agnostic-cluster-refactor:local\n\nregistry-start:\n    @docker ps --filter name=local-registry --filter status=running | grep local-registry || \\\n        docker run -d -p 5001:5000 --restart=always --name local-registry registry:2\n\nbuild: registry-start\n    docker build -t agnostic-cluster-refactor:local .\n    docker tag agnostic-cluster-refactor:local $(REGISTRY_IMAGE)\n    docker push $(REGISTRY_IMAGE)\n    @echo \"Image available to K8s: $(REGISTRY_IMAGE)\"\n\nlocal-up:\n    kubectl config use-context docker-desktop\n    kubectl apply -f examples/k8s/local/secret-db.yaml\n    kubectl apply -f examples/k8s/local/configmap.local.yaml\n    kubectl apply -f examples/k8s/local/deployment.local.yaml\n    kubectl apply -f examples/k8s/local/service.local.yaml\n    kubectl rollout status deployment/agnostic-cluster-app --timeout=60s\n    @echo \"Access: http://localhost:8080/health\"\n```\n\n```\n# examples/k8s/local/deployment.local.yaml\napiVersion: apps/v1\nkind: Deployment\nmetadata:\n  name: agnostic-cluster-app\nspec:\n  replicas: 1\n  template:\n    spec:\n      containers:\n        - name: app\n          image: localhost:5001/agnostic-cluster-refactor:local\n          imagePullPolicy: Always   # always pull from local registry\n          envFrom:\n            - 
configMapRef:\n                name: app-config-local   # injects LOCAL_MOCK=true\n          env:\n            - name: DB_PASSWORD\n              valueFrom:\n                secretKeyRef:\n                  name: app-secrets\n                  key: db-password\n          readinessProbe:\n            httpGet:\n              path: /health\n              port: 8080\n            initialDelaySeconds: 5\n```\n\n```\n# examples/k8s/local/configmap.local.yaml\napiVersion: v1\nkind: ConfigMap\nmetadata:\n  name: app-config-local\ndata:\n  GCS_BUCKET_NAME: \"local-mock\"\n  GCP_PROJECT_ID: \"local-dev\"\n  LOCAL_MOCK: \"true\"    # ← activates the in-memory store\n```\n\nRunning it:\n\n```\nmake build      # build + push to local registry\nmake local-up   # apply all manifests\n\ncurl http://localhost:8080/health\n# {\"status\":\"healthy\",\"platform\":\"local-k8s\",\"mock_mode\":true,\"gcs_bucket\":\"local-mock\"}\n\ncurl -X POST http://localhost:8080/upload -F \"file=@package.json\"\n# {\"message\":\"[LOCAL_MOCK] package.json stored in memory (842 bytes)\",\n#  \"gcs_uri\":\"gs://local-mock/package.json\"}\n\ncurl http://localhost:8080/files\n# {\"files\":[\"package.json\"],\"source\":\"local-mock\"}\n\nmake local-down  # teardown\n```\n\n✅ Entire K8s stack validated (Deployment, ConfigMap, Secret, Service, health checks, routing) without a single GCP token.\n\nOn GKE, the story is completely different.\n\n**The naive approach:**\n\n```\nos.environ[\"GOOGLE_APPLICATION_CREDENTIALS\"] = \"/app/sa-key.json\"\nstorage_client = storage.Client()\n```\n\nThis requires a JSON key file inside the container, which means a long-lived credential that has to be baked in or mounted, rotated by hand, and can leak.\n\n**The Workload Identity approach:** annotate a Kubernetes Service Account (KSA) with a Google Service Account (GSA) email:\n\n```\n# examples/k8s/serviceaccount.yaml\napiVersion: v1\nkind: ServiceAccount\nmetadata:\n  name: refactored-app-ksa\n  annotations:\n    iam.gke.io/gcp-service-account: \"gke-app-sa@MY_PROJECT.iam.gserviceaccount.com\"\n```\n\nGKE's internal 
metadata server intercepts ADC calls from Pods, verifies the annotation, and returns a short-lived OAuth2 token.\n\nThe application code becomes:\n\n```\n# Zero credentials: works automatically on GKE\nstorage_client = storage.Client()\n```\n\nTerraform provisions the IAM binding automatically:\n\n```\n# terraform/iam.tf\nresource \"google_service_account_iam_member\" \"workload_identity\" {\n  service_account_id = google_service_account.app.name\n  role               = \"roles/iam.workloadIdentityUser\"\n  member = \"serviceAccount:${var.project_id}.svc.id.goog[default/refactored-app-ksa]\"\n}\n```\n\n🔐 This binding is the handshake between the Kubernetes world and GCP IAM. Without it, no token is issued, and requests made through `storage.Client()` fail with a 403.\n\nWhen I first tested, `/scan-deps` and `/spawn-refactor` **did not appear** in the `agy` autocomplete. I spent a good chunk of time debugging this.\n\nThe discovery: `agy` has three distinct skill-loading mechanisms:\n\n| Mechanism | Location | Shows in `/` autocomplete? 
|\n|---|---|---|\n| Project skill | `.agents/skills/<name>/SKILL.md` | ❌ No |\n| Global contextual skill | `~/.gemini/antigravity-cli/skills/` | ❌ No |\n| Plugin with namespace | `~/.gemini/config/plugins/<plugin>/` | ✅ Yes |\n\nTo make the commands appear, create the plugin structure:\n\n```\nmkdir -p ~/.gemini/config/plugins/agnostic-cluster-refactor/skills/scan-deps\nmkdir -p ~/.gemini/config/plugins/agnostic-cluster-refactor/skills/spawn-refactor\n\ncat > ~/.gemini/config/plugins/agnostic-cluster-refactor/plugin.json << 'EOF'\n{\n  \"name\": \"agnostic-cluster-refactor\",\n  \"version\": \"1.0.0\",\n  \"description\": \"Migrates apps from AWS to GCP GKE with Workload Identity.\"\n}\nEOF\n```\n\nAfter restarting `agy`, the autocomplete shows:\n\n```\n/agnostic-cluster-refactor:scan-deps\n/agnostic-cluster-refactor:spawn-refactor\n```\n\nThe namespace prevents collisions: two different plugins can both have a skill named `scan-deps` and they'll appear as `/plugin-a:scan-deps` and `/plugin-b:scan-deps`.\n\nWhen I ran `/agnostic-cluster-refactor:spawn-refactor` and confirmed the HITL Gate, Gemini (the `agy` engine) orchestrated:\n\n**Subagent A (Backend), in shadow-worktree-backend:**\n\n- Read `dependency-map.json` to identify boto3 files\n- `import boto3` → `from google.cloud import storage, pubsub_v1` in each file\n- `boto3.client('s3', ...)` → `storage.Client().bucket(...)` with semantically equivalent calls\n- `boto3.client('sqs', ...)` → `pubsub_v1.SubscriberClient()`\n- `requirements.txt`: removed `boto3==1.28.0`, added `google-cloud-storage==2.10.0` and `google-cloud-pubsub==2.18.0`\n\n**Subagent B (Infra), in shadow-worktree-infra:**\n\n- `serviceaccount.yaml` with the `iam.gke.io/gcp-service-account` annotation\n- `deployment.yaml` with env vars via ConfigMap/Secret, no hardcoded credentials\n- `ingress.yaml` with `ingressClassName: gce` (the current format, not the deprecated annotation)\n\nAll in isolated Git 
Worktrees, in parallel, without touching `main`.\n\n**1. The conditional import is intentional, not lazy.**\n\nWhen `LOCAL_MOCK=true`, `storage.Client()` must not run at module level. Without GCP credentials, it throws at startup before any request is served. Keep the import and the client construction inside the non-mock branch.\n\n**2. Docker Desktop K8s and the Docker daemon live in separate worlds.**\n\n`imagePullPolicy: Never` breaks with Docker Desktop because K8s uses containerd, not the daemon. Use a local registry on port 5001 (5000 is taken by macOS AirPlay) and `imagePullPolicy: Always`.\n\n**3. .agents/workflows/ does not create slash commands in agy.**\n\nSkills in `.agents/skills/` are context injections, not interactive commands. The `/` autocomplete requires a plugin installed in `~/.gemini/config/plugins/`.\n\n**4. The HITL Gate needs two independent layers.**\n\nA hook catches unexpected writes automatically. But for `/spawn-refactor`, which modifies multiple files in parallel, explicit plan confirmation in the SKILL.md is non-negotiable. Without both layers, the agent can act before you understand the blast radius.\n\n**5. Workload Identity eliminates an entire class of security problems.**\n\nNo JSON keys in containers means no credential leaks in logs, no manual rotation, no hardcoded keys in Dockerfiles, and no Secret volumes mounted on Pod disk. 
The Metadata Server's short-lived tokens are genuinely safer than long-lived key files.\n\n```\n# Clone\ngit clone https://github.com/carlosrgomes/agnostic-cluster-refactor\ncd agnostic-cluster-refactor\n\n# Test locally without GCP (Docker Desktop K8s)\nmake build      # build + push to local registry\nmake local-up   # apply manifests to docker-desktop context\ncurl http://localhost:8080/health\n\n# Scan your own project's dependencies\npython3 scripts/scan_deps.py /path/to/your/project\npython3 -m json.tool dependency-map.json\n\n# Validate the HITL Gate hook\necho '{\"toolCall\":{\"name\":\"write_to_file\",\"args\":{\"TargetFile\":\"/project/src/main.py\"}},\n      \"workspacePaths\":[\"/project\"]}' | python3 scripts/scan_deps.py --check-only\n# → {\"decision\": \"force_ask\", ...}\n\n# Teardown\nmake local-down\n```\n\nFor the full GKE deployment with Workload Identity, the project README includes the Terraform that provisions all the infrastructure.\n\nThe project started from a real problem (boto3 everywhere) and ended up with a surprisingly complete solution: automatic dependency scanning, parallel subagent refactoring, mandatory human oversight, local K8s testing without cloud credentials, and keyless production auth.\n\nWhat impressed me most wasn't the AI doing the refactoring; it was the **supervision system design**: hooks intercepting any write outside safe directories, SKILL.md with an explicit gate before destructive actions, and Git Worktrees ensuring `main` is never touched without human review.\n\nAn autonomous agent without oversight is a chaotic script. 
An agent with a well-designed HITL Gate is a trustworthy teammate.\n\nComplete technical tutorial: autonomous migration of AWS-coupled applications to Google Kubernetes Engine (GKE) using the Antigravity CLI with Workload Identity, parallel subagents, and a HITL gate.\n\nLegacy applications accumulate couplings…", "url": "https://wpnews.pro/news/agnostic-cluster-refactor-skill-for-antigrafity-cli-building-an-ai-agent-that-to", "canonical_source": "https://dev.to/gde/agnostic-cluster-refactor-skill-for-antigrafity-cli-building-an-ai-agent-that-migrates-apps-from-e0", "published_at": "2026-06-30 15:10:30+00:00", "updated_at": "2026-06-30 15:18:54.191269+00:00", "lang": "en", "topics": ["artificial-intelligence", "ai-agents", "developer-tools", "machine-learning"], "entities": ["Antigravity CLI", "AWS", "GKE", "Workload Identity", "boto3", "Python", "Kubernetes", "S3"], "alternates": {"html": "https://wpnews.pro/news/agnostic-cluster-refactor-skill-for-antigrafity-cli-building-an-ai-agent-that-to", "markdown": "https://wpnews.pro/news/agnostic-cluster-refactor-skill-for-antigrafity-cli-building-an-ai-agent-that-to.md", "text": "https://wpnews.pro/news/agnostic-cluster-refactor-skill-for-antigrafity-cli-building-an-ai-agent-that-to.txt", "jsonld": "https://wpnews.pro/news/agnostic-cluster-refactor-skill-for-antigrafity-cli-building-an-ai-agent-that-to.jsonld"}}