{"slug": "building-your-first-developer-agent-with-openai-agents-sdk", "title": "Building Your First Developer Agent With OpenAI Agents SDK", "summary": "A developer built a cautious first-generation developer agent using the OpenAI Agents SDK that reads GitHub issues, inspects codebases, creates implementation plans, suggests tests, and generates pull request summaries—but never writes code without human approval. The agent, implemented in Python with safety gates and restricted tools, follows a conservative workflow designed to build trust by acting as an advisory system before any file modifications occur. The project provides a learning template that demonstrates how to structure tools, safety rules, and approval gates rather than a production-ready autonomous engineer.", "body_md": "A developer agent should not start by writing code. That may sound strange, but if you are building an agent for real engineering work, the first version should be cautious. It should read an issue, inspect the codebase, create a plan, suggest tests, and generate a pull request summary. Only later should it edit files. That is how you build trust.\n\nIn this article, we will build a small developer agent using the OpenAI Agents SDK. The goal is not to create a fully autonomous engineer. The goal is to build a useful workflow executor with tools, safety rules, and approval gates.\n\nThe agent will be able to read a GitHub issue, search a local codebase, inspect relevant files, create an implementation plan, run tests, generate a pull request summary, and stop before risky actions.\n\nThe examples use Python because the OpenAI Agents SDK has a Python package. 
Keep the code as a learning template, not a production-ready drop-in system.\n\nThe workflow looks like this:\n\n```\nGitHub Issue\n    |\nRead issue details\n    |\nSearch codebase\n    |\nRead relevant files\n    |\nCreate implementation plan\n    |\nSuggest tests\n    |\nRun selected tests\n    |\nGenerate PR summary\n    |\nHuman approval before any write action\n```\n\nThat is intentionally conservative. A first developer agent should be advisory. It should help you understand and plan before it touches code.\n\nCreate a new Python project:\n\n```\nmkdir developer-agent-demo\ncd developer-agent-demo\n\npython -m venv .venv\nsource .venv/bin/activate\n\npip install openai-agents\n```\n\nSet your API key:\n\n```\nexport OPENAI_API_KEY=\"your-api-key\"\n```\n\nA simple project structure:\n\n```\ndeveloper-agent-demo/\n  agent.py\n  tools.py\n  github_client.py\n  safety.py\n  target-repo/\n```\n\nFor the tutorial, `target-repo/` is the repository your agent will inspect.\n\nTools are the most important part of an agent. The model should not get unlimited shell access. 
It should get specific tools.\n\nCreate `tools.py`:\n\n``` python\nfrom pathlib import Path\nimport subprocess\nfrom typing import List\n\nREPO_ROOT = Path(\"target-repo\").resolve()\n\ndef safe_path(relative_path: str) -> Path:\n    path = (REPO_ROOT / relative_path).resolve()\n\n    # Compare path components rather than string prefixes, so a sibling\n    # directory such as \"target-repo-old\" is not treated as inside the repo.\n    # Path.is_relative_to requires Python 3.9+.\n    if not path.is_relative_to(REPO_ROOT):\n        raise ValueError(\"Path is outside the repository.\")\n\n    return path\n\ndef search_codebase(query: str, max_results: int = 20) -> List[str]:\n    \"\"\"\n    Search the repository using grep.\n    In production, you may replace this with ripgrep, embeddings,\n    or a code search service.\n    \"\"\"\n    result = subprocess.run(\n        # \"--\" ends option parsing, so a query starting with \"-\" is\n        # treated as a pattern and not as a grep flag.\n        [\"grep\", \"-R\", \"-n\", \"--\", query, str(REPO_ROOT)],\n        text=True,\n        capture_output=True,\n    )\n\n    lines = result.stdout.splitlines()\n\n    return lines[:max_results]\n\ndef read_file(relative_path: str, max_chars: int = 12000) -> str:\n    path = safe_path(relative_path)\n\n    if not path.is_file():\n        raise FileNotFoundError(relative_path)\n\n    content = path.read_text(errors=\"replace\")\n\n    return content[:max_chars]\n\ndef run_tests(test_command: str) -> str:\n    \"\"\"\n    Run only approved test commands.\n    Keep this strict. Do not let the model run arbitrary shell commands.\n    \"\"\"\n    allowed_commands = {\n        \"pytest\": [\"pytest\"],\n        \"phpunit\": [\"vendor/bin/phpunit\"],\n        \"npm-test\": [\"npm\", \"test\"],\n    }\n\n    if test_command not in allowed_commands:\n        raise ValueError(f\"Test command is not allowed: {test_command}\")\n\n    result = subprocess.run(\n        allowed_commands[test_command],\n        cwd=REPO_ROOT,\n        text=True,\n        capture_output=True,\n        timeout=120,\n    )\n\n    return result.stdout + \"\\n\" + result.stderr\n```\n\nThis file shows the main safety idea:\n\n```\nThe agent can only do what your tools allow.\n```\n\nIt can search, read, and run approved test commands. 
It cannot delete files, push commits, or deploy.\n\nFor a real GitHub integration, you would call the GitHub REST API or use an SDK. For this tutorial, keep it simple.\n\nCreate `github_client.py`:\n\n``` python\nfrom dataclasses import dataclass\n\n@dataclass\nclass GitHubIssue:\n    number: int\n    title: str\n    body: str\n    labels: list[str]\n\ndef read_issue(issue_number: int) -> GitHubIssue:\n    \"\"\"\n    Demo implementation.\n\n    In production, replace this with a GitHub API call.\n    Keep authentication and rate limits in mind.\n    \"\"\"\n    return GitHubIssue(\n        number=issue_number,\n        title=\"Fix duplicate invoice reminder emails\",\n        body=(\n            \"Customers sometimes receive duplicate invoice reminder emails. \"\n            \"This seems to happen when the scheduled reminder command and \"\n            \"the invoice overdue event run close together.\"\n        ),\n        labels=[\"bug\", \"billing\"],\n    )\n```\n\nThis gives the agent a realistic issue.\n\nNow create `agent.py`. The exact API shape may evolve over time, so treat this as a practical learning example. 
The important concepts are stable: define instructions, expose tools, run the agent, and inspect the result.\n\n``` python\nfrom agents import Agent, Runner, function_tool\n\nfrom github_client import read_issue\nfrom tools import search_codebase, read_file, run_tests\n\n@function_tool\ndef get_github_issue(issue_number: int) -> str:\n    issue = read_issue(issue_number)\n\n    return f\"\"\"\nIssue #{issue.number}: {issue.title}\n\nLabels: {\", \".join(issue.labels)}\n\nBody:\n{issue.body}\n\"\"\"\n\n@function_tool\ndef search_repo(query: str) -> str:\n    results = search_codebase(query)\n\n    if not results:\n        return \"No matches found.\"\n\n    return \"\\n\".join(results)\n\n@function_tool\ndef read_repo_file(path: str) -> str:\n    return read_file(path)\n\n@function_tool\ndef run_approved_tests(command: str) -> str:\n    return run_tests(command)\n\ndeveloper_agent = Agent(\n    name=\"Developer Planning Agent\",\n    instructions=\"\"\"\nYou are a careful senior software engineering assistant.\n\nYour job is to analyze GitHub issues and create safe implementation plans.\n\nRules:\n- Do not edit files.\n- Do not invent files you have not inspected.\n- Use repository search before making claims about code.\n- Prefer tests before implementation.\n- Preserve public APIs unless the issue explicitly requires changing them.\n- Explain risks and assumptions.\n- Ask for human approval before any write action.\n- If context is missing, say what is missing.\n\nOutput format:\n## Issue Summary\n## Relevant Code Areas\n## Current Behavior Hypothesis\n## Implementation Plan\n## Tests To Add Or Run\n## Risks And Approval Gates\n## Pull Request Summary Draft\n\"\"\",\n    tools=[\n        get_github_issue,\n        search_repo,\n        read_repo_file,\n        run_approved_tests,\n    ],\n)\n\nif __name__ == \"__main__\":\n    result = Runner.run_sync(\n        developer_agent,\n        \"Analyze GitHub issue #123 and create an implementation plan.\",\n    )\n\n   
 print(result.final_output)\n```\n\nThis agent is intentionally not allowed to write files. That is the right first step.\n\nThe prompt you pass to the agent matters.\n\nWeak task:\n\n```\nFix issue #123.\n```\n\nBetter task:\n\n```\nAnalyze GitHub issue #123.\n\nSteps:\n1. Read the issue.\n2. Search the repository for related invoice reminder code.\n3. Inspect the most relevant files.\n4. Explain the likely duplicate email paths.\n5. Suggest characterization tests.\n6. Create a small implementation plan.\n7. Do not edit files.\n```\n\nThis keeps the workflow safe.\n\nA good output might look like this:\n\n```\n## Issue Summary\n\nIssue #123 reports duplicate invoice reminder emails. The description suggests\nthat both a scheduled command and an invoice overdue event may send reminders.\n\n## Relevant Code Areas\n\n- app/Console/Commands/SendInvoiceReminders.php\n- app/Listeners/InvoiceOverdueListener.php\n- app/Jobs/SendInvoiceReminderEmail.php\n- app/Models/Invoice.php\n\n## Current Behavior Hypothesis\n\nThe scheduled command scans overdue invoices and dispatches reminder jobs.\nThe event listener may also dispatch the same reminder when an invoice status\nchanges to overdue. If both paths run close together, the same invoice can\nreceive two reminder jobs.\n\n## Implementation Plan\n\n1. Add characterization tests for the scheduled command path.\n2. Add characterization tests for the event listener path.\n3. Introduce an idempotency check using `last_reminder_sent_at` or an existing reminder log.\n4. Ensure both paths use the same reminder service.\n5. Keep event names and email payload shape unchanged.\n6. 
Run billing-related tests.\n\n## Tests To Add Or Run\n\n- test_scheduled_command_does_not_send_duplicate_reminder\n- test_invoice_overdue_event_does_not_duplicate_recent_reminder\n- test_reminder_can_be_sent_again_after_allowed_interval\n\n## Risks And Approval Gates\n\nHigh risk: Billing/customer communication.\nApproval required before changing reminder send conditions.\n\n## Pull Request Summary Draft\n\nThis change centralizes invoice reminder duplicate prevention so scheduled\ncommands and invoice overdue events share the same idempotency rule.\n```\n\nThat is useful before any code is generated.\n\nNow let's define a simple approval gate. Create `safety.py`:\n\n``` python\nfrom dataclasses import dataclass\n\n@dataclass\nclass ApprovalRequest:\n    action: str\n    reason: str\n    risk_level: str\n\ndef require_human_approval(request: ApprovalRequest) -> bool:\n    print(\"\\nApproval required\")\n    print(f\"Action: {request.action}\")\n    print(f\"Reason: {request.reason}\")\n    print(f\"Risk level: {request.risk_level}\")\n\n    answer = input(\"Approve? Type 'yes' to continue: \")\n\n    return answer.strip().lower() == \"yes\"\n```\n\nIf you later add file-writing tools, wrap them:\n\n``` python\nfrom agents import function_tool\nfrom pathlib import Path\n\nfrom safety import ApprovalRequest, require_human_approval\nfrom tools import safe_path\n\n@function_tool\ndef write_repo_file(path: str, content: str, reason: str) -> str:\n    approved = require_human_approval(\n        ApprovalRequest(\n            action=f\"write file {path}\",\n            reason=reason,\n            risk_level=\"medium\",\n        )\n    )\n\n    if not approved:\n        return \"Write action was not approved.\"\n\n    file_path = safe_path(path)\n    file_path.write_text(content)\n\n    return f\"Updated {path}\"\n```\n\nThis is the line between helpful automation and dangerous automation. The agent may propose a write. 
Your software decides whether it is allowed.\n\nSome files should always require approval. For example:\n\n``` python\nHIGH_RISK_PATTERNS = [\n    \"database/migrations/\",\n    \"app/Auth/\",\n    \"app/Payments/\",\n    \"app/Billing/\",\n    \".github/workflows/\",\n]\n\ndef classify_file_risk(path: str) -> str:\n    if any(pattern in path for pattern in HIGH_RISK_PATTERNS):\n        return \"high\"\n\n    return \"medium\"\n```\n\nThen use it before writes:\n\n``` python\n@function_tool\ndef write_repo_file(path: str, content: str, reason: str) -> str:\n    risk_level = classify_file_risk(path)\n\n    approved = require_human_approval(\n        ApprovalRequest(\n            action=f\"write file {path}\",\n            reason=reason,\n            risk_level=risk_level,\n        )\n    )\n\n    if not approved:\n        return \"Write action was not approved.\"\n\n    file_path = safe_path(path)\n    file_path.write_text(content)\n\n    return f\"Updated {path}\"\n```\n\nThis is simple, but very practical. You can evolve it later into policy rules.\n\nA developer agent does not need to create the PR itself to be useful. It can generate the summary.\n\nPrompt:\n\n```\nBased on the implementation plan and test results, generate a pull request summary.\n\nInclude:\n- problem,\n- solution,\n- files changed,\n- behavior impact,\n- tests,\n- risks,\n- rollback notes.\n```\n\nExample summary:\n\n```\n## Problem\n\nInvoice reminder emails could be dispatched by both the scheduled reminder\ncommand and the invoice overdue event listener, causing duplicate customer emails.\n\n## Solution\n\nThis change centralizes reminder dispatch through a shared service and adds\nan idempotency check before sending reminder jobs.\n\n## Behavior Impact\n\nCustomers should receive at most one reminder within the configured reminder\nwindow. 
Existing email payloads and event names are unchanged.\n\n## Tests\n\n- Added coverage for scheduled command reminder dispatch.\n- Added coverage for invoice overdue event dispatch.\n- Added duplicate-prevention test for recent reminders.\n\n## Risks\n\nBilling communication behavior is customer-facing. This PR should be reviewed\ncarefully before deployment.\n\n## Rollback\n\nRevert the shared reminder service change and restore previous dispatch paths.\n```\n\nThis saves reviewer time. It also forces the agent to explain behavior impact.\n\nThe earlier `run_tests` tool only accepts named commands. That is not as flexible as a shell, but it is safer.\n\nYou can extend it:\n\n``` python\nALLOWED_TEST_TARGETS = {\n    \"billing\": [\"vendor/bin/phpunit\", \"tests/Feature/Billing\"],\n    \"unit\": [\"vendor/bin/phpunit\", \"tests/Unit\"],\n    \"frontend\": [\"npm\", \"test\"],\n}\n\ndef run_test_target(target: str) -> str:\n    if target not in ALLOWED_TEST_TARGETS:\n        raise ValueError(f\"Unknown test target: {target}\")\n\n    result = subprocess.run(\n        ALLOWED_TEST_TARGETS[target],\n        cwd=REPO_ROOT,\n        text=True,\n        capture_output=True,\n        timeout=180,\n    )\n\n    return result.stdout + \"\\n\" + result.stderr\n```\n\nThen expose `run_test_target` as a tool. The model can choose `\"billing\"` or `\"unit\"`, but it cannot run arbitrary commands like:\n\n```\nrm -rf /\n```\n\nThat is the whole point.\n\nA developer agent should be auditable. 
At minimum, log the issue number, tools called, files read, tests run, approvals requested, final output, and errors.\n\nSimple example:\n\n``` python\nimport json\nfrom datetime import datetime, timezone\n\ndef log_agent_event(event_type: str, payload: dict) -> None:\n    record = {\n        # Timezone-aware UTC timestamp; datetime.utcnow() is deprecated\n        # since Python 3.12 and returns a naive value.\n        \"time\": datetime.now(timezone.utc).isoformat(),\n        \"event_type\": event_type,\n        \"payload\": payload,\n    }\n\n    with open(\"agent.log\", \"a\") as file:\n        file.write(json.dumps(record) + \"\\n\")\n```\n\nUse it inside your tools:\n\n``` python\ndef search_codebase(query: str, max_results: int = 20) -> list[str]:\n    log_agent_event(\"search_codebase\", {\"query\": query})\n\n    ...\n```\n\nLogs turn agent behavior from mystery into inspectable workflow.\n\nDo not give your first developer agent broad shell access. Do not let it push branches without approval. Do not let it merge pull requests. Do not let it edit migrations, auth, payment, or security files without a human checkpoint. Do not let it access production secrets. Do not let it decide whether its own work is safe.\n\nStart with advisory behavior:\n\n```\nRead -> Analyze -> Plan -> Suggest Tests -> Summarize\n```\n\nThen slowly add:\n\n```\nEdit low-risk files -> Run tests -> Create draft PR\n```\n\nThen maybe:\n\n```\nOpen PR with human approval\n```\n\nThat progression is much safer.\n\nFor a real internal developer agent, you may eventually have:\n\n```\nFrontend UI\n  |\nAgent service\n  |\nPolicy engine\n  |\nTool layer\n  |\nGitHub API / Code search / CI / Docs / Ticket system\n  |\nAudit logs and traces\n```\n\nYou can also split responsibilities by purpose: an issue triage agent, a codebase search agent, a test planning agent, a PR summary agent, and a documentation update agent.\n\nA single giant agent is harder to control. Small agents with clear tools are easier to trust.\n\nYour first developer agent does not need to be autonomous. It needs to be useful. 
A useful first version can read an issue, inspect the code, explain likely behavior, suggest tests, create a safe implementation plan, and write a good PR summary. That alone can save real time.\n\nThe most important principle is simple:\n\n```\nGive the agent tools, but keep the boundaries.\n```\n\nAgents become powerful when they can act. They become safe when your software controls how they act. Start small. Log everything. Add approval gates. Treat the agent like a workflow executor, not a magical developer.\n\nThat is how you build something your team can actually use.\n\n*Originally published at nazarboyko.com.*", "url": "https://wpnews.pro/news/building-your-first-developer-agent-with-openai-agents-sdk", "canonical_source": "https://dev.to/nazar_boyko/building-your-first-developer-agent-with-openai-agents-sdk-5egg", "published_at": "2026-06-02 23:51:48+00:00", "updated_at": "2026-06-03 00:12:15.880027+00:00", "lang": "en", "topics": ["ai-agents", "artificial-intelligence", "ai-tools", "ai-safety", "large-language-models"], "entities": ["OpenAI Agents SDK", "GitHub", "Python"], "alternates": {"html": "https://wpnews.pro/news/building-your-first-developer-agent-with-openai-agents-sdk", "markdown": "https://wpnews.pro/news/building-your-first-developer-agent-with-openai-agents-sdk.md", "text": "https://wpnews.pro/news/building-your-first-developer-agent-with-openai-agents-sdk.txt", "jsonld": "https://wpnews.pro/news/building-your-first-developer-agent-with-openai-agents-sdk.jsonld"}}