Building Your First Developer Agent With OpenAI Agents SDK

A developer built a cautious first-generation developer agent using the OpenAI Agents SDK that reads GitHub issues, inspects codebases, creates implementation plans, suggests tests, and generates pull request summaries—but never writes code without human approval. The agent, implemented in Python with safety gates and restricted tools, follows a conservative workflow designed to build trust by acting as an advisory system before any file modifications occur. The project provides a learning template that demonstrates how to structure tools, safety rules, and approval gates rather than a production-ready autonomous engineer.

A developer agent should not start by writing code. That may sound strange, but if you are building an agent for real engineering work, the first version should be cautious. It should read an issue, inspect the codebase, create a plan, suggest tests, and generate a pull request summary. Only later should it edit files. That is how you build trust.

In this article, we will build a small developer agent using the OpenAI Agents SDK. The goal is not to create a fully autonomous engineer. The goal is to build a useful workflow executor with tools, safety rules, and approval gates.

The agent will be able to read a GitHub issue, search a local codebase, inspect relevant files, create an implementation plan, run tests, generate a pull request summary, and stop before risky actions.

The examples use Python because the OpenAI Agents SDK has a Python package. Keep the code as a learning template, not a production-ready drop-in system.

The workflow looks like this:

```
GitHub Issue
    |
Read issue details
    |
Search codebase
    |
Read relevant files
    |
Create implementation plan
    |
Suggest tests
    |
Run selected tests
    |
Generate PR summary
    |
Human approval before any write action
```

That is intentionally conservative. A first developer agent should be advisory. It should help you understand and plan before it touches code.

Create a new Python project:

```
mkdir developer-agent-demo
cd developer-agent-demo
python -m venv .venv
source .venv/bin/activate
pip install openai-agents
```

Set your API key:

```
export OPENAI_API_KEY="your-api-key"
```

A simple project structure:

```
developer-agent-demo/
    agent.py
    tools.py
    github_client.py
    safety.py
    target-repo/
```

For the tutorial, target-repo/ is the repository your agent will inspect.

Tools are the most important part of an agent. The model should not get unlimited shell access. It should get specific tools.
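To make "specific tools" concrete before bringing in the SDK, here is a minimal sketch in plain Python. It does not use the Agents SDK at all; `TOOL_REGISTRY`, `register_tool`, `call_tool`, and `echo_issue_title` are hypothetical names chosen only for illustration. The idea it shows is that the agent's capabilities are exactly the functions you register, and any other name is refused.

```python
from typing import Callable, Dict

# Hypothetical registry: maps a tool name to the single function it may call.
TOOL_REGISTRY: Dict[str, Callable[..., str]] = {}


def register_tool(name: str):
    """Decorator that exposes a function under an explicit tool name."""
    def decorator(func: Callable[..., str]) -> Callable[..., str]:
        TOOL_REGISTRY[name] = func
        return func
    return decorator


@register_tool("echo_issue_title")
def echo_issue_title(title: str) -> str:
    # Stand-in for a real tool such as "read a GitHub issue".
    return f"Issue title: {title}"


def call_tool(name: str, **kwargs) -> str:
    """Dispatch a tool call; names that were never registered are refused."""
    if name not in TOOL_REGISTRY:
        raise PermissionError(f"Tool is not exposed to the agent: {name}")
    return TOOL_REGISTRY[name](**kwargs)


if __name__ == "__main__":
    print(call_tool("echo_issue_title", title="Fix duplicate invoice reminder emails"))
```

A request for anything outside the registry, such as a generic shell tool, fails with `PermissionError` instead of running.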
Create tools.py:

```python
from pathlib import Path
import subprocess
from typing import List

REPO_ROOT = Path("target-repo").resolve()


def safe_path(relative_path: str) -> Path:
    path = (REPO_ROOT / relative_path).resolve()

    # Compare path components rather than string prefixes, so a sibling
    # directory such as "target-repo-old" is not treated as inside the repo.
    if path != REPO_ROOT and REPO_ROOT not in path.parents:
        raise ValueError("Path is outside the repository.")

    return path


def search_codebase(query: str, max_results: int = 20) -> List[str]:
    """
    Search the repository using grep.
    In production, you may replace this with ripgrep, embeddings,
    or a code search service.
    """
    result = subprocess.run(
        # "--" ends option parsing, so a query starting with "-" is
        # treated as a search pattern and not as a grep flag.
        ["grep", "-R", "-n", "--", query, str(REPO_ROOT)],
        text=True,
        capture_output=True,
    )

    lines = result.stdout.splitlines()
    return lines[:max_results]


def read_file(relative_path: str, max_chars: int = 12000) -> str:
    path = safe_path(relative_path)

    if not path.is_file():
        raise FileNotFoundError(relative_path)

    content = path.read_text(errors="replace")
    return content[:max_chars]


def run_tests(test_command: str) -> str:
    """
    Run only approved test commands.
    Keep this strict. Do not let the model run arbitrary shell commands.
    """
    allowed_commands = {
        "pytest": ["pytest"],
        "phpunit": ["vendor/bin/phpunit"],
        "npm-test": ["npm", "test"],
    }

    if test_command not in allowed_commands:
        raise ValueError(f"Test command is not allowed: {test_command}")

    result = subprocess.run(
        allowed_commands[test_command],
        cwd=REPO_ROOT,
        text=True,
        capture_output=True,
        timeout=120,
    )

    return result.stdout + "\n" + result.stderr
```

This file shows the main safety idea: the agent can only do what your tools allow. It can search, read, and run approved test commands. It cannot delete files, push commits, or deploy.

For a real GitHub integration, you would call the GitHub REST API or use an SDK. For this tutorial, keep it simple. Create github_client.py:

```python
from dataclasses import dataclass


@dataclass
class GitHubIssue:
    number: int
    title: str
    body: str
    labels: list[str]


def read_issue(issue_number: int) -> GitHubIssue:
    """
    Demo implementation.
    In production, replace this with a GitHub API call.
    Keep authentication and rate limits in mind.
    """
    return GitHubIssue(
        number=issue_number,
        title="Fix duplicate invoice reminder emails",
        body=(
            "Customers sometimes receive duplicate invoice reminder emails. "
            "This seems to happen when the scheduled reminder command and "
            "the invoice overdue event run close together."
        ),
        labels=["bug", "billing"],
    )
```

This gives the agent a realistic issue.

Now create agent.py. The exact API shape may evolve over time, so treat this as a practical learning example. The important concepts are stable: define instructions, expose tools, run the agent, and inspect the result.

```python
from agents import Agent, Runner, function_tool

from github_client import read_issue
from tools import search_codebase, read_file, run_tests


@function_tool
def get_github_issue(issue_number: int) -> str:
    issue = read_issue(issue_number)
    return f"""
Issue {issue.number}: {issue.title}

Labels: {", ".join(issue.labels)}

Body:
{issue.body}
"""


@function_tool
def search_repo(query: str) -> str:
    results = search_codebase(query)
    if not results:
        return "No matches found."
    return "\n".join(results)


@function_tool
def read_repo_file(path: str) -> str:
    return read_file(path)


@function_tool
def run_approved_tests(command: str) -> str:
    return run_tests(command)


developer_agent = Agent(
    name="Developer Planning Agent",
    instructions="""
You are a careful senior software engineering assistant.

Your job is to analyze GitHub issues and create safe implementation plans.

Rules:
- Do not edit files.
- Do not invent files you have not inspected.
- Use repository search before making claims about code.
- Prefer tests before implementation.
- Preserve public APIs unless the issue explicitly requires changing them.
- Explain risks and assumptions.
- Ask for human approval before any write action.
- If context is missing, say what is missing.

Output format:
Issue Summary
Relevant Code Areas
Current Behavior Hypothesis
Implementation Plan
Tests To Add Or Run
Risks And Approval Gates
Pull Request Summary Draft
""",
    tools=[
        get_github_issue,
        search_repo,
        read_repo_file,
        run_approved_tests,
    ],
)


if __name__ == "__main__":
    result = Runner.run_sync(
        developer_agent,
        "Analyze GitHub issue 123 and create an implementation plan.",
    )
    print(result.final_output)
```

This agent is intentionally not allowed to write files. That is the right first step.

The prompt you pass to the agent matters. Weak task:

```
Fix issue 123.
```

Better task:

```
Analyze GitHub issue 123.

Steps:
1. Read the issue.
2. Search the repository for related invoice reminder code.
3. Inspect the most relevant files.
4. Explain the likely duplicate email paths.
5. Suggest characterization tests.
6. Create a small implementation plan.
7. Do not edit files.
```

This keeps the workflow safe. A good output might look like this:

```
Issue Summary
Issue 123 reports duplicate invoice reminder emails. The description suggests
that both a scheduled command and an invoice overdue event may send reminders.

Relevant Code Areas
- app/Console/Commands/SendInvoiceReminders.php
- app/Listeners/InvoiceOverdueListener.php
- app/Jobs/SendInvoiceReminderEmail.php
- app/Models/Invoice.php

Current Behavior Hypothesis
The scheduled command scans overdue invoices and dispatches reminder jobs.
The event listener may also dispatch the same reminder when an invoice status
changes to overdue. If both paths run close together, the same invoice can
receive two reminder jobs.

Implementation Plan
1. Add characterization tests for the scheduled command path.
2. Add characterization tests for the event listener path.
3. Introduce an idempotency check using last_reminder_sent_at or an existing
   reminder log.
4. Ensure both paths use the same reminder service.
5. Keep event names and email payload shape unchanged.
6. Run billing-related tests.

Tests To Add Or Run
- test_scheduled_command_does_not_send_duplicate_reminder
- test_invoice_overdue_event_does_not_duplicate_recent_reminder
- test_reminder_can_be_sent_again_after_allowed_interval

Risks And Approval Gates
High risk: Billing/customer communication.
Approval required before changing reminder send conditions.

Pull Request Summary Draft
This change centralizes invoice reminder duplicate prevention so scheduled
commands and invoice overdue events share the same idempotency rule.
```

That is useful before any code is generated.

Now let's define a simple approval gate. Create safety.py:

```python
from dataclasses import dataclass


@dataclass
class ApprovalRequest:
    action: str
    reason: str
    risk_level: str


def require_human_approval(request: ApprovalRequest) -> bool:
    print("\nApproval required")
    print(f"Action: {request.action}")
    print(f"Reason: {request.reason}")
    print(f"Risk level: {request.risk_level}")

    # Anything other than an explicit "yes" counts as a rejection.
    answer = input("Approve? Type 'yes' to continue: ")
    return answer.strip().lower() == "yes"
```

If you later add file-writing tools, wrap them:

```python
from agents import function_tool

from safety import ApprovalRequest, require_human_approval
from tools import safe_path


@function_tool
def write_repo_file(path: str, content: str, reason: str) -> str:
    approved = require_human_approval(
        ApprovalRequest(
            action=f"write file {path}",
            reason=reason,
            risk_level="medium",
        )
    )

    if not approved:
        return "Write action was not approved."

    # safe_path rejects any target outside the repository.
    file_path = safe_path(path)
    file_path.write_text(content)
    return f"Updated {path}"
```

This is the line between helpful automation and dangerous automation. The agent may propose a write. Your software decides whether it is allowed.

Some files should always require approval.
For example:

```python
HIGH_RISK_PATTERNS = [
    "database/migrations/",
    "app/Auth/",
    "app/Payments/",
    "app/Billing/",
    ".github/workflows/",
]


def classify_file_risk(path: str) -> str:
    # Simple substring match against known sensitive directories.
    if any(pattern in path for pattern in HIGH_RISK_PATTERNS):
        return "high"
    return "medium"
```

Then use it before writes:

```python
@function_tool
def write_repo_file(path: str, content: str, reason: str) -> str:
    risk_level = classify_file_risk(path)

    approved = require_human_approval(
        ApprovalRequest(
            action=f"write file {path}",
            reason=reason,
            risk_level=risk_level,
        )
    )

    if not approved:
        return "Write action was not approved."

    file_path = safe_path(path)
    file_path.write_text(content)
    return f"Updated {path}"
```

This is simple, but very practical. You can evolve it later into policy rules.

A developer agent does not need to create the PR itself to be useful. It can generate the summary. Prompt:

```
Based on the implementation plan and test results, generate a pull request summary.

Include:
- problem,
- solution,
- files changed,
- behavior impact,
- tests,
- risks,
- rollback notes.
```

Example summary:

```
Problem
Invoice reminder emails could be dispatched by both the scheduled reminder
command and the invoice overdue event listener, causing duplicate customer emails.

Solution
This change centralizes reminder dispatch through a shared service and adds an
idempotency check before sending reminder jobs.

Behavior Impact
Customers should receive at most one reminder within the configured reminder
window. Existing email payloads and event names are unchanged.

Tests
- Added coverage for scheduled command reminder dispatch.
- Added coverage for invoice overdue event dispatch.
- Added duplicate-prevention test for recent reminders.

Risks
Billing communication behavior is customer-facing. This PR should be reviewed
carefully before deployment.

Rollback
Revert the shared reminder service change and restore previous dispatch paths.
```

This saves reviewer time. It also forces the agent to explain behavior impact.
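The summary prompt can also be assembled in code, so every run asks for the same sections. The following is a minimal sketch: `build_pr_summary_prompt` and `PR_SUMMARY_SECTIONS` are hypothetical helpers, not part of the Agents SDK, and the plan and test output are assumed to be plain strings collected from earlier steps.

```python
from typing import List

# Sections requested in the summary, taken from the prompt in this article.
PR_SUMMARY_SECTIONS: List[str] = [
    "problem",
    "solution",
    "files changed",
    "behavior impact",
    "tests",
    "risks",
    "rollback notes",
]


def build_pr_summary_prompt(
    plan: str, test_output: str, max_test_chars: int = 4000
) -> str:
    """Build the PR summary prompt from a plan and raw test output."""
    # Truncate long test logs so the prompt stays a manageable size.
    trimmed = test_output[:max_test_chars]
    sections = "\n".join(f"- {name}" for name in PR_SUMMARY_SECTIONS)
    return (
        "Based on the implementation plan and test results, "
        "generate a pull request summary.\n\n"
        f"Include:\n{sections}\n\n"
        f"Implementation plan:\n{plan}\n\n"
        f"Test results:\n{trimmed}\n"
    )


if __name__ == "__main__":
    print(build_pr_summary_prompt("Add idempotency check", "3 passed"))
```

The resulting string can be passed to the agent as its input in the same way as the task prompt in agent.py.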
The earlier run_tests tool only accepts named commands. That is not as flexible as a shell, but it is safer. You can extend it:

```python
# Extends tools.py: relies on its subprocess import and REPO_ROOT.
ALLOWED_TEST_TARGETS = {
    "billing": ["vendor/bin/phpunit", "tests/Feature/Billing"],
    "unit": ["vendor/bin/phpunit", "tests/Unit"],
    "frontend": ["npm", "test"],
}


def run_test_target(target: str) -> str:
    if target not in ALLOWED_TEST_TARGETS:
        raise ValueError(f"Unknown test target: {target}")

    result = subprocess.run(
        ALLOWED_TEST_TARGETS[target],
        cwd=REPO_ROOT,
        text=True,
        capture_output=True,
        timeout=180,
    )

    return result.stdout + "\n" + result.stderr
```

Then expose run_test_target as a tool. The model can choose "billing" or "unit", but it cannot run arbitrary commands like:

```
rm -rf /
```

That is the whole point.

A developer agent should be auditable. At minimum, log issue number, tools called, files read, tests run, approvals requested, final output, and errors. Simple example:

```python
import json
from datetime import datetime, timezone


def log_agent_event(event_type: str, payload: dict) -> None:
    record = {
        # Timezone-aware UTC timestamp (datetime.utcnow() is deprecated
        # as of Python 3.12).
        "time": datetime.now(timezone.utc).isoformat(),
        "event_type": event_type,
        "payload": payload,
    }

    # Append one JSON object per line so the log is easy to parse later.
    with open("agent.log", "a") as file:
        file.write(json.dumps(record) + "\n")
```

Use it inside your tools:

```python
def search_codebase(query: str, max_results: int = 20) -> list[str]:
    log_agent_event("search_codebase", {"query": query})
    ...
```

Logs turn agent behavior from mystery into inspectable workflow.

Do not give your first developer agent broad shell access. Do not let it push branches without approval. Do not let it merge pull requests. Do not let it edit migrations, auth, payment, or security files without a human checkpoint. Do not let it access production secrets. Do not let it decide whether its own work is safe.

Start with advisory behavior:

```
Read -> Analyze -> Plan -> Suggest Tests -> Summarize
```

Then slowly add:

```
Edit low-risk files -> Run tests -> Create draft PR
```

Then maybe:

```
Open PR with human approval
```

That progression is much safer.
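One way to make this progression explicit is to encode each stage as a tier with its own tool allowlist. This is a sketch under assumptions: the tier names are invented here, `create_draft_pr` and `open_pr` are hypothetical tool names for the later stages, and nothing in it is an Agents SDK feature. The first-tier names match the tools defined in agent.py.

```python
from enum import IntEnum
from typing import Dict, FrozenSet


class Tier(IntEnum):
    ADVISORY = 1          # read, analyze, plan, suggest tests, summarize
    LOW_RISK_EDITS = 2    # edit low-risk files, run tests, create draft PR
    PR_WITH_APPROVAL = 3  # open PR with human approval


# Tools newly unlocked at each tier. Names for tiers 2 and 3 are assumptions.
NEW_TOOLS: Dict[Tier, FrozenSet[str]] = {
    Tier.ADVISORY: frozenset(
        {"get_github_issue", "search_repo", "read_repo_file", "run_approved_tests"}
    ),
    Tier.LOW_RISK_EDITS: frozenset({"write_repo_file", "create_draft_pr"}),
    Tier.PR_WITH_APPROVAL: frozenset({"open_pr"}),
}


def allowed_tools(tier: Tier) -> FrozenSet[str]:
    """Return every tool unlocked at or below the given tier."""
    tools = set()
    for level in Tier:
        if level <= tier:
            tools |= NEW_TOOLS[level]
    return frozenset(tools)


def is_allowed(tool_name: str, tier: Tier) -> bool:
    """Check whether a tool may be exposed to the agent at this tier."""
    return tool_name in allowed_tools(tier)


if __name__ == "__main__":
    print(sorted(allowed_tools(Tier.ADVISORY)))
```

Moving the agent to the next stage then becomes a single, reviewable configuration change. Approval gates still apply to each write action inside a tier.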
For a real internal developer agent, you may eventually have:

```
Frontend UI
    |
Agent service
    |
Policy engine
    |
Tool layer
    |
GitHub API / Code search / CI / Docs / Ticket system
    |
Audit logs and traces
```

You can also split responsibilities by purpose: an issue triage agent, a codebase search agent, a test planning agent, a PR summary agent, and a documentation update agent. A single giant agent is harder to control. Small agents with clear tools are easier to trust.

Your first developer agent does not need to be autonomous. It needs to be useful. A useful first version can read an issue, inspect the code, explain likely behavior, suggest tests, create a safe implementation plan, and write a good PR summary. That alone can save real time.

The most important principle is simple: give the agent tools, but keep the boundaries. Agents become powerful when they can act. They become safe when your software controls how they act.

Start small. Log everything. Add approval gates. Treat the agent like a workflow executor, not a magical developer. That is how you build something your team can actually use.

Originally published at nazarboyko.com.