# I Built an AI Agent That Hunts Jobs Autonomously — Here's What Actually Worked

> Source: <https://dev.to/tushar_sangwan_25f0bd5499/i-built-an-ai-agent-that-hunts-jobs-autonomously-heres-what-actually-worked-48id>
> Published: 2026-06-27 15:47:39+00:00

Six months ago I started building an AI agent to automate job searching. Not a CV optimiser, not a job board aggregator — an agent that opens a browser, searches multiple job boards, reads each listing, scores it against a candidate's CV, and manages the whole pipeline without human input. This is the story of what broke, what surprised me, and what the architecture looks like now.

If you're working on AI job search automation or agentic systems in general, some of this will be familiar. Some of it will save you a week.

The initial prototype was embarrassingly naive. I fed a job description and a CV into a prompt and asked GPT to give a match score. It confidently returned 87% for a senior staff engineer role when the candidate had two years of experience. The problem was obvious in retrospect: LLMs are optimistic. Without hard constraints, they find reasons to match rather than reasons to reject.

Version two introduced structured scoring. I broke the evaluation into five weighted dimensions: skills overlap, seniority fit, location, salary band, and role type. Each dimension was scored independently before an aggregate was calculated. This alone cut false positives by around 60%.
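
The article does not publish its weights or scoring rules, so the following is only a sketch of how five independently scored dimensions might combine into one match score. The weights, the 0 to 1 score range, and the `HARD_FLOOR` reject rule are all assumptions. The floor reflects the reasons-to-reject-first approach argued for later in this post.

```typescript
// Sketch only: the five dimensions come from the article, but these weights,
// the [0, 1] score range, and HARD_FLOOR are illustrative assumptions.
type Dimension = "skills" | "seniority" | "location" | "salary" | "roleType";
type DimensionScores = Record<Dimension, number>;

const WEIGHTS: Record<Dimension, number> = {
  skills: 0.35,
  seniority: 0.25,
  location: 0.15,
  salary: 0.15,
  roleType: 0.1,
}; // sums to 1.0

// Any single dimension below this floor rejects the job outright, so a strong
// skills overlap cannot mask a disqualifying seniority gap.
const HARD_FLOOR = 0.3;

// Returns a match percentage from 0 to 100.
function aggregateScore(scores: DimensionScores): number {
  const dims = Object.keys(WEIGHTS) as Dimension[];
  for (const dim of dims) {
    const s = scores[dim];
    if (!(s >= 0 && s <= 1)) throw new RangeError(`${dim} score out of range: ${s}`);
    if (s < HARD_FLOOR) return 0; // reasons to reject are checked first
  }
  let total = 0;
  for (const dim of dims) total += WEIGHTS[dim] * scores[dim];
  return Math.round(total * 100);
}
```

With these made-up numbers, a candidate with two years of experience who scores 0.1 on seniority for a senior staff role gets 0, however well the skills line up. The first prototype produced 87% in that case.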

But the bigger problem wasn't scoring. It was data quality.

Job boards do not want to be scraped. They use dynamic rendering, honeypot fields, login walls, and CAPTCHAs. Naive `fetch` calls return skeleton HTML or bot detection pages.

The approach that actually worked was Playwright with human-like interaction patterns: Bezier curve mouse movements, randomised delays between keystrokes, realistic viewport sizes, and session cookies that persist across requests. A listings scan runs first (grabbing URLs and metadata), then a detail scrape runs on each URL. Staggering these requests matters. Hitting 50 job pages in 10 seconds will get you blocked. Spreading those same 50 across 8 minutes with variance in delay does not.
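
The following sketch shows one way the staggering and mouse-movement ideas above could be implemented. It is not the author's code. `jitteredDelays` spreads a batch of page visits over a time budget with random variance. `bezierPath` samples a cubic Bezier curve between two screen points, so the cursor bows instead of moving in a straight line. The function names, the ±50% jitter, and the control-point spread are illustrative assumptions. `rand` is injectable so the output can be made deterministic.

```typescript
type Point = { x: number; y: number };

// Split a time budget across `count` requests, jittering each gap by up to
// ±jitter (0.5 = ±50%, an assumed default) around the mean gap.
function jitteredDelays(
  count: number,
  totalMs: number,
  jitter = 0.5,
  rand: () => number = Math.random,
): number[] {
  const mean = totalMs / count;
  return Array.from({ length: count }, () => {
    const offset = (rand() * 2 - 1) * jitter; // uniform in [-jitter, +jitter)
    return Math.round(mean * (1 + offset));
  });
}

// Sample a cubic Bezier curve from `from` to `to` in `steps` segments. The two
// control points sit a quarter and three quarters along the straight line,
// each nudged by up to 15% of the distance on each axis.
function bezierPath(
  from: Point,
  to: Point,
  steps = 25,
  rand: () => number = Math.random,
): Point[] {
  const dx = to.x - from.x;
  const dy = to.y - from.y;
  const spread = Math.hypot(dx, dy) * 0.3;
  const nudge = () => (rand() - 0.5) * spread;
  const c1 = { x: from.x + dx * 0.25 + nudge(), y: from.y + dy * 0.25 + nudge() };
  const c2 = { x: from.x + dx * 0.75 + nudge(), y: from.y + dy * 0.75 + nudge() };

  const points: Point[] = [];
  for (let i = 0; i <= steps; i++) {
    const t = i / steps;
    const u = 1 - t;
    // B(t) = u^3*P0 + 3u^2t*C1 + 3ut^2*C2 + t^3*P3
    const a = u ** 3, b = 3 * u * u * t, c = 3 * u * t * t, d = t ** 3;
    points.push({
      x: a * from.x + b * c1.x + c * c2.x + d * to.x,
      y: a * from.y + b * c1.y + c * c2.y + d * to.y,
    });
  }
  return points;
}
```

For the 50-pages-in-8-minutes case, `jitteredDelays(50, 8 * 60_000)` targets an average gap of 9.6 seconds. Each point from `bezierPath` could be passed to Playwright's `page.mouse.move(x, y)`, and `page.waitForTimeout(ms)` can space out the page visits.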

The browser automation runs in a separate server process on `:3001`/`:3002`. The main Next.js app on `:3000` queues scrape jobs via BullMQ. This decoupling was essential: scraping is slow and stateful, and you do not want it blocking your API layer. A single failed Playwright instance should not take down your chat interface.

I run three processes simultaneously:

```
npm run dev             # Next.js :3000
npm run browser-server  # Browser automation :3001/:3002
npm run workers         # BullMQ job/gmail queues
```

Keeping them separate means I can restart the browser server mid-session without disrupting anything else.

Each job goes through a pipeline before the user ever sees it. One step deduplicates listings on `(company + title + location)` to avoid re-scoring the same job twice.

The scoring prompt is where most of the iteration happened. Early versions used a single long prompt with instructions, CV text, and job description all concatenated, and the context got messy. The fix was to cap CV text at 6,000 characters (most CVs are 3–5KB; the 12,000-character cap I started with was wasting roughly 1,500 tokens per turn) and to load the CV context only when the trigger is `cv_review`, not on every single turn.
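
Here is a minimal sketch of the dedupe step and the CV context cap described above. It uses a normalised `(company + title + location)` key and a context builder that attaches CV text only when the trigger is `cv_review`, truncated to 6,000 characters. The `Listing` shape, the other trigger names, the normalisation rules, and the `<cv>` wrapper are hypothetical.

```typescript
// Hypothetical shapes: only `cv_review`, the dedupe fields, and the 6,000-char
// cap come from the article; the other trigger names are made up.
type Listing = { company: string; title: string; location: string };
type Trigger = "cv_review" | "job_search" | "chat";

const CV_CHAR_CAP = 6000;

// Normalise case and whitespace so " Acme Ltd" and "acme  ltd" collide.
function dedupeKey(l: Listing): string {
  const norm = (s: string) => s.trim().toLowerCase().replace(/\s+/g, " ");
  return [l.company, l.title, l.location].map(norm).join("|");
}

// Keep only listings whose key has not been seen before; records new keys.
function filterUnseen(listings: Listing[], seen: Set<string>): Listing[] {
  const fresh: Listing[] = [];
  for (const l of listings) {
    const key = dedupeKey(l);
    if (seen.has(key)) continue;
    seen.add(key);
    fresh.push(l);
  }
  return fresh;
}

// Attach CV text only on cv_review turns, truncated to the cap.
function buildContext(trigger: Trigger, base: string, cvText: string): string {
  if (trigger !== "cv_review") return base;
  return `${base}\n\n<cv>\n${cvText.slice(0, CV_CHAR_CAP)}\n</cv>`;
}
```

The same listing can appear on several boards with small differences in casing and spacing. Normalising before comparing keys helps those copies collide.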

At 1,000 concurrent users doing 30 turns per day, every kilobyte cut from the prompt saves millions of tokens daily. Prompt efficiency is not a nice-to-have at scale — it is the cost model.
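
The arithmetic behind this scale claim is easy to check. The sketch below assumes roughly four characters per token, a common rule of thumb for English text (real counts depend on the model's tokenizer). It also treats a kilobyte as 1,000 characters.

```typescript
// Back-of-envelope only: ~4 characters per token is a rule of thumb for
// English text, not an exact tokenizer count.
const CHARS_PER_TOKEN = 4;

function dailyTokensSaved(charsCut: number, users: number, turnsPerUser: number): number {
  const tokensPerTurn = charsCut / CHARS_PER_TOKEN;
  return tokensPerTurn * users * turnsPerUser;
}

// One kilobyte (taken as 1,000 characters) at 1,000 users x 30 turns per day
console.log(dailyTokensSaved(1000, 1000, 30)); // 7500000

// Lowering the CV cap from 12,000 to 6,000 characters, as an upper bound
// where every turn carried 6,000 fewer characters
console.log(dailyTokensSaved(12000 - 6000, 1000, 30)); // 45000000
```

At four characters per token, 6,000 characters is 1,500 tokens, in line with the per-turn figure quoted for the CV cap.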

**Failure 1: The hallucination pipeline.** An early version of the `get_pipeline` tool returned results from the LLM's text output rather than the actual database query. Users saw jobs that did not exist. The fix was simple but embarrassing: always render the real tool result, and never let the LLM narrate its own version of structured data.
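
One way to enforce this fix is to build the UI's pipeline view only from structured tool results. The types and the `pipelineView` function below are a hypothetical sketch; only the `get_pipeline` name comes from the article. The model's free-text reply can still be shown as commentary, but the job list is never parsed out of it.

```typescript
// Hypothetical shapes for one agent turn. The pipeline view is built only from
// `toolResults`, never from `assistantText`.
type PipelineJob = { id: string; company: string; title: string; stage: string };

type ToolResult =
  | { tool: "get_pipeline"; ok: true; rows: PipelineJob[] }
  | { tool: "get_pipeline"; ok: false; error: string };

interface AgentTurn {
  assistantText: string; // free-form model output: display as commentary only
  toolResults: ToolResult[]; // what the database actually returned
}

type PipelineView =
  | { kind: "rows"; rows: PipelineJob[] }
  | { kind: "error"; message: string }
  | { kind: "none" };

function pipelineView(turn: AgentTurn): PipelineView {
  const result = turn.toolResults.find((r) => r.tool === "get_pipeline");
  if (!result) return { kind: "none" }; // no tool call: render nothing, not the model's guess
  if (!result.ok) return { kind: "error", message: result.error };
  return { kind: "rows", rows: result.rows };
}
```

If the model never called the tool, the view renders nothing rather than whatever jobs the model described in prose.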

**Failure 2: Uncapped memory files.** The agent writes to per-user memory files (context notes, long-term memory, evidence logs). Without caps, these grew without bound. After three weeks of testing, some files had thousands of lines. I capped them at 50–100 entries with a sliding window. The files are per-user, namespaced under `agents/atlas/users/{userId}/`, so there is no cross-user leakage. But unbounded growth will kill performance quietly.
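
A sliding-window cap like this one can be as small as the helpers below. This is a sketch, not the article's code. The `agents/atlas/users/{userId}/` prefix comes from the article. The 100-entry default (the top of the 50–100 range) and the `userId` character check are assumptions. Per-user namespacing only prevents leakage if a crafted ID cannot climb out of its directory, which is why the check is there.

```typescript
// The path prefix mirrors the article; the 100-entry default and the userId
// character check are assumptions for this sketch.
const MAX_ENTRIES = 100;

// Reject IDs that could escape the per-user directory (e.g. "../other").
function memoryPath(userId: string, file: string): string {
  if (!/^[A-Za-z0-9_-]+$/.test(userId)) throw new Error(`unsafe userId: ${userId}`);
  return `agents/atlas/users/${userId}/${file}`;
}

// Append an entry and keep only the newest `max` (a sliding window).
function appendCapped<T>(entries: readonly T[], entry: T, max = MAX_ENTRIES): T[] {
  const next = [...entries, entry];
  return next.length > max ? next.slice(next.length - max) : next;
}
```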

**Failure 3: Synchronous Vertex AI calls in request handlers.** Early API routes called Vertex AI directly and blocked the response. A slow model response (sometimes 8–12 seconds) locked the handler. Everything expensive now goes through BullMQ. The API enqueues, the worker processes, the result lands in the DB, and the client polls. Boring but correct.
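
The following sketch shows this enqueue, process, store, and poll flow using in-memory stand-ins so it runs on its own. An array plays the role of the BullMQ queue, and a `Map` plays the database. None of these names are BullMQ's API. In the real setup, BullMQ's `Queue#add` would replace `enqueueScoring`, and a `Worker` in the separate workers process would replace `drainQueue`.

```typescript
// In-memory stand-ins so the flow runs on its own: `queue` plays BullMQ and
// `results` plays the database. These are not BullMQ's API.
type Status = "queued" | "done" | "failed";
interface ScoreJob { id: string; status: Status; output?: string; error?: string }
interface QueueItem { id: string; prompt: string }

const queue: QueueItem[] = [];
const results = new Map<string, ScoreJob>();
let counter = 0;

// API route: record the job as queued, enqueue it, and return the id at once.
function enqueueScoring(prompt: string): string {
  const id = `job-${++counter}`;
  results.set(id, { id, status: "queued" });
  queue.push({ id, prompt });
  return id;
}

// Worker process: drain the queue, await the slow model, store the outcome.
async function drainQueue(callModel: (prompt: string) => Promise<string>): Promise<void> {
  while (queue.length > 0) {
    const item = queue.shift() as QueueItem; // the length check guarantees an item
    try {
      const output = await callModel(item.prompt);
      results.set(item.id, { id: item.id, status: "done", output });
    } catch (err) {
      results.set(item.id, { id: item.id, status: "failed", error: String(err) });
    }
  }
}

// Polling endpoint: a cheap lookup that never waits on the model.
function pollStatus(id: string): ScoreJob | undefined {
  return results.get(id);
}
```

The API route only makes the cheap `enqueueScoring` and `pollStatus` calls. An 8 to 12 second model response therefore ties up the worker, not the request handler.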

You can read more about how the agent handles these edge cases in the [Atlas Job OS guide on AI job application automation](https://atlasjob.tech/guides/ai-job-application-bot-uk) — I went into more depth on the bot detection side there.

If you are building something similar, here is what I would tell you to do differently from the start:

**Separate your browser automation from your API layer immediately.** Do not run Playwright inside your Next.js API routes. It will cause you pain.

**Pessimistic scoring beats optimistic scoring.** Build in reasons to reject first. A high false-positive rate destroys user trust faster than a conservative match rate does.

**Cap all LLM context aggressively.** Every token in the prompt is a cost that compounds with users and turns. Audit your prompts with real numbers, not vibes.

**Redis for rate limiting and per-user state.** Every user action that mutates anything should go through a rate limiter. Every Redis key should be namespaced `{scope}:{userId}:{key}` with a TTL. No exceptions.

**Human-like browser behaviour is not optional.** If you are scraping job boards at scale, you are in an arms race with detection systems. Bezier mouse curves and randomised delays are the minimum viable approach.
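
Here is a sketch of the Redis recommendation above. It assumes a client with `incr` and `expire` methods (ioredis exposes both). An in-memory `FakeStore` stands in for Redis so the code runs without a server. `allowAction` is a simple fixed-window limiter, one common choice, and not necessarily the one the article uses. The `ratelimit` scope name is hypothetical.

```typescript
// Any client with these two methods fits (ioredis exposes both); FakeStore
// below stands in for Redis so the sketch runs without a server.
interface CounterStore {
  incr(key: string): Promise<number>;
  expire(key: string, seconds: number): Promise<unknown>;
}

// Keys follow the article's `{scope}:{userId}:{key}` convention.
function redisKey(scope: string, userId: string, key: string): string {
  if (scope.includes(":") || userId.includes(":")) {
    throw new Error("scope and userId must not contain ':'");
  }
  return `${scope}:${userId}:${key}`;
}

// Fixed-window limiter: at most `limit` calls per `windowSec` per user and action.
async function allowAction(
  store: CounterStore,
  userId: string,
  action: string,
  limit: number,
  windowSec: number,
): Promise<boolean> {
  const key = redisKey("ratelimit", userId, action);
  const count = await store.incr(key);
  if (count === 1) await store.expire(key, windowSec); // first hit opens the window and sets the TTL
  return count <= limit;
}

// Test double: counts in a Map and records TTLs, but never actually expires keys.
class FakeStore implements CounterStore {
  counts = new Map<string, number>();
  ttls = new Map<string, number>();
  async incr(key: string): Promise<number> {
    const n = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, n);
    return n;
  }
  async expire(key: string, seconds: number): Promise<number> {
    this.ttls.set(key, seconds);
    return 1;
  }
}
```

`INCR` followed by `EXPIRE` is not atomic. If the process dies between the two calls, the key is left without a TTL. Running both in a single Lua script (via `EVAL`) closes that gap.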

The tool I built for this — [Atlas Job OS](https://atlasjob.tech) — is a live SaaS product now. It is not finished. The scoring is still not where I want it, outreach automation is in progress, and the CV generation is rough. But the core loop works: search, score, pipeline, outreach. Building it taught me more about agentic system design than any tutorial I read.

The hardest part was not the AI. It was the plumbing.

Tushar is the founder of [Atlas Job OS](https://atlasjob.tech), an AI agent platform for autonomous job searching, CV scoring, and application management. He builds in public and writes about agentic systems, LLM architecture, and the realities of scaling AI products. You can reach him at [team@atlasjob.tech](mailto:team@atlasjob.tech).
