{"slug": "polling-agents-in-ai-assistants-11-implementation-patterns", "title": "Polling Agents in AI Assistants: 11 Implementation Patterns", "summary": "A new guide outlines 11 implementation patterns for polling agents in AI assistants, enabling proactive background monitoring of sources like inboxes, task lists, and GitHub issues. The patterns emphasize using deterministic infrastructure for time and state management while reserving language models for semantic interpretation. This approach transforms AI assistants from reactive chatbots into proactive background processes that act on users' behalf.", "body_md": "# Polling Agents in AI Assistants: 11 Implementation Patterns\n\nReliable polling patterns for AI agents.\n\nPolling agents are one of the least glamorous parts of AI assistant architecture, but they are also one of the most useful.\n\nA normal chat assistant waits for the user to ask something. A polling agent keeps watching. It checks a source, notices changes, decides whether anything matters, and then acts. That action may be a notification, a summary, a draft, a tool call, or a full workflow.\n\nThis is how an assistant moves from “answer my question” to “keep an eye on this for me.” Instead of being reactive, it becomes a background process that notices things on the user’s behalf and acts when conditions are met.\n\nThe important design point is simple: do not make the language model responsible for time, state, retries, or locking. Use normal backend infrastructure for that. Use the model where it is valuable: interpreting messy context, making semantic judgments, and producing useful language.\n\n## What Is a Polling Agent?\n\nA polling agent is a background process that repeatedly checks a source and triggers an assistant action when a condition is met. 
In the broader [AI Systems](https://www.glukhov.org/ai-systems/) stack, where the assistant combines an LLM, memory, tooling, routing, and observability, the polling layer is what makes the assistant proactive rather than purely reactive. For the full five-layer picture, see [AI Assistant Architecture: LLM, Memory, Tools, Routing, Observability](https://www.glukhov.org/ai-systems/architecture/ai-assistant-architecture/).\n\nExamples:\n\n- Check an inbox every morning and summarize important messages.\n- Watch a Notion task list and execute the next todo item.\n- Monitor a GitHub issue until it changes status.\n- Poll a long-running AI job until the result is ready.\n- Check a booking slot until one becomes available.\n- Watch a supplier portal until a document appears.\n- Scan new research papers once per week and summarize relevant ones.\n\nA practical polling agent has five responsibilities:\n\n- Wake up at the right time.\n- Read from the source.\n- Remember what it has already seen.\n- Decide whether the new state matters.\n- Act once, safely, without repeating itself.\n\nA typical production flow looks like this:\n\n```\nscheduler\n  -> polling worker\n  -> source system\n  -> state store\n  -> deterministic filters\n  -> optional LLM evaluation\n  -> assistant action\n```\n\nThis structure is boring in the best possible way. Boring systems are easier to debug at 2 AM.\n\n## The State Every Polling Agent Needs\n\nPolling agents need durable state. Conversation history is not enough. 
The assistant may remember the conversation, but the system needs a reliable operational record.\n\nA good polling state record usually contains:\n\n```\n{\n  \"poll_id\": \"poll_123\",\n  \"user_id\": \"user_456\",\n  \"source_type\": \"notion\",\n  \"source_ref\": \"database_tasks\",\n  \"condition\": \"take one task in Todo state and execute it\",\n  \"interval_seconds\": 600,\n  \"last_run_at\": \"2026-06-19T01:00:00Z\",\n  \"next_run_at\": \"2026-06-19T01:10:00Z\",\n  \"last_seen_cursor\": \"cursor_or_timestamp\",\n  \"last_result_hash\": \"b64e8a...\",\n  \"failure_count\": 0,\n  \"status\": \"active\"\n}\n```\n\nThe exact schema depends on the source, but most systems need these concepts.\n\n### Poll Definition\n\nThis describes what the agent is watching and why.\n\n```\npoll_id\nuser_id\nworkspace_id\nsource_type\nsource_ref\ncondition_text\npriority\nstatus\n```\n\nFor example:\n\n```\nsource_type: notion\nsource_ref: Tasks database\ncondition_text: Find one Todo task, claim it, execute it, mark it Complete.\n```\n\n### Schedule\n\nThis describes when the agent should run.\n\n```\ninterval_seconds\ncron_expression\ntimezone\nlast_run_at\nnext_run_at\njitter\n```\n\nFor a Hermes agent that checks Notion every 10 minutes:\n\n```\ninterval_seconds: 600\ntimezone: Australia/Melbourne\n```\n\n### Cursor or Snapshot\n\nThis helps the agent avoid reprocessing the same data.\n\nDepending on the source, this may be:\n\n```\nlast_seen_id\nlast_seen_timestamp\napi_cursor\netag\nversion\ncontent_hash\n```\n\nFor a Notion task queue, the cursor may be less important than task status and claim fields. 
For Gmail, GitHub, or a sync API, the cursor is usually critical.\n\n### Claim or Lease\n\nThis prevents two workers from taking the same job.\n\n```\nclaimed_by\nclaimed_at\nclaim_expires_at\nrun_id\n```\n\nFor example, a Notion task can be changed from:\n\n```\nStatus: Todo\n```\n\nto:\n\n```\nStatus: InProgress\nClaimedBy: hermes\nClaimedAt: 2026-06-19T01:00:00Z\nClaimExpiresAt: 2026-06-19T01:30:00Z\nRunId: run_789\n```\n\nThis is the difference between “I hope only one worker picks it” and “the system has a claim protocol.”\n\n### Execution Record\n\nThis records what happened during a run.\n\n```\nrun_id\npoll_id\nsource_object_id\nstarted_at\nfinished_at\nstatus\nitems_checked\nitems_changed\ndecision_summary\nerror\n```\n\nThe execution record should live in the assistant backend, not only in Notion or another external tool. Notion is good for human visibility. It is not ideal as your only execution log.\n\n### Dedupe Record\n\nThis prevents duplicate notifications or repeated actions.\n\n```\ndedupe_key\npoll_id\nsource_object_id\ncondition_version\naction_type\ndelivered_at\n```\n\nFor example:\n\n```\nuser_456:poll_123:notion_page_999:execute:v1\n```\n\nIf the same action is attempted again, the system can suppress it.\n\n## Method 1: Scheduled Polling Worker\n\nThis is the simplest reliable pattern.\n\nA scheduler wakes up at a fixed interval and calls a worker. The worker reads the source, updates state, and triggers an assistant action if required.\n\n```\nscheduler\n  -> worker\n  -> source API\n  -> database\n  -> assistant action\n```\n\n### How It Runs\n\nThe scheduler is responsible for time. It might be cron, a cloud scheduler, a Kubernetes CronJob, or a small internal scheduler.\n\nEvery interval, it starts a worker run. The worker loads its configuration, queries the target source, compares the result with stored state, and acts if needed.\n\nFor a simple assistant, this is often enough. 
A single scheduler and a lightweight worker process can handle dozens of daily checks without requiring queues, leases, or distributed coordination.\n\n### State Model\n\nThe scheduler stores very little. Usually it only knows when to trigger a job.\n\nThe application database stores the important state:\n\n```\npoll definition\nschedule\ncursor or snapshot\nlast run time\nfailure count\nstatus\n```\n\nThe worker should be stateless. It can hold temporary data while running, but the durable truth belongs in the database.\n\n### Example Flow\n\n```\nEvery 10 minutes:\n  trigger Hermes polling worker\n\nWorker:\n  load active poll configuration\n  query source\n  compare with previous state\n  run deterministic checks\n  call LLM only if needed\n  update state\n  emit assistant event\n```\n\n### Best Fit\n\nUse scheduled polling workers for:\n\n- Daily summaries.\n- Hourly checks.\n- Small internal automations.\n- Simple “watch this” tasks.\n- Low to medium volume assistant jobs.\n\n### Weaknesses\n\nScheduled polling is easy to understand, but it can become fragile at scale. If many polls run at the same time, you may overload your workers or hit provider rate limits. Retries can also become messy if the scheduler directly starts the work.\n\n## Method 2: Queue-Based Polling Workers\n\nQueue-based polling is usually the best default for production AI assistants.\n\nThe scheduler does not execute the poll directly. It puts a job on a queue. Worker processes consume jobs from the queue.\n\n```\nscheduler\n  -> queue\n  -> worker pool\n  -> source API\n  -> state store\n  -> assistant action\n```\n\n### How It Runs\n\nA scheduler scans for due polls and enqueues jobs. Workers pull jobs when they have capacity.\n\nThis gives you backpressure. 
If the system is busy, jobs wait in the queue instead of overwhelming the source API or the LLM provider.\n\n### State Model\n\nThe database stores the poll state:\n\n```\npoll_id\nuser_id\nsource_ref\ncondition_text\nnext_run_at\ncursor\nstatus\nfailure_count\n```\n\nThe queue message should stay small:\n\n```\n{\n  \"poll_id\": \"poll_123\",\n  \"scheduled_for\": \"2026-06-19T01:10:00Z\",\n  \"attempt\": 1\n}\n```\n\nThe worker loads the full state from the database when it starts.\n\n### Example Flow\n\n```\nEvery minute:\n  scheduler finds polls where next_run_at <= now\n  scheduler enqueues jobs\n\nWorkers:\n  pull jobs from queue\n  lock or lease the poll\n  query the source\n  update state\n  emit assistant action if needed\n  set next_run_at\n```\n\n### Best Fit\n\nUse queue-based polling for:\n\n- Multi-user AI assistants.\n- Many simultaneous polls.\n- Integrations with rate limits.\n- Retriable background work.\n- Jobs that may take different amounts of time.\n- SaaS products where reliability matters.\n\n### Weaknesses\n\nQueues add infrastructure. You need dead letter handling, idempotency, visibility timeouts, and retry policies. This is worth it for production systems, but probably excessive for a small prototype.\n\n## Method 3: External Tool as a Task Queue\n\nThis is the pattern in the Notion plus Hermes example.\n\nThe external tool is not just a data source. It becomes the human-facing task queue. The agent periodically checks the tool, claims one task, executes it, and updates the task status.\n\n```\nscheduler\n  -> Hermes worker\n  -> Notion database\n  -> claim one task\n  -> execute task\n  -> update Notion status\n```\n\n### How It Runs\n\nEvery 10 minutes, Hermes queries the Notion database for one task in `Todo` state. It chooses the next task, usually by priority and creation time. Then it claims the task by setting it to `InProgress`.\n\nAfter that, Hermes executes the task. 
If execution succeeds, it marks the task as `Complete`. If execution fails, it marks the task as `Failed` or returns it to `Todo` with a retry count.\n\n### State Model\n\nNotion stores the human-facing task state:\n\n```\nTitle\nDescription\nStatus: Todo | InProgress | Complete | Failed\nPriority\nCreatedAt\nClaimedBy\nClaimedAt\nClaimExpiresAt\nRunId\nRetryCount\nLastError\nCompletedAt\n```\n\nThe Hermes backend stores the operational execution state:\n\n```\nrun_id\nnotion_page_id\nstarted_at\nfinished_at\nexecution_status\ntool_calls\nLLM trace\nerror details\nidempotency_key\n```\n\nThis split matters. Notion is excellent for visibility and manual editing. The Hermes backend is better for logs, retries, dedupe, and audit history.\n\n### Example Flow\n\n```\nEvery 10 minutes:\n  Hermes wakes up\n\nHermes:\n  query Notion for one task where Status = Todo\n  sort by Priority, CreatedAt\n  update selected task to InProgress\n  set ClaimedBy, ClaimedAt, ClaimExpiresAt, RunId\n  execute the task\n  write execution log\n  set task to Complete or Failed\n```\n\n### Best Fit\n\nUse this pattern when:\n\n- Humans already manage work in Notion, Jira, Linear, Trello, or another tool.\n- You want the assistant to process visible tasks.\n- The task board is the user interface.\n- You need a simple human-in-the-loop automation model.\n\n### Weaknesses\n\nExternal tools are rarely perfect queues. Atomic claims may be limited. Query consistency may lag. Rate limits may apply. If the agent can run in multiple instances, you need a careful claim or lease strategy.\n\nThe practical recommendation is to use Notion as the human-facing task inbox while keeping all execution logs, retry records, traces, and idempotency keys in Hermes. Notion gives users visibility; Hermes keeps the system reliable. 
For the dispatcher and concurrency mechanics that sit behind this pattern in Hermes, see [Kanban in Hermes Agent for Self Hosted LLM Workflows](https://www.glukhov.org/ai-systems/hermes/kanban-in-hermes/).\n\n## Method 4: Long-Running Worker Loop\n\nA long-running loop is the simplest implementation.\n\n```\nimport time\n\n# db and run_poll are placeholders for your own data layer and poll handler.\nwhile True:\n    due_polls = db.find_due_polls()  # polls whose next_run_at has passed\n    for poll in due_polls:\n        run_poll(poll)\n    time.sleep(30)  # pause before the next scan\n```\n\nThis pattern combines scheduling and execution in one service, which makes it the simplest possible starting point for background agent work.\n\n### How It Runs\n\nThe worker process runs continuously. Every few seconds or minutes, it checks the database for due polls and executes them. It is easy to build, easy to reason about, and fast to iterate on during development.\n\n### State Model\n\nThe database still stores durable state:\n\n```\npoll configuration\nnext_run_at\ncursor\nlast result\nfailure count\nstatus\n```\n\nThe process memory should only contain temporary state:\n\n```\ncurrent batch\nshort-lived cache\nin-flight run\n```\n\nNever store important progress only in memory. If the process crashes, any state that was not written to durable storage is gone, and the next run will have no way to know where things left off.\n\n### Best Fit\n\nUse long-running loops for:\n\n- Prototypes.\n- Local development.\n- Internal tools.\n- Single-tenant systems.\n- Low-volume agents.\n\n### Weaknesses\n\nThis pattern becomes risky with multiple replicas. Without leases, two workers may run the same poll. It also lacks the operational features of a real queue or workflow engine.\n\nA long-running loop is not wrong as a starting point, but it is not a distributed scheduler and should not be treated as one. As soon as you need multiple replicas or stronger reliability guarantees, you will need to move to one of the more structured patterns in this guide.\n\n## Method 5: Webhook-First With Polling Fallback\n\nIf the source supports webhooks, use them. 
Polling should often be the backup, not the primary mechanism.\n\n```\nexternal system\n  -> webhook endpoint\n  -> event store\n  -> assistant action\n\nreconciliation poll\n  -> source API\n  -> compare with event store\n  -> repair missed events\n```\n\n### How It Runs\n\nThe external system sends events to your webhook endpoint when something changes. Your system stores the event and processes it asynchronously.\n\nA slower reconciliation poll runs every few hours or once per day. It checks whether any events were missed.\n\n### State Model\n\nThe event store records incoming webhooks:\n\n```\nevent_id\nsource_type\nsource_object_id\nevent_type\nreceived_at\npayload_hash\nprocessed_at\nsignature_valid\n```\n\nThe reconciliation poll stores:\n\n```\nlast_reconciliation_at\nlast_seen_cursor\nlast_seen_version\n```\n\nThe source object table stores the latest known state:\n\n```\nexternal_id\ncurrent_status\nexternal_updated_at\nlast_processed_event_id\n```\n\n### Best Fit\n\nUse a webhook-first architecture for:\n\n- GitHub events.\n- Stripe events.\n- Slack events.\n- CRM updates.\n- Deployment notifications.\n- Ticketing systems.\n\n### Weaknesses\n\nWebhooks require a public endpoint, signature validation, replay protection, and event dedupe. Some providers also send incomplete events, so you may still need to fetch the full object.\n\nEven so, if good webhooks exist, polling every minute is usually wasteful.\n\n## Method 6: Provider-Side Background Job Polling\n\nSometimes the thing being polled is the AI job itself.\n\nThe application starts a long-running provider job, stores the job ID, and checks later whether it has completed.\n\n```\napp\n  -> start AI background job\n  -> store provider job id\n  -> poll status\n  -> fetch result\n  -> notify user\n```\n\n### How It Runs\n\nThe assistant starts a job with the provider. The provider returns an ID. 
Your backend stores that ID and checks its status until the job succeeds, fails, expires, or times out.\n\n### State Model\n\nYour backend stores:\n\n```\nassistant_task_id\nprovider_job_id\nuser_id\nstatus\ncreated_at\nlast_checked_at\nexpires_at\nresult_ref\n```\n\nThe provider stores the temporary job state and output.\n\nIf the output matters, copy it into your own durable storage as soon as the job completes. Provider-side result storage often has a short retention window and is not a substitute for a proper archive in your own system.\n\n### Best Fit\n\nUse provider-side background job polling for:\n\n- Long AI research tasks.\n- Large document processing.\n- Codebase analysis.\n- Report generation.\n- Data extraction jobs.\n- Tasks that exceed normal HTTP request timeouts.\n\n### Weaknesses\n\nThis pattern solves one problem: waiting for a long provider job. It does not replace your workflow engine, scheduler, queue, or business state store.\n\n## Method 7: Durable Workflow Engine\n\nA durable workflow engine manages long-running execution, timers, retries, and recovery. Temporal is a common choice for Go- and Python-based assistant backends; for a full implementation guide see [Implementing Workflow Applications with Temporal in Go](https://www.glukhov.org/app-architecture/integration-patterns/workflow-applications-temporal-in-go/).\n\nInstead of manually wiring every wait and retry, you model the process as a workflow.\n\n```\nworkflow engine\n  -> activity: check source\n  -> timer: wait\n  -> activity: evaluate result\n  -> activity: notify user\n```\n\n### How It Runs\n\nThe workflow starts once and then controls its own waiting. It can sleep for minutes, days, or weeks. 
If the worker process crashes, the workflow engine can resume from the recorded state.\n\n### State Model\n\nThe workflow engine stores:\n\n```\nworkflow_id\nexecution history\ntimer state\nactivity attempts\nretry policy\ncurrent workflow state\n```\n\nYour application database stores:\n\n```\nuser-facing poll definition\nauthorization references\nbusiness records\nnotification records\n```\n\nThe workflow engine owns process state: execution history, timers, retries, and activity attempts. Your database owns business state: user configurations, authorization records, notifications, and audit logs. Keeping these separate prevents either layer from becoming a confused hybrid of the two.\n\n### Best Fit\n\nUse durable workflows for:\n\n- Multi-step business processes.\n- Long-running automations.\n- Human approval flows.\n- Reliable retries.\n- Auditable background work.\n- Processes that must resume after failure.\n\n### Weaknesses\n\nWorkflow engines add concepts and infrastructure. They are excellent when the process is important, but heavy for simple hourly checks.\n\n## Method 8: Persistent Agent Runtime\n\nSome agent frameworks can persist agent state, checkpoint execution, and resume later.\n\nThis is useful when the agent itself has a multi-step reasoning process.\n\n```\nscheduler or workflow\n  -> agent runtime\n  -> load checkpoint\n  -> call tools\n  -> save checkpoint\n  -> resume later\n```\n\n### How It Runs\n\nAn external scheduler or workflow starts the agent. The agent runtime loads previous state, runs the next step, calls tools if needed, and writes a checkpoint.\n\nThe agent runtime should not be your only scheduler. 
It is better treated as the reasoning layer inside a larger backend architecture.\n\n### State Model\n\nAgent checkpoint storage contains:\n\n```\ncurrent node\nmessages\ntool outputs\nintermediate reasoning state\npending action\n```\n\nLong-term memory contains:\n\n```\nstable user preferences\nfacts\nproject context\nsource references\n```\n\nOperational state still belongs elsewhere:\n\n```\npoll schedule\ncursor\nstatus\nretry count\ndedupe records\n```\n\nA useful rule: memory is not a cursor, and a checkpoint is not a queue. Agent memory stores what the model knows; operational state tracks where the process is and what it has done. Conflating the two leads to subtle bugs that only appear under concurrency or after a restart. The full design space for working memory, durable state, and retrieval layers is covered in [Memory Systems in AI Assistants](https://www.glukhov.org/ai-systems/memory/memory-systems-in-ai-assistants/).\n\n### Best Fit\n\nUse a persistent agent runtime for:\n\n- Multi-step research.\n- Agents that pause and resume.\n- Human-in-the-loop work.\n- Tool-heavy reasoning.\n- Tasks where context accumulates over time.\n\n### Weaknesses\n\nAgent persistence is not the same as operational reliability. You still need scheduling, locking, retries, rate limits, and audit logs.\n\n## Method 9: Database Sync Plus Change Evaluation\n\nIn this pattern, polling is used to sync external data into your own database. The assistant then reacts to local database changes rather than querying external APIs directly on every evaluation cycle.\n\n```\nsync poller\n  -> external API\n  -> local database\n  -> change evaluator\n  -> assistant action\n```\n\nThis separates data synchronization from assistant intelligence. The sync worker is responsible for keeping local records current; the evaluator is responsible for deciding what to do about changes. 
Each layer can be tested, monitored, and scaled independently.\n\n### How It Runs\n\nThe sync worker periodically fetches external changes and writes normalized records into your database. A second worker or change stream detects updated rows and decides whether the assistant should act.\n\n### State Model\n\nThe sync table stores:\n\n```\nexternal_id\nsource_type\nraw_payload\nnormalized_fields\nexternal_updated_at\nsynced_at\nversion\ncontent_hash\n```\n\nThe sync state stores:\n\n```\nsource_cursor\nlast_sync_at\nrate_limit_status\nfailure_count\n```\n\nThe assistant evaluation table stores:\n\n```\nobject_id\nevaluation_status\nlast_evaluated_hash\ndecision\nnotification_id\n```\n\n### Best Fit\n\nUse this pattern for:\n\n- CRM sync.\n- Ticketing systems.\n- Accounting documents.\n- Product inventory.\n- Compliance review.\n- Search indexing.\n- Internal dashboards.\n\n### Weaknesses\n\nSyncing everything can be expensive and unnecessary. It may also create privacy and retention obligations. Use this pattern when local data has value beyond a single assistant action.\n\n## Method 10: Adaptive Polling\n\nAdaptive polling changes frequency based on state, urgency, or recent activity.\n\n```\nactive object: poll every 1 minute\nwaiting object: poll every 1 hour\nstale object: poll once per day\ncompleted object: stop polling\n```\n\n### How It Runs\n\nAfter each run, the worker decides when the next run should happen.\n\nIf the object changed recently, poll sooner. If nothing has changed for a long time, slow down. 
If the task is complete, stop.\n\n### State Model\n\nThe poll state includes:\n\n```\ncurrent_interval\nminimum_interval\nmaximum_interval\nbackoff_policy\nlast_activity_at\npriority\nstop_condition\n```\n\nThe source snapshot includes:\n\n```\nstatus\nupdated_at\nactivity_level\nexpected_next_change\n```\n\n### Best Fit\n\nUse adaptive polling for:\n\n- Deployment status.\n- Delivery tracking.\n- Calendar slot availability.\n- Price monitoring.\n- Build jobs.\n- Long-running provider tasks.\n- Any source with bursty updates.\n\n### Weaknesses\n\nAdaptive polling can be harder to reason about. If a task must run at a strict time, keep it strict. Do not make compliance jobs clever.\n\n## Method 11: Semantic Polling With an LLM Evaluator\n\nSemantic polling is used when the condition is fuzzy.\n\nCode can answer:\n\n```\nIs status equal to Complete?\nIs price below 100?\nIs there a new message?\n```\n\nAn LLM can help answer:\n\n```\nDoes this email sound urgent?\nIs this customer likely unhappy?\nIs this research paper relevant?\nDoes this change require my attention?\n```\n\n### How It Runs\n\nThe worker first applies cheap deterministic filters. 
Only candidate items go to the LLM.\n\n```\nnew item?\nmatches source filters?\nnot already processed?\nnot obviously irrelevant?\n```\n\nThen the LLM evaluates the smaller candidate set and returns structured output.\n\n```\n{\n  \"should_notify\": true,\n  \"urgency\": \"high\",\n  \"reason\": \"The customer reports a production outage.\"\n}\n```\n\n### State Model\n\nThe poll definition stores:\n\n```\nsemantic_condition\nexamples\nnegative_examples\nuser_preference_summary\nmodel_config\n```\n\nThe evaluation log stores:\n\n```\ninput_reference\nmodel\nprompt_version\nstructured_output\nconfidence\ncost\nlatency\n```\n\nThe poll state stores:\n\n```\nlast_seen_ids\nlast_evaluated_hashes\nlast_decision\nlast_decision_reason\n```\n\n### Best Fit\n\nUse semantic polling for:\n\n- Important email detection.\n- Customer sentiment monitoring.\n- Research alerts.\n- Sales opportunity detection.\n- Security triage.\n- Executive briefings.\n\n### Weaknesses\n\nLLM calls cost money and add latency. They can also be inconsistent if prompts and schemas are loose. Use deterministic filters first. 
Ask the model only when judgment is actually needed.\n\n## Decision Table: Choosing a Polling Agent Method\n\n| Method | Best Application | Pros | Cons |\n|---|---|---|---|\n| Scheduled polling worker | Simple recurring assistant tasks | Easy to build, easy to debug, minimal infrastructure | Limited scaling, basic retries, can overload workers if many polls fire together |\n| Queue-based polling workers | Production SaaS assistants with many users | Scalable, resilient, supports retries and backpressure | Requires queue infrastructure, idempotency, dead letter handling |\n| External tool as task queue | Notion, Jira, Linear, Trello based task execution | Human-friendly, easy to inspect, works with existing workflows | External tools are not perfect queues, atomic claim may be difficult |\n| Long-running worker loop | Prototypes and internal tools | Very simple, fast to implement, few moving parts | Weak reliability, poor multi-replica behavior, limited operational control |\n| Webhook-first with polling fallback | Event-driven integrations | Fast reaction, fewer API calls, reconciliation catches missed events | Needs public endpoint, event validation, dedupe, provider webhook support |\n| Provider-side background job polling | Long-running AI provider jobs | Handles slow AI tasks, simple status model, good for async UX | Only manages provider job status, not full business workflow |\n| Durable workflow engine | Long-running multi-step processes | Strong retries, timers, audit history, recovery after crashes | More infrastructure and concepts, heavy for simple polling |\n| Persistent agent runtime | Multi-step reasoning agents | Preserves agent context, supports pause and resume, good for tool-heavy tasks | Not a scheduler or queue replacement, still needs operational backend |\n| Database sync plus change evaluation | Systems where external data has local value | Clean separation, local reporting, fewer repeated external calls | More storage, more sync complexity, 
possible privacy and retention concerns |\n| Adaptive polling | Bursty sources or variable urgency tasks | Reduces cost, respects rate limits, reacts faster when activity is high | Harder to reason about, not ideal for strict schedules |\n| Semantic polling with LLM evaluator | Fuzzy conditions requiring judgment | Handles natural language intent, useful summaries, flexible decisions | Cost, latency, prompt quality risk, should not replace simple code checks |\n\n## Recommended Default Architecture\n\nFor most production AI assistants, start with this:\n\n```\npolls table\n  -> scheduler\n  -> queue\n  -> stateless workers\n  -> deterministic filters\n  -> optional LLM evaluator\n  -> notification or assistant action\n```\n\nA minimal schema:\n\n```\nCREATE TABLE polls (\n    id TEXT PRIMARY KEY,\n    user_id TEXT NOT NULL,\n    source_type TEXT NOT NULL,\n    source_ref TEXT NOT NULL,\n    condition_text TEXT NOT NULL,\n    schedule_type TEXT NOT NULL,\n    interval_seconds INTEGER,\n    timezone TEXT,\n    next_run_at TIMESTAMP NOT NULL,\n    last_run_at TIMESTAMP,\n    cursor_value TEXT,\n    last_hash TEXT,\n    status TEXT NOT NULL,\n    failure_count INTEGER NOT NULL DEFAULT 0,\n    last_error TEXT,\n    created_at TIMESTAMP NOT NULL,\n    updated_at TIMESTAMP NOT NULL\n);\n\nCREATE TABLE poll_runs (\n    id TEXT PRIMARY KEY,\n    poll_id TEXT NOT NULL,\n    started_at TIMESTAMP NOT NULL,\n    finished_at TIMESTAMP,\n    status TEXT NOT NULL,\n    items_checked INTEGER,\n    items_matched INTEGER,\n    decision_summary TEXT,\n    error TEXT\n);\n\nCREATE TABLE notifications (\n    id TEXT PRIMARY KEY,\n    poll_id TEXT NOT NULL,\n    user_id TEXT NOT NULL,\n    dedupe_key TEXT NOT NULL,\n    title TEXT NOT NULL,\n    body TEXT NOT NULL,\n    delivered_at TIMESTAMP,\n    UNIQUE (dedupe_key)\n);\n```\n\nThis gives you a clean separation:\n\n```\nscheduler owns time\nqueue owns buffering\nworker owns execution\ndatabase owns state\nLLM owns semantic 
judgment\nassistant owns user interaction\n```\n\nThat separation is the heart of a reliable polling agent.\n\n## Example: Hermes Agent Processing Notion Tasks\n\nNow let us apply the architecture to a concrete case.\n\nAssume a Notion database contains tasks. Hermes should run every 10 minutes, take one task in `Todo` state, set it to `InProgress`, execute it, and then mark it `Complete`.\n\nThis is best described as:\n\n```\nexternal tool as task queue\n+\nscheduled polling worker\n+\nclaim or lease based execution\n```\n\nFor a production version, it becomes:\n\n```\nqueue-based polling with Notion as the human-facing task inbox\n```\n\n### Notion Task Properties\n\nThe Notion database should contain fields like:\n\n```\nName\nStatus: Todo | InProgress | Complete | Failed\nPriority\nCreatedAt\nClaimedBy\nClaimedAt\nClaimExpiresAt\nRunId\nRetryCount\nLastError\nCompletedAt\n```\n\nThe important fields are `ClaimedAt`, `ClaimExpiresAt`, and `RunId`. They make the task claim visible and recoverable.\n\n### Hermes Execution State\n\nHermes should also keep its own execution record:\n\n```\nrun_id\nnotion_page_id\nstarted_at\nfinished_at\nstatus\ninput_snapshot\ntool_calls\nresult_summary\nerror\nidempotency_key\n```\n\nThis protects you if Notion is edited manually, if an API call fails, or if you need to audit what Hermes actually did.\n\n### Execution Flow\n\n```\nEvery 10 minutes:\n  Hermes scheduler creates a run\n\nHermes worker:\n  finds one Notion task where Status = Todo\n  sorts by Priority and CreatedAt\n  claims the task by setting Status = InProgress\n  writes ClaimedBy, ClaimedAt, ClaimExpiresAt, and RunId\n  executes the task\n  writes execution logs to Hermes backend\n  sets Notion Status = Complete on success\n  sets Notion Status = Failed on failure\n```\n\nIf Hermes crashes after claiming a task, the lease can expire:\n\n```\nStatus = InProgress\nClaimExpiresAt < now\n```\n\nA future run can then recover the task or mark it 
as failed.\n\n### Failure Handling\n\nOn success:\n\n```\nStatus = Complete\nCompletedAt = now\nLastError = empty\n```\n\nOn recoverable failure:\n\n```\nStatus = Todo\nRetryCount = RetryCount + 1\nLastError = short error message\n```\n\nOn non-recoverable failure:\n\n```\nStatus = Failed\nLastError = clear explanation\n```\n\nFor safety, Hermes should also use an idempotency key:\n\n```\nnotion_page_id + task_version + action_type\n```\n\nThis prevents the same task from being executed twice if a retry happens at the wrong time.\n\n### Why This Is Not Just Polling\n\nThe polling part is only the wake-up mechanism. The real architecture is task claiming and reliable execution.\n\nA naive implementation says:\n\n```\nEvery 10 minutes, find a Todo task and do it.\n```\n\nA reliable implementation says:\n\n```\nEvery 10 minutes, claim exactly one eligible task, record the run, execute idempotently, and move the task to a terminal state.\n```\n\nThat is the difference between a demo and an agent you can trust.\n\n## Common Polling Agent Mistakes\n\n### Mistake 1: No Claim Protocol\n\nIf two workers can see the same task, they can both execute it.\n\nUse:\n\n```\nClaimedBy\nClaimedAt\nClaimExpiresAt\nRunId\n```\n\nEven if you currently run one worker, design as if a second worker might appear later.\n\n### Mistake 2: No Dedupe Key\n\nEvery external action should have a dedupe key.\n\n```\nuser_id + poll_id + source_object_id + action_type + condition_version\n```\n\nThis prevents repeated notifications, repeated emails, repeated task execution, and repeated tool calls. 
The broader principles behind scoping, storing, and testing these keys apply equally here — see [Idempotency in Distributed Systems That Actually Works](https://www.glukhov.org/app-architecture/integration-patterns/idempotency-in-distributed-systems/).\n\n### Mistake 3: Calling the LLM Too Early\n\nDo not ask the model to do database filtering.\n\nBad:\n\n```\nSend all tasks to the LLM and ask which one is Todo.\n```\n\nBetter:\n\n```\nUse the Notion API filter to fetch Todo tasks.\nThen use the LLM only if task interpretation is needed.\n```\n\n### Mistake 4: Treating Notion as the Only Backend\n\nNotion is a good human interface. It is not a complete execution backend.\n\nKeep execution logs, retries, traces, and idempotency records in Hermes.\n\n### Mistake 5: Infinite Polling\n\nEvery poll should have a stop condition.\n\nExamples:\n\n```\nstop after success\nstop after date\nstop after max retries\nstop when user disables it\nstop after repeated authorization failure\n```\n\nA polling agent without a stop condition is a quiet cost leak.\n\n### Mistake 6: No Observability\n\nYou should be able to answer:\n\n```\nWhat did the agent run?\nWhy did it run?\nWhat did it read?\nWhat did it change?\nWhy did it fail?\nDid it notify the user?\nDid it run twice?\n```\n\nIf you cannot answer those questions, the system is not ready for important work.\n\n## Observability Checklist\n\nTrack metrics such as:\n\n```\npolls_due\npolls_started\npolls_succeeded\npolls_failed\ntasks_claimed\ntasks_completed\ntasks_failed\nclaim_expired_count\nduplicate_suppressed_count\nllm_calls\nllm_cost\nrate_limit_count\naverage_run_duration\n```\n\nLog fields such as:\n\n```\npoll_id\nrun_id\nsource_type\nsource_object_id\nclaim_id\ncursor_before\ncursor_after\ndecision\ndedupe_key\nerror\n```\n\nBuild an admin view for:\n\n```\nactive polls\nstuck InProgress tasks\nrecent failures\nhigh retry tasks\ndead letter jobs\nexpensive LLM evaluations\ndisabled integrations\n```\n\nPolling agents 
run in the background, where failures are quiet and problems can compound before anyone notices. Background systems need visibility built in from the start, not added as an afterthought when something goes wrong. For the full observability stack for AI and LLM-backed systems — metrics, traces, structured logs, and SLOs — see [Observability for LLM Systems: Metrics, Traces, Logs, and Testing in Production](https://www.glukhov.org/observability/observability-for-llm-systems/).\n\n## Final Recommendation\n\nFor a serious AI assistant, start with queue-based polling workers and a durable state store. Add webhooks where providers support them. Use adaptive polling when rate limits matter. Use a durable workflow engine when the process is long-running and multi-step. Use persistent agent runtime when the agent needs to reason over time.\n\nFor the Hermes and Notion example, the right architecture is:\n\n```\nNotion as the human-facing task inbox\nHermes scheduler every 10 minutes\nHermes worker with claim or lease logic\nHermes backend for execution logs and idempotency\nNotion status updates for visibility\n```\n\nThe polling interval is not the hard part. 
The hard part is making sure the agent claims one task, runs it once, records what happened, and leaves the system in a state humans can understand.\n\nThat is what turns a polling script into a reliable AI assistant — not the interval, not the model, but the discipline around claiming work, recording it, and leaving the system in a state that humans and future runs can both understand.", "url": "https://wpnews.pro/news/polling-agents-in-ai-assistants-11-implementation-patterns", "canonical_source": "https://www.glukhov.org/ai-systems/architecture/polling-agents-ai-assistants-implementation-patterns/", "published_at": "2026-06-24 10:19:38+00:00", "updated_at": "2026-06-24 11:52:52.483315+00:00", "lang": "en", "topics": ["ai-agents", "ai-tools", "ai-infrastructure", "large-language-models", "developer-tools"], "entities": ["Notion", "GitHub", "Hermes"], "alternates": {"html": "https://wpnews.pro/news/polling-agents-in-ai-assistants-11-implementation-patterns", "markdown": "https://wpnews.pro/news/polling-agents-in-ai-assistants-11-implementation-patterns.md", "text": "https://wpnews.pro/news/polling-agents-in-ai-assistants-11-implementation-patterns.txt", "jsonld": "https://wpnews.pro/news/polling-agents-in-ai-assistants-11-implementation-patterns.jsonld"}}