{"slug": "anthropic-batch-api-for-asynchronous-multi-tenant-ai-processing-cutting-claude", "title": "Anthropic Batch API for Asynchronous Multi-Tenant AI Processing: Cutting Claude Costs by 50% Without Sacrificing Feature Responsiveness", "summary": "Anthropic's Batch API can cut Claude AI costs by 50% for asynchronous workloads, according to a developer at CitizenApp. The team reduced their monthly Claude bill from approximately $8,000 to $4,000 by offloading 65% of their document summarization, compliance classification, and policy analysis requests to batch processing. The architecture separates synchronous user-facing features from background jobs, enabling the cost savings without sacrificing responsiveness for requests that need immediate responses.", "body_md": "I'm going to be blunt: if you're running a SaaS that uses Claude, and you haven't implemented the Batch API yet, you're hemorrhaging money. Not metaphorically: the Batch API is literally 50% off the per-token cost for any request that doesn't need an immediate response (most batches finish within an hour, though Anthropic allows up to 24 hours).\n\nAt CitizenApp, we process thousands of document summaries, compliance classifications, and policy analyses daily. Our first instinct was synchronous: user uploads document → Claude responds in real-time → dashboard updates. It felt responsive. It also felt like burning cash.\n\nThen we realized: 80% of those requests don't actually need synchronous responses. A user uploads a document for summarization? They're fine waiting 5–30 minutes. A tenant wants to bulk-classify 500 policies? That's explicitly an async job. The Batch API is built for this, and it's absurdly underutilized.\n\nHere's how I wired it into CitizenApp, and why you should too.\n\nThe 50% cost reduction is the headline, but the real win is architectural. Batching forces you to separate concerns: synchronous, user-facing requests that need an answer now, and background jobs that can wait.\n\nThis separation is *healthy*. 
You stop cramming everything through Claude's sync API and actually think about what needs to be urgent.\n\nIn CitizenApp's case, we reduced our monthly Claude bill from ~$8k to ~$4k while actually *increasing* feature coverage. The key was offloading 65% of our workload to batches.\n\nHere's the flow:\n\n```\nUser Request\n    ↓\nFastAPI Endpoint (Validate, Enqueue)\n    ↓\nPostgreSQL Queue Table\n    ↓\nBatch Processor (reads queue, submits to Anthropic)\n    ↓\nAnthropic Batch Job (runs in background)\n    ↓\nWebhook/Polling Handler (gets results)\n    ↓\nPostgreSQL LISTEN/NOTIFY\n    ↓\nWebSocket → React 19 Dashboard (real-time update)\n```\n\nLet's build it.\n\n``` python\n# models.py\nfrom sqlalchemy import Column, String, Integer, Text, DateTime, Enum, ForeignKey\nfrom sqlalchemy.orm import declarative_base\nfrom datetime import datetime\nimport enum\n\nBase = declarative_base()\n\nclass AIJobStatus(str, enum.Enum):\n    QUEUED = \"queued\"\n    SUBMITTED = \"submitted\"\n    PROCESSING = \"processing\"\n    COMPLETED = \"completed\"\n    FAILED = \"failed\"\n\nclass AIJob(Base):\n    __tablename__ = \"ai_jobs\"\n\n    id = Column(String, primary_key=True)\n    tenant_id = Column(String, ForeignKey(\"tenants.id\"), nullable=False, index=True)\n    user_id = Column(String, nullable=False)\n    job_type = Column(String, nullable=False)  # \"summarize\", \"classify\", etc.\n\n    status = Column(Enum(AIJobStatus), default=AIJobStatus.QUEUED, index=True)\n\n    input_data = Column(Text, nullable=False)  # JSON stringified\n    result = Column(Text, nullable=True)  # Result from Claude\n\n    batch_id = Column(String, nullable=True, index=True)  # Anthropic batch ID\n    request_id = Column(String, nullable=True)  # Within batch\n\n    created_at = Column(DateTime, default=datetime.utcnow, index=True)\n    completed_at = Column(DateTime, nullable=True)\n\n    error_message = Column(Text, nullable=True)\n```\n\n``` python\n# api/ai.py\nfrom fastapi import APIRouter, Depends, 
HTTPException\nfrom pydantic import BaseModel\nimport uuid\nimport json\nfrom sqlalchemy.orm import Session\nfrom database import get_db\nfrom models import AIJob, AIJobStatus\n# Hypothetical module: get_tenant_id and get_user_id are auth dependencies defined elsewhere in the app\nfrom auth import get_tenant_id, get_user_id\n\nrouter = APIRouter()\n\nclass SummarizeRequest(BaseModel):\n    document_text: str\n    max_length: int = 500\n\n@router.post(\"/api/ai/summarize\")\nasync def queue_summarize(\n    request: SummarizeRequest,\n    db: Session = Depends(get_db),\n    tenant_id: str = Depends(get_tenant_id),\n    user_id: str = Depends(get_user_id),\n):\n    \"\"\"\n    Queue a document for summarization (async).\n    Returns immediately with job ID.\n    User receives result via WebSocket when batch completes.\n    \"\"\"\n\n    job_id = str(uuid.uuid4())\n\n    job = AIJob(\n        id=job_id,\n        tenant_id=tenant_id,\n        user_id=user_id,\n        job_type=\"summarize\",\n        input_data=json.dumps({\n            \"document_text\": request.document_text,\n            \"max_length\": request.max_length,\n        }),\n        status=AIJobStatus.QUEUED,\n    )\n\n    db.add(job)\n    db.commit()\n\n    return {\n        \"job_id\": job_id,\n        \"status\": \"queued\",\n        \"message\": \"Your request is queued. You'll receive results in 5-30 minutes.\",\n    }\n```\n\nI prefer a separate worker process (via Celery or APScheduler) that runs periodically. 
Here's APScheduler for simplicity:\n\n``` python\n# workers/batch_processor.py\nfrom anthropic import Anthropic\nfrom sqlalchemy.orm import Session\nfrom database import SessionLocal\nfrom models import AIJob, AIJobStatus\nimport json\nfrom datetime import datetime\n\nclient = Anthropic()  # reads ANTHROPIC_API_KEY from the environment\n\ndef process_batch_jobs():\n    \"\"\"\n    Run every 5 minutes (via APScheduler).\n    Collects queued jobs and submits to Anthropic Batch API.\n    \"\"\"\n    db = SessionLocal()\n\n    # Take up to 100 queued jobs per run, far below Anthropic's per-batch request limit\n    queued_jobs = db.query(AIJob).filter(\n        AIJob.status == AIJobStatus.QUEUED\n    ).limit(100).all()\n\n    if not queued_jobs:\n        db.close()\n        return\n\n    # Build batch request\n    requests = []\n    batched_jobs = []  # only the jobs that actually go into the batch\n\n    for job in queued_jobs:\n        input_data = json.loads(job.input_data)\n\n        if job.job_type == \"summarize\":\n            message = f\"Summarize the following document in {input_data['max_length']} words:\\n\\n{input_data['document_text']}\"\n        elif job.job_type == \"classify\":\n            message = f\"Classify this text into one of: {input_data['categories']}\\n\\nText: {input_data['text']}\"\n        else:\n            # Unknown job type: fail it now instead of marking it submitted\n            job.status = AIJobStatus.FAILED\n            job.error_message = f\"Unsupported job type: {job.job_type}\"\n            continue\n\n        batched_jobs.append(job)\n\n        requests.append({\n            \"custom_id\": job.id,  # results are matched back to jobs by this ID\n            \"params\": {\n                \"model\": \"claude-3-5-sonnet-20241022\",\n                \"max_tokens\": 1024,\n                \"messages\": [{\"role\": \"user\", \"content\": message}],\n            },\n        })\n\n    # Nothing valid to submit: persist any failures and stop\n    if not requests:\n        db.commit()\n        db.close()\n        return\n\n    # Submit batch to Anthropic\n    batch = client.beta.messages.batches.create(\n        requests=requests,\n        betas=[\"batch-2024-09-24\"],\n    )\n\n    # Mark only the batched jobs as submitted\n    for job in batched_jobs:\n        job.status = AIJobStatus.SUBMITTED\n        job.batch_id = batch.id\n\n    db.commit()\n    print(f\"Submitted batch {batch.id} with 
{len(requests)} requests\")\n    db.close()\n\ndef poll_batch_results():\n    \"\"\"\n    Run every 30 seconds.\n    Checks submitted batches for completion, stores results.\n    \"\"\"\n    db = SessionLocal()\n\n    # Get all submitted jobs\n    submitted_jobs = db.query(AIJob).filter(\n        AIJob.status == AIJobStatus.SUBMITTED\n    ).all()\n\n    batch_ids = set(job.batch_id for job in submitted_jobs)\n\n    for batch_id in batch_ids:\n        batch = client.beta.messages.batches.retrieve(batch_id, betas=[\"batch-2024-09-24\"])\n\n        # processing_status is \"in_progress\", \"canceling\", or \"ended\";\n        # results exist only once the batch has ended\n        if batch.processing_status != \"ended\":\n            continue\n\n        # Batch ended: fetch per-request results\n        results = client.beta.messages.batches.results(batch_id, betas=[\"batch-2024-09-24\"])\n\n        for result in results:\n            request_id = result.custom_id\n            job = db.query(AIJob).filter(AIJob.id == request_id).first()\n\n            if not job:\n                continue\n\n            if result.result.type == \"succeeded\":\n                job.status = AIJobStatus.COMPLETED\n                job.result = result.result.message.content[0].text\n            elif result.result.type == \"errored\":\n                job.status = AIJobStatus.FAILED\n                job.error_message = str(result.result.error)  # store the full error payload\n            else:\n                # \"canceled\" or \"expired\" (not processed before the batch's 24-hour deadline)\n                job.status = AIJobStatus.FAILED\n                job.error_message = f\"Request {result.result.type}\"\n\n            job.completed_at = datetime.utcnow()\n\n        db.commit()\n        # Notify connected clients (next step)\n        notify_clients_batch_complete(batch_id)\n\n    db.close()\n```\n\n", "url": "https://wpnews.pro/news/anthropic-batch-api-for-asynchronous-multi-tenant-ai-processing-cutting-claude", "canonical_source": 
"https://dev.to/uaslimcreate/anthropic-batch-api-for-asynchronous-multi-tenant-ai-processing-cutting-claude-costs-by-50-18gb", "published_at": "2026-06-06 08:47:58+00:00", "updated_at": "2026-06-06 09:11:35.841193+00:00", "lang": "en", "topics": ["large-language-models", "ai-products", "ai-tools", "ai-infrastructure", "ai-startups"], "entities": ["Anthropic", "Claude", "CitizenApp", "Batch API"], "alternates": {"html": "https://wpnews.pro/news/anthropic-batch-api-for-asynchronous-multi-tenant-ai-processing-cutting-claude", "markdown": "https://wpnews.pro/news/anthropic-batch-api-for-asynchronous-multi-tenant-ai-processing-cutting-claude.md", "text": "https://wpnews.pro/news/anthropic-batch-api-for-asynchronous-multi-tenant-ai-processing-cutting-claude.txt", "jsonld": "https://wpnews.pro/news/anthropic-batch-api-for-asynchronous-multi-tenant-ai-processing-cutting-claude.jsonld"}}