# Spidra API tutorial: complete guide to web scraping with the Spidra API

> Source: <https://spidra.io/blog/spidra-api-tutorial>
> Published: 2026-06-09 00:00:00+00:00

Getting data from websites programmatically has always involved more work than it should. You write selectors, they break when the site updates. You try a headless browser, anti-bot protection blocks you. You get the data, but it is raw HTML and you still have to parse it into something useful.

The [Spidra API](https://spidra.io/products/spidra-api) is designed to solve all three of those problems in one place. You send a URL, describe what you want, and get back structured data. The browser rendering, CAPTCHA solving, proxy rotation, and AI extraction all happen on Spidra's side.

This guide walks through the entire API from authentication to crawling. By the end you will know how every endpoint works, what the response structure looks like, and how to build a real scraping pipeline around it.

## Before you start

You need a Spidra account and an API key.

Sign up at [spidra.io](https://spidra.io/). The free plan includes 300 credits with no credit card required. Once you are in, go to **app.spidra.io → Settings → API Keys** and create a key.

Keep it somewhere safe. Every request you make to the API includes this key in the header.

## How the API works

The Spidra API is a REST API with one base URL:

```
https://api.spidra.io/api
```

Every request is authenticated by including your API key in the `x-api-key` header. There are no bearer tokens, no OAuth flows, just a header on every request.

```
curl -X POST https://api.spidra.io/api/scrape \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"urls": [{"url": "https://example.com"}]}'
```

One important thing to understand before you make your first request: **Spidra jobs are asynchronous**. When you submit a scrape, you do not get the data back immediately. You get a job ID. You then poll a status endpoint every few seconds until the job is complete and the data is ready.

This is by design. Browser rendering, CAPTCHA solving, and AI extraction take a few seconds. The async pattern means you are not holding a connection open the whole time.

The flow for every job type looks like this:

- Submit the job. Receive a job ID in the response.
- Poll the status endpoint every 2 to 5 seconds.
- When `status` is `completed`, read your results.

Now let us go through each part of the API.

## Authentication

Every request needs the `x-api-key` header. That is it.

```
-H "x-api-key: YOUR_API_KEY"
```

If the key is missing or invalid, the API returns a `401`. If your credits are exhausted, you get a `403`.

Here are the general response codes you will encounter (schema validation can also return a `422`, covered in the structured output section):

| Code | What it means |
|---|---|
| `200` | Request completed successfully |
| `202` | Job queued successfully. Poll for results. |
| `400` | Bad request. Missing or invalid parameters. |
| `401` | API key missing, invalid, or expired |
| `403` | Credits exhausted or plan limit reached |
| `404` | Job or resource not found |
| `429` | Rate limit hit. Back off and retry. |
| `500` | Something went wrong on Spidra's side |

All errors come back in the same format:

```
{
  "status": "error",
  "message": "Detailed explanation of what went wrong"
}
```
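Because every error shares this shape, one small helper can turn any failed response into a single exception type. The following is a minimal sketch, not part of any Spidra SDK: `SpidraError` and `raise_for_error` are hypothetical names, and falling back to the raw text when the body is not JSON is an assumption.

```python
import json


class SpidraError(Exception):
    """Hypothetical exception carrying the HTTP code and the API's message."""

    def __init__(self, status_code, message):
        super().__init__(f"{status_code}: {message}")
        self.status_code = status_code
        self.message = message


def raise_for_error(status_code, body_text):
    """Raise SpidraError for 4xx/5xx responses; otherwise return the parsed body."""
    try:
        body = json.loads(body_text) if body_text else {}
    except json.JSONDecodeError:
        # Assumption: keep the raw text if the body is not valid JSON.
        body = {"message": body_text}
    if status_code >= 400:
        if isinstance(body, dict):
            message = body.get("message", "Unknown error")
        else:
            message = str(body)
        raise SpidraError(status_code, message)
    return body
```

With `requests`, you would call it as `raise_for_error(response.status_code, response.text)` right after each request.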

## Scraping a single page

The scrape endpoint is where most people start. You give it one to three URLs and it returns structured data from each one.

**Endpoint:** `POST /api/scrape`

### The minimal request

The only required field is `urls`, which takes an array of URL objects. Each URL object requires a `url` field and optionally takes an `actions` array for browser interactions.

```
curl -X POST https://api.spidra.io/api/scrape \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": [{"url": "https://news.ycombinator.com"}]
  }'
```

Response:

```
{
  "status": "queued",
  "jobId": "550e8400-e29b-41d4-a716-446655440000",
  "message": "Scrape job has been queued. Poll /api/scrape/550e8400... to get the result."
}
```

Save that `jobId`. You need it to check on the job.

### Polling for results

Call `GET /api/scrape/{jobId}` every few seconds until the status is `completed` or `failed`.

```
curl https://api.spidra.io/api/scrape/550e8400-e29b-41d4-a716-446655440000 \
  -H "x-api-key: YOUR_API_KEY"
```

While the job is running, you will see something like this:

```
{
  "status": "active",
  "progress": {
    "message": "Processing content with AI...",
    "progress": 0.6
  },
  "result": null,
  "error": null
}
```

The inner `progress` value goes from 0 to 1 as the job moves through its stages: loading the browser, executing actions, solving CAPTCHAs, running AI extraction.

When it finishes:

```
{
  "status": "completed",
  "progress": {
    "message": "Scrape completed successfully",
    "progress": 1
  },
  "result": {
    "content": "...",
    "data": [
      {
        "url": "https://news.ycombinator.com",
        "title": "Hacker News",
        "markdownContent": "...",
        "success": true,
        "screenshotUrl": null
      }
    ],
    "screenshots": [],
    "ai_extraction_failed": false,
    "stats": {
      "durationMs": 4200,
      "captchaSolvedCount": 0,
      "inputTokens": 312,
      "outputTokens": 84,
      "totalTokens": 396
    }
  },
  "error": null
}
```

The `result.content` field is the main output. What it contains depends on what you asked for:

- If you passed a `prompt`, `content` is the AI-extracted result
- If you did not pass a `prompt`, `content` is the raw page content as Markdown

`result.data` is an array with one entry per URL. Each entry has the page title, the full Markdown content for that URL, whether it succeeded, and a screenshot URL if you requested one.

`result.stats` tells you how long the job took, how many CAPTCHAs were solved, and how many tokens the AI extraction used.
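To make the three parts of `result` concrete, here is a small sketch that condenses a completed result into a report you could log. `summarize_result` is a hypothetical helper, and the sample dict is the completed response shown earlier, trimmed to the fields the helper reads.

```python
def summarize_result(result):
    """Condense a completed job's `result` object into a small report."""
    pages = result.get("data", [])
    stats = result.get("stats", {})
    return {
        "content": result.get("content"),
        "ok_urls": [p["url"] for p in pages if p.get("success")],
        "failed_urls": [p["url"] for p in pages if not p.get("success")],
        "ai_extraction_failed": bool(result.get("ai_extraction_failed")),
        "duration_s": stats.get("durationMs", 0) / 1000,
        "total_tokens": stats.get("totalTokens", 0),
    }


# Sample data copied from the completed response earlier in this guide.
sample = {
    "content": "...",
    "data": [{
        "url": "https://news.ycombinator.com",
        "title": "Hacker News",
        "markdownContent": "...",
        "success": True,
        "screenshotUrl": None,
    }],
    "ai_extraction_failed": False,
    "stats": {"durationMs": 4200, "captchaSolvedCount": 0, "totalTokens": 396},
}

report = summarize_result(sample)
print(report["ok_urls"], report["duration_s"], report["total_tokens"])
```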

### A polling loop in Python

``` python
import requests
import time

API_KEY = "YOUR_API_KEY"
BASE_URL = "https://api.spidra.io/api"
HEADERS = {"x-api-key": API_KEY, "Content-Type": "application/json"}

def scrape(url):
    # Submit the job
    response = requests.post(
        f"{BASE_URL}/scrape",
        headers=HEADERS,
        json={"urls": [{"url": url}]}
    )
    response.raise_for_status()
    job_id = response.json()["jobId"]

    # Poll until complete
    while True:
        status_response = requests.get(
            f"{BASE_URL}/scrape/{job_id}",
            headers=HEADERS
        )
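        # Stop on HTTP errors; an error body never reaches "completed" or "failed"
        status_response.raise_for_status()
        data = status_response.json()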

        if data["status"] == "completed":
            return data["result"]
        elif data["status"] == "failed":
            raise Exception(f"Scrape failed: {data['error']}")

        time.sleep(3)

result = scrape("https://news.ycombinator.com")
print(result["content"])
```

The same in Node.js:

``` js
const API_KEY = "YOUR_API_KEY";
const BASE_URL = "https://api.spidra.io/api";
const HEADERS = {
  "x-api-key": API_KEY,
  "Content-Type": "application/json"
};

async function scrape(url) {
  const submitRes = await fetch(`${BASE_URL}/scrape`, {
    method: "POST",
    headers: HEADERS,
    body: JSON.stringify({ urls: [{ url }] })
  });
  const { jobId } = await submitRes.json();

  while (true) {
    const statusRes = await fetch(`${BASE_URL}/scrape/${jobId}`, {
      headers: HEADERS
    });
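    // Stop on HTTP errors; an error body never reaches "completed" or "failed"
    if (!statusRes.ok) throw new Error(`Status check failed: HTTP ${statusRes.status}`);
    const data = await statusRes.json();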

    if (data.status === "completed") return data.result;
    if (data.status === "failed") throw new Error(data.error);

    await new Promise(r => setTimeout(r, 3000));
  }
}

const result = await scrape("https://news.ycombinator.com");
console.log(result.content);
```
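Both loops above keep polling for as long as the job stays in a non-terminal state. In a pipeline you will usually want an upper bound. The sketch below is one way to add it: `poll_job` is a hypothetical helper, the 120-second default is an arbitrary assumption rather than a Spidra limit, and the status fetch is passed in as a callable so the loop can be tested without a network.

```python
import time


def poll_job(fetch_status, interval=3.0, timeout=120.0,
             sleep=time.sleep, clock=time.monotonic):
    """Poll until the job completes, fails, or the deadline passes.

    fetch_status is a zero-argument callable that returns the parsed
    status JSON, for example:
        lambda: requests.get(url, headers=HEADERS, timeout=30).json()
    """
    deadline = clock() + timeout
    while True:
        data = fetch_status()
        if data["status"] == "completed":
            return data["result"]
        if data["status"] == "failed":
            raise RuntimeError(f"Job failed: {data.get('error')}")
        # Give up if the next poll would land past the deadline.
        if clock() + interval > deadline:
            raise TimeoutError(f"Job still {data['status']!r} after {timeout}s")
        sleep(interval)
```

Injecting `sleep` and `clock` is a design choice for testability; in normal use you only pass `fetch_status`.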

## AI extraction with prompts

The plain scrape above gives you raw Markdown. Most of the time you want something more specific. That is where the `prompt` field comes in.

Add a `prompt` and Spidra reads the rendered page and extracts exactly what you described. The AI understands context. It knows a number next to a currency symbol is a price, that a short bold line near the top of a product page is probably the title, and that a block of longer text is likely a description. You describe the output you want and it figures out where to find it.

```
curl -X POST https://api.spidra.io/api/scrape \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": [{"url": "https://news.ycombinator.com"}],
    "prompt": "Extract the top 10 post titles and their point scores",
    "output": "json"
  }'
```

When the job completes, `result.content` contains the AI-extracted data as JSON:

```
[
  {"title": "Show HN: I built a thing", "points": 342},
  {"title": "Ask HN: What are you working on?", "points": 289}
]
```

The `output` field controls the format. It defaults to `"json"` but you can set it to `"markdown"` if you want the extracted content as formatted text instead of structured data.

One thing to know: if you set `output: "json"` without a `prompt`, Spidra still runs a default AI extraction pass. If you want the raw page content with no AI processing at all, omit both `output` and `prompt`.

If AI extraction fails for any reason (a near-empty page, a heavily obfuscated site), Spidra falls back to returning the raw page Markdown and sets `ai_extraction_failed: true` in the response so your code can detect and handle it.
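Since the presence or absence of `prompt` and `output` changes what Spidra does, it can help to build the request body in one place. This is a minimal sketch under that idea: `build_scrape_payload` is a hypothetical helper, and it only encodes rules stated in this guide (one to three URLs, `output` of `"json"` or `"markdown"`).

```python
def build_scrape_payload(urls, prompt=None, output=None, schema=None):
    """Build the JSON body for POST /api/scrape from plain URL strings."""
    if not 1 <= len(urls) <= 3:
        raise ValueError("The scrape endpoint takes one to three URLs")
    if output not in (None, "json", "markdown"):
        raise ValueError("output must be 'json' or 'markdown'")
    payload = {"urls": [{"url": u} for u in urls]}
    # Only include optional fields that were set, so a call with no
    # prompt and no output asks for raw Markdown with no AI pass.
    if prompt is not None:
        payload["prompt"] = prompt
    if output is not None:
        payload["output"] = output
    if schema is not None:
        payload["schema"] = schema
    return payload


raw = build_scrape_payload(["https://news.ycombinator.com"])
extracted = build_scrape_payload(
    ["https://news.ycombinator.com"],
    prompt="Extract the top 10 post titles and their point scores",
    output="json",
)
```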

## Structured output with JSON schema

Prompts are flexible but they are not predictable. The AI decides what fields to return and what to call them. For production pipelines where downstream systems expect a specific shape, that is a problem.

The `schema` field solves this. Pass a JSON Schema object and the AI must return data that matches it exactly. Required fields always appear in the output, as `null` if the page does not have that value. Field names match exactly what you defined. The structure never varies between runs.

```
curl -X POST https://api.spidra.io/api/scrape \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": [{"url": "https://jobs.example.com/senior-engineer"}],
    "prompt": "Extract the job details. Normalize salary to a number in USD.",
    "schema": {
      "type": "object",
      "required": ["title", "company", "remote", "employment_type"],
      "properties": {
        "title":           {"type": "string"},
        "company":         {"type": "string"},
        "remote":          {"type": ["boolean", "null"]},
        "salary_min":      {"type": ["number", "null"]},
        "salary_max":      {"type": ["number", "null"]},
        "employment_type": {
          "type": ["string", "null"],
          "enum": ["full_time", "part_time", "contract", null]
        }
      }
    }
  }'
```

The response will always have `title`, `company`, `remote`, and `employment_type` because they are in `required`. If the page does not mention a salary, `salary_min` and `salary_max` come back as `null` rather than being omitted.

When you provide a `schema`, `output` is automatically set to `"json"`. You do not need to set it yourself.

The schema is validated before the job is queued. If it is malformed, the API returns a `422` with descriptive errors. Non-fatal issues like unsupported keywords come back as `schema_warnings` in the response.

Schema limits to be aware of: maximum nesting depth is 5 levels, maximum schema size is 10 KB.
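You can catch obviously oversized schemas locally before spending a request. The sketch below is an approximation only: how Spidra counts nesting depth and bytes is not specified here, so `schema_depth` uses an assumed definition (each object or array level counts as one, scalar leaves count as zero) and the size check assumes 1 KB means 1024 bytes of serialized JSON. The API's own validation remains the authority.

```python
import json

MAX_DEPTH = 5          # from the limits quoted above
MAX_BYTES = 10 * 1024  # 10 KB, assuming 1 KB = 1024 bytes


def schema_depth(schema):
    """Count nested object/array levels (assumed definition of depth)."""
    if not isinstance(schema, dict):
        return 0
    children = list(schema.get("properties", {}).values())
    if isinstance(schema.get("items"), dict):
        children.append(schema["items"])
    if not children:
        return 0  # scalar leaf: adds no nesting
    return 1 + max(schema_depth(child) for child in children)


def check_schema(schema):
    """Return a list of problems; an empty list means the local checks passed."""
    problems = []
    size = len(json.dumps(schema).encode("utf-8"))
    if size > MAX_BYTES:
        problems.append(f"schema is {size} bytes, limit is {MAX_BYTES}")
    depth = schema_depth(schema)
    if depth > MAX_DEPTH:
        problems.append(f"schema nests {depth} levels, limit is {MAX_DEPTH}")
    return problems
```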

## Browser actions

Some pages do not show you the data you want until you interact with them first. Cookie banners blocking content. A "Load More" button that reveals the next batch of results. A search form you need to fill before anything appears. Tabs that hide content by default.

The `actions` array on each URL object lets you interact with the page before extraction runs. Actions execute in order, inside a real browser, before Spidra runs your prompt.

Here is an example that dismisses a cookie banner, fills a search form, and waits for results to load:

```
{
  "urls": [{
    "url": "https://example.com/search",
    "actions": [
      {"type": "click", "value": "Accept cookies button"},
      {"type": "type", "selector": "input[name='q']", "value": "wireless headphones"},
      {"type": "click", "selector": "button[type='submit']"},
      {"type": "wait", "duration": 1500},
      {"type": "scroll", "to": "80%"}
    ]
  }],
  "prompt": "Extract all product names and prices from the search results",
  "output": "json"
}
```

Notice that for the first `click`, the `value` field is a plain English description of the element. For the second `click`, the `selector` field is a CSS selector. Both approaches work and you can mix them in the same actions array.

For any `click`, `check`, or `uncheck` action:

- Use `selector` for a CSS selector or XPath expression like `"#accept-cookies"` or `".submit-btn"`
- Use `value` for a plain English description like `"Accept cookies button"` and Spidra's AI will find the element for you

Both are equally valid. Use whichever makes more sense for the page you are working with.

### Available actions

| Action | What it does | Key fields |
|---|---|---|
| `click` | Clicks a button, link, tab, or any element | `selector` or `value` |
| `type` | Types text into an input or search field | `selector`, `value` |
| `check` | Checks a checkbox | `selector` or `value` |
| `uncheck` | Unchecks a checkbox | `selector` or `value` |
| `wait` | Pauses for a number of milliseconds | `duration` |
| `scroll` | Scrolls to a percentage of the page height | `to` (e.g. `"80%"`) |
| `forEach` | Finds matching elements and processes each one | `value`, `mode` |
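A typo in an action only shows up after the job has been queued, so a local check against this table can save a round trip. The sketch below is hypothetical: `validate_action` is not a Spidra function, and reading "`selector`, `value`" for `type` and "`value`, `mode`" for `forEach` as "both needed" is my interpretation of the table.

```python
# Key fields per action type, transcribed from the table above.
# "any": at least one listed field must be present.
# "all": every listed field must be present (an assumption for type/forEach).
ACTION_RULES = {
    "click":   ("any", ("selector", "value")),
    "check":   ("any", ("selector", "value")),
    "uncheck": ("any", ("selector", "value")),
    "type":    ("all", ("selector", "value")),
    "wait":    ("all", ("duration",)),
    "scroll":  ("all", ("to",)),
    "forEach": ("all", ("value", "mode")),
}


def validate_action(action):
    """Return an error string for a malformed action, or None if it looks fine."""
    kind = action.get("type")
    if kind not in ACTION_RULES:
        return f"unknown action type: {kind!r}"
    rule, fields = ACTION_RULES[kind]
    missing = [f for f in fields if f not in action]
    if rule == "any" and len(missing) == len(fields):
        return f"{kind} needs one of: {', '.join(fields)}"
    if rule == "all" and missing:
        return f"{kind} is missing: {', '.join(missing)}"
    return None
```

Running every entry of an `actions` array through `validate_action` before submitting gives you a list of problems to fix up front.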

### The forEach action

`forEach` is the most powerful action in the API. It finds a set of repeating elements on the page (product cards, search result links, accordion rows, directory listings) and processes each one individually, then combines all the results into a single output.

It supports three modes:

- `inline` reads the content of each matched element directly. Use this for product cards, table rows, or any content that lives inside the element itself.
- `navigate` follows each element as a link, loads the destination page, and scrapes it. Use this when the data you want is on detail pages that you need to navigate into.
- `click` clicks each element to expand or reveal content, then scrapes what appears. Use this for accordions, modals, or expandable sections.

```
{
  "urls": [{
    "url": "https://directory.example.com/companies",
    "actions": [
      {"type": "click", "value": "Accept cookies"},
      {
        "type": "forEach",
        "value": "Find all company listing cards",
        "mode": "navigate",
        "maxItems": 20,
        "itemPrompt": "Extract company name, website, and industry",
        "pagination": {
          "nextSelector": "a.next-page",
          "maxPages": 3
        }
      }
    ]
  }],
  "output": "json"
}
```

This dismisses the cookie banner, finds every company card on the page, navigates into each one, extracts the company details, and repeats across 3 pages of pagination. All in a single API call.
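If you build `forEach` actions in code, a small constructor keeps the mode names and field spellings in one place. This is a sketch with a hypothetical helper name, `for_each`; it uses only the fields that appear in the example above and makes no claim about which of them the API treats as optional.

```python
FOR_EACH_MODES = ("inline", "navigate", "click")


def for_each(description, mode, item_prompt=None, max_items=None,
             next_selector=None, max_pages=None):
    """Build a forEach action dict; only fields that were set are included."""
    if mode not in FOR_EACH_MODES:
        raise ValueError(f"mode must be one of {FOR_EACH_MODES}, got {mode!r}")
    action = {"type": "forEach", "value": description, "mode": mode}
    if item_prompt is not None:
        action["itemPrompt"] = item_prompt
    if max_items is not None:
        action["maxItems"] = max_items
    if next_selector is not None:
        action["pagination"] = {"nextSelector": next_selector}
        if max_pages is not None:
            action["pagination"]["maxPages"] = max_pages
    return action


# Rebuilds the forEach action from the directory example above.
action = for_each(
    "Find all company listing cards",
    mode="navigate",
    item_prompt="Extract company name, website, and industry",
    max_items=20,
    next_selector="a.next-page",
    max_pages=3,
)
```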

## Proxy and geo-targeting

Some sites block traffic from cloud IP ranges. Others serve different content based on location. The `useProxy` and `proxyCountry` fields route your requests through residential proxies to handle both situations.

```
{
  "urls": [{"url": "https://amazon.de/dp/B123456"}],
  "prompt": "Extract the product price",
  "output": "json",
  "useProxy": true,
  "proxyCountry": "de"
}
```

Setting `useProxy: true` routes the request through the residential proxy network. `proxyCountry` accepts:

- A two-letter ISO country code like `"us"`, `"de"`, `"gb"`, or `"fr"`
- `"eu"` to rotate randomly across all 27 EU member states
- `"global"`, or omit the field entirely, for no country preference

Proxy usage is billed from your bandwidth quota, not your credits. There is no credit multiplier for using proxies.
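A small helper can reject malformed country values before they reach the API. The sketch below is hypothetical (`proxy_options` is not a Spidra function). It checks only the shape of the value, not whether a two-letter code is a real country, and it lowercases input on the assumption that the API expects the lowercase form used in the examples.

```python
import re


def proxy_options(country=None):
    """Return the proxy fields to merge into a request body."""
    options = {"useProxy": True}
    if country is None:
        return options  # field omitted: no country preference
    code = country.lower()  # assumption: lowercase, as in the examples above
    # "eu" already matches the two-letter pattern; "global" is the only
    # longer value this guide lists.
    if code != "global" and not re.fullmatch(r"[a-z]{2}", code):
        raise ValueError(f"unsupported proxyCountry: {country!r}")
    options["proxyCountry"] = code
    return options


body = {
    "urls": [{"url": "https://amazon.de/dp/B123456"}],
    "prompt": "Extract the product price",
    **proxy_options("de"),
}
```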

## Additional options

### Extract content only

Strip navigation, headers, footers, and sidebars before extraction. Useful when you only want the main content of a page and want to reduce noise.

```
{
  "urls": [{"url": "https://blog.example.com/article"}],
  "prompt": "Summarize this article",
  "extractContentOnly": true
}
```

### Screenshots

Capture screenshots of scraped pages for debugging, archival, or visual monitoring.

```
{
  "urls": [{"url": "https://example.com"}],
  "screenshot": true,
  "fullPageScreenshot": true
}
```

`screenshot: true` captures the visible viewport. `fullPageScreenshot: true` captures the entire scrollable page. The screenshot URLs are returned in `result.screenshots` and in each item's `screenshotUrl` field.

### Authenticated scraping

Pass session cookies to access pages behind a login. Get the cookies from your browser's DevTools after logging in manually, then include them in your request.

```
{
  "urls": [{"url": "https://app.example.com/dashboard"}],
  "prompt": "Extract the account summary",
  "cookies": "session_id=abc123; auth_token=xyz789"
}
```

Standard cookie format (`name=value; name2=value2`) and Chrome DevTools paste format both work. Cookies are passed ephemerally to the browser worker and never stored.
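If your cookies live in a dict (for example, loaded from a secrets store rather than hard-coded), the standard format is easy to produce. This is a minimal sketch; `cookie_header` is a hypothetical helper, and rejecting separator characters is a precaution of mine, not a documented Spidra rule.

```python
def cookie_header(cookies):
    """Join a name -> value mapping into 'name=value; name2=value2'."""
    for name, value in cookies.items():
        if ";" in name or "=" in name or ";" in value:
            raise ValueError(f"cookie {name!r} contains a separator character")
    return "; ".join(f"{name}={value}" for name, value in cookies.items())


body = {
    "urls": [{"url": "https://app.example.com/dashboard"}],
    "prompt": "Extract the account summary",
    "cookies": cookie_header({"session_id": "abc123", "auth_token": "xyz789"}),
}
```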

## Batch scraping

When you have a list of URLs to process, the batch endpoint handles up to 50 at a time in parallel. Each URL runs in its own independent worker.

**Endpoint:** `POST /api/batch/scrape`

``` python
import requests
import time

API_KEY = "YOUR_API_KEY"
BASE_URL = "https://api.spidra.io/api"
HEADERS = {"x-api-key": API_KEY, "Content-Type": "application/json"}

urls = [
    "https://example.com/product/1",
    "https://example.com/product/2",
    "https://example.com/product/3",
]

# Submit the batch
response = requests.post(
    f"{BASE_URL}/batch/scrape",
    headers=HEADERS,
    json={
        "urls": urls,
        "prompt": "Extract the product name, price, and availability",
        "output": "json",
    }
)
batch_id = response.json()["batchId"]

# Poll until complete
while True:
    status = requests.get(
        f"{BASE_URL}/batch/scrape/{batch_id}",
        headers=HEADERS
    ).json()

    if status["status"] in ("completed", "failed", "partial"):
        break

    time.sleep(3)

# Process results
for item in status["items"]:
    if item["status"] == "completed":
        print(f"{item['url']}: {item['result']['content']}")
    else:
        print(f"Failed: {item['url']} — {item['error']}")
```

The batch response includes a `status` for the overall batch and an `items` array with one entry per URL. Each item has its own `status`, `result`, and `error` so you can see exactly which URLs succeeded and which failed.

Credits are reserved upfront when you submit and reconciled per item when processing completes. If a URL fails, credits for that item are returned.
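Two pieces of bookkeeping tend to come up around batches: splitting a long URL list to fit the 50-URL limit, and separating succeeded items from failed ones once a batch finishes. The helpers below are a sketch with hypothetical names; they rely only on the `items` shape described above.

```python
def chunk_urls(urls, size=50):
    """Split a URL list into consecutive batches of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def split_items(items):
    """Separate a batch's `items` into (succeeded, failed) lists."""
    succeeded = [item for item in items if item["status"] == "completed"]
    failed = [item for item in items if item["status"] != "completed"]
    return succeeded, failed


all_urls = [f"https://example.com/product/{n}" for n in range(120)]
batches = chunk_urls(all_urls)
print([len(batch) for batch in batches])
```

Each chunk would then be submitted as its own batch job and polled separately.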

### Batch with structured output

Everything that works in single scrape works in batch. Pass a schema and every item in the batch returns data matching that shape:

``` python
requests.post(
    f"{BASE_URL}/batch/scrape",
    headers=HEADERS,
    json={
        "urls": urls,
        "prompt": "Extract the product details",
        "schema": {
            "type": "object",
            "required": ["name", "price"],
            "properties": {
                "name":      {"type": "string"},
                "price":     {"type": ["number", "null"]},
                "currency":  {"type": ["string", "null"]},
                "available": {"type": ["boolean", "null"]}
            }
        }
    }
)
```

### Managing batches

Beyond submitting and polling, the batch API has a few more endpoints worth knowing:

| Endpoint | What it does |
|---|---|
| `GET /api/batch/scrape` | List all your batch jobs with status and credit usage |
| `DELETE /api/batch/scrape/{batchId}` | Cancel a running or pending batch. Credits for unprocessed items are refunded. |
| `POST /api/batch/scrape/{batchId}/retry` | Re-queue only the failed items in a completed batch without resubmitting the ones that already succeeded. |

The retry endpoint is particularly useful for large batches where a handful of items fail due to transient issues. You do not need to resubmit the full batch, just the failures.

## Crawling

Batch scraping works when you already know the URLs. Crawling is for when you want Spidra to discover pages for you.

You give it a starting URL, describe which pages to follow, and describe what to extract from each one. Spidra loads the base URL, finds links matching your crawl instruction, visits each one up to your `maxPages` limit, and applies your transform instruction to every page it visits.

**Endpoint:** `POST /api/crawl`

``` python
response = requests.post(
    f"{BASE_URL}/crawl",
    headers=HEADERS,
    json={
        "baseUrl": "https://docs.example.com",
        "crawlInstruction": "Follow all documentation pages. Skip changelog and login pages.",
        "transformInstruction": "Extract the page title and full body text as clean Markdown. Preserve all headings and code examples.",
        "maxPages": 20,
        "useProxy": False
    }
)
job_id = response.json()["jobId"]
```

Three fields are required: `baseUrl`, `crawlInstruction`, and `transformInstruction`. Everything else is optional.

`maxPages` defaults to 5 and goes up to 20. The crawl discovers links from the base URL first, then works through them in order of discovery.
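These rules are simple enough to enforce before sending the request. The following sketch uses a hypothetical helper, `build_crawl_payload`. The upper bound of 20 and the default of 5 come from the text above; the lower bound of 1 is an assumption.

```python
def build_crawl_payload(base_url, crawl_instruction, transform_instruction,
                        max_pages=5, use_proxy=False):
    """Build the JSON body for POST /api/crawl, checking the documented limits."""
    required = {
        "baseUrl": base_url,
        "crawlInstruction": crawl_instruction,
        "transformInstruction": transform_instruction,
    }
    for name, value in required.items():
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"{name} is required")
    # Upper bound from the text above; the lower bound of 1 is an assumption.
    if not 1 <= max_pages <= 20:
        raise ValueError("maxPages must be between 1 and 20")
    return {**required, "maxPages": max_pages, "useProxy": use_proxy}


payload = build_crawl_payload(
    "https://docs.example.com",
    "Follow all documentation pages. Skip changelog and login pages.",
    "Extract the page title and full body text as clean Markdown.",
    max_pages=20,
)
```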

Poll `GET /api/crawl/{jobId}` for status. When complete, results are available through several endpoints:

| Endpoint | What it returns |
|---|---|
| `GET /api/crawl/{jobId}` | Overall status and summary |
| `GET /api/crawl/{jobId}/pages` | All crawled pages with extracted data and signed URLs to the original HTML and Markdown |
| `GET /api/crawl/{jobId}/download` | ZIP archive of all results |
| `POST /api/crawl/{jobId}/extract` | Run a new extraction on already-crawled pages without re-crawling |
| `GET /api/crawl/history` | Paginated list of your past crawl jobs |

The `extract` endpoint is worth highlighting. If you crawl a site and later decide you want to extract different fields, you can run a new extraction on the cached pages without making a single new browser request. That saves time and credits.

### A complete crawl example

``` python
import requests
import time
import json

API_KEY = "YOUR_API_KEY"
BASE_URL = "https://api.spidra.io/api"
HEADERS = {"x-api-key": API_KEY, "Content-Type": "application/json"}

# Submit the crawl
job = requests.post(
    f"{BASE_URL}/crawl",
    headers=HEADERS,
    json={
        "baseUrl": "https://blog.example.com",
        "crawlInstruction": "Follow all blog post pages. Skip tag pages, author pages, and the homepage.",
        "transformInstruction": "Extract the article title, author, publish date, and full body text.",
        "maxPages": 15
    }
).json()

job_id = job["jobId"]
print(f"Crawl job started: {job_id}")

# Poll until complete
while True:
    status = requests.get(
        f"{BASE_URL}/crawl/{job_id}",
        headers=HEADERS
    ).json()

    print(f"Status: {status['status']}")

    if status["status"] == "completed":
        break
    elif status["status"] == "failed":
        raise Exception("Crawl failed")

    time.sleep(5)

# Fetch all crawled pages
pages = requests.get(
    f"{BASE_URL}/crawl/{job_id}/pages",
    headers=HEADERS
).json()

# Save as JSONL
with open("crawl_results.jsonl", "w") as f:
    for page in pages["pages"]:
        f.write(json.dumps({
            "url": page["url"],
            "data": page["data"]
        }) + "\n")

print(f"Saved {len(pages['pages'])} pages")
```

## Monitoring and logs

The Spidra API keeps a log of every scrape job you run. This is useful for debugging, auditing, and understanding your credit consumption.

``` python
# List recent scrape logs
logs = requests.get(
    f"{BASE_URL}/scrape-logs",
    headers=HEADERS
).json()

for log in logs["data"]:
    print(f"{log['started_at']} - {log['status']} - {log['latency_ms']}ms - {log['tokens_used']} tokens")

# Get full details of one log (here, the first one returned)
if logs["data"]:
    log_detail = requests.get(
        f"{BASE_URL}/scrape-logs/{logs['data'][0]['uuid']}",
        headers=HEADERS
    ).json()
```

### Usage statistics

Track your credit consumption over time:

``` python
usage = requests.get(
    f"{BASE_URL}/account/usage",
    headers=HEADERS
).json()

print(usage)
```

This returns time-series data covering requests, tokens, crawls, and credit consumption over a configurable period.

## Putting it all together: a real pipeline

Here is a complete example that combines scraping, `forEach` browser actions, and structured output into a pipeline that collects job listings from multiple job boards and saves them to a JSONL file:

``` python
import requests
import time
import json

API_KEY = "YOUR_API_KEY"
BASE_URL = "https://api.spidra.io/api"
HEADERS = {"x-api-key": API_KEY, "Content-Type": "application/json"}

JOB_SCHEMA = {
    "type": "object",
    "required": ["title", "company", "location"],
    "properties": {
        "title":           {"type": "string"},
        "company":         {"type": "string"},
        "location":        {"type": ["string", "null"]},
        "remote":          {"type": ["boolean", "null"]},
        "salary_min":      {"type": ["number", "null"]},
        "salary_max":      {"type": ["number", "null"]},
        "employment_type": {
            "type": ["string", "null"],
            "enum": ["full_time", "part_time", "contract", None]
        }
    }
}

def collect_job_urls(board_url):
    """Use forEach to visit each job listing on a board page and extract its details."""
    response = requests.post(
        f"{BASE_URL}/scrape",
        headers=HEADERS,
        json={
            "urls": [{
                "url": board_url,
                "actions": [
                    {"type": "click", "value": "Accept cookies"},
                    {
                        "type": "forEach",
                        "value": "Find all job listing links",
                        "mode": "navigate",
                        "maxItems": 50,
                        "itemPrompt": "Extract job title, company, location, remote status, salary range, and employment type",
                        "pagination": {
                            "nextSelector": "a.next-page",
                            "maxPages": 3
                        }
                    }
                ]
            }],
            "output": "json",
            "schema": JOB_SCHEMA
        }
    )
    job_id = response.json()["jobId"]

    while True:
        status = requests.get(
            f"{BASE_URL}/scrape/{job_id}",
            headers=HEADERS
        ).json()

        if status["status"] == "completed":
            return status["result"]["content"]
        elif status["status"] == "failed":
            raise Exception(status["error"])

        time.sleep(3)

# Collect from multiple job boards
boards = [
    "https://jobs.example.com/engineering",
    "https://careers.anothersite.com/remote",
]

all_jobs = []
for board in boards:
    print(f"Collecting from {board}...")
    jobs = collect_job_urls(board)
    if isinstance(jobs, list):
        all_jobs.extend(jobs)
    print(f"  Got {len(jobs) if isinstance(jobs, list) else 0} jobs")

# Save results
with open("jobs.jsonl", "w") as f:
    for job in all_jobs:
        f.write(json.dumps(job) + "\n")

print(f"\nTotal: {len(all_jobs)} jobs saved to jobs.jsonl")
```

## Error handling

Wrap your API calls properly and handle the cases that actually come up in production.

``` python
import requests

# BASE_URL and HEADERS are the constants defined in the earlier examples.

def safe_scrape(url, prompt):
    try:
        response = requests.post(
            f"{BASE_URL}/scrape",
            headers=HEADERS,
            json={
                "urls": [{"url": url}],
                "prompt": prompt,
                "output": "json"
            }
        )

        if response.status_code == 401:
            raise Exception("Invalid API key. Check your x-api-key header.")

        if response.status_code == 403:
            raise Exception("Credits exhausted or plan limit reached.")

        if response.status_code == 429:
            raise Exception("Rate limit hit. Wait before retrying.")

        response.raise_for_status()
        return response.json()["jobId"]

    except requests.exceptions.ConnectionError:
        raise Exception("Could not connect to the Spidra API.")
```

For polling loops, always handle the `failed` status and check `ai_extraction_failed` in the result:

``` python
if status["status"] == "completed":
    result = status["result"]

    if result.get("ai_extraction_failed"):
        # AI extraction failed, content is raw Markdown fallback
        print("AI extraction failed, using raw content")
        content = result["data"][0]["markdownContent"]
    else:
        content = result["content"]
```
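The response-code table says to back off and retry on a `429`. One common way to do that is exponential backoff, sketched below. The function names are hypothetical, the delay values are arbitrary defaults rather than Spidra recommendations, and treating `500` as retryable is an assumption.

```python
import random

# 429 is documented as "back off and retry"; including 500 is an assumption.
RETRYABLE = {429, 500}


def should_retry(status_code):
    """True if a failed request is worth repeating after a delay."""
    return status_code in RETRYABLE


def backoff_delays(attempts=5, base=1.0, cap=30.0, jitter=0.0, rng=random.random):
    """Exponential delays in seconds: base, 2*base, 4*base, ... capped at `cap`.

    `jitter` adds up to that many random seconds to each delay so that
    many clients do not retry in lockstep.
    """
    delays = []
    for attempt in range(attempts):
        delay = min(cap, base * 2 ** attempt)
        delays.append(delay + jitter * rng())
    return delays
```

In a submit function you would loop over `backoff_delays()`, sleeping for each delay while `should_retry(response.status_code)` stays true, and raise once the delays run out.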

## API reference summary

| Method | Endpoint | Purpose |
|---|---|---|
| `POST` | `/api/scrape` | Submit a scrape job (1 to 3 URLs) |
| `GET` | `/api/scrape/{jobId}` | Poll for job status and results |
| `POST` | `/api/batch/scrape` | Submit a batch job (up to 50 URLs) |
| `GET` | `/api/batch/scrape/{batchId}` | Poll batch status and per-item results |
| `GET` | `/api/batch/scrape` | List all your batch jobs |
| `DELETE` | `/api/batch/scrape/{batchId}` | Cancel a batch and refund unused credits |
| `POST` | `/api/batch/scrape/{batchId}/retry` | Retry only the failed items in a batch |
| `POST` | `/api/crawl` | Submit a crawl job |
| `GET` | `/api/crawl/{jobId}` | Poll crawl status |
| `GET` | `/api/crawl/{jobId}/pages` | Get all crawled pages with extracted data |
| `POST` | `/api/crawl/{jobId}/extract` | Re-extract from crawled pages without re-crawling |
| `GET` | `/api/crawl/{jobId}/download` | Download crawl results as ZIP |
| `GET` | `/api/crawl/history` | List your past crawl jobs |
| `GET` | `/api/scrape-logs` | List recent scrape logs |
| `GET` | `/api/scrape-logs/{id}` | Get full details of a single log |
| `GET` | `/api/account/usage` | Get usage statistics |

## What next

You now have a working understanding of every part of the Spidra API. Here are the natural next steps depending on what you are building:

If you want to go deeper on browser actions and `forEach`, read the [Browser Actions Guide](https://docs.spidra.io/features/actions) in the docs. It covers every option for each action type with real examples.

If you are building something that needs guaranteed output shapes, read the [Structured Output Guide](https://docs.spidra.io/features/structured-output) for full details on schemas, nullable fields, Zod and Pydantic integration, and schema limits.

If you are using an SDK in a specific language, each one has its own guide: [Node.js](https://docs.spidra.io/sdks/node), [Python](https://docs.spidra.io/sdks/python), [Go](https://docs.spidra.io/sdks/go), [PHP](https://docs.spidra.io/sdks/php), [Ruby](https://docs.spidra.io/sdks/ruby), [Rust](https://docs.spidra.io/sdks/rust), [.NET](https://docs.spidra.io/sdks/dotnet), [Elixir](https://docs.spidra.io/sdks/elixir), [Java](https://docs.spidra.io/sdks/java), and [Swift](https://docs.spidra.io/sdks/swift).

Get your API key at [app.spidra.io](https://app.spidra.io/). The free plan has 300 credits and no card required.
