{"slug": "spidra-api-node-js-tutorial-scrape-any-website-with-javascript-and-typescript", "title": "Spidra API Node.js tutorial: scrape any website with JavaScript and TypeScript", "summary": "Spidra released a Node.js SDK that enables developers to scrape any website using plain English descriptions instead of CSS selectors or browser automation tools. The SDK handles browser rendering, anti-bot bypass, CAPTCHA solving, and AI extraction on Spidra's infrastructure, allowing developers to extract structured data from JavaScript-heavy and protected websites with a single API call. The TypeScript-native package supports scraping individual pages, batch processing up to 50 URLs, and crawling entire websites without additional configuration.", "body_md": "Web scraping in Node.js has a familiar progression. You start with `axios` or `node-fetch` for static pages. Then a modern site returns an empty HTML shell and you reach for Puppeteer. Then Cloudflare blocks you and you spend an evening on stealth plugins. Then the page structure changes and your selectors are worthless again.\n\nSpidra's [Node.js SDK](https://docs.spidra.io/sdks/node) (`spidra-js`) cuts across all of that. You describe what you want from a page in plain English, and the SDK returns structured data. The browser rendering, anti-bot bypass, CAPTCHA solving, and AI extraction all run on Spidra's infrastructure. Your code just handles the result.\n\nThis tutorial covers the full SDK, from installation through crawling an entire website. The SDK is TypeScript-native, so you get complete type safety out of the box. Every example works as-is with no additional configuration.\n\n## Prerequisites\n\n- Node.js 18 or higher\n\n## Installation\n\n```\nnpm install spidra-js\n```\n\nThe package includes TypeScript types. You do not need a separate `@types/spidra-js` package.\n\nStore your API key as an environment variable. 
Never hardcode it in source files.\n\n```\nexport SPIDRA_API_KEY=\"spd_YOUR_API_KEY\"\n```\n\n## Setting up the client\n\nImport `SpidraClient` and initialise it with your API key.\n\n**TypeScript / ESM:**\n\n``` js\nimport { SpidraClient } from 'spidra-js'\n\nconst spidra = new SpidraClient({ apiKey: process.env.SPIDRA_API_KEY! })\n```\n\n**CommonJS:**\n\n``` js\nconst { SpidraClient } = require('spidra-js')\n\nconst spidra = new SpidraClient({ apiKey: process.env.SPIDRA_API_KEY })\n```\n\nThe client exposes five namespaces:\n\n| Namespace | What it handles |\n|---|---|\n| `spidra.scrape` | Scraping one to three URLs with browser automation and AI extraction |\n| `spidra.batch` | Processing up to 50 URLs in parallel |\n| `spidra.crawl` | Discovering and scraping pages across an entire website |\n| `spidra.logs` | History of every scrape your API key has made |\n| `spidra.usage` | Credit and request consumption statistics |\n\nEvery method is `async` and returns a `Promise`. The examples below use top-level `await` for clarity. If your project does not support top-level `await`, wrap the calls in an `async` function.\n\n## Scraping a page\n\n### Your first scrape\n\n``` js\nimport { SpidraClient } from 'spidra-js'\n\nconst spidra = new SpidraClient({ apiKey: process.env.SPIDRA_API_KEY! })\n\nconst job = await spidra.scrape.run({\n  urls: [{ url: 'https://news.ycombinator.com' }],\n})\n\nconsole.log(job.result.content)\n```\n\nWithout a `prompt`, Spidra loads the page in a real browser, executes all JavaScript, and returns the full rendered content as Markdown. That is what ends up in `job.result.content`.\n\n### How the job lifecycle works\n\n`run()` submits the job and polls in the background until it completes. From your side it looks like a single `await`. 
Under the hood, the job moves through these states:\n\n```\nwaiting → active → completed (or failed)\n```\n\nIf you want to submit a job and check on it yourself rather than waiting, use `submit()` and `get()` separately:\n\n``` js\n// Submit and get a job ID immediately\nconst queued = await spidra.scrape.submit({\n  urls: [{ url: 'https://example.com' }],\n  prompt: 'Extract the main headline',\n})\n\nconsole.log(`Job submitted: ${queued.jobId}`)\n\n// Check later\nawait new Promise(r => setTimeout(r, 5000))\nconst status = await spidra.scrape.get(queued.jobId)\n\nif (status.status === 'completed') {\n  console.log(status.result.content)\n} else if (status.status === 'failed') {\n  console.error(`Failed: ${status.error}`)\n}\n```\n\n## Extracting data with prompts\n\nAdd a `prompt` and Spidra uses AI to extract exactly what you described from the rendered page. You do not need to know the page structure or write any selectors.\n\n``` js\nconst job = await spidra.scrape.run({\n  urls: [{ url: 'https://news.ycombinator.com' }],\n  prompt: 'Extract the top 10 post titles and their point scores',\n  output: 'json',\n})\n\nconsole.log(job.result.content)\n// [{ \"title\": \"Show HN: I built a thing\", \"points\": 342 }, ...]\n```\n\nSetting `output: 'json'` tells the AI to return structured JSON. The default is `'markdown'`.\n\nThe AI understands context. It knows a number next to a currency symbol is a price, a short bold line at the top of a product page is probably the title, and a longer block of text is likely a description. You describe the result you want and it finds it on the page.\n\nThat said, the SDK also fully supports CSS selectors and XPath for browser interactions when you want to be precise. We will cover that in the browser actions section.\n\n## Enforcing output shape with JSON schema\n\nPlain prompts are flexible but not predictable. The AI decides what fields to return and what to call them. 
That works for exploration but causes problems in production when a database or another service expects a consistent shape every single time.\n\nThe `schema` field solves this. Pass a JSON Schema object and the AI must match it exactly. Fields in `required` always appear in the output, as `null` if the page does not have that value.\n\n``` js\nconst job = await spidra.scrape.run({\n  urls: [{ url: 'https://jobs.example.com/senior-engineer' }],\n  prompt: 'Extract the job listing details. Normalize salary to a USD number.',\n  output: 'json',\n  schema: {\n    type: 'object',\n    required: ['title', 'company', 'remote'],\n    properties: {\n      title:           { type: 'string' },\n      company:         { type: 'string' },\n      remote:          { type: ['boolean', 'null'] },\n      salary_min:      { type: ['number', 'null'] },\n      salary_max:      { type: ['number', 'null'] },\n      employment_type: {\n        type: ['string', 'null'],\n        enum: ['full_time', 'part_time', 'contract', null],\n      },\n      skills: { type: 'array', items: { type: 'string' } },\n    },\n  },\n})\n\nconsole.log(job.result.content)\n// {\n//   title: \"Senior Software Engineer\",\n//   company: \"Acme Corp\",\n//   remote: true,\n//   salary_min: 120000,\n//   salary_max: 160000,\n//   employment_type: \"full_time\",\n//   skills: [\"TypeScript\", \"PostgreSQL\", \"AWS\"]\n// }\n```\n\nSince the SDK is TypeScript-native, you can type the result directly:\n\n```\ninterface JobListing {\n  title: string\n  company: string\n  remote: boolean | null\n  salary_min: number | null\n  salary_max: number | null\n  employment_type: 'full_time' | 'part_time' | 'contract' | null\n  skills: string[]\n}\n\nconst content = job.result.content as JobListing\nconsole.log(`${content.title} at ${content.company}`)\n```\n\nIf you use [Zod](https://www.npmjs.com/package/zod) for runtime validation, generate the schema from your existing Zod type and pass it directly:\n\n``` 
js\nimport { z } from 'zod'\nimport { zodToJsonSchema } from 'zod-to-json-schema'\n\nconst JobListingSchema = z.object({\n  title:           z.string(),\n  company:         z.string(),\n  remote:          z.boolean().nullable(),\n  salary_min:      z.number().nullable(),\n  salary_max:      z.number().nullable(),\n  employment_type: z.enum(['full_time', 'part_time', 'contract']).nullable(),\n  skills:          z.array(z.string()),\n})\n\nconst job = await spidra.scrape.run({\n  urls: [{ url: 'https://jobs.example.com/senior-engineer' }],\n  prompt: 'Extract the job listing details',\n  output: 'json',\n  schema: zodToJsonSchema(JobListingSchema),\n})\n\n// Validate at runtime; parse() throws if the content does not match\nconst listing = JobListingSchema.parse(job.result.content)\n```\n\nOne schema definition in your codebase then handles both runtime validation and the shape of the scraping output.\n\n## Browser actions\n\nSome pages require interaction before the content you want is visible. A cookie banner blocking everything. A search form that needs filling. Lazy-loaded content that only appears after scrolling. 
Tabs that hide data by default.\n\nPass an `actions` array inside the URL object and those [actions](https://docs.spidra.io/features/actions) execute in order inside a real browser before extraction runs.\n\n``` js\nconst job = await spidra.scrape.run({\n  urls: [\n    {\n      url: 'https://example.com/products',\n      actions: [\n        { type: 'click', selector: '#accept-cookies' },\n        { type: 'wait', duration: 1000 },\n        { type: 'scroll', to: '80%' },\n      ],\n    },\n  ],\n  prompt: 'Extract all product names and prices visible on the page',\n})\n```\n\nFor `click`, `check`, and `uncheck` actions, you have two options for targeting an element:\n\n- `selector` for a CSS selector or XPath expression like `'#accept-cookies'` or `'.submit-btn'`\n- `value` for a plain English description like `'Accept cookies button'`, which Spidra locates using AI\n\nBoth are valid, and you can mix them in the same actions array:\n\n```\nactions: [\n  { type: 'click', selector: '#accept-cookies' },  // CSS selector\n  { type: 'click', value: 'Search button' },         // plain English\n]\n```\n\nUse whichever is more convenient. If the element has a clean, stable ID or class, use `selector`. If the page is complex or you want the action to survive layout changes, use `value`.\n\n### All available actions\n\n| Action | What it does | Key fields |\n|---|---|---|\n| `click` | Clicks a button, link, or any element | `selector` or `value` |\n| `type` | Types text into an input field | `selector`, `value` |\n| `check` | Checks a checkbox | `selector` or `value` |\n| `uncheck` | Unchecks a checkbox | `selector` or `value` |\n| `wait` | Pauses for a number of milliseconds | `duration` |\n| `scroll` | Scrolls to a percentage of the page height | `to` (e.g. 
`'80%'`) |\n| `forEach` | Finds matching elements and processes each one | `value`, `mode` |\n\n### The forEach action\n\n[`forEach`](https://docs.spidra.io/features/actions#foreach-process-every-element-on-a-page) is the most powerful action in the SDK. It finds a set of matching elements on the page and processes each one individually, combining all the results into a single output.\n\nThree modes:\n\n- `inline` reads the content of each matched element directly. For product cards, table rows, or content that lives inside the element itself.\n- `navigate` follows each element as a link, loads the destination page, and scrapes it. For detail pages you need to click into.\n- `click` clicks each element to expand or reveal content, then scrapes what appears. For accordions, modals, or expandable sections.\n\n``` js\nconst job = await spidra.scrape.run({\n  urls: [\n    {\n      url: 'https://directory.example.com/companies',\n      actions: [\n        { type: 'click', value: 'Accept cookies' },\n        {\n          type: 'forEach',\n          value: 'Find all company listing cards',\n          mode: 'navigate',\n          maxItems: 20,\n          itemPrompt: 'Extract company name, website, and industry',\n          pagination: {\n            nextSelector: 'a.next-page',\n            maxPages: 3,\n          },\n        },\n      ],\n    },\n  ],\n  output: 'json',\n})\n```\n\nThis dismisses the cookie banner, finds every company card on the page, navigates into each company profile, extracts the company details, and repeats across up to three pages of results. One request, one `await`.\n\n## Proxy and geo-targeting\n\nSome sites block cloud infrastructure IP ranges or serve different content based on location. 
Set `useProxy: true` to route through a [residential proxy](https://docs.spidra.io/features/stealth-mode).\n\n``` js\nconst job = await spidra.scrape.run({\n  urls: [{ url: 'https://www.amazon.de/gp/bestsellers' }],\n  prompt: 'List the top 10 products with name and price',\n  useProxy: true,\n  proxyCountry: 'de',\n})\n```\n\n`proxyCountry` accepts:\n\n- A two-letter ISO country code like `'us'`, `'de'`, `'gb'`, `'fr'`, `'jp'`\n- `'eu'` to rotate randomly across all 27 EU member states\n- `'global'`, or omit it, for no country preference\n\nProxy usage is billed from your bandwidth quota, not your credits.\n\n## Scraping pages behind a login\n\nPass session cookies to access authenticated content. Log in through your browser, open DevTools, copy the `Cookie` header from any [authenticated request](https://docs.spidra.io/features/authenticated-scraping), and pass it as a string.\n\n``` js\nconst job = await spidra.scrape.run({\n  urls: [{ url: 'https://app.example.com/dashboard' }],\n  prompt: 'Extract the monthly revenue and active user count',\n  cookies: 'session=abc123; auth_token=xyz789',\n})\n```\n\nStandard cookie format (`name=value; name2=value2`) and Chrome DevTools paste format both work.\n\n## Stripping boilerplate\n\n`extractContentOnly` strips navigation, headers, footers, and sidebars before extraction runs. 
Useful for articles, documentation pages, and any page where the main content is surrounded by heavy navigation.\n\n``` js\nconst job = await spidra.scrape.run({\n  urls: [{ url: 'https://blog.example.com/long-article' }],\n  prompt: 'Summarize this article in three sentences',\n  extractContentOnly: true,\n})\n```\n\n## Screenshots\n\nCapture screenshots of pages for debugging, monitoring, or archival.\n\n``` js\nconst job = await spidra.scrape.run({\n  urls: [{ url: 'https://example.com' }],\n  screenshot: true,\n  fullPageScreenshot: true,\n})\n\nconsole.log(job.result.screenshots)  // array of URLs\n```\n\n`screenshot: true` captures the visible viewport. `fullPageScreenshot: true` captures the entire scrollable page.\n\n## Batch scraping\n\nWhen you have a list of URLs, the [batch endpoint](https://docs.spidra.io/features/batch-scraping) processes up to 50 at a time in parallel. Each URL runs in its own independent worker.\n\n``` js\nconst batch = await spidra.batch.run({\n  urls: [\n    'https://shop.example.com/product/1',\n    'https://shop.example.com/product/2',\n    'https://shop.example.com/product/3',\n  ],\n  prompt: 'Extract the product name, price, and whether it is in stock',\n  output: 'json',\n})\n\nconsole.log(`${batch.completedCount}/${batch.totalUrls} completed`)\n\nfor (const item of batch.items) {\n  if (item.status === 'completed') {\n    console.log(item.url, item.result)\n  } else {\n    console.error(`Failed: ${item.url} - ${item.error}`)\n  }\n}\n```\n\n### Processing large URL lists\n\nThe batch endpoint caps at 50 URLs per request. 
For larger lists, chunk them:\n\n``` js\nasync function scrapeAll(urls: string[], prompt: string) {\n  const results: Array<{ url: string; data: unknown }> = []\n  const chunkSize = 50\n\n  for (let i = 0; i < urls.length; i += chunkSize) {\n    const chunk = urls.slice(i, i + chunkSize)\n    const batchNum = Math.floor(i / chunkSize) + 1\n    const totalBatches = Math.ceil(urls.length / chunkSize)\n\n    console.log(`Processing batch ${batchNum} of ${totalBatches}...`)\n\n    const batch = await spidra.batch.run({\n      urls: chunk,\n      prompt,\n      output: 'json',\n    })\n\n    for (const item of batch.items) {\n      if (item.status === 'completed') {\n        results.push({ url: item.url, data: item.result })\n      } else {\n        console.warn(`Failed: ${item.url}`)\n      }\n    }\n  }\n\n  return results\n}\n\nconst urls = Array.from(\n  { length: 200 },\n  (_, i) => `https://example.com/product/${i + 1}`\n)\n\nconst results = await scrapeAll(urls, 'Extract product name and price')\n```\n\n### Managing batches\n\n**Retry failed items** without resubmitting the ones that already succeeded:\n\n```\nif (batch.failedCount > 0) {\n  await spidra.batch.retry(batch.batchId)\n}\n```\n\n**Cancel a running batch** and get credits refunded for items that have not started yet:\n\n``` js\nconst response = await spidra.batch.cancel(batchId)\nconsole.log(`Cancelled ${response.cancelledItems} items, refunded ${response.creditsRefunded} credits`)\n```\n\n## Crawling entire websites\n\nBatch scraping works when you already know the URLs. Crawling is for when you want Spidra to discover them for you.\n\nGive it a starting URL, describe which links to follow, and describe what to extract from each page. 
Spidra loads the base URL, finds matching links, visits each one up to your `maxPages` limit, and applies your `transformInstruction` to every page it visits.\n\n``` js\nimport { SpidraClient } from 'spidra-js'\n\nconst spidra = new SpidraClient({ apiKey: process.env.SPIDRA_API_KEY! })\n\nconst job = await spidra.crawl.run({\n  baseUrl: 'https://competitor.com/blog',\n  crawlInstruction: 'Follow links to blog posts only. Skip tag pages, category pages, and the homepage.',\n  transformInstruction: 'Extract the post title, author name, publish date, and a one-sentence summary.',\n  maxPages: 20,\n  useProxy: true,\n})\n\nfor (const page of job.result) {\n  console.log(page.url, page.data)\n}\n```\n\nThree fields are required: `baseUrl`, `crawlInstruction`, and `transformInstruction`. `maxPages` defaults to 5 and can be set up to 20.\n\nFor larger crawls that take more time, the default 120-second timeout may not be enough. If you are hitting timeouts, fire the crawl with `submit()` and poll with `get()` yourself:\n\n``` js\nconst queued = await spidra.crawl.submit({\n  baseUrl: 'https://docs.example.com',\n  crawlInstruction: 'Follow all documentation pages. Skip changelog and login pages.',\n  transformInstruction: 'Extract the page title and full body text.',\n  maxPages: 20,\n})\n\n// Poll every 10 seconds\nlet status = await spidra.crawl.get(queued.jobId)\n\nwhile (status.status !== 'completed' && status.status !== 'failed') {\n  await new Promise(r => setTimeout(r, 10000))\n  status = await spidra.crawl.get(queued.jobId)\n  console.log(`Status: ${status.status}`)\n}\n\nfor (const page of status.result ?? 
[]) {\n  console.log(page.url, page.data)\n}\n```\n\n### Re-extracting with a different prompt\n\nIf you crawled a site and want to pull out different information, use `extract()` to run a new AI pass over the already-crawled content without making new browser requests:\n\n``` js\nconst queued = await spidra.crawl.extract(\n  completedJobId,\n  'Extract only product SKUs and prices as structured JSON',\n)\n\nconst result = await spidra.crawl.get(queued.jobId)\n```\n\n## Using the SDK in different environments\n\n### Next.js API route\n\n``` js\n// app/api/scrape/route.ts\nimport { SpidraClient } from 'spidra-js'\nimport { NextResponse } from 'next/server'\n\nconst spidra = new SpidraClient({ apiKey: process.env.SPIDRA_API_KEY! })\n\nexport async function POST(request: Request) {\n  const { url, prompt } = await request.json()\n\n  try {\n    const job = await spidra.scrape.run({\n      urls: [{ url }],\n      prompt,\n      output: 'json',\n    })\n\n    return NextResponse.json({ data: job.result.content })\n  } catch (error) {\n    return NextResponse.json({ error: 'Scrape failed' }, { status: 500 })\n  }\n}\n```\n\n### Express\n\n``` js\nimport express from 'express'\nimport { SpidraClient } from 'spidra-js'\n\nconst app = express()\nconst spidra = new SpidraClient({ apiKey: process.env.SPIDRA_API_KEY! })\n\napp.use(express.json())\n\napp.post('/scrape', async (req, res) => {\n  const { url, prompt } = req.body\n\n  try {\n    const job = await spidra.scrape.run({\n      urls: [{ url }],\n      prompt,\n      output: 'json',\n    })\n    res.json({ data: job.result.content })\n  } catch (err) {\n    res.status(500).json({ error: 'Scrape failed' })\n  }\n})\n\napp.listen(3000)\n```\n\n### Bun\n\nThe SDK works with Bun out of the box. No changes needed.\n\n```\nbun add spidra-js\n```\n\n``` js\nimport { SpidraClient } from 'spidra-js'\n\nconst spidra = new SpidraClient({ apiKey: Bun.env.SPIDRA_API_KEY! 
})\n\nconst job = await spidra.scrape.run({\n  urls: [{ url: 'https://example.com' }],\n  prompt: 'Extract the main headline',\n})\n\nconsole.log(job.result.content)\n```\n\n## Error handling\n\nEvery API error maps to a typed exception class. Catch exactly what you care about and let everything else propagate.\n\n``` js\nimport {\n  SpidraError,\n  SpidraAuthenticationError,\n  SpidraInsufficientCreditsError,\n  SpidraRateLimitError,\n  SpidraServerError,\n} from 'spidra-js'\n\ntry {\n  const job = await spidra.scrape.run({\n    urls: [{ url: 'https://example.com' }],\n    prompt: 'Extract the main headline',\n  })\n  console.log(job.result.content)\n\n} catch (err) {\n  if (err instanceof SpidraAuthenticationError) {\n    console.error('API key is missing or invalid. Check your SPIDRA_API_KEY.')\n  } else if (err instanceof SpidraInsufficientCreditsError) {\n    console.error('Account is out of credits. Top up at app.spidra.io.')\n  } else if (err instanceof SpidraRateLimitError) {\n    console.warn('Rate limit hit. Slow down and retry.')\n  } else if (err instanceof SpidraServerError) {\n    console.error(`Server error (${err.status}): ${err.message}. Retry is usually safe.`)\n  } else if (err instanceof SpidraError) {\n    console.error(`API error ${err.status}: ${err.message}`)\n  } else {\n    throw err\n  }\n}\n```\n\n| Exception | HTTP status | When it fires |\n|---|---|---|\n| `SpidraAuthenticationError` | 401 | API key missing or invalid |\n| `SpidraInsufficientCreditsError` | 403 | No credits remaining |\n| `SpidraRateLimitError` | 429 | Too many requests |\n| `SpidraServerError` | 500 | Unexpected error on Spidra's side |\n| `SpidraError` | any | Base class for all exceptions |\n\nAlso check the `ai_extraction_failed` flag in the result. 
If AI extraction fails for any reason, Spidra falls back to raw Markdown and sets this flag:\n\n``` js\nconst job = await spidra.scrape.run({\n  urls: [{ url: 'https://example.com' }],\n  prompt: 'Extract the main headline',\n})\n\nif (job.result.ai_extraction_failed) {\n  // Raw Markdown fallback is in the data array\n  const raw = job.result.data[0]?.markdownContent\n  console.warn('AI extraction failed, using raw content')\n  console.log(raw)\n} else {\n  console.log(job.result.content)\n}\n```\n\n## Putting it all together: a complete pipeline\n\nThe following example uses `forEach` with pagination to collect job listings from a directory, enforces a schema on the output, handles errors, and saves the results to a JSONL file:\n\n``` js\nimport { SpidraClient, SpidraError, SpidraInsufficientCreditsError } from 'spidra-js'\nimport { writeFileSync } from 'fs'\nimport * as os from 'os'\n\nconst spidra = new SpidraClient({ apiKey: process.env.SPIDRA_API_KEY! })\n\nconst JOB_SCHEMA = {\n  type: 'object',\n  required: ['title', 'company', 'location'],\n  properties: {\n    title:           { type: 'string' },\n    company:         { type: 'string' },\n    location:        { type: ['string', 'null'] },\n    remote:          { type: ['boolean', 'null'] },\n    salary_min:      { type: ['number', 'null'] },\n    salary_max:      { type: ['number', 'null'] },\n    employment_type: {\n      type: ['string', 'null'],\n      enum: ['full_time', 'part_time', 'contract', null],\n    },\n  },\n}\n\nasync function collectListings(boardUrl: string) {\n  try {\n    const job = await spidra.scrape.run({\n      urls: [\n        {\n          url: boardUrl,\n          actions: [\n            { type: 'click', value: 'Accept cookies' },\n            {\n              type: 'forEach',\n              value: 'Find all job listing cards',\n              mode: 'navigate',\n              maxItems: 50,\n              itemPrompt: 'Extract job title, company, location, remote status, salary range, and employment 
type',\n              pagination: {\n                nextSelector: 'a.next-page',\n                maxPages: 3,\n              },\n            },\n          ],\n        },\n      ],\n      output: 'json',\n      schema: JOB_SCHEMA,\n    })\n\n    if (job.result.ai_extraction_failed) {\n      console.warn(`AI extraction failed for ${boardUrl}`)\n      return []\n    }\n\n    const content = job.result.content\n    return Array.isArray(content) ? content : [content]\n\n  } catch (err) {\n    if (err instanceof SpidraInsufficientCreditsError) {\n      throw err // bubble up: stop processing\n    }\n    if (err instanceof SpidraError) {\n      console.error(`Error scraping ${boardUrl}: ${err.message}`)\n      return []\n    }\n    throw err\n  }\n}\n\nconst boards = [\n  'https://jobs.example.com/engineering',\n  'https://careers.anothersite.com/remote',\n]\n\nconst allJobs: unknown[] = []\n\nfor (const board of boards) {\n  console.log(`Collecting from ${board}...`)\n  const listings = await collectListings(board)\n  allJobs.push(...listings)\n  console.log(`  Got ${listings.length} listings`)\n}\n\nconst jsonl = allJobs.map(job => JSON.stringify(job)).join(os.EOL)\nwriteFileSync('jobs.jsonl', jsonl)\n\nconsole.log(`\\nDone. ${allJobs.length} jobs saved to jobs.jsonl`)\n```\n\n## All scrape options\n\n| Option | Type | Description |\n|---|---|---|\n| `urls` | array | Up to 3 URL objects. Each takes a `url` and optional `actions`. 
|\n| `prompt` | string | What to extract, in plain English |\n| `output` | string | `'markdown'` (default) or `'json'` |\n| `schema` | object | JSON Schema for a guaranteed output shape |\n| `useProxy` | boolean | Route through a residential proxy |\n| `proxyCountry` | string | Two-letter country code or `'eu'` / `'global'` |\n| `extractContentOnly` | boolean | Strip nav, ads, and boilerplate before extraction |\n| `screenshot` | boolean | Capture a viewport screenshot |\n| `fullPageScreenshot` | boolean | Capture a full-page screenshot |\n| `cookies` | string | Raw `Cookie` header string for authenticated pages |\n\n## What to read next\n\n- [Browser actions guide](https://docs.spidra.io/features/actions) covers every option for each action type including all `forEach` parameters\n- [Structured output guide](https://docs.spidra.io/features/structured-output) covers schemas in depth including Zod integration and schema limits\n- [Stealth mode guide](https://docs.spidra.io/features/stealth-mode) has the full country list and proxy options\n- [Python SDK tutorial](https://claude.ai/blog/spidra-api-python-tutorial) if you are working in Python\n- [Full API reference](https://claude.ai/blog/spidra-api-tutorial) if you want to use the REST API directly\n\nGet your API key at [app.spidra.io](https://app.spidra.io/). 
The free plan has 300 credits and no card required.", "url": "https://wpnews.pro/news/spidra-api-node-js-tutorial-scrape-any-website-with-javascript-and-typescript", "canonical_source": "https://spidra.io/blog/spidra-api-nodejs-tutorial", "published_at": "2026-06-12 00:00:00+00:00", "updated_at": "2026-06-12 09:09:54.944717+00:00", "lang": "en", "topics": ["ai-tools", "ai-products", "ai-infrastructure"], "entities": ["Spidra", "Node.js", "TypeScript", "Puppeteer", "Cloudflare", "SpidraClient", "spidra-js"], "alternates": {"html": "https://wpnews.pro/news/spidra-api-node-js-tutorial-scrape-any-website-with-javascript-and-typescript", "markdown": "https://wpnews.pro/news/spidra-api-node-js-tutorial-scrape-any-website-with-javascript-and-typescript.md", "text": "https://wpnews.pro/news/spidra-api-node-js-tutorial-scrape-any-website-with-javascript-and-typescript.txt", "jsonld": "https://wpnews.pro/news/spidra-api-node-js-tutorial-scrape-any-website-with-javascript-and-typescript.jsonld"}}