{"slug": "open-source-tech-jobs-portal-and-database", "title": "Open Source tech jobs portal and database", "summary": "Caio, an open-source tech jobs platform, launched a public job index and search portal at caio-jobs.com to reduce the manual burden of job hunting for software professionals. The project, currently a searchable database with a Phoenix/Elixir web app and Ruby on Rails crawler, plans to evolve into a supervised job-search agent that automatically finds relevant roles, tailors application materials, and tracks outcomes. The early-stage repository prioritizes a working crawler and simple deployment over polished architecture, using SQLite for speed while candidates remain responsible for filtering and applying.", "body_md": "Caio is an open-source attempt to make job hunting less manual for software professionals.\n\nThe current product is a public tech-job index. That is useful on its own, but it is not the end goal. The job board is the data layer and acquisition surface for a larger product: a supervised job-search agent that can continuously find relevant roles, adapt application material, and help a candidate apply without spending hours repeating the same search/forms/CV-tweaking loop.\n\nThis repo is early, practical, and intentionally boring in places. It favors a working crawler, simple deploys, server-rendered pages, and observable user flows over a polished distributed architecture.\n\nLive site: [caio-jobs.com](https://caio-jobs.com)\n\nJob boards mostly move the filtering burden onto candidates. The painful part is not only finding jobs; it is repeatedly deciding whether a role is relevant, editing the same CV for each posting, filling forms, tracking what happened, and doing it again tomorrow.\n\nCaio starts with the searchable job corpus because the agent needs one. 
From there, the useful product becomes:\n\n- Keep finding fresh jobs that match a candidate profile.\n- Explain why a job is or is not a good fit.\n- Tailor CV/application material to the role.\n- Track applications and outcomes.\n- Let the user supervise the workflow instead of manually doing every step.\n\nCaio is a monorepo with two main apps:\n\n```\npublic job sources -> crawler workers -> SQLite job_posts -> Phoenix portal\n                                      -> leads + job_interests tracking\n```\n\n- `crawler/`: Ruby on Rails plus Sidekiq workers for collecting, normalizing, deduplicating, and storing public job postings.\n- `portal/`: Phoenix/Elixir web app for the public search experience, profile unlock flow, GitHub login, analytics, and apply-click tracking.\n- `deploy/`: production deployment scripts and systemd units for a single Google Cloud VM.\n- `marketing/`: launch copy and social-post drafts.\n\nThe current production-friendly setup intentionally uses SQLite. That is not a claim that SQLite is the final architecture; it is just the fastest path to a small, understandable system while the product is still being shaped. The natural next step is Postgres plus separate crawler/web machines.\n\n- Shared SQLite database between Rails ingestion and Phoenix serving.\n- SQLite FTS5 index maintained by Phoenix migrations and triggers.\n- Rails/Sidekiq crawler split into source fanout, fetch, detail, and write queues.\n- Normalization for salary, location, source keys, canonical URLs, and job quality.\n- Server-rendered Phoenix UI with minimal JavaScript.\n- GitHub OAuth and email unlock flow feeding a simple `leads` table. 
- PostHog events for search, unlock, login, job detail views, and apply clicks.\n- Single-VM production deployment with systemd, Caddy, Redis, and SQLite backups.\n\n- Public landing page with SEO and social sharing metadata.\n- Full-text search across title, company, location, tags, category, and description.\n- Guest preview with a free unlock flow.\n- GitHub OAuth login.\n- Lead/profile capture with email, optional LinkedIn URL, target role, target location, and job-help consent.\n- Apply-click tracking before redirecting users to the original job source.\n- Company stats based on the number of visible open jobs in Caio.\n- PostHog analytics hooks for page views, unlocks, GitHub login, and apply clicks.\n\n- Some crawler paths still reprocess old pages instead of storing complete cursor state per paged source.\n- Import metrics currently blur inserts and updates in some paths.\n- SQLite is acceptable for this stage, but it will need a more deliberate data architecture as write volume grows.\n- The agent layer is not here yet; today this is the search/indexing foundation.\n- Source adapters need ongoing maintenance because public job endpoints change, rate-limit, or disappear.\n\n```\n.\n├── bin/                  # Local orchestration helpers\n├── crawler/              # Rails + Sidekiq ingestion system\n├── deploy/google-cloud/  # VM bootstrap, Caddy, systemd, backup docs\n├── marketing/            # Launch assets and copy\n└── portal/               # Phoenix web interface\n```\n\n- Ruby with Bundler\n- Redis for Sidekiq\n- Elixir/Erlang, preferably via `.tool-versions` and `mise`\n- SQLite with FTS5 support\n- Docker, if you use the local stack helper\n\nFrom the repository root:\n\n```\ncp .env.example .env\nbin/run_local_stack --restart\n```\n\nThis starts:\n\n- Docker Redis as `caio-redis`\n- Sidekiq writer/fetch/source workers\n- Rails Sidekiq UI at `http://localhost:3001/sidekiq`\n- Phoenix portal at `http://localhost:4000`\n\nYou can also 
start pieces independently:\n\n```\nbin/run_local_stack portal\nbin/run_local_stack sidekiq-web\nbin/run_local_stack workers\n```\n\nIf Redis is loading a large persisted queue, increase the startup wait:\n\n```\nREDIS_READY_TIMEOUT=900 bin/run_local_stack --restart\n```\n\nRun crawler setup:\n\n```\ncd crawler\nbundle install\nbin/rails db:migrate\nbundle exec sidekiq -C config/sidekiq_sources.yml\n```\n\nRun the portal:\n\n```\ncd portal\nmix setup\nmix ecto.migrate\nmix phx.server\n```\n\nOpen:\n\n```\nhttp://127.0.0.1:4000\n```\n\nIn development, the portal reads the crawler database at:\n\n```\ncrawler/db/development.sqlite3\n```\n\nUse `.env.example` as the local template. Do not commit real secrets.\n\nCommon local variables:\n\n```\nGITHUB_CLIENT_ID=\nGITHUB_CLIENT_SECRET=\nGITHUB_REDIRECT_URI=http://localhost:4000/auth/github/callback\n\nPOSTHOG_ENABLED=false\nPOSTHOG_PUBLIC_KEY=\nPOSTHOG_HOST=https://us.i.posthog.com\nPOSTHOG_SESSION_REPLAY=true\n```\n\nImportant production variables:\n\n```\nPHX_HOST=caio-jobs.com\nSECRET_KEY_BASE=...\nDATABASE_PATH=/var/lib/caio/caio.sqlite3\nJOB_CRAWLER_DATABASE=/var/lib/caio/caio.sqlite3\nGITHUB_REDIRECT_URI=https://caio-jobs.com/auth/github/callback\n```\n\nPortal:\n\n```\ncd portal\nmix compile\nmix test\nmix format\nmix assets.deploy\nMIX_ENV=prod mix release --overwrite\n```\n\nCrawler:\n\n```\ncd crawler\nbundle exec rails db:migrate\nbundle exec sidekiq -C config/sidekiq_fetch.yml\nbundle exec sidekiq -C config/sidekiq_writer.yml\nbundle exec sidekiq -C config/sidekiq_sources.yml\n```\n\nQueue inspection:\n\n```\nredis-cli LLEN queue:source_fetchers\nredis-cli LLEN queue:linkedin_pages\nredis-cli LLEN queue:job_writes\nredis-cli ZCARD retry\nredis-cli ZCARD dead\n```\n\nThe current deployment path is a single Google Cloud VM running:\n\n- Phoenix release\n- Rails/Sidekiq crawler workers\n- Redis\n- Caddy\n- SQLite database on persistent disk\n\nSee 
[deploy/google-cloud/README.md](/danicuki/caio/blob/main/deploy/google-cloud/README.md) for the full\nVM bootstrap, systemd, Caddy, release, and backup workflow.\n\nThe short deploy loop after pulling changes is:\n\n```\ncd /srv/caio/crawler\nbundle install\nRAILS_ENV=production JOB_CRAWLER_DATABASE=/var/lib/caio/caio.sqlite3 bundle exec rails db:migrate\n\ncd /srv/caio/portal\nmix deps.get --only prod\nMIX_ENV=prod mix assets.deploy\nMIX_ENV=prod DATABASE_PATH=/var/lib/caio/caio.sqlite3 mix ecto.migrate\nMIX_ENV=prod mix release --overwrite\n\nsudo systemctl restart caio-portal caio-sidekiq-writer caio-sidekiq-fetch caio-sidekiq-sources\n```\n\nGenerated data stays out of git:\n\n- SQLite databases and WAL/SHM files\n- Redis dumps\n- logs\n- Phoenix `_build`, `deps`, and compiled assets\n- generated crawler indexes and large crawl artifacts\n\nCommit source code, migrations, small config data, docs, and launch assets.\n\n- Never commit OAuth secrets, PostHog keys, production database files, or backups.\n- Keep user contact collection explicit and transparent.\n- The analytics wrapper strips sensitive property names such as email, token, and secret before sending server-side events.\n- Apply clicks are tracked in `job_interests` before redirecting to the original job source.\n\n- Add stateful crawler cursors for every paged source so production resumes from known progress instead of reprocessing old pages.\n- Split crawler import metrics into inserted vs updated counts.\n- Move from SQLite to Postgres when write volume or operational needs require it.\n- Add company profile enrichment, including async external reputation data where allowed.\n- Build the job-agent layer: saved profiles, tailored application material, job matching, and supervised automated application workflows.\n\nNo license has been added yet. 
Until a license is present, all rights are reserved by the repository owner.", "url": "https://wpnews.pro/news/open-source-tech-jobs-portal-and-database", "canonical_source": "https://github.com/danicuki/caio", "published_at": "2026-05-26 14:47:46+00:00", "updated_at": "2026-05-26 15:09:37.894553+00:00", "lang": "en", "topics": ["ai-agents", "ai-tools", "ai-products", "artificial-intelligence", "natural-language-processing"], "entities": ["Caio", "Caio Jobs"], "alternates": {"html": "https://wpnews.pro/news/open-source-tech-jobs-portal-and-database", "markdown": "https://wpnews.pro/news/open-source-tech-jobs-portal-and-database.md", "text": "https://wpnews.pro/news/open-source-tech-jobs-portal-and-database.txt", "jsonld": "https://wpnews.pro/news/open-source-tech-jobs-portal-and-database.jsonld"}}