{"slug": "how-this-site-is-built", "title": "How This Site Is Built", "summary": "A developer built a static site using Astro, hosted on S3 with CloudFront, and DNS through Cloudflare, achieving a total cost under $5/month. The setup includes a 15-line CloudFront Function to fix subdirectory routing, an auto-generated `llms-full.txt` file for AI agents, and immutable caching for hashed assets. The infrastructure avoids managed platforms entirely, with secrets resolved at CI runtime via 1Password CLI.", "body_md": "- Astro on S3 + CloudFront, DNS on Cloudflare (cheaper for a `.ai` domain), email via SES + Lambda forward — total cost under $5/month with no managed platform dependency.\n- `DefaultRootObject` only applies to `/`, not subdirectories — fix is a 15-line CloudFront Function that appends `index.html` before S3 sees the request.\n- `llms-full.txt` auto-generated at build time concatenates every post into one Markdown file — any agent gets the complete site corpus in a single HTTP GET, zero crawling required.\n- Hashed assets get `max-age=31536000,immutable`; HTML and `llms*.txt` get no-cache — because Astro hashes asset filenames but not HTML.\n- `load-secrets-action` resolves `op://` references at CI runtime; raw keys stay out of git entirely.\n\nI spent a day building this site from scratch. Not because there aren't easier options — there are plenty — but because I wanted to own the infrastructure and understand every layer. Here's what I built and why.\n\nI didn't want a managed platform. Ghost, Squarespace, Substack — they're all fine until they're not. Pricing changes. Features get enshittified. Export formats break. The moment you depend on someone else's persistence layer for your writing, you're renting, not owning.\n\nThe goal: my content in plain markdown files, my infrastructure, my control. Total cost under $5/month.\n\n**Static site generator: Astro**\n\nAstro compiles everything to static HTML at build time. No server, no runtime, no database. The site is just files. 
I used the Astro Paper theme as a base — dark mode default, clean typography, built-in search.\n\nThe build script does three things: generates `llms-full.txt` (more on that below), runs the Astro build, then generates the search index with Pagefind. One command.\n\n**Hosting: S3 + CloudFront**\n\nThe built files go into an S3 bucket with public access completely disabled. CloudFront sits in front of it using Origin Access Control — only CloudFront can read from S3, nothing else. ACM handles the SSL cert.\n\n**DNS: Cloudflare**\n\nDNS is on Cloudflare, not Route 53. Route 53 charges $129/year for a `.ai` domain; Cloudflare charges $80. That's the entire reason for the split. The apex domain and www point at the CloudFront distribution via Cloudflare's DNS — the hosting layer doesn't change. Cloudflare's token model also makes it easy to give external tools (MCP servers, automation) scoped API access without handing over full account credentials.\n\nResult: HTTPS enforced, HTTP redirects automatically, global CDN edge caching, and the bucket itself is locked down.\n\nOne gotcha I hit immediately: every post URL returned a 404. The files were in S3, the deploy worked fine — but `/posts/my-post/` returned nothing.\n\nThe problem is subtle. Astro (like most static site generators) builds each post as `/posts/my-post/index.html` — an `index.html` file inside a subdirectory. S3 doesn't resolve directories to index files. CloudFront's `DefaultRootObject` setting only applies to the apex `/` — it doesn't cascade to subdirectories. So when CloudFront asked S3 for `/posts/my-post/`, S3 found no object at that exact key, returned a 403, and CloudFront served the 404 page.\n\nThe fix is a CloudFront Function — a small JavaScript function that runs at the CDN edge on every incoming request before it reaches S3. It checks the URI: if it ends with `/`, append `index.html`. If it has no file extension, append `/index.html`. 
That's it — about 15 lines of code.\n\n``` js\nfunction handler(event) {\n  var request = event.request;\n  var uri = request.uri;\n\n  if (uri.endsWith('/')) {\n    // Directory-style URL: point at the index.html inside it.\n    request.uri += 'index.html';\n  } else if (!uri.includes('.')) {\n    // No dot in the path, so no file extension: treat it as a directory missing its trailing slash.\n    request.uri += '/index.html';\n  }\n\n  // Everything else (hashed assets, explicit files) passes through unchanged.\n  return request;\n}\n```\n\nAttach it to the distribution as a viewer-request handler and every URL on the site resolves correctly. This is a one-time infrastructure fix — not something you repeat per deploy.\n\nIf you're building a static site on S3 + CloudFront with OAC, add this function before you go live. It's the kind of thing that works fine locally (dev servers handle it automatically) and breaks silently in production.\n\n**Email: SES + Lambda**\n\n(Note: email forwarding infra was set up for the domain, but the address is not currently listed as a public contact method — use X or LinkedIn instead. The setup details below are kept for the build history.)\n\nI wanted `info@artificialcuriositylabs.ai` to work as a real email address without running a mail server. The setup:\n\nIt took about an hour to wire up. DKIM CNAME records and MX record in Cloudflare DNS, pointing at SES inbound. The Lambda function is about 80 lines of Node.js. Works exactly like having an inbox without actually having one.\n\nThere's an emerging standard for AI-readable content — `llms.txt` as an index file, similar to `robots.txt`, that tells AI crawlers what's on the site and where. I added two files:\n\n- `/llms.txt` — a curated index of pages, topics, and permissions\n- `/llms-full.txt` — every blog post concatenated into a single file, auto-generated at build time\n\nThe second one is the interesting one. Any AI system that fetches `llms-full.txt` gets the complete text of everything I've published, in one request, structured and clean. 
It's a better interface for AI consumption than crawling individual HTML pages.\n\nThe generator script reads from the blog content directory, strips frontmatter, and concatenates everything with headers separating posts. Runs in under a second as part of the normal build.\n\nI don't know exactly how this will get used — but making the content machine-readable is a zero-cost decision with asymmetric upside.\n\n**Production deploy is GitHub Actions.** Push to `main` triggers `.github/workflows/deploy.yml` when content or build inputs change (`src/**`, `public/**`, `scripts/**`, `astro.config.ts`, `package.json`). The workflow:\n\n1. `load-secrets-action` resolves `op://` references stored in GitHub secrets (not raw keys in the repo)\n2. `npm run build` generates `llms-full.txt`, runs the Astro build, generates the Pagefind search index\n3. Two-pass S3 sync: hashed assets get a long-lived cache (`max-age=31536000,immutable`); HTML, XML, and `llms*.txt` get no-cache headers so browsers always fetch the latest\n4. CloudFront invalidation on `/*` to flush the CDN edge cache\n5. `src/data/blog/*.md` copied to an S3 `blog/` prefix (source for a Bedrock knowledge base)\n\nThe two-pass sync matters because the cache strategy is different per file type. Static assets are content-addressed (Astro hashes filenames), so they can be cached indefinitely. HTML is not — you want readers to see the new post immediately.\n\n**Local deploy is the same pipeline, different credential path.** `./deploy.sh` runs the build and S3/CloudFront steps on your machine. It requires three environment variables — `AWS_PROFILE`, `S3_BUCKET`, `CLOUDFRONT_DISTRIBUTION_ID` — and uses your local AWS CLI profile instead of 1Password-in-CI. Use it when you want to deploy without pushing to `main`, or when debugging a failed Actions run.\n\nThe split between Cloudflare DNS and CloudFront hosting means Cloudflare never touches the content. DNS resolves to the CloudFront distribution, CloudFront pulls from S3, and deploy invalidates CloudFront directly. 
Cloudflare sees none of this — it just points the domain at the CloudFront distribution.\n\n**What is not in the repo:** bucket name, distribution ID, and 1Password item references live in GitHub repository variables and secrets. The workflow file names what must exist; the values stay out of git.\n\nStatic sites feel like going backward until you realize what you're trading away: runtime complexity, database dependencies, server costs, someone else's uptime SLA. The question isn't \"why would you use a static site in 2026\" — it's \"why would you add a server if you don't need one?\"\n\n**2026-05-05 — Day one**\n\n- `artificialcuriositylabs.dev` via Route 53 ($17/year).\n- `llms.txt` + `llms-full.txt` (AI-readable layer, build-time generated).\n\n**Subsequent updates (captured in posts and deploys)**\n\n- `./deploy.sh` parity with CI.\n\nThe full change history lives in the repo. The dedicated Build Log page has been retired — its value is now distributed across this post, the blog posts themselves (many tagged build-log), the homepage \"Now\" cards, and the agent-infra work documented elsewhere.", "url": "https://wpnews.pro/news/how-this-site-is-built", "canonical_source": "https://dev.to/amitrix/how-this-site-is-built-3bhj", "published_at": "2026-06-06 07:05:19+00:00", "updated_at": "2026-06-06 07:12:00.474392+00:00", "lang": "en", "topics": ["ai-infrastructure", "ai-tools", "ai-products"], "entities": ["Astro", "CloudFront", "SES", "Lambda", "Ghost", "Squarespace", "Substack", "Amazon Web Services"], "alternates": {"html": "https://wpnews.pro/news/how-this-site-is-built", "markdown": "https://wpnews.pro/news/how-this-site-is-built.md", "text": "https://wpnews.pro/news/how-this-site-is-built.txt", "jsonld": "https://wpnews.pro/news/how-this-site-is-built.jsonld"}}