{"slug": "database-migration-strategies-that-actually-work-in-production", "title": "Database Migration Strategies That Actually Work in Production", "summary": "This article provides practical strategies for performing database migrations safely in production environments, emphasizing that standard development approaches can lock tables and cause downtime on large datasets. It recommends breaking migrations into small, reversible steps (such as adding nullable columns, backfilling data in batches, and adding constraints later) and using concurrent index creation to avoid table locks. The article also stresses the importance of testing on production-sized data, documenting rollback plans, setting lock timeouts, and treating schema changes like code with proper version control and deployment procedures.", "body_md": "# Database Migration Strategies That Actually Work in Production\n\nDatabase migrations look simple until you're up at 3 a.m. trying to recover from a migration that locked your production table for 45 minutes. After running migrations on systems with billions of rows, here's what I've learned about doing them safely.\n\n## The Fundamental Problem\n\nMost migration tutorials show you this:\n\n```\nclass AddIndexesToOrders < ActiveRecord::Migration[7.0]\n  def change\n    add_index :orders, :user_id\n  end\nend\n```\n\nThis is fine in development. 
In production with 50 million orders, this blocks every write to the table while the index builds, which can bring down your application.\n\n## The Core Principle: Small Changes, Applied Incrementally\n\nEvery production-safe migration follows the same pattern: **make the change in small, non-breaking steps that can be rolled back independently.**\n\n### Expand-Contract Pattern\n\nInstead of one big migration, use three:\n\n**Migration 1: Add new column (nullable)**\n\n```\n-- Migration 1: Safe - a nullable column with no default is a metadata-only change,\n-- so the lock it needs is held only briefly\nALTER TABLE orders ADD COLUMN user_email VARCHAR(255);\n\n-- Update application to write BOTH old and new columns\n-- Deploy this first\n```\n\n**Migration 2: Backfill data (in batches)**\n\n```sql\n-- Safe batched backfill: walks the id range (assumes positive integer ids) so every\n-- batch makes progress, even when some orders have no matching user email\nDO $$\nDECLARE\n  batch_size INT := 10000;\n  last_id BIGINT := 0;\n  max_id BIGINT;\nBEGIN\n  SELECT COALESCE(MAX(id), 0) INTO max_id FROM orders;\n\n  WHILE last_id < max_id LOOP\n    UPDATE orders\n    SET user_email = users.email\n    FROM users\n    WHERE users.id = orders.user_id\n      AND orders.id > last_id\n      AND orders.id <= last_id + batch_size\n      AND orders.user_email IS NULL;\n\n    last_id := last_id + batch_size;\n\n    -- Commit each batch so its row locks are released (PostgreSQL 11+;\n    -- the DO block must not run inside an outer transaction)\n    COMMIT;\n\n    -- Brief pause to reduce load and lock contention\n    PERFORM pg_sleep(0.1);\n  END LOOP;\nEND $$;\n```\n\n**Migration 3: Add NOT NULL constraint**\n\n```\n-- Succeeds because all rows have values. PostgreSQL still scans the table under\n-- an ACCESS EXCLUSIVE lock to verify this, so time it on production-sized data first\nALTER TABLE orders ALTER COLUMN user_email SET NOT NULL;\n```\n\n## Handling Long-Running Migrations\n\n### The Lock Timeout Strategy\n\n```\n-- Set a short lock timeout so migration fails fast instead of hanging\nSET lock_timeout = '2s';\n\n-- Migration that might need a lock\nALTER TABLE orders ADD COLUMN status VARCHAR(50);\n\n-- If it can't acquire the lock within 2s, it fails with an error\n-- instead of blocking for minutes\n```\n\n### Concurrent Index Building\n\nNever use plain `CREATE INDEX` on a large production table. 
Always use `CREATE INDEX CONCURRENTLY`.\n\n```\n-- BAD: Blocks all writes to the table until the build finishes\nCREATE INDEX idx_orders_user_id ON orders(user_id);\n\n-- GOOD: Does not block writes; takes longer but avoids downtime\nCREATE INDEX CONCURRENTLY idx_orders_user_id ON orders(user_id);\n```\n\n**Critical note:** `CREATE INDEX CONCURRENTLY` cannot run inside a transaction block. Your migration framework needs to handle this.\n\n```\n# Rails: Tell it to run outside a transaction\nclass AddIndexesToOrders < ActiveRecord::Migration[7.0]\n  disable_ddl_transaction!\n\n  def change\n    add_index :orders, :user_id, algorithm: :concurrently\n  end\nend\n```\n\n## Schema Versioning: The Branching Model\n\nFor complex systems, treat database schema like code with proper branching:\n\n```\nmain (production schema)\n  └── staging-test (validate migrations)\n        └── migration/user-email-fix (your change)\n\n# Before starting a migration\ngit checkout main\ngit pull\ngit checkout -b migration/user-email-fix\n\n# Run migrations locally against fresh production copy\n# Once validated:\ngit checkout main\ngit merge migration/user-email-fix\n# Deploy migration to production\n```\n\n## The Pre-Migration Checklist\n\n```\nBefore ANY production migration:\n\n□ Tested on production-size dataset (at minimum on staging with production data snapshot)\n□ Lock duration estimated (time the migration on the production-size copy; EXPLAIN ANALYZE covers backfill queries but not DDL)\n□ Rollback plan documented\n□ Canary deploy step prepared (migrate 1% of traffic, observe, then full deploy)\n□ Alert thresholds set (if migration causes >X% error rate, auto-rollback)\n□ Migration scheduled during low-traffic window\n□ On-call engineer aware and standing by\n□ Database backup verified (point-in-time recovery tested)\n□ Lock timeout set appropriately\n□ Query plan examined for full table scans\n```\n\n## Real-World Example: Renaming a Column Safely\n\nRenaming a column is a four-step process:\n\n**Migration 1: Add new column (double-write starts)**\n\n```\nALTER TABLE users ADD 
COLUMN display_name VARCHAR(100);\n-- Update application to WRITE to both columns\n-- User.where(name: 'John').update(display_name: 'John')\n-- runs in background\n```\n\n**Migration 2: Backfill**\n\n```\n-- Run in batches of 10,000 ids with a 0.1s sleep between batches, for example:\nUPDATE users SET display_name = name\nWHERE display_name IS NULL AND id BETWEEN 1 AND 10000;\n```\n\n**Migration 3: Stop reading from old column**\n\n```\n# Deploy code that only reads from and writes to display_name\n# Verify everything works\n```\n\n**Migration 4: Drop old column**\n\n```\nALTER TABLE users DROP COLUMN name;\n-- Takes a brief ACCESS EXCLUSIVE lock, so set a lock_timeout first\n```\n\n## PostgreSQL-Specific Tools\n\n### pg_repack: Remove Bloat Without Long Table Locks\n\n```\n# Install the extension in the target database\npsql -d mydb -c 'CREATE EXTENSION pg_repack;'\n\n# Repack a bloated table online (exclusive locks are held only briefly)\npg_repack -d mydb -t orders\n\n# Rebuild a single index\npg_repack -d mydb -i idx_orders_user_id\n```\n\n### pg_activity: Monitor Migration Progress\n\n```\n# Watch active queries during migration\npg_activity -h localhost -U postgres\n\n# Or query pg_stat_activity directly from psql\nSELECT pid, state, query, query_start, now() - query_start AS duration\nFROM pg_stat_activity\nWHERE state != 'idle'\nORDER BY duration DESC;\n```\n\n### The Migration Locking Hierarchy\n\nUnderstanding lock modes prevents surprises. Each mode below conflicts with more of the others than the one before it: Access Share conflicts only with Access Exclusive, while Access Exclusive conflicts with every mode, including the Access Share lock taken by a plain SELECT.\n\n| Lock Mode | Acquired by (examples) |\n|---|---|\n| Access Share | SELECT |\n| Row Share | SELECT FOR UPDATE, SELECT FOR SHARE |\n| Row Exclusive | INSERT, UPDATE, DELETE |\n| Share Update Exclusive | VACUUM, ANALYZE, CREATE INDEX CONCURRENTLY |\n| Share | CREATE INDEX (blocking) |\n| Share Row Exclusive | CREATE TRIGGER, some ALTER TABLE forms |\n| Exclusive | REFRESH MATERIALIZED VIEW CONCURRENTLY |\n| Access Exclusive | DROP TABLE, TRUNCATE, most ALTER TABLE forms |\n\n## The Golden Rule\n\n**If your migration holds a blocking lock on a production table for more than 100ms, it's wrong.** Go back and break it into smaller pieces.\n\nThe goal is always: zero downtime, zero data loss, instant rollback capability.\n\n*What migration horror stories do you have? 
What's your go-to strategy for risky migrations?*", "url": "https://wpnews.pro/news/database-migration-strategies-that-actually-work-in-production", "canonical_source": "https://dev.to/zny10289/database-migration-strategies-that-actually-work-in-production-4a8", "published_at": "2026-05-23 20:21:52+00:00", "updated_at": "2026-05-23 20:32:20.288433+00:00", "lang": "en", "topics": ["data", "developer-tools", "enterprise-software"], "entities": ["ActiveRecord", "PostgreSQL"], "alternates": {"html": "https://wpnews.pro/news/database-migration-strategies-that-actually-work-in-production", "markdown": "https://wpnews.pro/news/database-migration-strategies-that-actually-work-in-production.md", "text": "https://wpnews.pro/news/database-migration-strategies-that-actually-work-in-production.txt", "jsonld": "https://wpnews.pro/news/database-migration-strategies-that-actually-work-in-production.jsonld"}}