Hosting Postgres with GeoLite2: a practical guide to IP geolocation, data loading, and updates

To load MaxMind's free GeoLite2 IP geolocation database into PostgreSQL for SQL-based queries, enabling joins with other tables, batch operations, and data sharing across services. It provides a practical guide for container-based deployments, including an initialization script that downloads CSV files, creates tables, and builds indexes. The guide notes that while the binary MMDB format is simpler for individual IP lookups, loading data into Postgres is better for complex analytical use cases.

Hosting Postgres with GeoLite2: a practical guide to IP geolocation, data loading, and updates IP geolocation maps IP addresses to physical locations: countries, cities, coordinates. MaxMind's GeoLite2 is the standard free database for this, used by analytics platforms, content localization systems, fraud detection tools, and compliance workflows. You can query GeoLite2 through MaxMind's binary format MMDB or load it into Postgres for SQL access. This guide covers the Postgres approach: when it makes sense, how to deploy it, and how to keep the data fresh. Both approaches work. The right choice depends on how your application uses geolocation data. The binary format MMDB is optimized for fast, single-IP lookups. MaxMind provides client libraries for most languages that read the binary file directly. Lookups are fast sub-millisecond and the integration is straightforward: download the file, point your code at it, call a function. Loading into Postgres makes sense when you need to: Join geolocation data with other tables. If you're enriching user records, log entries, or analytics data with location information, doing it in SQL is often simpler than fetching each IP individually in application code. Run batch operations. Geolocating thousands or millions of IPs is more efficient as a single SQL query than thousands of individual library calls. Query the geolocation data itself. If you need to answer questions like "which IP ranges are in Germany?" or "how many networks map to this city?", SQL queries are the natural tool. Share data across services. Multiple applications can query the same Postgres database without each needing its own copy of the MMDB file. The tradeoff is operational complexity. You need to load the data, keep it updated, and manage Postgres. For simple use cases where you just need to look up individual IPs, the binary format is easier. Container-based deployments give you several options. The right one depends on whether you want the data loaded on first boot, baked into your image, or managed by a separate process. Postgres containers run scripts in /docker-entrypoint-initdb.d/ when the database initializes with an empty data directory. This is the simplest approach for getting started. Your init script downloads the GeoLite2 CSV files, creates tables, loads the data, and builds indexes: /bin/bash set -e Download GeoLite2 City data requires MaxMind license key curl -L -o /tmp/geolite2-city.zip \ "https://download.maxmind.com/app/geoip download?edition id=GeoLite2-City-CSV&license key=${MAXMIND LICENSE KEY}&suffix=zip" unzip /tmp/geolite2-city.zip -d /tmp/ Create tables and load data psql -v ON ERROR STOP=1 --username "$POSTGRES USER" --dbname "$POSTGRES DB" <<-EOSQL CREATE TABLE geoip network network cidr NOT NULL, geoname id int, registered country geoname id int, represented country geoname id int, is anonymous proxy bool, is satellite provider bool, postal code text, latitude numeric, longitude numeric, accuracy radius int, is anycast bool ; CREATE TABLE geoip location geoname id int NOT NULL, locale code text NOT NULL, continent code text, continent name text, country iso code text, country name text, subdivision 1 iso code text, subdivision 1 name text, subdivision 2 iso code text, subdivision 2 name text, city name text, metro code int, time zone text, is in european union bool, PRIMARY KEY geoname id, locale code ; \copy geoip location FROM PROGRAM 'cat /tmp/GeoLite2-City-CSV /GeoLite2-City-Locations-en.csv' WITH FORMAT CSV, HEADER ; \copy geoip network FROM PROGRAM 'cat /tmp/GeoLite2-City-CSV /GeoLite2-City-Blocks-IPv4.csv' WITH FORMAT CSV, HEADER ; \copy geoip network FROM PROGRAM 'cat /tmp/GeoLite2-City-CSV /GeoLite2-City-Blocks-IPv6.csv' WITH FORMAT CSV, HEADER ; CREATE INDEX idx geoip network ON geoip network USING gist network inet ops ; EOSQL Cleanup rm -rf /tmp/geolite2-city.zip /tmp/GeoLite2-City-CSV This runs once when the database initializes. Subsequent container restarts skip initialization because the data directory isn't empty. To use this with Railway, add the script to a custom Postgres template. The database initializes with GeoLite2 data on first deploy. If you want the data baked into your image for reproducible deployments, build a custom Postgres image: FROM postgres:16 COPY geolite2-city-blocks-ipv4.csv /docker-entrypoint-initdb.d/data/ COPY geolite2-city-blocks-ipv6.csv /docker-entrypoint-initdb.d/data/ COPY geolite2-city-locations-en.csv /docker-entrypoint-initdb.d/data/ COPY init-geolite2.sql /docker-entrypoint-initdb.d/ The init SQL script loads from local files instead of downloading. This approach guarantees the same data every time you deploy, but requires rebuilding the image whenever you want fresh data. For production workloads where you need regular updates, run data loading as a separate service. This decouples the database from the update pipeline. Deploy a service that runs on a schedule, downloads the latest data, and refreshes the database tables. Railway supports cron jobs for this pattern. import { execSync } from "child process"; import { createReadStream } from "fs"; import { pipeline } from "stream/promises"; import postgres from "postgres"; import { from as copyFrom } from "pg-copy-streams"; const DATABASE URL = process.env.DATABASE URL ; const MAXMIND LICENSE KEY = process.env.MAXMIND LICENSE KEY ; async function updateGeoLite2 { // Download latest data const downloadUrl = https://download.maxmind.com/app/geoip download?edition id=GeoLite2-City-CSV&license key=${MAXMIND LICENSE KEY}&suffix=zip ; execSync curl -L -o /tmp/geolite2.zip "${downloadUrl}" ; execSync "unzip -o /tmp/geolite2.zip -d /tmp/" ; const sql = postgres DATABASE URL ; // Load into temporary tables, then swap await sql TRUNCATE geoip network, geoip location ; // Use COPY for fast bulk loading await sql COPY geoip location FROM PROGRAM 'cat /tmp/GeoLite2-City-CSV /GeoLite2-City-Locations-en.csv' WITH FORMAT CSV, HEADER ; await sql COPY geoip network FROM PROGRAM 'cat /tmp/GeoLite2-City-CSV /GeoLite2-City-Blocks-IPv4.csv' WITH FORMAT CSV, HEADER ; await sql COPY geoip network FROM PROGRAM 'cat /tmp/GeoLite2-City-CSV /GeoLite2-City-Blocks-IPv6.csv' WITH FORMAT CSV, HEADER ; await sql.end ; console.log "GeoLite2 update complete" ; } updateGeoLite2 ; This keeps your main Postgres deployment simple while ensuring data stays current. Once the data is loaded, IP lookups use Postgres's built-in network operators. The = operator checks if a network contains an IP address: SELECT l.country name, l.city name, n.latitude, n.longitude, n.accuracy radius FROM geoip network n JOIN geoip location l ON n.geoname id = l.geoname id WHERE n.network = '8.8.8.8'::inet AND l.locale code = 'en'; With the GiST index on the network column, this query runs in under 10ms even with millions of network ranges. For batch operations, join your data directly: SELECT logs.ip address, logs.timestamp, l.country iso code, l.city name FROM access logs logs JOIN geoip network n ON n.network = logs.ip address::inet JOIN geoip location l ON n.geoname id = l.geoname id WHERE l.locale code = 'en' AND logs.timestamp NOW - INTERVAL '1 day'; This geolocates all IPs in a single query rather than making individual lookups. MaxMind updates the GeoLite2 databases weekly, every Tuesday. IP allocations change as ISPs acquire new ranges and reassign existing ones. Stale data means incorrect geolocation for some percentage of lookups. How much accuracy matters depends on your use case: Analytics and reporting can tolerate weekly or even monthly updates. A small percentage of IPs geolocating incorrectly doesn't significantly affect aggregate statistics. Content localization benefits from fresher data but isn't critically dependent on it. Showing the wrong language or currency to a small percentage of users is suboptimal but not catastrophic. Compliance and fraud detection may require more frequent updates. If you're blocking traffic from specific countries or flagging suspicious locations, stale data creates both false positives and false negatives. For most applications, updating weekly matching MaxMind's release cadence is sufficient. Set up a cron job or scheduled service that runs every Tuesday or Wednesday. A simple update strategy: - Download the new CSV files - Load into temporary tables - Swap the tables in a transaction - Drop the old tables BEGIN; -- Rename current tables ALTER TABLE geoip network RENAME TO geoip network old; ALTER TABLE geoip location RENAME TO geoip location old; -- Rename new tables ALTER TABLE geoip network new RENAME TO geoip network; ALTER TABLE geoip location new RENAME TO geoip location; COMMIT; -- Clean up outside transaction DROP TABLE geoip network old; DROP TABLE geoip location old; This approach minimizes downtime. The swap happens in a transaction, so queries see either the old data or the new data, never a partial state. The storage requirements are modest: If you're loading all three databases with both IPv4 and IPv6 data, expect roughly 500-600 MB of storage. This is small enough that storage scaling isn't a significant concern. The more relevant operational question is how your hosting provider handles storage growth and pricing. Provisioned storage requires choosing a disk size upfront. You're paying for the full amount whether you use it or not, and resizing may require downtime. Usage-based storage charges for what you actually use. If your database uses 1 GB including GeoLite2 data, you pay for 1 GB. Railway uses the usage-based model. Volumes grow as your data grows, and you pay for actual consumption. For GeoLite2 specifically, this means you don't need to guess how much space to provision. The GiST index on the network column is essential for performance. Without it, every lookup scans the entire table millions of rows for IPv4 + IPv6 . With it, lookups are sub-10ms. CREATE INDEX idx geoip network ON geoip network USING gist network inet ops ; A few things affect performance at scale: Index must fit in memory. The GiST index for GeoLite2 City is roughly 200 MB. If this exceeds your available shared buffers , queries slow down. For most deployments, this isn't an issue. Batch queries are more efficient than individual lookups. If you need to geolocate 10,000 IPs, do it in one query with a JOIN rather than 10,000 individual queries. Consider caching for hot paths. If the same IPs are looked up repeatedly common in web applications , cache the results in Redis or application memory. GeoLite2 data changes weekly, so cached results stay valid for days. For most applications, a single Postgres instance handles GeoLite2 queries without performance issues. The workload is read-heavy and the data fits comfortably in memory. Railway runs Postgres as a containerized service with persistent storage. For GeoLite2 specifically, this means: Flexible deployment options. Use initialization scripts, custom Docker images, or separate data-loading services. Railway supports all three approaches. Usage-based storage. Pay for what you use. GeoLite2 data adds roughly 500 MB to your database, and you pay for that incrementally. Cron support for updates. Deploy a service with a cron trigger that runs weekly to refresh your GeoLite2 data. Railway handles the scheduling. Private networking. Your application connects to Postgres over a private network. The database isn't exposed to the public internet. Automated update pipeline. Railway provides the infrastructure cron triggers, private networking , but you write the update script that downloads new data and refreshes the tables. Monitoring for data freshness. Set up alerts if your update job fails. Stale GeoLite2 data degrades accuracy silently. Backup verification. Railway provides scheduled backups, but verify your backups include the GeoLite2 tables and can be restored successfully. Deploying Postgres with GeoLite2 on Railway: - Create a Ra