{"slug": "building-an-african-economic-data-pipeline-with-python-duckdb-world-bank-api", "title": "Building an African Economic Data Pipeline with Python, DuckDB & World Bank API", "summary": "The article describes the AfriData Pipeline, a production-grade ETL system built with Python, DuckDB, and the free World Bank API that extracts and processes economic data for all 54 African countries. The pipeline handles 13,500 data points in under 50 seconds, computes year-over-year changes, and includes a data quality framework that scored 95.8/100. The system exports data to JSON files for a static interactive dashboard and runs automatically via a daily GitHub Actions workflow.", "body_md": "Every data engineer knows the struggle: finding a project that's both technically impressive and genuinely useful. Today I'll walk you through **AfriData Pipeline** — a production-grade ETL system that extracts economic data for all 54 African countries, loads it into a DuckDB analytical warehouse, and serves an interactive dashboard.\n\nNo paid APIs. No cloud services required. Just Python, DuckDB, and free public data.\n\n## Why This Project?\n\nAfrica's economy is growing fast, but finding clean, consolidated economic data is surprisingly hard. 
The World Bank has an amazing free API with 16,000+ indicators, but raw API responses need serious engineering to become useful.\n\nThis project demonstrates:\n\n- **ETL pipeline design** with proper error handling and retries\n- **Dimensional modeling** (star schema) in DuckDB\n- **Data quality engineering**: automated checks for completeness, validity, and freshness\n- **Full-stack delivery**: from raw API to interactive dashboard\n\n## Architecture Overview\n\n```\nWorld Bank API v2 → Extract (httpx) → Transform (Python) → Load (DuckDB)\n                                                               ↓\n                                            Export JSON → Static Dashboard (Vercel)\n```\n\nThe pipeline processes **13,500 data points** (54 countries × 10 indicators × 25 years) in under 50 seconds.\n\n## The Data: 10 Key Indicators\n\nI selected indicators that tell a comprehensive economic story:\n\n| Indicator | Category | Why It Matters |\n|---|---|---|\n| GDP (US$) | Economy | Total economic output |\n| GDP Growth (%) | Economy | Economic momentum |\n| Population | Demographics | Scale context |\n| Inflation (CPI) | Economy | Cost of living pressure |\n| Unemployment | Labor | Job market health |\n| Life Expectancy | Health | Quality of life proxy |\n| Internet Users (%) | Technology | Digital readiness |\n| Electricity Access (%) | Infrastructure | Development foundation |\n| Literacy Rate (%) | Education | Human capital |\n| FDI Inflows (% GDP) | Investment | External confidence |\n\n## Building the Extract Layer\n\nThe World Bank API v2 is beautifully simple: no auth required, JSON responses, and you can batch multiple countries in one request:\n\n``` python\nimport httpx\nimport time\n\nWB_BASE = \"https://api.worldbank.org/v2\"\nMAX_RETRIES = 3\n\ndef extract_indicator(client: httpx.Client, indicator_code: str,\n                      country_codes: str) -> list[dict]:\n    url = (f\"{WB_BASE}/country/{country_codes}/indicator/{indicator_code}\"\n    
       f\"?format=json&date=2000:2024&per_page=10000\")\n\n    for attempt in range(MAX_RETRIES):\n        try:\n            resp = client.get(url, timeout=60)\n            resp.raise_for_status()\n            data = resp.json()\n            # World Bank returns [metadata, records]\n            if isinstance(data, list) and len(data) == 2:\n                return data[1] or []\n        except (httpx.HTTPStatusError, httpx.ReadTimeout) as e:\n            delay = 2 * (2 ** attempt)\n            time.sleep(delay)\n    return []\n```\n\nKey design decisions:\n\n- **Exponential backoff** on failures (2s, 4s, 8s)\n- **Single request per indicator**: semicolon-separated country codes let us fetch all 54 countries at once\n- **60-second timeout**: some indicators return large payloads\n- **0.5s delay between indicators**: respect the free API\n\n## The Star Schema\n\nDuckDB is perfect for this: blazing fast analytics, zero configuration, and a single portable file.\n\n```\ndim_country ◄──── fact_indicators ────► dim_indicator\n     │                  │\n     └────────── dim_date ──────────────┘\n```\n\n``` python\nimport duckdb\n\ndef create_schema(conn):\n    conn.execute(\"\"\"\n        CREATE TABLE IF NOT EXISTS fact_indicators (\n            country_key  INTEGER,\n            indicator_key INTEGER,\n            date_key     INTEGER,\n            value        DOUBLE,\n            yoy_change   DOUBLE,\n            extracted_at TIMESTAMP DEFAULT current_timestamp,\n            PRIMARY KEY (country_key, indicator_key, date_key)\n        )\n    \"\"\")\n    # Plus dim_country (54 rows), dim_indicator (10 rows), dim_date (25 rows)\n```\n\nThe transform layer also computes **year-over-year change** for every data point:\n\n``` python\ndef calculate_yoy(current, previous):\n    if current is not None and previous is not None and previous != 0:\n        return round(((current - previous) / abs(previous)) * 100, 2)\n    return None\n```\n\n## Data Quality Framework\n\nThis is what 
separates a toy project from a production one. The quality framework scores three dimensions:\n\n**1. Completeness**: What percentage of expected data points are non-null?\n\n```\nLiteracy Rate: only 18% complete (data is sparse)\nPopulation: 100% complete (every country, every year)\n```\n\n**2. Validity**: Are values within expected ranges?\n\n```\nLife expectancy: 25-95 years ✅\nGDP: $1M - $10T ✅\nInflation: -30% to 10,000% (yes, hyperinflation happens) ✅\n```\n\n**3. Freshness**: How recent is the latest data?\n\n```\nGDP: 2024 ✅\nLiteracy: 2021 ⚠️ (surveys are infrequent)\n```\n\nThe final score: **95.8/100**, with completeness pulling it down slightly because literacy data is sparse (expected for survey-based indicators).\n\n## Interactive Dashboard\n\nThe dashboard is a static site (HTML + Tailwind CSS + Chart.js + Leaflet.js) that loads pre-exported JSON files:\n\n**Features:**\n\n- 🗺️ **Choropleth map**: click any African country, toggle between indicators\n- 📈 **Country comparison**: compare up to 6 countries over 25 years\n- 🏆 **Rankings table**: sortable by any indicator\n- 🌙 **Dark mode**: full theme support\n- 📱 **Responsive**: works on mobile\n\nThe dashboard reads four JSON files exported by the pipeline:\n\n- `country_profiles.json`: all data per country (897KB)\n- `rankings.json`: pre-sorted rankings per indicator\n- `summary_stats.json`: aggregate statistics\n- `quality_report.json`: transparency on data quality\n\n## Automated Daily Refresh\n\nA GitHub Actions workflow runs the pipeline daily at 6 AM UTC:\n\n``` yaml\nname: Daily ETL Pipeline\non:\n  schedule:\n    - cron: '0 6 * * *'\n  workflow_dispatch:\n\njobs:\n  etl:\n    runs-on: ubuntu-latest\n    steps:\n      - uses: actions/checkout@v4\n      - uses: actions/setup-python@v5\n        with: { python-version: '3.12' }\n      - run: pip install -r requirements.txt\n      - run: python -m pipeline.main all\n      - run: |\n          git config user.name \"github-actions[bot]\"\n      
    git add dashboard/data/\n          git diff --cached --quiet || git commit -m \"chore: update data\"\n          git push\n```\n\nFresh data → committed JSON → Vercel auto-deploys. Zero manual intervention.\n\n## Key Takeaways\n\n- **Free APIs are underrated**: The World Bank API has incredible depth. No auth, no rate limits worth worrying about, and 25+ years of history.\n- **DuckDB is a game-changer** for small-to-medium analytical workloads. Zero setup, single file, and it handles 13K+ rows with analytical queries in milliseconds.\n- **Data quality isn't optional**: Even with a trusted source like the World Bank, you'll find missing data, sparse indicators, and surprises. Build quality checks into the pipeline, not as an afterthought.\n- **Static dashboards scale**: By pre-computing JSON at ETL time, the dashboard is just a static site. No backend, no database connection, no server costs. Deploy to Vercel for free.\n- **Star schemas still matter**: Even in a world of data lakes and denormalized tables, dimensional modeling makes your data *queryable* and *understandable*.\n\n## Try It Yourself\n\nThe entire project is open source:\n\n- **GitHub:** [hajirufai/afridata-pipeline](https://github.com/hajirufai/afridata-pipeline)\n- **Stack:** Python 3.12, httpx, DuckDB, Chart.js, Leaflet.js, Tailwind CSS\n\n``` bash\ngit clone https://github.com/hajirufai/afridata-pipeline.git\ncd afridata-pipeline\npip install -r requirements.txt\npython -m pipeline.main all\ncd dashboard && python -m http.server 8080\n```\n\nData engineering doesn't have to be about massive Spark clusters and cloud bills. Sometimes the best projects start with a free API and a clear question.\n\n*What economic indicators would you add? 
Drop a comment below!*", "url": "https://wpnews.pro/news/building-an-african-economic-data-pipeline-with-python-duckdb-world-bank-api", "canonical_source": "https://dev.to/hajirufai/building-an-african-economic-data-pipeline-with-python-duckdb-world-bank-api-3al0", "published_at": "2026-05-23 12:41:33+00:00", "updated_at": "2026-05-23 13:03:43.029123+00:00", "lang": "en", "topics": ["data", "developer-tools", "open-source"], "entities": ["World Bank", "DuckDB", "Python", "AfriData Pipeline", "Vercel"], "alternates": {"html": "https://wpnews.pro/news/building-an-african-economic-data-pipeline-with-python-duckdb-world-bank-api", "markdown": "https://wpnews.pro/news/building-an-african-economic-data-pipeline-with-python-duckdb-world-bank-api.md", "text": "https://wpnews.pro/news/building-an-african-economic-data-pipeline-with-python-duckdb-world-bank-api.txt", "jsonld": "https://wpnews.pro/news/building-an-african-economic-data-pipeline-with-python-duckdb-world-bank-api.jsonld"}}