Show Dev: Self-reinforcing K-pop data pipeline using Spring Boot and pgvector (Built on OCI Free Tier)

Source: dev.to · Published: 2026-06-29T02:21Z · Topic: artificial-intelligence

A Seoul-based backend developer built k-cosmos, an interactive 3D music space that maps K-pop tracks using 768-dimensional vector embeddings, running entirely on the OCI Free Tier. The system uses a self-reinforcing pipeline where an LLM analyzes mood and aesthetic to generate search keywords that fuel future ingestion, enabling autonomous growth. To handle performance constraints with 4,000 tracks, the developer implemented a three-phase transaction model with Java 21 Virtual Threads and a PostgreSQL query using pgvector for diversified nearest-neighbor search.

Hi everyone,

I'm a backend developer based in Seoul. I built k-cosmos, an interactive web-based 3D music space that maps K-pop tracks based on 768-dimensional vector embeddings.

The main reason I had to build this from scratch is that there's no clean, structured K-pop metadata or emotional tag dataset available anywhere.

How the pipeline grows itself

It runs on an autonomous background sync cycle. First, the system ingests tracks and uses an LLM to analyze the mood and aesthetic. Then, the AI reverse-engineers low-latency search keywords based on that analysis. These keywords are absorbed back into the database to fuel the next day's ingestion scheduler, allowing the system to expand its data footprint without human intervention.
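The feedback loop described above can be sketched in plain Java. This is only an illustration under stated assumptions, not the k-cosmos code: `KeywordLoop`, `runCycle`, and the two functions standing in for the music search and the LLM analysis are hypothetical names, and everything lives in memory instead of a database.

```java
import java.util.*;
import java.util.function.Function;

class KeywordLoop {
    private final Deque<String> pendingKeywords = new ArrayDeque<>();
    private final Set<String> knownKeywords = new HashSet<>();
    private final Set<String> knownTracks = new LinkedHashSet<>();
    // Hypothetical stand-ins: keyword -> track titles, and track title -> mood keywords.
    private final Function<String, List<String>> search;
    private final Function<String, List<String>> analyze;

    KeywordLoop(Collection<String> seeds,
                Function<String, List<String>> search,
                Function<String, List<String>> analyze) {
        this.search = search;
        this.analyze = analyze;
        seeds.forEach(this::offerKeyword);
    }

    // Queue a keyword only the first time it is seen, so the loop cannot cycle forever.
    private void offerKeyword(String keyword) {
        if (knownKeywords.add(keyword)) pendingKeywords.add(keyword);
    }

    /** One scheduler run: consume the queued keywords, ingest unseen tracks,
     *  and queue the keywords derived from them. Returns the number of new tracks. */
    int runCycle() {
        List<String> today = new ArrayList<>(pendingKeywords);
        pendingKeywords.clear();
        int ingested = 0;
        for (String keyword : today) {
            for (String track : search.apply(keyword)) {
                if (!knownTracks.add(track)) continue; // already ingested
                ingested++;
                analyze.apply(track).forEach(this::offerKeyword);
            }
        }
        return ingested;
    }

    Set<String> tracks() { return Collections.unmodifiableSet(knownTracks); }

    public static void main(String[] args) {
        Map<String, List<String>> catalog = Map.of(
            "k-pop", List.of("Track A", "Track B"),
            "dreamy synth", List.of("Track C"),
            "late night drive", List.of("Track A", "Track D"));
        Map<String, List<String>> moods = Map.of(
            "Track A", List.of("dreamy synth"),
            "Track B", List.of("late night drive"));
        KeywordLoop loop = new KeywordLoop(List.of("k-pop"),
            k -> catalog.getOrDefault(k, List.of()),
            t -> moods.getOrDefault(t, List.of()));
        for (int day = 1; day <= 3; day++) {
            System.out.println("day " + day + ": +" + loop.runCycle() + " tracks");
        }
        System.out.println(loop.tracks());
    }
}
```

Deduplicating both keywords and tracks is what lets a loop like this grow without re-ingesting the same material. Once a cycle finds nothing new, the queue simply stays empty.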

Architectural decisions under hard constraints

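Since I am running everything on the OCI Free Tier with around 4,000 tracks, I had to resolve several performance bottlenecks at the database and thread layers. Each ingestion job is split into three phases so that no database connection is held while slow external calls are in flight: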

Phase 1 (Short TX): Claim the target track with FOR UPDATE SKIP LOCKED and immediately flip its status to PROCESSING, so that concurrent workers never pick up the same row. Commit and release the connection.

Phase 2 (Zero TX): Perform the heavy external network I/O and embedding generation while holding zero active DB connections.

Phase 3 (Short TX): Open a short transaction to persist the final structured entity data.

The entire flow runs on Java 21 virtual threads, so a worker that is waiting on I/O does not tie up a platform thread.
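To make the three phases concrete, here is a minimal in-memory sketch. It is not the actual implementation: `TrackStore`, `claimNext`, `complete`, `embed`, `drain`, and the `PENDING` status are hypothetical names, and a synchronized map stands in for the PostgreSQL table. The comments note roughly what each step would correspond to in SQL.

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

class ThreePhaseWorker {
    enum Status { PENDING, PROCESSING, COMPLETED }

    /** In-memory stand-in for the tracks table (assumption: the real store is PostgreSQL). */
    static class TrackStore {
        private final Map<Long, Status> status = new LinkedHashMap<>();
        private final Map<Long, float[]> embeddings = new HashMap<>();

        TrackStore(Collection<Long> ids) { ids.forEach(id -> status.put(id, Status.PENDING)); }

        // Phase 1 (short TX). In SQL this would be roughly a
        // SELECT ... WHERE status = 'PENDING' LIMIT 1 FOR UPDATE SKIP LOCKED,
        // an UPDATE to PROCESSING, and a commit.
        synchronized Optional<Long> claimNext() {
            for (Map.Entry<Long, Status> e : status.entrySet()) {
                if (e.getValue() == Status.PENDING) {
                    e.setValue(Status.PROCESSING);
                    return Optional.of(e.getKey());
                }
            }
            return Optional.empty();
        }

        // Phase 3 (short TX): persist the result and mark the row as done.
        synchronized void complete(long id, float[] embedding) {
            embeddings.put(id, embedding);
            status.put(id, Status.COMPLETED);
        }

        synchronized long count(Status s) {
            return status.values().stream().filter(v -> v == s).count();
        }
    }

    // Phase 2 (zero TX): simulated external call; nothing from the store is held here.
    static float[] embed(long id) throws InterruptedException {
        Thread.sleep(5); // stand-in for network latency
        float[] vector = new float[768];
        vector[0] = id;
        return vector;
    }

    /** Starts `workers` claim-process-persist loops and returns the number of tracks processed. */
    static int drain(TrackStore store, ExecutorService executor, int workers) {
        AtomicInteger processed = new AtomicInteger();
        List<Future<?>> futures = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            futures.add(executor.submit(() -> {
                Optional<Long> claimed;
                while ((claimed = store.claimNext()).isPresent()) {
                    long id = claimed.get();
                    float[] vector = embed(id);   // phase 2
                    store.complete(id, vector);   // phase 3
                    processed.incrementAndGet();
                }
                return null;
            }));
        }
        for (Future<?> f : futures) {
            try {
                f.get();
            } catch (InterruptedException | ExecutionException e) {
                throw new IllegalStateException(e);
            }
        }
        return processed.get();
    }

    public static void main(String[] args) {
        TrackStore store = new TrackStore(List.of(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L));
        // A fixed pool keeps the sketch runnable on older JDKs as well.
        ExecutorService executor = Executors.newFixedThreadPool(4);
        try {
            System.out.println(drain(store, executor, 4) + " tracks processed");
        } finally {
            executor.shutdown();
        }
    }
}
```

Because `drain` takes any `ExecutorService`, on Java 21 it can be given `Executors.newVirtualThreadPerTaskExecutor()` instead of the fixed pool. The point of the shape is that the slow call in `embed` happens between two short, independent store operations, so no connection would be held while it waits.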

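For the nearest-neighbor lookup, a single pgvector query pulls a pool of the closest tracks and then limits how many of them each artist can contribute:

-- Step 1. Take the poolSize nearest tracks by cosine distance.
WITH candidates AS (
    SELECT *, embedding <=> CAST(:embedding AS vector) AS distance
    FROM cosmos_tracks
    WHERE status = 'COMPLETED' AND cluster_id = :clusterId AND id NOT IN (:excludeIds)
    ORDER BY embedding <=> CAST(:embedding AS vector) LIMIT :poolSize
),
-- Step 2. Rank each artist's tracks from nearest to farthest.
diversified AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY artist ORDER BY distance) AS artist_rank
    FROM candidates
)
-- Step 3. Keep at most maxPerArtist tracks per artist, nearest first.
SELECT * FROM diversified WHERE artist_rank <= :maxPerArtist ORDER BY distance LIMIT :limit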

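Because the inner ORDER BY ... LIMIT is a plain nearest-neighbor scan, pgvector can still use the vector index for it. The window function then caps how many tracks each artist contributes, all in a single round trip.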
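The same selection logic can be written out in Java to show what the query computes. This is an in-memory sketch for illustration only: `DiversifiedSearch`, `Track`, `Scored`, and `search` are hypothetical names, and the real work happens inside PostgreSQL. The one fact it relies on is that pgvector's `<=>` operator is cosine distance.

```java
import java.util.*;

class DiversifiedSearch {
    record Track(String title, String artist, double[] embedding) {}
    record Scored(Track track, double distance) {}

    // Cosine distance (1 - cosine similarity), which is what pgvector's <=> operator returns.
    static double cosineDistance(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    static List<Scored> search(List<Track> tracks, double[] query,
                               int poolSize, int maxPerArtist, int limit) {
        // Equivalent of the "candidates" CTE: the poolSize nearest tracks.
        List<Scored> pool = tracks.stream()
            .map(t -> new Scored(t, cosineDistance(t.embedding(), query)))
            .sorted(Comparator.comparingDouble(Scored::distance))
            .limit(poolSize)
            .toList();

        // Equivalent of the "diversified" CTE plus the final SELECT: walking the pool
        // in distance order, the n-th track seen for an artist has artist_rank n.
        Map<String, Integer> seenPerArtist = new HashMap<>();
        List<Scored> result = new ArrayList<>();
        for (Scored s : pool) {
            if (result.size() >= limit) break;
            int artistRank = seenPerArtist.merge(s.track().artist(), 1, Integer::sum);
            if (artistRank <= maxPerArtist) result.add(s);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Track> tracks = List.of(
            new Track("A1", "Artist A", new double[]{1.0, 0.0}),
            new Track("A2", "Artist A", new double[]{0.9, 0.1}),
            new Track("B1", "Artist B", new double[]{0.5, 0.5}),
            new Track("C1", "Artist C", new double[]{0.0, 1.0}));
        for (Scored s : search(tracks, new double[]{1.0, 0.0}, 4, 1, 3)) {
            System.out.println(s.track().title() + " " + s.distance());
        }
    }
}
```

One consequence worth noting: the per-artist cap is applied only within the candidate pool, so if `poolSize` is too small and one artist dominates the nearest neighbors, the query can return fewer than `limit` rows. Choosing a pool several times larger than the limit is a common way to leave room for the filter.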

I deliberately chose a Thymeleaf SSR hybrid architecture instead of splitting the frontend into a separate SPA. That keeps everything in a single deployment unit and preserves operational visibility through P6Spy and Actuator.

Live Project: https://cosmos.codeghost.cloud/

I'm very happy to discuss any architectural or design decisions. Let me know your thoughts or hit me with any questions!

Read original on dev.to → dev.to/cosmos0709/show-dev-self-reinforcing-k-po…
