{"slug": "show-dev-self-reinforcing-k-pop-data-pipeline-using-spring-boot-and-pgvector-on", "title": "Show Dev: Self-reinforcing K-pop data pipeline using Spring Boot and pgvector (Built on OCI Free Tier)", "summary": "A Seoul-based backend developer built k-cosmos, an interactive 3D music space that maps K-pop tracks using 768-dimensional vector embeddings, running entirely on the OCI Free Tier. The system uses a self-reinforcing pipeline where an LLM analyzes mood and aesthetic to generate search keywords that fuel future ingestion, enabling autonomous growth. To handle performance constraints with 4,000 tracks, the developer implemented a three-phase transaction model with Java 21 Virtual Threads and a PostgreSQL query using pgvector for diversified nearest-neighbor search.", "body_md": "Hi everyone,\n\nI'm a backend developer based in Seoul. I built k-cosmos, an interactive web-based 3D music space that maps K-pop tracks based on 768-dimensional vector embeddings.\n\nThe main reason I had to build this from scratch is that there's no clean, structured K-pop metadata or emotional tag dataset available anywhere.\n\nHow the pipeline grows itself\n\nIt runs on an autonomous background sync cycle. First, the system ingests tracks and uses an LLM to analyze the mood and aesthetic. Then, the AI reverse-engineers low-latency search keywords based on that analysis. These keywords are absorbed back into the database to fuel the next day's ingestion scheduler, allowing the system to expand its data footprint without human intervention.\n\nArchitectural decisions under hard constraints\n\nSince I am running everything on the OCI free tier with around 4,000 tracks, I had to resolve several performance bottlenecks at the database and thread layer.\n\nPhase 1 (Short TX): Claim the target track via FOR UPDATE SKIP LOCKED and immediately flip the status to PROCESSING to isolate rows for worker concurrency. 
Commit and release the connection.\n\nPhase 2 (Zero TX): Perform the heavy external network I/O and embedding generation while holding zero active DB connections.\n\nPhase 3 (Short TX): Open a short transaction to persist the final structured entity data.\n\nThe entire flow runs on Java 21 Virtual Threads to minimize scheduling overhead during I/O wait states.\n\nFor diversified nearest-neighbor search, a single PostgreSQL query using pgvector first pulls a candidate pool ordered by cosine distance and then ranks the candidates within each artist:\n\n```sql\nWITH candidates AS (\n    SELECT *, embedding <=> CAST(:embedding AS vector) AS distance\n    FROM cosmos_tracks\n    WHERE status = 'COMPLETED' AND cluster_id = :clusterId AND id NOT IN (:excludeIds)\n    ORDER BY embedding <=> CAST(:embedding AS vector) LIMIT :poolSize\n),\ndiversified AS (\n    SELECT *, ROW_NUMBER() OVER (PARTITION BY artist ORDER BY distance) AS artist_rank\n    FROM candidates\n)\nSELECT * FROM diversified WHERE artist_rank <= :maxPerArtist ORDER BY distance LIMIT :limit\n```\n\nThis keeps the nearest-neighbor scan index-friendly while capping how many tracks each artist contributes, all in a single round trip.\n\nI deliberately chose a Thymeleaf SSR hybrid architecture to keep a single deployment unit and maintain high operational visibility (P6Spy, Actuator) instead of splitting into a separate SPA.\n\nLive Project: [https://cosmos.codeghost.cloud/](https://cosmos.codeghost.cloud/)\n\nI'm very happy to discuss any architectural or design decisions. 
Let me know your thoughts or hit me with any questions!", "url": "https://wpnews.pro/news/show-dev-self-reinforcing-k-pop-data-pipeline-using-spring-boot-and-pgvector-on", "canonical_source": "https://dev.to/cosmos0709/show-dev-self-reinforcing-k-pop-data-pipeline-using-spring-boot-and-pgvector-built-on-oci-free-5g7", "published_at": "2026-06-29 02:21:06+00:00", "updated_at": "2026-06-29 02:57:27.611697+00:00", "lang": "en", "topics": ["artificial-intelligence", "machine-learning", "large-language-models", "developer-tools", "ai-infrastructure"], "entities": ["k-cosmos", "Spring Boot", "pgvector", "OCI Free Tier", "Java 21", "Thymeleaf", "PostgreSQL", "Seoul"], "alternates": {"html": "https://wpnews.pro/news/show-dev-self-reinforcing-k-pop-data-pipeline-using-spring-boot-and-pgvector-on", "markdown": "https://wpnews.pro/news/show-dev-self-reinforcing-k-pop-data-pipeline-using-spring-boot-and-pgvector-on.md", "text": "https://wpnews.pro/news/show-dev-self-reinforcing-k-pop-data-pipeline-using-spring-boot-and-pgvector-on.txt", "jsonld": "https://wpnews.pro/news/show-dev-self-reinforcing-k-pop-data-pipeline-using-spring-boot-and-pgvector-on.jsonld"}}