Everything is a Pipeline

A technical writer argues that diverse technical systems—from data engineering pipelines to RAG-based AI products—share a common underlying structure, using stages of refinement to transform raw inputs into usable outputs. The piece draws parallels between the medallion architecture in data processing and the chunking-embedding-retrieval flow in RAG pipelines, suggesting that understanding one pipeline can demystify others.

Why technical concepts are often more alike than we might think.

We’re in a major cognitive dissonance era right now. AI is “moving so fast.” We all “feel behind.” So we use AI to deliver quick results (vibe coding, analyzing 30-page PDFs, Claude doing your taxes). But token maxxing leaves us feeling hollow, like Taco Bell after a night out, because it stops us from taking a moment to actually think or understand anything.

So let’s think for a second. Is everything technical just a pipeline? If you understand how one pipeline (like a data engineering pipeline) works, do you understand them all? Are all of these pipelines essentially the same?

- Data engineering pipelines
- RAG (https://technically.dev/universe/rag) pipelines for AI products
- Sales pipelines
- Image processing pipelines
- Frontend (https://technically.dev/universe/frontend) app compilation pipelines

Let’s explore. I promise this post will not result in you using more tokens.

The Data Pipeline

Bronze, silver, and gold. Databricks’ (https://technically.dev/posts/what-does-databricks-do-2025) marketing team uses this holy trinity to describe the medallion architecture for processing data. Other companies like dbt have used staging, intermediate, and mart. I’ve even seen raw, cleaned, and curated. They all mean the same thing: they describe how data is processed as it moves through the pipeline.

- Bronze is data that’s been ingested, usually in its rawest state (e.g. logs or events).
- Silver is the cleaned, joined, and standardized sets of data. This is where deduplication and filtering happen and constraints are enforced.
- Gold is ready to be used by the business, in reporting or machine learning (https://technically.dev/universe/machine-learning) models.

At each step of the pipeline, the data gets more refined and usable, following a set of principles.
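To make the three stages concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the event fields (`user_id`, `event`, `ts`), the function names, and the specific cleaning rules are hypothetical, and a real medallion pipeline would more likely be written in SQL or Spark than in plain Python.

```python
from collections import Counter

# Bronze: raw events exactly as ingested (hypothetical sample data).
BRONZE = [
    {"user_id": "u1", "event": "Login ", "ts": "2025-01-01T10:00:00"},
    {"user_id": "u1", "event": "Login ", "ts": "2025-01-01T10:00:00"},  # duplicate row
    {"user_id": None, "event": "click", "ts": "2025-01-01T10:01:00"},   # missing user
    {"user_id": "u2", "event": "CLICK", "ts": "2025-01-01T10:02:00"},
]


def to_silver(bronze):
    """Enforce constraints, standardize values, and deduplicate."""
    seen = set()
    silver = []
    for row in bronze:
        if row["user_id"] is None:  # constraint: every event needs a user
            continue
        clean = {
            "user_id": row["user_id"],
            "event": row["event"].strip().lower(),  # standardize event names
            "ts": row["ts"],
        }
        key = (clean["user_id"], clean["event"], clean["ts"])
        if key in seen:  # deduplicate on the cleaned values
            continue
        seen.add(key)
        silver.append(clean)
    return silver


def to_gold(silver):
    """Aggregate into a business-ready shape: event count per user."""
    return dict(Counter(row["user_id"] for row in silver))


silver = to_silver(BRONZE)
gold = to_gold(silver)
print(gold)
```

The point of the sketch is the placement of work: filtering and deduplication live in the silver step, and the gold step only aggregates data it can already trust.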
You can walk into any company following this 3-stage pipeline architecture, look at their data transformation code, and be able to make sense (hopefully :)) of what they’re doing.

The RAG Pipeline

I once interviewed at a vector database (https://technically.dev/universe/vector-database) company that had a pretty involved take-home assignment that required me to explain a RAG (retrieval-augmented generation) pipeline. Coming from working in data, this was just one hop over, but it still felt foreign.

RAG systems let an LLM (https://technically.dev/universe/llm) retrieve information from external datasets and use that information to generate responses. RAG shows up every time an LLM like Claude does a web search. It’s so seamless we don’t even think about the underpinnings of what’s happening.

Notion is a great example. When you use Notion AI, queries are embedded into vectors on the fly and searched against a vectorized version of your Notion workspace. You might be thinking “how the hell is it so fast?” That’s part of the magic. This vector pipeline runs in the background (chunking, embedding, and storing in a vector DB) so that Notion AI can run your semantic vector search when it needs to.

During the ingestion stage, we add metadata like title and author to our documents, which gives them extra context. This metadata follows the documents through the rest of the pipeline.

Chunking breaks the documents into similar-sized pieces, so that they’re faster to search through.

Embedding is where we transform the text (or another modality, like images or video) into a vector.

Next we package everything together and store it in a vector database. This is where that metadata becomes super helpful, because it can be used for cheap and fast filtering at retrieval time.

Every pipeline requires tradeoffs. In the data pipeline example, a set of rules like the Medallion Architecture helps us decide where to put different types of data transformation work.
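The ingestion, chunking, embedding, and storage stages described above can be sketched in a few lines of Python. This is a toy under stated assumptions, not how Notion or any vector database actually works: `embed` is a hypothetical stand-in that builds a normalized bag-of-words vector instead of calling a real embedding model, the "vector database" is just an in-memory list, and the documents and function names are made up for illustration.

```python
import math
from collections import Counter

# Hypothetical documents; metadata is attached at ingestion time.
DOCS = [
    {"text": "Quarterly revenue grew because enterprise renewals closed early",
     "meta": {"title": "Q3 notes", "author": "ana"}},
    {"text": "The onboarding checklist covers laptops badges and payroll forms",
     "meta": {"title": "Onboarding", "author": "ben"}},
]


def chunk(text, size=8):
    """Split text into pieces of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text):
    """Toy stand-in for an embedding model: L2-normalized word counts."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {word: c / norm for word, c in counts.items()}


def cosine(a, b):
    """Dot product of two already-normalized sparse vectors."""
    return sum(weight * b.get(word, 0.0) for word, weight in a.items())


def ingest(docs):
    """Chunk and embed each document; metadata follows every chunk."""
    store = []
    for doc in docs:
        for piece in chunk(doc["text"]):
            store.append({"text": piece, "vector": embed(piece), "meta": doc["meta"]})
    return store


def search(store, query, author=None, k=1):
    """Filter on metadata first (cheap), then rank by similarity."""
    q = embed(query)
    candidates = [r for r in store if author is None or r["meta"]["author"] == author]
    ranked = sorted(candidates, key=lambda r: cosine(q, r["vector"]), reverse=True)
    return ranked[:k]


store = ingest(DOCS)
top = search(store, "enterprise renewals revenue")
print(top[0]["meta"]["title"])
```

Note how the `author` filter narrows the candidate set before any similarity is computed. That is the "cheap and fast filtering" role metadata plays at retrieval time.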
In a RAG pipeline, the tradeoffs we’d think about are:

- Chunking size: Do you chunk per token or per document? What happens when you need to re-chunk?
- Embedding models: Do you use open source (https://technically.dev/universe/open-source) or self-hosted embedding models, or just use OpenAI (https://technically.dev/posts/what-does-openai-do)? When do you switch to a new embedding model?
- Latency: Search demands speed. If you have a B2C product and the search sucks, users may bail.

If you want to go deeper, I highly recommend Notion’s blog (https://www.notion.com/blog/two-years-of-vector-search-at-notion) on vector search. The more I learned about RAG pipelines, the more they looked like the same pipeline problem.

The Sales Pipeline

I work as an AE at OpenRouter, which means I manage a sales pipeline. My sales pipeline looks eerily similar to a data pipeline:

- Bronze layer: leads in my CRM without any enrichment. They could be cold leads or warm leads, but I have no idea. Some are missing job titles, phone numbers or, worse, their name.
- Silver layer: qualified leads. I know about them and they know about me too, how wonderful! I’m still qualifying them, but at least they talk to me.
- Gold layer: This is where the magic happens, where those companies and leads become opportunities. All that work to build a business case has paid off here in the gold layer. Now I can go and close them as customers.

I’ve simplified the hell out of this for the bit, but I think it fits.

So where do tradeoffs come into play here? They show up in the small decisions I make every day. They show up when I’m trying to protect my customer engineer’s time because the prospect sounds flaky. And they most certainly show up when I need to forecast my sales for the quarter. We need to know whether a deal is moving forward or being left behind.

The Photography Pipeline

Years ago I got into photography and started doing paid shoots on the side.
I used to just take photos on my phone and then edit them in the VSCO app, but when I started doing paid shoots I upgraded to a full-frame Sony A7 camera and shot everything in RAW format.

Photo editing is also a pipeline. First, I’d take lots (hundreds) of photos, way more than I’d need. Then I’d edit in layers:

- Bronze layer: filter through them and make selections.
- Silver layer: play with exposures, shadow details, and more.
- Gold layer: once the composition was right, I’d put the finishing touches on to get them delivery-ready.

What you See is Not What you Get

I’ve always felt I had a strong knack for drawing patterns and connections, but really I think it comes down to how I simplify my approach and layer in a framework. In this case I drew a lot of connections by looking at these systems as pipelines. Pipelines have workflow stages, and there are inherent tradeoffs in deciding which stage gets which work. The order of operations matters.

What’s fun about pipeline thinking is that you can break any process down into a pipeline, and play with where to draw lines between stages.

But a pipeline is just one mental model. What’s your favorite? If LLMs went away tomorrow, what framework would you fall back on to describe how things work?