From a Simple Web App to a Production-Style Platform: My DevOps Learning Journey

A developer transformed a simple web app called SystemCraft into a production-style platform by adopting Docker, Kubernetes, Helm, ArgoCD, Prometheus, and security scanning with Trivy. The project evolved from a system design interview tool into a full DevOps learning journey, emphasizing automation, GitOps, and monitoring.

When I started building SystemCraft, my goal wasn't to learn Kubernetes, GitOps, monitoring, or cloud-native architecture. I just wanted to build a system design interview platform.

Fast forward a few months, and that simple web application evolved into something much bigger:

- CI/CD Pipelines
- Dockerized Deployments
- Kubernetes
- Helm Charts
- ArgoCD GitOps
- Prometheus Monitoring
- Grafana Dashboards
- AlertManager
- Auto Scaling
- Security Scanning

This article is the story of how that happened and what I learned along the way.

The Original Idea

SystemCraft was designed to solve a problem I noticed while preparing for system design interviews. Most preparation resources are passive:

- Reading blogs
- Watching videos
- Looking at architecture diagrams

But real system design interviews are interactive. You need to make decisions, justify trade-offs, adapt to changing requirements, and explain your reasoning.

I wanted to create a platform where engineers could:

- Design architectures visually
- Receive AI-powered feedback
- Simulate real interview scenarios
- Learn through iteration

The first version was straightforward:

Next.js → MongoDB → Gemini API

The Docker Phase

My first step was containerization. I created a Dockerfile and containerized the entire application.

At first, I thought Docker was the hard part. I quickly learned it wasn't. Building containers is easy. Operating containers reliably is the real challenge.

Questions started appearing:

- How do I deploy updates?
- How do I manage multiple replicas?
- How do I scale?
- How do I monitor failures?

Docker solved packaging.
It didn't solve operations.

Building a Real CI Pipeline

The next step was automation. I didn't want deployments to depend on manual commands. I created a GitHub Actions pipeline that would automatically run:

Lint & Typecheck → Playwright E2E Tests → Docker Build → Trivy Security Scan → Kubernetes Validation → Deployment

One lesson became obvious: automation isn't about speed. It's about consistency. The pipeline catches mistakes long before they reach production.

Security Wasn't Optional

One of the most valuable additions was Trivy. Initially I wasn't thinking much about container security. Then I started scanning images and realized how many vulnerabilities can exist inside dependencies you didn't even know you had.

Every build now goes through:

Docker Build → Trivy Scan → Deployment

This simple addition completely changed how I think about shipping software.

Enter Kubernetes

Eventually a single container stopped being enough. I wanted:

- Multiple replicas
- Self-healing workloads
- Rolling updates
- Horizontal scaling

Kubernetes provided all of that. But Kubernetes introduced new challenges:

- YAML management
- Service discovery
- Resource limits
- Health checks
- Configuration management

The complexity increased significantly. At the same time, I started understanding why Kubernetes became the industry standard.

Helm Changed Everything

Managing raw Kubernetes manifests quickly became painful. I introduced Helm charts to template deployments and environments. Instead of maintaining multiple copies of manifests, I could parameterize everything:

- Image versions
- Replica counts
- Resource limits
- Environment variables

Deployment became much more manageable.

Discovering GitOps with ArgoCD

This was probably the biggest mindset shift. Originally deployment looked like:

GitHub Actions → kubectl apply

After learning GitOps:

Git Commit → Git Repository → ArgoCD → Kubernetes Cluster

The cluster state became fully declarative. Git became the source of truth. Rollback became dramatically easier.
Auditing changes became trivial. I finally understood why so many engineering teams are adopting GitOps workflows.

Monitoring: The Missing Piece

For a long time I only cared whether the application worked. Then I realized: if something breaks in production, how would I know?

That question led me to Prometheus and Grafana. I instrumented the application and started tracking:

- API latency
- Request volume
- Error rates
- Resource utilization
- Application health

Suddenly I could see what the system was actually doing. Monitoring transformed troubleshooting from guessing into observing.

Adding Alerting

Monitoring is useful. Alerting is essential. I integrated AlertManager so that operational issues could be detected automatically. This forced me to think about:

- Error thresholds
- SLOs
- Availability targets
- Incident response

These were topics I had previously associated only with large companies.

Testing Scalability

Eventually I wanted to understand how the platform behaved under load. I simulated 500 concurrent users. The results were revealing.

Single Container

| Metric | Value |
| --- | --- |
| Requests | 23,381 |
| Throughput | ~155 req/s |
| P95 Latency | 3.33s |

The Node.js process became saturated, and performance degraded rapidly.

Kubernetes with HPA

| Metric | Value |
| --- | --- |
| Requests | 61,026 |
| Throughput | ~351 req/s |
| P95 Latency | 861ms |

By distributing traffic across multiple pods, P95 latency dropped from 3.33s to 861ms while throughput more than doubled, from ~155 to ~351 req/s. This was the first time I could actually see the benefits of horizontal scaling in practice.

Current Architecture

Today the deployment flow looks like this:

Developer → GitHub → GitHub Actions → Docker Build → Trivy Scan → GHCR → ArgoCD → Kubernetes → Prometheus → Grafana → AlertManager

What started as a simple web application became a complete cloud-native platform.

What I Learned

A few lessons stood out throughout this journey.
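The scalability comparison from the load test can be made concrete with a little arithmetic. The sketch below is only an illustration in Python: the `LoadTestResult` class and `compare` function are hypothetical names introduced here, not part of SystemCraft, and the only inputs are the figures reported in the two result tables. The "implied duration" is a rough estimate that assumes the reported throughput is an average over the whole run.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoadTestResult:
    """Summary figures for one load-test run (hypothetical helper type)."""

    name: str
    requests: int          # total completed requests
    throughput_rps: float  # approximate requests per second
    p95_latency_s: float   # 95th percentile latency, in seconds

    def implied_duration_s(self) -> float:
        # Rough run length implied by the totals, assuming the reported
        # throughput is an average over the whole run.
        return self.requests / self.throughput_rps


def compare(baseline: LoadTestResult, candidate: LoadTestResult) -> dict:
    """Return relative improvements of candidate over baseline."""
    return {
        "throughput_ratio": candidate.throughput_rps / baseline.throughput_rps,
        "p95_reduction": 1 - candidate.p95_latency_s / baseline.p95_latency_s,
    }


# Figures copied from the two result tables (500 simulated concurrent users).
single = LoadTestResult("Single Container", 23_381, 155.0, 3.33)
hpa = LoadTestResult("Kubernetes with HPA", 61_026, 351.0, 0.861)
summary = compare(single, hpa)

if __name__ == "__main__":
    print(f"Throughput ratio: {summary['throughput_ratio']:.2f}x")
    print(f"P95 latency reduction: {summary['p95_reduction']:.0%}")
    for run in (single, hpa):
        print(f"{run.name}: ~{run.implied_duration_s():.0f}s implied duration")
```

Keeping the raw figures in one small structure makes it easy to add later runs (for example, a different replica ceiling) and compare them against the same baseline.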
What's Next

The next phase of my learning journey involves:

- AWS
- Terraform
- Infrastructure as Code
- Distributed Load Testing
- Platform Engineering

I'm currently building an open-source load testing tool called Loadster, inspired by the challenges I encountered while testing SystemCraft.

Check out the live site: https://system-craft-kohl.vercel.app/

If you liked the article, make sure to drop a like, and maybe check out the GitHub repo and contribute to help make it even better.