{"slug": "rag-architecture-with-n8n-postgresql-pgvector-ollama-gemma4-on-aws-ec2", "title": "RAG Architecture with n8n + PostgreSQL (pgvector) + Ollama Gemma4 on AWS EC2", "summary": "This article details a Retrieval-Augmented Generation (RAG) architecture built on AWS EC2 that uses n8n for workflow orchestration, PostgreSQL with pgvector for vector storage, and Ollama to run the Gemma4 model. The system automatically ingests and processes documents like emails and PDFs by extracting text, creating embeddings, and storing them for semantic search, enabling Gemma4 to generate AI answers grounded in private enterprise data. The architecture is split across two EC2 instances to separate orchestration from AI workloads, improving scalability and security for business applications such as customer support and knowledge management.", "body_md": "This is a submission for the Gemma 4 Challenge: Write About Gemma 4\nA Complete Enterprise AI Knowledge Retrieval Architecture for Private Document Intelligence\nSummary\nThis article explains a Retrieval-Augmented Generation (RAG) architecture using n8n, PostgreSQL with pgvector, Ollama, and Gemma4 running on AWS EC2. The platform automatically ingests emails and PDFs, creates embeddings, stores vectors in PostgreSQL, and retrieves contextual information to generate AI answers grounded in enterprise data.\nContent\nYou can view a video of this Gemma4 architecture here:\nhttps://www.youtube.com/watch?v=bTP-sNKlsxc\nRAG architectures combine vector search with large language models. In this solution, n8n orchestrates ingestion workflows and query processing. Emails and PDF documents are read automatically, text is extracted and cleaned, then split into semantic chunks. The chunks are embedded using the nomic-embed-text model and stored in PostgreSQL pgvector. When users ask questions, the question is embedded and compared against stored vectors to retrieve the most relevant chunks. 
Gemma4 then generates a final response using retrieved context.\nThe architecture uses two AWS EC2 instances. The first server hosts n8n, PostgreSQL, Docker, and orchestration services. The second server hosts Ollama, Gemma4, and the embedding model. This separation improves scalability and isolates AI workloads from orchestration tasks.\nDocker containers simplify deployment and maintenance. PostgreSQL with pgvector enables semantic similarity search directly inside the relational database. This architecture is modular and can evolve with future embedding models and LLM technologies.\nBusiness Applications\n1. Customer Support AI\nSupport teams can query manuals, troubleshooting guides, and ticket histories using natural language to accelerate customer assistance.\n2. Enterprise Knowledge Management\nOrganizations can centralize contracts, policies, reports, and procedures into an intelligent AI search platform.\n3. Financial Analytics\nExecutives can ask natural language questions about sales trends, ERP reports, invoices, and operational metrics.\nTechnical Details\nInfrastructure Requirements:\nImplementation Details:\nn8n automates email ingestion, PDF extraction, chunking, embedding generation, and vector storage. Chunk sizes around 1000 characters with overlap improve semantic retrieval. PostgreSQL pgvector performs cosine similarity searches. 
Gemma4 receives contextual prompts generated from retrieved chunks.\nSecurity and Networking:\nUse HTTPS with reverse proxies, encrypted EBS volumes, private networking between EC2 instances, and restricted security groups to protect sensitive enterprise data.\nEstimated Costs:\nFigure 1: Architecture with two AWS EC2 instances, one running Gemma4 and the other running n8n\nCOMPLETE STEP-BY-STEP FLOW\nSTEP 1 — User Sends Data Into the System\nAccording to the infographic:\nEmails with PDFs are read on EC2 #1.\nThis is the beginning of the Ingestion Workflow.\nThe company may receive:\ninvoices\nmanuals\ncontracts\nreports\ncustomer emails\ntechnical documentation\nsupport tickets\nn8n can automatically monitor sources such as:\nIMAP mailboxes\nfolders\nAPIs\nSharePoint\nGoogle Drive\nCRMs\nERPs\nWhat n8n Does\nn8n acts as the automation orchestrator.\nExample:\nNew email arrives\nn8n detects it\nDownloads PDF attachment\nStarts the AI pipeline automatically\nNo human intervention is required.\nSTEP 2 — PDF Text Extraction\nThe infographic shows:\nExtract PDF Text\nAt this stage:\nPDFs are parsed\ntext is extracted\nmetadata is collected\nMetadata may include:\nsender\ndate\nfilename\ndocument type\ndepartment\ncustomer ID\nWhy This Matters\nLLMs cannot directly understand PDFs.\nThe system must convert documents into raw text before AI processing.\nExample:\nA 200-page manual becomes machine-readable text.\nSTEP 3 — Clean and Normalize Text\nThe infographic shows:\nClean & Normalize Text\nRaw PDF extraction is usually messy.\nProblems include:\nbroken lines\nduplicated spaces\nheaders/footers\npage numbers\nencoding problems\ntables split incorrectly\nn8n cleans the content using scripts or functions.\nExample\nBefore cleaning:\nInvoice #2939\nCustomer:\nACME Corp\nPage 1\nAfter cleaning:\nInvoice #2939 Customer: ACME Corp\nSTEP 4 — Chunking the Text\nThe infographic shows:\nChunk Text\nThis is one of the MOST important steps in RAG.\nWhy Chunking Is Necessary\nLLMs have token limits.\nA 
500-page document cannot be sent entirely to the model.\nSo the document is split into smaller pieces called:\nChunks\nExample chunk size:\n1000 characters\n200 characters of overlap\nWhat Overlap Means\nSuppose chunk #1 ends with:\nThe warranty expires after...\nChunk #2 begins with:\n...after 24 months of operation.\nOverlap preserves semantic continuity.\nWithout overlap:\ninformation can be lost\nmeaning breaks between chunks\nSTEP 5 — Create Embeddings\nThe infographic shows:\nPrepare Embedding Payload\nCall Embedding Server\nThis is where semantic AI begins.\nWhat Is an Embedding?\nAn embedding is a numeric vector that represents a piece of text.\nThe embedding model maps texts with similar meaning to nearby vectors.\nExample:\n\"car\"\nand\n\"vehicle\"\ngenerate similar vectors.\nEmbedding Process\nChunk text is sent from EC2 #1 to EC2 #2.\nThe embedding server uses:\nnomic-embed-text\nto transform text into vectors.\nExample:\n[0.023, -0.991, 0.224, ...]\nThese vectors may contain:\n768 dimensions\n1024 dimensions\n1536 dimensions\ndepending on the model (nomic-embed-text produces 768-dimensional vectors).\nWhy Embeddings Are Powerful\nTraditional search uses keywords.\nEmbeddings use:\nSemantic Meaning\nThis means users can ask:\nHow long is the warranty?\nEven if the document says:\nCoverage remains valid for 24 months.\nThe system still finds the answer.\nSTEP 6 — Store Embeddings in PostgreSQL pgvector\nThe infographic shows:\nStore Embeddings in PostgreSQL (pgvector)\nNow the vectors are saved in PostgreSQL.\nWhat Is pgvector?\npgvector is an extension for PostgreSQL that adds:\nvector storage\nsimilarity search\nAI search capabilities\nExample table:\n| id | chunk_text | embedding |\n| --- | --- | --- |\n| 1 | warranty info | [0.12, ...] |\nWhy PostgreSQL Is Used\nAdvantages:\nmature database\nACID compliance\nreliability\nbackups\nindexing\nSQL support\nenterprise-ready\nInstead of needing a separate vector DB like Pinecone or Weaviate, pgvector keeps everything inside PostgreSQL.\nSTEP 7 — User Asks a Question\nThe infographic says:\nUser sends question (HTTPS POST /ask)\nA user 
opens the web interface and types:\nWhat is the warranty period for industrial pumps?\nThe question goes to EC2 #1.\nSTEP 8 — Create Embedding for the Question\nThe infographic shows:\nCreate Embedding for Question\nThe question itself is transformed into a vector using the SAME embedding model.\nThis is critical.\nIf documents and questions use different embedding models, their vectors are not comparable:\nsimilarity breaks\nretrieval quality drops\nSTEP 9 — Vector Similarity Search\nThe infographic shows:\nVector Search in PostgreSQL\nThis is the core of RAG.\nHow Similarity Search Works\nWithout an index, the question vector is compared against ALL stored chunk vectors; pgvector also supports approximate indexes (HNSW, IVFFlat) that avoid scanning every row.\nThe comparison can use:\ncosine similarity\nEuclidean distance\ninner product\nPostgreSQL finds the chunks mathematically closest in meaning.\nExample\nUser asks:\nHow many vacation days do employees receive?\nThe database may retrieve:\nEmployees are entitled to 15 annual leave days.\neven without keyword matching.\nSTEP 10 — Build Context\nThe infographic shows:\nBuild Context\nThe best-matching chunks are combined.\nExample:\nChunk 1: vacation policy\nChunk 2: HR policy\nChunk 3: employment handbook\nThe system assembles them into context.\nWhy Context Is Critical\nLLMs tend to hallucinate when they lack the relevant information.\nRAG reduces hallucinations by supplying:\nRelevant Ground Truth Data\nThe model answers from company knowledge.\nSTEP 11 — Build Prompt for Gemma4\nThe infographic shows:\nBuild Prompt for Gemma4\nA structured prompt is generated.\nExample:\nYou are an enterprise assistant.\nAnswer ONLY using the provided context.\nContext:\n[retrieved chunks]\nQuestion:\nHow many vacation days do employees receive?\nThis prompt engineering layer is extremely important.\nSTEP 12 — Send to Gemma4 via Ollama\nThe infographic shows:\nSend to Gemma4 (EC2 #2)\nThe prompt is sent to Ollama.\nOllama exposes HTTP APIs such as the OpenAI-compatible endpoint:\n/v1/chat/completions\nGemma4 processes:\ncontext\ninstructions\nuser question\nThen generates the final response.\nWhy Ollama Is 
Important\nOllama simplifies:\nlocal LLM serving\nmodel management\nGPU usage\nAPI exposure\nWithout Ollama:\nrunning LLMs locally is much harder.\nSTEP 13 — Return Answer to User\nThe infographic ends with:\nAnswer is returned to the user\nThe final answer travels back:\nGemma4 → EC2 #1 → Web Client\nThe user receives a grounded response.\nExample:\nEmployees receive 15 vacation days annually after completing one year of employment.\nWhy This Architecture Is Powerful\nThis architecture creates:\nPrivate AI\nData stays inside AWS infrastructure.\nSemantic Search\nSearches by meaning, not keywords.\nScalable AI\nYou can scale:\ndatabase\nworkflows\nLLM server\nembedding server\nindependently.\nConclusions\nThis RAG architecture demonstrates how organizations can build private and scalable AI systems using open-source technologies. The combination of n8n, PostgreSQL pgvector, Ollama, and Gemma4 enables intelligent retrieval of enterprise knowledge while maintaining full infrastructure control. 
The modular design supports future scalability, model upgrades, and advanced AI workflows.", "url": "https://wpnews.pro/news/rag-architecture-with-n8n-postgresql-pgvector-ollama-gemma4-on-aws-ec2", "canonical_source": "https://dev.to/fernando77/rag-architecture-with-n8n-postgresql-pgvector-ollama-gemma4-on-aws-ec2-1ke", "published_at": "2026-05-22 21:31:12+00:00", "updated_at": "2026-05-22 22:03:40.811495+00:00", "lang": "en", "topics": ["artificial-intelligence", "machine-learning", "large-language-models", "open-source", "enterprise-software"], "entities": ["n8n", "PostgreSQL", "pgvector", "Ollama", "Gemma4", "AWS EC2", "nomic-embed-text", "Docker"], "alternates": {"html": "https://wpnews.pro/news/rag-architecture-with-n8n-postgresql-pgvector-ollama-gemma4-on-aws-ec2", "markdown": "https://wpnews.pro/news/rag-architecture-with-n8n-postgresql-pgvector-ollama-gemma4-on-aws-ec2.md", "text": "https://wpnews.pro/news/rag-architecture-with-n8n-postgresql-pgvector-ollama-gemma4-on-aws-ec2.txt", "jsonld": "https://wpnews.pro/news/rag-architecture-with-n8n-postgresql-pgvector-ollama-gemma4-on-aws-ec2.jsonld"}}