Build a RAG System with Python and OpenAI


🚀 Technical Briefing: This tutorial is part of our deep-dive series on Agentic Workflows at [Gate of AI]. For the full technical breakdown, interactive code sandbox, and the native Arabic translation, visit the [original article here].

<span>Tutorial</span>
<span>Intermediate</span>
<span>⏱ 60 min read</span>
<span>© Gate of AI 2026-06-15</span>

In this tutorial, you will learn how to build a powerful Retrieval-Augmented Generation (RAG) system using Python and OpenAI's latest SDK. This system will enhance your language model's responses by grounding them in relevant data, with a focus on applications in the GCC region.

The system combines the generative strengths of a large language model with the precision of targeted retrieval: it fetches relevant passages from a specified dataset and uses them to generate more accurate, contextually grounded responses. This is particularly relevant in the GCC region, where initiatives such as Saudi Vision 2030 emphasize AI integration.

The finished project will allow you to input a query, retrieve pertinent data from your database, and then produce a response that integrates this data using OpenAI's GPT model. This setup is ideal for applications such as customer support, educational tools, or any context where accurate and informed responses are crucial.

To start building our RAG system, we need to set up our development environment with the necessary tools and libraries. This includes installing the OpenAI SDK and setting up a vector database for data retrieval.

pip install openai pinecone

We will also need environment variables to store our API keys and other configuration settings securely. Create a .env file in your project directory with the following content:

OPENAI_API_KEY=your_openai_api_key_here
PINECONE_API_KEY=your_pinecone_api_key_here
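Note that os.getenv only sees variables that are already in the process environment, so the .env file has to be loaded first. The following is a minimal standard-library sketch of one way to do that, assuming the file contains only simple KEY=value lines. The helpers parse_env and load_env_file are hypothetical names introduced here for illustration; they are not part of the OpenAI or Pinecone SDKs, and a dedicated package such as python-dotenv handles quoting and other edge cases this sketch ignores.

```python
import os


def parse_env(text):
    """Parse simple KEY=value lines into a dict (no quoting support)."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip()
    return values


def load_env_file(path=".env"):
    """Load a .env file into os.environ without overriding existing values."""
    with open(path, encoding="utf-8") as handle:
        values = parse_env(handle.read())
    for key, value in values.items():
        os.environ.setdefault(key, value)
    return values
```

Calling load_env_file() once at the top of the script, before any os.getenv call, would make both keys available to the rest of the code.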

The vector database is central to our RAG system, as it allows us to perform efficient similarity searches. We will use Pinecone, a leading vector search engine, to store and retrieve data based on similarity to our input queries.

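import os

from pinecone import Pinecone, ServerlessSpec

# Initialize Pinecone client
pc = Pinecone(api_key=os.getenv('PINECONE_API_KEY'))

# Create the index in Pinecone if it does not exist yet.
# The cloud and region values are assumptions; use the ones for your project.
if 'document-index' not in pc.list_indexes().names():
    pc.create_index(
        name='document-index',
        dimension=512,
        metric='cosine',
        spec=ServerlessSpec(cloud='aws', region='us-east-1'),
    )

# Connect to the index
index = pc.Index('document-index')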

Here we initialize a Pinecone client using the API key from the environment, create an index for our documents with a vector dimensionality of 512 if it does not already exist, and then connect to it. The index dimension must match the length of the embedding vectors we store, otherwise upserts and queries will be rejected.
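To make the idea of a similarity search concrete, here is a small pure-Python sketch of what an index does conceptually: score every stored vector against the query and return the closest ones. It is a brute-force illustration under simplifying assumptions, not how Pinecone is implemented; the function names and the 3-dimensional toy vectors are made up for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vector, stored, k=3):
    """Return the ids of the k stored vectors most similar to the query.

    `stored` maps an id to its vector. This linear scan stands in for the
    approximate search a vector database performs at scale.
    """
    scored = sorted(
        stored.items(),
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in scored[:k]]


# Toy vectors for illustration only (not real embeddings)
stored = {"a": [1.0, 0.0, 0.0], "b": [0.0, 1.0, 0.0], "c": [0.9, 0.1, 0.0]}
print(top_k([1.0, 0.0, 0.0], stored, k=2))  # prints ['a', 'c']
```

With the cosine metric, vectors pointing in nearly the same direction score close to 1, which is why "c" ranks above "b" for this query.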

With our database schema ready, we can now ingest data into Pinecone. This involves adding documents that the system will later retrieve and use to augment its responses.

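import os
from openai import OpenAI

# OpenAI client used to embed the documents
client = OpenAI(api_key=os.getenv('OPENAI_API_KEY'))

documents = [
    {"content": "OpenAI develops AI technologies and models for various applications."},
    {"content": "Pinecone is a leading vector search engine."},
    {"content": "Retrieval-Augmented Generation enhances language model outputs."}
]

# Add documents to Pinecone: embed each text and keep the text as metadata.
# The embedding model is an assumption; dimensions=512 matches the index.
for i, doc in enumerate(documents):
    embedding = client.embeddings.create(
        model='text-embedding-3-small', input=doc['content'], dimensions=512
    ).data[0].embedding
    index.upsert(vectors=[(f'doc-{i}', embedding, {'content': doc['content']})])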

This code snippet loops through the list of documents, converts each one into an embedding vector, and upserts it into the Pinecone index under a unique ID, with the original text stored as metadata. These documents will be used during the retrieval phase to provide contextually relevant information to our language model.
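The shaping of records for ingestion can be checked without any network calls by passing the embedding function in as a parameter. The sketch below assumes the record layout with id, values, and metadata keys that Pinecone's upsert accepts; build_records and fake_embed are hypothetical names for illustration, and fake_embed is only a stand-in that produces meaningless vectors.

```python
def build_records(documents, embed_fn, id_prefix="doc"):
    """Turn {"content": ...} documents into id/values/metadata records."""
    records = []
    for position, doc in enumerate(documents):
        text = doc["content"]
        records.append(
            {
                "id": f"{id_prefix}-{position}",
                "values": embed_fn(text),
                # Keep the original text so it can be returned at query time
                "metadata": {"content": text},
            }
        )
    return records


def fake_embed(text):
    """Stand-in for offline testing: NOT a real embedding."""
    return [float(len(text)), float(text.count(" "))]


records = build_records(
    [{"content": "Pinecone is a vector search engine."}], fake_embed
)
print(records[0]["id"], records[0]["metadata"]["content"])
```

In a real run, embed_fn would wrap the OpenAI embeddings call, and the resulting list could be passed to a single upsert instead of one call per document.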

Now that our data is stored, we can construct the core of the RAG system. This involves querying the vector database to retrieve relevant documents and using the OpenAI API to generate a response based on these documents.

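import os

from openai import OpenAI

# Initialize OpenAI client
client = OpenAI(api_key=os.getenv('OPENAI_API_KEY'))

def generate_response(query):
    # Embed the query the same way the documents were embedded
    # (model name and dimensions=512 are assumptions matching the index)
    embedding = client.embeddings.create(
        model='text-embedding-3-small', input=query, dimensions=512
    ).data[0].embedding

    # Retrieve the 3 most similar documents, including their stored text
    result = index.query(vector=embedding, top_k=3, include_metadata=True)
    retrieved_texts = [match['metadata']['content'] for match in result['matches']]

    prompt = f"Using the following information, answer the query: {query}\n" + "\n".join(retrieved_texts)

    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}]
    )

    # The current SDK returns objects, so use attribute access
    return response.choices[0].message.content

# Example usage
query = "What is RAG in AI?"
response = generate_response(query)
print(response)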

In this step, we define a function generate_response that takes a user's query as input. It embeds the query, retrieves the top 3 most relevant documents from Pinecone, and constructs a prompt that includes their text. This prompt is then sent to the OpenAI GPT model to generate a coherent response. The function returns the generated response, which can be printed or used in your application.
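Prompt construction is the one part of this step that can be developed and tested in isolation. The sketch below shows one possible way to assemble a grounded prompt, with numbered passages and an instruction to stay within the context. The function name build_prompt, the wording of the instruction, and the max_chars budget are illustrative assumptions, not requirements of the OpenAI API.

```python
def build_prompt(query, retrieved_texts, max_chars=4000):
    """Assemble a grounded prompt from the query and retrieved passages.

    `max_chars` is an illustrative budget, not an API limit; passages that
    would push the context past it are dropped.
    """
    context_lines = []
    used = 0
    for number, text in enumerate(retrieved_texts, start=1):
        line = f"[{number}] {text}"
        if used + len(line) > max_chars:
            break
        context_lines.append(line)
        used += len(line)
    context = "\n".join(context_lines) if context_lines else "(no context retrieved)"
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


prompt = build_prompt(
    "What is RAG in AI?",
    ["Retrieval-Augmented Generation enhances language model outputs."],
)
print(prompt)
```

Numbering the passages makes it easier to ask the model to cite which passage supported an answer, and the explicit fallback instruction reduces the chance of an answer that ignores the retrieved data.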

⚠️ Common Mistake: Ensure your Pinecone client is correctly authenticated and your OpenAI API key is valid. Misconfiguration can lead to authentication errors.
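One way to catch this kind of misconfiguration early is to check that the variables are present before creating any client. The following is a minimal sketch; missing_env_vars and require_env are hypothetical helper names, and the check only confirms that a value is set, not that the key is actually valid.

```python
import os


def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the environment."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


def require_env(names, environ=None):
    """Raise a clear error before any API client is created."""
    missing = missing_env_vars(names, environ)
    if missing:
        raise RuntimeError(
            "Missing environment variables: " + ", ".join(missing)
        )
```

Calling require_env(["OPENAI_API_KEY", "PINECONE_API_KEY"]) at startup would turn a confusing authentication error later in the run into an immediate, readable message.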

To verify that your RAG system works correctly, you should test it with various queries and check that the responses are both relevant and accurate. The goal is to ensure that the retrieved documents genuinely enhance the language model's output.

  
  
# Test the system

test_queries = [
    "Explain the concept of RAG in AI.",
    "What is OpenAI known for?",
    "Describe Pinecone's functionality."
]

for query in test_queries:
    print(f"Query: {query}")
    response = generate_response(query)
    print(f"Response: {response}\n")

Run this test script to see how well your system performs. The responses should reflect the content of your stored documents and provide informative answers to the queries.
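Reading the printed responses by eye does not scale past a handful of queries. A small harness like the sketch below could automate a first pass by checking that each answer mentions the terms you expect. Keyword matching is only a crude proxy for relevance and says nothing about factual accuracy; check_responses and stub_generate are hypothetical names, and the stub stands in for generate_response so the example runs offline.

```python
def check_responses(cases, generate_fn):
    """Run each query and report which expected keywords are missing.

    `cases` is a list of (query, expected_keywords) pairs.
    """
    report = []
    for query, keywords in cases:
        answer = generate_fn(query).lower()
        missing = [word for word in keywords if word.lower() not in answer]
        report.append({"query": query, "passed": not missing, "missing": missing})
    return report


def stub_generate(query):
    """Stand-in for generate_response so the harness runs without API calls."""
    return "Pinecone is a vector search engine."


report = check_responses(
    [
        ("Describe Pinecone's functionality.", ["Pinecone", "vector"]),
        ("What is OpenAI known for?", ["OpenAI"]),
    ],
    stub_generate,
)
for row in report:
    print(row)
```

Passing generate_response instead of stub_generate would run the same checks against the live system.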

Here are a few ideas for expanding the capabilities of your RAG system:

source & further reading

dev.to (original article)
