{"slug": "getting-started-with-vector-databases-using-amazon-aurora-postgresql-pgvector", "title": "Getting Started with Vector Databases Using Amazon Aurora PostgreSQL + pgvector", "summary": "Satoshi Kaneyasu, a DevOps engineer at Serverworks, has published a guide explaining vector databases and their implementation using Amazon Aurora PostgreSQL with the pgvector extension. The tutorial covers how vector databases store data as multidimensional arrays and perform semantic similarity searches, contrasting them with traditional relational databases that rely on exact or partial text matching. The guide also details common use cases including RAG (Retrieval-Augmented Generation), semantic search, recommendation systems, and image search, while explaining the vectorization process that converts both stored data and search queries into numerical representations.", "body_md": "Hello!\n\nI'm Satoshi Kaneyasu, DevOps engineer at Serverworks.\n\nIn this article, I'll introduce the basic concepts and terminology of vector databases for those who are just starting to learn about them.\n\nThis article is aimed at beginners to vector databases.\n\nYou may have heard that vector databases are related to LLMs and RAG, but aren't quite sure what they actually are.\n\nThink of this as written with that kind of reader in mind.\n\nA vector database is a database that stores data as vectors (arrays of numbers) and searches for data using \"distance\" or \"similarity\" between vectors.\n\nTraditional relational databases search for data using \"exact match\" or \"partial match\" (LIKE queries), but vector databases can search for things that are **semantically similar**.\n\nFor example, searching for \"weather in Tokyo\" might return results like \"temperature in Tokyo\" or \"weather conditions in Kanto\" — data that differs as a string but is semantically related.\n\nIn a vector database, all data is represented as points in a multidimensional space. 
When searching, the query is also converted into a vector, and data that is \"close in distance\" within that space is retrieved.\n\nThis diagram represents it in two dimensions, but in a real vector database, proximity and distance are defined across many dimensions.\n\nVector databases are used across a wide range of applications:\n\n| Use Case | Description |\n|---|---|\n| RAG (Retrieval-Augmented Generation) | Knowledge base search to provide external knowledge to LLMs. Allows internal documents and up-to-date information to be reflected in LLM responses |\n| Semantic Search | Searching internal documents or FAQs by meaning rather than keywords. Handles spelling variations and synonyms |\n| Recommendation | Recommending products and content whose vectors are close to a user's preference vector. Used as an alternative or complement to collaborative filtering |\n| Image Search | Searching for similar images (face recognition, product image matching). Images are vectorized using an embedding model and compared |\n| Anomaly Detection | Detecting data that deviates far from the vector of normal patterns. Used in log analysis and security monitoring |\n| Duplicate Detection | Detecting similar documents or code. 
Used for plagiarism detection and content deduplication |\n\nThe most common use case is RAG.\n\nRAG (Retrieval-Augmented Generation) is a technique that improves LLM response accuracy by searching for relevant information from external data sources before generating a response, then including that information in the prompt.\n\nLLMs cannot accurately respond to information not included in their training data (internal documents, recent news, specialized technical information, etc.).\n\nWith RAG, you can have the LLM reference external knowledge stored in a vector database to generate more accurate and up-to-date responses.\n\nWhen using Amazon Bedrock to provide the LLM for RAG, there is a fully managed RAG feature called **Knowledge Bases**.\n\nWith Knowledge Bases, you simply register documents stored in S3 and AWS manages everything: vectorization, vector database setup, and search.\n\nSince you don't need to set up a vector database yourself, this is ideal when you want to try RAG quickly or minimize infrastructure management.\n\nSince this article focuses on the vector database itself, we'll proceed without using Knowledge Bases.\n\nThe RAG process follows this flow:\n\nAs you can see, the vector database plays a central role in RAG as the \"search engine for external knowledge.\"\n\nFrom here, let's dive deeper into the \"vector database search\" step.\n\nThe vector database search flow works as follows:\n\nIn a vector database, data is represented as multidimensional vectors of numbers.\n\nTherefore, data is converted to numbers when it is inserted, and search queries are converted to numbers when they are run.\n\nThis is called vectorization, or Embedding.\n\nThe key point of vector database search is that **the search query itself is also vectorized**.\n\nInstead of searching with raw text, it is converted to a vector using an embedding model (described later), and data that is close in vector space is retrieved.\n\nFrom here, I'll use implementation examples with Aurora PostgreSQL + pgvector (abbreviated throughout) 
and Python code.\n\nThere are multiple options for building a vector database on AWS, but I find Aurora PostgreSQL + pgvector to be the most approachable starting point, and it's a great way to feel the difference between a conventional relational database and a vector database.\n\nHere is an implementation example using Aurora PostgreSQL + pgvector:\n\n```\n# ① Vectorize the text query (handler.py)\nembedding_result = generate_embedding(query)\nquery_embedding = embedding_result.embedding  # 1024-dimensional vector\n\n# ② Search the DB with the vectorized query (logic.py)\nwith connection.cursor() as cur:\n    cur.execute(\n        # Calculate cosine distance between query vector and DB vectors,\n        # return top_k results in ascending distance order\n        \"SELECT content, embedding <=> %s::vector AS distance \"\n        \"FROM embeddings ORDER BY distance LIMIT %s;\",\n        (query_embedding, top_k),\n    )\n    results = cur.fetchall()\n```\n\nThe `<=>` operator here is pgvector's cosine distance operator.\n\nA smaller value means higher similarity.\n\nBecause we're using Aurora PostgreSQL + pgvector, we can use SQL to query the vector DB.\n\nThis code uses a parameterized query to safely pass the vectorized search text and the result count (top_k) into the `%s` placeholders.\n\nSeveral terms have appeared in this simple search, so let me explain them.\n\nEmbedding refers to the process of converting data such as text or images into a numerical vector.\n\nIt is also called \"vectorization.\"\n\nHumans intuitively know that \"Tokyo weather forecast\" and \"Tokyo temperature\" are similar, but computers can only compare strings.\n\nBy numerically representing meaning through embedding, computers can mathematically calculate \"semantic closeness.\"\n\n```\nBefore: \"Tokyo weather forecast\"\nAfter:  [0.0231, -0.0142, 0.0567, ..., 0.0412]  ← 1024 numbers\n```\n\nHere is an implementation example using Amazon Bedrock's Titan Embeddings V2.\n\nThe 
`generate_embedding` function implemented here is called at step ① in the \"Search Implementation Code\" above.\n\n``` python\nimport json\nimport time\n\n# EmbeddingResult and _get_bedrock_client are defined elsewhere in the project.\ndef generate_embedding(text: str) -> EmbeddingResult:\n    \"\"\"Vectorize text using Bedrock Titan Embeddings V2.\"\"\"\n    client = _get_bedrock_client()\n    body = json.dumps({\n        \"inputText\": text,        # Before: text\n        \"dimensions\": 1024,       # Output dimensions\n        \"normalize\": True,        # Normalize (set vector length to 1)\n    })\n\n    # Timing is an assumption here; the original measurement code is not shown.\n    start = time.perf_counter()\n    response = client.invoke_model(\n        modelId=\"amazon.titan-embed-text-v2:0\",\n        body=body,\n    )\n    elapsed_ms = (time.perf_counter() - start) * 1000\n\n    response_body = json.loads(response[\"body\"].read())\n    embedding = response_body[\"embedding\"]  # After: [float] × 1024\n    return EmbeddingResult(embedding=embedding, time_ms=elapsed_ms)\n```\n\nSpecifying `normalize=True` normalizes the output vector length to 1.\n\nThis makes cosine similarity calculation equivalent to a dot product calculation, improving search efficiency.\n\nIn the embedding implementation code, there was a keyword called \"dimensions.\"\n\nThe number of dimensions is how many numbers a single vector contains.\n\n```\n3-dimensional vector:    [0.5, -0.3, 0.8]           ← 3 numbers\n1024-dimensional vector: [0.023, -0.014, ..., 0.041] ← 1024 numbers\n```\n\nMore dimensions allow for finer representation of \"meaning,\" but storage consumption increases accordingly (the sizes below assume 4 bytes per value).\n\n| Dimensions | Size per vector | Size for 100k records |\n|---|---|---|\n| 256 | 1 KB | ~100 MB |\n| 1024 | 4 KB | ~400 MB |\n| 1536 | 6 KB | ~600 MB |\n| 3072 | 12 KB | ~1.2 GB |\n\nThe number of dimensions is determined by the embedding model you use. 
Titan Embeddings V2 lets you choose from 256, 512, or 1024, allowing you to balance accuracy and cost based on your use case.\n\nThe specialized models that convert text to vectors are called embedding models, and they are distinct from LLMs (generative models).\n\nEmbedding models specialize in generating representations for computing semantic similarity.\n\n| Model | Provider | Dimensions | Features |\n|---|---|---|---|\n| Titan Embeddings V2 | AWS Bedrock | 256/512/1024 | AWS native. Has normalization option. High affinity with AWS environments |\n| Cohere Embed v3 | AWS Bedrock | 1024 | Multilingual support. Evaluated as highly accurate for Japanese |\n| text-embedding-3-small | OpenAI | 256~1536 | Lightweight and low cost. Multilingual support. Best for cost-sensitive use cases |\n| text-embedding-3-large | OpenAI | 256~3072 | High accuracy and multilingual support. Flexible dimension selection |\n\nAn important note: **you must use the same model for both search and registration**.\n\nVectors generated by different models don't exist in the same space, so distance calculations are meaningless.\n\n[Amazon Titan Text Embeddings V2 - Bedrock Documentation](https://docs.aws.amazon.com/bedrock/latest/userguide/titan-embedding-models.html)\n\nCosine similarity represents \"how much two vectors point in the same direction\" as a number between -1 and 1.\n\nCloser to 1 means more semantically similar, closer to 0 means unrelated, and closer to -1 means semantically opposite.\n\nCosine distance is defined as `1 - cosine similarity` and ranges from 0 to 2.\n\nA smaller value means higher similarity, and pgvector's `<=>` operator returns this cosine distance.\n\n\"Distance\" and \"similarity\" are just opposite representations of the same concept.\n\n| Metric | Range | \"More similar\" direction | Use case |\n|---|---|---|---|\n| Cosine Similarity | -1 to 1 | Larger value (closer to 1) | Threshold judgment (e.g., \"hit if >= 0.95\") |\n| Cosine Distance | 0 to 2 | Smaller value (closer to 0) | ORDER BY in 
SQL, KNN search |\n\nThe search implementation code (`embedding <=> %s::vector`) sorts by cosine distance, while the threshold judgment in semantic cache (described later) (`similarity >= 0.95`) uses cosine similarity.\n\ntop_k is the number of highest-ranked results to return from a search. Set an appropriate value based on the use case.\n\nIn RAG, it is common to pass the full set of top_k results as context to the LLM.\n\nBe aware that making top_k too large will lengthen the context, increasing the LLM's token consumption and latency.\n\nNormalization is the process of setting the length (norm) of a vector to 1.\n\nWith Titan Embeddings V2, specifying `normalize=True` automatically normalizes the output vector.\n\nCosine similarity between normalized vectors becomes equivalent to a simple dot product.\n\nSince dot products have lower computational cost than cosine similarity, this leads to more efficient search.\n\nAlso, by standardizing vector lengths, distance comparisons purely reflect \"differences in direction,\" which stabilizes search result quality.\n\nOf course, data must be registered in advance before you can search a vector database.\n\nLet's now look at data registration in a vector database.\n\nData registration in a vector database follows this flow:\n\nAs with the search explanation, I'll use implementation examples with Aurora PostgreSQL + pgvector and Python.\n\nThe following table and index are created on Aurora PostgreSQL with the pgvector extension enabled:\n\n```\n-- Enable pgvector extension\nCREATE EXTENSION IF NOT EXISTS vector;\n\n-- embeddings table (storage for vector data)\nCREATE TABLE IF NOT EXISTS embeddings (\n    id SERIAL PRIMARY KEY,\n    content TEXT NOT NULL,\n    embedding vector(1024) NOT NULL\n);\n\n-- HNSW index (speeds up ANN search)\nCREATE INDEX IF NOT EXISTS idx_embeddings_embedding\n    ON embeddings\n    USING hnsw (embedding vector_cosine_ops)\n    WITH (m = 16, ef_construction = 64);\n```\n\nThe 
`content` column in the `embeddings` table stores the text data, and the `embedding` column stores the vectorized text.\n\nAn HNSW index is then created on the `embedding` column.\n\nVector databases have indexes too, and in Aurora PostgreSQL + pgvector, you create indexes with the `CREATE INDEX` statement just like regular indexes.\n\nHere, `ON embeddings USING hnsw` specifies something called the index algorithm.\n\nThe index algorithm is closely related to the search algorithm, and these two algorithms are critical in vector databases.\n\nThere are two main types of search methods in vector databases:\n\n| Search Method | Full Name | Features |\n|---|---|---|\n| KNN | K-Nearest Neighbor | Compares against all data exhaustively. Accuracy is perfect but computation cost increases linearly as data grows, making it slow |\n| ANN | Approximate Nearest Neighbor | Searches approximately. Slightly lower accuracy but can search at high speed even with large volumes of data |\n\nIn practical systems, ANN is almost always used.\n\nKNN is fine for small-scale data of a few thousand records, but ANN becomes essential when dealing with tens of thousands of records or more.\n\nThe data structures used to implement ANN are called index algorithms, and there are several types:\n\n| Algorithm | Mechanism | Features |\n|---|---|---|\n| HNSW | Builds a hierarchical graph structure and progressively narrows the search range from upper to lower layers | High accuracy and high speed. Higher memory consumption but currently the most widely used |\n| IVF | Clusters data and performs partial search only on clusters close to the query | Memory-efficient. 
Suitable for large-scale data but may have lower accuracy than HNSW |\n\nCurrently, the **ANN + HNSW** combination is the standard for building vector databases.\n\nAWS offers multiple ways to build vector databases, and Aurora PostgreSQL + pgvector, OpenSearch, and MemoryDB all support HNSW.\n\n```\n-- HNSW index (speeds up ANN search)\nCREATE INDEX IF NOT EXISTS idx_embeddings_embedding\n    ON embeddings\n    USING hnsw (embedding vector_cosine_ops)\n    WITH (m = 16, ef_construction = 64);\n```\n\nThe WITH clause in the index creation SQL specifies the HNSW index parameters:\n\n| Parameter | Meaning | Effect when increased | Typical value |\n|---|---|---|---|\n| m | Connections per node | Search accuracy ↑ / Memory consumption ↑ / Build time ↑ | 16 |\n| ef_construction | Search width during construction | Search accuracy ↑ / Build time ↑ | 64~200 |\n\nHere is the Python code to register a substantial amount of data into Aurora PostgreSQL + pgvector:\n\n```\nclass AuroraIngester:\n    \"\"\"Batch INSERT data into Aurora pgvector.\n\n    Efficiently inserts vector data using batch INSERT of 500 records at a time.\n    \"\"\"\n    def __init__(self, connection: psycopg2.extensions.connection) -> None:\n        self._connection = connection\n\n    def ingest_batch(self, start_index: int, end_index: int) -> int:\n        \"\"\"Batch INSERT records in the specified range.\n\n        Args:\n            start_index: Start index (inclusive)\n            end_index: End index (exclusive)\n\n        Returns:\n            Number of records inserted\n        \"\"\"\n        values_parts: list[str] = []\n        params: list[str | list[float]] = []\n        for i in range(start_index, end_index):\n            values_parts.append(\"(%s, %s::vector)\")\n            params.append(f\"doc-{i}\")\n            params.append(generate_vector(seed=i))\n\n        sql = f\"INSERT INTO embeddings (content, embedding) VALUES {', '.join(values_parts)};\"\n        with 
self._connection.cursor() as cur:\n            cur.execute(sql, params)\n        self._connection.commit()\n        return end_index - start_index\n\n    def ingest_all(self, record_count: int, batch_size: int = 500) -> int:\n        \"\"\"Insert all records into Aurora in batches.\n\n        Args:\n            record_count: Total number of records to insert\n            batch_size: Number of records per batch (default 500)\n\n        Returns:\n            Total number of records inserted\n        \"\"\"\n        log = logger.bind(database=\"aurora_pgvector\")\n        total_inserted = 0\n\n        for start in range(0, record_count, batch_size):\n            end = min(start + batch_size, record_count)\n            for attempt in range(1, MAX_RETRIES + 1):\n                try:\n                    count = self.ingest_batch(start, end)\n                    total_inserted += count\n                    break\n                except Exception as e:\n                    log.warning(\"batch_insert_retry\", start=start, end=end, attempt=attempt, error=str(e))\n                    if attempt == MAX_RETRIES:\n                        log.error(\"batch_insert_failed\", start=start, end=end, error=str(e))\n                        break\n                    time.sleep(RETRY_DELAY_SECONDS)\n\n        log.info(\"ingest_all_complete\", total_inserted=total_inserted)\n        return total_inserted\n```\n\n``` python\ndef _run_database_ingestion(index_manager, ingester, record_count):\n    \"\"\"Execute bulk data insertion into the database.\n\n    Args:\n        index_manager: Object managing index drop and creation (implementation omitted)\n        ingester: Object that inserts data in batches (described above)\n        record_count: Total number of records to insert\n    \"\"\"\n    # ① Drop index (speeds up registration)\n    index_manager.drop_index()\n    # SQL executed internally:\n    # DROP INDEX IF EXISTS embeddings_hnsw_idx;\n    # TRUNCATE TABLE embeddings;\n\n    # ② Batch 
registration (500 records at a time)\n    ingester.ingest_all(record_count, batch_size=500)\n\n    # ③ Bulk index creation\n    index_manager.create_index()\n    # SQL executed internally:\n    # CREATE INDEX embeddings_hnsw_idx\n    #   ON embeddings USING hnsw (embedding vector_cosine_ops)\n    #   WITH (m = 16,              -- Connections per node (more = higher accuracy, more memory)\n    #         ef_construction = 64); -- Search width during construction (more = higher accuracy, slower build)\n```\n\nThe reason for dropping the index first, registering data, and then recreating the index is that registering data while an index exists makes processing time unpredictable.\n\nThis technique is commonly used in relational databases and applies equally to Aurora PostgreSQL + pgvector.\n\nFor more details, see: [Index Considerations When Bulk-Inserting Large Amounts of Data into a Database (Japanese)](https://blog.serverworks.co.jp/database-bulk-insert-index-strategy)\n\nOne technique for speeding up search and data retrieval is caching.\n\nFor vector databases, there is a technology called semantic cache that differs slightly from conventional caching.\n\nSemantic cache is a mechanism that uses the embedding vector of a query as a key to cache past search results or FM (Foundation Model) responses, and quickly returns results from the cache for semantically similar queries.\n\nComparing it with conventional caching reveals its unique characteristics:\n\n| | Conventional Cache | Semantic Cache |\n|---|---|---|\n| Key | Exact string match | Vector similarity |\n| Hit condition | Only the exact same query | Semantically similar queries also hit |\n| Example | Only \"weather in Tokyo\" hits | \"Tokyo weather forecast\" and \"What's the weather in Tokyo today?\" also hit |\n\nWith conventional caching, \"weather in Tokyo\" and \"Tokyo weather forecast\" are treated as different keys, resulting in lower cache hit rates. 
Semantic cache can group semantically equivalent queries together for caching, dramatically improving hit rates.\n\nWhen implementing semantic cache on AWS, Amazon ElastiCache or Amazon MemoryDB are the typical options.\n\nHere, I'll introduce a semantic cache implementation using Amazon MemoryDB (hereafter, MemoryDB), referencing the following documentation:\n\n[Amazon MemoryDB - Vector Search Examples](https://docs.aws.amazon.com/memorydb/latest/devguide/vector-search-examples.html)\n\nSetting aside RAG with a vector database for a moment, if you introduce semantic cache for Foundation Model queries, the processing flow would look like this:\n\nNote: MemoryDB is a Redis-compatible key-value store and does not have \"tables\" like RDBs. Data is stored in Hash-type keys, and the search schema is defined as an \"index\" using the `FT.CREATE` command.\n\nIn this repository, the following `FT.CREATE` command creates the index for semantic cache:\n\n```\nFT.CREATE semantic_cache_idx\n  ON HASH\n  PREFIX 1 cache:\n  SCHEMA\n    embedding    VECTOR HNSW 10\n                   TYPE FLOAT32\n                   DIM 1024\n                   DISTANCE_METRIC COSINE\n                   M 16\n                   EF_CONSTRUCTION 512\n    query_text   TAG\n    result       TEXT\n    created_at   NUMERIC\n    ttl          NUMERIC\n```\n\n| Field | Type | Description |\n|---|---|---|\n| embedding | VECTOR (HNSW) | Query embedding vector (1024 dimensions). Target for KNN search |\n| query_text | TAG | Original query text. For exact match filtering |\n| result | TEXT | FM response result (cached answer) |\n| created_at | NUMERIC | Cache entry creation time (UNIX timestamp) |\n| ttl | NUMERIC | Cache expiration time (seconds) |\n\n`PREFIX 1 cache:` means only Hashes whose key name starts with `cache:` are indexed.\n\n`EF_CONSTRUCTION=512` is set higher than Aurora pgvector (64). 
Since MemoryDB operates in-memory, build cost is relatively low, so accuracy is prioritized.\n\nThe threshold for semantic cache is the cosine similarity value used to determine cache hits.\n\n| Threshold | Characteristics | Recommended Use Case |\n|---|---|---|\n| 0.95~1.0 | Only nearly identical queries hit | Accuracy-focused. When you want to minimize the risk of returning incorrect cached responses |\n| 0.80~0.90 | Synonymous phrasing variations also hit | Practical balance. Recommended for most use cases |\n| 0.70~0.80 | Related queries also broadly hit | Hit rate-focused. However, the risk of returning unrelated results increases |\n\nThe appropriate threshold depends on business requirements, so I think it's safe to start with a high threshold around 0.95 and gradually lower it while monitoring cache hit rates.\n\nThe following terms are not specific to vector databases or semantic cache; they come from Redis, the engine underlying MemoryDB.\n\nHSET is a command that saves field-value pairs together in a Hash-type key.\n\nMultiple fields like `embedding`, `query_text`, `result`, and `created_at` can be stored as a single entry.\n\nIn Redis / MemoryDB, it's conventional to use colon-separated naming like `cache:abc123` for key names.\n\nThis simply means \"entry abc123 in the cache category\"; the colon itself has no special function.\n\nThe `PREFIX 1 cache:` in the index definition is a setting to make only keys starting with this prefix subject to search.\n\nEXPIRE is a command that sets an expiration time (TTL) on a key. After the specified number of seconds, the key is automatically deleted. 
This prevents stale cache entries from accumulating.\n\nThe implementation code got a bit long, but what it does is the same as typical cache-based data retrieval: use the cache if available, otherwise search and save the result to cache.\n\nI'll introduce the implementation code in three stages.\n\n``` python\ndef handler(event, context):\n    query = event[\"query\"]  # \"What is AWS S3?\"\n\n    # ① Vectorize the query (Bedrock Titan V2)\n    embedding_result = generate_embedding(query)\n    query_embedding = embedding_result.embedding\n\n    # ② Cache lookup via MemoryDB → FM call\n    cache_result = process_query(\n        query_text=query,\n        query_embedding=query_embedding,\n        redis_client=redis_client,\n        threshold=0.95,   # Environment variable SIMILARITY_THRESHOLD\n        ttl_seconds=3600, # Environment variable CACHE_TTL\n    )\n\n    # ③ Return response (with metrics)\n    return {\"statusCode\": 200, \"body\": {...}}\n```\n\n``` python\ndef process_query(query_text, query_embedding, redis_client,\n                  threshold, ttl_seconds):\n    # ① Query MemoryDB cache (FT.SEARCH KNN)\n    search_results = search_similar(redis_client, query_embedding)\n\n    if search_results:\n        key, similarity, fields = search_results[0]\n\n        # ② Cache hit → Return result from cache (no FM call)\n        if similarity >= threshold:\n            return CacheResult(hit=True, source=\"cache\",\n                               result=fields[\"result\"])\n\n    # ③ Cache miss → Query FM directly and get result\n    fm_result = _invoke_fm(query_text)\n\n    # ④ Save result to cache (HSET + EXPIRE)\n    _store_cache_entry(redis_client, query_text,\n                       query_embedding, fm_result, ttl_seconds)\n\n    return CacheResult(hit=False, source=\"fm\", result=fm_result)\n```\n\n``` python\ndef search_similar(redis_client, query_embedding, top_k=1):\n    \"\"\"Execute KNN vector search with FT.SEARCH.\"\"\"\n    query_vec = 
struct.pack(f\"<{len(query_embedding)}f\", *query_embedding)\n\n    query = (\n        Query(f\"*=>[KNN {top_k} @embedding $query_vec AS score]\")\n        .return_fields(\"query_text\", \"result\", \"created_at\", \"score\")\n        .sort_by(\"score\", asc=True)\n        .paging(0, top_k)\n        .dialect(2)\n        .timeout(3000)  # 3-second timeout\n    )\n\n    results = redis_client.ft(\"semantic_cache_idx\").search(\n        query, query_params={\"query_vec\": query_vec}\n    )\n\n    # Convert cosine distance to similarity (distance = 1 - similarity)\n    # and collect the returned fields of each document into a dict\n    field_names = (\"query_text\", \"result\", \"created_at\")\n    return [\n        (\n            doc.id,\n            1.0 - float(doc.score),\n            {name: getattr(doc, name, None) for name in field_names},\n        )\n        for doc in results.docs\n    ]\n```\n\nMemoryDB's FT.SEARCH command is compatible with Redis's RediSearch module and natively supports KNN vector search.\n\n`score` is returned as cosine distance (`1 - cosine similarity`, theoretically in the range 0~2). `1.0 - score` converts it to cosine similarity.\n\nNormalizing with Titan V2's `normalize=True` does not narrow this range, because cosine distance depends only on direction. In practice, though, text embeddings rarely point in opposite directions, so scores typically fall in the range 0~1, and the converted similarity typically stays in the 0~1 range as well.\n\nHere are the measured results under the following conditions:\n\n| Item | Value |\n|---|---|\n| FM (Foundation Model) | Claude 3 Haiku (`anthropic.claude-3-haiku-20240307-v1:0`) |\n| Embedding Model | Titan Embeddings V2 (1024 dimensions) |\n| Cache Store | Amazon MemoryDB |\n| Similarity Threshold | 0.95 |\n| Test Query | \"What is AWS S3?\" (same query run twice) |\n\nThe threshold is set high at 0.95.\n\nPlease treat these measurement results as reference values to demonstrate that semantic cache has a certain level of effectiveness.\n\n| Metric | Cache Miss (1st run) | Cache Hit (2nd run) | Reduction |\n|---|---|---|---|\n| Total Response Time | 4,573ms | 279ms | 94% |\n| Embedding Generation | 194ms | 192ms | - |\n| Cache Lookup | 4ms | 3ms | - |\n| FM Call | 4,375ms | 0ms | 100% |\n\nWhen there's a cache hit, the FM call is completely skipped, reducing response time 
by 94%.\n\nSince only embedding generation (~190ms) and cache lookup (~3ms) are needed to complete the response, user experience is dramatically improved.\n\nSkipping the FM call also directly translates to reduced API usage costs.\n\nSemantic cache can be integrated into a RAG system.\n\nIn that case, the processing flow would look like this:\n\nIn this article, I covered everything from the basic concepts of vector databases to implementation on AWS and optimization with semantic cache.\n\nThat's all for this time.\n\nThank you for reading this lengthy article!", "url": "https://wpnews.pro/news/getting-started-with-vector-databases-using-amazon-aurora-postgresql-pgvector", "canonical_source": "https://dev.to/aws-builders/getting-started-with-vector-databases-using-amazon-aurora-postgresql-pgvector-4go6", "published_at": "2026-06-03 03:32:51+00:00", "updated_at": "2026-06-03 03:42:04.037630+00:00", "lang": "en", "topics": ["artificial-intelligence", "large-language-models", "generative-ai", "natural-language-processing"], "entities": ["Amazon Aurora PostgreSQL", "pgvector", "Satoshi Kaneyasu", "Serverworks", "RAG"], "alternates": {"html": "https://wpnews.pro/news/getting-started-with-vector-databases-using-amazon-aurora-postgresql-pgvector", "markdown": "https://wpnews.pro/news/getting-started-with-vector-databases-using-amazon-aurora-postgresql-pgvector.md", "text": "https://wpnews.pro/news/getting-started-with-vector-databases-using-amazon-aurora-postgresql-pgvector.txt", "jsonld": "https://wpnews.pro/news/getting-started-with-vector-databases-using-amazon-aurora-postgresql-pgvector.jsonld"}}