Building RAG Pipelines with pgvector and OpenAI: A Practical Guide
- Mokshit Jain
- AI/ML, Backend Development
- 15 Aug, 2025
Retrieval-Augmented Generation (RAG) has become the go-to architecture for building AI applications that need to reason over custom data. After implementing RAG pipelines that achieved 99% accuracy in product identification for a sales automation system, I want to share practical insights on building production-ready RAG systems.
What is RAG?
RAG combines the power of large language models with your own data. Instead of relying solely on the model’s training data, you retrieve relevant context from your knowledge base and augment the prompt with it. This approach:
- Reduces hallucinations by grounding responses in facts
- Keeps your knowledge base up-to-date without retraining
- Provides source attribution for better trust
Architecture Overview
User Query → Vector Search → Context Retrieval → Augmented Prompt → LLM Response
Setting Up pgvector
First, enable the pgvector extension in PostgreSQL:
CREATE EXTENSION IF NOT EXISTS vector;
-- Create a products table with vector embeddings
CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name TEXT NOT NULL,
    description TEXT,
    price DECIMAL(10, 2),
    embedding vector(1536) -- OpenAI embedding dimension
);
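Before similarity search can do anything useful, each row needs a value in the `embedding` column. pgvector accepts vector values as a bracketed, comma-separated string literal, so a small helper can format a Python list for insertion. This is a sketch; `to_pgvector_literal` is a hypothetical helper name, not part of pgvector:

```python
def to_pgvector_literal(embedding: list[float]) -> str:
    # pgvector parses vector input as a "[v1,v2,...]" text literal
    return "[" + ",".join(repr(x) for x in embedding) + "]"

# Usage sketch in an INSERT:
#   INSERT INTO products (name, embedding)
#   VALUES (:name, CAST(:emb AS vector))
# with parameters {"name": ..., "emb": to_pgvector_literal(vec)}
```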
Generating Embeddings with OpenAI
Using Python and OpenAI’s API:
import os

from openai import AsyncOpenAI

# Read the API key from the environment rather than hardcoding it
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

async def generate_embedding(text: str) -> list[float]:
    response = await client.embeddings.create(
        model="text-embedding-3-small",
        input=text,
    )
    return response.data[0].embedding
Building the RAG Pipeline
Here’s a practical implementation I’ve used in production:
from sqlalchemy import text

async def search_similar_products(query: str, limit: int = 5):
    # Generate the query embedding
    query_embedding = await generate_embedding(query)

    # Vector similarity search with cosine distance (<=>).
    # The embedding is bound as a string literal and cast to vector,
    # since most drivers won't bind a Python list to the vector type.
    sql_query = text("""
        SELECT id, name, description, price,
               1 - (embedding <=> CAST(:embedding AS vector)) AS similarity
        FROM products
        ORDER BY embedding <=> CAST(:embedding AS vector)
        LIMIT :limit
    """)
    # `db` is an async SQLAlchemy session created elsewhere
    results = await db.execute(
        sql_query,
        {"embedding": str(query_embedding), "limit": limit},
    )
    return results.fetchall()
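The `<=>` operator returns cosine distance, so `1 - (embedding <=> query)` is cosine similarity, and ordering by `<=>` ascending puts the best matches first. A minimal plain-Python sketch of the same quantity, just to make the ordering concrete:

```python
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    # The same quantity pgvector's <=> operator computes
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

cosine_distance([1.0, 0.0], [1.0, 0.0])  # 0.0 — identical direction
cosine_distance([1.0, 0.0], [0.0, 1.0])  # 1.0 — orthogonal
```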
Augmenting the Prompt
Once you retrieve relevant context, augment it with the user’s query:
async def generate_rag_response(query: str) -> str:
    # Retrieve relevant context
    context = await search_similar_products(query)

    # Build context string
    context_text = "\n".join([
        f"- {row.name}: {row.description} (${row.price})"
        for row in context
    ])

    # Augmented prompt
    prompt = f"""
Based on the following product catalog:
{context_text}

Customer Query: {query}

Provide a helpful response identifying the relevant product(s).
"""

    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "You are a helpful sales assistant."},
            {"role": "user", "content": prompt},
        ],
    )
    return response.choices[0].message.content
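Long product descriptions can inflate the prompt quickly. One simple guard is to trim the ranked context to a budget before interpolating it; the sketch below uses a character budget as a rough proxy for tokens, and `max_chars` is an assumed knob, not something from the pipeline above:

```python
def trim_context(items: list[str], max_chars: int = 2000) -> list[str]:
    # Keep items in ranked order until the character budget is exhausted
    kept: list[str] = []
    used = 0
    for item in items:
        if used + len(item) > max_chars:
            break
        kept.append(item)
        used += len(item)
    return kept
```

Because the rows arrive ordered by similarity, trimming from the end discards the weakest matches first.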
Performance Optimization Tips
1. Batch Embedding Generation
Generate embeddings in batches to reduce API calls:
async def batch_generate_embeddings(texts: list[str]) -> list[list[float]]:
    response = await client.embeddings.create(
        model="text-embedding-3-small",
        input=texts,
    )
    return [item.embedding for item in response.data]
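The embeddings endpoint caps how many inputs a single request can carry, so a large catalog needs to be split into chunks before batching. A small sketch (the default chunk size of 100 is an assumption; check the current API limits):

```python
def chunked(items: list[str], size: int = 100) -> list[list[str]]:
    # Split a flat list into consecutive chunks of at most `size` items
    return [items[i : i + size] for i in range(0, len(items), size)]
```

Each chunk can then be passed as the `input` to `batch_generate_embeddings`, and the per-chunk results concatenated in order.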
2. Create HNSW Index for Faster Search
CREATE INDEX ON products USING hnsw (embedding vector_cosine_ops);
3. Implement Caching
Cache frequently queried embeddings to reduce costs:
# functools.lru_cache doesn't work with coroutines, and asyncio.run()
# can't be called from inside a running event loop, so use a simple
# in-memory dict as the cache instead
_embedding_cache: dict[str, list[float]] = {}

async def get_cached_embedding(text: str) -> list[float]:
    if text not in _embedding_cache:
        _embedding_cache[text] = await generate_embedding(text)
    return _embedding_cache[text]
Key Learnings from Production
After deploying RAG pipelines processing 100+ queries daily with 99% accuracy:
- Data Quality Matters: Clean, structured product data significantly improves retrieval accuracy
- Embedding Model Choice: text-embedding-3-small offers a great performance-to-cost ratio
- Similarity Threshold: Set a minimum similarity score (e.g., 0.75) to filter irrelevant results
- Context Window: Include 3-5 retrieved items in context to avoid overwhelming the model
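The similarity threshold above can be applied as a simple post-filter on the rows returned by the search. A sketch, assuming each row exposes the `similarity` alias from the SQL query and using the 0.75 cutoff as the default:

```python
def filter_by_similarity(rows, threshold: float = 0.75):
    # Drop matches below the cutoff so weak hits never reach the prompt
    return [row for row in rows if row.similarity >= threshold]
```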
Conclusion
RAG pipelines are incredibly powerful for building domain-specific AI applications. With pgvector and OpenAI, you can achieve production-ready accuracy without the complexity of separate vector databases. The key is to focus on data quality, efficient retrieval, and proper prompt engineering.
Have questions about implementing RAG in your project? Feel free to reach out!