
# Zero-Downtime Embedding Migration: Switching from text-embedding-004 to text-embedding-3-large in Production
Our embedding model got deprecated overnight, and every RAG query started returning 404s. Here's the exact playbook we used to migrate to a new model in 48 hours with zero downtime.

## The Situation

- **Service:** RAG retrieval service using pgvector on PostgreSQL
- **Old model:** text-embedding-004 (deprecated)
- **New model:** text-embedding-3-large (768 dimensions)
- **Data volume:** thousands of embedded documents
- **Constraint:** zero downtime, zero data loss, production traffic must keep flowing

## Step 1: Make the Model Configurable

Before anything else, stop hardcoding:

```python
import os

import openai

# Before (hardcoded in 6 places)
response = openai.embeddings.create(
    model="text-embedding-004",
    input=text,
)

# After (configured once)
EMBED_MODEL = os.getenv("EMBED_MODEL", "text-embedding-3-large")
EMBED_DIMENSIONS = int(os.getenv("EMBED_DIMENSIONS", "768"))

response = openai.embeddings.create(
    model=EMBED_MODEL,
    input=text,
    dimensions=EMBED_DIMENSIONS,
)
```

Two environment variables.
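The env-var pattern can be resolved once at startup instead of re-read at every call site. A minimal sketch of that idea (the `EmbedConfig` dataclass and `load_embed_config` helper are illustrative names, not from the article):

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class EmbedConfig:
    """Embedding settings resolved once from the environment."""
    model: str
    dimensions: int


def load_embed_config() -> EmbedConfig:
    # Defaults match the migration target; override via env vars to
    # roll forward or back without a code change.
    return EmbedConfig(
        model=os.getenv("EMBED_MODEL", "text-embedding-3-large"),
        dimensions=int(os.getenv("EMBED_DIMENSIONS", "768")),
    )
```

Every call site then takes an `EmbedConfig` rather than naming a model, so swapping models is a deploy-time decision instead of a code search-and-replace.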