ONNX Runtime + pgvector in Django: semantic search without PyTorch or external APIs

By Matías, via Dev.to

Exogram is an open-source social network for Kindle readers. There is a recurring tension in the design of small-to-medium web applications that need semantic search: the easiest path—calling an external embedding API—introduces costs, latency, and privacy concerns that are often disproportionate to the scale of the problem. The harder path—running a model locally—has historically meant pulling PyTorch into your Docker image and accepting a bloated, fragile deployment. This article documents a third option: running inference with ONNX Runtime, backed by pgvector for storage, on standard Django infrastructure. No external API calls, no separate vector database, no PyTorch in production.

The problem with "just call the API"

The reflex to reach for OpenAI's embedding API is understandable. You get high-quality embeddings with one HTTP call, no model management, and results that work immediately. For a prototype, that tradeoff is usually correct. For a production app that processes user da
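To make the storage side of this architecture concrete, here is a minimal, self-contained sketch of the retrieval math that pgvector performs with its cosine-distance operator (`<=>`). The names `search`, the corpus documents, and the tiny 3-dimensional vectors are illustrative only—in the real setup the vectors come from an ONNX Runtime encoder and the nearest-neighbor scan happens inside Postgres, not in Python.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity; the quantity pgvector's <=> operator orders by."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

def search(query_vec, corpus, k=2):
    """Rank stored vectors by cosine distance to the query, nearest first."""
    ranked = sorted(corpus.items(),
                    key=lambda item: cosine_distance(query_vec, item[1]))
    return [doc_id for doc_id, _ in ranked[:k]]

# Toy 3-d "embeddings"; real sentence embeddings are 384-d or larger.
corpus = {
    "kindle-review": [0.9, 0.1, 0.0],
    "sci-fi-thread": [0.8, 0.2, 0.1],
    "cooking-tips":  [0.0, 0.1, 0.9],
}
print(search([1.0, 0.0, 0.0], corpus))  # → ['kindle-review', 'sci-fi-thread']
```

In production this ranking is expressed as an `ORDER BY embedding <=> %s LIMIT k` query against a pgvector column, so the Python above only exists to show what the database is computing.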

Continue reading on Dev.to
