ONNX Runtime + pgvector in Django: semantic search without PyTorch or external APIs

By Matías, via Dev.to

Exogram is an open-source social network for Kindle readers. There is a recurring tension in the design of small-to-medium web applications that need semantic search: the easiest path—calling an external embedding API—introduces costs, latency, and privacy concerns that are often disproportionate to the scale of the problem. The harder path—running a model locally—has historically meant pulling PyTorch into your Docker image and accepting a bloated, fragile deployment. This article documents a third option: running inference with ONNX Runtime, backed by pgvector for storage, on standard Django infrastructure. No external API calls, no separate vector database, no PyTorch in production.

The problem with "just call the API"

The reflex to reach for OpenAI's embedding API is understandable. You get high-quality embeddings with one HTTP call, no model management, and results that work immediately. For a prototype, that tradeoff is usually correct. For a production app that processes user da
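To make the storage side of this architecture concrete, here is a minimal, self-contained sketch of the retrieval math that pgvector performs with its cosine-distance operator (`<=>`). The names `search`, the corpus documents, and the tiny 3-dimensional vectors are illustrative only—in the real setup the vectors come from an ONNX Runtime encoder and the nearest-neighbor scan happens inside Postgres, not in Python.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity; the quantity pgvector's <=> operator orders by."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

def search(query_vec, corpus, k=2):
    """Rank stored vectors by cosine distance to the query, nearest first."""
    ranked = sorted(corpus.items(),
                    key=lambda item: cosine_distance(query_vec, item[1]))
    return [doc_id for doc_id, _ in ranked[:k]]

# Toy 3-d "embeddings"; real sentence embeddings are 384-d or larger.
corpus = {
    "kindle-review": [0.9, 0.1, 0.0],
    "sci-fi-thread": [0.8, 0.2, 0.1],
    "cooking-tips":  [0.0, 0.1, 0.9],
}
print(search([1.0, 0.0, 0.0], corpus))  # → ['kindle-review', 'sci-fi-thread']
```

In production this ranking is expressed as an `ORDER BY embedding <=> %s LIMIT k` query against a pgvector column, so the Python above only exists to show what the database is computing.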

Continue reading on Dev.to
