Self-Hosting a Vision Model on a Datacenter GPU: BAGEL-7B-MoT on a Tesla V100
I have an AI character named Sophia who lives inside a Godot game. She talks, she listens, she plays music, she controls the smart lights. And now she can see. Not "process an image if you upload one" see. Real-time webcam-capture, face-detection, emotion-reading see. She looks through the camera, describes what she sees, reads your mood, and responds accordingly.

The vision model powering all of this is BAGEL-7B-MoT running on a Tesla V100 16GB GPU. Getting it there was not straightforward.

Why We Ditched LLaVA

We were running LLaVA 1.6 (7B) via Ollama for months. It worked, but it had problems:

- Slow -- 8-15 seconds for a basic description on a V100
- Hallucination-heavy -- it would confidently describe objects that weren't there
- No generation capability -- LLaVA is understand-only. No image editing, no generation
- Stale architecture -- the LLaVA project hasn't seen meaningful updates

BAGEL-7B-MoT (Mixture of Transformers) from ByteDance Research offered everything we needed: image understanding, image generation, and image editing in a single model.
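For context, talking to a vision model through Ollama means POSTing to its `/api/generate` endpoint with the image passed as a base64 string in the `images` field. A minimal sketch of building that request body (the model name, prompt, and placeholder frame bytes are illustrative, not taken from the article's actual setup):

```python
import base64
import json

def build_ollama_vision_request(model: str, prompt: str, image_bytes: bytes) -> str:
    """Build the JSON body for Ollama's /api/generate endpoint.

    Ollama accepts images as a list of base64-encoded strings in the
    "images" field, alongside the usual text prompt.
    """
    payload = {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # wait for the full response instead of streaming tokens
    }
    return json.dumps(payload)

# In the real setup the bytes would come from a webcam frame; this is a stub.
fake_frame = b"\xff\xd8\xff\xe0 placeholder bytes, not a real JPEG"
body = build_ollama_vision_request("llava:7b", "Describe what you see.", fake_frame)
print(body)
```

Sending this body to `http://localhost:11434/api/generate` is all a LLaVA round trip takes, which is part of why Ollama was attractive despite the latency.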
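The excerpt doesn't show how a 7B model was made to fit on the V100's 16 GB, but back-of-the-envelope arithmetic makes the constraint clear: at 2 bytes per parameter (fp16/bf16), the weights alone nearly fill the card, which is why quantization (e.g. 4-bit) is a common route. The figures below are rough arithmetic, not measurements from the article:

```python
def weight_footprint_gib(n_params: float, bytes_per_param: float) -> float:
    """Approximate GPU memory needed just to hold the model weights."""
    return n_params * bytes_per_param / 2**30

PARAMS = 7e9  # BAGEL-7B-MoT: roughly 7 billion parameters

fp16_gib = weight_footprint_gib(PARAMS, 2.0)   # fp16/bf16: 2 bytes per parameter
int4_gib = weight_footprint_gib(PARAMS, 0.5)   # 4-bit quantization: 0.5 bytes/param

# fp16 weights barely fit in 16 GiB, leaving little room for the KV cache,
# activations, and the vision encoder; 4-bit leaves comfortable headroom.
print(f"fp16 weights: ~{fp16_gib:.1f} GiB")
print(f"4-bit weights: ~{int4_gib:.1f} GiB")
```

This is the core tension of "datacenter GPU" self-hosting on older cards: a V100 has server-class reliability but only 16 GB, so memory budgeting decides what is runnable.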




