
A Serverless Blueprint for Multimodal Video Search on AWS
Originally published on Build With AWS. Subscribe for weekly AWS builds. This design was inspired by Miguel Otero Pedrido and Alex Razvant's "Kubrick" course, but rebuilt using native AWS primitives instead of custom frameworks.

Video is impossible to search. You can scrub through it manually, or rely on YouTube's auto-generated captions, which only match exact keywords. But what if you want to find "the outdoor mountain scene" or "where they discuss AI ethics"? Traditional video platforms fail here because they treat video as a single data type.

This system treats video as three parallel search problems. Speech is transcribed with word-level timestamps and indexed for semantic search. Every frame gets a semantic description through Claude Vision, which goes into a separate index. Those same frames also become 1,024-dimensional vectors for visual similarity search. Users ask questions in natural language, and an intelligent agent figures out which index to query. Results come back with e
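The frame-to-vector step can be sketched with Amazon Bedrock. This is a minimal sketch, assuming the Titan Multimodal Embeddings model (`amazon.titan-embed-image-v1`), which returns 1,024-dimensional vectors by default; the article does not name its embedding model, so that choice is an assumption.

```python
import base64
import json


# Assumed embedding model; the article only specifies 1,024-dim vectors.
EMBED_MODEL_ID = "amazon.titan-embed-image-v1"


def build_embedding_request(frame_bytes: bytes, dim: int = 1024) -> dict:
    """Build the JSON body for a Bedrock InvokeModel call on one frame."""
    return {
        "inputImage": base64.b64encode(frame_bytes).decode("utf-8"),
        "embeddingConfig": {"outputEmbeddingLength": dim},
    }


def embed_frame(bedrock_runtime, frame_bytes: bytes) -> list:
    """Return the frame's embedding vector.

    `bedrock_runtime` is a boto3 client created with
    boto3.client("bedrock-runtime").
    """
    resp = bedrock_runtime.invoke_model(
        modelId=EMBED_MODEL_ID,
        body=json.dumps(build_embedding_request(frame_bytes)),
        contentType="application/json",
        accept="application/json",
    )
    return json.loads(resp["body"].read())["embedding"]
```

Each resulting vector would then be written to the visual-similarity index alongside the frame's timestamp.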
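The routing decision the agent makes can be illustrated without an LLM. The sketch below is a toy keyword heuristic, not the article's actual agent: the index names (`transcript`, `description`, `visual`) and the cue lists are illustrative assumptions that mirror the three indexes described above.

```python
# Toy stand-in for the routing agent: the real system lets an LLM choose,
# but the decision space is the same three indexes.
VISUAL_CUES = {"scene", "looks like", "similar to", "shot of"}
SPEECH_CUES = {"says", "discuss", "mention", "talk about", "quote"}


def route_query(query: str) -> str:
    """Pick which of the three indexes should serve this query."""
    q = query.lower()
    if any(cue in q for cue in SPEECH_CUES):
        return "transcript"   # word-timestamped speech index
    if any(cue in q for cue in VISUAL_CUES):
        return "visual"       # frame-embedding similarity index
    return "description"      # Claude Vision caption index (default)
```

For example, "where they discuss AI ethics" routes to the transcript index, while "the outdoor mountain scene" routes to the visual index.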
Continue reading on Dev.to



