
VoxEdit AI – A Conversational Video Editing Agent with Gemini and Google Cloud
VoxEdit AI is a conversational video editing agent that allows users to edit videos using natural language commands instead of complex editing tools. The goal of this project is to simplify video editing by allowing creators to interact with an AI assistant that understands their intent and automatically performs editing operations. This project was built using Google’s Gemini multimodal AI models combined with Google Cloud infrastructure to create a scalable AI-powered editing pipeline.

How VoxEdit AI Works

The system allows users to upload a video clip and give editing commands such as trimming, adding sound effects, or generating audio responses. Instead of manually editing timelines, the user simply tells the AI what they want to change.

The workflow of VoxEdit AI is:

1. The user uploads a video through the frontend interface.
2. The backend processes the video and stores it temporarily for analysis.
3. Frames and contextual information from the video are analyzed using Gemini AI.
4. Gemini in
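
The temporary-storage and frame-analysis steps of the workflow above can be illustrated with a minimal Python sketch. It is not taken from the VoxEdit AI codebase: the helper names `store_upload_temporarily` and `sample_timestamps`, the `.mp4` suffix, and the idea of sampling evenly spaced frames are all assumptions made for illustration, and the call to Gemini itself is left out.

```python
import tempfile
from pathlib import Path


def store_upload_temporarily(data: bytes, suffix: str = ".mp4") -> Path:
    """Write uploaded bytes to a temporary file and return its path.

    Hypothetical helper: the real backend may store uploads differently.
    """
    # delete=False keeps the file on disk after the handle closes, so a later
    # analysis step can read it; the caller is responsible for removing it.
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as handle:
        handle.write(data)
        return Path(handle.name)


def sample_timestamps(duration_s: float, max_frames: int) -> list[float]:
    """Return max_frames evenly spaced timestamps in seconds.

    Each timestamp sits at the midpoint of an equal-length segment of the
    clip, which is one simple (assumed) way to pick frames for analysis.
    """
    if duration_s <= 0 or max_frames <= 0:
        return []
    step = duration_s / max_frames
    return [round(step * (i + 0.5), 3) for i in range(max_frames)]
```

For a 10-second clip and four frames, `sample_timestamps(10.0, 4)` yields the midpoints of four 2.5-second segments. A real pipeline would then extract the frames at those timestamps and pass them to the model along with the user's command.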
Continue reading on Dev.to



