Contextual Multimodal Video Editing

CMVE was also presented as an invited talk at the ICCV 2021 Workshop on AI for Creative Video Editing and Understanding.

2021

Conference on Computer Vision and Pattern Recognition (CVPR) Workshops Publication

Contextual Multimodal Video Editing (CMVE) is an automated video editing model that uses visual and textual metadata describing video clips, together with an editing style learned from a single example video, to combine clips coherently into an edited video. Experimental results showed no significant difference between CMVE-edited and human-edited videos in how well they matched the text query or in the level of interest they generated.

Collaborators: Sharath Koorathota (Fovea, Inc., Columbia University), Paul Sajda (Columbia University), Kelly Cotton (CUNY)
Original Publication: “Editing Like Humans: A Contextual, Multimodal Framework for Automated Video Editing”