Editing. Frame-based video editing is time intensive because it forces users to operate on minute details. Editors can leverage metadata, such as shot boundaries [39] and transcripts [27], to help place cuts and transitions, giving users higher-level editing operations at the shot level rather than the frame level. Computer vision and speech analysis techniques can automate certain visual effects, such as creating cinemagraphs [13, 113], automatically editing lecture videos [102], creating zoomable tapestries and synopses [15, 175], and stabilizing shaky amateur videos [140]. Edits can also take place during recording, such as switching to a close-up view of a person who is speaking [179]. When source material such as character animation is available in 3D, camera angles can be optimized by analyzing the actor's motion data to render a new video [10, 11]. Finally, when video analysis is a matter of subjective taste, identifying salient frames or highlights can be outsourced to crowd workers [26, 200].
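Shot boundaries like those leveraged above are commonly detected by comparing consecutive frames. As a rough illustration only (not the specific method of [39]), the following Python sketch flags a hard cut when the color-histogram distance between adjacent frames exceeds a threshold; the threshold value is an assumption and would need tuning per video.

```python
# Minimal shot-boundary sketch: flag a cut when the color-histogram
# difference between consecutive frames is large. Illustrative only;
# the threshold below is an assumed value, not taken from [39].
import cv2


def detect_shot_boundaries(video_path, threshold=0.5):
    """Return indices of frames where a hard cut likely occurs."""
    cap = cv2.VideoCapture(video_path)
    boundaries, prev_hist, index = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        hist = cv2.calcHist([hsv], [0, 1], None, [50, 60], [0, 180, 0, 256])
        cv2.normalize(hist, hist)
        if prev_hist is not None:
            # Bhattacharyya distance: 0 = identical, 1 = very different.
            dist = cv2.compareHist(prev_hist, hist, cv2.HISTCMP_BHATTACHARYYA)
            if dist > threshold:
                boundaries.append(index)
        prev_hist, index = hist, index + 1
    cap.release()
    return boundaries
```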
MixT, DemoCut, Kinectograph, and DemoDraw also use computer vision techniques to make automatic editing decisions. They differ from previous approaches in their focus on two particular application domains: software and physical demonstration videos. By focusing on a specific domain, MixT and DemoCut can make assumptions about the structure of the input and output video, such as the fact that there is a linear sequence of steps. DemoCut offers an authoring interface that makes it easier to create high-quality instructional videos. Kinectograph makes editing decisions (e.g., pan-and-tilt, zooming) based on an actor's body location in video frames. DemoDraw includes a multi-modal interface in which authors annotate a recorded body motion using speech while physically performing the movements.
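To make body-location-based framing decisions concrete, here is a minimal sketch in the spirit of Kinectograph, not its actual implementation: assuming the actor's center and height come from a skeleton tracker (e.g., a Kinect), a simple rule pans when the actor drifts from the frame center and zooms to keep the actor at a target size. All thresholds here are hypothetical.

```python
# Illustrative framing rule, assuming body_center_x and body_height
# come from an external skeleton tracker. The dead zone avoids jittery
# camera motion; all constants are assumed values for illustration.
def framing_decision(body_center_x, body_height, frame_width, frame_height,
                     dead_zone=0.1, target_fill=0.6):
    decisions = {}
    # Pan only when the actor drifts outside a central dead zone.
    offset = (body_center_x - frame_width / 2) / frame_width
    if abs(offset) > dead_zone:
        decisions["pan"] = "right" if offset > 0 else "left"
    # Zoom so the actor fills roughly target_fill of the frame height.
    fill = body_height / frame_height
    if fill < target_fill * 0.8:
        decisions["zoom"] = "in"
    elif fill > target_fill * 1.2:
        decisions["zoom"] = "out"
    return decisions
```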
Tools for Navigating Video
Video playback can be controlled by inferring user intention from their actions. For example, segments can be played back [174] or playback speed can be modified [45] based on a user's actions in a software application. Videos can also be navigated at the content level, beyond a linear timeline. For instance, subjects' movements can be visualized in a storyboard [84] or a continuous image mosaic [203], and timelines can be navigated by manipulating a target in 2D [68, 85, 115] or 3D [161]. These techniques help viewers understand content flow and navigate videos, and have been applied to screencast videos [60, 160]. Canvases of video tiles and timelines [100] or thumbnails [147] can make navigating long videos or sets of videos faster. Video digests can be an effective way for viewers to browse and skim video content [170].
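As a concrete illustration of navigating a timeline by manipulating an on-screen target [68, 85, 115], consider this minimal sketch: assuming a precomputed per-frame trajectory of the dragged object, a drag maps to the frame whose object position is nearest the cursor. The function name and data layout are assumptions for illustration, not any cited system's API.

```python
# Direct-manipulation navigation sketch: `trajectory` is assumed to be a
# precomputed list of (x, y) object positions, one per frame. A drag to
# cursor_xy jumps playback to the frame whose position is closest.
def frame_for_drag(cursor_xy, trajectory):
    cx, cy = cursor_xy
    return min(range(len(trajectory)),
               key=lambda i: (trajectory[i][0] - cx) ** 2 +
                             (trajectory[i][1] - cy) ** 2)
```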
These novel forms of video navigation inspired us to help viewers navigate video content more effectively with our tools. MixT supports per-step video navigation embedded in a static tutorial. DemoWiz augments a screencast video with novel visualizations that help users view the content. DemoDraw renders a series of human movements as concise motion illustrations.
Summary
Our approaches take video both as system input (i.e., to track changes of salient UI components in screencast videos in MixT, DIY activities in DemoCut, and body movements in Kinectograph and DemoDraw) and as output (i.e., to offer interactive instructions via MixT's per-step video segments, DemoWiz's augmented screencast recording, DemoCut's concise videos, and Kinectograph's instructor-focused videos). We design user interfaces and algorithms that enable authors and learners to interact with video-based instructional content.
Chapter 4