Location: Minneapolis, MN
Publication Date: August 23, 2022
Conference Dates: June 26, 2022 – June 29, 2022
Page Count: 22
DOI: 10.18260/1-2--41185
Permanent URL: https://peer.asee.org/41185
Download Count: 679
Lawrence Angrave is a computer science teaching professor at the University of Illinois who playfully creates and researches the use of new software and learning practices, often with the goals of improving equity, accessibility, and learning.
Jiaxi Li is a 5-year BS-MS student at the University of Illinois at Urbana-Champaign (UIUC), co-advised by Professor Lawrence Angrave and Professor Klara Nahrstedt. His research interests are in Artificial Intelligence for Human-Computer Interaction, and he has experience in Machine Learning, Computer Vision, and Text Mining.
Ninghan Zhong is a senior student in Computer Science at the University of Illinois at Urbana-Champaign. His research interests include (but are not limited to) Autonomous Systems, Human-Robot Interaction, Human-Computer Interaction, and Artificial Intelligence. He has experience in computer vision, robotics, and machine learning.
To efficiently create books and other instructional content from videos, and to further improve the accessibility of our course content, we needed to solve the scene detection (SD) problem for engineering educational content. We present the pedagogical applications of extracting video images for digital book generation and other shareable resources, within the themes of accessibility, inclusive education, and universal design for learning, and we describe how we solved this problem for engineering education lecture videos. Scene detection refers to the process of merging visually similar frames into a single video segment and subsequently extracting semantic features from that segment (e.g., title, words, transcription segment, and representative image).

In our approach, local features were extracted from inter-frame similarity comparisons using multiple metrics, including numerical measures based on optical character recognition (OCR) and pixel similarity with and without face and body position masking. We analyze and discuss the trade-offs in accuracy, performance, and required computational resources. By applying these features to a corpus of labeled videos, a support vector machine determined an optimal parametric decision surface to model whether adjacent frames were semantically and visually similar. The algorithm design, data flow, and system accuracy and performance are presented.

We evaluated our system using videos from multiple engineering disciplines whose content comprised different presentation styles, including traditional paper handouts, Microsoft PowerPoint slides, and digital ink annotations. For each educational video, a comprehensive digital book composed of lecture clips, slideshow text, and audio transcription content can be generated based on our new scene detection algorithm. Our new scene detection approach was adopted by ClassTranscribe, an inclusive video platform that follows Universal Design for Learning principles. We report on the subsequent experiences and feedback from students who reviewed the generated digital books as a learning component, highlight remaining challenges, and describe how instructors can use this technology in their own courses.

The main contributions of this work are:
- Identifying why automated scene detection of engineering lecture videos is challenging;
- Creation of a scene-labeled corpus of videos, representative of multiple undergraduate engineering disciplines and lecture styles, suitable for training and testing;
- Description of a set of image metrics and a support vector machine-based classification approach;
- Evaluation of the accuracy, recall, and precision of our algorithm;
- Use of an algorithmic optimization to obviate the need for GPU resources;
- Student commentary on the digital book interface created from videos using our SD algorithm;
- Publishing of a labeled corpus of video content to encourage additional research in this area; and
- An independent open-source scene extraction tool that can be used pedagogically by the ASEE community, e.g., to remix and create fun shareable instructional content memes, and to create accessible audio and text descriptions for students who are blind or have low vision.

Text extracted from each scene can also be used to improve the accuracy of captions and transcripts, improving accessibility for students who are hard of hearing or deaf.
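As an illustration of the kind of inter-frame comparison the abstract describes, the following minimal Python sketch computes three similarity features for a pair of adjacent frames: raw pixel similarity, pixel similarity with detected faces masked out, and similarity of OCR-extracted text. This is not the authors' implementation; the helper names, the OpenCV Haar-cascade face detector, and the thresholds are illustrative assumptions.

```python
# Illustrative sketch of inter-frame similarity features (not the
# paper's implementation). Requires opencv-python and pytesseract.
import cv2
import numpy as np
import pytesseract
from difflib import SequenceMatcher

_face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def pixel_similarity(a, b):
    """Fraction of pixels that are (nearly) unchanged between frames."""
    g1 = cv2.cvtColor(a, cv2.COLOR_BGR2GRAY)
    g2 = cv2.cvtColor(b, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(g1, g2)
    return float(np.mean(diff < 10))  # intensity threshold is arbitrary

def masked_pixel_similarity(a, b):
    """Pixel similarity after blanking detected faces, so a moving
    presenter does not trigger a false scene change."""
    for frame in (a, b):
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        for (x, y, w, h) in _face_cascade.detectMultiScale(gray, 1.1, 5):
            frame[y:y + h, x:x + w] = 0
    return pixel_similarity(a, b)

def ocr_similarity(a, b):
    """Similarity of OCR'd slide text between two frames."""
    t1 = pytesseract.image_to_string(cv2.cvtColor(a, cv2.COLOR_BGR2RGB))
    t2 = pytesseract.image_to_string(cv2.cvtColor(b, cv2.COLOR_BGR2RGB))
    return SequenceMatcher(None, t1, t2).ratio()

def frame_features(a, b):
    """Feature vector for one adjacent-frame pair."""
    return [pixel_similarity(a, b),
            masked_pixel_similarity(a.copy(), b.copy()),  # copies: masking mutates
            ocr_similarity(a, b)]
```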
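Given such feature vectors for labeled adjacent-frame pairs, a support vector machine can fit the decision surface the abstract mentions. The sketch below, assuming scikit-learn and two hypothetical corpus files (pair_features.npy, pair_labels.npy), shows one plausible training and evaluation loop; it demonstrates the general technique, not the paper's exact pipeline.

```python
# Hypothetical SVM training sketch: each row of X is a feature vector
# from frame_features(); y is 1 if the pair belongs to the same scene,
# 0 if it spans a scene boundary. File names are assumptions.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score

X = np.load("pair_features.npy")  # hypothetical labeled corpus
y = np.load("pair_labels.npy")

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          random_state=0)
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X_tr, y_tr)

pred = clf.predict(X_te)
print("precision:", precision_score(y_te, pred))
print("recall:   ", recall_score(y_te, pred))
```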
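Finally, once a classifier has marked each adjacent pair as same-scene or boundary, merging visually similar frames into scenes reduces to grouping contiguous runs of same-scene pairs. A small self-contained helper, hypothetical but following the abstract's definition of scene detection:

```python
def segment_scenes(same_scene_flags):
    """Group sampled frames into scenes.

    same_scene_flags[i] is True if frames i and i+1 were classified as
    belonging to the same scene. Returns a list of (start, end) frame
    index ranges, end inclusive.
    """
    scenes, start = [], 0
    for i, same in enumerate(same_scene_flags):
        if not same:               # scene boundary between frame i and i+1
            scenes.append((start, i))
            start = i + 1
    scenes.append((start, len(same_scene_flags)))  # close the final scene
    return scenes

# Example: boundary after frame 1 yields two scenes.
assert segment_scenes([True, False, True]) == [(0, 1), (2, 3)]
```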
Angrave, L., & Li, J., & Zhong, N. (2022, August), Creating TikToks, Memes, Accessible Content, and Books from Engineering Videos? First Solve the Scene Detection Problem Paper presented at 2022 ASEE Annual Conference & Exposition, Minneapolis, MN. 10.18260/1-2--41185
ASEE holds the copyright on this document. It may be read by the public free of charge. Authors may archive their work on personal websites or in institutional repositories with the following citation: © 2022 American Society for Engineering Education. Other scholars may excerpt or quote from these materials with the same citation. When excerpting or quoting from Conference Proceedings, authors should, in addition to noting the ASEE copyright, list all the original authors and their institutions and name the host city of the conference.