Location: Minneapolis, MN
Publication Date: August 23, 2022
Conference Dates: June 26, 2022 – June 29, 2022
Page Count: 22
DOI: 10.18260/1-2--41185
Permanent URL: https://peer.asee.org/41185
Download Count: 679
Lawrence Angrave is a computer science teaching professor at the University of Illinois who playfully creates and researches the use of new software and learning practices, often with the goals of improving equity, accessibility, and learning.
Jiaxi Li is a 5-year BS-MS student at the University of Illinois at Urbana-Champaign (UIUC), co-advised by Professor Lawrence Angrave and Professor Klara Nahrstedt. His research interests are in Artificial Intelligence for Human-Computer Interaction, and he has experience in Machine Learning, Computer Vision, and Text Mining.
Ninghan Zhong is a senior student in Computer Science at the University of Illinois at Urbana-Champaign. His research interests include (but are not limited to) Autonomous Systems, Human-Robot Interaction, Human-Computer Interaction, and Artificial Intelligence. He has experience in computer vision, robotics, and machine learning.
To efficiently create books and other instructional content from videos, and to further improve the accessibility of our course content, we needed to solve the scene detection (SD) problem for engineering educational content. We present the pedagogical applications of extracting video images for digital book generation and other shareable resources, within the themes of accessibility, inclusive education, and universal design for learning, and we describe how we solved this problem for engineering education lecture videos. Scene detection refers to the process of merging visually similar frames into a single video segment and subsequently extracting semantic features from that segment (e.g., title, words, transcription segment, and representative image).

In our approach, local features were extracted from inter-frame similarity comparisons using multiple metrics, including numerical measures based on optical character recognition (OCR) and pixel similarity with and without face and body position masking. We analyze and discuss the trade-offs in accuracy, performance, and required computational resources. By applying these features to a corpus of labeled videos, a support vector machine determined an optimal parametric decision surface to model whether adjacent frames were semantically and visually similar. The algorithm design, data flow, and system accuracy and performance are presented.

We evaluated our system using videos from multiple engineering disciplines whose content comprised different presentation styles, including traditional paper handouts, Microsoft PowerPoint slides, and digital ink annotations. For each educational video, a comprehensive digital book composed of lecture clips, slideshow text, and audio transcription content can be generated based on our new scene detection algorithm. Our new scene detection approach was adopted by ClassTranscribe, an inclusive video platform that follows Universal Design for Learning principles. We report on the subsequent experiences and feedback from students who reviewed the generated digital books as a learning component, highlight remaining challenges, and describe how instructors can use this technology in their own courses.

The main contributions of this work are:
- Identifying why automated scene detection of engineering lecture videos is challenging;
- Creation of a scene-labeled corpus of videos, representative of multiple undergraduate engineering disciplines and lecture styles, suitable for training and testing;
- Description of a set of image metrics and a support vector machine-based classification approach;
- Evaluation of the accuracy, recall, and precision of our algorithm;
- Use of an algorithmic optimization to obviate the need for GPU resources;
- Student commentary on the digital book interface created from videos using our SD algorithm;
- Publishing of a labeled corpus of video content to encourage additional research in this area; and
- An independent open-source scene extraction tool that can be used pedagogically by the ASEE community, e.g., to remix and create fun shareable instructional content memes, and to create accessible audio and text descriptions for students who are blind or have low vision.

Text extracted from each scene can also be used to improve the accuracy of captions and transcripts, improving accessibility for students who are hard of hearing or deaf.
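As an illustration of the kind of inter-frame comparison the abstract describes, the following minimal Python sketch computes three similarity features for a pair of adjacent frames: raw pixel similarity, pixel similarity with detected faces masked out, and similarity of OCR-extracted text. This is not the authors' implementation; the helper names, the OpenCV Haar-cascade face detector, and the thresholds are illustrative assumptions.

```python
# Illustrative sketch of inter-frame similarity features (not the
# paper's implementation). Requires opencv-python and pytesseract.
import cv2
import numpy as np
import pytesseract
from difflib import SequenceMatcher

_face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def pixel_similarity(a, b):
    """Fraction of pixels that are (nearly) unchanged between frames."""
    g1 = cv2.cvtColor(a, cv2.COLOR_BGR2GRAY)
    g2 = cv2.cvtColor(b, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(g1, g2)
    return float(np.mean(diff < 10))  # intensity threshold is arbitrary

def masked_pixel_similarity(a, b):
    """Pixel similarity after blanking detected faces, so a moving
    presenter does not trigger a false scene change."""
    for frame in (a, b):
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        for (x, y, w, h) in _face_cascade.detectMultiScale(gray, 1.1, 5):
            frame[y:y + h, x:x + w] = 0
    return pixel_similarity(a, b)

def ocr_similarity(a, b):
    """Similarity of OCR'd slide text between two frames."""
    t1 = pytesseract.image_to_string(cv2.cvtColor(a, cv2.COLOR_BGR2RGB))
    t2 = pytesseract.image_to_string(cv2.cvtColor(b, cv2.COLOR_BGR2RGB))
    return SequenceMatcher(None, t1, t2).ratio()

def frame_features(a, b):
    """Feature vector for one adjacent-frame pair."""
    return [pixel_similarity(a, b),
            masked_pixel_similarity(a.copy(), b.copy()),  # copies: masking mutates
            ocr_similarity(a, b)]
```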
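Given such feature vectors for labeled adjacent-frame pairs, a support vector machine can fit the decision surface the abstract mentions. The sketch below, assuming scikit-learn and two hypothetical corpus files (pair_features.npy, pair_labels.npy), shows one plausible training and evaluation loop; it demonstrates the general technique, not the paper's exact pipeline.

```python
# Hypothetical SVM training sketch: each row of X is a feature vector
# from frame_features(); y is 1 if the pair belongs to the same scene,
# 0 if it spans a scene boundary. File names are assumptions.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score

X = np.load("pair_features.npy")  # hypothetical labeled corpus
y = np.load("pair_labels.npy")

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          random_state=0)
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X_tr, y_tr)

pred = clf.predict(X_te)
print("precision:", precision_score(y_te, pred))
print("recall:   ", recall_score(y_te, pred))
```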
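Finally, once a classifier has marked each adjacent pair as same-scene or boundary, merging visually similar frames into scenes reduces to grouping contiguous runs of same-scene pairs. A small self-contained helper, hypothetical but following the abstract's definition of scene detection:

```python
def segment_scenes(same_scene_flags):
    """Group sampled frames into scenes.

    same_scene_flags[i] is True if frames i and i+1 were classified as
    belonging to the same scene. Returns a list of (start, end) frame
    index ranges, end inclusive.
    """
    scenes, start = [], 0
    for i, same in enumerate(same_scene_flags):
        if not same:               # scene boundary between frame i and i+1
            scenes.append((start, i))
            start = i + 1
    scenes.append((start, len(same_scene_flags)))  # close the final scene
    return scenes

# Example: boundary after frame 1 yields two scenes.
assert segment_scenes([True, False, True]) == [(0, 1), (2, 3)]
```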
Angrave, L., & Li, J., & Zhong, N. (2022, August), Creating TikToks, Memes, Accessible Content, and Books from Engineering Videos? First Solve the Scene Detection Problem Paper presented at 2022 ASEE Annual Conference & Exposition, Minneapolis, MN. 10.18260/1-2--41185
ASEE holds the copyright on this document. It may be read by the public free of charge. Authors may archive their work on personal websites or in institutional repositories with the following citation: © 2022 American Society for Engineering Education. Other scholars may excerpt or quote from these materials with the same citation. When excerpting or quoting from Conference Proceedings, authors should, in addition to noting the ASEE copyright, list all the original authors and their institutions and name the host city of the conference.