
Creating Course Material through the Automation of Lecture Caption Conversion


Conference

2022 ASEE Gulf Southwest Annual Conference

Location

Prairie View, Texas

Publication Date

March 16, 2022

Start Date

March 16, 2022

End Date

March 18, 2022

Page Count

11

DOI

10.18260/1-2--39169

Permanent URL

https://peer.asee.org/39169


Paper Authors


Salvatore Enrico Paolo Indiogine Texas A&M University


Bachelor of Science in Engineering from New Mexico State University and Ph.D. in Curriculum & Instruction from Texas A&M University. I work as an instructional designer at the College of Engineering of Texas A&M University.


Brandon Chi-Thien Le Texas A&M University


Brandon Le is a Business Honors and Management Information Systems graduate student at Texas A&M University. He has worked with the Texas A&M College of Engineering Studio for Advanced Instruction and Learning for over two years as a Production Assistant, and focuses his work on using technology to enable course development and innovation. Brandon is from Austin, Texas, and plans on going into a career in financial technology.


Sidharth Dhaneshkumar Shah


Abstract

One major challenge in creating online courses in higher education is the high workload required of faculty in developing the course. Due to time limitations, a common practice is to use lecture capture in regular classrooms, which is then streamed online to students using various techniques. Our research explores the use of machine learning to convert a captured classroom lecture into a document file containing transcribed audio, video screenshots, and a word cloud header. The purpose is to automate the inspection and conversion of a large number of video recordings of lecture captures into a readable document that reflects the content of the recorded classroom lecture. The document can then be used as (1) a quick and accessible means of inspecting the content of a course or series of lectures, and (2) a starting point for online course development in collaboration with instructional designers. In addition, a large number of recorded lectures can be indexed and searched for specific textual or image content.

We created a Python program that automates the entire document creation process, beginning with obtaining lecture captures from a video file repository. We then extract the audio and send it to be processed by a Speech-to-Text API. We utilized a process known as Asynchronous Recognition, which allows up to 480 minutes of audio containing speech to be transcribed into text. The API then outputs a text transcription of the audio. The program then removes unnecessary filler words and adds punctuation.
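The filler-removal step described above can be sketched in plain Python. The paper does not list which "unnecessary words" are removed, so the stop-list below is an assumption; the Google Cloud Speech-to-Text call itself (which needs credentials) is shown only as a commented outline.

```python
import re

# Hypothetical single-word filler list -- an assumption, since the paper
# does not specify which words its cleanup step removes.
FILLERS = {"um", "uh", "er", "ah", "hmm"}

def clean_transcript(text: str) -> str:
    """Remove filler words from a raw transcription and tidy spacing."""
    words = text.split()
    kept = [w for w in words if w.lower().strip(".,!?") not in FILLERS]
    # Collapse any doubled spaces left behind by the removals.
    return re.sub(r"\s{2,}", " ", " ".join(kept)).strip()

# Outline of the Asynchronous Recognition step (not executed here;
# requires the google-cloud-speech client and credentials):
#   client = speech.SpeechClient()
#   operation = client.long_running_recognize(config=config, audio=audio)
#   response = operation.result(timeout=...)
```

In the actual pipeline, punctuation restoration would follow this cleanup before the text is written to the output document.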

The program uses various tools to extract screenshot images from the lecture capture based on a calculated time interval. To prevent an overload of unnecessary images, fuzzy logic equality was used to automatically determine and remove duplicate images. The program then creates a “word cloud” image derived from the most common words found in the audio transcription.
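The paper calls the duplicate-removal test "fuzzy logic equality" without naming a specific method; one common realization is a perceptual difference hash compared by Hamming distance, sketched below. The 2D pixel lists and the `threshold` value are illustrative assumptions; a real pipeline would first resize video frames (e.g. with OpenCV or Pillow) to the small grayscale grid the hash expects.

```python
def dhash(pixels, hash_size=8):
    """Difference hash of a grayscale image given as a 2D list of 0-255
    values already resized to hash_size rows of (hash_size + 1) columns.
    Each bit records whether a pixel is brighter than its right neighbor."""
    bits = 0
    for row in pixels:
        for x in range(hash_size):
            bits = (bits << 1) | (1 if row[x] > row[x + 1] else 0)
    return bits

def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")

def is_duplicate(h1, h2, threshold=5):
    """Treat two screenshots as duplicates when their hashes differ in at
    most `threshold` bits -- the fuzzy-equality test."""
    return hamming(h1, h2) <= threshold
```

Because near-identical slides produce hashes a few bits apart, this test keeps one screenshot per slide while discarding re-captures of the same frame.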

The audio text, screenshot images, and word cloud are spliced together into a single document and exported in Microsoft Word format to a predetermined repository, from which it can be retrieved by the user. The entire process only requires the user to initiate the program. It can process multiple MP4 video files at once, and processing can take from several minutes to several hours.
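The splicing order described above can be sketched as a pure function that assembles the export sequence; the segment structure and file names are hypothetical, and the commented rendering step shows how a library such as python-docx could write the final .docx.

```python
def splice_document(word_cloud_path, segments):
    """Assemble the export order: the word-cloud header image first, then
    each transcript segment followed by its matching screenshots.
    `segments` is a list of (text, [image_paths]) pairs."""
    parts = [("image", word_cloud_path)]
    for text, images in segments:
        parts.append(("text", text))
        parts.extend(("image", p) for p in images)
    return parts

# Rendering the assembled parts to Word format could then use python-docx:
#   from docx import Document
#   doc = Document()
#   for kind, payload in parts:
#       if kind == "image":
#           doc.add_picture(payload)
#       else:
#           doc.add_paragraph(payload)
#   doc.save("lecture.docx")
```

Separating the ordering logic from the rendering keeps the splice testable without a Word library installed.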

In considering which Speech-to-Text API to use, we applied a concept known as Levenshtein distance to test for accuracy between two transcription providers: Google Cloud and AWS. We compared them to each other and to human transcription to determine the most effective provider, meaning the one that would require the least human correction post-processing. We also compared them in terms of cost. We found that while AWS is the cheaper choice, Google Cloud yielded a significantly lower Levenshtein distance, indicating that significantly fewer corrections would be required.
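The metric behind this comparison is the classic dynamic-programming edit distance; a minimal implementation is sketched below. Because it only compares sequence elements for equality, the same function works character-by-character on strings or, by passing token lists, word-by-word against a human reference transcript.

```python
def levenshtein(a, b) -> int:
    """Levenshtein distance: the minimum number of single-element
    insertions, deletions, and substitutions needed to turn sequence
    `a` into sequence `b`, computed with a rolling DP row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[len(b)]
```

A lower distance between an API transcript and the human reference means fewer manual corrections, which is the basis of the Google Cloud vs. AWS comparison reported above.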

Our preliminary results show that we can automate the extraction of text and images from captured classroom lectures to yield content that can be used in online course development as well as the creation of an institution-wide searchable database.

Indiogine, S. E. P., & Le, B. C., & Shah, S. D. (2022, March), Creating Course Material through the Automation of Lecture Caption Conversion Paper presented at 2022 ASEE Gulf Southwest Annual Conference, Prairie View, Texas. 10.18260/1-2--39169

ASEE holds the copyright on this document. It may be read by the public free of charge. Authors may archive their work on personal websites or in institutional repositories with the following citation: © 2022 American Society for Engineering Education. Other scholars may excerpt or quote from these materials with the same citation. When excerpting or quoting from Conference Proceedings, authors should, in addition to noting the ASEE copyright, list all the original authors and their institutions and name the host city of the conference. - Last updated April 1, 2015