
Creating Course Material through the Automation of Lecture Caption Conversion


Conference

2022 ASEE Gulf Southwest Annual Conference

Location

Prairie View, Texas

Publication Date

March 16, 2022

Start Date

March 16, 2022

End Date

March 18, 2022

Page Count

11

DOI

10.18260/1-2--39169

Permanent URL

https://peer.asee.org/39169


Paper Authors


Salvatore Enrico Paolo Indiogine Texas A&M University


Bachelor of Science in Engineering from New Mexico State University and Ph.D. in Curriculum & Instruction from Texas A&M University. I work as an instructional designer at the College of Engineering of Texas A&M University.


Brandon Chi-Thien Le Texas A&M University


Brandon Le is a Business Honors and Management Information Systems graduate student at Texas A&M University. He has worked with the Texas A&M College of Engineering Studio for Advanced Instruction and Learning for over two years as a Production Assistant, and focuses his work on using technology to enable course development and innovation. Brandon is from Austin, Texas, and plans on going into a career in financial technology.


Sidharth Dhaneshkumar Shah


Abstract

One major challenge in creating online courses in higher education is the high workload required of faculty in developing the course. Due to time limitations, a common practice is to use lecture capture in regular classrooms, which is then streamed online to students using various techniques. Our research explores the use of machine learning to convert a captured classroom lecture into a document file containing transcribed audio, video screenshots, and a word cloud header. The purpose is to automate the inspection and conversion of a large number of video recordings of lecture captures into a readable document that reflects the content of the recorded classroom lecture. The document can then be used as (1) a quick and accessible means of inspecting the content of a course or series of lectures, and (2) a starting point for online course development in collaboration with instructional designers. In addition, a large number of recorded lectures can be indexed and searched for specific textual or image content.

We created a Python program that automates the entire document creation process, beginning with obtaining lecture captures from a video file repository. We then extract the audio and send it to be processed by a Speech-to-Text API. We utilized a process known as Asynchronous Recognition, which allows up to 480 minutes of audio containing speech to be transcribed into text. The API then outputs a text transcription of the audio. The program then removes unnecessary filler words and adds punctuation.
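The filler-removal step described above can be sketched in plain Python. The paper does not list which "unnecessary words" are removed, so the stop-list below is an assumption; the Google Cloud Speech-to-Text call itself (which needs credentials) is shown only as a commented outline.

```python
import re

# Hypothetical single-word filler list -- an assumption, since the paper
# does not specify which words its cleanup step removes.
FILLERS = {"um", "uh", "er", "ah", "hmm"}

def clean_transcript(text: str) -> str:
    """Remove filler words from a raw transcription and tidy spacing."""
    words = text.split()
    kept = [w for w in words if w.lower().strip(".,!?") not in FILLERS]
    # Collapse any doubled spaces left behind by the removals.
    return re.sub(r"\s{2,}", " ", " ".join(kept)).strip()

# Outline of the Asynchronous Recognition step (not executed here;
# requires the google-cloud-speech client and credentials):
#   client = speech.SpeechClient()
#   operation = client.long_running_recognize(config=config, audio=audio)
#   response = operation.result(timeout=...)
```

In the actual pipeline, punctuation restoration would follow this cleanup before the text is written to the output document.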

The program uses various tools to extract screenshot images from the lecture capture based on a calculated time interval. To prevent an overload of unnecessary images, fuzzy logic equality was used to automatically determine and remove duplicate images. The program then creates a “word cloud” image derived from the most common words found in the audio transcription.
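The paper calls the duplicate-removal test "fuzzy logic equality" without naming a specific method; one common realization is a perceptual difference hash compared by Hamming distance, sketched below. The 2D pixel lists and the `threshold` value are illustrative assumptions; a real pipeline would first resize video frames (e.g. with OpenCV or Pillow) to the small grayscale grid the hash expects.

```python
def dhash(pixels, hash_size=8):
    """Difference hash of a grayscale image given as a 2D list of 0-255
    values already resized to hash_size rows of (hash_size + 1) columns.
    Each bit records whether a pixel is brighter than its right neighbor."""
    bits = 0
    for row in pixels:
        for x in range(hash_size):
            bits = (bits << 1) | (1 if row[x] > row[x + 1] else 0)
    return bits

def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")

def is_duplicate(h1, h2, threshold=5):
    """Treat two screenshots as duplicates when their hashes differ in at
    most `threshold` bits -- the fuzzy-equality test."""
    return hamming(h1, h2) <= threshold
```

Because near-identical slides produce hashes a few bits apart, this test keeps one screenshot per slide while discarding re-captures of the same frame.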

The audio text, screenshot images, and word cloud are spliced together into a single document and exported in Microsoft Word format to a predetermined repository, from which it can be retrieved by the user. The entire process only requires the user to initiate the program. It can process multiple MP4 video files at once, and processing can take from several minutes to several hours.
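The splicing order described above can be sketched as a pure function that assembles the export sequence; the segment structure and file names are hypothetical, and the commented rendering step shows how a library such as python-docx could write the final .docx.

```python
def splice_document(word_cloud_path, segments):
    """Assemble the export order: the word-cloud header image first, then
    each transcript segment followed by its matching screenshots.
    `segments` is a list of (text, [image_paths]) pairs."""
    parts = [("image", word_cloud_path)]
    for text, images in segments:
        parts.append(("text", text))
        parts.extend(("image", p) for p in images)
    return parts

# Rendering the assembled parts to Word format could then use python-docx:
#   from docx import Document
#   doc = Document()
#   for kind, payload in parts:
#       if kind == "image":
#           doc.add_picture(payload)
#       else:
#           doc.add_paragraph(payload)
#   doc.save("lecture.docx")
```

Separating the ordering logic from the rendering keeps the splice testable without a Word library installed.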

In considering which Speech-to-Text API to use, we applied a concept known as Levenshtein distance to test for accuracy between two transcription providers: Google Cloud and AWS. We compared them to each other and to human transcription to determine the most effective provider, meaning the one that would require the least human correction post-processing. We also compared them in terms of cost. We found that while AWS is the cheaper choice, Google Cloud yielded a significantly lower Levenshtein distance, indicating that significantly fewer corrections would be required.
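The metric behind this comparison is the classic dynamic-programming edit distance; a minimal implementation is sketched below. Because it only compares sequence elements for equality, the same function works character-by-character on strings or, by passing token lists, word-by-word against a human reference transcript.

```python
def levenshtein(a, b) -> int:
    """Levenshtein distance: the minimum number of single-element
    insertions, deletions, and substitutions needed to turn sequence
    `a` into sequence `b`, computed with a rolling DP row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[len(b)]
```

A lower distance between an API transcript and the human reference means fewer manual corrections, which is the basis of the Google Cloud vs. AWS comparison reported above.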

Our preliminary results show that we can automate the extraction of text and images from captured classroom lectures to yield content that can be used in online course development as well as the creation of an institution-wide searchable database.

Indiogine, S. E. P., & Le, B. C., & Shah, S. D. (2022, March), Creating Course Material through the Automation of Lecture Caption Conversion Paper presented at 2022 ASEE Gulf Southwest Annual Conference, Prairie View, Texas. 10.18260/1-2--39169

ASEE holds the copyright on this document. It may be read by the public free of charge. Authors may archive their work on personal websites or in institutional repositories with the following citation: © 2022 American Society for Engineering Education. Other scholars may excerpt or quote from these materials with the same citation. When excerpting or quoting from Conference Proceedings, authors should, in addition to noting the ASEE copyright, list all the original authors and their institutions and name the host city of the conference. - Last updated April 1, 2015