Using Deep Learning and Augmented Reality to Improve Accessibility: Inclusive Conversations Using Diarization, Captions, and Visualization

Yun Wang; Colin P. Lualdi; Lawrence Angrave; Guru Nanma Purushotam

Download Paper | Permalink

Conference: 2023 ASEE Annual Conference & Exposition
Location: Baltimore , Maryland
Publication Date: June 25, 2023
Start Date: June 25, 2023
End Date: June 28, 2023
Conference Session: Design in Engineering Education Division (DEED) Technical Session 1
Tagged Division: Design in Engineering Education Division (DEED)
Tagged Topic: Diversity
Page Count: 13
DOI: 10.18260/1-2--44572
Permanent URL: https://peer.asee.org/44572
Download Count: 614

Paper Authors

biography

Yun Wang Undergraduate at University of Illinois Urbana-Champaign

visit author page

Yun Wang is an Undergraduate at University of Illinois Urbana-Champaign. Research interests include using technology, algorithms to improve accessibility and inclusive education.”

visit author page

author page

Colin P. Lualdi University of Illinois Urbana-Champaign orcid.org/0000-0003-2309-4807

biography

Lawrence Angrave University of Illinois Urbana-Champaign orcid.org/0000-0001-9762-7181

visit author page

Dr. Lawrence Angrave is an award-winning computer science Teaching Professor at the University of Illinois Urbana-Champaign. He creates and researches new opportunities for accessible and inclusive equitable education.

visit author page

author page

Guru Nanma Purushotam

Download Paper | Permalink

Abstract

The problem of diarization - identifying different speakers in a conversation stream - has not been sufficiently addressed for deaf and hard-of-hearing students in learning communities such as student design teams in engineering and related STEM disciplines. Though the accuracy of the latest automated real-time speech-to-text systems is now approaching usable low word error rates, the generated text output is an incomplete representation of a multi-party conversation; In short, it solves the “what” but not the “who.” This creates barriers to our ideal of an inclusive and equitable learning community. Thus students who are deaf or hard of hearing are further marginalized and excluded from multi-party peer discussions with non-deaf participants because it is hard to visually follow who is speaking. To address these communication barriers, we utilized the Human Centered Engineering Design framework to identify a set of features that overcomes the above barriers. This paper explores computerized diarization techniques that utilize a wide set of algorithms and audio metrics to assist in speaker identification. These techniques include mel-frequency cepstrum coefficients (MFCC), volume, fundamental frequency identification, and deep learning of voice prints. For the goals described in this paper, a subset of existing algorithms that respected privacy and legal constraints was selected and evaluated for the purposes of identifying speakers using a live audio stream. Several visualization methods were also designed and evaluated. These included visualization of embedding mel-frequency cepstrum, speaker identifier, pitch, volume, and other voice characteristics into a live caption stream. Both diarization and visualization were integrated into a live captioning tool, ScribeAR, previously introduced in ASEE regional proceedings, and rendered using a lightweight Augmented Reality display. In order to facilitate captioning services in areas with limited network connectivity, whisper.cpp, a derivative of OpenAI’s Whisper project, was also incorporated into the application. Links to the open source project are included so that other educators may adopt this inclusive practice. Some accessibility-related opportunities that could be used as motivating design projects for engineering students are described.

Citation
Format

Wang, Y., & Lualdi, C. P., & Angrave, L., & Purushotam, G. N. (2023, June), Using Deep Learning and Augmented Reality to Improve Accessibility: Inclusive Conversations Using Diarization, Captions, and Visualization Paper presented at 2023 ASEE Annual Conference & Exposition, Baltimore , Maryland. 10.18260/1-2--44572

TY - CPAPER
AB - The problem of diarization - identifying different speakers in a conversation stream - has not been sufficiently addressed for deaf and hard-of-hearing students in learning communities such as student design teams in engineering and related STEM disciplines. Though the accuracy of the latest automated real-time speech-to-text systems is now approaching usable low word error rates, the generated text output is an incomplete representation of a multi-party conversation; In short, it solves the “what” but not the “who.” This creates barriers to our ideal of an inclusive and equitable learning community. Thus students who are deaf or hard of hearing are further marginalized and excluded from multi-party peer discussions with non-deaf participants because it is hard to visually follow who is speaking. To address these communication barriers, we utilized the Human Centered Engineering Design framework to identify a set of features that overcomes the above barriers. This paper explores computerized diarization techniques that utilize a wide set of algorithms and audio metrics to assist in speaker identification. These techniques include mel-frequency cepstrum coefficients (MFCC), volume, fundamental frequency identification, and deep learning of voice prints. For the goals described in this paper, a subset of existing algorithms that respected privacy and legal constraints was selected and evaluated for the purposes of identifying speakers using a live audio stream. Several visualization methods were also designed and evaluated. These included visualization of embedding mel-frequency cepstrum, speaker identifier, pitch, volume, and other voice characteristics into a live caption stream. Both diarization and visualization were integrated into a live captioning tool, ScribeAR, previously introduced in ASEE regional proceedings, and rendered using a lightweight Augmented Reality display. In order to facilitate captioning services in areas with limited network connectivity, whisper.cpp, a derivative of OpenAI’s Whisper project, was also incorporated into the application. Links to the open source project are included so that other educators may adopt this inclusive practice. Some accessibility-related opportunities that could be used as motivating design projects for engineering students are described.
AU - Yun Wang
AU - Colin P. Lualdi
AU - Lawrence Angrave
AU - Guru Nanma Purushotam
CY - Baltimore , Maryland
DA - 2023/06/25
PB - ASEE Conferences
TI - Using Deep Learning and Augmented Reality to Improve Accessibility: Inclusive Conversations Using Diarization, Captions, and Visualization
UR - https://peer.asee.org/44572
DO - 10.18260/1-2--44572
ER -