Baltimore, Maryland
June 25, 2023
June 28, 2023
Design in Engineering Education Division (DEED) Technical Session 1
Design in Engineering Education Division (DEED)
Diversity
10.18260/1-2--44572
https://peer.asee.org/44572
Yun Wang is an undergraduate at the University of Illinois Urbana-Champaign. Research interests include using technology and algorithms to improve accessibility and inclusive education.
Dr. Lawrence Angrave is an award-winning computer science Teaching Professor at the University of Illinois Urbana-Champaign. He creates and researches new opportunities for accessible, inclusive, and equitable education.
The problem of diarization (identifying the different speakers in a conversation stream) has not been sufficiently addressed for deaf and hard-of-hearing students in learning communities such as student design teams in engineering and related STEM disciplines. Although the latest automated real-time speech-to-text systems now approach usably low word error rates, their text output is an incomplete representation of a multi-party conversation; in short, it solves the "what" but not the "who." This creates barriers to our ideal of an inclusive and equitable learning community: students who are deaf or hard of hearing are further marginalized and excluded from multi-party peer discussions with hearing participants because it is difficult to visually follow who is speaking. To address these communication barriers, we used the Human-Centered Engineering Design framework to identify a set of features that overcomes them. This paper explores computerized diarization techniques that draw on a wide set of algorithms and audio metrics to assist in speaker identification, including mel-frequency cepstrum coefficients (MFCCs), volume, fundamental-frequency identification, and deep learning of voice prints. For the goals described in this paper, a subset of existing algorithms that respected privacy and legal constraints was selected and evaluated for identifying speakers in a live audio stream. Several visualization methods were also designed and evaluated, including embedding the mel-frequency cepstrum, speaker identifier, pitch, volume, and other voice characteristics into a live caption stream. Both diarization and visualization were integrated into ScribeAR, a live captioning tool previously introduced in ASEE regional proceedings, and rendered using a lightweight augmented-reality display. To support captioning in areas with limited network connectivity, whisper.cpp, a derivative of OpenAI's Whisper project, was also incorporated into the application. Links to the open-source project are included so that other educators may adopt this inclusive practice. Some accessibility-related opportunities that could serve as motivating design projects for engineering students are also described.
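To make the feature-based diarization idea concrete, the sketch below clusters frame-level MFCCs into tentative speaker labels. This is not the authors' ScribeAR pipeline: the audio file name, the fixed speaker count of two, and the choice of librosa and scikit-learn (offline k-means rather than real-time voice-print embeddings) are all illustrative assumptions.

```python
# Minimal MFCC-based speaker-clustering sketch (illustrative, not the paper's pipeline).
import librosa
import numpy as np
from sklearn.cluster import KMeans

# Load a mono audio clip at 16 kHz ("meeting.wav" is a hypothetical file name).
audio, sr = librosa.load("meeting.wav", sr=16000, mono=True)

# Frame-level MFCC features: array of shape (n_mfcc, n_frames).
mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=13)

# Cluster frames into an assumed number of speakers (two, for this sketch).
labels = KMeans(n_clusters=2, n_init=10).fit_predict(mfcc.T)

# Convert frame indices to timestamps and print a tentative speaker per frame,
# sampling every 50th frame to keep the output short.
times = librosa.frames_to_time(np.arange(mfcc.shape[1]), sr=sr)
for t, who in zip(times[::50], labels[::50]):
    print(f"{t:6.2f}s  speaker {who}")
```

A real-time system would replace this offline clustering with streaming feature extraction and learned voice-print embeddings, closer to the deep-learning approach the abstract describes.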
Wang, Y., & Lualdi, C. P., & Angrave, L., & Purushotam, G. N. (2023, June), Using Deep Learning and Augmented Reality to Improve Accessibility: Inclusive Conversations Using Diarization, Captions, and Visualization Paper presented at 2023 ASEE Annual Conference & Exposition, Baltimore , Maryland. 10.18260/1-2--44572
ASEE holds the copyright on this document. It may be read by the public free of charge. Authors may archive their work on personal websites or in institutional repositories with the following citation: © 2023 American Society for Engineering Education. Other scholars may excerpt or quote from these materials with the same citation. When excerpting or quoting from Conference Proceedings, authors should, in addition to noting the ASEE copyright, list all the original authors and their institutions and name the host city of the conference.