An Automated Approach for Finding Course-specific Vocabulary

Chirag Variawa; Susan McCahan; Mark Chignell

Conference: 2013 ASEE Annual Conference & Exposition
Location: Atlanta, Georgia
Publication Date: June 23, 2013
Start Date: June 23, 2013
End Date: June 26, 2013
ISSN: 2153-5965
Conference Session: First-Year Programs (FPD) Poster Session
Tagged Division: First-Year Programs
Page Count: 14
Page Numbers: 23.155.1 - 23.155.14
DOI: 10.18260/1-2--19169
Permanent URL: https://peer.asee.org/19169

Paper Authors

Chirag Variawa University of Toronto

Chirag Variawa is a Ph.D. candidate in Industrial Engineering at the University of Toronto. His research is in using artificial intelligence to maximize the accessibility of language used in engineering education instructional materials. His work on the Board of Governors at the University of Toronto further serves to improve accessibility for all members of the university community.

Susan McCahan University of Toronto

Dr. Susan McCahan is vice-dean, Undergraduate, and is a professor in the Department of Mechanical and Industrial Engineering in the Faculty of Applied Science and Engineering at the University of Toronto.

Mark Chignell University of Toronto

Mark Chignell is a professor of Mechanical and Industrial Engineering at the University of Toronto where he has been on the faculty since 1990. Prior to that he was an assistant professor in Industrial and Systems Engineering at the University of Southern California from 1984 to 1990. He earned the Ph.D. in Psychology from the University of Canterbury in New Zealand in 1981, and an M.S. in Industrial and Systems Engineering from Ohio State in 1984. Mark is currently president of Vocalage Inc., a University of Toronto spinoff company, director of the Interactive Media Lab, and a visiting scientist at both the IBM Centre for Advanced Studies and Keio University in Japan.

Abstract

Evaluating a Computational Approach to Identify Domain-specific Vocabulary

Understanding domain-specific vocabulary is often a learning objective in the engineering curriculum. This vocabulary includes “technical jargon” in an engineering discipline, and learning the vocabulary of the field is an important technical competency. Final exams, for instance, include such terms as they attempt to evaluate a student’s mastery of course concepts. These terms may be commonly used in a course but unfamiliar, particularly to a student new to the field, i.e. a freshman. The corpus of language common to both the instructor and the student converges as the student masters the domain vocabulary. However, at the freshman level the difference in vocabulary between the student and the instructor is greatest. Language also plays a key role in creating an accessible and inclusive learning environment. Research suggests that new students may experience a sense of alienation in the classroom due in part to the difference in the way language is used in the learning environment relative to their home community. This gap in common vocabulary forms a communication barrier that prohibits access to learning and undermines valid assessment.

In this study, the authors attempt to systematically identify domain-specific vocabulary using a computational and statistical approach to analyze the language used on engineering exams. The goal is to create an automated system to identify domain-specific vocabulary on exams or other teaching materials to help both the instructor teach, and the students navigate, the language of the field more effectively.

The authors employed strategies from the fields of higher education, industrial engineering, computational linguistics, and statistics to create an algorithmic approach to identify and categorize domain-specific language on engineering examinations. A critical component of this study is the ability to distinguish domain-specific vocabulary from “everyday” language. The goal is to preserve the integrity of the exam and help faculty and students navigate the learning of technical language more effectively. Specifically, the authors are developing a computer program which automatically identifies domain-specific terms in any document. The program has been tested on a databank of over 2800 exams (developed from 2000 to present) from a large North American university. Each word from each exam is examined using a Term-Frequency Inverse Document-Frequency (TF-IDF) algorithm to generate lists of characteristic terms. Then, these lists are compared using IBM SPSS statistics software to further isolate discipline-specific vocabulary.

This study analyses the effectiveness of this program across several engineering disciplines to see whether the language is categorized accurately. To date, the authors have analyzed the vocabulary of 15 exams in detail, and results indicate that the program is able to accurately identify discipline-specific words. Results of this work will be presented and an analysis of the data will be included in the paper. Going forward, the authors anticipate that this method can also help flag low-frequency non-domain-specific language on engineering exams and eventually lead to software that can highlight potentially inaccessible language: both technical vocabulary and, separately, non-technical but linguistically or culturally difficult vocabulary that may require explanation.
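
The TF-IDF step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' program: the abstract does not state which TF-IDF weighting variant was used, so the sketch assumes relative term frequency multiplied by the natural log of (number of documents / number of documents containing the term). The function names `tokenize` and `tfidf_top_terms` and the three sample exam sentences are hypothetical and exist only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_top_terms(documents, top_n=3):
    """Return, for each document, its top_n terms ranked by TF-IDF score.

    Assumed weighting (not taken from the paper):
    score = (count / document length) * ln(n_docs / document frequency).
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: the number of documents that contain each term.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    results = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores = {
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
        # Highest score first; ties are broken alphabetically.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        results.append([term for term, _ in ranked[:top_n]])
    return results


# Hypothetical exam snippets from three disciplines (invented for this example).
exams = [
    "Calculate the shear stress in the beam and the bending moment in the beam",
    "Calculate the voltage in the resistor and the current in the resistor",
    "Calculate the enthalpy in the reactor and the entropy in the reactor",
]

top_terms = tfidf_top_terms(exams, top_n=3)
for terms in top_terms:
    print(terms)
```

Words that occur in every exam, such as "calculate", "the", and "in", receive a score of zero under this weighting because ln(3/3) = 0, so only terms concentrated in a single exam rise to the top of each list. On a realistic corpus, rare everyday words would also score highly, which is consistent with the abstract's point that a further comparison step is needed to separate discipline-specific vocabulary from low-frequency general language.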

Citation

Variawa, C., McCahan, S., & Chignell, M. (2013, June). An Automated Approach for Finding Course-specific Vocabulary. Paper presented at 2013 ASEE Annual Conference & Exposition, Atlanta, Georgia. https://doi.org/10.18260/1-2--19169

TY - CPAPER
AU - Chirag Variawa
AU - Susan McCahan
AU - Mark Chignell
CY - Atlanta, Georgia
DA - 2013/06/23
PB - ASEE Conferences
TI - An Automated Approach for Finding Course-specific Vocabulary
UR - https://peer.asee.org/19169
DO - 10.18260/1-2--19169
ER -