June 23, 2013
June 26, 2013
23.155.1 - 23.155.14
Evaluating a Computational Approach to Identify Domain-specific Vocabulary

Understanding domain-specific vocabulary is often a learning objective in the engineering curriculum. This vocabulary includes the “technical jargon” of an engineering discipline, and learning the vocabulary of the field is an important technical competency. Final exams, for instance, include such terms because they are meant to evaluate a student’s mastery of course concepts. These terms may be commonly used in a course but unfamiliar to students new to the field, particularly freshmen. The student’s language converges toward the instructor’s as the student masters the domain vocabulary. However, at the freshman level, the difference in vocabulary between student and instructor is greatest. Language also plays a key role in creating an accessible and inclusive learning environment. Research suggests that new students may experience a sense of alienation in the classroom, due in part to differences between the way language is used in the learning environment and in their home community. This gap in common vocabulary forms a communication barrier that hinders access to learning and undermines valid assessment.

In this study, the authors attempt to systematically identify domain-specific vocabulary by using a computational and statistical approach to analyze the language used on engineering exams. The goal is to create an automated system that identifies domain-specific vocabulary on exams or other teaching materials, helping the instructor teach, and the students navigate, the language of the field more effectively.

The authors employed strategies from the fields of higher education, industrial engineering, computational linguistics, and statistics to create an algorithmic approach to identifying and categorizing domain-specific language on engineering examinations. A critical component of this study is the ability to distinguish domain-specific vocabulary from “everyday” language. The goal is to preserve the integrity of the exam while helping faculty and students navigate the learning of technical language more effectively. Specifically, the authors are developing a computer program that automatically identifies domain-specific terms in any document. The program has been tested on a databank of over 2,800 exams (dating from 2000 to the present) from a large North American university. Each word from each exam is examined using a Term Frequency-Inverse Document Frequency (TF-IDF) algorithm to generate lists of characteristic terms. These lists are then compared using IBM SPSS Statistics software to further isolate discipline-specific vocabulary.

This study analyzes the effectiveness of the program across several engineering disciplines to determine whether the language is categorized accurately. To date, the authors have analyzed the vocabulary of 15 exams in detail, and results indicate that the program is able to accurately identify discipline-specific words. Results of this work will be presented, and an analysis of the data will be included in the paper. Going forward, the authors anticipate that this method can also help flag low-frequency, non-domain-specific language on engineering exams and eventually lead to software that highlights potentially inaccessible language: both technical vocabulary and, separately, non-technical but linguistically or culturally difficult vocabulary that may require explanation.
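The abstract does not specify which TF-IDF variant, tokenizer, or document grouping the authors use. As a minimal sketch, assuming each exam is treated as one document, words are lowercase runs of letters, and terms are weighted by the common form tf × log(N/df), per-exam lists of characteristic terms could be produced as below. The function names `tokenize`, `tf_idf`, and `characteristic_terms` and the three sample "exams" are hypothetical illustrations, not the authors' program or data. The sketch shows why TF-IDF helps separate technical from everyday language: a word that occurs in every document, such as "the", receives an inverse document frequency of log(1) = 0.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into runs of letters (assumed tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(documents):
    """Return one {term: score} dict per document.

    tf  = occurrences of the term in the document / tokens in the document
    idf = log(N / df), where N is the number of documents and df is the
          number of documents that contain the term at least once.
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: count each term once per document it appears in.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        # A term found in every document gets idf = log(1) = 0, so words
        # such as "the" score zero no matter how often they occur.
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


def characteristic_terms(documents, k=5):
    """Return up to k positive-scoring terms per document, highest score first.

    Ties are broken alphabetically so the output is deterministic.
    """
    lists = []
    for doc_scores in tf_idf(documents):
        ranked = sorted(doc_scores.items(), key=lambda item: (-item[1], item[0]))
        lists.append([term for term, score in ranked if score > 0][:k])
    return lists


# Hypothetical one-sentence "exams", one per discipline.
exams = [
    "Find the shear stress and the bending stress in the beam.",
    "Find the voltage and the current in the resistor.",
    "Find the enthalpy and the entropy of the gas.",
]

for terms in characteristic_terms(exams, k=3):
    print(terms)
# ['stress', 'beam', 'bending']
# ['current', 'resistor', 'voltage']
# ['enthalpy', 'entropy', 'gas']
```

In this toy corpus "of" occurs in only one sentence, so it ties with "gas" and would appear with k=4. On a small corpus TF-IDF alone cannot always separate such everyday words from technical terms, which is one reason a second filtering step, such as the SPSS comparison the authors describe, can be useful; this sketch does not attempt to reproduce that step.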
ASEE holds the copyright on this document. It may be read by the public free of charge. Authors may archive their work on personal websites or in institutional repositories with the following citation: © 2013 American Society for Engineering Education. Other scholars may excerpt or quote from these materials with the same citation. When excerpting or quoting from Conference Proceedings, authors should, in addition to noting the ASEE copyright, list all the original authors and their institutions and name the host city of the conference.