Automated and Scalable Assessment: Present and Future

Edward F. Gehringer

Conference: 2015 ASEE Annual Conference & Exposition
Location: Seattle, Washington
Publication Date: June 14, 2015
Start Date: June 14, 2015
End Date: June 17, 2015
ISBN: 978-0-692-50180-1
ISSN: 2153-5965
Conference Session: Emerging Computing and Information Technologies II
Tagged Division: Computing & Information Technology
Page Count: 8
Page Numbers: 26.270.1 - 26.270.8
DOI: 10.18260/p.23609
Permanent URL: https://peer.asee.org/23609

Paper Authors

Edward F. Gehringer, North Carolina State University

Dr. Gehringer is an associate professor in the Departments of Computer Science and Electrical & Computer Engineering. His research interests include computerized assessment systems and the use of natural-language processing to improve the quality of reviewing. He teaches courses in programming, computer architecture, object-oriented design, and ethics in computing. He is the lead PI on a multi-institution NSF IUSE grant to construct web services for online peer-review systems.

Abstract

Scalable Assessment: Present and Future

A perennial problem in teaching is securing enough resources to adequately assess student work. In recent years, tight budgets have constrained the dollars available to hire teaching assistants. Concurrent with this trend, the rise of MOOCs has raised assessment challenges to a new scale. In MOOCs, it’s necessary to get feedback to, and assign grades to, thousands of students who don’t bring in any revenue. As MOOCs begin to credential students, accurate assessment will become even more important. These two developments have created an acute need for scalable assessment mechanisms that can assess large numbers of students without a proportionate increase in costs.

There are three main approaches to scalable assessment: autograding, peer review, and automated essay scoring. Autograding is provided, on a basic scale, by any LMS that can administer quizzes made up of multiple-choice or fill-in-the-blank questions. Autograding is also done by publishers’ apps like Wiley Plus, third-party grading systems like Webassign, and content-specific grading applications like Web-CAT for computer programming assignments. Autograding scales very well, because almost all of the effort is expended in creating the quiz, which can then be administered to any number of students automatically. Autograding systems can randomize the ordering of questions and answers, and randomize parameters to numeric problems, making it harder to cheat. With a little more effort, multiple-choice distractors can be keyed to specific misconceptions. Advanced analytics can identify which material the students are having trouble with.

Peer review is supported by almost every LMS, as well as standalone systems like Calibrated Peer Review, Peerceptiv, and Peer Scholar. It is also a feature of several MOOCs, notably Coursera and Canvas. Its greatest strength is the ability to give each student formative assessment on their work. When used for summative assessment in MOOCs, its accuracy is questionable, especially for assignments that the students do not understand well. However, researchers are working on better techniques to identify accurate reviews, which would improve summative scores. Researchers are also developing techniques to allow peer reviewers to annotate submitted documents and to respond to annotations. Other research involves natural-language processing techniques to estimate the quality of a review before it is submitted, and to give feedback to the reviewer on how to improve the review.

Automated essay scoring (AES) uses software that predicts how an instructor would score a piece of prose, by using metrics such as correlation of its vocabulary with essays scored high by humans, average word length, and number of grammatical errors. An instructor needs to train an AES system by scoring a number of essays (e.g., 100) to teach the system which characteristics are important. AES is used by MOOCs such as EdX, as well as for high-stakes educational testing, and research is ongoing in applying similar techniques to the grading of non-prose submissions.

The presentation will cover the current capabilities and research directions for these systems, and also show how they can be used to improve assessment in any context, by providing greater formative feedback to students.
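
The autograding features mentioned in the abstract (randomized question and answer order, randomized numeric parameters, and distractors keyed to misconceptions) can be illustrated with a minimal Python sketch. This is not taken from the paper or from any of the systems it names: the question bank, the misconception tags, the Ohm's-law problem, and every function name below are hypothetical and chosen only for illustration. The sketch assumes that each student's variant should be reproducible, so it seeds the random generator from the student and quiz identifiers.

```python
import hashlib
import math
import random


def student_rng(student_id: str, quiz_id: str) -> random.Random:
    """Return an RNG seeded from the quiz and student IDs, so each student
    gets a different but reproducible variant (an assumption of this sketch)."""
    digest = hashlib.sha256(f"{quiz_id}:{student_id}".encode()).digest()
    return random.Random(int.from_bytes(digest[:8], "big"))


# Hypothetical question bank. Each choice is (text, misconception tag);
# a tag of None marks the correct answer.
QUESTIONS = [
    {"prompt": "What is 7 // 2 in Python 3?",
     "choices": [("3", None),
                 ("3.5", "confuses // with /"),
                 ("4", "assumes rounding instead of flooring")]},
    {"prompt": "What is len('abc' * 2)?",
     "choices": [("6", None),
                 ("3", "ignores string repetition"),
                 ("5", "off-by-one in counting")]},
]


def build_quiz(student_id, quiz_id, questions=QUESTIONS):
    """Shuffle question order and answer order for one student."""
    rng = student_rng(student_id, quiz_id)
    # Copy the choice lists so the shared question bank is never mutated.
    quiz = [{"prompt": q["prompt"], "choices": list(q["choices"])}
            for q in questions]
    rng.shuffle(quiz)
    for q in quiz:
        rng.shuffle(q["choices"])
    return quiz


def grade_choice(question, picked_index):
    """Return (is_correct, misconception_tag) for the chosen answer."""
    _, tag = question["choices"][picked_index]
    return tag is None, tag


def numeric_problem(student_id, quiz_id):
    """Generate an Ohm's-law problem with per-student parameters."""
    rng = student_rng(student_id, quiz_id + ":ohm")
    volts = rng.randint(5, 24)
    ohms = rng.choice([100, 220, 330, 470])
    return {"prompt": f"A {volts} V source drives a {ohms} ohm resistor. "
                      "What is the current in amperes?",
            "answer": volts / ohms}


def grade_numeric(problem, response, rel_tol=0.01):
    """Accept responses within a relative tolerance of the answer key."""
    return math.isclose(response, problem["answer"], rel_tol=rel_tol)
```

Because a wrong choice returns its misconception tag rather than a bare "incorrect", the same grading pass could feed both the score and the kind of analytics the abstract describes, for example a count of how many students selected each tagged distractor.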

Citation

Gehringer, E. F. (2015, June). Automated and Scalable Assessment: Present and Future. Paper presented at 2015 ASEE Annual Conference & Exposition, Seattle, Washington. 10.18260/p.23609

TY - CPAPER
AU - Edward F. Gehringer
CY - Seattle, Washington
DA - 2015/06/14
PB - ASEE Conferences
TI - Automated and Scalable Assessment: Present and Future
UR - https://peer.asee.org/23609
DO - 10.18260/p.23609
ER -