Mining Student-Generated Textual Data In MOOCS and Quantifying Their Effects on Student Performance and Learning Outcomes


2014 ASEE Annual Conference & Exposition


Indianapolis, Indiana

Publication Date

June 15, 2014

Start Date

June 15, 2014

End Date

June 18, 2014



Conference Session

Data Analytics in Education

Tagged Division

Computers in Education

Page Numbers

24.907.1 - 24.907.14



Paper Authors

Conrad Tucker Pennsylvania State University, University Park


Barton K. Pursel The Pennsylvania State University

Barton K. Pursel, Ph.D., is a Research Project Manager at the Pennsylvania State University, focusing on the intersection of technology and pedagogy. Barton works collaboratively with faculty across disciplines to explore how emerging technologies and trends, such as MOOCs, digital badges, and learning analytics, affect both students and instructors.

Anna Divinsky

Abstract

Massive Open Online Courses (MOOCs) are freely available courses offered online for distance-based learners who have access to the internet. The tremendous success of MOOCs can, in part, be attributed to their global availability, which enables anyone in the world to sign up for or drop courses at any time during the course offerings. Enrollment in a single MOOC can range between 10,000 and 200,000 students, thereby providing a potentially rich venue for large-scale digital data (e.g., student course comments, temporal and geo-location data, etc.). However, despite the overabundance of digital data generated through MOOCs, research into how student interactions in MOOCs translate to student performance and learning outcomes has been limited.

The objective of this research is to mine student-generated textual data (e.g., online discussion forums) existing in MOOCs in order to quantify their impact on student performance and learning outcomes. Student performance is quantified based on grades on course homework assignments, quizzes, and examinations. Similar to in-class learning environments, students enrolled in MOOCs self-organize and form learning groups, where course topics and assignments can be discussed. One of the major benefits of MOOC data is that student networks and the discussions therein are digitally stored and readily available for statistical analysis and modeling. The proposed methodology employs robust natural language processing techniques and data mining algorithms to quantify temporal changes in individual/group sentiments relating to course topics and instructor clarity. The researchers aim to determine whether textual content (e.g., quality versus quantity of student forum discussions) expressed through MOOCs can serve as a leading indicator of student performance in MOOCs. A case study involving two MOOCs offered at University X is used to validate the proposed methodology.
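The abstract does not specify which sentiment model or statistical test the authors used, so the following is only a minimal sketch of the general idea: score forum posts for sentiment, average the scores per student per week, and correlate a sentiment measure with grades. The word lists, the function names (`post_sentiment`, `weekly_sentiment`, `pearson`), and the `(student_id, week, text)` record layout are all assumptions made for illustration, not the paper's method.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy lexicon (an assumption for illustration only);
# a real study would use a validated sentiment resource or trained model.
POSITIVE = {"clear", "helpful", "great", "understand"}
NEGATIVE = {"confusing", "unclear", "hard", "lost"}


def post_sentiment(text):
    """Score one post as (positive - negative) / word count, in [-1, 1]."""
    words = [w.strip(".,!?;:").lower() for w in text.split()]
    words = [w for w in words if w]
    if not words:
        return 0.0
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / len(words)


def weekly_sentiment(posts):
    """Average sentiment per student per week.

    posts: iterable of (student_id, week, text) tuples (assumed layout).
    Returns {student_id: {week: mean sentiment}}.
    """
    buckets = defaultdict(lambda: defaultdict(list))
    for student, week, text in posts:
        buckets[student][week].append(post_sentiment(text))
    return {
        student: {week: mean(scores) for week, scores in weeks.items()}
        for student, weeks in buckets.items()
    }


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists; 0.0 if either is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5
```

With real data, one might pass each student's mean sentiment and final grade to `pearson` as a first check on whether forum sentiment tracks performance; a correlation alone would not establish that sentiment is a leading indicator, which would require comparing earlier sentiment against later grades.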

Tucker, C., Pursel, B. K., & Divinsky, A. (2014, June). Mining Student-Generated Textual Data In MOOCS and Quantifying Their Effects on Student Performance and Learning Outcomes. Paper presented at 2014 ASEE Annual Conference & Exposition, Indianapolis, Indiana. 10.18260/1-2--22840

ASEE holds the copyright on this document. It may be read by the public free of charge. Authors may archive their work on personal websites or in institutional repositories with the following citation: © 2014 American Society for Engineering Education. Other scholars may excerpt or quote from these materials with the same citation. When excerpting or quoting from Conference Proceedings, authors should, in addition to noting the ASEE copyright, list all the original authors and their institutions and name the host city of the conference. - Last updated April 1, 2015