June 24, 2017
June 28, 2017
Computers in Education
Synthesis of clustering techniques in educational data mining
With the increasing demand for high-quality education, coupled with the geographic and logistic limitations of the traditional in-class education system, educational institutions are turning to alternative forms of knowledge dissemination through online learning environments such as Massive Open Online Courses (MOOCs). These learning environments now produce a tremendous amount of data that can provide deep insights into learning processes and learner behaviors, but the data require careful processing before they can be converted into actionable insights. Educational data mining (EDM) is concerned with developing methods for exploring data that come from educational settings, and with using those methods to better understand students and the settings in which they learn. Online educational platforms differ from classroom settings in that they allow students to register regardless of geographic location or financial and academic status, and to join or drop out of courses whenever they want with few consequences. As a result, learner behavior and motivation on such platforms are more diverse than, and very different from, those in a traditional educational setup. Investigating the types of learners and their behavioral traits is therefore very important for devising effective pedagogical strategies for online learning.
Researchers have applied many traditional data mining techniques to the study of behavioral patterns of online learners. Romero et al. (1, 3) study the application of traditional data mining techniques to EDM, particularly on web-based and adaptive platforms. Baker et al. (2) review past trends in EDM and the kinds of research questions that researchers have tried to answer over the years. Merceron et al. (4) and Baker et al. (5) provide case studies of the various machine learning and visualization tools that have been applied to EDM. Castro et al. (7) provide a detailed study of the clustering techniques that have been widely used in EDM. This paper focuses on one such data mining technique, namely clustering, for understanding the learner types that are typical in an online learning environment. Our goal is to provide a thorough synthesis of clustering techniques in educational data mining.
In this paper, we investigate the use of clustering techniques for identifying learner types in Massive Open Online Courses (MOOCs). We discuss some of the challenges such a study presents and compare different clustering techniques in the context of educational data. We then describe and demonstrate the use of a popular clustering algorithm, K-means, for learner classification. The final section of the paper focuses on using K-means clustering for learner identification within the more constrained context of a highly technical and advanced engineering MOOC. We investigate the different types of learner behavior that emerge from this clustering and the ways in which each cluster differs from the rest. We also discuss some of the technical implications of using K-means clustering for learner identification in MOOCs, such as deciding the optimal number of clusters. Finally, we provide a methodology for assigning appropriate labels to each user group according to its dominant behavioral characteristics, and we use the Kruskal-Wallis test to show that the differences in learner behavior across clusters are statistically significant.
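The K-means workflow described above (standardize the behavioral features, cluster, choose the number of clusters, label each cluster) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the feature names (`videos_watched`, `quizzes_submitted`, `forum_posts`) and the nine learner rows are hypothetical toy data; centroids are seeded with a deterministic farthest-first rule so the example is reproducible; the within-cluster sum of squares computed for several values of k is the quantity an elbow plot tracks, which is one common heuristic for choosing k; and naming each cluster after its highest standardized feature is a simple hypothetical stand-in for the paper's labeling methodology.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def standardize(rows):
    """Z-score each feature column so that no single feature dominates the distances."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Population standard deviation; fall back to 1.0 for constant columns.
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, sds)] for r in rows]


def farthest_first_init(points, k):
    """Deterministic seeding: start at the first point, then repeatedly add
    the point farthest from its nearest already-chosen centroid."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's K-means iteration. Returns (centroids, assignments)."""
    centroids = farthest_first_init(points, k)
    assignments = None
    for _ in range(max_iter):
        # Assignment step: each learner joins its nearest centroid.
        new_assignments = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                           for p in points]
        if new_assignments == assignments:
            break  # converged: no learner changed cluster
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignments


def sse(points, centroids, assignments):
    """Within-cluster sum of squared distances (the quantity an elbow plot tracks)."""
    return sum(squared_distance(p, centroids[a]) for p, a in zip(points, assignments))


def label_clusters(centroids, feature_names):
    """Hypothetical labeling rule: name each cluster after its highest standardized feature."""
    return ["high " + feature_names[max(range(len(c)), key=c.__getitem__)] for c in centroids]


# Hypothetical toy activity counts, one row per learner.
features = ["videos_watched", "quizzes_submitted", "forum_posts"]
learners = [
    [40, 2, 0], [38, 1, 1], [42, 3, 0],
    [10, 20, 1], [12, 22, 0], [9, 19, 2],
    [5, 3, 15], [6, 2, 18], [4, 4, 16],
]
z = standardize(learners)
elbow = {k: sse(z, *kmeans(z, k)) for k in range(1, 6)}  # SSE for k = 1..5
centroids, assignments = kmeans(z, 3)
labels = label_clusters(centroids, features)
```

In practice one would plot `elbow` against k and pick the k after which the curve flattens; with real MOOC data the elbow is rarely this sharp, and a library implementation such as scikit-learn's `KMeans` (with k-means++ seeding and several restarts) is the usual choice.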
The primary goal of this work is twofold: first, to survey the literature on clustering techniques that have been applied in EDM for learner identification and classification; second, to use the insights gained from this synthesis to inform educators' choice of clustering technique. We demonstrate one such technique, the K-means algorithm, by identifying learner types in a highly technical and advanced MOOC on nanotechnology. Drawing on the literature survey, we justify why K-means clustering seems to work more efficiently than other clustering techniques for classifying learner characteristics. We give a detailed description of the K-means algorithm and of the technical implications of applying it to identify learner types, and we show the algorithm in action by classifying the learner population of a MOOC into distinct categories and studying the characteristic behavioral traits of each category. We also discuss the use of the Kruskal-Wallis test to show that the differences in user behavior across clusters are statistically significant.
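Once learners are clustered, a Kruskal-Wallis test checks whether a behavioral metric differs across clusters without assuming normally distributed data. The sketch below is a stdlib-only illustration, not the authors' code: it ranks the pooled observations (averaging tied ranks), computes the tie-corrected statistic H = 12/(N(N+1)) Σ R_i²/n_i - 3(N+1), and converts H to a p-value using the chi-square distribution with (number of clusters minus one) degrees of freedom, the usual large-sample approximation. The helper names are hypothetical and the example groups are toy numbers, not MOOC data; `scipy.stats.kruskal` performs the same test.

```python
import math
from collections import Counter


def average_ranks(values):
    """1-based ranks of `values`, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for idx in order[i:j + 1]:
            ranks[idx] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1.

    Starts from Q(1, y) = exp(-y) or Q(1/2, y) = erfc(sqrt(y)), with y = x / 2,
    and applies Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    """
    if x <= 0:
        return 1.0
    y = x / 2
    if df % 2 == 0:
        q, a = math.exp(-y), 1.0
    else:
        q, a = math.erfc(math.sqrt(y)), 0.5
    while a < df / 2:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1))
        a += 1
    return q


def kruskal_wallis(*groups):
    """Tie-corrected Kruskal-Wallis H statistic and its chi-square p-value."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])  # R_i for this group
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    correction = 1 - sum(t ** 3 - t for t in Counter(pooled).values()) / (n ** 3 - n)
    if correction == 0:  # every observation identical: no evidence of a difference
        return 0.0, 1.0
    h /= correction
    return h, chi2_sf(h, len(groups) - 1)


# Hypothetical toy data: one behavioral metric per learner, grouped by cluster.
h, p = kruskal_wallis([1, 2, 3], [4, 5, 6], [7, 8, 9])
# h ≈ 7.2, p ≈ 0.0273 (df = 2, so p = exp(-h / 2))
```

With real clusters, each group would hold one metric (for example, video views) for every learner in that cluster, and the test would be repeated for each behavioral feature of interest.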
Roy, D., Bermel, P., Douglas, K. A., Diefes-Dux, H. A., Richey, M., Madhavan, K., & Shah, S. (2017, June). Synthesis of clustering techniques in educational data mining. Paper presented at 2017 ASEE Annual Conference & Exposition, Columbus, Ohio. 10.18260/1-2--28897
ASEE holds the copyright on this document. It may be read by the public free of charge. Authors may archive their work on personal websites or in institutional repositories with the following citation: © 2017 American Society for Engineering Education. Other scholars may excerpt or quote from these materials with the same citation. When excerpting or quoting from Conference Proceedings, authors should, in addition to noting the ASEE copyright, list all the original authors and their institutions and name the host city of the conference.