Using a Data Science Pipeline for Course Data: A Case Study Analyzing Heterogeneous Student Data in Two Flipped Classes

Asuman Cagla Acun Sener; Jeffrey Lloyd Hieb; Olfa Nasraoui

Conference: 2019 ASEE Annual Conference & Exposition
Location: Tampa, Florida
Publication Date: June 15, 2019
Start Date: June 15, 2019
End Date: June 19, 2019
Conference Session: Curriculum and Assessment I
Tagged Division: Computing and Information Technology
Tagged Topic: Diversity
Page Count: 15
DOI: 10.18260/1-2--33492
Permanent URL: https://peer.asee.org/33492
Download Count: 457

Paper Authors

Asuman Cagla Acun Sener, University of Louisville

Asuman Cagla Acun Sener holds B.S. and M.S. degrees in Computer Science and Computer Engineering. She is currently pursuing a doctoral degree in Computer Science at the Knowledge Discovery & Web Mining Lab, Department of Computer Science and Computer Engineering, University of Louisville, where she also works as a graduate assistant. Her research interests are educational data mining, visualization, predictive modeling, and recommender systems.

Jeffrey Lloyd Hieb, University of Louisville

Jeffrey L. Hieb is an Associate Professor in the Department of Engineering Fundamentals at the University of Louisville. He graduated from Furman University in 1992 with degrees in Computer Science and Philosophy. After 10 years working in industry, he returned to school, completing his Ph.D. in Computer Science Engineering at the University of Louisville’s Speed School of Engineering in 2008. Since completing his degree, he has been teaching engineering mathematics courses and continuing his dissertation research in cyber security for industrial control systems. In his teaching, Dr. Hieb focuses on innovative and effective use of tablets, digital ink, and other technology and is currently investigating the use of the flipped classroom model and collaborative learning. His research in cyber security for industrial control systems is focused on high assurance field devices using microkernel architectures.

Olfa Nasraoui, University of Louisville

Olfa Nasraoui is Professor of Computer Engineering and Computer Science, Endowed Chair of e-commerce, and the founding director of the Knowledge Discovery and Web Mining Lab at the University of Louisville. She received her Ph.D. in Computer Engineering and Computer Science from the University of Missouri-Columbia in 1999. From 2000 to 2004, she was an Assistant Professor at the University of Memphis. Her research activities include Data Mining/Machine Learning, Web Mining, Information Retrieval, and Personalization, in particular problems involving large, multi-domain, high-dimensional data such as text, transactions, and social network data. She is the recipient of the National Science Foundation CAREER Award and the winner of two Best Paper Awards: one in theoretical developments in computational intelligence at the Artificial Neural Networks In Engineering conference (ANNIE 2001) and one at the Knowledge Discovery and Information Retrieval conference in Seville, Spain (KDIR 2018). She has more than 200 refereed publications, including over 47 journal papers and book chapters and 12 edited volumes. Her research has been funded notably by NSF and NASA. Between 2004 and 2008, she co-organized the yearly WebKDD workshops on User Profiling and Web Usage Mining at the ACM KDD conference. She has served as a program committee member, track chair, or senior program committee member for several Data Mining, Big Data, and Artificial Intelligence conferences, including ACM KDD, WWW, RecSys, IEEE Big Data, ICDM, SDM, and AAAI. In summer 2015, she served as Technical Mentor/Project Lead at the Data Science for Social Good Fellowship, in the Center for Data Science and Public Policy at the University of Chicago. She is a member of ACM and ACM SigKDD, and a senior member of IEEE and IEEE-WIE. She is also on the leadership team of the Kentucky Girls STEM collaborative network.

Abstract

The landscape of student data in individual classes is changing rapidly. Traditionally, student data in an individual class consisted of homework assignment scores, exam and quiz scores, and project/lab scores. Those scores were usually entered manually in a gradebook. With course materials and assignments moving online and new educational technology tools being released with great frequency, an increasing amount of data is recorded for each student in a class. That data can and does support formative assessment and evaluation; however, there might be other information “hidden” in all that data. This study presents an applied data science methodology to explore student data from an engineering mathematics course. The exploratory analysis serves two purposes: 1) it supports the faculty member's desire to gain insights into the use of flipped classroom instruction, and 2) it serves as a case study for a proposed data science pipeline for educational data. The instructor used Learning Catalytics, a classroom response system, on a daily basis in two sections of an engineering mathematics course. Each day’s scores were automatically recorded in the system. This data was combined with traditional homework and exam data and student demographic data. A combination of data mining and classical statistical techniques was used to reveal trends and peculiarities in the data, without a specific question or topic to investigate. The data science pipeline that we present has four major stages: data preprocessing, exploratory factor analysis, visualization, and feature engineering. Analysis results show the differences and similarities within the course units and help reveal learner behaviors. Significant differences related to gender were found, but prior experience in a course taught using the flipped classroom model did not show a significant difference.
Exploratory factor analysis identified two factors in the full data set: class activities and exams (factor 1) and homework and lesson assignments (factor 2). Within each factor, the course units clustered into two groups, Units 1 to 7 and Units 8 to 13, with the dividing point at the withdrawal date. Results also show that female students attend lessons more than male students and are more engaged learners. The methodology is based on data mining methods such as factor analysis and visualization methods such as heat maps. Based on the exploratory data analysis, this paper proposes a data science pipeline methodology for analyzing and visualizing raw student data from multiple sources. We observed some trends and clusters within and across course units. Future work will include collecting more data and generating hypotheses.
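
As a rough illustration of the preprocessing stage described in the abstract, the sketch below joins several score sources on a student identifier, standardizes the numeric columns, and computes the correlation matrix that an exploratory factor analysis would typically start from. It is not the authors' code: the student IDs, column names, helper functions, and all values are hypothetical, and the factor extraction itself is not shown.

```python
from statistics import mean, pstdev

# Hypothetical data keyed by an anonymized student ID (all values invented).
activity = {"s1": 0.9, "s2": 0.6, "s3": 0.8, "s4": 0.4}            # classroom response average
homework = {"s1": 88.0, "s2": 70.0, "s3": 92.0, "s4": 65.0, "s5": 75.0}
exam = {"s1": 85.0, "s2": 62.0, "s3": 80.0, "s4": 55.0}
gender = {"s1": "F", "s2": "M", "s3": "F", "s4": "M"}              # demographic source


def merge_sources(**sources):
    """Inner-join the sources on student ID; students missing anywhere are dropped."""
    common = set.intersection(*(set(src) for src in sources.values()))
    return {sid: {name: src[sid] for name, src in sources.items()}
            for sid in sorted(common)}


def standardize(table, columns):
    """Return a copy of the table with each listed column converted to z-scores."""
    out = {sid: dict(row) for sid, row in table.items()}
    for col in columns:
        values = [row[col] for row in table.values()]
        mu, sigma = mean(values), pstdev(values)
        for sid in out:
            out[sid][col] = (table[sid][col] - mu) / sigma if sigma else 0.0
    return out


def correlation_matrix(z_table, columns):
    """Pearson correlations, computed as the mean product of population z-scores."""
    n = len(z_table)
    return {a: {b: sum(row[a] * row[b] for row in z_table.values()) / n
                for b in columns}
            for a in columns}


merged = merge_sources(activity=activity, homework=homework, exam=exam, gender=gender)
columns = ["activity", "homework", "exam"]   # numeric columns only
z = standardize(merged, columns)
corr = correlation_matrix(z, columns)

for a in columns:
    print(a, [round(corr[a][b], 2) for b in columns])
```

The inner join is one possible design choice; it silently drops students who are missing from any source (here the hypothetical `s5`), so a real analysis would need to decide how to handle withdrawals and missing scores.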

Citation

Acun Sener, A. C., Hieb, J. L., & Nasraoui, O. (2019, June). Using a Data Science Pipeline for Course Data: A Case Study Analyzing Heterogeneous Student Data in Two Flipped Classes. Paper presented at 2019 ASEE Annual Conference & Exposition, Tampa, Florida. 10.18260/1-2--33492

TY - CPAPER
AU - Asuman Cagla Acun Sener
AU - Jeffrey Lloyd Hieb
AU - Olfa Nasraoui
CY - Tampa, Florida
DA - 2019/06/15
PB - ASEE Conferences
TI - Using a Data Science Pipeline for Course Data: A Case Study Analyzing Heterogeneous Student Data in Two Flipped Classes
UR - https://peer.asee.org/33492
DO - 10.18260/1-2--33492
ER -