WIP: Traditional Engineering Assessments Challenged by ChatGPT: An Evaluation of its Performance on a Fundamental Competencies Exam

Conference: 2024 ASEE Annual Conference & Exposition
Location: Portland, Oregon
Publication Date: June 23, 2024
Start Date: June 23, 2024
End Date: July 12, 2024
Conference Session: Educational Research and Methods Division (ERM) Technical Session 19
Tagged Division: Educational Research and Methods Division (ERM)
Permanent URL: https://peer.asee.org/48322

Paper Authors

Trini Balart, Pontificia Universidad Católica de Chile

Trinidad Balart is a PhD student at Texas A&M University. She completed her Bachelor of Science in Computer Science Engineering at the Pontifical Catholic University of Chile. She is currently pursuing her PhD in Multidisciplinary Engineering with a focus on engineering education and the impact of AI on education. Her main research interests include improving engineering students' learning, innovative ways of teaching and learning, and the creative and ethical use of artificial intelligence in education.

Jorge Baier, Pontificia Universidad Católica de Chile

He is an associate professor in the Computer Science Department and Associate Dean for Engineering Education at the Engineering School of Pontificia Universidad Católica de Chile. Jorge holds a PhD in Computer Science from the University of Toronto in Canada.

Martín Eduardo Castillo, Pontificia Universidad Católica de Chile

Martín Castillo is currently pursuing a Bachelor of Science in Robotics Engineering at the Pontifical Catholic University of Chile. His interests lie at the intersection of artificial intelligence, robotics, control systems, and applications of AI in education.

Abstract

ChatGPT, a chatbot that produces text with remarkable coherence, is leading higher education institutions to question the relevance of the current model of engineering education and, in particular, of assessment. Among the many reasons behind this questioning is the fact that ChatGPT has been shown to pass various engineering exams. In this research, the GPT-3.5 and GPT-4 models were used to solve different real versions of the Fundamental Competencies Exam (FCE), an exam administered by a selective Latin American engineering school upon completion of foundational engineering courses such as basic dynamics, ethics for engineers, and probability and statistics. The questions are formulated to verify that the student has the fundamental knowledge of the discipline. We adopted a strategy in which the questions were extracted from the FCE modules and translated to LaTeX. Each question was presented without supplementary context to avoid influence between questions within the same exam. In addition, a comparative analysis of the effectiveness of GPT-4 was performed, evaluating its performance with and without image interpretation, given the recent addition of multimodal capabilities to GPT-4. The results reveal a considerable difference between the pass rates of GPT-3.5 and GPT-4, at 47.38% and 63.06% respectively. While GPT-4 without image input already achieved a score sufficient to pass all modules, including the questions with images raised performance further, to a 64.38% pass rate. We will continue to solve additional versions of the exam. These data will allow us to perform multiple analyses of the exam's historical performance, providing a proxy for assessing how its difficulty has changed over the years. In light of these preliminary results, and given the tight constraints imposed on the model, it is imperative to ask whether the FCE effectively assesses the fundamental skills required of an engineer and whether it is the best method for assessing foundational engineering competencies amid the advent of innovative AI tools.
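The paper does not specify the tooling used to query the models, but the workflow described above (isolated LaTeX-formatted questions, GPT-3.5 vs. GPT-4, with and without image input) could be reproduced roughly as in the sketch below. This is a minimal illustration assuming the OpenAI Python SDK (v1.x); the model identifiers, the sample question, and the helper functions are hypothetical and not taken from the paper.

import base64
from openai import OpenAI

# Assumed setup: the OPENAI_API_KEY environment variable is set.
client = OpenAI()

def ask_text_only(question_latex: str, model: str) -> str:
    """Send one LaTeX-formatted question in its own conversation,
    so no context is shared between questions of the same exam."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": question_latex}],
    )
    return response.choices[0].message.content

def ask_with_image(question_latex: str, image_path: str) -> str:
    """Send a question together with its figure to a multimodal GPT-4 model."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")
    response = client.chat.completions.create(
        model="gpt-4-vision-preview",  # assumed multimodal model identifier
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": question_latex},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content

# Hypothetical question statement, already translated to LaTeX.
question = r"A block of mass $m = 2\,\mathrm{kg}$ rests on a frictionless incline of angle $30^\circ$. Find its acceleration."
for model_name in ("gpt-3.5-turbo", "gpt-4"):
    print(model_name, ask_text_only(question, model_name))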

Balart, T., Baier, J., & Castillo, M. E. (2024, June). WIP: Traditional Engineering Assessments Challenged by ChatGPT: An Evaluation of its Performance on a Fundamental Competencies Exam. Paper presented at the 2024 ASEE Annual Conference & Exposition, Portland, Oregon. https://peer.asee.org/48322

ASEE holds the copyright on this document. It may be read by the public free of charge. Authors may archive their work on personal websites or in institutional repositories with the following citation: © 2024 American Society for Engineering Education. Other scholars may excerpt or quote from these materials with the same citation. When excerpting or quoting from Conference Proceedings, authors should, in addition to noting the ASEE copyright, list all the original authors and their institutions and name the host city of the conference.