Authors:
Schmidt, Heiko, Heiko.Schmidt@gesis.org, GESIS - Leibniz Institute for the Social Sciences in Cologne
Pohl, Nada, pohln@hdm-stuttgart.de, Hochschule der Medien - Stuttgart
Kammerer, Yvonne, kammerer@hdm-stuttgart.de, Hochschule der Medien - Stuttgart
Gottschling, Steffen, gottschling@hdm-stuttgart.de, Hochschule der Medien - Stuttgart
Kern, Dagmar, Dagmar.Kern@gesis.org, GESIS - Leibniz Institute for the Social Sciences in Cologne
Keywords: Machine Learning, Eye-Tracking, Reading Comprehension
Abstract:
Questionnaires are an efficient method for collecting self-report data such as individuals’ attitudes. However, the quality of the collected data depends on the clarity and comprehensibility of the questionnaire items. We present initial research on the automated detection of comprehension problems in questionnaire items. Our working hypothesis is that eye-tracking data recorded while reading questionnaire items that have been systematically manipulated to induce comprehension difficulties contains patterns that machine learning models can detect.
So far, we have conducted two user studies with 50 participants each, tracking their eye movements as they read a set of 16 comprehensible and 16 less comprehensible items. The less comprehensible version of each item was created through a systematic manipulation: via logical negation in the first study, and by replacing one frequent word per item with a less frequent synonym in the second study.
Applying single-feature machine learning models (Decision Tree, Logistic Regression, Naive Bayes, and Support Vector Machine) to the first dataset (logical negation) yielded promising performance in differentiating between more and less comprehensible items. The best-performing model, a Decision Tree, achieved an F1-macro score of 0.881 for the feature “Total fixation duration”.
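As a minimal illustration of the single-feature approach (a sketch, not the exact pipeline used in the study), a Decision Tree can be trained on one aggregated eye-tracking feature and evaluated with the F1-macro score; the file name, column names, and split parameters below are hypothetical placeholders:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import f1_score

# Hypothetical per-item feature table: one row per participant-item reading,
# with an aggregated eye-tracking measure and a binary comprehensibility label.
df = pd.read_csv("eye_tracking_features.csv")
X = df[["total_fixation_duration"]]      # single feature, as in the best-performing model
y = df["is_less_comprehensible"]         # 1 = manipulated (negated) item

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

clf = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X_train, y_train)
print("F1-macro:", f1_score(y_test, clf.predict(X_test), average="macro"))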
We are currently preparing the second dataset for the same approach. We will train, tune, and evaluate more sophisticated machine learning models (Ridge Classifier, Logistic Regression, SVM, SGD Classifier, Decision Tree, Random Forest, MLP) to improve on the results of the single-feature models. We aim to identify the feature set and hyperparameters that detect comprehension problems in questionnaire items most efficiently and reliably. In the workshop, we will present the user studies, report the results of the machine learning approaches, and discuss the implications of our approach for improving the quality of questionnaire items.
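A minimal sketch of such a model comparison, assuming a scikit-learn setup with cross-validated hyperparameter search scored by F1-macro; the feature columns and candidate grids are illustrative placeholders, not the final configuration:

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression, RidgeClassifier, SGDClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical feature table as in the previous sketch, now with several
# eye-tracking features per item reading (column names are placeholders).
df = pd.read_csv("eye_tracking_features.csv")
features = ["total_fixation_duration", "fixation_count", "regression_count"]
X, y = df[features], df["is_less_comprehensible"]

candidates = {
    "ridge":  (RidgeClassifier(), {"alpha": [0.1, 1.0, 10.0]}),
    "logreg": (LogisticRegression(max_iter=1000), {"C": [0.1, 1.0, 10.0]}),
    "svm":    (SVC(), {"C": [0.1, 1.0, 10.0], "kernel": ["linear", "rbf"]}),
    "sgd":    (SGDClassifier(), {"alpha": [1e-4, 1e-3, 1e-2]}),
    "tree":   (DecisionTreeClassifier(), {"max_depth": [3, 5, None]}),
    "forest": (RandomForestClassifier(), {"n_estimators": [100, 300]}),
    "mlp":    (MLPClassifier(max_iter=2000), {"hidden_layer_sizes": [(32,), (64, 32)]}),
}

# Cross-validated grid search per model family, scored with F1-macro.
for name, (model, grid) in candidates.items():
    search = GridSearchCV(model, grid, scoring="f1_macro", cv=5)
    search.fit(X, y)
    print(f"{name}: best F1-macro = {search.best_score_:.3f}, params = {search.best_params_}")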
Besides developing an automated system for detecting less comprehensible items, we collaborate with colleagues from the field of psychology. Their work (also submitted separately to this workshop) focuses on an in-depth statistical analysis of the data, examining the relationship between reading comprehension ability and the comprehension of negated items.