My research spans a number of areas, but one theme runs through all of it: using big data collected from educational systems to improve student learning. The work draws on data mining, artificial intelligence, machine learning, intelligent tutoring systems, cognitive modeling, game development, and instructional design.

Active research includes:

  • The Hint Factory
  • Cognitive Model Discovery
  • DataShop
  • Educational eScience
  • Games for Learning

The Hint Factory

The Hint Factory is a novel application that automatically generates contextualized hints and feedback from past student data. We have applied the Hint Factory to an existing, non-adaptive software program used to teach deductive logic in an undergraduate philosophy course. This work has been well received in the ITS community: papers at the ITS 2008, ITS 2010, and AIED 2011 conferences were nominated for Best Paper, and the ITS 2010 paper received a Best Student Paper award. I am currently extending this work to other domains, including an intelligent tutor for computer programming.
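
The paragraph above does not describe how hints are derived. Published descriptions of the Hint Factory model past students' solution attempts as a Markov decision process (MDP); the Python sketch below is a simplified illustration of that idea, not the actual implementation. It assumes each attempt is recorded as a sequence of hashable states (for a logic proof, say, a label for the statements derived so far). The function names, the goal reward of 100, the step cost of -1, the discount factor of 0.9, and the example states are all hypothetical choices.

```python
from collections import defaultdict


def build_transition_graph(attempts):
    """Collect every observed state-to-state transition from past attempts."""
    graph = defaultdict(set)
    for attempt in attempts:
        for src, dst in zip(attempt, attempt[1:]):
            graph[src].add(dst)
    return graph


def value_iteration(graph, goal_states, goal_reward=100.0, step_cost=-1.0,
                    gamma=0.9, iterations=100):
    """Score each state by how efficiently past students reached a goal from it.

    Rewards, cost, and discount are illustrative assumptions, not tuned values.
    """
    states = set(graph)
    for successors in graph.values():
        states.update(successors)
    values = {s: (goal_reward if s in goal_states else 0.0) for s in states}
    for _ in range(iterations):
        updated = dict(values)
        for s in states:
            successors = graph.get(s)
            if s in goal_states or not successors:
                continue  # goal and dead-end states keep their initial value
            updated[s] = step_cost + gamma * max(values[n] for n in successors)
        values = updated
    return values


def next_step_hint(state, graph, values):
    """Suggest the observed successor with the highest value, or None."""
    successors = graph.get(state)
    if not successors:
        return None
    return max(successors, key=lambda n: values[n])


# Hypothetical past attempts; "G" is the solved state.
attempts = [["A", "B", "G"], ["A", "C", "D", "G"], ["A", "C"]]
graph = build_transition_graph(attempts)
values = value_iteration(graph, goal_states={"G"})
print(next_step_hint("A", graph, values))  # → B
```

In this toy data, two of the three attempts moved from A to C, but B is suggested because past students reached the goal from B in fewer steps. Scoring states by value rather than by frequency is what lets a hint point toward an efficient next step instead of merely a popular one.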

Cognitive Model Discovery

Cognitive Model Discovery is another area of research, one in which data mining and machine learning are used to improve cognitive models of student knowledge. These models drive many of the instructional decisions that intelligent tutors make, including how instructional messages are organized, how topics are sequenced, and which problems are selected in a curriculum. Traditionally, these models are created through cognitive task analysis (CTA), but CTA is expensive, and the resulting models often do not fit the data well. We believe these models should be discovered, not created! Data mining techniques can suggest refinements to these models that make student learning more efficient, leading to significant savings in the time students need to learn skills. The same refinements can also point to improvements in the instructional design of the courseware. Our 2012 conference paper received the Best Paper award at the 5th International Conference on Educational Data Mining.
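
The paragraph does not name specific data mining techniques. One common way to check a cognitive model against data is a learning curve: for each knowledge component (KC) in the model, the error rate at a student's 1st, 2nd, 3rd, and later practice opportunities. For a well-defined KC the error rate usually falls with practice; a flat or rising curve is one signal that the KC may lump together skills that should be split. The Python sketch below assumes records of the form (student, kc, correct) in chronological order; the record layout, the function names, the invented example data, and the simple first-versus-last flatness test are hypothetical choices for illustration, not the method from the 2012 paper.

```python
from collections import defaultdict


def learning_curves(records):
    """Return {kc: [error rate at opportunity 1, 2, ...]}.

    records: (student, kc, correct) tuples in chronological order
    (a hypothetical layout assumed for this sketch).
    """
    opportunity = defaultdict(int)                  # (student, kc) -> attempts so far
    totals = defaultdict(lambda: defaultdict(int))  # kc -> opportunity -> attempts
    errors = defaultdict(lambda: defaultdict(int))  # kc -> opportunity -> errors
    for student, kc, correct in records:
        opportunity[(student, kc)] += 1
        n = opportunity[(student, kc)]
        totals[kc][n] += 1
        if not correct:
            errors[kc][n] += 1
    return {
        kc: [errors[kc][n] / counts[n] for n in sorted(counts)]
        for kc, counts in totals.items()
    }


def flag_flat_curves(curves):
    """Hypothetical heuristic: flag KCs whose error rate does not drop
    between the first and the last opportunity."""
    return [kc for kc, curve in curves.items()
            if len(curve) > 1 and curve[-1] >= curve[0]]


# Invented example data for two hypothetical KCs.
records = [
    ("s1", "add", False), ("s1", "add", True), ("s1", "add", True),
    ("s2", "add", False), ("s2", "add", False), ("s2", "add", True),
    ("s1", "carry", True), ("s1", "carry", False),
    ("s2", "carry", False), ("s2", "carry", True),
]
curves = learning_curves(records)
print(curves)                    # {'add': [1.0, 0.5, 0.0], 'carry': [0.5, 0.5]}
print(flag_flat_curves(curves))  # ['carry']
```

A first-versus-last comparison is deliberately crude; with real data, few students reach the later opportunities, so the tail of a curve can be noisy.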

DataShop

DataShop is an open data repository with an associated set of visualization and analysis tools. The repository holds data from thousands of students, derived from their interactions with online course materials and intelligent tutoring systems. The data are fine-grained, with student actions recorded roughly every 20 seconds, and longitudinal, spanning semester-long or year-long courses. As of January 2013, almost 400 datasets are stored, comprising over 90 million student actions, or more than 218,000 student hours of data. Most student actions are coded, meaning they are not only graded as correct or incorrect but also categorized by the hypothesized competencies, or knowledge components, needed to perform that action. DataShop allows researchers to import data in order to use the provided analysis tools, and to export data from the repository for additional analysis. Researchers have analyzed these data to better understand students' cognitive and affective states, and the results have been used to redesign instruction and demonstrably improve student learning.
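
To make the idea of coded actions concrete, the Python sketch below tallies outcomes per knowledge component from tab-delimited transaction rows of the kind a researcher might export for additional analysis. It assumes a header that includes columns named "Outcome" and "KC"; those column names, the "(unlabeled)" placeholder for uncoded actions, the outcome labels, the sample rows, and the outcomes_by_kc function are all assumptions for illustration, so check the header of any actual export before adapting it.

```python
import csv
import io
from collections import Counter, defaultdict


def outcomes_by_kc(lines, kc_column="KC", outcome_column="Outcome"):
    """Tally outcomes per knowledge component from tab-delimited rows.

    The column names are assumptions; match them to the real export header.
    Actions with no KC label are grouped under "(unlabeled)".
    """
    tallies = defaultdict(Counter)
    for row in csv.DictReader(lines, delimiter="\t"):
        kc = (row.get(kc_column) or "").strip() or "(unlabeled)"
        outcome = (row.get(outcome_column) or "").strip().upper()
        tallies[kc][outcome] += 1
    return tallies


# Invented sample rows; the last action is not coded with a KC.
sample = io.StringIO(
    "Anon Student Id\tOutcome\tKC\n"
    "stu_1\tCORRECT\tcircle-area\n"
    "stu_1\tINCORRECT\tcircle-area\n"
    "stu_2\tCORRECT\tcircle-area\n"
    "stu_2\tHINT\t\n"
)
for kc, counts in outcomes_by_kc(sample).items():
    print(kc, dict(counts))
# circle-area {'CORRECT': 2, 'INCORRECT': 1}
# (unlabeled) {'HINT': 1}
```

Grouping by KC rather than by problem is what makes coded data useful: the same skill can be tracked across every problem that exercises it.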

Educational eScience

Educational eScience is an emerging area of research enabled by the explosive growth in educational data. Traditionally, eScience has focused on fields with huge datasets, such as medicine, physics, climate science, and astronomy. More recently, educational datasets have reached sizes at which eScience becomes not only practical but necessary. Data spanning an entire school year (or multiple school years), with tens of thousands of data points per student, are becoming a reality. I developed and co-chaired the 2010 KDD Cup competition to showcase these kinds of large educational datasets and to encourage researchers to develop data mining methods for this area.

Games for Learning

Games for learning have become a hot topic in education. A great deal of work has gone into creating games that teach in many domains, but little has been done to standardize data collection from games or to develop standard measures of their effectiveness. I am currently involved in several projects that address collecting data from educational games and methods for analyzing these data. One ongoing project, originally funded through a Next Generation Learning Challenge grant, uses games to teach math fluency.