The use of computers as a delivery platform not only enables the development of innovative item types but also facilitates the collection of a broader range of records in log files throughout human-machine interactions. These granular records, often referred to as process data, are typically stored as an ordered sequence of multi-type, time-stamped events. This rich source of data supports the exploration and identification of informative features from problem-solving processes beyond response data. Process data have been used to investigate when and how respondents engage in solving interactive tasks (Goldhammer et al., 2014; He et al., 2021, 2023), to identify test-taking strategies across population subgroups defined by characteristics such as age, gender, and educational attainment (Eichmann et al., 2020; Liao et al., 2019), and to shape future test and item design (Zhang et al., 2023).
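As a rough illustration of the "ordered sequence of multi-type, time-stamped events" described above, the following Python sketch represents one respondent's log for a single item and derives a few simple summary features. The record layout, the field names (`respondent_id`, `timestamp`, `event_type`, `detail`), and the `summarize` helper are hypothetical and chosen only for illustration; actual PIAAC and NAEP log files use their own formats.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEvent:
    """One logged event (hypothetical layout, not an official log format)."""
    respondent_id: str
    timestamp: float  # assumed: seconds elapsed since the item was presented
    event_type: str   # illustrative categories, e.g. "click" or "keypress"
    detail: str       # illustrative label for the action taken


def summarize(events):
    """Order events by time and derive a few simple process features."""
    ordered = sorted(events, key=lambda e: e.timestamp)
    if not ordered:
        return {"n_actions": 0, "time_to_first_action": None,
                "time_on_task": 0.0, "actions": []}
    return {
        "n_actions": len(ordered),
        "time_to_first_action": ordered[0].timestamp,
        # Time between the first and last logged event
        "time_on_task": ordered[-1].timestamp - ordered[0].timestamp,
        "actions": [e.detail for e in ordered],
    }


# Example log, deliberately out of order to show the sorting step
log = [
    LogEvent("R001", 12.0, "click", "SUBMIT"),
    LogEvent("R001", 2.0, "click", "OPEN_MAIL"),
    LogEvent("R001", 5.5, "keypress", "TYPE_REPLY"),
]
features = summarize(log)
```

Here `features["actions"]` holds the action sequence in temporal order, which is the form assumed by sequence-based methods, while the counts and timings are examples of feature-based summaries.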

This training session introduces the fundamental structure of process data and the methods used to analyze them, aiming to support the incorporation of response process information in large-scale assessments. The training will combine theoretical instruction on sequence-based and feature-based approaches to analyzing process data with programming code demonstrations and case studies using process data from the Program for the International Assessment of Adult Competencies (PIAAC) and the National Assessment of Educational Progress (NAEP).

This training session will be conducted in two sections covering four subtopics: (1) creating and selecting informative features from process data in large-scale assessments, (2) extracting and selecting gram-based features from clickstream sequences, (3) computing sequence distance to identify pairwise sequence similarity and conduct sequence clustering, and (4) identifying and interpreting meaningful patterns from process data. Section 1 will provide an overview of process data structure, preprocessing, and methods for extracting information from process data through sequence mining techniques. Section 2 will introduce feature-based process data analysis, covering how to create and select process variables and how to integrate content experts’ insights into these variables. Each section will be illustrated with case studies.
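To make subtopics (2) and (3) more concrete, the following Python sketch counts n-gram features in an action sequence and computes a pairwise distance between two sequences. It is a minimal illustration under stated assumptions, not the session's own code: the action labels are invented, and Levenshtein (edit) distance is used here as one common choice of sequence distance, which may differ from the measures covered in the training.

```python
from collections import Counter


def ngrams(seq, n):
    """Count contiguous n-grams (as tuples) in an action sequence."""
    return Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))


def edit_distance(a, b):
    """Levenshtein distance: the minimum number of insertions, deletions,
    and substitutions needed to turn sequence a into sequence b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# Invented action sequences for two respondents
seq_a = ["START", "OPEN_MAIL", "SORT", "OPEN_MAIL", "SORT", "SUBMIT"]
seq_b = ["START", "OPEN_MAIL", "SUBMIT"]

bigrams_a = ngrams(seq_a, 2)
distance_ab = edit_distance(seq_a, seq_b)
```

The n-gram counts can serve as candidate features for later selection, and a matrix of pairwise distances computed this way could be passed to a clustering procedure to group respondents with similar action sequences.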

This training session will enable participants to understand process data structure, gain basic knowledge of sequence- and feature-based methods for analyzing process data, and choose appropriate methods for their own research. All program code will be included in the training package for participants’ self-learning and further programming needs.