
3708 - CAN AI ESTIMATE NORMS? SIMULATED NORMS FOR A VERBAL REASONING TASK

Session: 3704 - ARTIFICIAL INTELLIGENCE IN PSYCHOLOGICAL ASSESSMENT: RISKS, NOVEL OPPORTUNITIES, AND EMERGING SOLUTIONS ACROSS APPLIED CONTEXTS

AUTHORS:

Ciancaleoni Matteo (Hogrefe Editore ~ Florence ~ Italy)

Abstract text:

AI is a powerful tool that is widely used in many aspects of daily life. In recent years, several studies have investigated the potential applications of AI in psychology (e.g., Dillon et al., 2023). However, there is still no evidence that AI can simulate test-taker responses to questionnaires (Harding, 2024). This study aimed to provide a preliminary evaluation of ChatGPT-based norms for a verbal reasoning task in children aged 5 to 10, examining both the overall sample and age-specific subgroups. The Naming Opposites subtest of the Intelligence and Development Scales ‒ 2 (Grob & Hagmann-von Arx, 2018) was administered to a sample of 450 children, and ChatGPT (GPT-4 Omni) was used to replicate the test score distribution. The items of the scale were provided to ChatGPT, and the item parameters (mean and standard deviation) were then disclosed to the model iteratively. The results showed that ChatGPT could not estimate the mean and standard deviation of the sample without knowing the item parameters. When the item parameters were added, the mean estimates improved, but the standard deviation estimates remained inconsistent with the real data. These patterns were consistent across the age-specific subgroups, except for the youngest and oldest children, for whom ChatGPT encountered greater difficulty due to floor and ceiling effects. Furthermore, even when item parameters were provided, the accuracy of ChatGPT's estimates did not always improve. Further research (e.g., using Item Response Theory parameters) is needed to better understand ChatGPT's ability to estimate standard deviations and to reduce the number of item parameters required to produce mean estimates that match real data. Currently, ChatGPT is unable to provide suitable norms: although it shows some awareness of typical verbal reasoning abilities in children aged 5 to 10, it does not adequately account for variability among children of the same age.
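
The comparison described above, between the mean and standard deviation of real scores and those of simulated scores, could be sketched roughly as follows. This is only an illustration and not the author's procedure: the function names `summarize` and `compare_norms` are hypothetical, and the two score lists are invented toy values, not data from the study.

```python
import statistics


def summarize(scores):
    """Return (mean, sample standard deviation) of a list of total scores."""
    return statistics.mean(scores), statistics.stdev(scores)


def compare_norms(observed, simulated):
    """Compare simulated scores against observed scores.

    mean_diff: simulated mean minus observed mean (0 means a perfect match).
    sd_ratio:  simulated SD divided by observed SD (1 means a perfect match;
               values below 1 mean the simulation understates variability).
    """
    obs_mean, obs_sd = summarize(observed)
    sim_mean, sim_sd = summarize(simulated)
    return {
        "mean_diff": sim_mean - obs_mean,
        "sd_ratio": sim_sd / obs_sd,
    }


# Invented toy scores for one age group (NOT data from the study).
observed = [10, 12, 14, 16, 18]
simulated = [13, 14, 14, 14, 15]

result = compare_norms(observed, simulated)
print(result)
```

In this toy case the two means coincide while the simulated scores are far less spread out than the observed ones, which is the kind of pattern the abstract reports: a plausible mean alongside a standard deviation that does not match the real data.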