1674 - DETECTING MILD COGNITIVE IMPAIRMENT IN OLDER ADULTS USING A MULTIMODAL DATA FUSION MODEL BASED ON CLINICAL GAIT VIDEOS

Session: P_D02S002 - Poster Session 2 - Division 2
AUTHORS:
Han Nuo (Department of Psychology, Faculty of Arts and Sciences, Beijing Normal University at Zhuhai ~ Zhuhai ~ China), Li Tianyi (Beijing Key Laboratory of Applied Experimental Psychology, National Demonstration Center for Experimental Psychology Education (Beijing Normal University), Faculty of Psychology, Beijing Normal University ~ Beijing ~ China), Zhu Tingshao (State Key Laboratory of Cognitive Science and Mental Health, Institute of Psychology, Chinese Academy of Sciences ~ Beijing ~ China)
Abstract text:
Introduction: Mild cognitive impairment (MCI) is an intermediate stage between normal cognitive aging and dementia; early detection enables timely interventions. However, traditional questionnaire-based MCI screening in older adults is limited by reliance on patient cooperation, social stigma, and high false-negative rates. Therefore, more convenient and objective MCI screening methods are needed. Gait is a natural, stable, and observable movement, and gait abnormalities are widely recognized as MCI biomarkers, making gait a valuable data source for detection. Existing video-based gait analysis often relies on gait data alone, limiting its ability to capture critical MCI-related features from other data modalities.
Purpose: This study aims to develop a multimodal data fusion model that integrates clinical gait videos, clinician-generated text, and quantitative gait features to detect MCI in older adults, and to evaluate its accuracy, specificity, and sensitivity.
Method: Data from 167 patients, including 74 with MCI, were used to integrate gait videos, clinician-generated text, and quantitative gait features into a multimodal framework for predicting MCI. Gait video features were extracted using a video transformer model; clinical text features were extracted using GPT-5 and validated by five clinicians for MCI relevance; and quantitative gait features, including skeleton coordinates, interframe differences, joint distances, angles, and wavelet coefficients, were extracted using MMPose. After prompt tuning of a vision-language model, the multimodal features were integrated into a unified semantic space for contrastive learning to predict MCI. Finally, the proposed model was compared with six other algorithms for MCI detection on multimodal data. Seven-fold cross-validation was used for model evaluation.
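As an illustration of the quantitative gait features named above, the following is a minimal sketch, not the authors' pipeline. It assumes 2D keypoints are already available as a list of frames, each a list of (x, y) tuples (for example, exported from a pose estimator). The function names, the toy three-joint skeleton, and the hand-written single-level Haar transform (one common sign convention, standing in for whatever wavelet the study used) are all assumptions made for this example.

```python
import math

def interframe_differences(frames):
    """Per-joint (dx, dy) displacement between consecutive frames."""
    return [
        [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(prev, curr)]
        for prev, curr in zip(frames, frames[1:])
    ]

def joint_distance(frame, a, b):
    """Euclidean distance between joints a and b in one frame."""
    return math.dist(frame[a], frame[b])

def joint_angle(frame, a, b, c):
    """Angle in degrees at joint b, formed by segments b->a and b->c."""
    ax, ay = frame[a][0] - frame[b][0], frame[a][1] - frame[b][1]
    cx, cy = frame[c][0] - frame[b][0], frame[c][1] - frame[b][1]
    norm = math.hypot(ax, ay) * math.hypot(cx, cy)
    if norm == 0:
        return 0.0  # degenerate case: two joints coincide
    cosine = max(-1.0, min(1.0, (ax * cx + ay * cy) / norm))
    return math.degrees(math.acos(cosine))

def haar_dwt(signal):
    """Single-level Haar wavelet transform: (approximation, detail)."""
    values = list(signal)
    if len(values) % 2:
        values.append(values[-1])  # pad odd-length input with last sample
    root2 = math.sqrt(2)
    approx = [(values[i] + values[i + 1]) / root2 for i in range(0, len(values), 2)]
    detail = [(values[i] - values[i + 1]) / root2 for i in range(0, len(values), 2)]
    return approx, detail

# Hypothetical toy skeleton: joint 0 = hip, 1 = knee, 2 = ankle.
frames = [
    [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)],
    [(0.5, 0.0), (0.5, 1.0), (1.5, 1.0)],
]
diffs = interframe_differences(frames)
thigh_length = joint_distance(frames[0], 0, 1)
knee_angle = joint_angle(frames[0], 0, 1, 2)
approx, detail = haar_dwt([1.0, 1.0, 3.0, 3.0])
```

In practice each feature would be computed per frame (or per joint trajectory) and then summarized or concatenated before being passed to a downstream model; how the study did this is not stated in the abstract.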
Results: The proposed model achieved an accuracy of 70.45% and an AUC of 0.82, significantly outperforming the other algorithms. Notably, the model attained a recall (sensitivity) of 83.73% for MCI cases, underscoring its potential for early detection.
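For readers less familiar with the reported metrics, the sketch below shows how accuracy, sensitivity (recall for the MCI class), and specificity follow from binary predictions. The function name and the toy labels are hypothetical and are not the study's data; 1 is assumed to denote MCI and 0 non-MCI.

```python
def screening_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity for binary labels (1 = MCI)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # MCI correctly flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-MCI correctly cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-MCI wrongly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # MCI missed
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }

# Hypothetical toy labels, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = screening_metrics(y_true, y_pred)
```

A screening tool typically prioritizes sensitivity, since a missed MCI case (false negative) is usually costlier than a false alarm that leads to a follow-up assessment.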
Conclusion: The findings indicate that multimodal gait video data can be effectively used to evaluate MCI, providing a novel method for convenient and non-invasive MCI assessment in older adults.