Introduction: Mild cognitive impairment (MCI) is an intermediate stage between normal cognitive aging and dementia; early detection enables timely interventions. However, traditional questionnaire-based MCI screening in older adults is limited by reliance on patient cooperation, social stigma, and high false-negative rates. Therefore, more convenient and objective MCI screening methods are needed. Gait is a natural, stable, and observable movement, and gait abnormalities are widely recognized as MCI biomarkers, making gait a valuable data source for detection. Existing video-based gait analysis often relies on the gait video alone, which limits its ability to capture critical MCI-related features from other data modalities.
Purpose: This study aims to develop a multimodal data fusion model that integrates clinical gait videos, clinician-generated text, and quantitative gait features to detect MCI in older adults, and to evaluate its accuracy, specificity, and sensitivity.
Method: Data from 167 patients, including 74 with MCI, were used to integrate gait videos, clinician-generated text, and quantitative gait features into a multimodal framework for predicting MCI. Gait video features were extracted using a video transformer model; clinical text features were extracted using GPT-5 and validated by five clinicians for MCI relevance; and quantitative gait features, including skeleton coordinates, interframe differences, joint distances, angles, and wavelet coefficients, were extracted using MMPose. After prompt tuning of a vision-language model, the multimodal features were integrated into a unified semantic space for contrastive learning to predict MCI. Finally, the proposed model was compared with six other algorithms for MCI detection on multimodal data. Seven-fold cross-validation was used for model evaluation.
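To make the quantitative gait features concrete, the following is a minimal sketch of how interframe differences, a joint distance, a joint angle, and wavelet coefficients might be derived from per-frame skeleton coordinates. It is an illustration under stated assumptions, not the study's pipeline: the keypoints are assumed to be already available as a NumPy array of shape (frames, joints, 2), the joint indices `hip`, `knee`, and `ankle` are hypothetical, and a hand-written single-level Haar transform stands in for whichever wavelet the authors used.

```python
import numpy as np

def joint_angle(a, b, c):
    """Angle in degrees at joint b formed by the points a-b-c, per frame."""
    v1, v2 = a - b, c - b
    cos = np.sum(v1 * v2, axis=-1) / (
        np.linalg.norm(v1, axis=-1) * np.linalg.norm(v2, axis=-1) + 1e-8
    )
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def haar_level1(signal):
    """Single-level Haar wavelet transform: returns (approximation, detail)."""
    s = np.asarray(signal, dtype=float)
    s = s[: len(s) // 2 * 2]  # drop a trailing sample if the length is odd
    even, odd = s[0::2], s[1::2]
    return (even + odd) / np.sqrt(2), (even - odd) / np.sqrt(2)

def gait_features(keypoints, hip=0, knee=1, ankle=2):
    """keypoints: array of shape (frames, joints, 2) with 2D coordinates.

    The joint indices are hypothetical placeholders for this sketch.
    """
    keypoints = np.asarray(keypoints, dtype=float)
    interframe = np.diff(keypoints, axis=0)  # frame-to-frame displacement
    hip_ankle = np.linalg.norm(keypoints[:, hip] - keypoints[:, ankle], axis=-1)
    knee_angle = joint_angle(
        keypoints[:, hip], keypoints[:, knee], keypoints[:, ankle]
    )
    approx, detail = haar_level1(knee_angle)
    return {
        "interframe_diff": interframe,
        "hip_ankle_distance": hip_ankle,
        "knee_angle_deg": knee_angle,
        "knee_angle_haar_approx": approx,
        "knee_angle_haar_detail": detail,
    }

# Toy input: 4 identical frames with a right angle at the knee.
frame = np.array([[0.0, 1.0], [0.0, 0.0], [1.0, 0.0]])
features = gait_features(np.stack([frame] * 4))
```

In practice the features from each joint and time window would be concatenated or pooled before fusion with the video and text embeddings; how the study did this is not stated here.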
Results: The proposed model achieved an accuracy of 70.45% and an AUC of 0.82, significantly outperforming the six comparison algorithms. Notably, the model attained a recall of 83.73% for MCI cases, underscoring its potential for early detection.
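For readers less familiar with the reported metrics, the sketch below shows how accuracy, sensitivity (the recall for MCI cases), and specificity follow from binary predictions. The function name `screening_metrics` and the toy labels are illustrative assumptions and are not data from the study.

```python
def screening_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity for binary labels (1 = MCI)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs) if pairs else float("nan"),
        # Sensitivity: share of true MCI cases the model flags.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Specificity: share of non-MCI cases the model clears.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }

# Toy labels only, not study data.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
metrics = screening_metrics(y_true, y_pred)
```

A screening tool typically favors high sensitivity, since a missed MCI case is costlier than a false alarm that a follow-up assessment can rule out.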
Conclusion: The findings indicate that multimodal gait video data can be effectively used to evaluate MCI, providing a novel method for convenient and non-invasive MCI assessment in older adults.