Introduction: Psychological distress has emerged as a significant public health concern during health crises, with reported prevalence rates ranging from 25% to 45% globally. Traditional statistical methods are limited in their ability to capture complex non-linear interactions among multiple predictors. Machine learning (ML) techniques can detect such patterns and support the development of clinically applicable predictive models.
Purpose: To develop and validate ML models that identify the factors most predictive of psychological distress among adults during health crises, through a systematic comparison of eight algorithms.
Method: Data from 1,605 adult participants in Madrid, collected during a major health crisis, were analysed. Psychological distress was assessed with the GHQ-12 questionnaire. Forty-five predictor variables were grouped into seven theoretical categories: sociodemographic, economic, social, health exposure, institutional evaluations, specific health concerns, and housing conditions. Eight ML algorithms were compared, with hyperparameter optimization and stratified cross-validation used to select the best-performing model.
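Stratified cross-validation keeps the proportion of distressed and non-distressed participants roughly constant in every fold, so each held-out fold is scored against a representative class balance. The abstract does not state the number of folds, the random seed, or the software used, so the following is only a minimal pure-Python sketch under those assumptions: `k=5`, `seed=0`, and the function names `stratified_folds` and `cv_splits` are hypothetical. Libraries such as scikit-learn provide the same idea as `StratifiedKFold`.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds that each keep roughly the same
    class proportions as the full dataset (e.g. distressed vs not)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    folds = [[] for _ in range(k)]
    offset = 0  # carried across classes so fold sizes differ by at most 1
    for _, indices in sorted(by_class.items()):
        rng.shuffle(indices)
        # Deal this class's shuffled indices round-robin across the folds.
        for pos, idx in enumerate(indices):
            folds[(offset + pos) % k].append(idx)
        offset += len(indices)
    return [sorted(fold) for fold in folds]


def cv_splits(labels, k=5, seed=0):
    """Yield (train_indices, test_indices): each fold is held out once."""
    folds = stratified_folds(labels, k, seed)
    for i, test in enumerate(folds):
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        yield train, test


# Toy labels (1 = distressed); illustrative only, not study data.
labels = [1] * 54 + [0] * 46
for train, test in cv_splits(labels):
    positives = sum(labels[i] for i in test)
    print(len(train), len(test), positives)
```

Hyperparameter optimization would typically run on the training indices of each split only, so that the held-out fold remains untouched; the abstract does not specify which search strategy was used.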
Results: The prevalence of psychological distress was 54.1%. LightGBM was the best-performing algorithm, with an F1-score of 0.707 and a sensitivity of 77%. Economic factors accounted for the largest share of predictive importance (26.1% of the total), with personal economic concerns as the single most important predictor. They were followed by evaluations of institutional management (19.9%) and specific health-related concerns (17.5%). Overall, psychosocial and economic factors outweighed direct health exposure as predictors.
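The reported metrics and category percentages can be made concrete with a short sketch. Sensitivity is the proportion of truly distressed participants the model flags, and the F1-score is the harmonic mean of precision and sensitivity, so a given sensitivity does not by itself determine F1. The abstract does not say how feature importance was measured (for example LightGBM split counts, gain, or another method) or how it was aggregated, so this sketch assumes a category's share is simply the sum of its features' importances divided by the total. The function names, feature names, and numbers below are hypothetical toy values, not the study's data.

```python
from collections import defaultdict


def sensitivity_and_f1(tp, fp, fn):
    """Sensitivity (recall) and F1-score for the positive class,
    computed from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # share of truly distressed cases detected
    precision = tp / (tp + fp)     # share of flagged cases that are truly distressed
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return sensitivity, f1


def category_shares(importances, category_of):
    """Sum per-feature importances within each category and return each
    category's total as a percentage of the overall importance."""
    totals = defaultdict(float)
    for feature, value in importances.items():
        totals[category_of[feature]] += value
    grand_total = sum(totals.values())
    return {cat: 100 * total / grand_total for cat, total in totals.items()}


# Toy inputs with hypothetical feature names; not the study's data.
print(sensitivity_and_f1(tp=8, fp=2, fn=8))
importances = {"income_worry": 3.0, "job_loss": 1.0, "trust_in_government": 4.0}
category_of = {
    "income_worry": "economic",
    "job_loss": "economic",
    "trust_in_government": "institutional",
}
print(category_shares(importances, category_of))
```

Grouping importances by category in this way is what allows statements such as "economic factors accounted for the largest share", although the exact shares depend on which importance measure is summed.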
Conclusions: ML models, particularly LightGBM, showed clinically useful predictive capacity for identifying individuals at risk of psychological distress. The predominance of psychosocial and economic factors over direct health exposure suggests that preventive interventions during future health crises should prioritize economic and social support.