Introduction. Personality is a central construct in psychology, yet traditional self-report scales are prone to biases such as social desirability effects. Deep learning, with its powerful capacity for feature extraction and pattern recognition, offers a new paradigm for automated personality assessment.
Purpose. This study proposes a video-based deep learning framework for Big Five personality recognition, focusing on the psychological significance of dynamic facial emotions as represented by the facial Pleasure-Arousal-Dominance (PAD) model.
Method. A dual-stream feature extraction model, termed Cross-Modal Attention Vision Transformer (CMA-ViT), was developed to systematically model the associations between personality traits and multiple behavioral cues, including facial PAD, facial action units (AUs), and other indicators. The model employs two Transformer-based streams, one for video and one for behavioral features, and integrates them via a cross-modal attention mechanism for holistic personality inference. In experiments on the public MDPE dataset, the model achieved an average three-class accuracy of 0.77, indicating that it captures multi-dimensional features predictive of personality.
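As an illustration of the cross-modal attention idea described above, the following minimal sketch lets each video token attend over the behavioral tokens using single-head scaled dot-product attention. It is not the CMA-ViT implementation: the function names, the toy token values, the choice of video tokens as queries, and the omission of learned query/key/value projections are all assumptions made for brevity.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Single-head scaled dot-product attention.

    Each query token (assumed here: video stream) attends over all
    key/value tokens (assumed here: behavioral stream). Learned
    projection matrices are omitted, which is a simplification.
    """
    dim = len(keys[0])
    fused, weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(dim).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
                  for k in keys]
        w = softmax(scores)
        # Attention-weighted sum of the value vectors.
        out = [sum(wj * v[c] for wj, v in zip(w, values))
               for c in range(len(values[0]))]
        fused.append(out)
        weights.append(w)
    return fused, weights

# Hypothetical toy embeddings: 2 video tokens, 3 behavioral tokens, dim 4.
video_tokens = [[1.0, 0.0, 0.5, 0.2],
                [0.1, 0.9, 0.3, 0.4]]
behavior_tokens = [[0.8, 0.1, 0.4, 0.0],
                   [0.0, 1.0, 0.2, 0.5],
                   [0.3, 0.3, 0.3, 0.3]]

fused, weights = cross_attention(video_tokens, behavior_tokens, behavior_tokens)
```

Each row of `weights` sums to 1 and shows how strongly a video token draws on each behavioral token; a full model would typically add learned projections, multiple heads, and residual connections around this step.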
Results. Feature attribution analysis using gradient-weighted importance scores revealed that PAD features had the highest importance. Individuals with high Agreeableness exhibited sustained higher Pleasure (t(98) = 5.67, p < .001) and lower Arousal (t(98) = -3.87, p < .001), consistent with their emotionally stable and affiliative nature; those with high Neuroticism showed a "low-Pleasure, high-Arousal" negative affect pattern aligned with anxiety-prone tendencies; Extraverts maintained high Pleasure levels (t(98) = 3.92, p < .001), reflecting their expressive positivity; Conscientious individuals displayed elevated Dominance (t(98) = 4.05, p < .001), indicative of goal-oriented control; and individuals high in Openness showed high affective variability, reflecting complex emotional experience. PAD features exhibited high intra-individual consistency (ICC > 0.75), supporting their stability as personality-related indicators.
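The t(98) statistics above would be consistent with an independent-samples t-test whose degrees of freedom are n1 + n2 - 2, although the abstract does not state which test or group sizes were used. Under that assumption, a pooled-variance comparison of a PAD dimension between high- and low-trait groups can be sketched as follows; the function name and the sample values are hypothetical and are not data from the study.

```python
import math
from statistics import mean, variance

def pooled_t(high_group, low_group):
    """Student's independent-samples t statistic with pooled variance.

    Returns (t, df) where df = n1 + n2 - 2. Assumes roughly equal
    variances in the two groups.
    """
    n1, n2 = len(high_group), len(low_group)
    df = n1 + n2 - 2
    # Pooled estimate of the common variance.
    pooled_var = ((n1 - 1) * variance(high_group)
                  + (n2 - 1) * variance(low_group)) / df
    # Standard error of the difference between the two means.
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(high_group) - mean(low_group)) / std_err, df

# Hypothetical mean Pleasure scores per participant (illustrative only).
high_trait = [0.6, 0.7, 0.8]
low_trait = [0.2, 0.3, 0.4]

t_stat, dof = pooled_t(high_trait, low_trait)
```

With two groups of 50 participants each, the same function would yield df = 98; the p-value would then be read from a t distribution with that many degrees of freedom.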
Conclusions. This study highlights the crucial role of emotional dynamics in reflecting underlying cognitive-affective-behavioral patterns and establishes a theoretical and methodological foundation for explainable video-based personality recognition grounded in the PAD affective framework.