Introduction. Computer vision (CV) has advanced facial expression classification, but continuous, subjective emotion modeling remains under-discussed. The Pleasure-Arousal-Dominance (PAD) model offers a framework for continuous emotion quantification, but few empirical studies, particularly on non-contact approaches, have examined whether PAD-based models can capture individuals' subjective emotional states.
Purpose. To examine the PAD model's real-world applicability, we developed a Vision Transformer (ViT)-based PAD regression model that predicts continuous emotional states from facial images.
Methods. The regression model was trained on 13,000 expert-coded AffectNet images (MSE = 0.487), with high model-expert correlations of 0.919, 0.793, and 0.881 for the Pleasure, Arousal, and Dominance dimensions, respectively. To evaluate the model's ability to predict subjective emotional states, an emotion induction experiment (n = 50) was conducted, collecting PAD data from three sources: Self-Assessment Manikin (SAM) ratings, expert-coded PAD values for facial images, and model-predicted PAD values.
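As a minimal sketch of how an MSE such as the one reported above could be computed for three-dimensional PAD predictions, the following assumes the error is averaged over all samples and all three dimensions; the abstract does not state the aggregation used. The function name `pad_mse` and the toy values are hypothetical and are not data from the study.

```python
from statistics import mean


def pad_mse(predicted, target):
    """Mean squared error over all samples and all three PAD dimensions.

    Each argument is a list of (pleasure, arousal, dominance) tuples.
    Averaging jointly over samples and dimensions is an assumption.
    """
    if len(predicted) != len(target):
        raise ValueError("predicted and target must have the same length")
    squared_errors = [
        (p - t) ** 2
        for pred_row, target_row in zip(predicted, target)
        for p, t in zip(pred_row, target_row)
    ]
    return mean(squared_errors)


# Hypothetical toy ratings on an arbitrary scale (not study data).
expert_coded = [(7.0, 5.0, 6.0), (2.0, 6.0, 3.0), (5.0, 2.0, 5.0)]
model_output = [(6.0, 5.0, 6.0), (2.0, 7.0, 3.0), (5.0, 2.0, 4.0)]

toy_mse = pad_mse(model_output, expert_coded)
print(f"toy MSE: {toy_mse:.3f}")
```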
Results. Self-reports and expert coding correlated significantly for Pleasure and Arousal, but only marginally for Dominance (rP = 0.672, rA = 0.679, ps < .001; rD = 0.144, p = .096). Model predictions correlated significantly with expert coding across all PAD dimensions (rP = 0.669, rA = 0.576, rD = 0.593, ps < .001) and with self-reports for Pleasure and Arousal (rP = 0.462, rA = 0.421, ps < .001), but not Dominance, indicating that the model, like human experts, relies on Pleasure and Arousal cues to infer individuals' emotional states.
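The pairwise comparisons among the three PAD sources can be illustrated with a short sketch that computes Pearson's r for each pair of sources on a single dimension. The helper names `pearson_r` and `pairwise_correlations` and the toy values are hypothetical; the study's actual analysis pipeline and data are not reproduced here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def pairwise_correlations(sources):
    """Return {(name_a, name_b): r} for every pair of rating sources.

    `sources` maps a source name to its ratings on ONE PAD dimension,
    with ratings aligned by trial.
    """
    names = list(sources)
    return {
        (a, b): pearson_r(sources[a], sources[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Hypothetical Pleasure ratings for five trials (not study data).
pleasure = {
    "self_report": [1.0, 2.0, 3.0, 4.0, 5.0],
    "expert": [2.0, 4.0, 6.0, 8.0, 10.0],
    "model": [5.0, 4.0, 3.0, 2.0, 1.0],
}

correlations = pairwise_correlations(pleasure)
for pair, r in correlations.items():
    print(pair, round(r, 3))
```

Running the same function once per dimension gives the three sets of pairwise coefficients; significance testing would additionally require the number of paired observations, which the abstract does not report.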
Conclusions. The ViT-based PAD regression model performs comparably to trained human experts, effectively capturing self-reported emotional experiences, especially on the Pleasure and Arousal dimensions. The consistent failure of both expert and model observations to infer subjective experience of Dominance suggests a fundamental limitation of facial cues alone in representing this internal dimension of emotional control. This study provides critical empirical evidence for the use of facial emotion recognition models in subjective psychological assessments.