The current study examined the factor structure of the Pittsburgh Sleep Quality Index (PSQI), one of the most widely used self-report measures of sleep quality. Despite the PSQI's popularity, prior research has raised concerns about its psychometric properties, particularly whether the global PSQI score truly reflects a unidimensional construct of sleep quality. To address these concerns, we analyzed item-level responses from a large adult sample rather than relying on aggregated domain scores.
Specifically, we applied multidimensional full-information factor analysis (FIFA) under an item response theory (IRT) framework to evaluate one- through four-factor solutions. This approach allowed us to model the categorical response format of the PSQI items directly, providing a more precise evaluation of the instrument's latent structure than traditional exploratory factor analysis. We further tested measurement invariance across sex to determine whether the factor structure operated equivalently for men and women.
Results failed to support unidimensionality of the PSQI, raising questions about the interpretation of the widely used global score as a valid measure of overall sleep quality. Similarly, our findings did not support the original aggregation of PSQI items into domain scores, suggesting that the scale's original scoring system may not align with the underlying structure of the data. The most consistent evidence supported a two-factor model, which demonstrated configural but not metric or scalar invariance across sex. Thus, although the same factor structure was present for men and women, item loadings and thresholds differed, limiting the comparability of mean scores.
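As an illustration of what these invariance levels constrain, the sketch below writes out a multidimensional graded response model, which is one common IRT model for ordered categorical items. The specific IRT model fitted is not named here, so the graded response form and all symbols (a_j, d_jk, theta_i, and the group superscripts m and f) are assumptions introduced only for illustration.

```latex
% Assumed model: multidimensional graded response model (not confirmed by the text).
% Item j has ordered categories 0, ..., K_j - 1; person i belongs to group g (m = men, f = women).
% a_j^{(g)}: vector of slopes (loadings) of item j; d_{jk}^{(g)}: intercept (threshold) for category k.
\begin{align}
  P\bigl(X_{ij} \ge k \mid \boldsymbol{\theta}_i, g\bigr)
    &= \frac{1}{1 + \exp\!\bigl[-\bigl(\mathbf{a}_j^{(g)\top}\boldsymbol{\theta}_i + d_{jk}^{(g)}\bigr)\bigr]},
    \qquad k = 1, \dots, K_j - 1, \\
  P\bigl(X_{ij} = k \mid \boldsymbol{\theta}_i, g\bigr)
    &= P\bigl(X_{ij} \ge k \mid \boldsymbol{\theta}_i, g\bigr)
     - P\bigl(X_{ij} \ge k + 1 \mid \boldsymbol{\theta}_i, g\bigr),
\end{align}
% with the conventions P(X_{ij} \ge 0) = 1 and P(X_{ij} \ge K_j) = 0.
%
% Invariance levels expressed as constraints across groups:
%   configural: the same items have nonzero slopes on the same factors in both groups
%   metric:     \mathbf{a}_j^{(m)} = \mathbf{a}_j^{(f)} for all items j
%   scalar:     metric, plus d_{jk}^{(m)} = d_{jk}^{(f)} for all j and k
```

Under this sketch, the reported pattern (configural invariance holding, metric and scalar invariance failing) would correspond to both groups sharing the same two-factor pattern of nonzero slopes while the slope and intercept values themselves differ between men and women.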
Accordingly, observed sex differences in global or domain scores should be interpreted with caution, as they may reflect psychometric limitations rather than true differences in sleep quality. Overall, these findings raise concerns about the validity of both global and domain-based PSQI scores. Researchers and clinicians are encouraged to consider alternative scoring approaches that better reflect empirically derived factor structures.