The Patient Health Questionnaire-9 (PHQ-9) is the most widely used depression screening and severity measurement tool in healthcare worldwide. Developed by Kroenke, Spitzer, and Williams in 2001, this 9-item self-report questionnaire maps directly onto the nine diagnostic criteria for major depressive disorder, which were defined in DSM-IV and carried over unchanged into DSM-5. That correspondence makes it valuable for both screening and clinical assessment.
The PHQ-9 transformed depression detection and management by providing the first brief, validated tool that directly maps onto diagnostic criteria while serving dual purposes as both a screening instrument and severity measure. Before the PHQ-9, depression assessment often relied on lengthy clinical interviews or instruments that measured symptoms without clear diagnostic relevance.
Depression as a Clinical Syndrome
Major depressive disorder is one of the most common mental health conditions worldwide, affecting approximately 5-7% of adults in any given year. It is characterized by persistent feelings of sadness or loss of interest, along with a constellation of cognitive, behavioral, and physical symptoms that significantly impair functioning.
The PHQ-9 measures depression as defined by the DSM-5, which requires the presence of at least five symptoms during the same two-week period, with at least one symptom being either depressed mood or anhedonia (loss of interest/pleasure). This symptom-based approach allows the PHQ-9 to function not only as a screening tool but also as an aid in diagnostic assessment and treatment monitoring.
Theoretical Foundation
The PHQ-9 is grounded in the DSM-5 diagnostic framework, which emphasizes symptom-based criteria for major depressive disorder. Unlike traditional depression scales that measure general distress or mood symptoms, the PHQ-9 systematically evaluates each of the nine symptom criteria used to diagnose depression.
The nine DSM-5 criteria assessed are:
Anhedonia (loss of interest or pleasure)
Depressed mood (feeling down or hopeless)
Sleep disturbances (insomnia or hypersomnia)
Fatigue or loss of energy
Appetite changes (increase or decrease)
Feelings of worthlessness or excessive guilt
Diminished ability to concentrate
Psychomotor agitation or retardation
Recurrent thoughts of death or suicidal ideation
The two-week timeframe used in the PHQ-9 directly corresponds to the DSM-5 diagnostic requirement, making it a clinically relevant assessment period. The frequency-based response scale (not at all, several days, more than half the days, nearly every day) captures the persistence of symptoms, which is critical for distinguishing clinical depression from transient mood changes.
This alignment with diagnostic criteria makes the PHQ-9 particularly valuable in clinical settings, where it can not only guide screening decisions but also inform diagnostic formulation and track symptom changes during treatment.
🏥 Clinical Standard: The PHQ-9 is recommended by major medical organizations including the US Preventive Services Task Force, American College of Physicians, and World Health Organization for routine depression screening.
Key Features
Assessment Characteristics
9 items corresponding exactly to DSM-5 depression criteria
2-3 minutes administration time
Ages 12 and older, with extensive validation across age groups
4-point frequency scale (0-3) for response options
Dual functionality as screening tool and severity measure
Public domain – free for all uses worldwide
Depression Dimensions Assessed
Anhedonia – Loss of interest or pleasure in doing things
Depressed mood – Feeling down, depressed, or hopeless
Sleep disturbance – Trouble sleeping or sleeping too much
Fatigue – Feeling tired or having little energy
Appetite changes – Poor appetite or overeating
Guilt/worthlessness – Negative self-evaluation and self-blame
Concentration problems – Difficulty focusing on activities
Psychomotor changes – Moving/speaking slowly or being restless
Suicidal ideation – Thoughts of death or self-harm
Research and Clinical Applications
Primary care screening – Standard depression detection in medical settings
Mental health assessment – Initial evaluation in psychiatric settings
Treatment monitoring – Track symptom changes during therapy
Clinical trials – Outcome measure in depression research
Healthcare quality – Performance measurement and quality improvement
Population health – Community mental health surveillance
Collaborative care – Communication tool across care teams
Scoring and Interpretation
Response Format
Participants rate how often they have been bothered by each problem over the last 2 weeks using a 4-point frequency scale:
0 = Not at all
1 = Several days
2 = More than half the days
3 = Nearly every day
Complete PHQ-9 Items
“Over the last 2 weeks, how often have you been bothered by any of the following problems?”
Little interest or pleasure in doing things
Feeling down, depressed, or hopeless
Trouble falling or staying asleep, or sleeping too much
Feeling tired or having little energy
Poor appetite or overeating
Feeling bad about yourself — or that you are a failure or have let yourself or your family down
Trouble concentrating on things, such as reading the newspaper or watching television
Moving or speaking so slowly that other people could have noticed. Or the opposite — being so fidgety or restless that you have been moving around a lot more than usual
Thoughts that you would be better off dead, or of hurting yourself in some way
Functional Impairment Question
After the 9 items, participants answer:
“If you checked off any problems, how difficult have these problems made it for you to do your work, take care of things at home, or get along with other people?”
Not difficult at all
Somewhat difficult
Very difficult
Extremely difficult
(This question is not included in the total score but provides important clinical context.)
Total score: the sum of the nine item responses, ranging from 0 to 27
Alternative cutoffs to the standard ≥10: ≥8 for increased sensitivity; ≥12 for medical populations (Levis et al., 2019)
Meaningful change: a reduction of ≥5 points indicates clinically significant improvement (Löwe et al., 2004)
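The scoring described above reduces to a sum and a threshold check, which the following minimal Python sketch illustrates. The function names are hypothetical. The severity bands are the conventional ones from Kroenke et al. (2001) and are an addition here, since this section does not list them. The default cutoff of 10 is the one used in the diagnostic accuracy studies cited in this document.

```python
from typing import Sequence

# Conventional severity bands (Kroenke et al., 2001); not listed in this section.
SEVERITY_BANDS = [
    (0, 4, "minimal"),
    (5, 9, "mild"),
    (10, 14, "moderate"),
    (15, 19, "moderately severe"),
    (20, 27, "severe"),
]


def phq9_total(responses: Sequence[int]) -> int:
    """Sum the nine item responses (each 0-3).

    The functional impairment question is not part of the total.
    """
    if len(responses) != 9:
        raise ValueError("PHQ-9 requires exactly 9 item responses")
    if any(r not in (0, 1, 2, 3) for r in responses):
        raise ValueError("each response must be 0, 1, 2, or 3")
    return sum(responses)


def severity(total: int) -> str:
    """Map a total score (0-27) to its conventional severity label."""
    for low, high, label in SEVERITY_BANDS:
        if low <= total <= high:
            return label
    raise ValueError("total must be between 0 and 27")


def screens_positive(total: int, cutoff: int = 10) -> bool:
    """Return True when the total meets or exceeds the chosen cutoff."""
    return total >= cutoff


# Hypothetical responses to items 1-9, in order.
responses = [2, 2, 1, 2, 1, 1, 1, 0, 0]
total = phq9_total(responses)
print(total, severity(total), screens_positive(total))
# prints: 10 moderate True
print(screens_positive(total, cutoff=12))
# prints: False
```

Keeping the cutoff as a parameter makes it easy to apply ≥8 or ≥12 where a setting calls for it, without changing the scoring itself.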
Research Evidence and Psychometric Properties
Reliability Evidence
Internal consistency: α = 0.86 (95% CI [0.85, 0.87]) in meta-analysis of 60 studies with 232,147 participants (Ajele & Idemudia, 2025)
Test-retest reliability: r = 0.84 between the self-administered PHQ-9 and a telephone re-administration within 48 hours (Kroenke et al., 2001)
Cross-cultural reliability: Consistent internal consistency (α = 0.80-0.89) across 30+ countries and languages (Levis et al., 2019)
Age group stability: Reliable across adolescents (α = 0.89), adults (α = 0.86), and elderly populations (α = 0.84) (various studies)
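The α values above are Cronbach's alpha coefficients. As a rough illustration of how such a figure is obtained from item-level data, the sketch below computes alpha with the standard formula (k / (k − 1)) × (1 − sum of item variances / variance of total scores). The function name and the sample responses are hypothetical and are not drawn from any of the cited studies.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(rows: Sequence[Sequence[int]]) -> float:
    """Cronbach's alpha for a respondents-by-items score matrix.

    rows: one sequence of item scores per respondent, all the same length.
    """
    if len(rows) < 2:
        raise ValueError("need at least two respondents")
    k = len(rows[0])
    if k < 2 or any(len(r) != k for r in rows):
        raise ValueError("need at least two items and equal-length rows")

    # Sample variance of each item (column) and of the total scores.
    item_variances = [variance(col) for col in zip(*rows)]
    total_variance = variance([sum(r) for r in rows])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")

    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical PHQ-9 responses from five respondents (items 1-9).
sample = [
    [0, 0, 1, 1, 0, 0, 0, 0, 0],
    [1, 1, 1, 2, 1, 1, 1, 0, 0],
    [2, 2, 2, 2, 1, 2, 1, 1, 0],
    [3, 2, 3, 3, 2, 2, 2, 1, 1],
    [1, 2, 1, 1, 1, 1, 2, 0, 0],
]
print(round(cronbach_alpha(sample), 2))
```

Published estimates such as those cited above come from much larger samples; five respondents are used here only to keep the example readable.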
Diagnostic Accuracy
Primary care and general medical settings:
Sensitivity: 88% for major depression at ≥10 cutoff using semi-structured interviews as gold standard (Kroenke et al., 2001)
Specificity: 88% for major depression at ≥10 cutoff (Kroenke et al., 2001)
Area under ROC curve: 0.95 indicating excellent discriminative ability (Kroenke et al., 2001)
Positive predictive value: 32-56% depending on prevalence in population (Levis et al., 2019)
Negative predictive value: 95-98% across settings (Levis et al., 2019)
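Predictive values vary with prevalence because they combine test accuracy with the base rate of depression in the screened population. The sketch below applies Bayes' rule to the 88% sensitivity and 88% specificity reported above. The function name and the prevalence values are illustrative assumptions, not figures from the cited studies.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Return (PPV, NPV) for a test applied at a given prevalence."""
    for name, value in (("sensitivity", sensitivity), ("specificity", specificity)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must be strictly between 0 and 1")

    # Expected proportions of the screened population in each cell.
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)

    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Illustrative prevalence values; sensitivity and specificity from Kroenke et al. (2001).
for prevalence in (0.05, 0.10, 0.15):
    ppv, npv = predictive_values(0.88, 0.88, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

At 10% prevalence this gives a PPV of about 0.45 and an NPV of about 0.99, which shows why a positive screen calls for follow-up assessment while a negative screen is comparatively reassuring.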
Meta-analytic evidence:
Individual participant data meta-analysis: 58 studies (17,357 participants) confirming ≥10 cutoff optimal for most settings (Levis et al., 2019)
Diagnostic algorithm accuracy: 27 validation studies showing sensitivity 64-88% and specificity 72-88% (Manea et al., 2015)
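The diagnostic algorithm evaluated in these studies is an alternative to the summed-score cutoff. This section does not spell out the rule, so the sketch below assumes the rule given in the published PHQ scoring instructions: five or more items endorsed at "more than half the days" or higher, with item 9 counted if present at all, and at least one of items 1 or 2 among those counted. The function name is hypothetical.

```python
from typing import Sequence


def meets_algorithm_criteria(responses: Sequence[int]) -> bool:
    """Apply the PHQ-9 algorithm scoring rule (assumed as described above).

    responses: item scores 0-3 for items 1-9, in order.
    """
    if len(responses) != 9:
        raise ValueError("PHQ-9 requires exactly 9 item responses")
    if any(r not in (0, 1, 2, 3) for r in responses):
        raise ValueError("each response must be 0, 1, 2, or 3")

    # Items 1-8 count at "more than half the days" (>= 2);
    # item 9 (thoughts of death or self-harm) counts at any frequency (>= 1).
    counted = [r >= 2 for r in responses[:8]] + [responses[8] >= 1]

    # At least one cardinal symptom: item 1 (anhedonia) or item 2 (depressed mood).
    has_cardinal = counted[0] or counted[1]
    return has_cardinal and sum(counted) >= 5


print(meets_algorithm_criteria([2, 2, 2, 2, 2, 0, 0, 0, 0]))  # prints: True
print(meets_algorithm_criteria([1, 1, 3, 3, 3, 3, 3, 0, 1]))  # prints: False
```

The second example has six counted symptoms and a total score of 18, yet fails the rule because neither cardinal symptom reaches the threshold. This is one way the algorithm and the ≥10 cutoff can disagree.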
Validity Evidence
Convergent validity:
Beck Depression Inventory-II: r = 0.73-0.84 in multiple studies (Kroenke et al., 2001)
Hamilton Depression Rating Scale: r = 0.86 with clinician-administered measure (Kroenke et al., 2001)
Structured clinical interviews: High agreement with SCID and CIDI diagnoses (Levis et al., 2019)
Discriminant validity:
Anxiety measures: r = 0.60-0.65, showing overlap but distinctiveness (Kroenke et al., 2001)
Physical health measures: Lower correlations (r = 0.30-0.40) than with mental health measures (Kroenke et al., 2001)
Treatment Sensitivity
Reliable change index: ≥5 point change indicates clinically meaningful improvement (Löwe et al., 2004)
Effect size detection: Sensitive to treatment effects ranging from small to large (d = 0.3-0.8) in clinical trials (Löwe et al., 2004)
Therapy monitoring: Effectively tracks symptom changes across CBT, IPT, and medication trials (various studies)
Remission assessment: Score <5 commonly used as remission criterion in clinical trials (Kroenke et al., 2001)
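The two thresholds above (a change of at least 5 points and a remission score below 5) can be combined into a simple progress check between two administrations. This is a minimal sketch; the function and field names are hypothetical and the scores are made-up examples.

```python
def classify_change(baseline: int, follow_up: int) -> dict:
    """Compare two PHQ-9 total scores (each 0-27) against the thresholds above."""
    for name, score in (("baseline", baseline), ("follow_up", follow_up)):
        if not 0 <= score <= 27:
            raise ValueError(f"{name} must be between 0 and 27")

    reduction = baseline - follow_up  # positive when symptoms have decreased
    return {
        "reduction": reduction,
        # Reduction of at least 5 points (Löwe et al., 2004).
        "clinically_significant_improvement": reduction >= 5,
        # Follow-up score below 5, as commonly used in trials.
        "remission": follow_up < 5,
    }


print(classify_change(baseline=16, follow_up=9))
# prints: {'reduction': 7, 'clinically_significant_improvement': True, 'remission': False}
print(classify_change(baseline=16, follow_up=4))
# prints: {'reduction': 12, 'clinically_significant_improvement': True, 'remission': True}
```

Reporting improvement and remission separately reflects that a patient can improve meaningfully while still scoring above the remission threshold.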
Cross-Cultural Validation
Global validation: Validated in 49 studies across low- and middle-income countries (Carroll & Hook, 2020)
Language versions: Available in 80+ languages with consistent psychometric properties (multiple studies)
Cultural adaptation: Demonstrated reliability across diverse populations with sensitivity 64-88% depending on population and setting (Carroll & Hook, 2020)
Measurement invariance: Consistent factor structure across ethnic and cultural groups (various studies)
Special Populations
Adolescents (12-17 years):
Good reliability (α = 0.89) and validity for teen populations (Richardson et al., 2010)
Same cutoffs applicable with consideration of developmental context
Older adults (65+):
Valid and reliable but may underdetect depression due to somatic symptom overlap (Pocklington et al., 2016)
Consider higher cutoffs (≥12) or complementary assessment
Medical populations:
Higher cutoffs (≥12) may reduce false positives due to medical symptom overlap (Levis et al., 2019)
Remains valid in chronic illness, cancer, cardiac, and pain populations
Pregnant/postpartum women:
Adequate psychometric properties but Edinburgh Postnatal Depression Scale may be preferable for postpartum-specific assessment (Matthey et al., 2006)
Clinical Applications and Usage Guidelines
Primary Clinical Applications
Routine depression screening of adults in primary care settings (USPSTF Grade B recommendation)
Initial mental health assessment in psychiatric and counseling settings
Treatment progress monitoring every 2-4 weeks during active therapy
Outcome measurement in healthcare quality improvement programs
Collaborative care models for systematic tracking across care teams
Copyright and Usage Responsibility: Check that you have the proper rights and permissions to use this assessment tool in your research. This may include purchasing appropriate licenses, obtaining permissions from authors/copyright holders, or ensuring your usage falls within fair use guidelines.
The PHQ-9 is in the public domain and freely available for all uses worldwide. Pfizer, which originally held the copyright, released the PHQ-9 “without copyright restriction and at no charge, providing unprecedented access to these valuable and widely used tools.” No permission is required to reproduce, translate, display, or distribute the PHQ-9 for clinical, research, or educational purposes.
Proper Attribution: When using or referencing this scale, cite the original development:
Kroenke, K., Spitzer, R. L., & Williams, J. B. (2001). The PHQ-9: Validity of a brief depression severity measure. Journal of General Internal Medicine, 16(9), 606-613.
Diagnostic Accuracy Meta-Analyses:
Levis, B., Benedetti, A., Thombs, B. D., & DEPRESsion Screening Data (DEPRESSD) Collaboration. (2019). Accuracy of Patient Health Questionnaire-9 (PHQ-9) for screening to detect major depression: individual participant data meta-analysis. BMJ, 365, l1476.
Manea, L., Gilbody, S., & McMillan, D. (2015). A diagnostic meta-analysis of the Patient Health Questionnaire-9 (PHQ-9) algorithm scoring method as a screen for depression. General Hospital Psychiatry, 37(1), 67-75.
Reliability Generalization:
Ajele, K. W., & Idemudia, E. S. (2025). Charting the course of depression care: a meta-analysis of reliability generalization of the patient health questionnaire (PHQ-9) as the measure. Discover Mental Health, 5(1), 1-18.
Cross-Cultural Validation:
Carroll, H. A., & Hook, K. (2020). Establishing reliability and validity for mental health screening instruments in resource-constrained settings: Systematic review of the PHQ-9 and key recommendations. Journal of Affective Disorders, 262, 434-445.
Treatment Monitoring:
Löwe, B., Kroenke, K., Herzog, W., & Gräfe, K. (2004). Measuring depression outcome with a brief self-report instrument: Sensitivity to change of the Patient Health Questionnaire (PHQ-9). Journal of Affective Disorders, 81(1), 61-66.
Clinical Guidelines:
Siu, A. L., & US Preventive Services Task Force. (2016). Screening for depression in adults: US Preventive Services Task Force recommendation statement. JAMA, 315(4), 380-387.
Special Populations:
Richardson, L. P., McCauley, E., Grossman, D. C., McCarty, C. A., Richards, J., Russo, J. E., Rockhill, C., & Katon, W. (2010). Evaluation of the Patient Health Questionnaire-9 Item for detecting major depression among adolescents. Pediatrics, 126(6), 1117-1123.
Pocklington, C., Gilbody, S., Manea, L., & McMillan, D. (2016). The diagnostic accuracy of brief versions of the Geriatric Depression Scale: A systematic review and meta-analysis. International Journal of Geriatric Psychiatry, 31(8), 837-857.