A 2026 psychometric study of 354 Swedish participants found that 1-week versions of the PHQ-15 and SSD-12 mostly preserved internal consistency and construct validity, but reliability was acceptable mainly in clinical participants or when researchers averaged 2 timepoints.
Research Highlights
- Clinical tracking looked more defensible than healthy screening: Hybelius et al. studied 194 people with persistent physical symptoms and 160 healthy volunteers, and the 1-week forms worked better as repeated clinical measures than as stable healthy-volunteer screens [1].
- Internal consistency was strong: In the pooled sample, the conventional PHQ-15 showed alpha 0.88 and omega 0.88, while the 1-week PHQ-15 showed alpha 0.87 and omega 0.88 [1].
- SSD-12 consistency was even higher: The conventional SSD-12 reached alpha 0.96 and omega 0.96, and the 1-week SSD-12 reached alpha 0.96 and omega 0.97 in the pooled sample [1].
- Healthy-volunteer reliability was the weak point: The 1-week SSD-12 had ICC 0.36 and r = 0.36 across about 15 days in healthy volunteers, far below the study team’s target [1].
- Averaging improved repeated measurement: In the clinical sample, 2-timepoint averages reached ICC 0.77 for PHQ-15 and ICC 0.78 for SSD-12, which is the strongest practical argument for repeated tracking rather than one-off interpretation [1].
Persistent physical symptoms are bodily symptoms such as pain, fatigue, dizziness, gastrointestinal distress, or cardiopulmonary sensations that remain clinically important even when a single structural disease does not fully explain the burden. Measurement is hard because 2 people can report similar symptom counts while differing sharply in fear, attention, avoidance, reassurance-seeking, and disability.
The PHQ-15 and SSD-12 split that problem into 2 related but different questions. PHQ-15 means Patient Health Questionnaire 15, a 15-item scale scored from 0 to 30 that rates how much common bodily symptoms bothered someone. SSD-12 means Somatic Symptom Disorder B-criteria scale 12, a 12-item scale that measures symptom preoccupation: health anxiety, persistent worry, attention to symptoms, and behavior shaped by symptoms.
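As a rough illustration of how the two totals are formed, the sketch below sums item responses after a basic range check. The 0-2 response range for PHQ-15 follows from the 15-item, 0-30 total described above. The 0-4 response range for SSD-12 (total 0-48) comes from the original scale publication rather than from the 2026 study. The function names are hypothetical, and the sketch does not handle missing items or the 1-week wording.

```python
from typing import Sequence


def total_score(responses: Sequence[int], n_items: int, max_per_item: int) -> int:
    """Sum item responses after checking the item count and response range."""
    if len(responses) != n_items:
        raise ValueError(f"expected {n_items} responses, got {len(responses)}")
    for r in responses:
        if not 0 <= r <= max_per_item:
            raise ValueError(f"response {r} is outside 0..{max_per_item}")
    return sum(responses)


def phq15_total(responses: Sequence[int]) -> int:
    # Assumption: 15 items, each scored 0-2, giving a 0-30 total.
    return total_score(responses, n_items=15, max_per_item=2)


def ssd12_total(responses: Sequence[int]) -> int:
    # Assumption: 12 items, each scored 0-4, giving a 0-48 total.
    return total_score(responses, n_items=12, max_per_item=4)
```

Two people with the same `phq15_total` can have very different `ssd12_total` values, which is the reason the scales are reported side by side.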
One-Week Versions Mostly Preserved Internal Consistency
Shorter recall windows are attractive because many trials and digital-care programs need repeated measurement every week or two. A 4-week scale can blur treatment change, while a 1-week scale should be more sensitive to recent shifts in symptoms and symptom-related thinking.
Hybelius et al. tested conventional and revised 1-week versions in 194 participants with persistent physical symptoms and 160 healthy volunteers. The clinical group came from Swedish trials of internet-delivered exposure therapy and healthy lifestyle promotion, while the volunteer group provided a cleaner test-retest setting without planned treatment change.
Internal consistency asks whether items on a scale tend to move together. It does not prove validity by itself, but it flags whether a total score is at least measuring a coherent construct. On that narrow question, the 1-week forms held up well:
- PHQ-15: conventional alpha 0.88 and omega 0.88 in the pooled sample; 1-week alpha 0.87 and omega 0.88.
- SSD-12: conventional alpha 0.96 and omega 0.96; 1-week alpha 0.96 and omega 0.97.
- Clinical subsample: PHQ-15 reliability coefficients were lower than in the pooled sample, but SSD-12 total scores still reached alpha 0.86 to 0.88 across conventional and 1-week versions.
Those numbers support the basic use of total scores, especially for SSD-12. They do not mean the scales are interchangeable across recall windows. A person can endorse fewer symptoms when asked about the past week than about the past month, and the 2026 researchers explicitly did not build conversion algorithms between conventional and 1-week scores.
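For readers who want to see what an alpha coefficient is computing, here is a minimal sketch of Cronbach's alpha using only the Python standard library. It is the textbook formula, not the study's analysis code, and the tiny data sets in the usage note are invented for illustration. Omega is not shown because it requires fitting a factor model.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items table.

    rows[i][j] is the response of respondent i to item j.
    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    if any(len(r) != k for r in rows):
        raise ValueError("every respondent needs the same number of items")
    item_columns = list(zip(*rows))                  # one tuple per item
    item_var_sum = sum(variance(col) for col in item_columns)
    total_var = variance([sum(r) for r in rows])     # variance of total scores
    return k / (k - 1) * (1 - item_var_sum / total_var)
```

With the invented table `[[0, 0, 0], [1, 1, 1], [2, 2, 2]]`, where the items move together perfectly, the function returns 1.0. With `[[0, 1], [1, 0], [2, 2]]`, where the items partly disagree, it returns about 0.67.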
PHQ-15 Structure Was Messier Than a Clean Symptom Score
PHQ-15 is often treated as a simple somatic symptom count. Factor analysis asks a deeper question: whether the items behave like one underlying symptom-burden dimension, several body-system dimensions, or a combination of both.
The 2026 results were mixed. Contrary to the study's prediction, the PHQ-15 did not show a clean, stable factor structure. Pain and fatigue items created the hardest boundary problem, sometimes acting like one combined pain/fatigue factor and sometimes behaving as 2 local factors.
That result fits prior PHQ-15 work rather than invalidating the scale. Kroenke et al. introduced the PHQ-15 as a clinically useful symptom-severity measure in 2002, and a 2024 systematic review by Hybelius et al. found broad measurement support while still noting item redundancy and structural uncertainty [2, 3].
Practical interpretation: PHQ-15 total scores are useful for symptom burden, but body-system subscales should be interpreted cautiously unless a study has validated the exact scoring model it uses.
SSD-12 Split Into 3 Symptom-Preoccupation Factors
SSD-12 is not another symptom-count scale. It measures the B-criteria side of somatic symptom disorder: the cognitive, emotional, and behavioral response to symptoms. That distinction is clinically useful because high symptom burden and high symptom preoccupation can diverge.
The original SSD-12 paper by Toussaint et al. validated a 12-item measure for DSM-5 somatic symptom disorder B criteria, and later population work supported its validity and norms [4, 5]. Hybelius et al. found that a simple 1-factor solution did not fit adequately in the 2026 data. The better-fitting model separated 3 highly related factors:
- Expectation of a chronic course: the belief that symptoms will persist or not disappear.
- Health anxiety: worry that symptoms signal serious illness or bodily danger.
- Symptom focus and impairment: attention to symptoms and difficulty shifting focus away from them.
The factors were highly correlated, with rs = 0.85 to 0.94, so the total score still carries much of the clinical signal. But the 3-factor structure warns against treating symptom preoccupation as a single psychological flavor. For one patient, the dominant problem may be catastrophic expectation; for another, it may be attention capture and functional narrowing.
Reliability Was Acceptable Mainly With Clinical Averaging
Test-retest reliability asks whether a score stays reasonably stable when the underlying state should not have changed much. For weekly monitoring, this is the key practical gate: if a measure jumps around in stable people, apparent improvement or worsening may be noise.
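To make the ICC concrete, the sketch below computes one common form: ICC(2,1) in the Shrout and Fleiss notation (two-way random effects, absolute agreement, single measurement). Which ICC form the study used is an assumption here, not something stated in this article, and the function name and example data are hypothetical.

```python
from typing import Sequence


def icc_2_1(scores: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    scores[i][j] is the score of participant i at timepoint j.
    """
    n, k = len(scores), len(scores[0])
    if n < 2 or k < 2:
        raise ValueError("need at least 2 participants and 2 timepoints")
    grand = sum(sum(r) for r in scores) / (n * k)
    row_means = [sum(r) / k for r in scores]
    col_means = [sum(r[j] for r in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares: participants, timepoints, residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for r in scores for x in r)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)               # between-participant mean square
    msc = ss_cols / (k - 1)               # between-timepoint mean square
    mse = ss_err / ((n - 1) * (k - 1))    # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
```

For the invented data `[[1, 2], [2, 3], [3, 4]]`, every participant scores exactly one point higher at retest. A Pearson correlation would be 1.0, but this ICC is about 0.67 because absolute agreement penalizes the systematic shift. That difference is one reason to read the ICC and r values together.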
The healthy-volunteer results were the caution signal. Conventional PHQ-15 reached ICC 0.61 and r = 0.63 across about 15 days, while 1-week PHQ-15 reached ICC 0.58 and r = 0.58. SSD-12 looked similar for the conventional version, with ICC 0.62 and r = 0.63, but the 1-week SSD-12 fell to ICC 0.36 and r = 0.36.
Clinical repeated measurement looked better, even though the estimate was imperfect because participants had begun treatment. Over the first 16.3 days, 1-week PHQ-15 reached ICC 0.63 and r = 0.67; averaging 2 timepoints raised ICC to 0.77. For SSD-12, the same clinical comparison reached ICC 0.64 and r = 0.75; averaging 2 timepoints raised ICC to 0.78.
- One-off healthy screening: the 1-week SSD-12 reliability estimate of ICC 0.36 was too low for confident single-score interpretation.
- Clinical trend tracking: 2-timepoint averages reached ICC 0.77 for the 1-week PHQ-15 and ICC 0.78 for the 1-week SSD-12 in the clinical data.
- Score conversion: conventional and 1-week versions should not be treated as interchangeable because same-day conversion data were not available.

That pattern argues for repeated trend interpretation. A single 1-week score can be noisy, especially in healthy or low-symptom settings. Two or more timepoints make the signal sturdier and better match how these scales are likely to be used in therapy trials, symptom-monitoring programs, and stepped-care follow-up.
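The gain from averaging can be checked against the Spearman-Brown formula, which projects the reliability of an average of k parallel measurements from a single-measurement reliability. This is a sketch under the assumption that the formula applies. The study may have estimated the averaged ICCs directly, but the reported clinical values are consistent with it.

```python
def spearman_brown(single_reliability: float, k: int) -> float:
    """Projected reliability of the mean of k parallel measurements."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return k * single_reliability / (1 + (k - 1) * single_reliability)


# Clinical single-timepoint ICCs reported above: 0.63 (PHQ-15), 0.64 (SSD-12).
phq15_avg = spearman_brown(0.63, 2)   # 1.26 / 1.63, rounds to 0.77
ssd12_avg = spearman_brown(0.64, 2)   # 1.28 / 1.64, rounds to 0.78
print(round(phq15_avg, 2), round(ssd12_avg, 2))
```

The projected values match the reported 2-timepoint ICCs of 0.77 and 0.78. The same formula implies diminishing returns: each additional timepoint adds less reliability than the one before it.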
One-Week Scores Should Not Become Diagnostic Shortcuts
The strongest use case is clinical tracking, not diagnosis. PHQ-15 and SSD-12 are patient-reported outcome measures: they quantify symptom burden and symptom-related preoccupation, but they do not decide whether symptoms are medically explained, whether somatic symptom disorder is present, or which treatment should be chosen.
COSMIN reporting guidance for patient-reported outcome measures emphasizes that validity depends on the intended use, population, and interpretation context [6]. The 2026 evidence fits that framework. One-week PHQ-15 and SSD-12 forms are plausible for repeated measurement in people already being followed for persistent physical symptoms. They are weaker as stand-alone healthy-population screens, and the poor healthy-volunteer SSD-12 reliability is a direct warning against overreading a single 1-week score.
Evidence-strength note: this was a psychometric validation study, not a treatment-outcome trial. It can support claims about reliability, factor structure, and construct validity in this Swedish online sample. It cannot show that using 1-week forms improves care, predicts long-term recovery, or diagnoses somatic symptom disorder without clinical assessment.
Questions About PHQ-15 and SSD-12 Symptom Tracking
Should clinics switch from 4-week to 1-week PHQ-15 and SSD-12 versions?
For repeated symptom tracking in active care, the 1-week versions are defensible when scores are interpreted as trends rather than isolated verdicts. For intake screening, baseline characterization, or population surveys, conventional versions still have the broader evidence base.
Why did healthy volunteers look less reliable than clinical participants?
Low-symptom samples often have restricted score ranges, so small day-to-day shifts can look large statistically. In clinical samples, higher symptom burden gives the scale more signal to track, but the 2026 clinical reliability estimates were collected during early treatment, so they are not pure no-change estimates.
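The range-restriction point can be illustrated with the classical test theory definition of reliability as true-score variance divided by total variance. The standard deviations below are invented for illustration and are not estimates from the study.

```python
def reliability(true_sd: float, error_sd: float) -> float:
    """Classical test theory: true variance / (true variance + error variance)."""
    true_var = true_sd ** 2
    error_var = error_sd ** 2
    return true_var / (true_var + error_var)


# Hypothetical values: the same measurement noise (error_sd = 2 points) in both groups.
wide_range = reliability(true_sd=4.0, error_sd=2.0)     # 16 / 20 = 0.8
narrow_range = reliability(true_sd=1.0, error_sd=2.0)   # 1 / 5 = 0.2
print(round(wide_range, 2), round(narrow_range, 2))
```

With identical noise, the group whose true scores are spread widely shows much higher reliability than the group clustered near the floor. A low coefficient in healthy volunteers therefore does not by itself show that the questionnaire is noisier for them.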
Can PHQ-15 and SSD-12 scores be converted between 4-week and 1-week versions?
Not from this study. The researchers planned to consider conversion algorithms, but reliability concerns and the lack of same-day administration in the clinical sample made conversion inappropriate.
Which measure is more useful: PHQ-15 or SSD-12?
They answer different questions. PHQ-15 estimates how much bodily symptoms bother the person. SSD-12 estimates how much the person is preoccupied, worried, impaired, or behaviorally pulled around by those symptoms. In the 2026 clinical subsample, change in the 1-week PHQ-15 and change in the 1-week SSD-12 correlated at r = 0.52 (about 27% shared variance), meaning they overlap but are not the same measure.
References
1. Hybelius J, af Winklerfelt Hammarberg S, Hoffmann AA, et al. Measurement properties of the Patient Health Questionnaire 15 (PHQ-15) and Somatic Symptom Disorder B-criteria scale (SSD-12), including revised 1-week versions. Scientific Reports. 2026;16:13415. https://doi.org/10.1038/s41598-026-50290-y
2. Kroenke K, Spitzer RL, Williams JB. The PHQ-15: validity of a new measure for evaluating the severity of somatic symptoms. Psychosomatic Medicine. 2002;64:258-266. https://doi.org/10.1097/00006842-200203000-00008
3. Hybelius J, et al. Measurement properties of the Patient Health Questionnaire-15 and Somatic Symptom Scale-8: a systematic review and meta-analysis. JAMA Network Open. 2024;7:e2446603. https://doi.org/10.1001/jamanetworkopen.2024.46603
4. Toussaint A, et al. Development and validation of the Somatic Symptom Disorder-B Criteria Scale (SSD-12). Psychosomatic Medicine. 2016;78:5-12. https://doi.org/10.1097/psy.0000000000000240
5. Toussaint A, Löwe B, Brähler E, Jordan P. The Somatic Symptom Disorder-B Criteria Scale (SSD-12): factorial structure, validity and population-based norms. Journal of Psychosomatic Research. 2017;97:9-17. https://doi.org/10.1016/j.jpsychores.2017.03.017
6. Gagnier JJ, Lai J, Mokkink LB, Terwee CB. COSMIN reporting guideline for studies on measurement properties of patient-reported outcome measures. Quality of Life Research. 2021;30:2197-2218. https://doi.org/10.1007/s11136-021-02822-4
