04:00
2026-05-26
arxiv.org
machine-learning
A Multi-Probe Audit of Clinical-Interview Depression Detection Benchmarks
A multi-probe audit of clinical-interview depression detection benchmarks found that a lightweight hybrid text-plus-LLM-score model achieved a macro-F1 of 0.723 on the E-DAIC dataset under strict subj…