Skip to main content
eScholarship
Open Access Publications from the University of California

UC Irvine

UC Irvine Previously Published Works bannerUC Irvine

Content Coding of Psychotherapy Transcripts Using Labeled Topic Models

Abstract

Psychotherapy represents a broad class of medical interventions received by millions of patients each year. Unlike most medical treatments, its primary mechanisms are linguistic; i.e., the treatment relies directly on a conversation between a patient and provider. However, the evaluation of patient-provider conversation suffers from critical shortcomings, including intensive labor requirements, coder error, nonstandardized coding systems, and inability to scale up to larger data sets. To overcome these shortcomings, psychotherapy analysis needs a reliable and scalable method for summarizing the content of treatment encounters. We used a publicly available psychotherapy corpus from Alexander Street press comprising a large collection of transcripts of patient-provider conversations to compare coding performance for two machine learning methods. We used the labeled latent Dirichlet allocation (L-LDA) model to learn associations between text and codes, to predict codes in psychotherapy sessions, and to localize specific passages of within-session text representative of a session code. We compared the L-LDA model to a baseline lasso regression model using predictive accuracy and model generalizability (measured by calculating the area under the curve (AUC) from the receiver operating characteristic curve). The L-LDA model outperforms the lasso logistic regression model at predicting session-level codes with average AUC scores of 0.79, and 0.70, respectively. For fine-grained level coding, L-LDA and logistic regression are able to identify specific talk-turns representative of symptom codes. However, model performance for talk-turn identification is not yet as reliable as human coders. We conclude that the L-LDA model has the potential to be an objective, scalable method for accurate automated coding of psychotherapy sessions that perform better than comparable discriminative methods at session-level coding and can also predict fine-grained codes.

Many UC-authored scholarly publications are freely available on this site because of the UC's open access policies. Let us know how this access is important for you.

Main Content
For improved accessibility of PDF content, download the file to your device.
Current View