Speech patterns in phone conversations can spot people with early to moderate Alzheimer's dementia, a Japanese study suggested.
A machine-learning predictive model identified people with Alzheimer's dementia with about 90% accuracy using audio files of phone conversations from 24 people with confirmed Alzheimer's and 99 healthy controls, reported Akihiro Shimoda, MPH, of McCann Health Worldwide Japan in Tokyo, and co-authors, in PLOS ONE.
People with Alzheimer's disease tend to speak more slowly and with longer pauses than others, Shimoda and co-authors noted, explaining that they spend more time searching for the correct word, which produces broken, less fluent speech.
What differentiated this study from others is that it focused on vocal features of everyday speech.
"We did not use text data extracted from voice data," Shimoda said. "We did not use voice data from cognitive tests in a clinical setting, but from daily telephone conversations."
The researchers also assessed pitch and intensity to identify voice characteristics of dementia patients.
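To make "pitch" and "intensity" concrete, here is a minimal sketch of how two such vocal features could be computed from raw audio samples. It is an illustration only: the article does not describe the study's feature-extraction pipeline, so the function names, the autocorrelation method, and the synthetic 200 Hz test tone are all assumptions.

```python
import math

def rms_intensity(samples):
    """Root-mean-square amplitude, a simple stand-in for vocal intensity."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def estimate_pitch_hz(samples, sample_rate, min_hz=75, max_hz=400):
    """Estimate fundamental frequency from the strongest autocorrelation lag.

    The 75-400 Hz search range is an assumed, typical range for speech.
    """
    min_lag = sample_rate // max_hz
    max_lag = sample_rate // min_hz
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation of the signal with itself at this lag.
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

# Synthetic stand-in for a voiced frame: 0.1 s of a 200 Hz tone at 8 kHz.
SAMPLE_RATE = 8000
tone = [0.5 * math.sin(2 * math.pi * 200 * n / SAMPLE_RATE) for n in range(800)]

pitch_hz = estimate_pitch_hz(tone, SAMPLE_RATE)
intensity = rms_intensity(tone)
```

Real speech is far noisier than a pure tone, so a production system would likely rely on a dedicated audio library and many more features than these two.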
"The results show the possibility of incorporating a prediction model into a phone app to conduct an initial assessment of dementia risk in older adults," Shimoda said. "It provides a possible way to identify dementia risk by using voice data of older adults, which are easier to obtain than conventional cognitive tests, biomarkers, or brain imaging."
However, an app based on this model is "far from something that can substitute for current dementia screening methods," he emphasized. "It would be an initial tool that's accessible and low-cost."
The study evaluated data from people 65 and older who participated in a dementia prevention program in Hachioji City from March to May 2020. The program included 1 to 2 months of weekday phone calls from an artificial intelligence (AI) computer program aimed at improving diet, physical activity, and social participation.
On the first day, the phone interaction included an assessment of cognitive function based on the Japanese version of the Telephone Interview for Cognitive Status (TICS-J). Participants were then asked to talk freely about daily life for 1 minute and to answer questions like "What did you do yesterday?"
Alzheimer's was diagnosed before the program started. The diagnosis served as the binary outcome variable that the models were trained to predict.
People with severe Alzheimer's were excluded from the program. People in the Alzheimer's group had mild/moderate Alzheimer's disease or mild cognitive impairment.
The 99 healthy controls and 24 Alzheimer's patients yielded 1,465 and 151 audio files, respectively, with 81% randomly allocated to the training data set and 19% to validation data. After extracting vocal features from the audio files, the researchers developed models based on extreme gradient boosting (XGBoost), random forest, and logistic regression.
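The random file-level allocation can be sketched as follows. This is a minimal illustration, not the authors' code: the file identifiers, the seed, and the `split_files` helper are hypothetical, and only the file counts and the 81% fraction come from the article.

```python
import random

def split_files(file_ids, train_fraction=0.81, seed=0):
    """Shuffle file IDs and cut them into training and validation lists."""
    shuffled = list(file_ids)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical IDs standing in for the 1,465 control and 151 Alzheimer's files.
file_ids = [f"control_{i}" for i in range(1465)] + [f"alz_{i}" for i in range(151)]
train_ids, validation_ids = split_files(file_ids)
```

With 1,616 files in total, rounding 81% gives 1,309 training files and 307 validation files in this sketch; the study's exact counts are not stated in the article.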
Prediction based on each audio file showed an area under the receiver operating characteristic curve (AUC) of 0.863 (95% CI 0.794–0.931) for XGBoost, 0.882 (95% CI 0.840–0.924) for random forest, and 0.893 (95% CI 0.832–0.954) for logistic regression.
Prediction based on each participant showed an AUC of 1.000 (95% CI 1.000–1.000) for XGBoost, 1.000 (95% CI 1.000–1.000) for random forest, and 0.972 (95% CI 0.918–1.000) for logistic regression. Prediction based on the TICS-J cognitive assessment yielded an AUC of 0.917 (95% CI 0.918–1.000).
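The gap between per-file and per-participant AUC can be illustrated with a small sketch. The scores below are invented toy values, and averaging each participant's file scores is an assumed aggregation rule; the article does not say how the study combined file-level predictions.

```python
from collections import defaultdict

def auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data: (participant ID, label, predicted probability). Not study data.
files = [
    ("p1", 1, 0.9), ("p1", 1, 0.4),
    ("p2", 1, 0.7), ("p2", 1, 0.6),
    ("c1", 0, 0.2), ("c1", 0, 0.5),
    ("c2", 0, 0.3), ("c2", 0, 0.1),
]

# Per-file AUC: every audio file is scored on its own.
file_auc = auc([y for _, y, _ in files], [s for _, _, s in files])

# Per-participant AUC: average each person's file scores first (assumed rule).
scores_by_pid = defaultdict(list)
label_by_pid = {}
for pid, y, s in files:
    scores_by_pid[pid].append(s)
    label_by_pid[pid] = y
pids = sorted(scores_by_pid)
participant_auc = auc(
    [label_by_pid[p] for p in pids],
    [sum(scores_by_pid[p]) / len(scores_by_pid[p]) for p in pids],
)
```

In this toy case one misranked file pulls the per-file AUC down to 0.9375, while averaging over each person's files smooths it out and the per-participant AUC reaches 1.0.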
Both XGBoost and the TICS-J cognitive assessment had 100% sensitivity. XGBoost had 100% specificity, while TICS-J had 83.3% specificity.
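For readers less familiar with these metrics, the sketch below shows how sensitivity and specificity are computed from true and predicted labels. The labels are hypothetical and are not the study's validation counts, which the article does not report.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels where 1 = Alzheimer's."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)  # share of true cases that were flagged
    specificity = tn / (tn + fp)  # share of controls correctly cleared
    return sensitivity, specificity

# Hypothetical example: 3 patients all flagged, 1 of 6 controls wrongly flagged.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0, 0, 0, 0]
sens, spec = sensitivity_specificity(y_true, y_pred)
```

Here every patient is caught (sensitivity 1.0) while one false alarm among six controls gives a specificity of 5/6, or about 83.3%.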
The study had several limitations, Shimoda and co-authors acknowledged. The outcome variable was binary -- Alzheimer's disease or healthy controls -- and ignored variations in speech characteristics at different stages of dementia. In addition, the sample size was small, audio quality differed for some participants, and because the TICS-J assessment was conducted by an AI program, its limited speech recognition ability may have affected results.
The researchers also relied on superficial vocal features like pitch and intensity. "Further research could include natural language processing of speech content and sentence structure analysis in order to reduce information loss and increase model prediction performance," the team wrote.
Disclosures
This research was funded by McCann Health Worldwide Japan Inc. All researchers are employees of McCann.
Primary Source
PLOS ONE
Shimoda A, et al "Dementia risks identified by vocal features via telephone conversations: A novel machine learning prediction model" PLOS ONE 2021; DOI: 10.1371/journal.pone.0253988.