Questionnaire- versus voice-based screening for laryngeal disorders

Abstract

This work explores the usefulness of questionnaire and voice data for screening for laryngeal disorders. Answers to 14 questions form a questionnaire data vector. Twenty-three variables, computed by the commercial “Dr.Speech” software from a digital recording of a sustained phonation of the vowel sound /a/, constitute a voice data vector. The task pursued is to categorize the data into a healthy class and two classes of disorders, namely diffuse and nodular mass lesions of the vocal folds. Visualization of the data and of the automated decisions is also an important aspect of this work. To make the categorization, a support vector machine (SVM) is designed based on a genetic search. Linear as well as nonlinear canonical correlation analysis (CCA) is employed to study the relations between the questionnaire and voice data sets. Curvilinear component analysis, which performs a nonlinear mapping into a two-dimensional space, is used to visualize the data and decisions. Data from 240 patients were used in the experimental studies. The questionnaire data were found to provide more information for the categorization than the voice data. There are 3–4 common directions along which statistically significant variations of the questionnaire and voice data occur. However, the linear relations between the variations occurring in the two data sets are not strong. On the other hand, very strong linear relations were observed between the nonlinear variates obtained from the questionnaire data and the linear ones computed from the voice data. Questionnaire data carry great potential for preventive health care in laryngology.
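The abstract does not say how the linear CCA is computed. The following is a minimal sketch, assuming the standard formulation: whiten each data set with the inverse square root of its covariance, then take the SVD of the whitened cross-covariance. The singular values are the canonical correlations between the 14-dimensional questionnaire vectors and the 23-dimensional voice vectors. The data here are synthetic. The shared latent factor `z`, the noise level, and the small ridge term `reg` are hypothetical choices made for illustration and are not taken from the study.

```python
import numpy as np

def linear_cca(X, Y, reg=1e-6):
    """Linear CCA between X (n x p) and Y (n x q).

    Returns the canonical correlations (descending) and the weight
    matrices Wx, Wy whose columns define the canonical variates.
    `reg` is a hypothetical ridge term that keeps the covariances invertible.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / (n - 1)

    def inv_sqrt(C):
        # Symmetric inverse square root via eigendecomposition.
        vals, vecs = np.linalg.eigh(C)
        return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

    Kx, Ky = inv_sqrt(Cxx), inv_sqrt(Cyy)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    U, s, Vt = np.linalg.svd(Kx @ Cxy @ Ky, full_matrices=False)
    return s, Kx @ U, Ky @ Vt.T

# Synthetic stand-in for 240 patients: one hypothetical latent factor shared by
# a 14-variable "questionnaire" set and a 23-variable "voice" set.
rng = np.random.default_rng(0)
n = 240
z = rng.normal(size=(n, 1))
X = z @ rng.normal(size=(1, 14)) + 0.5 * rng.normal(size=(n, 14))
Y = z @ rng.normal(size=(1, 23)) + 0.5 * rng.normal(size=(n, 23))

rho, Wx, Wy = linear_cca(X, Y)
print(np.round(rho[:4], 3))  # leading canonical correlations
```

Only the linear variant is sketched. The nonlinear CCA, the genetically tuned SVM, and the curvilinear component analysis mentioned in the abstract would each need their own model choices, which the abstract does not specify.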