Acoustic Analysis of Infant Cry Signals
Naithani, Gaurav (2015)
Naithani, Gaurav
2015
Master's Degree Programme in Information Technology
Tieto- ja sähkötekniikan tiedekunta - Faculty of Computing and Electrical Engineering
This publication is copyrighted. You may download, display and print it for Your own personal use. Commercial use is prohibited.
Hyväksymispäivämäärä
2015-06-03
Julkaisun pysyvä osoite on
https://urn.fi/URN:NBN:fi:tty-201505211414
https://urn.fi/URN:NBN:fi:tty-201505211414
Tiivistelmä
Crying is the first means of communication for an infant through which it expresses its physiological and psychological needs. Infant cry analysis is the investigation of infant cry vocalizations in order to extract social and communicative information about infant behavior, and diagnostic information about infant health. This thesis is part of a larger study whose objective is to analyze the acoustic properties of infant cry signals and use it for early assessment of neurological developmental issues in infants.
This thesis deals with two research problems in the context of infant cry signals: audio segmentation of cry recordings in order to extract relevant acoustic parts, and fundamental frequency (F0) estimation of the extracted acoustic regions. The extracted acoustic regions are relevant for extracting parameters useful for drawing correlation with developmental outcomes of the infants. Fundamental frequency (F0) is one such potentially useful parameter whose variation has been found to correlate with cases of neurological insults in infants. The cry recordings are captured in realistic hospital environments under varied contexts like infant crying out of hunger, pain etc. A hidden Markov model (HMM) based audio segmentation system is proposed. The performance of the system is evaluated for different configurations of HMM states, number of component Gaussians, and using different combinations of audio features. Frame based accuracy of 88.5 % is achieved. YIN algorithm, a popular F0 estimation algorithm, is utilized to deal with the fundamental frequency estimation problem, and a method to discard unreliable F0 estimates is suggested. The statistics associated with distribution of F0 estimates corresponding to different components of cry signals are reported.
This work would be followed up to find meaningful correlations between extracted F0 estimates and developmental outcomes of the infants. Moreover, other acoustic parameters would also be investigated for the same purpose.
This thesis deals with two research problems in the context of infant cry signals: audio segmentation of cry recordings in order to extract relevant acoustic parts, and fundamental frequency (F0) estimation of the extracted acoustic regions. The extracted acoustic regions are relevant for extracting parameters useful for drawing correlation with developmental outcomes of the infants. Fundamental frequency (F0) is one such potentially useful parameter whose variation has been found to correlate with cases of neurological insults in infants. The cry recordings are captured in realistic hospital environments under varied contexts like infant crying out of hunger, pain etc. A hidden Markov model (HMM) based audio segmentation system is proposed. The performance of the system is evaluated for different configurations of HMM states, number of component Gaussians, and using different combinations of audio features. Frame based accuracy of 88.5 % is achieved. YIN algorithm, a popular F0 estimation algorithm, is utilized to deal with the fundamental frequency estimation problem, and a method to discard unreliable F0 estimates is suggested. The statistics associated with distribution of F0 estimates corresponding to different components of cry signals are reported.
This work would be followed up to find meaningful correlations between extracted F0 estimates and developmental outcomes of the infants. Moreover, other acoustic parameters would also be investigated for the same purpose.