Estimation of dominant sound source with three microphone array
Kashentsev, Boris (2015)
Master's Degree Programme in Information Technology
Tieto- ja sähkötekniikan tiedekunta - Faculty of Computing and Electrical Engineering
This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
Date of acceptance
2015-06-03
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:tty-201505201400
Abstract
Several real-life applications require a system that reliably locates and tracks a single speaker. This can be achieved by using visual or audio data. Processing an incoming signal to obtain the direction of its source is known as Direction of Arrival (DOA) estimation. The basic setting in audio-based DOA estimation is a set of microphones situated in known locations. The signal is captured by each of the microphones, and the captured signals are analyzed by one of the following classes of methods: steered-beamformer-based methods, subspace-based methods, or time-delay-estimation-based methods.
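As an illustration of the time-delay-estimation class of methods, the sketch below converts a measured inter-microphone delay into an arrival angle for a single microphone pair. It is a minimal example under assumptions that are not stated in the thesis: a far-field (plane wave) source, a speed of sound of 343 m/s, and the hypothetical function name `delay_to_angle`.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at roughly 20 °C


def delay_to_angle(delay_s, mic_distance_m, c=SPEED_OF_SOUND):
    """Far-field arrival angle in radians, measured from the broadside
    direction of a microphone pair (hypothetical helper, not from the thesis).

    For a plane wave, the extra path to the farther microphone is
    d * sin(theta), so sin(theta) = c * tau / d.
    """
    ratio = c * delay_s / mic_distance_m
    # Measurement noise can push the ratio slightly outside [-1, 1]; clamp it.
    ratio = max(-1.0, min(1.0, ratio))
    return math.asin(ratio)
```

A single pair cannot distinguish a source in front of the pair from its mirror image behind it, which is one reason arrays with three or more non-collinear microphones are used for estimating direction in the full plane.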
The aim of this thesis is to review different classes of existing methods for DOA estimation and to create an application for visualizing the dominant sound source direction around a three-microphone array in real time. In practice, the objective is to enhance an algorithm for DOA estimation proposed by Nokia Research Center. As visualization of the dominant sound source creates a basis for many audio-related applications, a practical example of such an application is developed.
The proposed algorithm is based on the time delay estimation method and utilizes cross-correlation. Several enhancements to the initial algorithm are developed to improve its performance. The proposed algorithm is evaluated by comparing it with one of the most common methods, generalized cross-correlation with phase transform (GCC-PHAT). The evaluation includes testing all algorithms on three types of signals: a speech signal arriving from a stationary location, a speech signal arriving from a moving source, and a transient signal. Additionally, a computer application with a video tracker is developed using the proposed algorithm.
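To make the two families of time delay estimators mentioned above concrete, the following sketch estimates the delay between two microphone signals, first with plain cross-correlation and then with the phase transform weighting. It is a generic textbook illustration, not the Nokia Research Center algorithm or the enhanced version evaluated in the thesis. The function names (`dft`, `xcorr_delay`, `gcc_phat_delay`) and the `max_lag` parameter are hypothetical, and a naive DFT is used only to keep the example dependency-free; a real-time implementation would use an FFT library.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (illustration only)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                for t in range(n))
        out.append(s / n if inverse else s)
    return out


def xcorr_delay(x, y, max_lag):
    """Delay of y relative to x in samples, from plain cross-correlation."""
    best_lag, best_val = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Correlation value at this lag, using only overlapping samples.
        val = sum(x[i] * y[i + lag]
                  for i in range(len(x)) if 0 <= i + lag < len(y))
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag


def gcc_phat_delay(x, y, max_lag):
    """Delay of y relative to x in samples, using the PHAT weighting."""
    n = len(x) + len(y)  # zero-pad so circular correlation acts as linear
    X = dft(list(x) + [0.0] * (n - len(x)))
    Y = dft(list(y) + [0.0] * (n - len(y)))
    cross = [b * a.conjugate() for a, b in zip(X, Y)]
    # PHAT: discard magnitude and keep only the phase of each frequency bin.
    whitened = [c / abs(c) if abs(c) > 1e-12 else 0.0 for c in cross]
    r = dft(whitened, inverse=True)
    # Negative lags wrap around to the end of the circular result.
    return max(range(-max_lag, max_lag + 1), key=lambda lag: r[lag % n].real)
```

Both functions return a positive value when the second signal lags the first. Dividing the result by the sampling rate gives the delay in seconds, which, together with the known microphone spacing, determines the arrival direction.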
The results show that the initially proposed algorithm does not perform as well as GCC-PHAT. The enhancements improve the performance of the algorithm notably, although they do not bring it to the level of GCC-PHAT when processing speech signals. In the case of transient signals, the enhanced algorithm is superior to GCC-PHAT. The video tracker was able to track the dominant sound source successfully.