Similarity Measures for Content-Based Audio Retrieval
Helen, Marko (2009)
Helen, Marko
Tampere University of Technology
2009
Tieto- ja sähkötekniikan tiedekunta - Faculty of Computing and Electrical Engineering
This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
Permanent address of the publication:
https://urn.fi/URN:NBN:fi:tty-201004261110
Abstract
Personal multimedia databases contain thousands of items, and databases on the Internet may contain even billions. Finding a particular item in such databases manually quickly becomes overwhelming, so automatic search engines are needed to lighten the job. Query by example refers to automatically finding items in a multimedia database that are similar to an example provided by the user. This is an important task in modern multimedia databases.
This thesis deals with automatic query by example for audio samples. The emphasis is on signal representations and on distance measures between two audio signals, which are used to estimate the similarity of the two signals. The thesis also covers computational issues, which are highly important for the practical implementation of the algorithms.
Two different audio signal representations are proposed. They are interconnected: the first separates drums from a polyphonic music signal, although the same approach could be used to separate other parts of the signal as well, for example harmonic instruments. The second models harmonic sounds using only a few parameters. It is based on Mel-frequency cepstral coefficients (MFCCs), whose temporal evolution in harmonic instrument sounds is further modeled using attack-decay-sustain-release (ADSR) curves.
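The abstract does not spell out how the ADSR curves are parameterized or fitted. As a minimal sketch, assuming a piecewise-linear curve whose attack, decay and release lengths are counted in frames and whose peak and sustain levels are free, the trajectory of one MFCC coefficient over the frames of a note can be summarized by five numbers. The function names `adsr_basis` and `fit_adsr` and the brute-force least-squares fit are hypothetical, illustrative choices, not the method of the thesis.

```python
import itertools
import numpy as np


def adsr_basis(n, attack, decay, release):
    """Basis vectors (u, v) such that the ADSR curve equals peak*u + sustain*v.

    Hypothetical parameterization: attack, decay and release are segment
    lengths in frames (each >= 1); sustain fills the remaining frames.
    """
    u, v = np.zeros(n), np.zeros(n)
    i = 0
    u[i:i + attack] = np.arange(1, attack + 1) / attack   # attack: 0 -> peak
    i += attack
    t = np.arange(1, decay + 1) / decay                   # decay: peak -> sustain
    u[i:i + decay] = 1.0 - t
    v[i:i + decay] = t
    i += decay
    sustain_len = n - attack - decay - release
    v[i:i + sustain_len] = 1.0                            # sustain: constant level
    i += sustain_len
    t = np.arange(1, release + 1) / release               # release: sustain -> 0
    v[i:i + release] = 1.0 - t
    return u, v


def fit_adsr(trajectory):
    """Exhaustive least-squares ADSR fit to a 1-D per-frame trajectory."""
    y = np.asarray(trajectory, dtype=float)
    n = len(y)
    if n < 3:
        raise ValueError("need at least 3 frames")
    best = None
    for a, d, r in itertools.product(range(1, n), repeat=3):
        if a + d + r > n:
            continue
        basis = np.column_stack(adsr_basis(n, a, d, r))
        # Peak and sustain level enter linearly, so solve for them directly.
        coef, *_ = np.linalg.lstsq(basis, y, rcond=None)
        sse = float(np.sum((basis @ coef - y) ** 2))
        if best is None or sse < best[0]:
            best = (sse, a, d, r, coef[0], coef[1])
    sse, a, d, r, peak, sustain = best
    return {"attack": a, "decay": d, "release": r,
            "peak": float(peak), "sustain": float(sustain), "sse": sse}
```

For example, with `u, v = adsr_basis(20, 3, 4, 5)`, fitting `2.0 * u + 0.5 * v` recovers attack 3, decay 4, release 5, peak 2.0 and sustain 0.5 up to floating-point error. The exhaustive search is cubic in the number of frames, which is tolerable for short notes; a practical system would likely use a smarter optimizer.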
Most distance measures used in audio signal processing are based on dividing a signal into frames, extracting perceptually motivated features from each frame, and calculating the distance between the features. Most of the proposed distance measures use Gaussian mixture models (GMMs) to estimate the probability density functions of the framewise features and then calculate the distance between the GMMs. However, the thesis also introduces a parameter-free distance measure based on the compression ratios of audio signals. Because no features or other parameters need to be set, it removes the user's influence on the results.
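The abstract does not give the exact formula behind the compression-based measure. A common construction of this kind is the normalized compression distance (NCD), which compares the compressed size of two signals concatenated with the compressed sizes of each signal alone. This sketch assumes the signals are available as raw bytes (for example 16-bit PCM) and uses Python's `zlib` as a stand-in compressor; both are assumptions, not details taken from the thesis, and the function names are hypothetical.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of the zlib-compressed input (maximum compression level)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings.

    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    where C is the compressed size. Values near 0 indicate similar inputs;
    values near 1 indicate unrelated inputs.
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)
```

Identical inputs give a distance near 0, because the second copy compresses to almost nothing, while two unrelated random byte strings give a value close to 1. Note that zlib only looks back 32 KiB, so for longer signals a compressor with a larger window (for example `lzma`) would be needed for the concatenation to expose any shared structure.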
In a query by example application, the similarity between the user's example and each database item needs to be calculated to obtain a ranked list of database samples. In practice, this operation is very time-consuming if the database contains millions of items. The proposed method applies a keysample transformation to reduce the series of feature vectors representing each signal to a single feature vector. The database is then clustered and the search is restricted to only a few clusters, which saves retrieval time at the cost of some accuracy.
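The abstract names the keysample transformation without defining it. This sketch assumes one simple variant: a fixed set of keysamples (here drawn at random from the pooled training frames), with each signal mapped to the fraction of its frames whose nearest keysample is each keysample. The resulting vectors are clustered with plain k-means, and a query is compared only against items in the `n_probe` clusters whose centroids are closest to it. All function names, the Euclidean distance and the random keysample selection are hypothetical choices for illustration.

```python
import numpy as np


def sq_dists(A, B):
    """Squared Euclidean distances between rows of A (N, D) and rows of B (K, D)."""
    return ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)


def keysample_vector(frames, keysamples):
    """Map a (T, D) frame sequence to one K-dimensional vector.

    Assumed transformation: entry k is the fraction of frames whose
    nearest keysample is keysample k.
    """
    nearest = sq_dists(frames, keysamples).argmin(axis=1)
    return np.bincount(nearest, minlength=len(keysamples)) / len(frames)


def kmeans(X, n_clusters, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), n_clusters, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = sq_dists(X, centroids).argmin(axis=1)
        new = centroids.copy()
        for c in range(n_clusters):
            members = X[labels == c]
            if len(members):  # an empty cluster keeps its previous centroid
                new[c] = members.mean(axis=0)
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, sq_dists(X, centroids).argmin(axis=1)


def search(query, db, centroids, labels, n_probe=2, top_k=5):
    """Rank database items, visiting only the n_probe clusters nearest the query."""
    centroid_d = sq_dists(query[None, :], centroids)[0]
    probe = np.argsort(centroid_d, kind="stable")[:n_probe]
    candidates = np.flatnonzero(np.isin(labels, probe))
    d = sq_dists(query[None, :], db[candidates])[0]
    order = np.argsort(d, kind="stable")[:top_k]
    return candidates[order], np.sqrt(d[order])
```

Increasing `n_probe` visits more clusters, trading retrieval time back for accuracy; setting it to the number of clusters reduces to an exhaustive search over the whole database.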
Collections
- Doctoral dissertations [4891]