An Improved Nearest Neighbor Based Entropy Estimator with Local Ellipsoid Correction and its Application to Evaluation of MCMC Posterior Samples
Lu, Chien (2018)
Degree Programme in Mathematics and Statistics
Faculty of Natural Sciences
This publication is copyrighted. You may download, display and print it for Your own personal use. Commercial use is prohibited.
Date of approval
2018-07-30
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:uta-201808032343
Abstract
Entropy estimation is an important technique for summarizing the uncertainty of the distribution underlying a set of samples. It is tied to important research problems in fields such as statistics and machine learning. The k-nearest neighbor (kNN) estimator is a widely used classical nonparametric method, although it suffers from bias, especially when the dimensionality of the data is high.
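For orientation, the classical kNN entropy estimator mentioned above is commonly written in the Kozachenko-Leonenko form, which combines digamma terms, the log-volume of the unit ball, and the average log-distance to the k-th nearest neighbor. The following is a minimal sketch of that classical form only, not the improved estimator developed in the thesis. The function names (`digamma_int`, `log_unit_ball_volume`, `knn_entropy`), the brute-force neighbor search, and the Gaussian test data are assumptions made for illustration.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def digamma_int(n):
    """Digamma at a positive integer n: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))


def log_unit_ball_volume(d):
    """Log-volume of the d-dimensional Euclidean unit ball."""
    return 0.5 * d * math.log(math.pi) - math.lgamma(0.5 * d + 1.0)


def knn_entropy(samples, k=3):
    """Classical kNN (Kozachenko-Leonenko) entropy estimate in nats.

    samples: list of equal-length tuples of floats, with no duplicate points.
    Uses a brute-force O(N^2) neighbor search, fine for small N only.
    """
    n = len(samples)
    d = len(samples[0])
    log_dist_sum = 0.0
    for i, x in enumerate(samples):
        # Distances from x to every other sample, sorted ascending.
        dists = sorted(math.dist(x, y) for j, y in enumerate(samples) if j != i)
        # Distance to the k-th nearest neighbor of x.
        log_dist_sum += math.log(dists[k - 1])
    return (digamma_int(n) - digamma_int(k)
            + log_unit_ball_volume(d) + d * log_dist_sum / n)


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical test data: 2-D standard normal samples.
    pts = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(500)]
    true_entropy = math.log(2 * math.pi * math.e)  # (d/2) * log(2*pi*e) with d = 2
    print("estimate:", knn_entropy(pts, k=3), "true:", true_entropy)
```

The estimator treats the density as roughly constant inside each k-nearest-neighbor ball. That assumption becomes harder to satisfy as the dimension grows, which is the source of the bias discussed here.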
In this thesis, an improved kNN entropy estimator is developed. The proposed method learns a local ellipsoid to be used in the estimation, in order to mitigate the bias that results from the assumption of local uniformity. Several numerical experiments have been conducted, and the results show that the proposed approach can efficiently reduce the bias, especially when the dimension is high.
Another topic studied in this thesis is the evaluation of the correctness of posterior samples in Bayesian inference. This thesis demonstrates that the proposed estimator can be applied to this task. One simple experiment shows that the simulation-based approach is more efficient and discriminative than a lower-bound-based method, and that the proposed kNN estimator can improve the accuracy of the state-of-the-art simulation-based approach.