Please use this identifier to cite or link to this item:
https://nccur.lib.nccu.edu.tw/handle/140.119/146308
Title: | Exploring the correlation between two datasets |
Authors: | 李其軒 Li, Qi-Xuan |
Contributors: | 鄭宗記 Cheng, Tsung-Chi; 李其軒 Li, Qi-Xuan |
Keywords: | Mantel test; canonical correlation analysis; RV coefficient; PROTEST; distance covariance test; Euclidean distance; Mahalanobis distance; Pearson correlation distance |
Date: | 2023 |
Issue Date: | 2023-08-02 13:04:38 (UTC+8) |
Abstract: | In biostatistics and ecological statistics, measuring the correlation between two multidimensional datasets is an important problem. Besides canonical correlation analysis, this study examines four other methods for measuring the correlation between two datasets: the Mantel test, the RV coefficient, PROTEST (Procrustean randomization test), and the distance covariance test, and compares their performance under different data structures. The Mantel test and the distance covariance test measure the association between datasets through pairwise distances between observations. In addition to the Euclidean distance usually used with these two tests, this study also uses the Mahalanobis distance and the Pearson correlation distance to examine whether the choice of distance affects test performance. Computer simulations are run on multivariate normal data and on non-normal data; for each model, the sample size, the dimensionality, and the variances of the variables are varied, and the tests are compared by their power and power curves. Finally, each test is applied to two empirical datasets: note structure and song structure in American wood warblers, and gene expression and hepatic fatty acids in mice. |
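As a rough illustration of the statistics the abstract compares, the sketch below computes each one with NumPy and attaches a one-sided row-permutation p-value. It is not the thesis's code, and several choices are assumptions: the function names (`pairwise_distances`, `mantel_stat`, `rv_coef`, `dcov_stat`, `protest_stat`, `perm_test`) are hypothetical; the Pearson correlation distance is taken as 1 - r between observation profiles; the Mahalanobis distance uses each dataset's own sample covariance; the distance covariance statistic is the V-statistic form n * dCov^2 of Székely, Rizzo and Bakirov (2007); and PROTEST uses the symmetric Procrustes correlation r = sqrt(1 - m^2).

```python
import numpy as np


def pairwise_distances(X, metric="euclidean"):
    """n x n distance matrix between the rows (observations) of X."""
    X = np.asarray(X, dtype=float)
    if metric == "pearson":
        # Assumed definition: 1 - Pearson correlation between row profiles (needs p >= 2).
        return 1.0 - np.corrcoef(X)
    diff = X[:, None, :] - X[None, :, :]
    if metric == "euclidean":
        return np.sqrt((diff ** 2).sum(axis=-1))
    if metric == "mahalanobis":
        # Assumed: inverse of the dataset's own sample covariance (requires n > p).
        VI = np.linalg.inv(np.atleast_2d(np.cov(X, rowvar=False)))
        d2 = np.einsum("ijk,kl,ijl->ij", diff, VI, diff)
        return np.sqrt(np.maximum(d2, 0.0))
    raise ValueError(f"unknown metric: {metric}")


def mantel_stat(X, Y, metric="euclidean"):
    """Mantel r: Pearson correlation of the upper-triangle entries of the two distance matrices."""
    Dx, Dy = pairwise_distances(X, metric), pairwise_distances(Y, metric)
    iu = np.triu_indices(len(Dx), k=1)
    return np.corrcoef(Dx[iu], Dy[iu])[0, 1]


def rv_coef(X, Y):
    """RV coefficient of the column-centred data matrices."""
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    xy = np.sum((Xc.T @ Yc) ** 2)  # tr(Wx Wy), where Wx = Xc Xc^T and Wy = Yc Yc^T
    xx = np.sum((Xc.T @ Xc) ** 2)  # tr(Wx^2)
    yy = np.sum((Yc.T @ Yc) ** 2)  # tr(Wy^2)
    return xy / np.sqrt(xx * yy)


def _double_center(D):
    # a_ij = d_ij - (row mean i) - (column mean j) + (grand mean)
    return D - D.mean(axis=1, keepdims=True) - D.mean(axis=0, keepdims=True) + D.mean()


def dcov_stat(X, Y, metric="euclidean"):
    """n * squared sample distance covariance (V-statistic form)."""
    A = _double_center(pairwise_distances(X, metric))
    B = _double_center(pairwise_distances(Y, metric))
    return len(A) * np.mean(A * B)


def protest_stat(X, Y):
    """Symmetric Procrustes correlation r = sqrt(1 - m^2) used as the PROTEST statistic."""
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    Xs, Ys = Xc / np.linalg.norm(Xc), Yc / np.linalg.norm(Yc)  # unit total sum of squares
    # After this scaling, m^2 = 1 - (sum of singular values of Xs^T Ys)^2.
    return np.linalg.svd(Xs.T @ Ys, compute_uv=False).sum()


def perm_test(stat, X, Y, n_perm=999, seed=None, **kw):
    """One-sided permutation test: shuffle the rows of Y, keep X fixed."""
    rng = np.random.default_rng(seed)
    obs = stat(X, Y, **kw)
    hits = sum(stat(X, Y[rng.permutation(len(Y))], **kw) >= obs for _ in range(n_perm))
    return obs, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(40, 3))
    Y = X @ rng.normal(size=(3, 4)) + 0.5 * rng.normal(size=(40, 4))  # Y depends on X
    tests = [("Mantel, Euclidean", mantel_stat, {}),
             ("Mantel, Mahalanobis", mantel_stat, {"metric": "mahalanobis"}),
             ("Mantel, Pearson", mantel_stat, {"metric": "pearson"}),
             ("RV coefficient", rv_coef, {}),
             ("dCov, Euclidean", dcov_stat, {}),
             ("PROTEST", protest_stat, {})]
    for name, stat, kw in tests:
        obs, p = perm_test(stat, X, Y, n_perm=199, seed=1, **kw)
        print(f"{name:20s} statistic = {obs:.3f}   p = {p:.3f}")
```

Repeating `perm_test` over many simulated datasets and recording how often p falls below the chosen significance level gives an empirical power estimate; varying the sample size, dimension, or variance in that loop is one straightforward way to trace power curves of the kind the abstract describes.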
Description: | Master's thesis, Department of Statistics, National Chengchi University, 110354014 |
Source URI: | http://thesis.lib.nccu.edu.tw/record/#G0110354014 |
Data Type: | thesis |
Appears in Collections: | [Department of Statistics] Theses
Files in This Item:
File | Description | Size | Format |
401401.pdf | | 36225 KB | Adobe PDF |
All items in 政大典藏 (NCCU Institutional Repository) are protected by copyright, with all rights reserved.