Please use this identifier to cite or link to this item:
https://nccur.lib.nccu.edu.tw/handle/140.119/139264
Title: | 應用實價登錄建立以聚類方法之堆疊泛化房價預測模型 -以桃園市區分建物房價資料為例 Predicting Housing Prices using Clustering-based Stacked Generator- A study on Taoyuan City Actual Price Registration Data |
Authors: | 黃允亭 Huang, Yun-Ting |
Contributors: | 陳樹衡 鄧筱蓉 黃允亭 Huang, Yun-Ting |
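Keywords: | Feature selection; Cluster analysis; Machine learning; Ensemble learning; Stacked generalization; Actual Price Registration; Housing price prediction |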
Date: | 2022 |
Issue Date: | 2022-03-01 17:52:29 (UTC+8) |
Abstract: | This research explores the applicability of combining cluster analysis with stacked generalization for predicting housing prices in Taiwan. Using the most recent available Taoyuan City Actual Price Registration data, we first extend the clustering-based ensemble learning method of Trivedi et al. (2015) to develop two-layer clustering-based stacked generalizers. In the first layer, three machine learning methods (Lasso, KNN, and Decision Tree) are used to construct the cluster models. In the second layer, Linear Regression, Random Forest, and XGBoost are used to build the meta-models. The resulting stacked generalizers are then used to predict housing prices in Taoyuan City, and their prediction accuracy is compared with that of other machine learning methods, including Linear Regression, Random Forest, and XGBoost. |
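The two-layer structure described in the abstract can be illustrated with a minimal sketch. It is one plausible reading of the abstract, not the thesis's actual implementation. Assumptions not found in the record: scikit-learn is used, k-means is the clustering method, the number of clusters is 3, the data are synthetic, and only the Linear Regression meta-model is shown (Random Forest or XGBoost could be swapped in). The names `ClusterModel`, `fit_stack`, and `predict_stack` are hypothetical.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin, clone
from sklearn.cluster import KMeans
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor


class ClusterModel(RegressorMixin, BaseEstimator):
    """Hypothetical first-layer model: one copy of `base` per k-means cluster."""

    def __init__(self, base, n_clusters=3):
        self.base = base
        self.n_clusters = n_clusters

    def fit(self, X, y):
        # Assumption: k-means defines the clusters; the record does not name the method.
        self.kmeans_ = KMeans(n_clusters=self.n_clusters, n_init=10, random_state=0).fit(X)
        labels = self.kmeans_.labels_
        self.models_ = {
            c: clone(self.base).fit(X[labels == c], y[labels == c])
            for c in np.unique(labels)
        }
        return self

    def predict(self, X):
        # Route each sample to the model trained on its nearest cluster.
        labels = self.kmeans_.predict(X)
        out = np.empty(len(X))
        for c in np.unique(labels):
            out[labels == c] = self.models_[c].predict(X[labels == c])
        return out


def fit_stack(X, y, n_clusters=3):
    """Fit three cluster models (layer 1) and a linear meta-model (layer 2)."""
    first_layer = [
        ClusterModel(Lasso(alpha=0.01), n_clusters),
        ClusterModel(KNeighborsRegressor(n_neighbors=3), n_clusters),
        ClusterModel(DecisionTreeRegressor(max_depth=4, random_state=0), n_clusters),
    ]
    # Out-of-fold predictions keep the meta-model from seeing in-sample fits.
    Z = np.column_stack([cross_val_predict(m, X, y, cv=5) for m in first_layer])
    meta = LinearRegression().fit(Z, y)
    # Refit each first-layer model on all training data for later prediction.
    first_layer = [m.fit(X, y) for m in first_layer]
    return first_layer, meta


def predict_stack(first_layer, meta, X):
    """Feed first-layer predictions into the meta-model."""
    Z = np.column_stack([m.predict(X) for m in first_layer])
    return meta.predict(Z)


# Synthetic stand-in data; the real study uses Taoyuan City registration records.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = 3.0 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(scale=0.1, size=300)
X_train, X_test, y_train, y_test = X[:240], X[240:], y[:240], y[240:]

first_layer, meta = fit_stack(X_train, y_train)
pred = predict_stack(first_layer, meta, X_test)
rmse = float(np.sqrt(np.mean((pred - y_test) ** 2)))
print(f"test RMSE: {rmse:.3f}")
```

Training the meta-model on out-of-fold predictions rather than in-sample fits is the usual safeguard in stacked generalization (Wolpert, 1992); whether the thesis uses exactly this scheme is not stated in the record.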
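Reference: | [1] Altman, N. S. (1992). An Introduction to Kernel and Nearest-Neighbor Nonparametric Regression. The American Statistician, 46(3), 175–185.
[2] Breiman, L. (1996a). Bagging Predictors. Machine Learning, 24(2), 123–140.
[3] Breiman, L. (1996b). Stacked Regressions. Machine Learning, 24(1), 49–64.
[4] Breiman, L. (2001). Random Forests. Machine Learning, 45, 5–32.
[5] Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984). Classification and Regression Trees. Chapman & Hall/CRC, 368.
[6] Chen, T., & Guestrin, C. (2016). XGBoost: A Scalable Tree Boosting System. KDD, 785–794.
[7] Efron, B. (1979). Bootstrap Methods: Another Look at the Jackknife. The Annals of Statistics, 7(1), 1–26.
[8] Frank, A., & Asuncion, A. (2010). UCI Machine Learning Repository. http://archive.ics.uci.edu/ml.
[9] Freund, Y. (1995). Boosting a Weak Learning Algorithm by Majority. Information and Computation, 121(2), 256–285.
[10] Freund, Y., & Schapire, R. E. (1997). A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting. Journal of Computer and System Sciences, 55(1), 119–139.
[11] Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software, 33(1), 1–22.
[12] Friedman, J. H. (2001). Greedy Function Approximation: A Gradient Boosting Machine. The Annals of Statistics, 29(5), 1189–1232.
[13] Guyon, I., Weston, J., Barnhill, S., & Vapnik, V. (2002). Gene Selection for Cancer Classification Using Support Vector Machines. Machine Learning, 46(1-3), 389–422.
[14] Ho, T. K. (1995). Random Decision Forests. Proceedings of 3rd International Conference on Document Analysis and Recognition, 278–282.
[15] Huang, S. Y., Yu, F., Tsaih, R. H., & Huang, Y. (2014). Resistant Learning on the Envelope Bulk for Identifying Anomalous Patterns. 2014 International Joint Conference on Neural Networks (IJCNN), 3303–3310.
[16] Schapire, R. E. (1990). The Strength of Weak Learnability. Machine Learning, 5, 197–227.
[17] Tibshirani, R. (1996). Regression Shrinkage and Selection via the Lasso. Journal of the Royal Statistical Society. Series B (Methodological), 58(1), 267–288.
[18] Ting, K. M., & Witten, I. H. (1997). Stacked Generalization: When Does It Work? Hamilton, New Zealand: University of Waikato, Department of Computer Science.
[19] Trivedi, S., Pardos, Z. A., & Heffernan, N. T. (2015). The Utility of Clustering in Prediction Tasks. ArXiv:1509.06163.
[20] Van der Laan, M. J., Polley, E. C., & Hubbard, A. E. (2007). Super Learner. Statistical Applications in Genetics and Molecular Biology, 6(25).
[21] Wolpert, D. H. (1992). Stacked Generalization. Neural Networks, 5(2), 241–259.
[22] Zou, H., & Hastie, T. (2005). Regularization and Variable Selection via the Elastic Net. Journal of the Royal Statistical Society. Series B (Statistical Methodology), 67(2), 301–320.
[23] 何睿婷 (2018). Housing Price Prediction Based on Heterogeneous Ensemble Learning (in Chinese). Telecom World, 10, 296–297.
[24] 吳晏榕 (2010). A Study on Applying Housing Price Indices to Bank Asset Revaluation (in Chinese). Unpublished master's thesis, Graduate Institute of Economics, National Chengchi University, Taipei.
[25] 洪淑娟, 雷立芬 (2010). A Study of the Interactions between Prices of Existing Homes, Presale and Newly Built Homes, and Macroeconomic Variables (in Chinese). Bank of Taiwan Quarterly, 61(1), 155–167.
[26] 洪鴻智, 張能政 (2006). The Value-Searching Process of Real Estate Appraisers: Appraisal Procedures and the Choice of Reference Points (in Chinese). Journal of Architecture and Planning, 7(1), 71–90.
[27] 郁嘉綾 (2018). Building a Real Estate Price Model for Hangzhou Using Big Data (in Chinese). Unpublished master's thesis, Graduate Institute of Statistics, National Chengchi University, Taipei.
[28] 張曦方 (1994). A Study of Floor-Level Price Differences in Housing: The Case of Taipei City (in Chinese). Unpublished master's thesis, Graduate Institute of Land Economics, National Chengchi University, Taipei.
[29] 陳敬筌 (2019). Predicting Regional Average Housing Prices with Deep Learning: The Case of Taipei City Actual Price Registration (in Chinese). Unpublished master's thesis, Executive Master's Program, Department of Information Management, Ming Chuan University, Taipei.
[30] 陳樹衡, 郭子文, 棗厥庸 (2007). Constructing a Housing Price Model with Regression Trees: An Empirical Analysis of Taiwan (in Chinese). Journal of Housing Studies, 16(1), 1–20.
[31] 馮世傑 (2014). A Study of the Variables Affecting Housing Prices: The Case of Taipei City (in Chinese). Unpublished master's thesis, Graduate Institute of International Trade, Soochow University, Taipei.
[32] 黃佳鈴, 張金鶚 (2005). Land Price Indices and the Assessment of Announced Current Land Values from the Perspective of Separating Land and Building Prices (in Chinese). Journal of Taiwan Land Research, 8(2), 73–106.
[33] 楊博文, 曹布陽 (2017). A Housing Price Prediction Model Based on Ensemble Learning (in Chinese). Computer Knowledge and Technology, 13(29), 191–194.
[34] 蔡育政 (2009). A Study of the Factors Affecting Real Estate Prices: The Cases of the Beitun, Xitun, Nantun, Central, and East Districts of Taichung City (in Chinese). Unpublished master's thesis, Graduate Institute of Finance, Chaoyang University of Technology, Taichung.
[35] 蔡育展 (2017). Machine Learning and Real Estate Appraisal (in Chinese). Unpublished master's thesis, Graduate Institute of Management Information Systems, National Chengchi University, Taipei.
[36] 蔡瑞煌, 高明志, 張金鶚 (1999). Applying Artificial Neural Networks to Real Estate Appraisal (in Chinese). Journal of Housing Studies, 8, 1–20.
[37] 賴碧瑩 (2007). Applying Artificial Neural Networks to Computer-Assisted Mass Appraisal (in Chinese). Journal of Housing Studies, 16(2), 43–65.
[38] 謝明穎 (2017). Building a Visualization Platform for Housing Price Prediction Using Machine Learning Methods (in Chinese). Unpublished master's thesis, Graduate Institute of Applied Statistics, Department of Statistics and Information Science, Fu Jen Catholic University, New Taipei City. |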
Description: | Master's thesis, National Chengchi University, Department of Economics, 107258025 |
Source URI: | http://thesis.lib.nccu.edu.tw/record/#G0107258025 |
Data Type: | thesis |
DOI: | 10.6814/NCCU202200343 |
Appears in Collections: | [Department of Economics] Theses |
Files in This Item:
File | Description | Size | Format |
802501.pdf | | 4583Kb | Adobe PDF |
All items in 政大典藏 are protected by copyright, with all rights reserved.