View thesis information

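Thesis title (Chinese):	基于Stacking集成模型的郑州市二手房成交价格预测研究
Author:	胥纳青
Student ID:	20201221055
Confidentiality level:	Public
Thesis language:	chi (Chinese)
Discipline code:	025200
Discipline:	Economics - Applied Statistics
Student type:	Master's
Degree:	Master of Economics
Degree year:	2023
Degree-granting institution:	Xi'an University of Science and Technology
College:	College of Science
Major:	Applied Statistics
Research direction:	Financial Statistics
First supervisor:	丁正生
First supervisor's affiliation:	Xi'an University of Science and Technology
Thesis submission date:	2023-06-13
Thesis defense date:	2023-06-06
Thesis title (foreign language):	Research on the forecast of second-hand house transaction price in Zhengzhou based on Stacking integrated model
Keywords (Chinese, translated):	Second-hand housing ; Price prediction ; Influencing factors ; Stacking ensemble model
Keywords (foreign language):	Second-hand houses ; Price prediction ; Influencing factors ; Stacking integrated model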
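Abstract (Chinese, translated):	Real estate is one of China's pillar industries, and housing prices are a persistent public concern. In recent years, as developable land in city centers has become scarcer, second-hand housing transactions have grown more active and demand for second-hand housing price prediction has risen with them. Accurate prediction gives government departments, real estate developers, agents, buyers and sellers a sound basis for decisions, and it supports the stable and healthy development of the real estate market. Existing domestic research on second-hand housing price prediction focuses mainly on listing prices, which in practice often differ from transaction prices and therefore fail to reflect a property's true value. This thesis accordingly takes second-hand housing transaction prices in the eight main urban districts of Zhengzhou as its subject, analyzes the factors that influence prices, and builds prediction models that price each property individually ("one house, one price"). First, based on hedonic price theory, a candidate feature set is constructed along four dimensions (location, building, neighborhood and transaction characteristics), and the corresponding data are collected with web crawlers and the Baidu Maps API. The data are then cleaned and transformed, and features are selected with wrapper and embedded methods; after irrelevant and redundant features are removed, 46 feature variables are retained for modeling. Second, using the processed data, six single-algorithm models, including K-nearest neighbors, multilayer perceptron, support vector machine and random forest, are used to predict second-hand housing prices. The coefficient of determination, root mean square error and mean absolute error serve as evaluation metrics, and an overall comparison shows that random forest performs best. In addition, the ability of random forest, XGBoost and LightGBM to rank variable importance is used to identify the main factors behind second-hand housing transaction prices in Zhengzhou; the results show that property management fee, district and floor area have a relatively large influence. Finally, to further improve accuracy and offset the weaknesses of single models, a Stacking ensemble model is built with multilayer perceptron, support vector machine, random forest, XGBoost and LightGBM as base learners and K-nearest neighbors as the meta-learner. This model and fusion models proposed in other papers are then used to predict second-hand housing prices. A comparison of all the models' results shows that the Stacking ensemble model built in this thesis performs best, with high prediction accuracy and strong generalization ability, and it offers a new approach to predicting second-hand housing transaction prices.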
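The three evaluation metrics named in the abstract (coefficient of determination, root mean square error, mean absolute error) can be illustrated with a minimal sketch. The function name `regression_metrics` and the four price values below are made up for illustration and are not taken from the thesis.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R^2, RMSE and MAE for two equal-length sequences of numbers."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and the same length")

    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)               # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are identical")

    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
    }


# Hypothetical transaction prices and predictions (illustrative values only).
y_true = [100.0, 150.0, 200.0, 250.0]
y_pred = [110.0, 140.0, 210.0, 240.0]
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

A higher R² and lower RMSE and MAE indicate a better fit. RMSE penalizes large individual errors more heavily than MAE, which is one reason to report both.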
Abstract (foreign language):	Real estate is a major pillar industry in China, and housing prices are an issue of continuing public concern. In recent years, as the land available for development in urban centers has diminished, second-hand housing transactions have become increasingly active, and the demand for forecasts of second-hand housing prices has grown accordingly. Accurate forecasting of second-hand housing prices not only provides a scientific basis for decisions by government departments, real estate developers, real estate agents, and house buyers and sellers, but also promotes the stable and healthy development of the real estate market. At present, domestic research on second-hand house price prediction focuses mainly on listing prices, which often do not match transaction prices in reality and so fail to reflect the real value of a house. Therefore, this thesis takes second-hand house transaction prices in the eight main urban districts of Zhengzhou as the research object, analyzes the factors influencing house prices in depth, and establishes prediction models to predict second-hand house prices accurately on a one-house-one-price basis. First, based on hedonic price theory, a candidate feature set is constructed from four dimensions: location features, building features, neighborhood features and transaction features, and the corresponding data are obtained using web crawlers and the Baidu Maps API. The data are then cleaned and transformed, and features are selected using wrapper and embedded methods; after irrelevant and redundant features are removed, 46 feature variables are retained for subsequent modeling. Second, based on the processed data, six single prediction models, including K-nearest neighbor, multilayer perceptron, support vector machine and random forest, are applied to predict second-hand house prices. The coefficient of determination, root mean square error and mean absolute error are used as evaluation metrics to measure the predictive performance of each model, and a comprehensive comparison shows that random forest performs best. In addition, the ability of random forest, XGBoost and LightGBM to extract important variables is used to explore the main factors influencing second-hand house transaction prices in Zhengzhou, and the results show that property management fee, district and floor area have a relatively large influence. Finally, to further improve prediction accuracy and make up for the shortcomings of single models, a Stacking ensemble model is constructed with multilayer perceptron, support vector machine, random forest, XGBoost and LightGBM as primary learners and K-nearest neighbor as the secondary learner, and this model and fusion models proposed in other papers are used to predict second-hand house prices. A comparative analysis of the prediction results shows that the Stacking ensemble model constructed in this thesis performs best, with high prediction accuracy and strong generalization ability, and it provides a new idea and method for predicting second-hand house transaction prices.
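The two-layer structure described in the abstract can be sketched with scikit-learn's `StackingRegressor`. This is a sketch under stated assumptions, not the thesis's implementation: the data are synthetic (only the feature count of 46 comes from the abstract), all hyperparameters are arbitrary, and `GradientBoostingRegressor` stands in for XGBoost and LightGBM so that the example depends only on scikit-learn.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the housing data (assumption: not the thesis dataset).
X, y = make_regression(
    n_samples=400, n_features=46, n_informative=10, noise=10.0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Primary (base) learners. The gradient-boosting model is a stand-in for
# XGBoost and LightGBM; hyperparameters are illustrative, not tuned.
base_learners = [
    ("mlp", make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(32,), max_iter=500, random_state=0),
    )),
    ("svr", make_pipeline(StandardScaler(), SVR(C=100.0))),
    ("rf", RandomForestRegressor(n_estimators=100, random_state=0)),
    ("gbdt", GradientBoostingRegressor(random_state=0)),
]

# Secondary (meta) learner: K-nearest neighbors, trained on the base learners'
# out-of-fold predictions produced by 5-fold cross-validation.
stack = StackingRegressor(
    estimators=base_learners,
    final_estimator=KNeighborsRegressor(n_neighbors=5),
    cv=5,
)
stack.fit(X_train, y_train)

pred = stack.predict(X_test)
r2 = r2_score(y_test, pred)
rmse = float(np.sqrt(mean_squared_error(y_test, pred)))
mae = float(mean_absolute_error(y_test, pred))
print(f"R2={r2:.3f}  RMSE={rmse:.2f}  MAE={mae:.2f}")
```

Training the meta-learner on out-of-fold predictions rather than in-sample predictions keeps it from simply learning to trust whichever base learner overfits the training set most.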
Chinese Library Classification number:	F299.23
Open-access date:	2023-06-14
