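Thesis title (Chinese): | Air Pollution Characteristics of Typical Chinese Cities and PM2.5 Prediction Analysis |
Name: | |
Student ID: | 19201221002 |
Confidentiality level: | Public |
Thesis language: | chi |
Discipline code: | 025200 |
Discipline: | Economics - Applied Statistics |
Student type: | Master's |
Degree level: | Master of Economics |
Degree year: | 2022 |
Degree-granting institution: | Xi'an University of Science and Technology (西安科技大学) |
Department: | |
Major: | |
Research direction: | Big data analysis |
Primary supervisor name: | |
Primary supervisor affiliation: | |
Thesis submission date: | 2022-06-22 |
Thesis defense date: | 2022-06-09 |
Thesis title (foreign language): | PM2.5 Prediction Analysis and Air Pollution Characteristics of Typical Chinese Cities |
Chinese keywords: | |
Foreign-language keywords: | Air pollution ; PM2.5 prediction ; Cluster analysis ; Ensemble learning |
Chinese abstract: |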
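As urbanization in China accelerates, air pollution caused by urban construction, production, and daily life is becoming increasingly severe. Frequent haze does great harm to people's physical and mental health. Analyzing air pollution characteristics and predicting the concentration of the major pollutant PM2.5 (fine particulate matter) helps people take protective measures in advance. The main work and research findings of the thesis are as follows: After preprocessing and principal component analysis of data on six air pollutants from 2014 to 2020, the three selected principal components were clustered using the mean shift algorithm combined with the K-means algorithm, dividing 113 key cities for environmental protection into three classes. Spatially, the air quality of the three city classes declines in a ring-like pattern from south to north, and cities of the same class are geographically clustered. Within the three city classes, analysis of the class means of the six air pollutants shows that O3 (ozone) is “low in winter, high in summer”, whereas the other five air pollutants are “high in winter, low in summer”. Descriptive statistics were used to analyze the annual, quarterly, and monthly air pollution characteristics of three typical cities (Kunming, Beijing, and Xi'an). Annual distribution: the annual mean pollutant concentrations in Xi'an and Beijing show an overall downward trend, and air pollution control is improving overall. The annual mean pollutant concentrations in Kunming fluctuate and have remained within the national limits. Quarterly distribution: air quality in all three cities is best in summer; winter air pollution in Xi'an and Beijing is relatively severe, with PM2.5, SO2 (sulfur dioxide), and CO (carbon monoxide) as the main pollutants; Kunming has light pollution in the other seasons. Monthly distribution: O3 follows an “inverted U” shape, with its peak from May to September and its trough from November to February of the following year, while the other air pollutants show the opposite pattern. The air pollutants and meteorological factors of the three typical cities from 2014 to 2020 were taken as feature variables and analyzed for correlation with the target variable PM2.5, yielding the degree of correlation between PM2.5 and each feature variable. This study used a gradient boosting tree model, a linear support vector machine model, and a model that stacks these two with Lasso regression to predict the daily mean PM2.5 of the typical cities. The test results of the three models show that, ranked by root mean squared error and by mean absolute error on the test dataset, the order is: linear support vector machine > gradient boosting tree > Stacking; the ranking by coefficient of determination on the test dataset is the reverse. The conclusion is that, for predicting the daily mean PM2.5, the Stacking ensemble model performs better. |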
Foreign-language abstract: |
As urbanization accelerates in China, air pollution caused by urban construction, production, and daily life is becoming increasingly serious. Frequent haze has done great harm to people's physical and mental health. Analyzing air pollution characteristics and predicting the concentration of the major pollutant PM2.5 (fine particulate matter) helps people take appropriate protective measures ahead of time. The main work and research findings of the thesis are as follows: After preprocessing and Principal Component Analysis of data on six air pollutants from 2014 to 2020, the three selected principal components were clustered using the Mean Shift algorithm combined with the K-Means algorithm, classifying the 113 key cities for environmental protection into three categories. Spatially, air quality across the three city categories declined in a ring-like pattern from south to north, and cities of the same category were geographically clustered. Analysis of the category means of the six air pollutants in the three city categories showed that O3 (ozone) was “low in winter and high in summer”, while the other five air pollutants were “high in winter and low in summer”. Descriptive statistics were used to examine the annual, quarterly, and monthly air pollution characteristics of three typical cities (Kunming, Beijing, and Xi'an). Annual distribution characteristics: the annual average values of air pollutants in Xi'an and Beijing showed an overall downward trend, and air pollution control was improving overall. The annual average values of air pollutants in Kunming fluctuated but stayed within the national limit values.
Quarterly distribution characteristics: air quality was best in summer in all three cities, while winter air pollution was relatively severe in Xi'an and Beijing, with PM2.5, SO2 (sulfur dioxide), and CO (carbon monoxide) as the main pollutants; Kunming had light pollution in the other seasons. Monthly distribution characteristics: the distribution of O3 was “inverted U” shaped, with its peak from May to September and its trough from November to February of the following year, whereas the other air pollutants showed the opposite pattern. Correlation analysis was performed with the air pollutants and meteorological factors of the three typical cities from 2014 to 2020 as feature variables and PM2.5 as the target variable, yielding the degree of correlation between PM2.5 and each feature variable. This study used the Gradient Boosting Decision Tree model, the Linear Support Vector Machine model, and a model that stacks these two with Lasso regression to predict the daily average PM2.5 of the typical cities. The test results of the three models showed that, ranked by Root Mean Squared Error and by Mean Absolute Error on the test dataset, the order was: Linear Support Vector Machine > Gradient Boosting Decision Tree > Stacking, while the ranking by coefficient of determination on the test dataset was the reverse. It can be concluded that, for predicting daily average PM2.5 values, the Stacking ensemble model performed better. |
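The abstract ranks models by RMSE, MAE, and the coefficient of determination, and notes that the last ranking is the reverse of the first. A minimal sketch of why that happens on a shared test set is given here, using the standard metric definitions. The observed and predicted values and the names `model_a` and `model_b` are made up for illustration; they are not thesis data or the thesis's models.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical daily PM2.5 values (ug/m3), invented for this example only.
observed = [35.0, 50.0, 80.0, 120.0, 60.0]
predictions = {
    "model_a": [45.0, 40.0, 95.0, 100.0, 70.0],  # larger errors
    "model_b": [40.0, 45.0, 85.0, 110.0, 65.0],  # smaller errors
}

scores = {
    name: {"rmse": rmse(observed, pred), "mae": mae(observed, pred), "r2": r2(observed, pred)}
    for name, pred in predictions.items()
}

# Worst-to-best by RMSE (largest error first) and worst-to-best by R^2 (smallest first).
worst_to_best_by_rmse = sorted(scores, key=lambda n: scores[n]["rmse"], reverse=True)
worst_to_best_by_r2 = sorted(scores, key=lambda n: scores[n]["r2"])

for name in worst_to_best_by_rmse:
    s = scores[name]
    print(f"{name}: RMSE={s['rmse']:.2f} MAE={s['mae']:.2f} R2={s['r2']:.3f}")
```

Because SS_res equals n times RMSE squared and SS_tot depends only on the observed values, a lower RMSE on the same test set always gives a higher coefficient of determination, so the two rankings are necessarily mirror images. MAE usually agrees with RMSE but is not guaranteed to.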
Chinese Library Classification number: | X51 |
Release date: | 2022-06-22 |