Research Report

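  • Lu Hua, Xie Min, Wu Zheng, Liu Bojun, Gao Yanghua, Chen Guichuan, Li Zhenliang. Adjusting PM2.5 prediction of the numerical air quality forecast model based on machine learning methods in Chengyu region [J]. Acta Scientiae Circumstantiae, 2020, 40(12): 4419-4431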

  • Adjusting PM2.5 prediction of the numerical air quality forecast model based on machine learning methods in Chengyu region
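  • Funding: National Key Research and Development Program of China (No. 2018YFC0213502); Open Research Fund of the Chongqing Meteorological Bureau (No. KFJJ-201607); Innovation Team Project of the Chongqing Meteorological Bureau (No. ZHCXTD-202023, ZHCXTD-202003); Technology Innovation and Application Demonstration Project of the Chongqing Science and Technology Commission (No. cstc2018jszx-zdyfxmX0003); Fundamental Research Funds for the Central Universities (No. 020714380047)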
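  • Authors and affiliations:
  • Lu Hua: Chongqing Institute of Meteorological Sciences, Chongqing 401147
  • Xie Min: School of Atmospheric Sciences, Nanjing University, Nanjing 210023
  • Wu Zheng: Chongqing Institute of Meteorological Sciences, Chongqing 401147
  • Liu Bojun: Chongqing Meteorological Observatory, Chongqing 401147
  • Gao Yanghua: Chongqing Institute of Meteorological Sciences, Chongqing 401147
  • Chen Guichuan: Chongqing Institute of Meteorological Sciences, Chongqing 401147
  • Li Zhenliang: Chongqing Academy of Environmental Sciences, Chongqing 401147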
  • Abstract: Air quality forecasting is important for air pollution prevention and control and for China's campaign to protect blue skies. To improve PM2.5 forecasting in the Chengdu-Chongqing (Chengyu) region, three machine learning methods (Lasso regression, random forest regression, and a deep learning RNN-LSTM) were applied to correct the PM2.5 concentrations predicted by a numerical air quality model. The base feature data set, split into a training set and a test set, combined forecast products from the Weather Research and Forecasting Model (WRF) and the Community Multiscale Air Quality Model (CMAQ) run by the Chongqing Meteorological Bureau with observed PM2.5 concentrations and meteorological variables from 22 monitoring stations in the Chengyu region. The data covered four representative months of 2018 (January, April, July, and October, representing winter, spring, summer, and autumn, respectively). The models were trained on the training set with tuned model parameters. Lasso regression was also used to select variables separately for four sub-regions of the Chengyu region, and the optimized models were evaluated on the test set. Compared with the raw CMAQ forecasts, the hourly PM2.5 concentrations corrected by all three methods showed markedly lower deviations and markedly higher correlation coefficients, and the region-level and station-level statistics were broadly consistent. Random forest regression and RNN-LSTM outperformed Lasso regression: the root mean square error decreased by about 50% after Lasso correction, with a correlation coefficient of about 0.7, and by about 70% after random forest or RNN-LSTM correction, with a correlation coefficient of about 0.9. The deviations after random forest and RNN-LSTM correction also fell within a narrower range and had a more concentrated probability distribution than those after Lasso correction. The corrections performed consistently across seasons, with the largest improvement in winter. These results provide a useful reference for improving PM2.5 forecasting in the Chengyu region, one of China's key city clusters.
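The abstract reports its results as a reduction in root mean square error (RMSE) and as a correlation coefficient between forecast and observed PM2.5. The following is a minimal sketch of how such scores could be computed, not the authors' evaluation code. The function names (`rmse`, `pearson_r`, `rmse_reduction`) and all numeric values are illustrative assumptions and are not taken from the study, and the Pearson form of the correlation coefficient is assumed because the abstract does not specify it.

```python
import math


def rmse(pred, obs):
    """Root mean square error between forecasts and observations."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def pearson_r(pred, obs):
    """Pearson correlation coefficient between forecasts and observations."""
    n = len(obs)
    mean_p = sum(pred) / n
    mean_o = sum(obs) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(pred, obs))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    return cov / math.sqrt(var_p * var_o)


def rmse_reduction(raw, corrected, obs):
    """Fractional RMSE reduction of the corrected forecast relative to the raw one."""
    return 1.0 - rmse(corrected, obs) / rmse(raw, obs)


# Made-up hourly PM2.5 values in ug/m3, for illustration only (not study data).
observed = [40.0, 55.0, 70.0, 60.0, 45.0]
raw_forecast = [60.0, 80.0, 95.0, 90.0, 70.0]      # hypothetical uncorrected model output
corrected = [42.0, 57.0, 66.0, 63.0, 44.0]         # hypothetical corrected output

print("raw RMSE:", rmse(raw_forecast, observed))
print("corrected RMSE:", rmse(corrected, observed))
print("RMSE reduction:", rmse_reduction(raw_forecast, corrected, observed))
print("corrected r:", pearson_r(corrected, observed))
```

Under this definition, an RMSE reduction of 0.5 corresponds to the "about 50%" reported for Lasso correction, and 0.7 to the "about 70%" reported for random forest and RNN-LSTM correction.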
