Research Report

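  • Xiao Shilin, Wang Yangjun, Tian Mengyue, Lu Guibin, Chen Yansong, Jiang Sen, Tong Huanhuan, Huang Ling, Zhang Yun, Li Li. Estimating the near-ground PM2.5 concentration distribution with high spatial and temporal resolution based on machine learning method using low-cost sensor observations [J]. Acta Scientiae Circumstantiae (环境科学学报), 2022, 42(9): 440-451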

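  • Constructing a high-resolution spatial distribution of urban near-ground PM2.5 concentrations from low-cost sensor data using machine learning (literal translation of the Chinese title: 基于机器学习利用低成本传感器数据构建城市近地面PM2.5浓度的高分辨率空间分布)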
  • Estimating the near-ground PM2.5 concentration distribution with high spatial and temporal resolution based on machine learning method using low-cost sensor observations
  • Funding: National Natural Science Foundation of China (Nos. 42075144, 42005112)
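  • Authors and affiliations:
  • Xiao Shilin (肖诗霖): School of Environmental and Chemical Engineering, Shanghai University, Shanghai 200444
  • Wang Yangjun (王杨君): School of Environmental and Chemical Engineering, Shanghai University, Shanghai 200444
  • Tian Mengyue (田梦悦): Hebei Sailhero Environmental Protection High-tech Co., Ltd. (河北先河环保科技股份有限公司), Shijiazhuang 050000
  • Lu Guibin (陆贵斌): School of Economics, Shanghai University, Shanghai 200444
  • Chen Yansong (陈艳松): School of Environmental and Chemical Engineering, Shanghai University, Shanghai 200444
  • Jiang Sen (姜森): School of Environmental and Chemical Engineering, Shanghai University, Shanghai 200444
  • Tong Huanhuan (童欢欢): School of Environmental and Chemical Engineering, Shanghai University, Shanghai 200444
  • Huang Ling (黄凌): School of Environmental and Chemical Engineering, Shanghai University, Shanghai 200444
  • Zhang Yun (张韵): School of Environmental and Chemical Engineering, Shanghai University, Shanghai 200444
  • Li Li (李莉): School of Environmental and Chemical Engineering, Shanghai University, Shanghai 200444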
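  • Abstract (translated from the Chinese): PM2.5 is currently one of the major pollutants affecting urban air quality in China and has adverse effects on human health. PM2.5 concentration distributions with high spatiotemporal resolution are an important basis for the fine-grained prevention and control of urban air pollution and for health impact assessment. Low-cost sensors are increasingly used for urban air quality monitoring in China, but because their data quality is not high enough, their value has not been fully exploited. Taking the urban area of Bengbu, a city in the Yangtze River Delta region, as a case study, this work first calibrated the low-cost sensor observations against data from national control monitoring stations. Random forest was then selected as the core method by comparing the predictive performance of six machine learning models. Next, the random forest method was combined with multi-source data, including land use type, population density, meteorological data simulated with the WRF model, and monitoring data, to construct the spatiotemporal distribution of near-ground PM2.5 concentrations at a temporal resolution of 1 h and a spatial resolution of 0.005° (approximately 500 m). Finally, the results of several methods were compared and evaluated. Ten-fold cross-validation of the retrieved high-resolution PM2.5 concentrations gave a spatial R² = 0.95 with RMSE = 9.32 μg·m⁻³ and a temporal R² = 0.88 with RMSE = 9.24 μg·m⁻³. The results show that low-cost sensor observations can help construct more accurate PM2.5 concentration distributions with high spatiotemporal resolution, make better use of the spatiotemporal features of low-cost sensor data, and offer a new approach for applying low-cost sensors to the fine-grained management of air quality.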
  • Abstract: In recent years, PM2.5 has been one of the major pollutants affecting air quality in many Chinese cities, and it has been linked to adverse impacts on human health. PM2.5 concentration distributions with high spatiotemporal resolution are an important basis for the prevention and control of urban air pollution and for health impact assessment. Low-cost sensors are increasingly being applied in urban air quality monitoring in China, but their value has not been fully exploited. This study takes Bengbu, a typical city in the Yangtze River Delta (YRD) region, as a case. First, the low-cost sensor (LCS) data were calibrated against observations from the national control stations (NCS). Then, based on a comparison of the prediction performance of six machine learning methods, Random Forest was chosen as the core method to fuse land use types, population density, LCS observations, NCS monitoring data, and meteorological data obtained with the Weather Research and Forecasting (WRF) model. In this way, the PM2.5 concentration distribution was estimated at hourly temporal resolution and 0.005° (close to 500 m) spatial resolution. Finally, the results of several methods were compared and evaluated. Based on 10-fold cross-validation, the spatial (R² = 0.95, RMSE = 9.32 μg·m⁻³) and temporal (R² = 0.88, RMSE = 9.24 μg·m⁻³) evaluation results indicate that the estimated high-resolution PM2.5 concentrations agreed well with the NCS observations. Overall, the results show that low-cost sensor observations help to construct a more accurate PM2.5 concentration distribution with high temporal and spatial resolution by making use of the spatiotemporal features of the low-cost sensor data. In addition, this study demonstrates the potential of low-cost sensors to better serve the refined management of air quality and health impact assessment.
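The abstract names three steps that a short example can make concrete: calibrating low-cost sensor readings against reference stations, running 10-fold cross-validation along a spatial and a temporal axis, and scoring with R² and RMSE. The sketch below is only an illustration under stated assumptions, not the authors' implementation. It assumes a simple linear calibration (the abstract does not say which calibration model was used), assumes that "spatial" folds hold out whole sites and "temporal" folds hold out whole hours (the abstract does not define the splits), and runs on synthetic data. All function names are hypothetical. The study's Random Forest model is not reproduced here; it would need a machine learning library.

```python
import random
from math import sqrt

def fit_linear_calibration(raw, reference):
    """Least-squares fit of reference ≈ slope * raw + intercept."""
    n = len(raw)
    mean_x = sum(raw) / n
    mean_y = sum(reference) / n
    sxx = sum((x - mean_x) ** 2 for x in raw)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(raw, reference))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def r2_score(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot

def rmse(observed, predicted):
    """Root-mean-square error, in the units of the observations."""
    n = len(observed)
    return sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def grouped_folds(groups, k=10, seed=0):
    """Yield (train_idx, test_idx); each group lands in exactly one test fold."""
    unique = sorted(set(groups))
    random.Random(seed).shuffle(unique)
    for fold in range(k):
        held_out = set(unique[fold::k])
        test = [i for i, g in enumerate(groups) if g in held_out]
        train = [i for i, g in enumerate(groups) if g not in held_out]
        yield train, test

# Synthetic data (assumption): 20 hypothetical sites x 24 hours, with a
# sensor that reads 1.3 * truth + 5 plus Gaussian noise.
rng = random.Random(42)
sites, hours, raw, ref = [], [], [], []
for site in range(20):
    for hour in range(24):
        truth = 40 + 20 * rng.random() + hour
        sites.append(site)
        hours.append(hour)
        ref.append(truth)
        raw.append(1.3 * truth + 5 + rng.gauss(0, 2))

def cross_validate(groups, k=10):
    """Pool out-of-fold predictions, then score them once."""
    obs, pred = [], []
    for train, test in grouped_folds(groups, k=k):
        slope, intercept = fit_linear_calibration(
            [raw[i] for i in train], [ref[i] for i in train])
        obs += [ref[i] for i in test]
        pred += [slope * raw[i] + intercept for i in test]
    return r2_score(obs, pred), rmse(obs, pred)

spatial_r2, spatial_rmse = cross_validate(sites)    # hold out whole sites
temporal_r2, temporal_rmse = cross_validate(hours)  # hold out whole hours
print(f"spatial  R2={spatial_r2:.2f} RMSE={spatial_rmse:.2f}")
print(f"temporal R2={temporal_r2:.2f} RMSE={temporal_rmse:.2f}")
```

Holding out whole sites or whole hours, rather than random rows, keeps observations from the same location or time step out of both the training and test sets, which is one common reason for reporting the two scores separately.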
