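Deng Li, Wu Qunyong, Yang Shuirong. Use of stack sparse auto-encoder (SSAE) deep feature learning and long short-term memory (SSAE-LSTM) neural network for the prediction of hourly PM2.5 concentration [J]. Acta Scientiae Circumstantiae, 2020, 40(9): 3422-3434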
Use of stack sparse auto-encoder (SSAE) deep feature learning and long short-term memory (SSAE-LSTM) neural network for the prediction of hourly PM2.5 concentration
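- Funding: National Natural Science Foundation of China (No. 41471333); Central Government Guided Local Science and Technology Development Special Project (No. 2017L3012); Xiamen Marine and Fisheries Development Special Fund Project (No. 18CFW030HJ10)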
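- Deng Li
- 1. The Academy of Digital China (Fujian), Fuzhou 350003; 2. Key Laboratory of Spatial Data Mining and Information Sharing of Ministry of Education, Fuzhou University, Fuzhou 350108; 3. National and Local Joint Engineering Research Center of Satellite Geospatial Information Technology, Fuzhou 350108
- Wu Qunyong
- 1. The Academy of Digital China (Fujian), Fuzhou 350003; 2. Key Laboratory of Spatial Data Mining and Information Sharing of Ministry of Education, Fuzhou University, Fuzhou 350108; 3. National and Local Joint Engineering Research Center of Satellite Geospatial Information Technology, Fuzhou 350108
- Yang Shuirong
- 1. The Academy of Digital China (Fujian), Fuzhou 350003; 2. Key Laboratory of Spatial Data Mining and Information Sharing of Ministry of Education, Fuzhou University, Fuzhou 350108; 3. National and Local Joint Engineering Research Center of Satellite Geospatial Information Technology, Fuzhou 350108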
- Abstract: Accurate short-term prediction of hourly PM2.5 concentrations can effectively improve air pollution forecasting and early warning. Traditional PM2.5 prediction models consider influencing factors incompletely, and their methods for selecting those factors do not generalize well. To address these problems, this paper proposes a PM2.5 hourly concentration prediction model, SSAE-LSTM, that combines a Stack Sparse Auto-Encoder (SSAE) with a Long Short-Term Memory (LSTM) neural network. The model jointly considers temporal, spatial, meteorological, and air pollutant factors. The SSAE extracts abstract features of these influences in an unsupervised manner, compressing the features and reducing their dimensionality. The extracted features are then fed to an LSTM time-series prediction model, which mines the long-term dependencies in the historical PM2.5 series. To validate the method, SSAE-LSTM models were trained offline and tested for each station using air quality and meteorological data from 71 air monitoring stations in the Beijing-Tianjin-Hebei urban agglomeration from 2016 to 2018. The results show that SSAE-LSTM predicts more accurately than the other models compared: across all test sets the index of agreement (IA) reaches 0.99, and the root mean square error (RMSE) and mean absolute error (MAE) fall to 13.98 and 7.90, respectively. The applicability of the model in different seasons was also analyzed: on the spring, summer, autumn, and winter test sets of the 71 stations, predicted and measured values show a strong linear relationship, with coefficients of determination of 0.86, 0.92, 0.96, and 0.93, respectively. Predictions for the Wanshouxigong station in Beijing indicate that SSAE-LSTM can forecast hourly PM2.5 concentrations under different air quality conditions and is feasible and reliable in application.
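The abstract reports three accuracy measures (IA, RMSE, MAE) without giving their formulas. The sketch below shows how such measures are commonly computed. It assumes IA is Willmott's standard index of agreement, which the abstract does not confirm. The function names and the sample values are hypothetical and are not taken from the paper.

```python
import math


def _check(observed, predicted):
    # Both series must be non-empty and aligned hour by hour.
    if len(observed) == 0 or len(observed) != len(predicted):
        raise ValueError("observed and predicted must be non-empty and equal in length")


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    _check(observed, predicted)
    squared = sum((p - o) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def mae(observed, predicted):
    """Mean absolute error between observed and predicted values."""
    _check(observed, predicted)
    return sum(abs(p - o) for o, p in zip(observed, predicted)) / len(observed)


def index_of_agreement(observed, predicted):
    """Willmott's index of agreement (assumed definition, see lead-in).

    IA = 1 - sum((P - O)^2) / sum((|P - mean(O)| + |O - mean(O)|)^2)
    The value is undefined (division by zero) when every observed and
    predicted value equals the observed mean.
    """
    _check(observed, predicted)
    mean_obs = sum(observed) / len(observed)
    numerator = sum((p - o) ** 2 for o, p in zip(observed, predicted))
    denominator = sum(
        (abs(p - mean_obs) + abs(o - mean_obs)) ** 2
        for o, p in zip(observed, predicted)
    )
    return 1.0 - numerator / denominator


if __name__ == "__main__":
    # Hypothetical hourly PM2.5 values, for illustration only.
    obs = [10.0, 20.0, 30.0, 40.0]
    pred = [12.0, 18.0, 33.0, 39.0]
    print(rmse(obs, pred), mae(obs, pred), index_of_agreement(obs, pred))
```

RMSE and MAE carry the units of the concentration data, while IA is dimensionless and is bounded above by 1, with 1 indicating perfect agreement.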