For decades, PM2.5 has altered Earth's radiation balance and increased environmental and health risks, yet it was not widely monitored in China until 2013. Long-term historical PM2.5 records with high temporal resolution are essential for research and environmental management, but they are lacking. In this dataset, we reconstructed a site-based PM2.5 dataset at 6-hour intervals from 1960 to 2020 by combining long-term visibility, conventional meteorological observations, emissions, and elevation. PM2.5 concentrations at each station are estimated with the machine learning model LightGBM, which uses spatial features derived from the 20 surrounding meteorological stations. Our model performs comparably to or better than previous studies in by-year cross-validation (CV; R² = 0.7) and spatial CV (R² = 0.76), with the added advantages of a long record and high temporal resolution. A 0.25° × 0.25°, 6-hourly gridded PM2.5 dataset was also reconstructed by merging spatial features. The results show that, on interdecadal scales, PM2.5 pollution gradually worsened or persisted before 2010 and eased in the following decade. Although the turning points vary across regions, PM2.5 has decreased significantly in key areas since 2013 owing to clean air actions. In 2020 in particular, the annual mean PM2.5 was close to its lowest level since 1960. This dataset provides high-resolution spatiotemporal PM2.5 changes and lays a foundation for research on air pollution, climate change, and atmospheric chemistry reanalysis.
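The abstract reports skill under by-year CV and spatial CV. As a rough illustration of what these scores measure, the sketch below holds out one group at a time (a year for by-year CV, a site or region for spatial CV), trains on the remaining records, and computes a pooled R² over all held-out predictions. It is a minimal sketch under stated assumptions: the record fields `year`, `vis`, and `pm25` are hypothetical, and a one-variable linear regression on visibility stands in for LightGBM. It is not the authors' evaluation code, and the original study may split folds or aggregate R² differently.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def grouped_cv_predictions(records, group_key, fit, predict):
    """Leave-one-group-out CV: each fold holds out every record of one group
    (e.g. one year for by-year CV, one site or region for spatial CV)."""
    preds = [0.0] * len(records)
    for g in sorted({r[group_key] for r in records}):
        train = [r for r in records if r[group_key] != g]
        model = fit(train)
        for i, r in enumerate(records):
            if r[group_key] == g:
                preds[i] = predict(model, r)
    return preds


# Hypothetical stand-in for LightGBM: least-squares line pm25 = a + b * vis.
def fit_line(train):
    xs = [r["vis"] for r in train]
    ys = [r["pm25"] for r in train]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


def predict_line(model, r):
    a, b = model
    return a + b * r["vis"]


if __name__ == "__main__":
    # Synthetic records (hypothetical values) with a small year-dependent offset.
    demo = [{"year": y, "vis": v, "pm25": 90.0 - 4.0 * v + 3.0 * (y % 2)}
            for y in range(2013, 2021) for v in (3.0, 6.0, 9.0, 12.0, 15.0)]
    demo_preds = grouped_cv_predictions(demo, "year", fit_line, predict_line)
    print("by-year CV R2:", round(r2_score([r["pm25"] for r in demo], demo_preds), 3))
```

Passing `"year"` as `group_key` gives by-year CV; passing a site or region identifier instead gives a spatial CV in the same framework.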
| Collection time | 1960/01/01 - 2020/12/31 |
|---|---|
| Collection area | China |
| Data size | 3.6 GiB |
| Spatial resolution | 0.25° × 0.25° (gridded product) |
| Temporal resolution | 6-hourly (daily, monthly, and yearly aggregates also provided) |
1. PM2.5 observation data: Hourly PM2.5 data for all stations from 2013 to 2020 were obtained from the China National Environmental Monitoring Centre (CNEMC, http://www.cnemc.cn). PM2.5 measurements from the US embassies in Beijing and Shanghai before 2013 were used for independent validation and evaluation (http://www.stateair.net/web/historical).
2. Visibility and conventional meteorological data: Meteorological observations from the National Meteorological Information Center (NMIC) include 6-hourly records from 1960 to 2020 and an increasing number of hourly records after 2013.
3. Emission inventories and elevation: Historical anthropogenic emissions for 1960 to 2012 were taken from the Peking University (PKU) global emission inventory, which was compiled with a bottom-up approach at monthly temporal resolution (http://inventory.pku.edu.cn). Current anthropogenic emissions for 2013 to 2020 were taken from the Multi-resolution Emission Inventory for China (MEIC, http://meicmodel.org). Elevation data at 30 m resolution come from version 2 of the Global Digital Elevation Model (GDEM, https://earthexplorer.usgs.gov).
4. Auxiliary data: The monthly Normalized Difference Vegetation Index (NDVI) product was obtained from the Level-1 and Atmosphere Archive and Distribution System Distributed Active Archive Center (LAADS DAAC, https://ladsweb.modaps.eosdis.nasa.gov). Land cover classification data come from the National Geographic Information Resource Catalog Service System (https://www.webmap.cn/mapDataAction.do?method=globalLandCover). Population data come from the Gridded Population of the World, Version 4 (GPWv4, https://sedac.ciesin.columbia.edu/data/collection/gpw-v4).
For each PM2.5 site, five temporal variables are extracted as inputs: year, month, day, hour, and day of year. Longitude and latitude are used as positional inputs. The visibility, relative humidity, and temperature at the meteorological station nearest to each PM2.5 site are used as basic meteorological inputs, and the distance between the two stations is also included as a feature. Following a feature engineering method developed in previous research, peripheral influences are incorporated by extracting spatial features. Specifically, each PM2.5 site is matched with the 19 closest meteorological stations other than its nearest one. Five variables are taken from these 19 stations: longitude, latitude, temperature, visibility, and relative humidity. The maximum, minimum, mean, skewness, and standard deviation of each variable are then calculated across the 19 stations, and these features describing surrounding conditions are also used as inputs. After spatiotemporal feature extraction, 71 features in total were available for model training. To reduce computation and training time while maintaining accuracy, the 40 most important features, as ranked in small-sample tests, were used for subsequent model training and prediction. These include visibility, temporal, spatial, and emission features, as well as elevation.
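The feature construction described above can be sketched as follows. This is a minimal, standard-library-only illustration, assuming that each meteorological station at a given time is a dict with hypothetical keys `lon`, `lat`, `vis`, `rh`, and `temp`, that station distances are great-circle (haversine) distances, and that skewness and standard deviation are population estimates; the study does not state these choices. Emission, elevation, and auxiliary features are omitted, so the sketch yields 36 of the 71 features. The hypothetical helper `top_k_features` shows the ranking step for keeping the 40 most important features, given importance scores from any trained model.

```python
import math
from datetime import datetime
from statistics import mean, pstdev

EARTH_RADIUS_KM = 6371.0
NEIGHBOR_VARS = ("lon", "lat", "temp", "vis", "rh")  # five variables taken from the 19 neighbors


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def skewness(values):
    """Population (biased) skewness; defined as 0.0 when all values are equal."""
    m, sd = mean(values), pstdev(values)
    if sd == 0:
        return 0.0
    return mean(((v - m) / sd) ** 3 for v in values)


def site_features(site_lon, site_lat, t, stations, n_neighbors=19):
    """Build one feature row for a PM2.5 site at time t.

    stations: hypothetical list of dicts with keys lon, lat, vis, rh, temp,
    holding each meteorological station's observations at time t.
    """
    ranked = sorted(stations, key=lambda s: haversine_km(site_lon, site_lat, s["lon"], s["lat"]))
    nearest, neighbors = ranked[0], ranked[1:1 + n_neighbors]

    feats = {
        # Temporal inputs: year, month, day, hour, day of year.
        "year": t.year, "month": t.month, "day": t.day, "hour": t.hour,
        "yday": t.timetuple().tm_yday,
        # Positional inputs.
        "lon": site_lon, "lat": site_lat,
        # Basic meteorology from the nearest station, plus the distance to it.
        "vis": nearest["vis"], "rh": nearest["rh"], "temp": nearest["temp"],
        "dist_km": haversine_km(site_lon, site_lat, nearest["lon"], nearest["lat"]),
    }
    # Spatial features: five statistics of five variables over the neighbors (25 features).
    stats = {"max": max, "min": min, "mean": mean, "skew": skewness, "std": pstdev}
    for var in NEIGHBOR_VARS:
        values = [s[var] for s in neighbors]
        for name, func in stats.items():
            feats[f"nb_{var}_{name}"] = func(values)
    return feats


def top_k_features(importances, k=40):
    """Names of the k features with the highest importance scores."""
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:k]]
```

Calling `site_features` once per site and 6-hour time step produces one training row; the importance scores passed to `top_k_features` could come, for example, from a gradient-boosting model trained on a small sample of such rows.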
The data quality is good.
This work is licensed under CC BY 4.0 (Creative Commons Attribution 4.0 International License).
| # | File name | File size |
|---|---|---|
| 1 | Gridded PM25 dataset_6-hourly_1960-1990.zip | 1.2 GiB |
| 2 | Gridded PM25 dataset_6-hourly_1991-2020.zip | 1.2 GiB |
| 3 | Gridded PM25 dataset_daily.zip | 598.8 MiB |
| 4 | Gridded PM25 dataset_monthly.zip | 19.5 MiB |
| 5 | Gridded PM25 dataset_yearly.zip | 1.8 MiB |
| 6 | Site-based PM25 dataset_6-hourly.zip | 500.7 MiB |
| 7 | Site-based PM25 dataset_daily.zip | 144.8 MiB |
| 8 | Site-based PM25 dataset_monthly.zip | 5.2 MiB |
| 9 | Site-based PM25 dataset_yearly.zip | 994.1 KiB |
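The release provides daily, monthly, and yearly files alongside the 6-hourly ones. Users who need comparable aggregates for a custom subset can average a 6-hourly series over calendar buckets, as in the sketch below. It assumes the series has already been loaded as `(datetime, value)` pairs with `None` marking missing values; the file formats and the averaging rules used for the published aggregates are not described here, so results may differ from the distributed files. The helper name `aggregate` is hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Calendar bucket keys for each aggregation level.
_KEY_FUNCS = {
    "daily": lambda t: (t.year, t.month, t.day),
    "monthly": lambda t: (t.year, t.month),
    "yearly": lambda t: (t.year,),
}


def aggregate(series, level):
    """Average a 6-hourly (datetime, value) series into calendar buckets.

    Missing values (None) are skipped, so a bucket holding only missing
    values produces no entry. Monthly and yearly means are taken over all
    valid 6-hourly values, not over daily means.
    """
    key_func = _KEY_FUNCS[level]
    buckets = defaultdict(list)
    for t, value in series:
        if value is not None:
            buckets[key_func(t)].append(value)
    return {key: mean(values) for key, values in sorted(buckets.items())}


if __name__ == "__main__":
    six_hourly = [(datetime(2020, 1, 1, h), 10.0 + h) for h in (0, 6, 12, 18)]
    six_hourly += [(datetime(2020, 1, 2, 0), 40.0), (datetime(2020, 1, 2, 6), None)]
    print(aggregate(six_hourly, "daily"))
```

Averaging all valid 6-hourly values in a month weights each observation equally; averaging daily means instead would weight each day equally, which differs when days have unequal numbers of valid records.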
| # | category | title | author | year |
|---|---|---|---|---|
| 1 | paper | Reconstructing 6-hourly PM2.5 datasets from 1960 to 2020 in China | J. Zhong, X. Zhang, K. Gui, J. Liao, Y. Fei, L. Jiang, L. Guo, L. Liu, H. Che, Y. Wang, D. Wang, Z. Zhou | 2022-07-12 |
©Copyright 2005-. Northwest Institute of Eco-Environment and Resources, CAS.
Donggang West Road 320, Lanzhou, Gansu, China (730000)

