In recent years, the health risks of fine particulate matter (PM2.5) and ambient ozone (O3) have been widely recognized. Accurate estimation of PM2.5 and O3 exposure is crucial for supporting health risk analysis and environmental policy formulation. The purpose of this dataset study is to construct a high-performance random forest model that estimates the daily average concentration of PM2.5 and the daily maximum 8-hour average concentration of ozone (O3-8 hmax) in China from 2005 to 2017 at a spatial resolution of 1 km × 1 km. Model variables include meteorological variables, satellite data, chemical transport model outputs, geographic variables, and socio-economic variables. A random forest model was built with 10-fold cross-validation, and spatial and temporal validation was conducted to evaluate model performance. Under the sample-based partitioning method, the average test R2 values of the daily, monthly, and annual PM2.5 estimates are 0.85, 0.88, and 0.90, respectively; the corresponding R2 values for O3-8 hmax are 0.77, 0.77, and 0.69. Meteorological variables and their lagged values significantly affect the estimated PM2.5 and O3-8 hmax values. From 2005 to 2017, PM2.5 concentrations showed an overall downward trend, while ambient O3 concentrations showed an upward trend. Over the same period, the spatial patterns of PM2.5 and O3-8 hmax remained almost unchanged, but the temporal trends differed from region to region.
| Collection period | 2005/01/01 - 2017/12/31 |
|---|---|
| Collection location | China |
| Data size | 1004.2 MiB |
| Data format | csv |
| Coordinate system | WGS84 |
The model variables used in this study mainly include Aqua AOD for PM2.5 modeling, GEOS-Chem chemical transport model output for O3 modeling, and variables shared by the PM2.5 and O3 models. The shared variables are 13 meteorological variables (including boundary layer height, surface pressure, 2 m dew point temperature, evaporation, albedo, low cloud cover, medium cloud cover, high cloud cover, total precipitation, 10 m U wind component, and 10 m V wind component), geographic and socio-economic variables (Digital Elevation Model (DEM), Normalized Difference Vegetation Index (NDVI), population, Gross Domestic Product (GDP), and highway network), and dummy variables (season, month, and province). In short, most model variables were resampled onto the standard grid at 1 km × 1 km resolution using interpolation methods such as inverse distance weighting and bilinear interpolation in ArcGIS 10.2 and Python 2.7. AOD was processed with ENVI 5.3 + IDL, extracted to the standard grid using ArcPy, and then interpolated by inverse distance weighting to obtain 1 km × 1 km resolution data. For variables available only at monthly or annual resolution, the corresponding monthly or annual value is assigned to each day.
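Inverse distance weighting, one of the interpolation methods named above, can be illustrated with a minimal sketch. This is not the ArcGIS or ArcPy workflow used to build the dataset; the function name `idw_interpolate`, the power parameter of 2, and the sample coordinates are all assumptions chosen for illustration.

```python
import math


def idw_interpolate(points, target, power=2.0):
    """Estimate a value at `target` from (x, y, value) samples.

    Each sample is weighted by 1 / distance**power, so nearer samples
    contribute more. `power=2.0` is an assumed, commonly used default.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in points:
        dist = math.hypot(x - target[0], y - target[1])
        if dist == 0.0:
            # The target coincides with a sample: return it directly.
            return value
        weight = 1.0 / dist ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical coarse-grid samples as (x_km, y_km, value).
samples = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0), (0.0, 2.0, 30.0), (2.0, 2.0, 40.0)]

# The centre is equidistant from all four samples, so the estimate
# reduces to their plain average.
center_value = idw_interpolate(samples, (1.0, 1.0))
```

In practice a production workflow would usually restrict the search to the nearest neighbours of each grid cell rather than looping over every sample.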
The model variables in this study include meteorological variables, geographic variables, socio-economic variables, satellite data, and chemical transport model outputs from 2013 to 2017. We obtained monitoring data on PM2.5 daily average concentration and O3 daily maximum 8-hour average concentration (O3-8 hmax) at 1479 stations from 2013 to 2017. A standard 1 km × 1 km grid was established nationwide (35.55° N to 43.12° N, 112.95° E to 120.35° E), with a total of 9,495,025 grid cells. The grid coordinate system is WGS-84, and the grid projection is the Albers projection. We constructed a high-performance random forest model (temporal resolution: daily; spatial resolution: 1 km × 1 km) and estimated the gridded daily average PM2.5 concentration and O3-8 hmax concentration in China from 2005 to 2017.
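The sample-based 10-fold partitioning used for validation can be sketched as follows. The source does not describe how records were assigned to folds, so random assignment of individual station-day records is an assumption here, and the function names and the seed are hypothetical. No model is trained in this sketch; it only produces the train/test index splits.

```python
import random


def sample_based_folds(n_samples, n_folds=10, seed=0):
    """Shuffle record indices and deal them into `n_folds` disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Slicing with a stride keeps fold sizes within one record of each other.
    return [indices[i::n_folds] for i in range(n_folds)]


def train_test_splits(n_samples, n_folds=10, seed=0):
    """Yield (train_indices, test_indices), holding out one fold at a time."""
    folds = sample_based_folds(n_samples, n_folds, seed)
    for k, test_idx in enumerate(folds):
        train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train_idx, test_idx


# 25 hypothetical station-day records split into 10 folds.
splits = list(train_test_splits(n_samples=25))
```

Each record appears in exactly one test fold, so every observation gets one out-of-sample estimate. Spatial or temporal validation would instead group the folds by station or by date.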
The cross-validation results indicate that the estimated PM2.5 and O3-8 hmax concentrations agree reasonably well with the observed concentrations, and the test R2 values are relatively high. Under the sample-based partitioning method, the test R2 values of the estimated daily, monthly, and annual PM2.5 concentrations were 0.85, 0.88, and 0.90, respectively. Similarly, the test R2 values of the estimated daily, monthly, and annual O3-8 hmax concentrations were 0.77, 0.77, and 0.69, respectively. The daily root mean square error and maximum root mean square error of PM2.5 are 17.72 μg/m3 and 9.37 μg/m3, respectively; for O3-8 hmax they are 23.10 μg/m3 and 15.43 μg/m3, respectively. At the provincial/municipal level, the PM2.5 estimates for Shanghai, Beijing, Hubei, Hebei, and Sichuan rank in the top five, with relatively high test R2 values (≥ 0.90), while those for Tibet, Qinghai, Gansu, Anhui, and Yunnan rank at the bottom, with relatively low test R2 values (< 0.70). For O3-8 hmax, the estimates for Beijing, Chongqing, Shanghai, Tianjin, and Henan rank in the top five, with relatively high R2 values (≥ 0.83), while those for Gansu, Anhui, Heilongjiang, Guizhou, and Tibet rank at the bottom, with relatively low R2 values (< 0.62).
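As an illustration of how daily and monthly test metrics of this kind can be computed, the sketch below evaluates R2 and root mean square error on paired observed and estimated values, then repeats the R2 calculation on monthly means. The source does not state which R2 definition was used; the coefficient-of-determination form is assumed here, and the records are made-up values rather than data from this dataset.

```python
import math
from collections import defaultdict


def r_squared(observed, estimated):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - e) ** 2 for o, e in zip(observed, estimated))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, estimated):
    """Root mean square error of the estimates."""
    squared = sum((o - e) ** 2 for o, e in zip(observed, estimated))
    return math.sqrt(squared / len(observed))


def monthly_means(records):
    """Average (station, 'YYYY-MM-DD', observed, estimated) per station-month."""
    groups = defaultdict(list)
    for station, date, obs, est in records:
        groups[(station, date[:7])].append((obs, est))
    obs_means, est_means = [], []
    for key in sorted(groups):
        pairs = groups[key]
        obs_means.append(sum(p[0] for p in pairs) / len(pairs))
        est_means.append(sum(p[1] for p in pairs) / len(pairs))
    return obs_means, est_means


# Hypothetical held-out records: station, date, observed, estimated (μg/m3).
records = [
    ("A", "2015-01-01", 50.0, 48.0),
    ("A", "2015-01-02", 70.0, 74.0),
    ("A", "2015-02-01", 30.0, 33.0),
    ("A", "2015-02-02", 40.0, 39.0),
]

daily_obs = [r[2] for r in records]
daily_est = [r[3] for r in records]
daily_r2 = r_squared(daily_obs, daily_est)
daily_rmse = rmse(daily_obs, daily_est)

monthly_obs, monthly_est = monthly_means(records)
monthly_r2 = r_squared(monthly_obs, monthly_est)
```

Averaging before scoring smooths out day-to-day errors, which is one reason monthly and annual R2 values can differ from the daily ones.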
This work is licensed under CC BY 4.0 (Creative Commons Attribution 4.0 International License).
| # | title | file size |
|---|---|---|
| 1 | 2005_pm.rar | 33.9 MiB |
| 2 | 2006_pm.rar | 33.9 MiB |
| 3 | 2007_pm.rar | 34.0 MiB |
| 4 | 2008_pm.rar | 34.1 MiB |
| 5 | 2009_pm.rar | 33.9 MiB |
| 6 | 2010_pm.rar | 33.8 MiB |
| 7 | 2011_pm.rar | 33.9 MiB |
| 8 | 2012_pm.rar | 34.0 MiB |
| 9 | 2013_pm.rar | 34.0 MiB |
| 10 | 2014_pm.rar | 33.8 MiB |
| # | category | title | author | year |
|---|---|---|---|---|
| 1 | paper | Full-coverage 1 km daily ambient PM2.5 … | R. Ma, J. Ban, Q. Wang, Y. Zhang, Y. Yang, S. Li, W. Shi, Z. Zhou, J. Zang, T. Li | 2022 |
©Copyright 2005-. Northwest Institute of Eco-Environment and Resources, CAS.
Donggang West Road 320, Lanzhou, Gansu, China (730000)

