{
    "created": "2023-09-21 16:56:38",
    "updated": "2026-04-27 19:45:57",
    "id": "f8eb2dc3-7336-48b9-aae1-98bd27fad833",
    "version": 11,
    "ds_topic": null,
    "title_cn": "基于多变量随机森林模型的中国1公里日环境PM2.5和O3浓度全覆盖数据集（2005-2017 年）",
    "title_en": "Full coverage dataset of 1-kilometer daily environmental PM2.5 and O3 concentrations in China based on multivariate random forest model (2005-2017)",
    "ds_abstract": "<p>&emsp;&emsp;近年来，细颗粒物（PM2.5）和环境臭氧（O<sub>3</sub>）的健康风险已得到广泛认可。对 PM2.5 和 O<sub>3</sub> 暴露的准确估计对于支持健康风险分析和环境政策制定非常重要。本数据集研究旨在构建高性能的随机森林模型，以 1 km × 1 km 的空间分辨率估算 2005-2017 年中国PM2.5 日平均浓度和臭氧日最大 8 h 平均浓度（O<sub>3</sub>-8 hmax）。模型变量包括气象变量、卫星数据、化学传输模型输出、地理变量和社会经济变量。建立了基于 10 倍交叉验证的随机森林模型，并进行了时空验证以评估模型性能。根据我们基于样本的划分方法，测试数据集的 PM2.5 日、月和年估计值的平均模型拟合 R<sub>2</sub> 值分别为 0.85、0.88 和 0.90；O<sub>3</sub>-8 hmax 的R<sub>2</sub> 值分别为 0.77、0.77 和 0.69。气象变量及其滞后值会显著影响 PM2.5 和 O<sub>3</sub>-8 hmax 的估计值。2005-2017 年期间，PM2.5 浓度总体呈下降趋势，而环境 O<sub>3</sub> 浓度则呈上升趋势。2005 年至 2017 年期间，PM2.5 和 O<sub>3</sub>-8 hmax 的空间模式几乎没有变化，但时间趋势具有空间特征。",
    "ds_source": "<p>&emsp;&emsp;本研究使用的模型变量主要包括用于 PM2.5 建模的 Aqua AOD、用于 O<sub>3</sub> 建模的 GEOS-Chem 化学传输模型输出以及 PM2.5 和 O<sub>3</sub>共享的一些变量。PM2.5 和 O<sub>3</sub> 共享的变量：13 个气象变量（包括边界层高度、表面气压、2 米露点温度、蒸发、反照率、低云量、中云量、高云量、总降水量、10 米 U 型分量、10 米 V 型分量、 此外还有地理和社会经济变量，如数字高程模型（DEM）、归一化差异植被指数（NDVI）、人口、国内生产总值（GDP）、公路网和虚拟变量（包括季节、月份和省份）。简而言之，大多数模型变量都是在 ArcGIS 10.2 和Python 2.7 中使用插值方法（如反距离加权和双线性算法）将其处理为基于标准网格的 1 km × 1 km 分辨率。AOD 由 ENVI 5.3+IDL 处理，使用 ArcPy 提取到标准网格，然后进行反距离加权插值，得到 1 km×1 km 分辨率的数据。对于长期变量，每天分配相应的月度和年度水平值。",
    "ds_process_way": "<p>&emsp;&emsp;本研究的模型变量包括 2013-2017 年的气象变量、地理变量、社会经济变量、卫星数据和化学传输模型输出。获得了 2013-2017 年 1479 个站点的 PM2.5 日均浓度和 O<sub>3</sub> 日最大8 小时平均浓度（O<sub>3</sub>-8 hmax）监测数据。在全国（35.55°N 至43.12°N，112.95°E 至 120.35°E）建立了 1 km × 1 km 的标准网格，共 9495025 个网格单元。网格坐标系为 WGS-84，网格投影为阿尔伯斯投影。我们构建了高性能随机森林模型（时间分辨率：日；空间分辨率：1 km × 1 km），并估算了 2005-2017 年中国的网格日均PM2.5 浓度和 O<sub>3</sub>-8 hmax 浓度。",
    "ds_quality": "<p>&emsp;&emsp;交叉验证结果表明，估算的 PM2.5 和 O<sub>3</sub>-8 hmax 浓度与观测到的 PM2.5 和 O<sub>3</sub>-8 hmax 浓度匹配合理，拟合检验-R<sub>2</sub> 值较高。根据基于样本的划分方法，PM2.5 日、月和年估计浓度的检验-R<sub>2</sub> 值分别为 0.85、0.88 和 0.90 同样，估算的每日、每月和每年 O<sub>3</sub>-8 hmax 浓度的检验-R<sub>3</sub> 值分别为 0.77、0.77 和 0.69。PM2.5 的日均均方根误差和最大均方根误差分别为 17.72μg/m3 和 9.37μg/m3；O<sub>3</sub>-8 hmax 的日均均方根误差和最大均方根误差分别\n为 23.10μg/m<sup>3</sup> 和 15.43μg/m<sup>3</sup>。在省/市层面，上海、北京、湖北、河北和四川的 PM2.5 估算结果以相对较高的检验 R2（≥ 0.90）排名前五，而西藏、青海、甘肃、安徽和云南的PM2.5 估算结果则以相对较低的检验 R2 值（< 0.70）排名较后。北京、重庆、上海、天津和河南的 O<sub>3</sub>-8 hmax 估测结果的检验 R<sub>2</sub> 值相对较高（≥ 0.83），排名前五；而甘肃、安徽、黑龙江、贵州和西藏的 O<sub>3</sub>-8 hmax 估测结果的检验 R<sub>2</sub> 值相对较低（＜ 0.62），排名较差。",
    "ds_acq_start_time": "2005-01-01 00:00:00",
    "ds_acq_end_time": "2017-12-31 00:00:00",
    "ds_acq_place": "中国",
    "ds_acq_lon_east": 120.51666666666667,
    "ds_acq_lat_south": 35.86666666666667,
    "ds_acq_lon_west": 112.91666666666667,
    "ds_acq_lat_north": 43.2,
    "ds_acq_alt_low": null,
    "ds_acq_alt_high": null,
    "ds_share_type": "login-access",
    "ds_total_size": 1053006181,
    "ds_files_count": 28,
    "ds_format": "csv",
    "ds_space_res": "",
    "ds_time_res": "",
    "ds_coordinate": "WGS84",
    "ds_projection": "",
    "ds_thumbnail": "f8eb2dc3-7336-48b9-aae1-98bd27fad833.png",
    "ds_thumb_from": 0,
    "ds_ref_way": "",
    "paper_ref_way": "",
    "ds_ref_instruction": "",
    "ds_from_station": null,
    "organization_id": "0a4269e1-65f4-45f1-aeba-88ea3068eebf",
    "ds_serv_man": "敏玉芳",
    "ds_serv_phone": "09314967596",
    "ds_serv_mail": "ncdc@lzb.ac.cn",
    "doi_value": "",
    "subject_codes": [
        "170.15",
        "170.45"
    ],
    "quality_level": 3,
    "publish_time": "2023-09-22 11:02:24",
    "last_updated": "2026-01-14 10:48:42",
    "protected": false,
    "protected_to": null,
    "lang": "zh",
    "cstr": "11738.11.NCDC.ZENODO.DB4015.2023",
    "i18n": {
        "en": {
            "title": "Full coverage dataset of 1-kilometer daily environmental PM2.5 and O3 concentrations in China based on multivariate random forest model (2005-2017)",
            "ds_format": "csv",
            "ds_source": "<p>&emsp;&emsp; The model variables used in this study mainly include Aqua AOD for PM2.5 modeling, GEOS-Chem chemical transport model output for O<sub>3</sub>modeling, and some variables shared by PM2.5 and O<sub>3</sub>. Variables shared by PM2.5 and O<sub>3</sub>: 13 meteorological variables (including boundary layer height, surface pressure, 2 meter dew point temperature, evaporation, albedo, low cloud cover, medium cloud cover, high cloud cover, total precipitation, 10 meter U-shaped component, 10 meter V-shaped component, as well as geographic and socio-economic variables such as Digital Elevation Model (DEM), Normalized Difference Vegetation Index (NDVI), population, Gross Domestic Product (GDP) Highway network and dummy variables (including season, month, and province). In short, most model variables are processed into 1 km based on standard grids using interpolation methods such as inverse distance weighting and bilinear algorithms in ArcGIS 10.2 and Python 2.7 ×  1 km resolution. AOD is processed by ENVI 5.3+IDL, extracted from standard grids using ArcPy, and then subjected to inverse distance weighted interpolation to obtain 1 km × 1 km resolution data. For long-term variables, corresponding monthly and annual level values are allocated daily.",
            "ds_quality": "<p>&emsp;&emsp; The cross validation results indicate that the estimated PM2.5 and O<sub>3</sub>-8 hmax concentrations match the observed PM2.5 and O<sub>3</sub>-8 hmax concentrations reasonably, and the fitting test - R<sub>2</sub>values are relatively high. According to the sample based partitioning method, the estimated daily, monthly, and annual concentrations of PM2.5 were tested with R<sub>2</sub>values of 0.85, 0.88, and 0.90, respectively. Similarly, the estimated daily, monthly, and annual O<sub>3</sub>-8 hmax concentrations were tested with R<sub>3</sub>values of 0.77, 0.77, and 0.69, respectively. The daily root mean square error and maximum root mean square error of PM2.5 are 17.72, respectively μ G/m3 and 9.37 μ G/m3; Daily root mean square error and maximum root mean square error for O<sub>3</sub>-8 hmax, respectively\n23.10 μ G/m<sup>3</sup>and 15.43</sup> μ G/m<sup>3</sup>. At the provincial/municipal level, the PM2.5 estimation results of Shanghai, Beijing, Hubei, Hebei, and Sichuan rank in the top five with relatively high test R2 (≥ 0.90), while the PM2.5 estimation results of Tibet, Qinghai, Gansu, Anhui, and Yunnan rank in the bottom with relatively low test R2 values (<0.70). The R<sub>2</sub>values of the O<sub>3</sub>-8 hmax estimation results in Beijing, Chongqing, Shanghai, Tianjin, and Henan are relatively high (≥ 0.83), ranking in the top five; However, the R<sub>2</sub>values of the O<sub>3</sub>-8 hmax estimation results in Gansu, Anhui, Heilongjiang, Guizhou, and Tibet are relatively low (<0.62), ranking poorly.",
            "ds_ref_way": "",
            "ds_abstract": "<p>   In recent years, the health risks of fine particulate matter (PM2.5) and environmental ozone (O<sub>3</sub>) have been widely recognized. Accurate estimation of PM2.5 and O<sub>3</sub>exposure is crucial for supporting health risk analysis and environmental policy formulation. The purpose of this dataset study is to construct a high-performance random forest model with a distance of 1 km × Estimating the daily average concentration of PM2.5 and the maximum 8-hour average concentration of ozone in China from 2005 to 2017 with a spatial resolution of 1 km (O<sub>3</sub>-8 hmax). Model variables include meteorological variables, satellite data, chemical transfer model outputs, geographic variables, and socio-economic variables. A random forest model based on 10 fold cross validation was established and spatiotemporal validation was conducted to evaluate model performance. According to our sample based partitioning method, the average model fitting R<sub>2</sub>values of daily, monthly, and annual estimates of PM2.5 in the test dataset are 0.85, 0.88, and 0.90, respectively; The R<sub>2</sub>values of O3-8 hmax were 0.77, 0.77, and 0.69, respectively. Meteorological variables and their hysteresis values will significantly affect the estimated values of PM2.5 and O<sub>3</sub>-8 hmax. During the period from 2005 to 2017, the concentration of PM2.5 showed an overall downward trend, while the concentration of environmental O<sub>3</sub>showed an upward trend. From 2005 to 2017, the spatial patterns of PM2.5 and O<sub>3</sub>-8 hmax remained almost unchanged, but the temporal trend exhibited spatial characteristics.</p>",
            "ds_time_res": "",
            "ds_acq_place": "China",
            "ds_space_res": "",
            "ds_projection": "",
            "ds_process_way": "<p>&emsp;&emsp; The model variables in this study include meteorological variables, geographic variables, socio-economic variables, satellite data, and chemical transfer model outputs from 2013 to 2017. We obtained monitoring data on PM2.5 daily average concentration and O<sub>3</sub>daily maximum 8-hour average concentration (O<sub>3</sub>-8 hmax) at 1479 stations from 2013 to 2017. 1 km has been established nationwide (35.55 ° N to 43.12 ° N, 112.95 ° E to 120.35 ° E) ×  A standard grid of 1 km, with a total of 9495025 grid units. The grid coordinate system is WGS-84, and the grid projection is the Albert projection. We constructed a high-performance random forest model (temporal resolution: daily; spatial resolution: 1 km) ×  1 km) and estimated the grid daily average PM2.5 concentration and O<sub>3</sub>-8 hmax concentration in China from 2005 to 2017.",
            "ds_ref_instruction": ""
        }
    },
    "submit_center_id": "ncdc",
    "data_level": 0,
    "license_type": "CC BY 4.0",
    "doi_reg_from": "reg_outside",
    "cstr_reg_from": "reg_outside",
    "doi_not_reg_reason": null,
    "cstr_not_reg_reason": null,
    "ds_topic_tags": [
        "随机森林",
        "PM2.5",
        "O3浓度"
    ],
    "ds_subject_tags": [
        "大气科学",
        "地理学"
    ],
    "ds_class_tags": [],
    "ds_locus_tags": [
        "中国"
    ],
    "ds_time_tags": [
        2005,
        2006,
        2007,
        2008,
        2009,
        2010,
        2011,
        2012,
        2013,
        2014,
        2015,
        2016,
        2017
    ],
    "ds_contributors": [
        {
            "true_name": "李甜甜",
            "email": "litiantian@nieh.chinacdc.cn",
            "work_for": "中国疾病预防控制中心环境与人口健康重点实验室",
            "country": "中国"
        }
    ],
    "ds_meta_authors": [
        {
            "true_name": "李甜甜",
            "email": "litiantian@nieh.chinacdc.cn",
            "work_for": "中国疾病预防控制中心环境与人口健康重点实验室",
            "country": "中国"
        }
    ],
    "ds_managers": [
        {
            "true_name": "李甜甜",
            "email": "litiantian@nieh.chinacdc.cn",
            "work_for": "中国疾病预防控制中心环境与人口健康重点实验室",
            "country": "中国"
        }
    ],
    "category": "其他"
}