{
    "created": "2024-01-24 14:55:21",
    "updated": "2026-05-05 20:11:07",
    "id": "a00e30ec-7ad0-4573-a342-958cc01da645",
    "version": 8,
    "ds_topic": null,
    "title_cn": "中国PM2.5 1km空间分布数据集（2015-2018年）",
    "title_en": "China PM2.5 1km Spatial Distribution Dataset (2015-2018)",
    "ds_abstract": "<p>&emsp;&emsp;目前，在各种大气污染物的模拟中，独立痕量气体的模拟受到关键遥感产品分辨率不足的制约，导致模拟可靠性不足。本研究将空间采样和参数卷积相结合，利用地面观测、遥感产品、气象数据、援助数据和随机 ID 优化 LightGBM。通过上述技术和大气污染物序列模拟，我们得到了 2015-2018 年中国大部分地区 PM2.5 每日 1 公里分辨率的无缝产品。通过随机抽样、随机站点抽样、特定区域验证、不同模型比较以及不同研究的横向比较，我们验证了我们对多种大气污染物空间分布的模拟是可靠和有效的。",
    "ds_source": "<p>&emsp;&emsp;本研究使用的数据包括中国PM2.5每日地面监测数据。此外，还使用了遥感数据、气象数据和辅助数据。",
    "ds_process_way": "<p>&emsp;&emsp;基于随机ID、空间采用、参数卷积和其他方法的多污染物通用机器学习模型，可在预测大气污染物浓度变化时更好地考虑多种因素，并优化对污染物空间分布的估计。我们使用CV和视觉定性分析来评估模型结果。将LightGBM、LSTM 和RF-Ps与我们的模型进行比较，以评估其性能。最后，我们使用 SHAP 尝试解释模型的输出结果。",
    "ds_quality": "<p>&emsp;&emsp;随机样本的 CV 值为：PM2.5 的 R<sup>2</sup> 为 0.88，均方根误差为 9.91 µg/m<sup>3</sup>。结合 SHapley Additive exPlanations（SHAP）方法，明确了模拟过程中不同参数的作用，并证实了参数卷积的积极作用。",
    "ds_acq_start_time": "2015-01-01 00:00:00",
    "ds_acq_end_time": "2018-03-18 00:00:00",
    "ds_acq_place": "中国",
    "ds_acq_lon_east": null,
    "ds_acq_lat_south": null,
    "ds_acq_lon_west": null,
    "ds_acq_lat_north": null,
    "ds_acq_alt_low": null,
    "ds_acq_alt_high": null,
    "ds_share_type": "login-access",
    "ds_total_size": 49907647678,
    "ds_files_count": 1174,
    "ds_format": "gz 和 GeoTIFF",
    "ds_space_res": "1km",
    "ds_time_res": "年",
    "ds_coordinate": "WGS84",
    "ds_projection": "",
    "ds_thumbnail": "a00e30ec-7ad0-4573-a342-958cc01da645.png",
    "ds_thumb_from": 2,
    "ds_ref_way": "",
    "paper_ref_way": "",
    "ds_ref_instruction": "",
    "ds_from_station": null,
    "organization_id": "0a4269e1-65f4-45f1-aeba-88ea3068eebf",
    "ds_serv_man": "敏玉芳",
    "ds_serv_phone": "0931-4967596",
    "ds_serv_mail": "ncdc@lzb.ac.cn",
    "doi_value": "",
    "subject_codes": [
        "170.15"
    ],
    "quality_level": 3,
    "publish_time": "2024-01-29 10:39:41",
    "last_updated": "2026-01-14 10:08:29",
    "protected": false,
    "protected_to": null,
    "lang": "zh",
    "cstr": "11738.11.NCDC.ZENODO.DB4184.2024",
    "i18n": {
        "en": {
            "title": "China PM2.5 1km Spatial Distribution Dataset (2015-2018)",
            "ds_format": "gz and GeoTIFF",
            "ds_source": "<p>&emsp; &emsp; The data used in this study includes daily ground monitoring data of PM2.5 in China. In addition, remote sensing data, meteorological data, and auxiliary data were also used.",
            "ds_quality": "<p>&emsp; &emsp; The CV value of the random sample is: the R<sup>2</sup>of PM2.5 is 0.88, and the root mean square error is 9.91 µ g/m<sup>3</sup>. By combining the SHapley Additive exPlans (SHAP) method, the roles of different parameters in the simulation process were clarified, and the positive effect of parameter convolution was confirmed.",
            "ds_ref_way": "",
            "ds_abstract": "<p>    At present, in the simulation of various atmospheric pollutants, the simulation of independent trace gases is constrained by the insufficient resolution of key remote sensing products, resulting in insufficient simulation reliability. This study combines spatial sampling and parameter convolution to optimize LightGBM using ground observations, remote sensing products, meteorological data, aid data, and random IDs. Through the above techniques and simulation of atmospheric pollutant sequences, we obtained seamless products with a daily resolution of 1 kilometer for PM2.5 in most parts of China from 2015 to 2018. Through random sampling, random site sampling, specific area validation, comparison of different models, and horizontal comparison of different studies, we have verified that our simulation of the spatial distribution of various atmospheric pollutants is reliable and effective.</p>",
            "ds_time_res": "年",
            "ds_acq_place": "China",
            "ds_space_res": "1km",
            "ds_projection": "",
            "ds_process_way": "<p>&emsp; &emsp; A multi pollutant universal machine learning model based on random IDs, spatial adoption, parameter convolution, and other methods can better consider multiple factors when predicting changes in atmospheric pollutant concentrations and optimize the estimation of pollutant spatial distribution. We use CV and visual qualitative analysis to evaluate the model results. Compare LightGBM, LSTM, and RF Ps with our model to evaluate their performance. Finally, we use SHAP to attempt to explain the output results of the model.",
            "ds_ref_instruction": ""
        }
    },
    "submit_center_id": "ncdc",
    "data_level": 0,
    "license_type": "CC BY 4.0",
    "doi_reg_from": "reg_outside",
    "cstr_reg_from": "reg_outside",
    "doi_not_reg_reason": null,
    "cstr_not_reg_reason": null,
    "is_paper_in_submitting": false,
    "ds_topic_tags": [
        "空气污染物",
        "机器学习模型优化",
        "空气污染物空间分布产品",
        "SHAP"
    ],
    "ds_subject_tags": [
        "大气科学"
    ],
    "ds_class_tags": [],
    "ds_locus_tags": [
        "中国"
    ],
    "ds_time_tags": [
        2015,
        2016,
        2017,
        2018
    ],
    "ds_contributors": [
        {
            "true_name": "叶红",
            "email": "hye@iue.ac.cn",
            "work_for": "中国科学院城市环境研究所",
            "country": "中国"
        }
    ],
    "ds_meta_authors": [
        {
            "true_name": "叶红",
            "email": "hye@iue.ac.cn",
            "work_for": "中国科学院城市环境研究所",
            "country": "中国"
        }
    ],
    "ds_managers": [
        {
            "true_name": "叶红",
            "email": "hye@iue.ac.cn",
            "work_for": "中国科学院城市环境研究所",
            "country": "中国"
        }
    ],
    "category": "生态"
}