{
    "created": "2024-08-23 11:59:36",
    "updated": "2026-05-03 08:38:30",
    "id": "59772e67-7fcb-42aa-a688-120b203fee15",
    "version": 7,
    "ds_topic": null,
    "title_cn": "印度长期每日地面颗粒物浓度（LongPMInd）重建数据集（1980-2022年）",
    "title_en": "Reconstructing long-term (1980–2022) daily ground particulate matter concentrations in India (LongPMInd)",
    "ds_abstract": "<p>&emsp;&emsp;该数据集包含研究区域、特征重要性、PM2.5和PM10的时空分布格局，以及估算年死亡人数的不确定性。\n本研究开发了一种基于光梯度提升机（LightGBM）的结构简单、高效且稳健的模型，用于融合多源数据并估算印度长期（1980–2022年）历史逐日地面PM浓度（LongPMInd）。LightGBM模型在样本外、站点外和年份外交叉验证（CV）测试中表现出良好的精度，R²分别为0.77、0.70和0.66。PM2.5训练与测试之间的性能差距较小（delta RMSE分别为1.06、3.83和7.74 μg m−3），表明过拟合风险较低。基于该具有较强泛化能力的模型，重建了可公开获取、长期、高质量的逐日PM2.5和PM10产品（10 km，1980–2022年）。结果表明，印度在印度河-恒河平原（IGP）经历了严重的PM污染，尤其是在冬季。自2000年以来，大多数地区的PM浓度显著增加。转折点出现在2018年，当年印度政府启动了国家清洁空气计划，大多数地区的PM2.5浓度随之下降。严重的PM2.5污染导致归因过早死亡人数持续增加，从2000年的73万（95%置信区间（CI）[65万, 80万]）增加到2019年的122万（95% CI [103万, 141万]），在IGP地区尤为明显，归因死亡人数从36万增加到60万。LongPMInd有望支持空气质量管理、公共卫生倡议和气候变化应对等多种应用。",
    "ds_source": "<p>&emsp;&emsp;2018–2022年印度PM2.5和PM10的地面观测数据来自CPCB空气质量监测网络（https://www.cpcb.nic.in）。由于极端值会影响模型的稳健性，剔除了最低和最高0.01%的观测数据。使用了覆盖1980–2022年的ECMWF第五代大气再分析数据集ERA5-Land。根据特征的相对重要性（基于增益计算）进行特征选择，并纳入了若干相对重要性较高的气象因子。此外，还收集了覆盖1980–2022年的现代研究与应用回顾分析第2版（MERRA-2）数据产品，包括气溶胶光学厚度以及气溶胶组分和前体物（黑碳、有机碳、硫酸盐、沙尘和二氧化硫）。",
    "ds_process_way": "<p>&emsp;&emsp;模型构建：本研究采用高效的梯度提升决策树（GBDT）框架LightGBM估算PM2.5和PM10，并使用网格搜索交叉验证（CV）方法选择最优超参数。设计了超参数选择算法（补充材料中的算法S1）以保证模型的泛化能力：通过循环逐步增加模型复杂度，当模型预测的RMSE不再显著降低，或训练与预测RMSE之间的差异显著增加时，结束循环并返回超参数。根据特征的相对重要性进行特征选择，最终使用10个气象特征、6个排放相关特征和总气溶胶消光训练LightGBM并估算PM浓度。",
    "ds_quality": "<p>&emsp;&emsp;MERRA-2 是由 NASA 发布和维护的全球空气污染再分析数据集，已广泛用于印度地区的 PM 污染研究，其可靠性已得到广泛分析（Gueymard 和 Yang，2020；Navinya 等，2020；Buchard 等，2017）。对于 MERRA-2 AOD，基于 AERONET 观测的评估表明，MERRA-2 在大多数地区的表现优于哥白尼大气监测服务（CAMS）（Gueymard 和 Yang，2020）。Kumar 等（2023）仅使用 MERRA-2 数据和机器学习方法预测了印度的地面 PM2.5 浓度，证明了 MERRA-2 数据的可靠性。",
    "ds_acq_start_time": "1980-01-01 00:00:00",
    "ds_acq_end_time": "2022-12-31 00:00:00",
    "ds_acq_place": "印度",
    "ds_acq_lon_east": null,
    "ds_acq_lat_south": null,
    "ds_acq_lon_west": null,
    "ds_acq_lat_north": null,
    "ds_acq_alt_low": null,
    "ds_acq_alt_high": null,
    "ds_share_type": "login-access",
    "ds_total_size": 9085533876,
    "ds_files_count": 88,
    "ds_format": "nc",
    "ds_space_res": "",
    "ds_time_res": "",
    "ds_coordinate": "无",
    "ds_projection": "",
    "ds_thumbnail": "59772e67-7fcb-42aa-a688-120b203fee15.png",
    "ds_thumb_from": 2,
    "ds_ref_way": "",
    "paper_ref_way": "",
    "ds_ref_instruction": "",
    "ds_from_station": null,
    "organization_id": "0a4269e1-65f4-45f1-aeba-88ea3068eebf",
    "ds_serv_man": "敏玉芳",
    "ds_serv_phone": "0931-4967596",
    "ds_serv_mail": "ncdc@lzb.ac.cn",
    "doi_value": "",
    "subject_codes": [
        "170.15"
    ],
    "quality_level": 3,
    "publish_time": "2024-08-29 09:18:19",
    "last_updated": "2026-01-14 10:36:31",
    "protected": false,
    "protected_to": null,
    "lang": "zh",
    "cstr": "11738.11.NCDC.ZENODO.DB6667.2024",
    "i18n": {
        "en": {
            "title": "Reconstructing long-term (1980–2022) daily ground particulate matter concentrations in India (LongPMInd)",
            "ds_format": "nc",
            "ds_source": "<p>&emsp;&emsp;Ground observations of PM2.5 and PM10 in India during 2018–2022 were collected from the CPCB air quality monitoring network (https://www.cpcb.nic.in). Because extreme values affect model robustness, the bottom and top 0.01% of observations were excluded. The fifth-generation ECMWF atmospheric reanalysis dataset ERA5-Land, covering 1980–2022, was used. Features were selected based on their relative importance, calculated from their gain, and several meteorological factors with high relative importance were included. Data products from the Modern-Era Retrospective analysis for Research and Applications, Version 2 (MERRA-2) covering 1980–2022 were also collected, including aerosol optical depth (AOD) and aerosol components and precursors (black carbon, organic carbon, sulfate, dust, and sulfur dioxide).",
            "ds_quality": "<p>&emsp;&emsp;MERRA-2 is a global air pollution reanalysis dataset released and maintained by NASA; it has been widely used in PM pollution research in the Indian region, and its reliability has been extensively analyzed (Gueymard and Yang, 2020; Navinya et al., 2020; Buchard et al., 2017). For MERRA-2 AOD, evaluations against AERONET observations indicate that MERRA-2 performs better than the Copernicus Atmosphere Monitoring Service (CAMS) in most regions (Gueymard and Yang, 2020). Kumar et al. (2023) predicted ground-level PM2.5 concentrations in India using only MERRA-2 data and machine learning methods, demonstrating the reliability of MERRA-2 data.",
            "ds_ref_way": "",
            "ds_abstract": "<p>&emsp;&emsp;This dataset includes the study area, feature importance, spatial and temporal patterns of PM2.5 and PM10, and the uncertainty of estimated annual mortality.\nIn this work, a simply structured, efficient, and robust model based on the Light Gradient Boosting Machine (LightGBM) was developed to fuse multi-source data and estimate long-term (1980–2022) historical daily ground-level PM concentrations in India (LongPMInd). The LightGBM model showed good accuracy, with R² values of 0.77, 0.70, and 0.66 in out-of-sample, out-of-site, and out-of-year cross-validation (CV) tests, respectively. The small performance gap for PM2.5 between training and testing (delta RMSE of 1.06, 3.83, and 7.74 μg m−3) indicates a low risk of overfitting. With this strong generalization ability, publicly accessible, long-term, high-quality daily PM2.5 and PM10 products (10 km, 1980–2022) were then reconstructed. The results show that India experienced severe PM pollution in the Indo-Gangetic Plain (IGP), especially in winter. Since 2000, PM concentrations have increased significantly in most regions. A turning point occurred in 2018, when the Indian government launched the National Clean Air Programme, and PM2.5 concentrations decreased in most regions. Heavy PM2.5 pollution led to a continuous increase in attributable premature mortality, from 0.73 million (95% confidence interval (CI) [0.65, 0.80]) in 2000 to 1.22 million (95% CI [1.03, 1.41]) in 2019, particularly in the IGP, where attributable deaths increased from 360,000 to 600,000. LongPMInd can potentially support multiple applications in air quality management, public health initiatives, and climate change response.</p>",
            "ds_time_res": "",
            "ds_acq_place": "India",
            "ds_space_res": "",
            "ds_projection": "",
            "ds_process_way": "<p>&emsp;&emsp;Model construction: In this study, LightGBM, an efficient gradient boosting decision tree (GBDT) framework, was used to estimate PM2.5 and PM10, and grid-search cross-validation (CV) was used to select the optimal hyperparameters. A hyperparameter selection algorithm (Algorithm S1 in the Supplement) was designed to ensure the generalization ability of the model: a loop increases model complexity step by step and ends, returning the hyperparameters, when the RMSE of model predictions no longer decreases significantly or the difference between training and prediction RMSE increases significantly. Features were selected based on their relative importance; ten meteorological features, six emission-related features, and total aerosol extinction were used to train LightGBM and estimate PM concentrations.",
            "ds_ref_instruction": ""
        }
    },
    "submit_center_id": "ncdc",
    "data_level": 0,
    "license_type": "CC BY 4.0",
    "doi_reg_from": "reg_outside",
    "cstr_reg_from": "reg_outside",
    "doi_not_reg_reason": null,
    "cstr_not_reg_reason": null,
    "is_paper_in_submitting": false,
    "ds_topic_tags": [
        "印度",
        "PM2.5",
        "PM10",
        "浓度"
    ],
    "ds_subject_tags": [
        "大气科学"
    ],
    "ds_class_tags": [],
    "ds_locus_tags": [
        "印度"
    ],
    "ds_time_tags": [
        1980,
        1981,
        1982,
        1983,
        1984,
        1985,
        1986,
        1987,
        1988,
        1989,
        1990,
        1991,
        1992,
        1993,
        1994,
        1995,
        1996,
        1997,
        1998,
        1999,
        2000,
        2001,
        2002,
        2003,
        2004,
        2005,
        2006,
        2007,
        2008,
        2009,
        2010,
        2011,
        2012,
        2013,
        2014,
        2015,
        2016,
        2017,
        2018,
        2019,
        2020,
        2021,
        2022
    ],
    "ds_contributors": [
        {
            "true_name": "张宏亮",
            "email": "zhanghl@fudan.edu.cn",
            "work_for": "复旦大学",
            "country": "中国"
        }
    ],
    "ds_meta_authors": [
        {
            "true_name": "张宏亮",
            "email": "zhanghl@fudan.edu.cn",
            "work_for": "复旦大学",
            "country": "中国"
        }
    ],
    "ds_managers": [
        {
            "true_name": "张宏亮",
            "email": "zhanghl@fudan.edu.cn",
            "work_for": "复旦大学",
            "country": "中国"
        }
    ],
    "category": "生态"
}