{
    "created": "2023-10-17 17:46:37",
    "updated": "2026-06-27 05:20:06",
    "id": "305c42b1-9f3b-4664-9a16-668aaf12670e",
    "version": 9,
    "ds_topic": null,
    "title_cn": "中国6小时PM2.5数据集重建（1960-2020年）",
    "title_en": "Reconstructing 6-hourly PM2.5 datasets from 1960 to 2020 in China (1960-2020)",
    "ds_abstract": "<p>&emsp;&emsp;几十年来，PM2.5改变了地球的辐射平衡，并增加了环境与健康风险，但中国直到2013年才开始对其进行广泛监测。具有高时间分辨率的长期历史PM2.5记录对科学研究和环境管理都至关重要，但目前仍然缺乏。在该数据集中，我们结合长期能见度、常规气象观测、排放和高程数据，重建了1960-2020年基于站点的逐6小时PM2.5数据集。各站点的PM2.5浓度由先进的机器学习模型LightGBM估算，该模型利用了周围20个气象站的空间特征。我们的模型在按年交叉验证（CV，R2=0.7）和空间交叉验证（R2=0.76）中的表现与以往研究相当甚至更优，且在记录长度和时间分辨率方面更具优势。通过融合空间特征，该模型还重建了0.25°×0.25°、逐6小时的格点PM2.5数据集。结果显示，在年代际尺度上，PM2.5污染在2010年之前逐渐加重或持续，而在随后的十年中有所缓解。尽管不同地区的转折点各不相同，但由于清洁空气行动，2013年后关键区域的PM2.5质量浓度显著下降。特别是，2020年PM2.5年均值几乎处于1960年以来的最低水平。该PM2.5数据集提供了高分辨率的时空变化信息，为空气污染、气候变化和大气化学再分析等相关研究奠定了基础。</p>",
    "ds_source": "<p>&emsp;&emsp;1.PM2.5观测数据：2013-2020年所有站点的逐小时PM2.5数据均来自中国环境监测总站（CNEMC，http://www.cnemc.cn）；2013年之前美国驻北京大使馆和驻上海总领事馆的PM2.5测量值用于独立验证评估（http://www.stateair.net/web/historical）；</p>\n<p>&emsp;&emsp;2.能见度和常规气象数据：从国家气象信息中心（NMIC）收集的气象观测数据，包括1960-2020年的逐6小时记录以及2013年以后逐渐增加的逐小时记录；</p>\n<p>&emsp;&emsp;3.排放清单和海拔高度：1960-2012年的历史人为排放来自北京大学全球排放清单，该清单采用自下而上的方法编制，时间分辨率为1个月（http://inventory.pku.edu.cn）；2013-2020年的人为排放来自中国多尺度排放清单（MEIC，http://meicmodel.org）；30米高程数据来自全球数字高程模型（GDEM）第2版（https://earthexplorer.usgs.gov）；</p>\n<p>&emsp;&emsp;4.辅助数据：月归一化植被指数（NDVI）产品获取自一级和大气档案与分发系统分布式主动档案中心（LAADS DAAC，https://ladsweb.modaps.eosdis.nasa.gov）；土地覆被分类数据来自全国地理信息资源目录服务系统（https://www.webmap.cn/mapDataAction.do?method=globalLandCover）；人口数据来自世界网格人口第4版（GPWv4，https://sedac.ciesin.columbia.edu/data/collection/gpw-v4）。</p>",
    "ds_process_way": "<p>&emsp;&emsp;对于每个PM2.5站点，提取年、月、日、时和年积日五个变量作为时间输入，经度和纬度作为位置输入。以距每个PM2.5站点最近的气象站的能见度、相对湿度和温度作为基本气象输入，并将这两个站点之间的距离也作为一个特征。已有研究提出了一种新的特征工程方法，通过提取空间特征来纳入周边影响。具体而言，除最近的气象站外，每个PM2.5站点还匹配了其余19个最近的气象站。从这19个站点中选取经度、纬度、温度、能见度和相对湿度五个变量，分别计算其最大值、最小值、平均值、偏度和标准差。这些基于周边条件生成的特征同样作为输入。时空特征提取完成后，共有71个候选特征可用于模型训练。为了在保证精度的前提下减少计算量和训练时间，选取小样本测试中按重要性排序的前40个特征用于后续的模型训练和后报。这些特征包括能见度、时间特征、空间特征、排放特征和海拔高度。</p>",
    "ds_quality": "<p>&emsp;&emsp;数据质量良好：模型在按年交叉验证中R2=0.7，在空间交叉验证中R2=0.76。</p>",
    "ds_acq_start_time": "1960-01-01 00:00:00",
    "ds_acq_end_time": "2020-12-31 00:00:00",
    "ds_acq_place": "中国",
    "ds_acq_lon_east": 135.0388888888889,
    "ds_acq_lat_south": 3.5,
    "ds_acq_lon_west": 73.5,
    "ds_acq_lat_north": 53.5,
    "ds_acq_alt_low": null,
    "ds_acq_alt_high": null,
    "ds_share_type": "login-access",
    "ds_total_size": 3878416718,
    "ds_files_count": 10,
    "ds_format": "nc",
    "ds_space_res": "0.25",
    "ds_time_res": "6小时",
    "ds_coordinate": "无",
    "ds_projection": "WGS_1984",
    "ds_thumbnail": "305c42b1-9f3b-4664-9a16-668aaf12670e.png",
    "ds_thumb_from": 0,
    "ds_ref_way": "",
    "paper_ref_way": "",
    "ds_ref_instruction": "",
    "ds_from_station": null,
    "organization_id": "0a4269e1-65f4-45f1-aeba-88ea3068eebf",
    "ds_serv_man": "敏玉芳",
    "ds_serv_phone": "09314967596",
    "ds_serv_mail": "ncdc@lzb.ac.cn",
    "doi_value": "",
    "subject_codes": [
        "170.15",
        "170.45"
    ],
    "quality_level": 3,
    "publish_time": "2023-10-23 10:47:07",
    "last_updated": "2026-01-14 11:06:21",
    "protected": false,
    "protected_to": null,
    "lang": "zh",
    "cstr": "11738.11.NCDC.ZENODO.DB4051.2023",
    "i18n": {
        "en": {
            "title": "Reconstructing 6-hourly PM2.5 datasets from 1960 to 2020 in China (1960-2020)",
            "ds_format": "nc",
            "ds_source": "<p>&emsp;&emsp;1. PM2.5 observations: Hourly PM2.5 data for all stations from 2013 to 2020 were obtained from the China National Environmental Monitoring Centre (CNEMC, http://www.cnemc.cn). PM2.5 measurements before 2013 from the US Embassy in Beijing and the US Consulate General in Shanghai were used for independent validation (http://www.stateair.net/web/historical).</p>\n<p>&emsp;&emsp;2. Visibility and routine meteorological data: Meteorological observations collected from the National Meteorological Information Center (NMIC) include 6-hourly records from 1960 to 2020 and hourly records that became increasingly available after 2013.</p>\n<p>&emsp;&emsp;3. Emission inventories and elevation: Historical anthropogenic emissions for 1960-2012 were taken from the Peking University (PKU) global emission inventory, which was compiled with a bottom-up approach at monthly temporal resolution (http://inventory.pku.edu.cn). Anthropogenic emissions for 2013-2020 were taken from the Multi-resolution Emission Inventory for China (MEIC, http://meicmodel.org). Elevation data at 30 m resolution came from the Global Digital Elevation Model (GDEM) Version 2 (https://earthexplorer.usgs.gov).</p>\n<p>&emsp;&emsp;4. Auxiliary data: The monthly Normalized Difference Vegetation Index (NDVI) product was obtained from the Level-1 and Atmosphere Archive and Distribution System Distributed Active Archive Center (LAADS DAAC, https://ladsweb.modaps.eosdis.nasa.gov). Land cover classification data came from the National Geographic Information Resource Catalogue Service System (https://www.webmap.cn/mapDataAction.do?method=globalLandCover). Population data came from the Gridded Population of the World, Version 4 (GPWv4, https://sedac.ciesin.columbia.edu/data/collection/gpw-v4).</p>",
            "ds_quality": "<p>&emsp;&emsp;The data quality is good: the model achieves R2 = 0.7 in by-year cross-validation and R2 = 0.76 in spatial cross-validation.</p>",
            "ds_ref_way": "",
            "ds_abstract": "<p>&emsp;&emsp;For decades, PM2.5 has altered Earth's radiation balance and increased environmental and health risks, yet it was not widely monitored in China until 2013. Long-term historical PM2.5 records with high temporal resolution are essential for both research and environmental management, but they are lacking. In this dataset, we reconstructed a site-based, 6-hourly PM2.5 dataset for 1960-2020 by combining long-term visibility, routine meteorological observations, emissions, and elevation. PM2.5 concentrations at each site were estimated with LightGBM, an advanced machine learning model that exploits spatial features from the 20 surrounding meteorological stations. Our model performs comparably to or better than previous studies in by-year cross-validation (CV; R2 = 0.7) and spatial CV (R2 = 0.76), while offering a longer record and higher temporal resolution. By incorporating spatial features, the model was also used to reconstruct a 0.25° × 0.25°, 6-hourly gridded PM2.5 dataset. The results show that, on decadal scales, PM2.5 pollution gradually worsened or persisted before 2010 but eased over the following decade. Although turning points differ among regions, PM2.5 mass concentrations in key regions have declined significantly since 2013 owing to clean air actions. In particular, annual mean PM2.5 in 2020 was close to its lowest level since 1960. This PM2.5 dataset provides high-resolution spatiotemporal variations and lays a foundation for research on air pollution, climate change, and atmospheric chemistry reanalysis.</p>",
            "ds_time_res": "6-hourly",
            "ds_acq_place": "China",
            "ds_space_res": "0.25",
            "ds_projection": "WGS_1984",
            "ds_process_way": "<p>&emsp;&emsp;For each PM2.5 site, five variables are extracted as temporal inputs: year, month, day, hour, and day of year. Longitude and latitude are used as location inputs. Visibility, relative humidity, and temperature from the meteorological station nearest to each PM2.5 site are used as basic meteorological inputs, and the distance between the two stations is added as a feature. Previous research developed a novel feature engineering method that incorporates surrounding influences by extracting spatial features. Specifically, each PM2.5 site is also matched with the next 19 nearest meteorological stations, excluding the nearest one. Five variables are taken from these 19 stations: longitude, latitude, temperature, visibility, and relative humidity. The maximum, minimum, mean, skewness, and standard deviation of each of these five variables are then calculated. These features, which describe surrounding conditions, are also used as inputs. After spatiotemporal feature extraction, a total of 71 candidate features were available for model training. To reduce computation and training time while maintaining accuracy, the top 40 features ranked by importance in a small-sample test were used for subsequent model training and hindcasting. These features include visibility, temporal, spatial, emission, and elevation features.</p>",
            "ds_ref_instruction": ""
        }
    },
    "submit_center_id": "ncdc",
    "data_level": 0,
    "recommendation_value": 0,
    "license_type": "https://creativecommons.org/licenses/by/4.0/",
    "doi_reg_from": "reg_outside",
    "cstr_reg_from": "reg_outside",
    "doi_not_reg_reason": null,
    "cstr_not_reg_reason": null,
    "is_paper_in_submitting": false,
    "belong_to_nieer": false,
    "ds_topic_tags": [
        "PM2.5",
        "空气质量",
        "中国"
    ],
    "ds_subject_tags": [
        "大气科学",
        "地理学"
    ],
    "ds_class_tags": [],
    "ds_locus_tags": [
        "中国"
    ],
    "ds_time_tags": [
        1960,
        1961,
        1962,
        1963,
        1964,
        1965,
        1966,
        1967,
        1968,
        1969,
        1970,
        1971,
        1972,
        1973,
        1974,
        1975,
        1976,
        1977,
        1978,
        1979,
        1980,
        1981,
        1982,
        1983,
        1984,
        1985,
        1986,
        1987,
        1988,
        1989,
        1990,
        1991,
        1992,
        1993,
        1994,
        1995,
        1996,
        1997,
        1998,
        1999,
        2000,
        2001,
        2002,
        2003,
        2004,
        2005,
        2006,
        2007,
        2008,
        2009,
        2010,
        2011,
        2012,
        2013,
        2014,
        2015,
        2016,
        2017,
        2018,
        2019,
        2020
    ],
    "ds_contributors": [
        {
            "true_name": "张晓叶",
            "email": "xiaoye@cma.gov.cn",
            "work_for": "中国气象科学研究院",
            "country": "中国"
        }
    ],
    "ds_meta_authors": [
        {
            "true_name": "张晓叶",
            "email": "xiaoye@cma.gov.cn",
            "work_for": "中国气象科学研究院",
            "country": "中国"
        }
    ],
    "ds_managers": [
        {
            "true_name": "张晓叶",
            "email": "xiaoye@cma.gov.cn",
            "work_for": "中国气象科学研究院",
            "country": "中国"
        }
    ],
    "category": "大气本底"
}