{
    "created": "2026-03-09 17:16:33",
    "updated": "2026-04-29 15:21:06",
    "id": "e44b23cc-0aad-43ec-90c5-9eb1882c696f",
    "version": 12,
    "ds_topic": null,
    "title_cn": "中国区域AI-Ready MODIS积雪覆盖比例数据集（2000-2022年）",
    "title_en": "ChinaAI-FSC: A Comprehensive AI-Ready MODIS Fractional Snow Cover Dataset for China (2000-2022)",
    "ds_abstract": "<p>&emsp;&emsp;ChinaAI-FSC，是中国区域首个大规模、标准化、面向人工智能应用的积雪覆盖比例（FSC）样本集，覆盖2000年至2022年共计22个积雪季，有效填补了长期积雪监测领域的关键空白。该数据集包含47,728个样本（每个样本为128×128的MODIS像元切片），可支持基于“点”和“面”尺度的AI-FSC建模。通过构建结构化、透明的技术流程——涵盖系统化的样本制备、严格的质量控制、时空样本划分以及标准化的元数据——确保了数据集在AI应用中的可复现性、物理一致性及互操作性。研究采用创新的“四层—四域—十五指标”评估体系，从数据、信息、系统和应用四个维度对数据集的可靠性及人工智能就绪度进行了系统评估。三个典型应用案例——（1）六种机器学习/深度学习模型（人工神经网络、支持向量回归、随机森林、卷积神经网络、UNet、残差网络）的基准测试；（2）标准MODIS FSC的验证；（3）全国无缝FSC制图——充分验证了ChinaAI-FSC的数据质量、可靠性与可用性。通过提供协调一致、经过验证且文档完备的样本，ChinaAI-FSC为AI驱动的积雪覆盖制图、长期监测及冰冻圈—水文模型构建奠定了统一的数据基础，有力推动了冰冻圈科学研究中可复现、可互操作及下一代研究范式的发展。",
    "ds_source": "<p>&emsp;&emsp;ChinaAI-FSC 数据集构建所采用的数据源主要包括三类：（1）高分辨率参考影像：Landsat-5 TM、Landsat-7 ETM+、Landsat-8 OLI、Landsat-9 OLI-2 的 Collection 2 Level-2 地表反射率产品，以及 Sentinel-2A/2B MSI Level-2A 地表反射率产品，来源于 USGS Earth Explorer 和 ESA Copernicus 数据平台，用于生成高精度积雪覆盖比例参考标签。（2）MODIS 数据：采用全球 500 米无缝地表反射率数据集（SDC500）获取 MODIS 波段 1–7 反射率，并结合 MOD10A1 标准积雪产品与 MCD12Q1 土地覆盖产品，作为模型输入特征的主要来源。（3）辅助数据：包括全球年度森林覆盖度数据（GLOBAMP FTC）、中国及周边地区每日 1 公里全天气地表温度数据（TRIMS LST）、SRTM 数字高程模型及其衍生的地形因子（高程、坡度、坡向、地形起伏度、地表粗糙度），以及经纬度和日序等地理与时间因子。所有数据均经过统一的投影转换与空间重采样，匹配至 MODIS 分辨率与地理坐标系。",
    "ds_process_way": "<p>&emsp;&emsp;（1）高分辨率参考积雪覆盖比例计算：基于改进的 SNOMAP 算法，利用 Landsat 与 Sentinel-2 影像生成 30 米分辨率的二值积雪图，并以 MODIS 像元为中心、1.5 倍像元大小为半径的邻域内，统计高分辨率积雪像元比例，得到 MODIS 尺度的积雪覆盖比例参考值。（2）特征变量提取：共提取 20 个特征变量，包括 MODIS 地表反射率（波段 1–7）、归一化差值积雪指数、归一化差值植被指数、土地覆盖类型、地表温度、森林覆盖度、地形因子（高程、坡度、坡向、地形起伏度、地表粗糙度）、经纬度及日序。（3）样本生成：将研究区划分为 0.64°×0.64° 的规则格网，每个格网对应 128×128 个 MODIS 像元，形成空间图块样本，共生成原始样本 166,763 个。（4）质量控制：从像元级数据有效性、光谱—积雪物理一致性、温度—积雪能量平衡一致性、地形调节一致性、土地覆盖与森林冠层效应一致性、跨变量一致性等六个维度，开展多层次、多约束的质量控制，最终筛选出 47,728 个高质量样本。（5）样本划分：按照空间不重叠原则，以 2:1:1 的比例将样本划分为训练集、验证集和测试集，确保模型评估的泛化能力。",
    "ds_quality": "<p>&emsp;&emsp;（1）物理一致性：基于 20 个特征变量与积雪覆盖比例之间的相关性分析，显示积雪覆盖比例与归一化差值积雪指数、可见光反射率呈显著正相关，与地表温度、短波红外反射率呈显著负相关，与高程、地形等因子呈现合理的物理关系，且空间上特征与积雪分布高度吻合，表明数据集具有良好的物理一致性。（2）独立验证：利用 2013–2020 年 507 个气象站的积雪深度观测数据对参考积雪覆盖比例进行验证，总体精度达 0.944，其中山地和森林区域的精度分别为 0.970 和 0.906，验证了参考标签的可靠性。（3）人工智能就绪度评估：创新性地构建了“四层—四域—十五指标”评估体系，从数据、信息、系统、应用四个维度对数据集进行全面评价，确认其在数据清洗、多源融合、元数据完备性、空间组织、算法适用性等方面均达到较高就绪水平。（4）代表性应用验证：通过六种主流机器学习与深度学习模型的基准测试、MODIS 标准产品对比验证、以及全国无缝积雪覆盖比例制图三个应用案例，进一步证实了数据集在支持高精度、大尺度、跨区域积雪覆盖比例建模方面的可靠性、代表性与泛化能力。",
    "ds_acq_start_time": "2000-01-01 00:00:00",
    "ds_acq_end_time": "2022-12-31 00:00:00",
    "ds_acq_place": "中国",
    "ds_acq_lon_east": 140.0,
    "ds_acq_lat_south": 15.0,
    "ds_acq_lon_west": 70.0,
    "ds_acq_lat_north": 50.0,
    "ds_acq_alt_low": null,
    "ds_acq_alt_high": null,
    "ds_share_type": "open-access",
    "ds_total_size": 65372543067,
    "ds_files_count": 95455,
    "ds_format": "*.tif",
    "ds_space_res": "0.005°",
    "ds_time_res": "日",
    "ds_coordinate": "WGS84",
    "ds_projection": "",
    "ds_thumbnail": "e44b23cc-0aad-43ec-90c5-9eb1882c696f.png",
    "ds_thumb_from": 2,
    "ds_ref_way": "",
    "paper_ref_way": "",
    "ds_ref_instruction": "",
    "ds_from_station": null,
    "organization_id": "0a4269e1-65f4-45f1-aeba-88ea3068eebf",
    "ds_serv_man": "李红星",
    "ds_serv_phone": "0931-4967592",
    "ds_serv_mail": "ncdc@lzb.ac.cn",
    "doi_value": "",
    "subject_codes": [
        "170.4510"
    ],
    "quality_level": 3,
    "publish_time": "2026-03-18 08:51:53",
    "last_updated": "2026-03-23 17:05:59",
    "protected": false,
    "protected_to": null,
    "lang": "zh",
    "cstr": "11738.11.NCDC.ZENODO.DB7177.2026",
    "i18n": {
        "en": {
            "title": "ChinaAI-FSC: A Comprehensive AI-Ready MODIS Fractional Snow Cover Dataset for China (2000-2022)",
            "ds_format": "*.tif",
            "ds_source": "<p>&emsp; &emsp; The ChinaAI-FSC dataset was constructed using three main categories of data sources. (1) High-resolution reference imagery: Landsat-5 TM, Landsat-7 ETM+, Landsat-8 OLI, and Landsat-9 OLI-2 Collection 2 Level-2 Surface Reflectance products, along with Sentinel-2A/2B MSI Level-2A Surface Reflectance products, were obtained from the USGS Earth Explorer and ESA Copernicus Data Space Ecosystem platforms, respectively. These images were used to generate high-accuracy fractional snow cover reference labels. (2) MODIS data: The seamless 500 m global land surface reflectance dataset (SDC500) was used to extract MODIS bands 1–7 reflectance. The standard MODIS snow product (MOD10A1) and land cover product (MCD12Q1) were also incorporated as input features. (3) Auxiliary data: These include the global annual fractional tree cover dataset (GLOBAMP FTC), the daily 1 km all-weather land surface temperature dataset over China and surrounding regions (TRIMS LST), the SRTM digital elevation model and its derived topographic attributes (elevation, slope, aspect, terrain relief, surface roughness), as well as geographic location (longitude and latitude) and Julian day. All datasets were reprojected and resampled to a unified geographic coordinate system and spatial resolution consistent with the MODIS products.",
            "ds_quality": "<p>&emsp; &emsp;(1) Physical consistency: Correlation analysis between the 20 feature variables and FSC revealed physically interpretable patterns, with FSC showing strong positive correlations with NDSI and visible reflectance bands, strong negative correlations with LST and shortwave-infrared reflectance, and reasonable relationships with elevation and topographic variables. Spatial coherence between FSC and key drivers was also visually confirmed, demonstrating robust physical consistency. (2) Independent validation: The reference FSC labels were validated against in-situ snow depth observations from 507 meteorological stations spanning 2013–2020, achieving an overall accuracy of 0.944. Stratified validation yielded accuracies of 0.970 in mountainous regions and 0.906 in forested areas, confirming the reliability of the reference labels across diverse surface conditions. (3) AI-readiness assessment: A novel \"Four Layers-Four Domains-Fifteen Attributes (4L-4D-15A)\" evaluation framework was introduced to assess dataset readiness from data, information, system, and application perspectives. The dataset achieved high readiness levels across all dimensions, confirming its suitability for direct use in AI workflows in terms of data cleaning, multi-source integration, metadata completeness, spatial organization, and algorithmic adaptability. (4) Representative applications: Three use cases—benchmarking of six ML/DL models, validation of the standard MODIS FSC product, and nationwide seamless FSC mapping—further demonstrated the dataset's reliability, representativeness, and generalization capability for supporting high-accuracy, large-scale snow cover modeling across heterogeneous regions.",
            "ds_ref_way": "",
            "ds_abstract": "<p>&emsp; &emsp; ChinaAI-FSC, the first large-scale, standardized, AI-ready fractional snow cover (FSC) sample collection for China, spanning 22 snow seasons from 2000 to 2022 and addressing a critical gap in long-term snow monitoring. The dataset consists of 47,728 samples (each 128×128 MODIS-pixel tiles),  enable both point-scale and tile-scale spatially contextualized AI modelling. A structured and transparent workflow, encompassing systematic sample preparation, rigorous quality control, spatiotemporal sample partitioning, and standardized metadata, ensures reproducibility, physical consistency, and interoperability across machine learning and deep learning applications. Dataset reliability and AI-readiness were systematically evaluated using a novel “Four Layers-Four Domains-Fifteen Attributes (4L-4D-15A)” assessment protocol, covering data, information, system, and application dimensions. The quality, reliability, and usability of ChinaAI-FSC were demonstrated through three representative use cases: (1) benchmarking of six ML/DL models (ANN, SVR, RF, CNN, UNet, and ResNet), (2) validation of the standard MODIS FSC product, and (3) nationwide seamless FSC mapping. By providing harmonized, validated, and well-documented samples, ChinaAI-FSC establishes a unified foundation for AI-driven snow cover mapping, long-term monitoring, and cryosphere–hydrological modelling, promoting reproducible, interoperable, and next-generation research in cryospheric science.",
            "ds_time_res": "日",
            "ds_acq_place": "China",
            "ds_space_res": "0.005°",
            "ds_projection": "",
            "ds_process_way": "<p>&emsp; &emsp; (1) High-resolution reference FSC calculation: Binary snow maps at 30 m resolution were first generated from Landsat and Sentinel-2 imagery using an improved SNOMAP algorithm. For each MODIS pixel, the reference FSC value was computed as the proportion of snow-covered high-resolution pixels within a circular neighborhood centered on the MODIS pixel centroid, with a radius equal to 1.5 times the MODIS pixel size. (2) Feature variable extraction: A total of 20 feature variables were derived, including MODIS surface reflectance (bands 1–7), Normalized Difference Snow Index (NDSI), Normalized Difference Vegetation Index (NDVI), land cover type, land surface temperature, fractional tree cover, topographic variables (elevation, slope, aspect, terrain relief, surface roughness), geographic coordinates, and Julian day. (3) Sample generation: The study area was divided into regular 0.64° × 0.64° grid tiles, each corresponding to 128 × 128 MODIS pixels, forming spatially contiguous tile-based samples. A total of 166,763 original samples were generated. (4) Quality control: A multi-level quality control procedure was implemented across six dimensions: pixel-level data validity, spectral-snow physical consistency, temperature-snow energy balance consistency, topographic modulation consistency, land cover and forest canopy effect consistency, and cross-variable consistency. After rigorous screening, 47,728 high-quality samples were retained. (5) Sample partitioning: Samples were divided into training, validation, and testing subsets following a spatially disjoint 2:1:1 ratio to ensure robust model generalization.",
            "ds_ref_instruction": ""
        }
    },
    "submit_center_id": "ncdc",
    "data_level": 0,
    "license_type": "CC BY 4.0",
    "doi_reg_from": "reg_outside",
    "cstr_reg_from": "reg_outside",
    "doi_not_reg_reason": null,
    "cstr_not_reg_reason": null,
    "is_paper_in_submitting": false,
    "ds_topic_tags": [
        "积雪覆盖比例 (FSC)",
        "AI-Ready",
        "ML/DL",
        "MODIS"
    ],
    "ds_subject_tags": [
        "自然地理学"
    ],
    "ds_class_tags": [],
    "ds_locus_tags": [
        "中国"
    ],
    "ds_time_tags": [
        2000,
        2001,
        2002,
        2003,
        2004,
        2005,
        2006,
        2007,
        2008,
        2009,
        2010,
        2011,
        2012,
        2013,
        2014,
        2015,
        2016,
        2017,
        2018,
        2019,
        2020,
        2021,
        2022
    ],
    "ds_contributors": [
        {
            "true_name": "侯金亮",
            "email": "jlhours@lzb.ac.cn",
            "work_for": "中国科学院西北生态环境资源研究院",
            "country": "中国"
        },
        {
            "true_name": "黄春林",
            "email": "huangcl@lzb.ac.cn",
            "work_for": "中国科学院西北生态环境资源研究院",
            "country": "中国"
        },
        {
            "true_name": "张莹",
            "email": "zhang_y@lzb.ac.cn",
            "work_for": "中国科学院西北生态环境资源研究院",
            "country": "中国"
        }
    ],
    "ds_meta_authors": [
        {
            "true_name": "侯金亮",
            "email": "jlhours@lzb.ac.cn",
            "work_for": "中国科学院西北生态环境资源研究院",
            "country": "中国"
        }
    ],
    "ds_managers": [
        {
            "true_name": "张莹",
            "email": "zhang_y@lzb.ac.cn",
            "work_for": "中国科学院西北生态环境资源研究院",
            "country": "中国"
        }
    ],
    "category": "积雪"
}