{
    "created": "2025-06-03 15:57:53",
    "updated": "2026-07-21 10:49:22",
    "id": "7dd767bb-1be8-414e-a800-3a4291bfd39d",
    "version": 11,
    "ds_topic": null,
    "title_cn": "面向AI-Ready的黄土高原坝地标准化语义分割数据集",
    "title_en": "AI-Ready Standardized Semantic Segmentation Dataset for the silted land formed by check dams in the Loess Plateau",
    "ds_abstract": "<p>&emsp;&emsp;淤地坝作为黄土高原地区关键的水土保持工程，兼具控制土壤侵蚀与保障粮食安全的核心功能，但其智能化管理长期受限于数据获取效率低、模型泛化能力不足及标准化数据集缺失等技术瓶颈。本研究依托吉林一号0.75 m高分辨率遥感影像，以黄土高原典型流域韭园沟为采样区，构建了一套面向AI-Ready的黄土高原坝地标准化语义分割数据集。本数据集通过系统化样本制备流程，涵盖格网划分、样本筛选、像素级语义标注及图像增强等关键环节，形成了包含2920个样本的高精度数据集，有效确保数据空间表征的完整性和算法泛化能力需求。数据质量评估实验显示，基于该数据集训练的DenseUnet等模型在验证集上表现优异，平均交并比（mIoU）超过80%，总体精度（OA）达89%以上，且与公开数据集相比，其坝体提取结果的空间匹配性和可靠性显著提升。本数据集实现了复杂地貌环境下坝体与背景的精细化语义区分，填补了坝地智能识别领域标准化数据集的空白，不仅为淤地坝高精度时空制图、溃坝风险动态评估及水土保持效应量化研究提供了关键数据支撑，还为人工智能驱动的水土保持工程优化开辟了新的技术路径，对推进黄河流域生态保护与高质量发展具有重要的实践价值。</p>",
    "ds_source": "<p>&emsp;&emsp;本研究采用吉林一号宽幅01A卫星PMS02传感器获取的0.75米高分辨率影像作为核心数据源（原始影像数据访问地址：https://www.jl1mall.com/store/ ）。经过多时相检索与质量评估，选定2021年11月11日无云覆盖影像（云量≤0%），该数据采用WGS 1984坐标系UTM 6°带投影，覆盖面积548.30 km²，太阳高度角29.7493°，传感器侧摆角-2°。</p>",
    "ds_process_way": "<p>&emsp;&emsp;该数据集加工具体流程分为六个阶段实施：</p>\n<p>&emsp;&emsp;（1）格网划分阶段，将样本区原始影像规则裁剪为256×256像素网格单元，共生成3689个初始样本；</p>\n<p>&emsp;&emsp;（2）样本筛选阶段，通过目视解译剔除不含坝地的无效样本，保留584个有效像素网格单元；</p>\n<p>&emsp;&emsp;（3）语义标注阶段，基于Labelme开源平台完成全样本矢量化标注，构建以“gully”为统一地物类别的语义分割数据集；</p>\n<p>&emsp;&emsp;（4）标签转换阶段，通过GDAL库将JSON矢量标注文件批量栅格化为二值图像，确保标签与原始影像空间坐标严格匹配；</p>\n<p>&emsp;&emsp;（5）数据增强阶段，对原始样本实施镜翻转、旋转及亮度调整等几何-辐射组合变换，将样本量扩展至2920个；</p>\n<p>&emsp;&emsp;（6）数据集划分阶段：将增强后的数据集按6：2：2的比例划分为训练集、验证集和测试集三个独立子集。该标准化流程兼顾样本空间表征完整性与算法泛化需求，其产出的多尺度增强数据集可支撑主流卷积神经网络进行特征学习。</p>",
    "ds_quality": "<p>&emsp;&emsp;研究团队通过野外实地调查（利用无人机影像和现场定位）校正遥感解译数据的空间偏差，确保坝地语义分割数据的准确性；随后设计系统性控制变量实验，借助多模型架构测试（如mIoU超过80和OA超过89），证实数据集类别划分清晰、标注质量高、样本分布均衡；与公开数据集对比显示，本研究数据集在空间精度和可靠性上显著提升（如提取结果与实际地物吻合度更高）。该数据集具备高精度、一致性强和适配性广泛的特征，为后续研究与应用提供了坚实可靠的数据支撑。</p>",
    "ds_acq_start_time": "2021-11-11 00:00:00",
    "ds_acq_end_time": "2021-11-11 00:00:00",
    "ds_acq_place": "韭园沟流域",
    "ds_acq_lon_east": 110.43333333333334,
    "ds_acq_lat_south": 37.5,
    "ds_acq_lon_west": 110.25,
    "ds_acq_lat_north": 37.61666666666667,
    "ds_acq_alt_low": null,
    "ds_acq_alt_high": null,
    "ds_share_type": "open-access",
    "ds_total_size": 548166363,
    "ds_files_count": 2,
    "ds_format": "*.tif, *.jpg",
    "ds_space_res": "",
    "ds_time_res": "",
    "ds_coordinate": "WGS84",
    "ds_projection": "",
    "ds_thumbnail": "7dd767bb-1be8-414e-a800-3a4291bfd39d.png",
    "ds_thumb_from": 2,
    "ds_ref_way": "",
    "paper_ref_way": "",
    "ds_ref_instruction": "",
    "ds_from_station": null,
    "organization_id": "952adb3f-3ede-4a94-942a-7de772f1bfc5",
    "ds_serv_man": "李红星",
    "ds_serv_phone": "0931-4967592",
    "ds_serv_mail": "ncdc@lzb.ac.cn",
    "doi_value": "",
    "subject_codes": [
        "170.45"
    ],
    "quality_level": 3,
    "publish_time": "2025-06-03 16:20:52",
    "last_updated": "2026-06-01 11:14:11",
    "protected": false,
    "protected_to": null,
    "lang": "zh",
    "cstr": "11738.11.NCDC.DPSL.DB6865.2025",
    "i18n": {
        "en": {
            "title": "AI-Ready Standardized Semantic Segmentation Dataset for the silted land formed by check dams in the Loess Plateau",
            "ds_format": "",
            "ds_source": "<p>&emsp;&emsp;This study utilizes 0.75-meter high-resolution imagery acquired by the PMS-02 sensor onboard the Jilin-1 Wideband 01A satellite as the core data source (Original data access URL: https://www.jl1mall.com/store/). Following multi-temporal retrieval and quality assessment, a cloud-free image (cloud cover ≤0%) captured on November 11, 2021 was selected. This dataset employs the WGS 1984 UTM Zone 6N projection, covering an area of 548.30 km², with a solar elevation angle of 29.7493° and a sensor look angle of -2°.</p>",
            "ds_quality": "<p>&emsp;&emsp;The research team corrected spatial deviations in remote sensing interpretation data through field surveys (utilizing drone imagery and on-site positioning), ensuring the accuracy of check dam semantic segmentation data; subsequently, systematic controlled-variable experiments were designed, and multi-model architecture testing (e.g., mIoU exceeding 80% and OA surpassing 89%) confirmed the dataset's clear category delineation, high annotation quality, and balanced sample distribution; comparisons with public datasets demonstrated significant improvements in spatial precision and reliability for this study's dataset (e.g., extraction results showed higher alignment with actual features). This dataset exhibits high precision, strong consistency, and extensive compatibility, providing a solid and reliable data foundation for subsequent research and applications.</p>",
            "ds_ref_way": "",
            "ds_abstract": "<p>&emsp;=As a key water and soil conservation project in the Loess Plateau, check dams have core functions of controlling soil erosion and ensuring food security. However, their intelligent management has long been limited by low data acquisition efficiency, insufficient model generalization ability, and lack of standardized data sets. Technical bottleneck. This study relies on the 0.75 m high-resolution remote sensing image of Jilin-1 and takes Jiuyuangou, a typical basin of the Loess Plateau as the sampling area, and builds a set of AI-Ready standardized semantic segmentation data set for dam sites on the Loess Plateau. Through a systematic sample preparation process, this dataset covers key aspects such as grid division, sample screening, pixel-level semantic annotation and image enhancement, forming a high-precision dataset containing 2920 samples, effectively ensuring the integrity of data space representation and algorithm generalization ability requirements. Data quality assessment experiments show that models such as DenseUnet trained based on this dataset perform well on the verification set, with the average cross-to-merge ratio (mIoU) exceeding 80%, and the overall accuracy (OA) reaching more than 89%. Compared with the public dataset, the spatial matching and reliability of the dam body extraction results are significantly improved. This dataset realizes the refined semantic distinction between dam bodies and background in complex landform environments, fills the gap in standardized datasets in the field of intelligent identification of dams. It not only provides high-precision spatio-temporal mapping of check dams, dynamic assessment of dam failure risks, and water and soil conservation. Quantitative research on effects provides key data support, and also opens up a new technical path for artificial intelligence-driven optimization of soil and water conservation projects. 
It has important practical value for promoting ecological protection and high-quality development in the Yellow River Basin. </p>",
            "ds_time_res": "",
            "ds_acq_place": "Jiuyuangou watershed",
            "ds_space_res": "",
            "ds_projection": "",
            "ds_process_way": "<p>&emsp;&emsp;The complete dataset preprocessing process is divided into six stages as follows:</p>\r\n<p>&emsp;&emsp;(1) Grid division stage: The original image of the sample area is regularly cropped into 256×256 pixel grid units, generating 3689 initial samples in total;</p>\r\n<p>&emsp;&emsp;(2) Sample screening stage: Invalid samples without dam land are eliminated via visual interpretation, retaining 584 effective pixel grid units;</p>\r\n<p>&emsp;&emsp;(3) Semantic annotation stage: Vector annotation for all samples is completed using the open-source platform Labelme, constructing a semantic segmentation dataset with \"gully\" as the unified feature category;</p>\r\n<p>&emsp;&emsp;(4) Label conversion stage: The JSON vector annotation files are batch-rasterized into binary images using the GDAL library to ensure strict spatial coordinate matching with the original images;</p>\r\n<p>&emsp;&emsp;(5) Data augmentation stage: Geometric-radiometric transformations (e.g., mirror flipping, rotation, and brightness adjustment) are applied to the original samples, expanding the sample size to 2920;</p>\r\n<p>&emsp;&emsp;(6) Dataset division stage: The augmented dataset is divided into three independent subsets (training set, validation set, test set) at a ratio of 6:2:2.\r\nThis standardized process balances the integrity of sample spatial representation and algorithm generalization needs, and the output multi-scale augmented dataset supports feature learning for mainstream convolutional neural networks.</p>",
            "ds_ref_instruction": ""
        }
    },
    "submit_center_id": "ncdc",
    "data_level": 0,
    "recommendation_value": 0,
    "license_type": "https://creativecommons.org/licenses/by/4.0/",
    "doi_reg_from": "reg_local",
    "cstr_reg_from": "reg_local",
    "doi_not_reg_reason": null,
    "cstr_not_reg_reason": null,
    "is_paper_in_submitting": false,
    "belong_to_nieer": false,
    "ds_topic_tags": [
        "语义分割",
        "黄土高原坝地",
        "水土保持"
    ],
    "ds_subject_tags": [
        "地理学"
    ],
    "ds_class_tags": [],
    "ds_locus_tags": [
        "韭园沟流域"
    ],
    "ds_time_tags": [
        2021
    ],
    "ds_contributors": [
        {
            "true_name": "刘景琦",
            "email": "liujingqi@nieer.ac.cn",
            "work_for": "中国科学院西北生态环境资源研究院",
            "country": "中国"
        },
        {
            "true_name": "黄波",
            "email": "huangbo@lzb.ac.cn",
            "work_for": "中国科学院西北生态环境资源研究院",
            "country": "中国"
        },
        {
            "true_name": "张耀南",
            "email": "yaonan@lzb.ac.cn",
            "work_for": "中国科学院西北生态环境资源研究院",
            "country": "中国"
        },
        {
            "true_name": "敏玉芳",
            "email": "myf@lzb.ac.cn",
            "work_for": "中国科学院西北生态环境资源研究院",
            "country": "中国"
        }
    ],
    "ds_meta_authors": [
        {
            "true_name": "刘景琦",
            "email": "liujingqi@nieer.ac.cn",
            "work_for": "中国科学院西北生态环境资源研究院",
            "country": "中国"
        },
        {
            "true_name": "黄波",
            "email": "huangbo@lzb.ac.cn",
            "work_for": "中国科学院西北生态环境资源研究院",
            "country": "中国"
        },
        {
            "true_name": "张耀南",
            "email": "yaonan@lzb.ac.cn",
            "work_for": "中国科学院西北生态环境资源研究院",
            "country": "中国"
        },
        {
            "true_name": "敏玉芳",
            "email": "myf@lzb.ac.cn",
            "work_for": "中国科学院西北生态环境资源研究院",
            "country": "中国"
        }
    ],
    "ds_managers": [
        {
            "true_name": "刘景琦",
            "email": "liujingqi@nieer.ac.cn",
            "work_for": "中国科学院西北生态环境资源研究院",
            "country": "中国"
        }
    ],
    "category": "其他"
}