{
    "created": "2025-06-03 15:57:53",
    "updated": "2026-04-21 13:09:22",
    "id": "7dd767bb-1be8-414e-a800-3a4291bfd39d",
    "version": 7,
    "ds_topic": null,
    "title_cn": "面向AI-Ready的黄土高原坝地标准化语义分割数据集",
    "title_en": "AI-Ready Standardized Semantic Segmentation Dataset for Silted Land Formed by Check Dams in the Loess Plateau",
    "ds_abstract": "<p>&emsp;&emsp;淤地坝作为黄土高原地区关键的水土保持工程，兼具控制土壤侵蚀与保障粮食安全的核心功能，但其智能化管理长期受限于数据获取效率低、模型泛化能力不足及标准化数据集缺失等技术瓶颈。本研究依托吉林一号0.75米高分辨率遥感影像，以黄土高原典型流域韭园沟为采样区，构建了一套面向AI-Ready的黄土高原坝地标准化语义分割数据集。</p>",
    "ds_source": "<p>&emsp;&emsp;本研究采用吉林一号宽幅01A卫星PMS02传感器获取的0.75米高分辨率影像作为核心数据源（原始影像数据访问地址：https://www.jl1mall.com/store/ ）。经过多时相检索与质量评估，选定2021年11月11日无云覆盖影像（云量为0%），该影像采用WGS 1984坐标系、UTM 6°分带投影，覆盖面积548.30 km²，太阳高度角29.7493°，传感器侧摆角-2°。</p>",
    "ds_process_way": "<p>&emsp;&emsp;该数据集加工流程分为六个阶段：\n</p>\n<p>&emsp;&emsp;（1）格网划分阶段：将样本区原始影像规则裁剪为256×256像素的网格单元，共生成3689个初始样本；\n</p>\n<p>&emsp;&emsp;（2）样本筛选阶段：通过目视解译剔除不含坝地的无效样本，保留584个有效网格单元；\n</p>\n<p>&emsp;&emsp;（3）语义标注阶段：基于Labelme开源平台完成全部样本的矢量化标注，构建以“gully”为统一地物类别的语义分割数据集；\n</p>\n<p>&emsp;&emsp;（4）标签转换阶段：通过GDAL库将JSON矢量标注文件批量栅格化为二值图像，确保标签与原始影像空间坐标严格匹配；\n</p>\n<p>&emsp;&emsp;（5）数据增强阶段：对原始样本实施镜像翻转、旋转及亮度调整等几何-辐射组合变换，将样本量扩展至2920个；\n</p>\n<p>&emsp;&emsp;（6）数据集划分阶段：将增强后的数据集按6：2：2的比例划分为训练集、验证集和测试集三个独立子集。该标准化流程兼顾样本空间表征的完整性与算法泛化需求，其产出的多尺度增强数据集可支撑主流卷积神经网络进行特征学习。</p>",
    "ds_quality": "<p>&emsp;&emsp;研究团队通过野外实地调查（利用无人机影像和现场定位）校正遥感解译数据的空间偏差，确保坝地语义分割标注的准确性；随后设计系统性控制变量实验，通过多种模型架构测试（mIoU超过80%，OA超过89%），证实数据集类别划分清晰、标注质量高、样本分布均衡；与公开数据集的对比显示，本数据集在空间精度和可靠性方面显著提升（如提取结果与实际地物吻合度更高）。该数据集具备精度高、一致性强、适配性广的特点，可为后续研究与应用提供可靠的数据支撑。</p>",
    "ds_acq_start_time": "2021-11-11 00:00:00",
    "ds_acq_end_time": "2021-11-11 00:00:00",
    "ds_acq_place": "韭园沟流域",
    "ds_acq_lon_east": 110.43333333333334,
    "ds_acq_lat_south": 37.5,
    "ds_acq_lon_west": 110.25,
    "ds_acq_lat_north": 37.61666666666667,
    "ds_acq_alt_low": null,
    "ds_acq_alt_high": null,
    "ds_share_type": "apply-access",
    "ds_total_size": 548166363,
    "ds_files_count": 2,
    "ds_format": "*.tif, *.jpg",
    "ds_space_res": null,
    "ds_time_res": "",
    "ds_coordinate": "WGS84",
    "ds_projection": "",
    "ds_thumbnail": "7dd767bb-1be8-414e-a800-3a4291bfd39d.png",
    "ds_thumb_from": 2,
    "ds_ref_way": "",
    "paper_ref_way": "",
    "ds_ref_instruction": "",
    "ds_from_station": null,
    "organization_id": "52b7b79b-860c-49a5-9083-9a70cf8bed5a",
    "ds_serv_man": "李红星",
    "ds_serv_phone": "0931-4967592",
    "ds_serv_mail": "ncdc@lzb.ac.cn",
    "doi_value": "",
    "subject_codes": [],
    "quality_level": 3,
    "publish_time": "2025-06-03 16:20:52",
    "last_updated": "2025-06-04 09:16:07",
    "protected": false,
    "protected_to": null,
    "lang": "zh",
    "cstr": "11738.11.NCDC.DPSL.DB6865.2025",
    "i18n": {
        "en": {
            "title": "AI-Ready Standardized Semantic Segmentation Dataset for Silted Land Formed by Check Dams in the Loess Plateau",
            "ds_format": "",
            "ds_source": "<p>&emsp;&emsp;This study uses 0.75 m high-resolution imagery acquired by the PMS02 sensor onboard the Jilin-1 Wideband 01A satellite as its core data source (original imagery available at https://www.jl1mall.com/store/). After multi-temporal retrieval and quality assessment, a cloud-free scene (0% cloud cover) acquired on November 11, 2021 was selected. The image uses the WGS 1984 datum with a UTM projection (standard 6°-wide zone; the study area, at about 110.25–110.43°E, falls in UTM Zone 49N), covers 548.30 km², and was acquired at a solar elevation angle of 29.7493° and a sensor roll (off-nadir) angle of -2°.</p>",
            "ds_quality": "<p>&emsp;&emsp;The research team used field surveys (drone imagery and on-site positioning) to correct spatial deviations in the remote sensing interpretation, ensuring the accuracy of the check-dam land segmentation labels. Systematic controlled-variable experiments with multiple model architectures (mIoU above 80% and OA above 89%) then confirmed that the dataset has clearly delineated categories, high annotation quality, and a balanced sample distribution. Comparisons with public datasets showed significantly better spatial precision and reliability for this dataset (for example, extraction results agreed more closely with actual ground features). The dataset is therefore precise, consistent, and broadly compatible, providing a reliable data foundation for subsequent research and applications.</p>",
            "ds_ref_way": "",
            "ds_abstract": "<p>&emsp;&emsp;As a key soil and water conservation measure in the Loess Plateau, check dams play a central role in controlling soil erosion and safeguarding regional food security. However, their intelligent management has long been constrained by technical bottlenecks such as inefficient data acquisition, poor model generalization, and a lack of standardized datasets. Using 0.75 m high-resolution Jilin-1 remote sensing imagery, this study constructed an AI-ready standardized semantic segmentation dataset of check-dam silted land in the Loess Plateau, sampled from the Jiuyuangou watershed, a typical watershed of the region.</p>",
            "ds_time_res": "",
            "ds_acq_place": "Jiuyuangou watershed",
            "ds_space_res": "",
            "ds_projection": "",
            "ds_process_way": "<p>&emsp;&emsp;The dataset was produced in six stages:\n</p>\n<p>&emsp;&emsp;(1) Grid division: the original image of the sample area was regularly cropped into 256×256-pixel grid units, yielding 3,689 initial samples;\n</p>\n<p>&emsp;&emsp;(2) Sample screening: grid units containing no dam land were removed by visual interpretation, leaving 584 valid grid units;\n</p>\n<p>&emsp;&emsp;(3) Semantic annotation: all samples were annotated as vector polygons in the open-source Labelme platform, with \"gully\" as the single feature class;\n</p>\n<p>&emsp;&emsp;(4) Label conversion: the JSON vector annotation files were batch-rasterized into binary images with the GDAL library, keeping the labels strictly aligned with the spatial coordinates of the original images;\n</p>\n<p>&emsp;&emsp;(5) Data augmentation: combined geometric and radiometric transformations (mirror flipping, rotation, and brightness adjustment) were applied to the original samples, expanding the sample count to 2,920;\n</p>\n<p>&emsp;&emsp;(6) Dataset division: the augmented dataset was split into three independent subsets (training, validation, and test) at a ratio of 6:2:2.\nThis standardized workflow balances complete spatial representation of the samples against the generalization needs of algorithms, and the resulting multi-scale augmented dataset supports feature learning by mainstream convolutional neural networks.</p>",
            "ds_ref_instruction": ""
        }
    },
    "submit_center_id": "ncdc",
    "data_level": 0,
    "license_type": "CC BY 4.0",
    "doi_reg_from": "reg_local",
    "cstr_reg_from": "reg_local",
    "doi_not_reg_reason": null,
    "cstr_not_reg_reason": null,
    "ds_topic_tags": [
        "语义分割",
        "黄土高原坝地",
        "水土保持"
    ],
    "ds_subject_tags": [],
    "ds_class_tags": [],
    "ds_locus_tags": [
        "韭园沟流域"
    ],
    "ds_time_tags": [
        2021
    ],
    "ds_contributors": [
        {
            "true_name": "刘景琦",
            "email": "liujingqi@nieer.ac.cn",
            "work_for": "中国科学院西北生态环境资源研究院",
            "country": "中国"
        }
    ],
    "ds_meta_authors": [
        {
            "true_name": "刘景琦",
            "email": "liujingqi@nieer.ac.cn",
            "work_for": "中国科学院西北生态环境资源研究院",
            "country": "中国"
        }
    ],
    "ds_managers": [
        {
            "true_name": "李红星",
            "email": "lihongxing@lzb.ac.cn",
            "work_for": "中国科学院西北生态环境资源研究院",
            "country": "中国"
        }
    ],
    "category": "其他"
}