{
    "created": "2025-05-28 18:32:37",
    "updated": "2026-04-20 09:47:16",
    "id": "9b9d8d44-4187-441d-8f97-f2ad10fa28fd",
    "version": 7,
    "ds_topic": null,
    "title_cn": "冰川湖图像数据集",
    "title_en": "Glacial Lake Image Dataset for \"Efficient glacial lake mapping by leveraging deep transfer learning and a new annotated glacial lake dataset\"",
    "ds_abstract": "<p>&emsp;&emsp;冰川湖是冰冻圈的重要组成部分，被认为是气候变化的关键哨兵。虽然卫星图像提供了一种直接的方法来监控其动态，但传统方法通常是主观且耗时的。深度学习技术虽然前景广阔，但一直受到标记冰川湖数据集稀缺的阻碍。为了解决这一限制，我们推出了 Glacial Lake 图像数据集 （GLID），这是同类数据集中第一个公开可用的集合。\n<p>&emsp;&emsp;该数据集包含 18,367 个（512 × 512 像素）样本对（湖泊多边形和相应的图像），这些样本对来自多个来源（WorldView-2、Sentinel-2、Landsat-8 和 Gaofen-2）的 36 个场景，覆盖整个喜马拉雅地区。然后，我们提出了一个用于冰川湖提取的可转移深度学习网络。我们的研究结果强调了高质量训练数据在模型性能中的关键作用。GLID 训练的模型取得了优异的结果，准确率为 95.36%，召回率为 87.50%，F1 评分为 91.66%，mIoU 为 82.07%。值得注意的是，这种方法在不同地区表现出有希望的可转移性，包括北美、南美、格陵兰岛和亚洲高山地区。GLID 数据集为推进基于机器学习的冰川制图研究提供了宝贵的资源。通过提供大规模、可公开访问的标记数据集合，我们旨在促进开发更准确、更高效的方法，以监测和了解气候变化对冰川湖生态系统的影响。",
    "ds_source": "<p>&emsp;&emsp;数据来源于：https://zenodo.org/records/14838695",
    "ds_process_way": "<p>&emsp;&emsp;GLID 的构建过程，包括数据预处理、数据集构建和后处理。\n<p>&emsp;&emsp;首先，对图像进行预处理，包括裁剪图像（用于空间范围为 150 km × 150 km 的手动注释）和选择波段（选择 R、G 和 B 波段）。所有图像均通过线性拉伸从原始 16 位转换为 8 位。请注意，所有图像的原始空间分辨率都保留了，没有重新采样，这可能会导致空间信息丢失，尤其是在缩小高分辨率数据（例如，WorldView-2 和 Gaofen-2）时。高亚洲冰川湖 （HAGL） 清单（Wang et al.， 2020）通过喜马拉雅地区进行裁剪，以促进进一步加工。在基于伦道夫冰川清单 （RGI） 6.0 的数据预处理中，还创建了一个 10 公里的冰川缓冲区（Pfeffer 等人，2014 年）。这种缓冲区后来用于非冰川湖过滤。\n<p>&emsp;&emsp;然后，在数据集构建过程中，以 2018 年的 HAGL 清单作为参考数据对冰川湖进行注释。对遗漏、不匹配和错误的冰川湖边界进行手动编辑，以获得与相应图像完全对齐的冰川湖标签。最后，实施了冰川湖的后处理。如前所述，为了排除非冰川湖，根据以前的研究使用了 10 公里的缓冲区（Chen et al.， 2021b， Tang et al.， 2024）。缓冲区外的冰川湖矢量被视为非冰川湖并消除。然后对冰川湖边界进行栅格化，并以相同的像素大小裁剪冰川湖标签和图像（例如，256 × 256、512 × 512 和 1024 × 1024）。样本裁剪后，没有有效值的标签（纯背景样本）被删除。\n<p>&emsp;&emsp;在对不同的样本量进行实验后，最终图像和标签在 Glacial Lake 图像数据集中为 512 × 512。",
    "ds_quality": "<p>&emsp;&emsp;",
    "ds_acq_start_time": null,
    "ds_acq_end_time": null,
    "ds_acq_place": "",
    "ds_acq_lon_east": null,
    "ds_acq_lat_south": null,
    "ds_acq_lon_west": null,
    "ds_acq_lat_north": null,
    "ds_acq_alt_low": null,
    "ds_acq_alt_high": null,
    "ds_share_type": "login-access",
    "ds_total_size": 8495124774,
    "ds_files_count": 6,
    "ds_format": "png,tif,excel,shp,",
    "ds_space_res": "",
    "ds_time_res": "",
    "ds_coordinate": "无",
    "ds_projection": "",
    "ds_thumbnail": "9b9d8d44-4187-441d-8f97-f2ad10fa28fd.jpg",
    "ds_thumb_from": 2,
    "ds_ref_way": "",
    "paper_ref_way": "",
    "ds_ref_instruction": "",
    "ds_from_station": null,
    "organization_id": "0a4269e1-65f4-45f1-aeba-88ea3068eebf",
    "ds_serv_man": "敏玉芳",
    "ds_serv_phone": "0931-4967596",
    "ds_serv_mail": "ncdc@lzb.ac.cn",
    "doi_value": "",
    "subject_codes": [
        "170.45"
    ],
    "quality_level": 3,
    "publish_time": "2025-05-29 16:16:27",
    "last_updated": "2026-01-14 10:40:44",
    "protected": false,
    "protected_to": null,
    "lang": "zh",
    "cstr": "11738.11.NCDC.ZENODO.DB6862.2025",
    "i18n": {
        "en": {
            "title": "Glacial Lake Image Dataset for \"Efficient glacial lake mapping by leveraging deep transfer learning and a new annotated glacial lake dataset\"",
            "ds_format": "",
            "ds_source": "<p>&emsp; &emsp; Data source: https://zenodo.org/records/14838695",
            "ds_quality": "<p>&emsp;&emsp;",
            "ds_ref_way": "",
            "ds_abstract": "<p>  Glacial lakes, crucial components of the cryosphere, are recognized as key sentinels of climate change. While satellite imagery offers a straightforward method for monitoring their dynamics, traditional approaches are often subjective and time-consuming. Deep learning techniques, though promising, have been hindered by the scarcity of labeled glacial lake datasets. To address this limitation, we present the Glacial Lake Image Dataset (GLID), the first publicly available collection of its kind. This dataset comprises 18,367 (512 × 512 pixels) sample pairs (lake polygons and corresponding images) derived from 36 scenes from across multiple sources (WorldView-2, Sentinel-2, Landsat-8, and Gaofen-2), covering the entire Himalayan region. We then propose a transferable deep learning network for glacial lake extraction. Our findings underscore the critical role of high-quality training data in model performance. The GLID-trained model achieved superior results, demonstrating a Precision of 95.36 %, Recall of 87.50 %, F1 score of 91.66 %, and mIoU of 82.07 %. Notably, this method exhibits promising transferability across diverse regions, including North America, South America, Greenland, and High Mountain Asia. The GLID dataset provides a valuable resource for advancing machine learning-based glacial mapping research. By offering a large-scale, publicly accessible collection of labeled data, we aim to facilitate the development of more accurate and efficient methods for monitoring and understanding the impacts of climate change on glacial lake ecosystems.</p>",
            "ds_time_res": "",
            "ds_acq_place": "",
            "ds_space_res": "",
            "ds_projection": "",
            "ds_process_way": "<p>&emsp; &emsp; The construction process of GLID includes data preprocessing, dataset construction, and post-processing.\n<p>&emsp; &emsp; Firstly, preprocess the image, including cropping the image (for manual annotation within a spatial range of 150 km x 150 km) and selecting bands (selecting R, G, and B bands). All images were converted from the original 16 bit to 8-bit through linear stretching. Please note that the original spatial resolution of all images has been preserved without resampling, which may result in loss of spatial information, especially when reducing high-resolution data such as WorldView-2 and Gaofen-2. The High Asian Glacial Lakes (HAGL) inventory (Wang et al., 2020) was trimmed through the Himalayan region to facilitate further processing. In the data preprocessing based on the Randolph Glacier Inventory (RGI) 6.0, a 10 kilometer glacier buffer zone was also created (Pfeffer et al., 2014). This buffer zone was later used for non glacial lake filtration.\n<p>&emsp; &emsp; Then, during the dataset construction process, the 2018 HAGL inventory was used as reference data to annotate the glacial lake. Manually edit missing, mismatched, and incorrect glacier lake boundaries to obtain glacier lake labels that are perfectly aligned with the corresponding images. Finally, post-treatment of glacial lakes was implemented. As mentioned earlier, in order to exclude non glacial lakes, a 10 kilometer buffer zone was used based on previous studies (Chen et al., 2021b, Tang et al., 2024). The vector of glacial lakes outside the buffer zone is considered as non glacial lakes and eliminated. Then rasterize the boundaries of the glacier lake and crop the glacier lake labels and images with the same pixel size (e.g. 256 × 256, 512 × 512, and 1024 × 1024). After sample cropping, labels without valid values (pure background samples) are removed.\n<p>&emsp; &emsp; After conducting experiments on different sample sizes, the final image and label were 512 × 512 in the Glacial Lake image dataset.",
            "ds_ref_instruction": ""
        }
    },
    "submit_center_id": "ncdc",
    "data_level": 0,
    "license_type": "CC BY 4.0",
    "doi_reg_from": "reg_outside",
    "cstr_reg_from": "reg_outside",
    "doi_not_reg_reason": null,
    "cstr_not_reg_reason": null,
    "ds_topic_tags": [
        "冰川湖制图",
        "冰川湖图像数据集 （GLID）",
        "机器学习",
        "深度迁移学习",
        "标记数据集"
    ],
    "ds_subject_tags": [
        "地理学"
    ],
    "ds_class_tags": [],
    "ds_locus_tags": [
        "北美",
        "南美",
        "格陵兰岛",
        "亚洲高山区"
    ],
    "ds_time_tags": [],
    "ds_contributors": [
        {
            "true_name": "马东晖",
            "email": "cuyi1457189zhi@163.com",
            "work_for": "南方科技大学",
            "country": "中国"
        }
    ],
    "ds_meta_authors": [
        {
            "true_name": "马东晖",
            "email": "cuyi1457189zhi@163.com",
            "work_for": "南方科技大学",
            "country": "中国"
        }
    ],
    "ds_managers": [
        {
            "true_name": "马东晖",
            "email": "cuyi1457189zhi@163.com",
            "work_for": "南方科技大学",
            "country": "中国"
        }
    ],
    "category": "冰川"
}