Glacial Lake Image Dataset for "Efficient glacial lake mapping by leveraging deep transfer learning and a new annotated glacial lake dataset"

CSTR

CSTR:11738.11.NCDC.ZENODO.DB6862.2025

DOI

10.5281/zenodo.14838695

Category

Glacier

Contributors

MA Donghui

Dataset description

Glacial lakes, crucial components of the cryosphere, are recognized as key sentinels of climate change. While satellite imagery offers a straightforward means of monitoring their dynamics, traditional mapping approaches are often subjective and time-consuming. Deep learning techniques, though promising, have been hindered by the scarcity of labeled glacial lake datasets. To address this limitation, we present the Glacial Lake Image Dataset (GLID), the first publicly available collection of its kind. The dataset comprises 18,367 sample pairs of 512 × 512 pixels (images and corresponding lake polygon labels) derived from 36 scenes from multiple sensors (WorldView-2, Sentinel-2, Landsat-8, and Gaofen-2), covering the entire Himalayan region. We then propose a transferable deep learning network for glacial lake extraction. Our findings underscore the critical role of high-quality training data in model performance. The GLID-trained model achieved a Precision of 95.36 %, Recall of 87.50 %, F1 score of 91.66 %, and mIoU of 82.07 %. Notably, the method shows promising transferability across diverse regions, including North America, South America, Greenland, and High Mountain Asia. GLID provides a valuable resource for advancing machine learning-based glacial lake mapping research. By offering a large-scale, publicly accessible collection of labeled data, we aim to facilitate the development of more accurate and efficient methods for monitoring and understanding the impacts of climate change on glacial lake ecosystems.
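The scores quoted above are standard pixel-wise segmentation metrics. As a minimal sketch of how such values can be computed from a predicted mask and a reference mask, consider the function below. The name `segmentation_metrics` is hypothetical, and the sketch assumes that mIoU is the mean of the lake-class and background-class IoU, which this page does not specify.

```python
import numpy as np


def segmentation_metrics(pred, truth):
    """Pixel-wise metrics for binary masks (1 = lake, 0 = background)."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)

    # Confusion-matrix counts for the lake class.
    tp = int(np.sum(pred & truth))
    fp = int(np.sum(pred & ~truth))
    fn = int(np.sum(~pred & truth))
    tn = int(np.sum(~pred & ~truth))

    def ratio(num, den):
        # Return 0.0 instead of dividing by zero on empty classes.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    f1 = ratio(2 * precision * recall, precision + recall)
    iou_lake = ratio(tp, tp + fp + fn)
    iou_background = ratio(tn, tn + fp + fn)
    # Assumption: mIoU averages the two class IoUs.
    miou = (iou_lake + iou_background) / 2
    return {"precision": precision, "recall": recall, "f1": f1, "miou": miou}
```

In practice the counts would be accumulated over all validation tiles before the ratios are taken, rather than averaged tile by tile.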

Base information

Data size	7.9 GiB

Data source description

Data source: https://zenodo.org/records/14838695

Data processing method

The construction process of GLID includes data preprocessing, dataset construction, and post-processing.

First, the images were preprocessed by cropping (to a spatial extent within 150 km × 150 km for manual annotation) and band selection (the R, G, and B bands). All images were converted from the original 16-bit to 8-bit depth through linear stretching. Note that the original spatial resolution of all images was preserved: no resampling was applied, because resampling may cause loss of spatial information, especially when downsampling high-resolution data such as WorldView-2 and Gaofen-2. The High Asian Glacial Lakes (HAGL) inventory (Wang et al., 2020) was clipped to the Himalayan region to facilitate further processing. A 10 km buffer zone around glaciers was also created from the Randolph Glacier Inventory (RGI) 6.0 (Pfeffer et al., 2014). This buffer zone was later used to filter out non-glacial lakes.
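The 16-bit to 8-bit conversion can be illustrated with a short sketch. The stretch limits used for GLID are not stated on this page, so the code below assumes a per-band stretch between the band minimum and maximum, with optional percentile limits; the function names `linear_stretch_to_uint8` and `rgb_to_uint8` are hypothetical.

```python
import numpy as np


def linear_stretch_to_uint8(band, low_pct=0.0, high_pct=100.0):
    """Linearly rescale one 16-bit band to the 0-255 range."""
    band = np.asarray(band, dtype=np.float64)
    # Assumption: limits are band percentiles (0 and 100 give min and max).
    lo, hi = np.percentile(band, [low_pct, high_pct])
    if hi <= lo:  # constant band: avoid division by zero
        return np.zeros(band.shape, dtype=np.uint8)
    scaled = (np.clip(band, lo, hi) - lo) / (hi - lo) * 255.0
    return np.round(scaled).astype(np.uint8)


def rgb_to_uint8(image_16bit):
    """Stretch the R, G, and B bands of an (H, W, 3) array independently."""
    bands = [linear_stretch_to_uint8(image_16bit[..., i]) for i in range(3)]
    return np.stack(bands, axis=-1)
```

The array shape is unchanged by this step, which is consistent with the statement that the spatial resolution was preserved; only the radiometric depth is reduced.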

Then, during dataset construction, the 2018 HAGL inventory was used as reference data for annotating glacial lakes. Missing, mismatched, and incorrect glacial lake boundaries were edited manually to obtain glacial lake labels aligned with the corresponding images. Finally, the glacial lake vectors were post-processed. As mentioned above, a 10 km glacier buffer zone was used to exclude non-glacial lakes, following previous studies (Chen et al., 2021b; Tang et al., 2024). Lake vectors outside the buffer zone were treated as non-glacial lakes and removed. The glacial lake boundaries were then rasterized, and the labels and images were cropped into samples of equal pixel size (e.g., 256 × 256, 512 × 512, and 1024 × 1024). After cropping, samples whose labels contained no valid values (pure background samples) were removed.

After experiments with different sample sizes, a final image and label size of 512 × 512 pixels was adopted for GLID.
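The cropping and background-removal steps described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function name `crop_sample_pairs` is hypothetical, tiles are assumed not to overlap, and partial tiles at the right and bottom edges are assumed to be dropped, none of which is specified on this page.

```python
import numpy as np


def crop_sample_pairs(image, label, tile=512):
    """Cut aligned image and label arrays into tile x tile sample pairs.

    image: (H, W, 3) array; label: (H, W) array with non-zero lake pixels.
    Tiles whose label holds no lake pixels are skipped.
    """
    height, width = label.shape[:2]
    pairs = []
    # Assumption: non-overlapping grid; incomplete edge tiles are dropped.
    for top in range(0, height - tile + 1, tile):
        for left in range(0, width - tile + 1, tile):
            label_tile = label[top:top + tile, left:left + tile]
            if not label_tile.any():  # pure background sample
                continue
            image_tile = image[top:top + tile, left:left + tile]
            pairs.append((image_tile, label_tile))
    return pairs
```

Changing `tile` to 256 or 1024 reproduces the other sample sizes mentioned above, which makes it straightforward to compare sample sizes on the same scenes.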

Citations and annotations

To protect the rights and interests of the platform's scientific and technological resources, expand the services of the platform center, and enhance the application potential of these resources, users are requested to acknowledge and cite the resources in the following manner in any research results generated from their use (including published papers, monographs, data products, and unpublished research reports).

For results published in English, please use the following acknowledgement: The dataset is provided by the National Cryosphere Desert Data Center (http://www.ncdc.ac.cn).

Data reference

MA Donghui. Glacial Lake Image Dataset for "Efficient glacial lake mapping by leveraging deep transfer learning and a new annotated glacial lake dataset". National Cryosphere Desert Data Center (http://www.ncdc.ac.cn), 2025. https://cstr.cn/CSTR:11738.11.NCDC.ZENODO.DB6862.2025.
MA Donghui. Glacial Lake Image Dataset for "Efficient glacial lake mapping by leveraging deep transfer learning and a new annotated glacial lake dataset". National Cryosphere Desert Data Center (http://www.ncdc.ac.cn), 2025. https://www.doi.org/10.5281/zenodo.14838695.

Papers to be cited

Ma Donghui, Li Jie, Jiang Liguang. Efficient glacial lake mapping by leveraging deep transfer learning and a new annotated glacial lake dataset[J]. Journal of Hydrology, 2025, 657:133072. DOI: https://doi.org/10.1016/j.jhydrol.2025.133072.

License agreement

This work is licensed under CC BY 4.0 (Creative Commons Attribution 4.0 International License).

Relevant data

#	Dataset title
1	High spatial resolution satellite images for glacier outlines
2	Rapid glacier Shrinkage and Glacial Lake Expansion of a China-Nepal Transboundary Catchment in the Central Himalayas dataset (1964-2020)
3	Tibetan Plateau Alpine Desert Plant Image Dataset (2017-2021)
4	Tarim the Junggar Basin desert plant image dataset (2017-2021)
5	Alxa Plateau Hexi Corridor Desert Plant Image Dataset (2017-2021)
6	West Ordos Plateau - Desert Plant Image Dataset at the North Foot of Yinshan Mountain (2017-2021)
7	road surface conditions dataset (2023-2024)
8	Glacial lakes in the Qinghai Tibet Plateau and adjacent areas 1:2 million (2008)
9	Asian Seasonal Rice Yield Dataset (1995-2015)
10	Numerical Simulation Dataset of Loess Slope Landslide Slip Lines Under Extreme Rainfall Conditions

File list

#	Title	File size
1	GLID.rar	6.6 GiB
2	GLID_annotation.zip	3.0 MiB
3	Optical_images_source.xlsx	11.1 KiB
4	Transferability validation.zip	304.3 MiB
5	val.zip	1.0 GiB

How to get the data

Log in to download.

Theme:
Glacial lake mapping; Glacial Lake Image Dataset (GLID); Machine Learning; Deep transfer learning; Labeled dataset
Subject:
Geography
Place:
North America, South America, Greenland, and the high mountain regions of Asia

Contact information

Contributor: MA Donghui
Author: MA Donghui
Manager: MA Donghui

Platform Service information

Contact: Li Hongxing
Phone: 0931-4967592
Email: ncdc@lzb.ac.cn