Glacial lakes, crucial components of the cryosphere, are recognized as key sentinels of climate change. While satellite imagery offers a straightforward way to monitor their dynamics, traditional mapping approaches are often subjective and time-consuming. Deep learning techniques, though promising, have been hindered by the scarcity of labeled glacial lake datasets. To address this limitation, we present the Glacial Lake Image Dataset (GLID), the first publicly available collection of its kind. The dataset comprises 18,367 sample pairs (lake polygons and corresponding images) of 512 × 512 pixels, derived from 36 scenes from multiple sources (WorldView-2, Sentinel-2, Landsat-8, and Gaofen-2) and covering the entire Himalayan region. We then propose a transferable deep learning network for glacial lake extraction. Our findings underscore the critical role of high-quality training data in model performance. The GLID-trained model achieved a Precision of 95.36 %, Recall of 87.50 %, F1 score of 91.66 %, and mIoU of 82.07 %. Notably, the method shows promising transferability across diverse regions, including North America, South America, Greenland, and High Mountain Asia. GLID provides a valuable resource for advancing machine learning-based glacial lake mapping research. By offering a large-scale, publicly accessible collection of labeled data, we aim to facilitate the development of more accurate and efficient methods for monitoring glacial lakes and understanding the impacts of climate change on them.
| data size | 7.9 GiB |
|---|---|
Data source: https://zenodo.org/records/14838695
The construction process of GLID includes data preprocessing, dataset construction, and post-processing.
First, the images were preprocessed: each scene was cropped (to a spatial extent of 150 km × 150 km for manual annotation) and the R, G, and B bands were selected. All images were converted from the original 16-bit to 8-bit depth through linear stretching. Note that the original spatial resolution of every image was preserved: no resampling was applied, because resampling may cause a loss of spatial information, especially when downsampling high-resolution data such as WorldView-2 and Gaofen-2. The High Asian Glacial Lakes (HAGL) inventory (Wang et al., 2020) was clipped to the Himalayan region to facilitate further processing. A 10 km glacier buffer zone was also created during preprocessing, based on the Randolph Glacier Inventory (RGI) 6.0 (Pfeffer et al., 2014). This buffer zone was later used to filter out non-glacial lakes.
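The 16-bit to 8-bit conversion can be illustrated with a minimal sketch. It is not the code used to build GLID: the function name `linear_stretch_to_8bit` is hypothetical, and the stretch limits (here the band's own minimum and maximum) are an assumption, since the limits actually applied are not stated. Plain Python lists are used to keep the example self-contained; an array library would normally be used for real scenes.

```python
def linear_stretch_to_8bit(band, low=None, high=None):
    """Linearly rescale a 2-D band of 16-bit values to the range 0-255.

    `low` and `high` are the input values mapped to 0 and 255. By default
    the band's own minimum and maximum are used (an assumption; the limits
    used for GLID are not documented here).
    """
    flat = [value for row in band for value in row]
    lo = min(flat) if low is None else low
    hi = max(flat) if high is None else high
    if hi <= lo:
        # Constant band: there is nothing to stretch, so return zeros.
        return [[0 for _ in row] for row in band]
    scale = 255.0 / (hi - lo)
    # Scale each value, round to an integer, and clamp to the 8-bit range.
    return [
        [min(255, max(0, round((value - lo) * scale))) for value in row]
        for row in band
    ]


# Toy 2 x 2 band spanning the full 16-bit range.
band16 = [[0, 16384], [32768, 65535]]
band8 = linear_stretch_to_8bit(band16)
```

The stretch changes only the radiometric depth. The pixel grid, and therefore the spatial resolution, is left as it was.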
Then, during dataset construction, the 2018 HAGL inventory was used as reference data to annotate the glacial lakes. Missing, mismatched, and incorrect glacial lake boundaries were edited manually to obtain labels that are precisely aligned with the corresponding images. Finally, the glacial lake vectors were post-processed. As mentioned above, a 10 km glacier buffer zone was used to exclude non-glacial lakes, following previous studies (Chen et al., 2021b; Tang et al., 2024). Lake vectors outside the buffer zone were treated as non-glacial lakes and removed. The glacial lake boundaries were then rasterized, and the labels and images were cropped into samples of equal pixel size (e.g., 256 × 256, 512 × 512, and 1024 × 1024). After cropping, samples whose labels contained no valid values (pure background samples) were removed.
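The cropping and background-removal step can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function name `tile_pairs` is hypothetical, tiles are assumed to be non-overlapping, incomplete tiles at the right and bottom edges are discarded, and a label value of 0 is assumed to mean background.

```python
def tile_pairs(image, label, size):
    """Cut an image/label pair into non-overlapping size x size tiles.

    Only tiles whose label contains at least one non-zero (lake) pixel are
    kept. Returns a list of ((row, col), image_tile, label_tile) tuples,
    where (row, col) is the tile's upper-left corner in the full raster.
    Assumptions: 0 means background; partial edge tiles are dropped.
    """
    rows, cols = len(label), len(label[0])
    pairs = []
    for r in range(0, rows - size + 1, size):
        for c in range(0, cols - size + 1, size):
            label_tile = [row[c:c + size] for row in label[r:r + size]]
            has_lake = any(px != 0 for row in label_tile for px in row)
            if not has_lake:
                continue  # pure-background sample, removed
            image_tile = [row[c:c + size] for row in image[r:r + size]]
            pairs.append(((r, c), image_tile, label_tile))
    return pairs


# Toy 4 x 4 example: lake pixels fall in the top-left and bottom-right tiles.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
label = [
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 1],
]
pairs = tile_pairs(image, label, 2)
origins = [origin for origin, _, _ in pairs]
```

In this toy case two of the four candidate tiles survive. Calling the same function with `size=512` on full-size rasters would correspond to the sample size adopted for GLID.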
After experiments with different sample sizes, a final size of 512 × 512 pixels was adopted for the images and labels in the Glacial Lake Image Dataset.
This work is licensed under CC BY 4.0 (Creative Commons Attribution 4.0 International License).
| # | title | file size |
|---|---|---|
| 1 | GLID.rar | 6.6 GiB |
| 2 | GLID_annotation.zip | 3.0 MiB |
| 3 | Optical_images_source.xlsx | 11.1 KiB |
| 4 | Transferability validation.zip | 304.3 MiB |
| 5 | val.zip | 1.0 GiB |
Keywords: Glacial lake mapping; Glacial Lake Image Dataset (GLID); Machine Learning; Deep transfer learning; Labeled dataset
North America, South America, Greenland, and the high mountain regions of Asia
©Copyright 2005-. Northwest Institute of Eco-Environment and Resources, CAS.
Donggang West Road 320, Lanzhou, Gansu, China (730000)

