This dataset maps the burial depth of permafrost in Northeast China. It was produced by combining measured borehole and test-pit data with a machine learning model (random forest) driven by terrain, vegetation, meteorological, soil, and hydrological factors. The burial depth of permafrost is relatively small in the northern parts of the Greater and Lesser Khingan Mountains and relatively large in the southern and western parts of the Greater Khingan Mountains. Its accuracy makes the dataset suitable as a calibration benchmark and historical reference for simulating the depth distribution of permafrost in Northeast China under global warming. The data are provided in GeoTIFF format, with a spatial resolution of approximately 1 km and the WGS 1984 geographic coordinate system.
| Field | Value |
|---|---|
| Collection time | 2023/01/01 - 2024/12/31 |
| Collection place | Northeast China |
| Data size | 13.9 MiB |
| Data format | *.tif |
| Spatial resolution | 1 km |
| Temporal resolution | |
| Coordinate system | WGS84 |
Data sources: measured borehole and test-pit data, and environmental variable data. Five major categories of environmental variables (terrain, vegetation, climate, hydrology, and soil) were selected as predictive factors.
Terrain factors: Elevation, slope, aspect, topographic wetness index (TWI), topographic position index (TPI), and terrain relief are extracted from a digital elevation model (DEM).
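As an illustration of how such terrain factors can be derived from a DEM, the sketch below computes slope, TPI, and relief for the centre cell of a 3x3 elevation window. The function name `terrain_metrics`, the 3x3 neighbourhood, and the use of Horn's finite-difference weights for slope are assumptions made for this example; the source does not state which algorithms or window sizes were used.

```python
import math

def terrain_metrics(window, cell_size):
    """Slope (degrees), TPI and relief for the centre cell of a 3x3 window.

    `window` holds three rows of three elevations (metres); `cell_size` is
    the grid spacing in metres. The 3x3 neighbourhood is an assumption.
    """
    (a, b, c), (d, e, f), (g, h, i) = window
    # Horn's weighted finite differences in the x and y directions.
    dz_dx = ((c + 2 * f + i) - (a + 2 * d + g)) / (8 * cell_size)
    dz_dy = ((g + 2 * h + i) - (a + 2 * b + c)) / (8 * cell_size)
    slope_deg = math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
    # TPI: centre elevation minus the mean of its eight neighbours.
    neighbours = [a, b, c, d, f, g, h, i]
    tpi = e - sum(neighbours) / len(neighbours)
    # Relief: elevation range inside the window.
    cells = neighbours + [e]
    relief = max(cells) - min(cells)
    return slope_deg, tpi, relief

# A plane rising 1 m per 1 m cell towards the east.
slope, tpi, relief = terrain_metrics([[0, 1, 2], [0, 1, 2], [0, 1, 2]], 1.0)
```

On a uniformly tilted plane the TPI is zero, because the centre cell sits exactly at the mean elevation of its neighbours.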
Vegetation/hydrological factors: The normalized difference vegetation index (NDVI), enhanced vegetation index (EVI), and normalized difference water index (NDWI) are extracted from Landsat 8 remote sensing products.
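The three indices can be written as simple band ratios. The sketch below uses the standard formulations on surface reflectance values; the function names are illustrative, and the green/NIR form of NDWI is an assumption, since the source does not say which NDWI variant was used. For Landsat 8 OLI, blue, green, red, and NIR correspond to bands 2, 3, 4, and 5.

```python
def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator != 0 else float("nan")

def ndvi(nir, red):
    """Normalized difference vegetation index."""
    return _ratio(nir - red, nir + red)

def evi(nir, red, blue):
    """Enhanced vegetation index with the standard coefficients."""
    return _ratio(2.5 * (nir - red), nir + 6.0 * red - 7.5 * blue + 1.0)

def ndwi(green, nir):
    """Normalized difference water index (green/NIR form, assumed here)."""
    return _ratio(green - nir, green + nir)

# Example reflectances for a vegetated pixel (made-up values).
pixel = {"blue": 0.05, "green": 0.08, "red": 0.10, "nir": 0.50}
indices = (
    ndvi(pixel["nir"], pixel["red"]),
    evi(pixel["nir"], pixel["red"], pixel["blue"]),
    ndwi(pixel["green"], pixel["nir"]),
)
```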
Meteorological factors: Land surface temperature (LST) and precipitation are taken from existing data products and are used to calculate thawing and freezing indices, which serve as key intermediate input variables.
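Freezing and thawing indices are commonly defined as accumulated degree-days below and above 0 °C. The sketch below follows that definition; the use of daily mean temperatures as input and the function name are assumptions, since the source does not describe the temporal step of the temperature product.

```python
def freezing_thawing_indices(daily_mean_temps_c):
    """Return (freezing index, thawing index) in degree-days (°C·d).

    The freezing index sums the magnitude of sub-zero daily means; the
    thawing index sums the above-zero daily means.
    """
    freezing = sum(-t for t in daily_mean_temps_c if t < 0)
    thawing = sum(t for t in daily_mean_temps_c if t > 0)
    return freezing, thawing

# Five example days (made-up values).
fi, ti = freezing_thawing_indices([-10.0, -5.0, 0.0, 5.0, 15.0])
```

In practice the sums would run over a full freezing or thawing season for every grid cell.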
Data preprocessing: All multi-source raster data are spatially registered and standardized. They share the WGS 1984 geographic coordinate system and are clipped to the boundary of the study area. All variables are resampled to a uniform spatial resolution of 1 km and stored as GeoTIFF to ensure strict spatial matching of the multi-source data.
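One simple way to bring a finer grid to a coarser common resolution is block averaging. The sketch below is only an illustration of that idea under stated assumptions: the source does not name the resampling method, and `block_mean` and its `factor` parameter are hypothetical. Inputs coarser than the target grid would need interpolation instead.

```python
def block_mean(grid, factor, nodata=None):
    """Aggregate a 2D grid by averaging non-overlapping factor x factor blocks.

    Cells equal to `nodata` are skipped; a block made only of NoData cells
    yields `nodata`. Trailing rows or columns that do not fill a block are
    dropped.
    """
    rows, cols = len(grid), len(grid[0])
    out = []
    for r in range(0, rows - rows % factor, factor):
        out_row = []
        for c in range(0, cols - cols % factor, factor):
            values = [
                grid[rr][cc]
                for rr in range(r, r + factor)
                for cc in range(c, c + factor)
                if grid[rr][cc] != nodata
            ]
            out_row.append(sum(values) / len(values) if values else nodata)
        out.append(out_row)
    return out

fine = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
coarse = block_mean(fine, 2)
```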
The Extract Multi Values to Points tool in ArcGIS is used to extract the environmental variable values at each sample point and to build a high-dimensional "sample-environment feature" dataset. Each record in this dataset contains the target variable and its corresponding feature vector. The extracted results are checked for completeness, and samples containing missing values (NoData) or outliers are removed to ensure the quality of the model input.
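The integrity check can be sketched as a filter over the extracted sample table. The NoData sentinel `-9999.0`, the range-based outlier rule, and the feature names below are all hypothetical; the source does not specify how outliers were identified.

```python
import math

NODATA = -9999.0  # hypothetical NoData sentinel; use the value stored in the rasters

def is_missing(value, nodata=NODATA):
    """True for None, NaN, or the NoData sentinel."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return value == nodata

def clean_samples(samples, valid_ranges, nodata=NODATA):
    """Keep samples whose listed features are present and inside their range.

    `samples` is a list of dicts; `valid_ranges` maps a feature name to a
    (low, high) tuple that defines what counts as a plausible value.
    """
    kept = []
    for sample in samples:
        values = [sample.get(name) for name in valid_ranges]
        if any(is_missing(v, nodata) for v in values):
            continue  # drop samples with a missing value
        if all(low <= sample[name] <= high
               for name, (low, high) in valid_ranges.items()):
            kept.append(sample)
    return kept

samples = [
    {"id": 1, "elevation": 450.0, "ndvi": 0.62},
    {"id": 2, "elevation": -9999.0, "ndvi": 0.55},        # NoData elevation
    {"id": 3, "elevation": 820.0, "ndvi": 1.70},          # NDVI outside [-1, 1]
    {"id": 4, "elevation": 610.0, "ndvi": float("nan")},  # missing NDVI
]
ranges = {"elevation": (0.0, 3000.0), "ndvi": (-1.0, 1.0)}
cleaned = clean_samples(samples, ranges)
```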
Random forest model construction: Stratified random sampling is used to divide the dataset into a training set (70%) and a testing set (30%). A random forest classification model is built with the scikit-learn machine learning library in Python. To address sample imbalance, the class_weight parameter is set to 'balanced'. Key hyperparameters are optimized through grid search, which determines the number of decision trees (n_estimators = 1000), the maximum depth (max_depth), and the minimum number of samples for node splitting (min_samples_split); the random seed (random_state) is fixed to ensure reproducibility. Environmental variables are used as feature inputs and frozen ground types as labels for model training.
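To make the sampling and weighting steps concrete, the following standard-library sketch reproduces the idea behind a stratified 70/30 split and the 'balanced' class weighting, which scikit-learn computes as n_samples / (n_classes × class count). It is not the authors' code: the helper names, the seed value, and the class labels are hypothetical. In scikit-learn itself, `train_test_split(..., stratify=y)` and `RandomForestClassifier(class_weight="balanced")` cover the same steps.

```python
import random
from collections import Counter, defaultdict

def stratified_split(labels, test_fraction=0.3, seed=42):
    """Split sample indices so each class keeps roughly the same proportions."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)  # fixed seed for reproducibility
    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)

def balanced_class_weights(labels):
    """Weights inversely proportional to class frequency."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {label: n_samples / (n_classes * count)
            for label, count in counts.items()}

# Hypothetical, imbalanced label set.
labels = ["permafrost"] * 10 + ["seasonally frozen"] * 30
train_idx, test_idx = stratified_split(labels)
weights = balanced_class_weights(labels)
```

With these labels the minority class receives a weight three times that of the majority class, which is how the 'balanced' setting compensates for sample imbalance.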
The trained model is used to simulate the spatial distribution of soil moisture in the top 60 cm of the permafrost ground in Northeast China. The statistical relationship between the 60 cm soil moisture and the active layer soil moisture is then used to derive the spatial distribution of active layer moisture, expressed as gravimetric (weight) moisture content. Finally, the active layer thickness (ALT, in cm) across the entire watershed is simulated with the function ALT = 79.602 × w^(-0.41), fitted between the measured active layer thickness and the gravimetric moisture content (w). The simulated active layer thickness ranges from 62 cm to 147 cm. Compared with the measured active layer thickness, the RMSE of the simulation is approximately 30 cm. The results indicate that the model has high accuracy.
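The fitted relationship and the error metric can be evaluated as follows. This sketch reads the fitting function as a power law, ALT = 79.602 × w^(-0.41), which is an assumption because the exponent formatting in the source is ambiguous; the units of w are likewise not stated in the source, and the sample values are made up.

```python
import math

def active_layer_thickness_cm(w):
    """ALT in cm from gravimetric moisture content w (power-law reading, assumed)."""
    if w <= 0:
        raise ValueError("moisture content must be positive")
    return 79.602 * w ** -0.41

def rmse(simulated, measured):
    """Root mean square error between paired simulated and measured values."""
    if len(simulated) != len(measured) or not simulated:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(s - m) ** 2 for s, m in zip(simulated, measured)]
    return math.sqrt(sum(squared) / len(squared))

# Under this reading, wetter soils give a thinner active layer.
alt_dry = active_layer_thickness_cm(0.3)
alt_wet = active_layer_thickness_cm(1.5)

# RMSE on two made-up site pairs (cm).
error = rmse([100.0, 130.0], [70.0, 160.0])
```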
| # | Number | Name | Type |
|---|---|---|---|
| 1 | 2022FY100700 | Survey of Permafrost Conditions and Freeze-Thaw Damage in the High-Latitude Regions of Northeast China | Basic Resource Survey Project |
This work is licensed under CC BY 4.0 (Creative Commons Attribution 4.0 International License).
| # | title | file size |
|---|---|---|
| 1 | 东北1km多年冻土埋深图(2023-2024年).jpg | 2.1 MiB |
| 2 | 东北1km多年冻土埋深图(2023-2024年).tif | 11.7 MiB |
| 3 | 东北1km多年冻土埋深图(2023-2024年)_元数据.docx | 106.8 KiB |
| 4 | 东北1km多年冻土埋深图(2023-2024年)_说明文档.docx | 24.1 KiB |
©Copyright 2005-. Northwest Institute of Eco-Environment and Resources, CAS.
Donggang West Road 320, Lanzhou, Gansu, China (730000)

