{
    "created": "2026-04-03 10:41:13",
    "updated": "2026-05-18 16:37:53",
    "id": "556b3cac-b9fd-4523-a890-cb404d101824",
    "version": 10,
    "ds_topic": null,
    "title_cn": "东北1km多年冻土埋深图（2023-2024年）",
    "title_en": "1km permafrost depth map of Northeast China (2023-2024)",
    "ds_abstract": "<p>&emsp;&emsp;本数据集为东北地区多年冻土埋深分布数据，结合实测钻孔和坑探数据，以地形因子、植被因子、气象因子、土壤与水文因子数据为驱动，采用机器学习方法（随机森林）进行模型构建。多年冻土埋深在大小兴安岭北部较小，在大兴安岭南部和西部埋深较大。可靠的精度使得此冻土分布数据可以作为全球变暖背景下东北地区多年冻土埋深分布模拟的标定基准和历史参考。数据格式为GeoTIFF，空间分辨率约1km，地理坐标系为WGS 1984。",
    "ds_source": "<p>&emsp;&emsp;实测钻孔和坑探数据；环境变量数据：选取了地形、植被、气候、水文及土壤五大类环境变量作为预测因子。\n<p>&emsp;&emsp;地形因子：基于数字高程模型（DEM）提取海拔、坡度、坡向、地形湿度指数、地形位置指数及地形起伏度。\n<p>&emsp;&emsp;植被/水文因子：利用Landsat8遥感产品提取归一化植被指数（NDVI）、增强型植被指数（EVI）、归一化水体指数（NDWI）。\n<p>&emsp;&emsp;气象因子：地表温度（LST）和降水数据则基于产品数据，并以此推算融化指数、冻结指数，作为关键中间变量输入。",
    "ds_process_way": "<p>&emsp;&emsp;数据预处理：对所有多源栅格数据进行空间配准与标准化处理。统一地理坐标系为 WGS_1984，将空间范围裁剪至研究区边界，并采用重采样技术将所有变量的空间分辨率统一至1km，格式统一为GeoTIFF，确保多源数据在空间上的严格匹配。\n<p>&emsp;&emsp;利用ArcGIS的多值提取至点（Extract Multi-Values to Points）功能，提取每个样本点对应的环境变量数值，构建“样本-环境特征”高维数据集。构建的样本数据集包含目标变量及对应的特征向量。对提取结果进行完整性检查，剔除含有缺失值（NoData）或异常值的样本，确保模型输入数据的质量。\n<p>&emsp;&emsp;CatBoost模型构建：采用分层随机抽样法（Stratified Random Sampling），将数据集划分为训练集（70%）和测试集（30%）。基于Python环境下的scikit-learn机器学习库构建随机森林分类模型。针对样本不平衡问题，将class_weight参数设为 'balanced'。通过网格搜索对关键超参数进行优化，最终确定决策树数量（n_estimators）为1000，最大深度（max_depth）及节点分裂最小样本数（min_samples_split）等参数，并固定随机种子（random_state）以保证结果的可重复性。将环境变量作为特征输入，冻土类型作为标签进行模型训练。\n<p>&emsp;&emsp;利用CatBoost模型模拟得到东北多年冻土区表层60cm土壤水分空间分布，然后利用60cm土壤水分与活动层土壤水分的统计关系计算得到活动层水分空间分布，活动层水分为重量含水率。最后，通过实测活动层厚度与重量含水率(w)的拟合函数ALT=79.602×w<sup>-0.41</sup>模拟得到整个研究区的活动层厚度，单位为cm。模拟的活动层厚度最大值为147cm，最小值为62cm，与实测活动层厚度对比，模拟结果的RMSE≈30cm。结果显示，模型具有较高的精度。",
    "ds_quality": "<p>&emsp;&emsp;通过实测活动层厚度与重量含水率(w)的拟合函数ALT=79.602×w<sup>-0.41</sup>模拟得到整个研究区的活动层厚度，单位为cm。模拟的活动层厚度最大值为147cm，最小值为62cm，与实测活动层厚度对比，模拟结果的RMSE≈30cm。",
    "ds_acq_start_time": "2023-01-01 00:00:00",
    "ds_acq_end_time": "2024-12-31 00:00:00",
    "ds_acq_place": "中国东北地区",
    "ds_acq_lon_east": 135.08916666666667,
    "ds_acq_lat_south": 38.730555555555554,
    "ds_acq_lon_west": 111.15833333333335,
    "ds_acq_lat_north": 53.558055555555555,
    "ds_acq_alt_low": null,
    "ds_acq_alt_high": null,
    "ds_share_type": "login-access",
    "ds_total_size": 14607163,
    "ds_files_count": 3,
    "ds_format": "*.tif",
    "ds_space_res": "1km",
    "ds_time_res": "2年",
    "ds_coordinate": "WGS84",
    "ds_projection": "WGS_1984_Albers",
    "ds_thumbnail": "556b3cac-b9fd-4523-a890-cb404d101824.jpg",
    "ds_thumb_from": 2,
    "ds_ref_way": "",
    "paper_ref_way": "",
    "ds_ref_instruction": "",
    "ds_from_station": null,
    "organization_id": "221ebf56-1b0b-4574-972b-1fb6d3cf1be7",
    "ds_serv_man": "敏玉芳",
    "ds_serv_phone": "0931-4967596",
    "ds_serv_mail": "ncdc@lzb.ac.cn",
    "doi_value": "",
    "subject_codes": [
        "170.45"
    ],
    "quality_level": 3,
    "publish_time": "2026-04-03 15:20:45",
    "last_updated": "2026-05-12 11:12:22",
    "protected": false,
    "protected_to": null,
    "lang": "zh",
    "cstr": "11738.11.NCDC.NIEER.DB7271.2026",
    "i18n": {
        "en": {
            "title": "1km permafrost depth map of Northeast China (2023-2024)",
            "ds_format": "*.tif",
            "ds_source": "<p>&emsp;Measured borehole and test-pit data; environmental variable data: five major categories of environmental variables (terrain, vegetation, climate, hydrology, and soil) were selected as predictors.\r\n<p>&emsp;Terrain factors: elevation, slope, aspect, topographic wetness index, topographic position index, and terrain relief were extracted from a digital elevation model (DEM).\r\n<p>&emsp;Vegetation/hydrological factors: the normalized difference vegetation index (NDVI), enhanced vegetation index (EVI), and normalized difference water index (NDWI) were extracted from Landsat 8 remote sensing products.\r\n<p>&emsp;Meteorological factors: land surface temperature (LST) and precipitation were taken from existing data products and used to derive the thawing and freezing indices, which serve as key intermediate input variables.",
            "ds_quality": "<p>&emsp;The active layer thickness (ALT, in cm) across the study area was simulated with the function ALT = 79.602 × w<sup>-0.41</sup>, fitted between measured active layer thickness and gravimetric water content (w). The simulated active layer thickness ranges from 62cm to 147cm. Compared with measured active layer thickness, the simulation has an RMSE of approximately 30cm.",
            "ds_ref_way": "",
            "ds_abstract": "<p>&emsp;This dataset describes the spatial distribution of permafrost burial depth in Northeast China. The model was built with a machine learning method (random forest), using measured borehole and test-pit data together with terrain, vegetation, meteorological, soil, and hydrological factors as drivers. Permafrost burial depth is relatively shallow in the northern Greater and Lesser Khingan Mountains and relatively deep in the southern and western Greater Khingan Mountains. Its reliable accuracy allows this dataset to serve as a calibration benchmark and historical reference for simulating the distribution of permafrost burial depth in Northeast China under global warming. The data are in GeoTIFF format, with a spatial resolution of approximately 1km and the WGS 1984 geographic coordinate system.",
            "ds_time_res": "2 years",
            "ds_acq_place": "Northeast China",
            "ds_space_res": "1km",
            "ds_projection": "WGS_1984_Albers",
            "ds_process_way": "<p>&emsp;Data preprocessing: all multi-source raster data were spatially co-registered and standardized. The geographic coordinate system was unified to WGS 1984, the spatial extent was clipped to the study area boundary, and all variables were resampled to a common spatial resolution of 1km and converted to GeoTIFF, ensuring strict spatial alignment of the multi-source data.\r\n<p>&emsp;The ArcGIS Extract Multi Values to Points tool was used to extract the environmental variable values at each sample point, building a high-dimensional \"sample-environmental feature\" dataset that contains the target variable and the corresponding feature vectors. The extracted results were checked for completeness, and samples containing missing values (NoData) or outliers were removed to ensure the quality of the model input data.\r\n<p>&emsp;CatBoost model construction: stratified random sampling was used to split the dataset into a training set (70%) and a test set (30%). A random forest classification model was built with the scikit-learn machine learning library in Python. To address class imbalance, the class_weight parameter was set to 'balanced'. Key hyperparameters were tuned by grid search; the number of decision trees (n_estimators) was set to 1000, the maximum depth (max_depth), the minimum number of samples required to split a node (min_samples_split), and other parameters were determined, and the random seed (random_state) was fixed to ensure reproducibility. The environmental variables were used as input features and frozen ground types as labels for model training.\r\n<p>&emsp;The CatBoost model was used to simulate the spatial distribution of soil moisture in the top 60cm of the permafrost region of Northeast China. The statistical relationship between 60cm soil moisture and active layer soil moisture was then used to calculate the spatial distribution of active layer moisture, expressed as gravimetric water content. Finally, the active layer thickness (in cm) across the study area was simulated with the function ALT = 79.602 × w<sup>-0.41</sup>, fitted between measured active layer thickness and gravimetric water content (w). The simulated active layer thickness ranges from 62cm to 147cm; compared with measured active layer thickness, the simulation has an RMSE of approximately 30cm. The results show that the model has high accuracy.",
            "ds_ref_instruction": ""
        }
    },
    "submit_center_id": "ncdc",
    "data_level": 0,
    "recommendation_value": 0,
    "license_type": "https://creativecommons.org/licenses/by/4.0/",
    "doi_reg_from": "reg_local",
    "cstr_reg_from": "reg_local",
    "doi_not_reg_reason": null,
    "cstr_not_reg_reason": null,
    "is_paper_in_submitting": false,
    "ds_topic_tags": [
        "多年冻土",
        "活动层厚度",
        "重量含水率"
    ],
    "ds_subject_tags": [
        "地理学"
    ],
    "ds_class_tags": [],
    "ds_locus_tags": [
        "中国东北地区"
    ],
    "ds_time_tags": [
        2023,
        2024
    ],
    "ds_contributors": [
        {
            "true_name": "杜二计",
            "email": "duerji@lzb.ac.cn",
            "work_for": "中国科学院西北生态环境资源研究院",
            "country": "中国"
        },
        {
            "true_name": "赵林",
            "email": "lzhao@nuist.edu.cn",
            "work_for": "南京信息工程大学",
            "country": "中国"
        }
    ],
    "ds_meta_authors": [
        {
            "true_name": "杜二计",
            "email": "duerji@lzb.ac.cn",
            "work_for": "中国科学院西北生态环境资源研究院",
            "country": "中国"
        }
    ],
    "ds_managers": [
        {
            "true_name": "王翀",
            "email": "wangchong2022@nuist.edu.cn",
            "work_for": "南京信息工程大学",
            "country": "中国"
        }
    ],
    "category": "冻土"
}