{
    "created": "2026-03-31 16:38:51",
    "updated": "2026-05-15 16:35:01",
    "id": "af27c5af-0c53-4d99-9968-258dabe55f03",
    "version": 6,
    "ds_topic": null,
    "title_cn": "大兴安岭东坡塔河地区卡马兰河流域30m多年冻土分布图（2023-2025年）",
    "title_en": "Distribution Map of 30m Permafrost in the Kamalan River Basin of Dongpo Tahe Area in Daxing'an Mountains (2023-2025)",
    "ds_abstract": "<p>&emsp;&emsp;本数据集为大兴安岭东坡塔河地区卡马兰河流域多年冻土分布数据，结合实测钻孔和坑探数据（多年冻土样本点347个，季节冻土样本点310个），以地形因子、植被因子、气象因子和土壤与热状况因子（共13个变量）数据为驱动，采用机器学习方法（随机森林）进行模型构建，模拟精度达到0.86。多年冻土主要分布在海拔较高的山地缓坡区域、丘陵及部分低洼地带。冻土分布呈岛状或片状。可靠的精度使得此冻土分布数据可以作为全球变暖背景下卡马兰河流域多年冻土模拟的标定基准和历史参考。",
    "ds_source": "<p>&emsp;&emsp;原始数据：实测钻孔和坑探数据：多年冻土样本点347个，季节冻土样本点310个；环境变量数据：选取了地形、植被、气候及土壤四大类共13个环境变量作为预测因子。\n<p>&emsp;&emsp;地形因子：基于数字高程模型（DEM）提取海拔、坡度、坡向及地形起伏度。\n<p>&emsp;&emsp;植被因子：利用MOD13A3遥感产品提取归一化植被指数（NDVI）；基于中国森林植被碳储量数据集获取地上及地下生物量数据。\n<p>&emsp;&emsp;气象因子：从ERA5-Land数据集提取春季降水；地表温度（GST）数据则基于实测钻孔及地表测温数据，预先通过随机森林模型模拟获取，作为关键中间变量输入。\n<p>&emsp;&emsp;土壤与热状况因子：整合计算融化指数、冻结指数、腐殖质厚度及土壤含水率。",
    "ds_process_way": "<p>&emsp;&emsp;数据预处理：对上述所有多源原始栅格数据进行空间配准与标准化处理。统一投影坐标系为 WGS_1984_Albers，将空间范围裁剪至研究区边界，并采用重采样技术将所有变量的空间分辨率统一降尺度至30 m，格式统一为GeoTIFF，确保多源数据在空间上的严格匹配。\n<p>&emsp;&emsp;利用ArcGIS的多值提取至点（Extract Multi-Values to Points）功能，提取每个样本点对应的13个环境变量数值，构建“样本-环境特征”高维数据集。构建的样本数据集包含目标变量（分类标签：1代表多年冻土，0代表季节冻土）及对应的特征向量。对提取结果进行完整性检查，剔除含有缺失值（NoData）或异常值的样本，确保模型输入数据的质量。\n<p>&emsp;&emsp;随机森林模型构建：采用分层随机抽样法（Stratified Random Sampling），将数据集划分为训练集（70%）和测试集（30%）。基于Python环境下的scikit-learn机器学习库构建随机森林分类模型。针对样本不平衡问题，将class_weight参数设为 'balanced'。通过网格搜索对关键超参数进行优化，最终确定决策树数量（n_estimators）为1000，最大深度（max_depth）及节点分裂最小样本数（min_samples_split）等参数，并固定随机种子（random_state）以保证结果的可重复性。将13个环境变量作为特征输入，冻土类型作为标签进行模型训练。",
    "ds_quality": "<p>&emsp;&emsp;本数据采用机器学习方法（随机森林）进行模型构建，计算混淆矩阵、总体准确率（Overall Accuracy）、精确率（Precision）、召回率（Recall）、F1-Score及Kappa系数。结果显示，模型具有较高的一致性（Kappa > 0.6），冻土分布模拟精度达到0.86。",
    "ds_acq_start_time": "2023-08-01 00:00:00",
    "ds_acq_end_time": "2025-10-31 00:00:00",
    "ds_acq_place": "大兴安岭东坡卡马兰河流域",
    "ds_acq_lon_east": 123.69111111111111,
    "ds_acq_lat_south": 51.75694444444444,
    "ds_acq_lon_west": 122.48611111111111,
    "ds_acq_lat_north": 52.46944444444445,
    "ds_acq_alt_low": null,
    "ds_acq_alt_high": null,
    "ds_share_type": "login-access",
    "ds_total_size": 879158,
    "ds_files_count": 3,
    "ds_format": "*.tif",
    "ds_space_res": "30m",
    "ds_time_res": "3年",
    "ds_coordinate": "WGS84",
    "ds_projection": "WGS_1984_Albers",
    "ds_thumbnail": "af27c5af-0c53-4d99-9968-258dabe55f03.png",
    "ds_thumb_from": 2,
    "ds_ref_way": "",
    "paper_ref_way": "",
    "ds_ref_instruction": "",
    "ds_from_station": null,
    "organization_id": "221ebf56-1b0b-4574-972b-1fb6d3cf1be7",
    "ds_serv_man": "敏玉芳",
    "ds_serv_phone": "0931-4967596",
    "ds_serv_mail": "ncdc@lzb.ac.cn",
    "doi_value": "",
    "subject_codes": [
        "170.45"
    ],
    "quality_level": 3,
    "publish_time": "2026-03-31 18:21:15",
    "last_updated": "2026-05-11 18:09:31",
    "protected": false,
    "protected_to": null,
    "lang": "zh",
    "cstr": "11738.11.NCDC.NIEER.DB7242.2026",
    "i18n": {
        "en": {
            "title": "Distribution Map of 30m Permafrost in the Kamalan River Basin of Dongpo Tahe Area in Daxing'an Mountains (2023-2025)",
            "ds_format": "*.tif",
            "ds_source": "<p>&emsp; &emsp; Raw data: actual drilling and pit exploration data: 347 permafrost sample points and 310 seasonally frozen soil sample points; Environmental variable data: Thirteen environmental variables, including terrain, vegetation, climate, and soil, were selected as predictive factors.\r\n<p>&emsp; &emsp; Terrain factor: Extracting altitude, slope, aspect, and terrain undulation based on digital elevation model (DEM).\r\n<p>&emsp; &emsp; Vegetation factor: Using MOD13A3 remote sensing products to extract normalized vegetation index (NDVI); Obtain aboveground and underground biomass data based on the Chinese forest vegetation carbon storage dataset.\r\n<p>&emsp; &emsp; Meteorological factors: Extracting spring precipitation from the ERA5 Land dataset; The surface temperature (GST) data is based on actual drilling and surface temperature measurement data, which are obtained in advance through a random forest model simulation and used as key intermediate variable inputs.\r\n<p>&emsp; &emsp; Soil and thermal condition factors: integrated calculation of melting index, freezing index, humus thickness, and soil moisture content.",
            "ds_quality": "<p>&emsp; &emsp; This data is modeled using machine learning methods (random forest) to calculate confusion matrix, overall accuracy, precision, recall, F1 Score, and Kappa coefficient. The results show that the model has high consistency (Kappa>0.6), and the simulation accuracy of frozen soil distribution reaches 0.86.",
            "ds_ref_way": "",
            "ds_abstract": "<p>&emsp; &emsp; This dataset is the distribution data of permafrost in the Kamalan River Basin of the Tahe area on the east slope of the Daxing'an Mountains. It combines actual drilling and pit exploration data (347 permafrost sample points and 310 seasonal permafrost sample points), and is driven by terrain factors, vegetation factors, meteorological factors, and soil and thermal conditions factors (a total of 13 variables). The model is constructed using machine learning methods (random forest), and the simulation accuracy reaches 0.86. Permafrost is mainly distributed in high-altitude mountainous gentle slopes, hills, and some low-lying areas. The distribution of frozen soil is island shaped or patchy. The reliable accuracy enables this frozen soil distribution data to serve as a calibration benchmark and historical reference for simulating permafrost in the Kamaran River Basin under the background of global warming.",
            "ds_time_res": "",
            "ds_acq_place": "Kamaran River Basin on the East Slope of Daxing'an Mountains",
            "ds_space_res": "",
            "ds_projection": "",
            "ds_process_way": "<p>&emsp; &emsp; Data preprocessing: Perform spatial registration and standardization on all multi-source raw raster data mentioned above. The unified projection coordinate system is WGS1984_ Albers, and the spatial range is cropped to the boundary of the study area. The spatial resolution of all variables is uniformly downscaled to 30 m using resampling techniques, and the format is unified as GeoTIFF to ensure strict spatial matching of multi-source data.\r\n<p>&emsp; &emsp; Using ArcGIS' Extract Multi Values to Points feature, extract 13 environmental variable values corresponding to each sample point and construct a high-dimensional dataset of \"sample environment features\". The constructed sample dataset includes the target variables (classification labels: 1 represents permafrost, 0 represents seasonal permafrost) and their corresponding feature vectors. Perform integrity checks on the extracted results, eliminate samples containing missing values (NoData) or outliers, and ensure the quality of the input data for the model.\r\n<p>&emsp; &emsp; Random Forest Model Construction: Stratified Random Sampling is used to divide the dataset into a training set (70%) and a testing set (30%). Build a random forest classification model based on the scikit learn machine learning library in Python environment. To address the issue of sample imbalance, set the class_ceight parameter to 'balanced'. Optimize key hyperparameters through grid search, and ultimately determine the number of decision trees (n_estimators) to be 1000, the maximum depth (x_depth), and the minimum number of samples for node splitting (min_stamples_split), and fix the random seed (random_state) to ensure the reproducibility of the results. Train the model with 13 environmental variables as feature inputs and frozen soil types as labels.",
            "ds_ref_instruction": ""
        }
    },
    "submit_center_id": "ncdc",
    "data_level": 0,
    "recommendation_value": 0,
    "license_type": "https://creativecommons.org/licenses/by/4.0/",
    "doi_reg_from": "reg_local",
    "cstr_reg_from": "reg_local",
    "doi_not_reg_reason": null,
    "cstr_not_reg_reason": null,
    "is_paper_in_submitting": false,
    "ds_topic_tags": [
        "多年冻土",
        "分布",
        "卡马兰河流域"
    ],
    "ds_subject_tags": [
        "地理学"
    ],
    "ds_class_tags": [],
    "ds_locus_tags": [
        "大兴安岭东坡卡马兰河流域"
    ],
    "ds_time_tags": [
        2023,
        2024,
        2025
    ],
    "ds_contributors": [
        {
            "true_name": "臧淑英",
            "email": "zsy6311@163.com",
            "work_for": "哈尔滨师范大学",
            "country": "中国"
        }
    ],
    "ds_meta_authors": [
        {
            "true_name": "郭殿繁",
            "email": "dfguo@hrbnu.edu.cn",
            "work_for": "哈尔滨师范大学",
            "country": "中国"
        },
        {
            "true_name": "陈梦瑶",
            "email": "cmy_543@163.con",
            "work_for": "哈尔滨师范大学",
            "country": "中国"
        },
        {
            "true_name": "余江涛",
            "email": "yujiangtao23@163.com",
            "work_for": "哈尔滨师范大学",
            "country": "中国"
        }
    ],
    "ds_managers": [
        {
            "true_name": "孙丽",
            "email": "sunli_wabb@163.com",
            "work_for": "哈尔滨师范大学",
            "country": "中国"
        }
    ],
    "category": "冻土"
}