{
    "created": "2026-03-31 19:20:35",
    "updated": "2026-05-16 05:19:22",
    "id": "a8f959f0-bcd8-4049-af4a-c7b83693c1ae",
    "version": 4,
    "ds_topic": null,
    "title_cn": "大兴安岭西坡额尔古纳地区根河流域30m多年冻土分布图（2023-2025年）",
    "title_en": "Distribution map of 30m permafrost in the Genhe River Basin of the Erguna area on the western slope of the Greater Khingan Range (2023-2025)",
    "ds_abstract": "<p>&emsp;&emsp;本数据集为大兴安岭西坡额尔古纳地区根河流域多年冻土分布数据，结合实测钻孔和坑探数据，以地形因子、植被因子、气象因子和土壤与热状况因子数据为驱动，采用机器学习方法（随机森林）进行模型构建。多年冻土主要分布在海拔较高的山地缓坡区域、丘陵及部分低洼地带。冻土分布呈岛状或片状。可靠的精度使得此冻土分布数据可以作为全球变暖背景下根河流域多年冻土模拟的标定基准和历史参考。数据格式为GeoTIFF，空间分辨率约30 m，投影为WGS_1984_Albers。",
    "ds_source": "<p>&emsp;&emsp;原始数据：实测钻孔和坑探数据；环境变量数据：选取了地形、植被、气候及土壤四大类环境变量作为预测因子。\n<p>&emsp;&emsp;地形因子：基于数字高程模型（DEM）提取海拔、坡度、坡向及地形起伏度。\n<p>&emsp;&emsp;植被因子：利用MOD13A3遥感产品提取归一化植被指数（NDVI）。\n<p>&emsp;&emsp;气象因子；地表温度（GST）数据则基于实测钻孔及地表测温数据，预先通过随机森林模型模拟获取，作为关键中间变量输入。\n<p>&emsp;&emsp;土壤与热状况因子：整合计算融化指数、冻结指数、腐殖质厚度及土壤含水率。",
    "ds_process_way": "<p>&emsp;&emsp;数据预处理：对上述所有多源栅格数据进行空间配准与标准化处理。统一投影坐标系为 WGS_1984_Albers，将空间范围裁剪至研究区边界，并采用重采样技术将所有变量的空间分辨率统一降尺度至30 m，格式统一为GeoTIFF，确保多源数据在空间上的严格匹配。利用ArcGIS的多值提取至点（Extract Multi-Values to Points）功能，提取每个样本点对应的环境变量数值，构建“样本-环境特征”高维数据集。构建的样本数据集包含目标变量（分类标签：1代表多年冻土，0代表季节冻土）及对应的特征向量。对提取结果进行完整性检查，剔除含有缺失值（NoData）或异常值的样本，确保模型输入数据的质量。\n<p>&emsp;&emsp;随机森林模型构建：采用分层随机抽样法（Stratified Random Sampling），将数据集划分为训练集（70%）和测试集（30%）。基于Python环境下的scikit-learn机器学习库构建随机森林分类模型。针对样本不平衡问题，将class_weight参数设为 'balanced'。通过网格搜索对关键超参数进行优化，最终确定决策树数量（n_estimators）为1000，最大深度（max_depth）及节点分裂最小样本数（min_samples_split）等参数，并固定随机种子（random_state）以保证结果的可重复性。将环境变量作为特征输入，冻土类型作为标签进行模型训练。",
    "ds_quality": "<p>&emsp;&emsp;计算混淆矩阵、总体准确率（Overall Accuracy）、精确率（Precision）、召回率（Recall）、F1-Score及Kappa系数。结果显示，模型具有较高的一致性。",
    "ds_acq_start_time": "2023-08-01 00:00:00",
    "ds_acq_end_time": "2025-10-31 00:00:00",
    "ds_acq_place": "大兴安岭西坡额尔古纳地区根河流域",
    "ds_acq_lon_east": 122.70277777777778,
    "ds_acq_lat_south": 49.92916666666667,
    "ds_acq_lon_west": 119.28333333333333,
    "ds_acq_lat_north": 51.282777777777774,
    "ds_acq_alt_low": null,
    "ds_acq_alt_high": null,
    "ds_share_type": "login-access",
    "ds_total_size": 252887406,
    "ds_files_count": 3,
    "ds_format": "*.tif",
    "ds_space_res": "30m",
    "ds_time_res": "3年",
    "ds_coordinate": "WGS84",
    "ds_projection": "WGS_1984_Albers",
    "ds_thumbnail": "a8f959f0-bcd8-4049-af4a-c7b83693c1ae.jpg",
    "ds_thumb_from": 0,
    "ds_ref_way": "",
    "paper_ref_way": "",
    "ds_ref_instruction": "",
    "ds_from_station": null,
    "organization_id": "221ebf56-1b0b-4574-972b-1fb6d3cf1be7",
    "ds_serv_man": "敏玉芳",
    "ds_serv_phone": "0931-4967596",
    "ds_serv_mail": "ncdc@lzb.ac.cn",
    "doi_value": "",
    "subject_codes": [
        "170.45"
    ],
    "quality_level": 3,
    "publish_time": "2026-03-31 19:29:31",
    "last_updated": "2026-05-11 18:53:44",
    "protected": false,
    "protected_to": null,
    "lang": "zh",
    "cstr": "11738.11.NCDC.NIEER.DB7246.2026",
    "i18n": {
        "en": {
            "title": "Distribution map of 30m permafrost in the Genhe River Basin of the Erguna area on the western slope of the Greater Khingan Range (2023-2025)",
            "ds_format": "*.tif",
            "ds_source": "<p>&emsp; &emsp; Raw data: measured drilling and pit exploration data; Environmental variable data: Four major categories of environmental variables including terrain, vegetation, climate, and soil were selected as predictive factors.\r\n<p>&emsp; &emsp; Terrain factor: Extracting altitude, slope, aspect, and terrain undulation based on digital elevation model (DEM).\r\n<p>&emsp; &emsp; Vegetation factor: Extract normalized vegetation index (NDVI) using MOD13A3 remote sensing products.\r\n<p>&emsp; &emsp; Meteorological factors; The surface temperature (GST) data is based on actual drilling and surface temperature measurement data, which are obtained in advance through a random forest model simulation and used as key intermediate variable inputs.\r\n<p>&emsp; &emsp; Soil and thermal condition factors: integrated calculation of melting index, freezing index, humus thickness, and soil moisture content.",
            "ds_quality": "<p>&emsp; &emsp; Calculate confusion matrix, Overall Accuracy, Precision, Recall, F1 Score, and Kappa coefficient. The results show that the model has high consistency.",
            "ds_ref_way": "",
            "ds_abstract": "<p>&emsp; &emsp; This dataset is the distribution data of permafrost in the Genhe River Basin of the E'erguna area on the western slope of the Greater Khingan Range. Combined with measured drilling and pit exploration data, the model is constructed using machine learning methods (random forest) driven by terrain factors, vegetation factors, meteorological factors, and soil and thermal condition factors. Permafrost is mainly distributed in high-altitude mountainous gentle slopes, hills, and some low-lying areas. The distribution of frozen soil is island shaped or patchy. The reliable accuracy enables this frozen soil distribution data to serve as a calibration benchmark and historical reference for simulating permafrost in the Genhe River Basin under the background of global warming. The data format is GeoTIFF, with a spatial resolution of approximately 30m and a projection of WGS1984_ Albers.",
            "ds_time_res": "",
            "ds_acq_place": "Genhe River Basin in the Erguna area on the western slope of the Greater Khingan Range",
            "ds_space_res": "",
            "ds_projection": "",
            "ds_process_way": "<p>&emsp; &emsp; Data preprocessing: Perform spatial registration and standardization on all multi-source raster data mentioned above. The unified projection coordinate system is WGS1984_ Albers, and the spatial range is cropped to the boundary of the study area. The spatial resolution of all variables is uniformly downscaled to 30 m using resampling techniques, and the format is unified as GeoTIFF to ensure strict spatial matching of multi-source data. Using ArcGIS' Extract Multi Values to Points feature, extract the environmental variable values corresponding to each sample point and construct a high-dimensional dataset of \"sample environment features\". The constructed sample dataset includes the target variables (classification labels: 1 represents permafrost, 0 represents seasonal permafrost) and their corresponding feature vectors. Perform integrity checks on the extracted results, eliminate samples containing missing values (NoData) or outliers, and ensure the quality of the input data for the model.\r\n<p>&emsp; &emsp; Random Forest Model Construction: Stratified Random Sampling is used to divide the dataset into a training set (70%) and a testing set (30%). Build a random forest classification model based on the scikit learn machine learning library in Python environment. To address the issue of sample imbalance, set the class_ceight parameter to 'balanced'. Optimize key hyperparameters through grid search, and ultimately determine the number of decision trees (n_estimators) to be 1000, the maximum depth (x_depth), and the minimum number of samples for node splitting (min_stamples_split), and fix the random seed (random_state) to ensure the reproducibility of the results. Use environmental variables as feature inputs and frozen soil types as labels for model training.",
            "ds_ref_instruction": ""
        }
    },
    "submit_center_id": "ncdc",
    "data_level": 0,
    "recommendation_value": 0,
    "license_type": "https://creativecommons.org/licenses/by/4.0/",
    "doi_reg_from": "reg_local",
    "cstr_reg_from": "reg_local",
    "doi_not_reg_reason": null,
    "cstr_not_reg_reason": null,
    "is_paper_in_submitting": false,
    "ds_topic_tags": [
        "多年冻土",
        "大兴安岭",
        "根河流域",
        "冻土分布"
    ],
    "ds_subject_tags": [
        "地理学"
    ],
    "ds_class_tags": [],
    "ds_locus_tags": [
        "大兴安岭西坡额尔古纳地区根河流域"
    ],
    "ds_time_tags": [
        2023,
        2024,
        2025
    ],
    "ds_contributors": [
        {
            "true_name": "胡国杰",
            "email": "huguojie123@lzb.ac.cn",
            "work_for": "中国科学院西北生态环境资源研究院",
            "country": "中国"
        },
        {
            "true_name": "邹德富",
            "email": "defuzou@lzb.ac.cn",
            "work_for": "中国科学院西北生态环境资源研究院",
            "country": "中国"
        },
        {
            "true_name": "刘广岳",
            "email": "liuguangyue@lzb.ac.cn",
            "work_for": "中国科学院西北生态环境资源研究院",
            "country": "中国"
        },
        {
            "true_name": "肖瑶",
            "email": "xiaoyao@lzb.ac.cn",
            "work_for": "中国科学院西北生态环境资源研究院",
            "country": "中国"
        },
        {
            "true_name": "杜二计",
            "email": "duerji@lzb.ac.cn",
            "work_for": "中国科学院西北生态环境资源研究院",
            "country": "中国"
        },
        {
            "true_name": "赵拥华",
            "email": "zhaoyonghua@lzb.ac.cn",
            "work_for": "中国科学院西北生态环境资源研究院",
            "country": "中国"
        }
    ],
    "ds_meta_authors": [
        {
            "true_name": "肖瑶",
            "email": "xiaoyao@lzb.ac.cn",
            "work_for": "中国科学院西北生态环境资源研究院",
            "country": "中国"
        }
    ],
    "ds_managers": [
        {
            "true_name": "肖瑶",
            "email": "xiaoyao@lzb.ac.cn",
            "work_for": "中国科学院西北生态环境资源研究院",
            "country": "中国"
        }
    ],
    "category": "冻土"
}