Internationally recognized, standardized datasets of road surface conditions (RSCs) are scarce, particularly those documenting RSCs under extreme weather events. This study presents a comprehensive dataset of road conditions during snow and ice disasters. The dataset helps fill this gap and provides a resource for improving the performance and accuracy of RSC recognition models.
Focusing specifically on RSCs under snow and ice disasters, the dataset is structured based on statistical analyses of the impact of extreme weather on traffic control, categorizing RSCs into three main types: icy roads, blowing snow, and heavy snowfall. Data sources include highway cameras, mobile devices, and online resources, resulting in a dataset that encompasses six typical RSCs: dry, snowy, icy, snow-blown, melting snow, and slippery roads.
During data processing, care was taken to keep data augmentation from introducing correlations between subsets, which would undermine the accuracy and reliability of model performance evaluation. The raw dataset was first divided into training, validation, and test sets so that the subsets are independent of one another. Data augmentation operations, such as flipping, rotating, translating, and adding Gaussian noise, were then applied separately to each subset, so that no augmented copy of an image falls in a different subset from its original. After augmentation, the dataset contains a total of 9,000 images.
To further improve the training efficiency and convergence speed of deep learning models, normalization of the dataset is recommended. A standard approach is to apply zero-mean and unit standard deviation normalization. The mean and standard deviation values for the dataset in the red, green, and blue channels are as follows: mean = [0.550, 0.565, 0.568], standard deviation = [0.082, 0.082, 0.085].
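The normalization described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the reported statistics were computed on pixel values scaled to [0, 1] and that images are stored in RGB channel order. The function name `normalize` and the random placeholder image are hypothetical.

```python
import numpy as np

# Per-channel statistics reported for the dataset (R, G, B).
# Assumption: they refer to pixel values scaled to [0, 1].
MEAN = np.array([0.550, 0.565, 0.568], dtype=np.float32)
STD = np.array([0.082, 0.082, 0.085], dtype=np.float32)


def normalize(image_uint8: np.ndarray) -> np.ndarray:
    """Scale an H x W x 3 uint8 RGB image to [0, 1], then standardize each channel."""
    scaled = image_uint8.astype(np.float32) / 255.0
    # Broadcasting applies the three channel statistics along the last axis.
    return (scaled - MEAN) / STD


# Placeholder input: a random 224 x 224 image stands in for a real sample.
rng = np.random.default_rng(0)
sample = rng.integers(0, 256, size=(224, 224, 3), dtype=np.uint8)
normalized = normalize(sample)
```

If a model is later applied to new images, the same mean and standard deviation should be reused rather than recomputed, so that training and inference inputs share one scale.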
Item | Value |
---|---|
Collection time | 2023/10/01 - 2024/10/01 |
Collection place | Highway cameras, mobile devices, and network resources |
Data size | 300.1 MiB |
Data format | |
Coordinate system | |
Data sources include highway cameras, mobile devices, and online resources.
• Image Resizing: The images are resized to 224×224 pixels, a standard dimension commonly used in deep learning that strikes a balance between computational efficiency and model performance. This size is widely employed in models pretrained on ImageNet, such as VGG and ResNet, and has been proven effective in practical applications.
• Dataset Splitting: The dataset is randomly partitioned into training, validation, and test sets, with proportions of 60%, 20%, and 20%, respectively.
• Brightness Adjustment: Due to the complex and dynamic nature of road conditions, issues such as object occlusion and uneven lighting often lead to overexposed or underexposed areas in the images, which can obscure or blur critical details. Furthermore, these factors may cause different types of road conditions to appear similar, increasing the difficulty of recognition. To address these challenges, an adaptive correction algorithm based on a 2D gamma function is applied to adjust the lighting intensity of the images.
• Data Augmentation: Data augmentation helps address class imbalance in the dataset, especially when some categories have far fewer samples than others. Transformations such as flipping, rotating, cropping, scaling, and color adjustment generate additional samples. In this study, augmentation is performed with the OpenCV and NumPy libraries, using random flipping, random translation, random rotation, and added Gaussian noise, which increases the total number of images to 9,000.
• Data Normalization: To accelerate the convergence of the model during training, pixel values are normalized to have zero mean and unit standard deviation. The mean and standard deviation values for the dataset in the red, green, and blue channels are [0.550, 0.565, 0.568] and [0.082, 0.082, 0.085], respectively.
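The brightness adjustment step names only "an adaptive correction algorithm based on a 2D gamma function" and does not give its formula. The sketch below uses one commonly cited formulation as an assumption: the illumination component I is estimated by Gaussian smoothing, and each pixel is corrected with an exponent gamma = 0.5^((m - I) / m), where m is the mean illumination. It is simplified to a single Gaussian scale and a single brightness channel, and it uses NumPy only. The function names and the `sigma` value are hypothetical.

```python
import numpy as np


def gaussian_blur(channel: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian blur with edge padding (NumPy only)."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2 * sigma ** 2))
    kernel /= kernel.sum()
    padded = np.pad(channel, radius, mode="edge")
    # Convolve along rows, then columns; "valid" restores the original size.
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")


def adaptive_gamma(value: np.ndarray, sigma: float = 3.0) -> np.ndarray:
    """Brighten dark regions and darken bright ones in a 2D channel in [0, 255]."""
    value = value.astype(np.float64)
    illumination = gaussian_blur(value, sigma)
    mean_illum = max(illumination.mean(), 1e-6)  # guard against all-black input
    # Assumed formulation: gamma < 1 where illumination is below the mean.
    gamma = np.power(0.5, (mean_illum - illumination) / mean_illum)
    corrected = 255.0 * np.power(value / 255.0, gamma)
    return np.clip(np.rint(corrected), 0, 255).astype(np.uint8)


# Placeholder input: left half underexposed, right half overexposed.
channel = np.full((64, 64), 50, dtype=np.uint8)
channel[:, 32:] = 200
corrected = adaptive_gamma(channel)
```

For a color image, one option is to apply the correction to the V channel of an HSV representation and convert back, which leaves hue and saturation unchanged.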
Performing data augmentation before splitting the dataset into training, validation, and test sets may introduce potential correlations between these subsets, thereby compromising the independence of the validation and test sets and affecting the accuracy and reliability of model performance evaluation. To address this issue, this study first divides the dataset into three independent subsets and then applies data augmentation to each subset separately, minimizing the cross-contamination effects that could arise from the sequence of augmentation steps.
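The split-then-augment order can be sketched as follows. This is an illustration under stated assumptions rather than the authors' pipeline: the source uses OpenCV and NumPy, while this sketch uses NumPy only, and the helper names, the augmentation parameters (shift range, noise level), and the random placeholder images are all hypothetical.

```python
import numpy as np


def split_indices(n_samples, rng, ratios=(0.6, 0.2, 0.2)):
    """Shuffle sample indices and cut them into train/val/test index arrays."""
    order = rng.permutation(n_samples)
    n_train = round(ratios[0] * n_samples)
    n_val = round(ratios[1] * n_samples)
    return order[:n_train], order[n_train:n_train + n_val], order[n_train + n_val:]


def augment(image, rng):
    """Return four simple variants of one square H x W x 3 uint8 image."""
    flipped = image[:, ::-1]                      # horizontal flip
    rotated = np.rot90(image)                     # 90-degree rotation
    shift = int(rng.integers(-10, 11))
    shifted = np.roll(image, shift, axis=1)       # wrap-around translation
    noise = rng.normal(0.0, 10.0, image.shape)    # additive Gaussian noise
    noisy = np.clip(image + noise, 0, 255).astype(np.uint8)
    return [flipped, rotated, shifted, noisy]


rng = np.random.default_rng(42)
# Placeholder data: 20 random images stand in for the raw dataset.
images = rng.integers(0, 256, size=(20, 224, 224, 3), dtype=np.uint8)

# Step 1: split the raw images first.
train_idx, val_idx, test_idx = split_indices(len(images), rng)

# Step 2: augment each subset on its own, so every variant stays
# in the same subset as the image it was derived from.
subsets = {}
for name, idx in (("train", train_idx), ("val", val_idx), ("test", test_idx)):
    originals = [images[i] for i in idx]
    augmented = [variant for img in originals for variant in augment(img, rng)]
    subsets[name] = originals + augmented
```

Because the index arrays are disjoint, a test image and its flipped or noisy copies can never appear in the training set, which is the leakage the paragraph above warns about.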
# | number | name | type |
---|---|---|---|
1 | ZKXFWCG2022060004 | other | |
2 | 2022-ZD-006 | other | |
3 | 2022YFF0711704 | the National Key R&D Program of China | National Key R&D Plan |
4 | KY2022041101 | other | |
# | title | file size |
---|---|---|
1 | dataset.zip | 300.1 MiB |
©Copyright 2005-. Northwest Institute of Eco-Environment and Resources, CAS.
Donggang West Road 320, Lanzhou, Gansu, China (730000)