A major hurdle for automated approaches is training and validating machine learning models. These models use large amounts of data to find patterns that can be used to make predictions or classifications. Image classification models, such as convolutional neural networks, typically need even more data than "traditional" machine learning approaches. Datasets such as MNIST and CIFAR, which are used to train these models, contain tens of thousands of images. At the moment, there is no comparable dataset for storm morphology. Initially, we are providing a limited set of labeled images on which to test your own machine learning models. We hope that crowd-sourcing the classification process will greatly expand the size of the dataset.
The initial sample data are available below. We use a train/validation/test split of ~70% / ~10% / ~20%, respectively, and the data are organized as follows:
Links for browsing image thumbnails for each class will be posted shortly, along with examples.
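If you want to build your own split with roughly the ~70% / ~10% / ~20% proportions described above (for example, after adding new labeled images), a random split can be sketched with NumPy. How the provided images were actually assigned to each subset is not stated here, so the random shuffle, the fixed seed, and the `split_indices` helper are assumptions for illustration.

```python
import numpy as np

def split_indices(n_samples, fractions=(0.7, 0.1, 0.2), seed=0):
    """Shuffle sample indices and split them into train/validation/test sets.

    Hypothetical helper: the random shuffle and the seed are assumptions,
    not the procedure used to build the provided split.
    """
    if not np.isclose(sum(fractions), 1.0):
        raise ValueError("fractions must sum to 1")
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)           # random order of all samples
    n_train = int(round(fractions[0] * n_samples))
    n_val = int(round(fractions[1] * n_samples))
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]               # remainder goes to test
    return train, val, test

train, val, test = split_indices(1000)
print(len(train), len(val), len(test))  # 700 100 200
```

Fixing the seed keeps the split reproducible, so models trained at different times are evaluated on the same test images.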
Radar images are centered on SPC severe weather reports and extracted from the closest hourly GridRad data, which can be downloaded from the Research Data Archive. The original ~2 × 2 km 3D data are converted to 2D by calculating the column-maximum reflectivity. These values are then converted to 8-bit integers and interpolated to a 3.75 km Lambert conformal conic grid using nearest-neighbor interpolation. The 136 × 136 grid covers a region of approximately 512 × 512 km (136 × 3.75 km = 510 km on a side).
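The processing steps above (column maximum, 8-bit conversion, nearest-neighbor regridding) can be sketched with NumPy. This is a minimal illustration under several assumptions, not the code used to build the dataset: missing values are assumed to be NaN, the linear dBZ-to-8-bit mapping (`dbz_min`, `dbz_max`) is hypothetical, and the source field is assumed to already sit on a rectilinear kilometer grid sharing coordinates with the target, which skips the latitude/longitude to Lambert conformal conic projection step. All function names are hypothetical.

```python
import numpy as np

GRID_SPACING_KM = 3.75  # target grid spacing stated in the text
GRID_SIZE = 136         # target grid dimension stated in the text

def column_max(refl_3d):
    """Collapse a (z, y, x) reflectivity volume to 2D via the column maximum.

    NaN (missing) values are ignored; columns that are entirely missing stay NaN.
    """
    filled = np.where(np.isnan(refl_3d), -np.inf, refl_3d)
    comp = filled.max(axis=0)
    comp[np.isneginf(comp)] = np.nan
    return comp

def to_uint8(comp, dbz_min=0.0, dbz_max=80.0):
    """HYPOTHETICAL linear mapping of dBZ onto 0-255; missing values map to 0."""
    scaled = (comp - dbz_min) / (dbz_max - dbz_min) * 255.0
    scaled = np.nan_to_num(scaled, nan=0.0)
    return np.clip(np.rint(scaled), 0, 255).astype(np.uint8)

def target_axis():
    """Cell-center coordinates (km) of one GRID_SIZE-wide axis centered on the report."""
    half = (GRID_SIZE - 1) / 2.0
    return (np.arange(GRID_SIZE) - half) * GRID_SPACING_KM

def nearest_neighbor_regrid(field, src_x, src_y, dst_x, dst_y):
    """Sample `field` (y, x) at the source points nearest to each target point.

    Assumes both grids are rectilinear and share one km coordinate system.
    """
    ix = np.abs(src_x[None, :] - dst_x[:, None]).argmin(axis=1)
    iy = np.abs(src_y[None, :] - dst_y[:, None]).argmin(axis=1)
    return field[np.ix_(iy, ix)]

# Synthetic example: random dBZ values on a 2 km source grid (made-up data)
rng = np.random.default_rng(0)
src_x = np.arange(-300, 302, 2.0)
src_y = np.arange(-300, 302, 2.0)
vol = rng.uniform(0, 60, size=(3, src_y.size, src_x.size))
img = nearest_neighbor_regrid(to_uint8(column_max(vol)), src_x, src_y,
                              target_axis(), target_axis())
print(img.shape, img.dtype)         # (136, 136) uint8
print(GRID_SIZE * GRID_SPACING_KM)  # 510.0
```

Because nearest-neighbor sampling only copies existing values, converting to 8-bit before or after regridding gives the same result; other interpolation schemes would not have this property.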
Each image is assigned to one of six classes. These classes and their descriptions are as follows:
| Class Name | Class Description |
|---|---|
| Cellular | Circular areas of red and orange near the center of the image. |
| QLCS | Continuous red and orange lines that intersect the center of the image. |
| Tropical | Green and yellow lines that appear to circle around the bottom or left edge of the image. |
| Other | Morphologies that do not obviously fit into one of the previous three classes. |
| Noise | Low-intensity rings, spikes, or pixelation that does not look natural. |
| Missing | The entire image or the majority of the image is blue (i.e., missing intensity). |
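To train a classifier on these images, the six class names in the table need to be mapped to integer labels. A minimal sketch using Python's standard `enum.IntEnum` follows; the `StormClass` name, the integer ordering, and the `one_hot` helper are hypothetical, not a labeling scheme defined by the dataset.

```python
from enum import IntEnum

class StormClass(IntEnum):
    """Hypothetical integer labels for the six classes in the table."""
    CELLULAR = 0
    QLCS = 1
    TROPICAL = 2
    OTHER = 3
    NOISE = 4
    MISSING = 5

def one_hot(label):
    """Return a one-hot list for a StormClass label (e.g., as a training target)."""
    vec = [0] * len(StormClass)
    vec[int(label)] = 1
    return vec

# Class names as written in the table can be looked up case-insensitively
label = StormClass["Cellular".upper()]
print(label.value, one_hot(StormClass.QLCS))  # 0 [0, 1, 0, 0, 0, 0]
```

Since Noise and Missing describe data quality rather than storm morphology, one option is to use them to filter images before training a morphology-only classifier on the first four classes.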
The data are provided at no cost, as-is, with no warranty of any kind. Neither the SPC reports nor the GridRad data are modified (beyond the compositing, 8-bit conversion, and interpolation described above) before these data are hosted on the website. The process is completely repeatable from start to finish, assuming you have patience or access to a supercomputing cluster. Before using these data, please read the GridRad and SPC severe weather report pages for caveats and known issues with those datasets. See the data page for more information.
We are generating these data solely because we think they would be of interest to the meteorology and climatology community. That being said, we would like to get some credit if you find them useful!
Haberlie, A. M., W. S. Ashley, and M. Karpinski, 2020: Mean storms: Composites of radar reflectivity images during two decades of severe thunderstorm events. International Journal of Climatology, In Press.
Bowman, K. P., and C. R. Homeyer, 2017: GridRad - Three-Dimensional Gridded NEXRAD WSR-88D Radar Data. Research Data Archive at the National Center for Atmospheric Research, Computational and Information Systems Laboratory.