A Labeled Image Dataset for Deep Learning-Driven Rockfall Detection on the Moon and Mars

The term rockfall describes the rapid downslope displacement of a large, usually meter-sized block of rock, triggered by, for example, endogenic or exogenic events such as impacts, quakes, or rainfall (Hungr et al., 2014; Xiao et al., 2013). In a remote sensing context, the term rockfall is also used to describe the characteristic geomorphic deposit of a rockfall event that can be identified from an air- or space-borne perspective, i.e., the combination of a displaced boulder and the track it carved into the slope substrate while bouncing, rolling, and sliding over the surface (also called “boulder with track” or “rolling boulder”) (Hovland and Mitchell, 1973; Filice, 1967; Moore, 1970) (see Figure 1). In planetary science, the spatial distribution and frequency of rockfalls provide insights into the global erosional state and activity of a planetary body (Bickel et al., 2020a; Tesson et al., 2020), while their tracks act as tools that allow for the remote estimation of the surface strength properties of yet unexplored regions in preparation for future ground exploration missions (Eggleston et al., 1968), such as the lunar pyroclastic (Bickel et al., 2019), polar sunlit (Bickel and Kring, 2020), and permanently shadowed regions of the Moon (Sargeant et al., 2020). However, due to their small physical size (meters), the identification and mapping of rockfalls in planetary satellite imagery is challenging and very time-consuming. For this reason, Bickel et al. (2018) and Bickel et al. (2020b) trained convolutional neural networks to automate rockfall mapping in lunar and martian satellite imagery. Parts of the unpublished datasets used for this earlier work have now been complemented with newly labeled data to create a well-balanced dataset of 2,822 lunar and martian rockfall labels (which we call “RMaM-2020”, short for Rockfall Mars Moon 2020; 416 MB in total, available here: https://edmond.mpdl.mpg.de/imeji/collection/DowTY91csU3jv9S2) that can be used for deep learning and other data science applications. Here, balanced means that the labels have been derived from imagery with a wide and continuous range of properties, such as spatial resolution and solar illumination. So far, this dataset has been used to analyze the benefits of multi-domain learning on rockfall detector performance (Mars and Moon vs. Moon-only or Mars-only), but there are numerous other (non-planetary science) applications, such as featurization, feature or target recognition (aircraft/spacecraft autonomy), and data augmentation experiments.


DATA PREPROCESSING AND LABELING
The HiRISE images have subsequently been reduced to 8 bit (PTIF NAC images are already distributed as 8 bit images). Using QGIS (https://www.qgis.org/en/site/), a series of image tiles of approximately 300 × 300 to 1000 × 1000 pixels have been cropped from the full set of 109 lunar and 44 martian parent images; these tiles either contain rockfalls (positive tiles) or do not (negative tiles). For Mars, there are 484 positive tiles and 330 negative tiles; for the Moon, there are 377 positive tiles and 317 negative tiles for training and testing. All positive tiles have then been processed in Matlab R2017a, where one experienced human operator with domain knowledge manually labeled all rockfalls with rectangular bounding boxes (bboxes), a particularly time-expensive and tedious process. Some examples of these user-derived labels are shown in Figure 1. A label always contains a boulder and its associated track to guarantee instance uniqueness, i.e., to avoid confusion with static boulders or with track sections without boulders (see Figure 1).
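The text does not specify how the HiRISE data were reduced to 8 bit. A minimal sketch, assuming a linear percentile stretch in NumPy, illustrates both preprocessing steps. The function names `to_uint8` and `crop_tile` and the percentile parameters are hypothetical, and the RMaM-2020 tiles themselves were cropped interactively in QGIS rather than with code.

```python
import numpy as np


def to_uint8(img, low_pct=0.0, high_pct=100.0):
    """Linearly rescale an image of any bit depth to 8 bit (0-255).

    Hypothetical stretch: values at or below the low_pct percentile map to 0,
    values at or above the high_pct percentile map to 255.
    """
    arr = np.asarray(img, dtype=np.float64)
    lo, hi = np.percentile(arr, [low_pct, high_pct])
    if hi <= lo:
        # Constant image: there is no dynamic range to stretch.
        return np.zeros(arr.shape, dtype=np.uint8)
    scaled = np.clip((arr - lo) / (hi - lo), 0.0, 1.0)
    return np.round(scaled * 255.0).astype(np.uint8)


def crop_tile(img, row, col, height, width):
    """Cut a height x width tile whose top-left pixel is (row, col)."""
    if row < 0 or col < 0 or row + height > img.shape[0] or col + width > img.shape[1]:
        raise ValueError("tile extends beyond the parent image")
    return img[row:row + height, col:col + width]


# Synthetic stand-in for a parent image with 10-bit values (0-1023).
parent = np.arange(1200 * 1500).reshape(1200, 1500) % 1024
tile = crop_tile(to_uint8(parent), row=100, col=200, height=500, width=500)
print(tile.dtype, tile.shape)  # uint8 (500, 500)
```

A percentile stretch (e.g. `low_pct=0.5, high_pct=99.5`) is often preferred over a pure min-max stretch because a few saturated or dead pixels would otherwise compress the useful contrast range.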
The authors would like to note that, due to the complexity of the imaged scenes as well as the limited quality and spatial resolution of the available imagery, very small rockfalls could potentially be overlooked or confused with other geomorphic features, such as static boulders with a sand tail deposited by aeolian processes (on Mars), especially in complex geomorphic contexts like talus cones. While this dataset represents a best-effort approach, the potential occurrence of false positives (FPs; labels that describe non-rockfall features), false negatives (FNs; rockfall features in image tiles that have not been labeled), and duplicates in the dataset cannot be excluded completely. Particular care has been taken to remove all FPs and FNs from the designated testing label sets, to maximize the accuracy of model validation.
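Because duplicates cannot be excluded completely, users may want to screen the training labels for near-identical boxes before training. A minimal sketch, assuming labels are held as `(image_path, (x1, y1, x2, y2))` tuples and using a hypothetical intersection-over-union (IoU) threshold of 0.8, flags candidate pairs for manual review:

```python
from collections import defaultdict
from itertools import combinations


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def find_duplicates(labels, threshold=0.8):
    """Return index pairs (i, j) of labels on the same image with IoU >= threshold."""
    # Only boxes on the same image tile can be duplicates, so group by path first.
    by_image = defaultdict(list)
    for idx, (path, _box) in enumerate(labels):
        by_image[path].append(idx)
    pairs = []
    for indices in by_image.values():
        for i, j in combinations(indices, 2):
            if iou(labels[i][1], labels[j][1]) >= threshold:
                pairs.append((i, j))
    return sorted(pairs)
```

Flagged pairs still need visual inspection, since two adjacent rockfalls with parallel tracks can legitimately produce strongly overlapping boxes.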

DATASET
This dataset has been tailor-made for a Python, Keras, and TensorFlow implementation of RetinaNet (Keras RetinaNet by Fizyr: https://github.com/fizyr/keras-retinanet, based on Lin et al. (2018)) and consists of positive and negative image tiles (.jp2 and .tif) as well as labels and classes (.csv). There are a total of 1,000 positive rockfall training labels per planetary domain and an additional 300 negative training tiles without any rockfalls.
Each label .csv file consists of 6 column entries (each row represents one label): 1) the path to the image the label belongs to; 2-5) the upper left column and row and the lower right column and row image coordinates of the label bounding box (following Python conventions, image coordinate (0, 0) is the top left corner of the image); and 6) the label class (here: rockfall). After deployment of the dataset, the label paths need to be adapted to point to the correct image tile directory. Each classes.csv file contains 2 column entries: 1) the class (here: rockfall) and 2) the class ID (here: 0). The class ID can be adapted in case the data are used to train a multi-class detector.
The .csv file structure of RMaM-2020 meets the input criteria of RetinaNet, but can easily be adapted to fit any other object detection framework. Similarly, all labels can be augmented using geometric operations, as long as the same operations are applied to the labels and the images. The dataset consists of 2 main folders, which hold 8 sub-folders and a series of .jp2 (HiRISE), .tif (NAC), and .csv files, as well as one ReadMe file.
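The row layout described above matches the CSV annotation format documented for keras-retinanet (`path,x1,y1,x2,y2,class_name`, with empty coordinate and class fields for images without annotations). A minimal sketch using only the Python standard library shows how label rows could be parsed, re-pointed to a new tile directory after download, and flipped horizontally for augmentation. The function names, the integer-coordinate parsing, and the pixel-edge interpretation of the coordinates are assumptions, not part of RMaM-2020.

```python
import csv
from pathlib import PurePosixPath, PureWindowsPath


def read_labels(lines):
    """Parse label rows: path, x1, y1, x2, y2, class.

    Assumes integer pixel coordinates. Rows with empty fields (images
    without rockfalls) are returned with box=None and cls=None.
    """
    rows = []
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        path, x1, y1, x2, y2, cls = row
        if not cls:
            rows.append((path, None, None))
        else:
            rows.append((path, (int(x1), int(y1), int(x2), int(y2)), cls))
    return rows


def repoint(rows, new_dir):
    """Keep each file name but place it in new_dir.

    PureWindowsPath accepts both backslash and forward-slash separators,
    so the file name is extracted whichever OS wrote the original paths.
    """
    return [(str(PurePosixPath(new_dir, PureWindowsPath(p).name)), box, cls)
            for p, box, cls in rows]


def hflip_box(box, image_width):
    """Mirror a box left-right, treating x values as pixel-edge positions.

    The image must be flipped identically, e.g. img[:, ::-1] in NumPy.
    """
    x1, y1, x2, y2 = box
    return (image_width - x2, y1, image_width - x1, y2)
```

The same pattern extends to vertical flips and 90-degree rotations; what matters is that each image tile and its boxes receive the identical transformation.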

ANALYSIS
The RMaM-2020 labels cover a large range of spatial resolutions and solar illumination conditions. For Mars, RMaM-2020 contains images with spatial resolutions between 0.25 and 0.5 m/pixel as well as solar incidence and phase angles between 35 and 65 degrees. The solar phase angle describes the angle between the light that comes from the Sun and the light that is reflected into the camera (Sun-Planet-Satellite); the solar incidence angle describes the angle between the incoming sunlight and the local surface normal, where 0 degrees denotes vertical and 90 degrees horizontal illumination conditions. For the Moon, RMaM-2020 contains images with a wider range of spatial resolutions between 0.5 and 2.2 m/pixel (due to the nature of LRO's orbit and NAC's technical setup) as well as a wider range of solar incidence and phase angles between 15 and 87 degrees. For both planetary domains, the pairings of spatial resolution and solar incidence angle are relatively well distributed, i.e., there are low- and high-resolution images for both low- and high-incidence angle conditions. The ratio of label x and y dimensions (the bbox aspect ratio) is consistent across both planetary domains, reflecting the fact that both datasets have been labeled by the same human operator. The shape of each bounding box has been adapted to 1) the orientation of the respective boulder and track and 2) the local geologic background, to prevent other, non-rockfall objects from falling within the label (see Figure 1). The good alignment of absolute bbox dimensions across the two planetary domains is surprising, given that the lunar image data have lower (coarser) spatial resolutions than the martian image data; this could indicate that either 1) martian rockfalls are slightly smaller on average or 2) the human operator subconsciously increased the size of the bboxes when looking at lower-resolution rockfall examples during labeling. The discussed properties of RMaM-2020 are visualized in Figure 2, and the IDs of all used HiRISE and NAC images are listed in Tables 1 and 2, respectively.
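The label geometry and illumination quantities discussed above can be made concrete with a short sketch. It assumes boxes in `(x1, y1, x2, y2)` pixel coordinates, defines the aspect ratio as x extent over y extent, and computes the angles from hypothetical direction vectors (surface normal, surface-to-Sun, surface-to-spacecraft); none of these helpers are part of RMaM-2020.

```python
import math


def bbox_stats(box, m_per_px):
    """Aspect ratio (x extent / y extent) and ground size in meters of one label."""
    x1, y1, x2, y2 = box
    w_px, h_px = x2 - x1, y2 - y1
    return {"aspect_ratio": w_px / h_px,
            "width_m": w_px * m_per_px,
            "height_m": h_px * m_per_px}


def angle_deg(u, v):
    """Angle between two 3-D vectors in degrees."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard acos against floating-point round-off.
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norms))))


def illumination(normal, to_sun, to_camera):
    """Solar incidence (normal vs. Sun) and phase (Sun vs. camera) angles."""
    return angle_deg(normal, to_sun), angle_deg(to_sun, to_camera)


# The same 40 x 20 pixel box is 4x larger on each side at 2.0 m/pixel
# (NAC-like) than at 0.5 m/pixel (HiRISE-like); its aspect ratio stays 2.0.
print(bbox_stats((0, 0, 40, 20), 0.5))  # {'aspect_ratio': 2.0, 'width_m': 20.0, 'height_m': 10.0}
print(bbox_stats((0, 0, 40, 20), 2.0))  # {'aspect_ratio': 2.0, 'width_m': 80.0, 'height_m': 40.0}
```

Comparing box sizes in meters rather than in pixels is one way to separate the two explanations offered above for the cross-domain alignment of bbox dimensions.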

FUTURE WORK
Our goal is to complement RMaM-2020 with more rockfall labels over the next years, not only from the Moon and Mars, but also from other planetary bodies such as Ceres (NASA's Dawn Mission) and comet 67P (ESA's Rosetta Mission), and from potential future datasets such as those from ESA's BepiColombo Mission to Mercury. This way RMaM-2020 (and its successors) would remain a

FIGURE 1 | Examples of martian HiRISE (top) and lunar NAC (bottom) rockfall labels (bright boulder and elongated track); each example has been cropped from its respective positive tile using its label bounding box coordinates, as indicated in the figure. Pixel (0, 0) of each tile is at the top left (Python-style indexing). Unlabeled parts of each tile can act as negatives, as indicated in the shown tiles.

FIGURE 2 | Distribution of spatial resolutions of the imagery used for training and testing from Mars (MRO HiRISE, red) and the Moon (LRO NAC, gray) and visualizations of the relations between label dimensions (label aspect ratio), illumination conditions (solar incidence and phase), and spatial resolution. The label aspect ratio varies as labels need to adapt to the geomorphic appearance and spatial orientation of rockfalls.