1 June 2022
Speaker: David Cornu (Obs de Paris)
Title: Winning the SKA Science Data Challenge 2 with a fast Deep Learning object detector
Abstract:
With its 1 TB simulated data cube of HI line emission, the SKA Science Data Challenge 2 (SDC2) comes closer to the difficulty of analyzing real upcoming SKA observations. Although the types of tasks to perform in the SKA SDCs are rather classical (detection, classification, parameter extraction, etc.), modern datasets have become highly demanding for classical approaches because of their size and dimensionality. It is no surprise, then, that many astronomers have started to focus their work on Machine Learning approaches that have demonstrated their efficiency in similar applications. However, hyperspectral images from astronomical interferometers are very different from the images used to train state-of-the-art pattern recognition algorithms, especially in terms of noise level, contrast, object size, class imbalance, and spectral dimensionality. As a direct consequence, these methods do not perform as well as expected when applied directly to astronomical datasets. In this context, the MINERVA (MachINe lEarning for Radioastronomy at obserVatoire de PAris) project has assembled a team to participate in the SDC2, with the objective of developing innovative Machine Learning methods that better suit the needs of astronomical images.
In this presentation, I will describe our work on implementing a modern YOLO (You Only Look Once) CNN object detector inside our custom framework CIANNA (Convolutional Interactive Artificial Neural Networks by/for Astrophysicists), along with the modifications and tuning that allowed us to reach first place in the SKA SDC2. I will start by discussing the strengths and weaknesses of this type of method in comparison with the more widely adopted Region-Based CNN detectors (Faster R-CNN, Mask R-CNN, …). I will also review the motivation for and the effect of the numerous changes we made to the method (data quantization, 3D convolution, layer architecture, detection layout to manage blending, objectness decomposition, IoU selection, additional parameter inference, …) in order to apply it to both SDC1 and SDC2, and identify its present limits as well as some avenues for further improvement. I will detail the computational efficiency of the method (with GPU acceleration) and discuss its scaling capabilities for upcoming challenges or datasets. Finally, I will comment on how this methodology could be used to analyze actual data from SKA pathfinders or any other similar astronomical dataset, and how it could be used to merge knowledge and information from multiple datasets at the same time.