Objective
As a basic branch of computer vision, object detection underpins downstream tasks such as image segmentation and object tracking. It aims to find all objects in an image and determine their locations and categories. It is used in industrial inspection and has broad applications in aerospace, autonomous driving, and other fields. Aircraft detection in remote sensing images is of great significance in both military and civilian settings, such as air traffic control and battlefield monitoring. Object sizes in remote sensing aircraft images vary widely, and image acquisition is affected by factors such as lighting and occlusion. As a result, different types of aircraft appear similar, small objects are poorly detected, and fine-grained distinctions within categories are difficult to achieve. In object detection, the loss function measures the difference between the model's predictions and the ground-truth objects, and it directly affects the performance and convergence speed of the model. Adjusting the model parameters to minimize the loss can improve the model's accuracy on the test set. The loss function of YOLOv5 consists of a localization loss, a classification loss, and a confidence loss. By default, YOLOv5 computes the localization loss with complete IoU (CIoU), a variant of intersection over union (IoU), and provides IoU, generalized IoU (GIoU), and distance IoU (DIoU) as alternatives. However, for small object detection, especially with anchor-based algorithms such as YOLOv5, the IoU family of metrics does not meet application needs well. Different types of remote sensing aircraft exhibit fine-grained characteristics: subtle differences between classes, large variation within classes, and discriminative details that must be captured accurately within each class. For fine-grained recognition tasks, extracting local information is crucial.
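The IoU-family metrics named above can be illustrated with a short Python sketch. It assumes boxes given as (x1, y1, x2, y2) corner coordinates and follows the commonly published GIoU, DIoU, and CIoU definitions; it is not the YOLOv5 implementation, and the function name `box_metrics` is hypothetical. The usage lines show why IoU struggles with small objects: the same 2-pixel shift costs a 4 × 4 box far more overlap than a 40 × 40 box.

```python
import math


def box_metrics(a, b, eps=1e-9):
    """Return (IoU, GIoU, DIoU, CIoU) for two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    # Intersection rectangle (zero area if the boxes do not overlap)
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    wa, ha = ax2 - ax1, ay2 - ay1
    wb, hb = bx2 - bx1, by2 - by1
    union = wa * ha + wb * hb - inter
    iou = inter / (union + eps)

    # GIoU: penalize the empty part of the smallest enclosing box
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    giou = iou - (cw * ch - union) / (cw * ch + eps)

    # DIoU: squared center distance normalized by the enclosing-box diagonal
    rho2 = ((ax1 + ax2 - bx1 - bx2) ** 2 + (ay1 + ay2 - by1 - by2) ** 2) / 4
    c2 = cw ** 2 + ch ** 2 + eps
    diou = iou - rho2 / c2

    # CIoU: add an aspect-ratio consistency penalty
    v = (4 / math.pi ** 2) * (math.atan(wb / hb) - math.atan(wa / ha)) ** 2
    alpha = v / (v - iou + 1 + eps)
    ciou = diou - alpha * v
    return iou, giou, diou, ciou


# The same 2-pixel horizontal shift applied to a small and a large box
small = box_metrics((0, 0, 4, 4), (2, 0, 6, 4))[0]
large = box_metrics((0, 0, 40, 40), (2, 0, 42, 40))[0]
print(f"4x4 box shifted 2 px:   IoU = {small:.3f}")
print(f"40x40 box shifted 2 px: IoU = {large:.3f}")
```

The small box drops to an IoU of about one third while the large box stays above 0.9, which is the sensitivity that motivates replacing or supplementing IoU for tiny aircraft.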
The PANet feature fusion module used by YOLOv5s cannot fuse features globally, which hinders the extraction of fine-grained features. To solve these problems, this article proposes an improved algorithm based on YOLOv5s.
Method
To address the shortcomings of IoU in YOLOv5-based small object detection, this article introduces the Gaussian Wasserstein distance into the computation of bounding box overlap to improve the detection performance of the network. The IoU family of metrics measures the similarity between a predicted box and a ground-truth box from the sets of pixels the boxes contain; for small objects, a shift of only a few pixels can sharply reduce this overlap or eliminate it entirely. The Gaussian Wasserstein distance abandons the pixel-set view and instead models each bounding box as a two-dimensional Gaussian distribution; a metric derived from it, the normalized Gaussian Wasserstein distance, is then used to compute the similarity between boxes, which addresses this limitation of IoU in YOLOv5-based small object detection. To address PANet's shortcomings in fine-grained detection, this article introduces the gather-and-distribute feature aggregation module from Gold-YOLO into YOLOv5s, enhancing the network's ability to extract fine-grained features through convolution and self-attention mechanisms. The two improvements are:
1) The loss function of YOLOv5s is improved by combining the Gaussian Wasserstein distance with the traditional IoU.
2) The gather-and-distribute feature aggregation module is introduced into the neck of YOLOv5s to enhance the network's local feature extraction capability.
Together, these two changes improve the overall detection accuracy. To test the advantages of this algorithm in fine-grained and small object recognition of military aircraft, this paper conducts experiments on the remote sensing aircraft fine-grained classification dataset MAR20 and the remote sensing aircraft small object dataset CORS-ADD.
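A minimal Python sketch of the normalized Gaussian Wasserstein distance, following its commonly published form: a box (cx, cy, w, h) is treated as a Gaussian with mean (cx, cy) and covariance diag(w^2/4, h^2/4); for two such Gaussians the squared 2-Wasserstein distance W2^2 reduces to a Euclidean distance between (cx, cy, w/2, h/2) vectors, and NWD = exp(-sqrt(W2^2) / C). The function name `nwd` and the constant C = 12.8 are assumptions for illustration; the abstract does not state the value of C used in this article.

```python
import math


def nwd(a, b, c=12.8):
    """Normalized Gaussian Wasserstein distance between boxes (cx, cy, w, h).

    Each box is modeled as N(mean=(cx, cy), cov=diag(w^2/4, h^2/4)).
    For these diagonal Gaussians, W2^2 = |mean_a - mean_b|^2 + |sqrt(cov_a) - sqrt(cov_b)|_F^2.
    `c` is a dataset-dependent normalizing constant (12.8 is an assumed value).
    """
    acx, acy, aw, ah = a
    bcx, bcy, bw, bh = b
    w2_sq = ((acx - bcx) ** 2 + (acy - bcy) ** 2
             + ((aw - bw) / 2) ** 2 + ((ah - bh) / 2) ** 2)
    return math.exp(-math.sqrt(w2_sq) / c)


# The same 2-pixel shift for a 4 x 4 and a 40 x 40 box
print(round(nwd((2, 2, 4, 4), (4, 2, 4, 4)), 3))          # 0.855
print(round(nwd((20, 20, 40, 40), (22, 20, 40, 40)), 3))  # 0.855
# A 6-pixel shift: the 4 x 4 boxes no longer overlap (IoU = 0), yet NWD stays informative
print(round(nwd((2, 2, 4, 4), (8, 2, 4, 4)), 3))          # 0.626
```

Unlike IoU, this similarity varies smoothly with position and stays nonzero for non-overlapping boxes, so a tiny box that is slightly off target still yields a useful gradient signal.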
In remote sensing military aircraft identification, different types of aircraft often have similar characteristics, which makes fine-grained identification within categories difficult. This article uses the open-source remote sensing object detection dataset Military Aircraft Recognition 20 (MAR20) to achieve fine-grained recognition of remote sensing military aircraft. The dataset contains 3 842 images covering 20 military aircraft models (SU-35, C-130, C-17, C-5, F-16, TU-160, E-3, B-52, P-3C, B-1B, E-8, TU-22, F-15, KC-135, F-22, FA-18, TU-95, KC-10, SU-34, SU-24). The CORS-ADD dataset is a complex optical remote sensing aircraft small object dataset, manually annotated and constructed by the Space Optical Engineering Research Center of Harbin Institute of Technology. It contains 7 337 images with 32 285 aircraft instances, and object sizes range from 4 × 4 pixels to 240 × 240 pixels. Unlike earlier remote sensing datasets drawn from a single source, CORS-ADD combines imagery from Google Maps, WorldView-2, WorldView-3, Pleiades, Jilin-1, IKONOS, and other satellite platforms. It covers airports, aircraft carriers, oceans, land, and other scenes, as well as bombers, fighter jets, early-warning aircraft, and other aircraft at typical airports in China and the United States.
Result
To test the effect of the two improved modules on YOLOv5s-based remote sensing aircraft recognition, this article compares the original YOLOv5s with models that introduce the normalized Gaussian Wasserstein distance (NWD), where r is the weight parameter that adjusts the ratio of IoU to NWD, and the gather-and-distribute (GD) module. The experimental results show that introducing NWD and GD improves recognition accuracy to varying degrees, confirming that both improvements are effective. An IoU-to-NWD ratio of 1:1 gives the best recognition on the MAR20 dataset, and a ratio of 1:9 gives the best recognition on the CORS-ADD dataset. On MAR20, the mAP of the improved YOLOv5s is 1.1%, 0.7%, and 1.8% higher than that of YOLOv5s, YOLOv8s, and Gold-YOLO, respectively; on CORS-ADD, it is 0.6%, 1.7%, and 3.9% higher, respectively.
Conclusion
An improved YOLOv5s network is proposed to address the large variation in object size and the high similarity between aircraft types in remote sensing aircraft image recognition. The loss function of YOLOv5s is improved by combining the Gaussian Wasserstein distance with the traditional IoU metric, which improves the detection of objects of different sizes and thereby the detection accuracy of the model. In addition, to address the similar characteristics of different aircraft types and the difficulty of distinguishing subcategories, this article uses the gather-and-distribute feature aggregation module from Gold-YOLO to enhance the ability of the YOLOv5s network to extract fine-grained features. Comparative experiments show that the improved YOLOv5s achieves higher detection accuracy than YOLOv5s, YOLOv8s, Gold-YOLO, and Faster R-CNN.
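The weight parameter r that balances IoU and NWD can be read as blending two box-regression terms. The abstract does not give the exact formula, so the sketch below assumes that r weights the IoU term and 1 - r weights the NWD term, making an IoU:NWD ratio of 1:1 correspond to r = 0.5 and 1:9 to r = 0.1. Plain IoU stands in for YOLOv5's default CIoU to keep the example short, the constant C = 12.8 is assumed, and `iou_cxcywh`, `nwd`, and `blended_box_loss` are hypothetical names.

```python
import math


def iou_cxcywh(a, b, eps=1e-9):
    """Plain IoU for boxes given as (cx, cy, w, h)."""
    ax1, ay1 = a[0] - a[2] / 2, a[1] - a[3] / 2
    ax2, ay2 = a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1 = b[0] - b[2] / 2, b[1] - b[3] / 2
    bx2, by2 = b[0] + b[2] / 2, b[1] + b[3] / 2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / (union + eps)


def nwd(a, b, c=12.8):
    """Normalized Gaussian Wasserstein distance (C = 12.8 is an assumed constant)."""
    w2_sq = ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
             + ((a[2] - b[2]) / 2) ** 2 + ((a[3] - b[3]) / 2) ** 2)
    return math.exp(-math.sqrt(w2_sq) / c)


def blended_box_loss(pred, target, r=0.5, c=12.8):
    """Hypothetical blend: r * (1 - IoU) + (1 - r) * (1 - NWD)."""
    return r * (1 - iou_cxcywh(pred, target)) + (1 - r) * (1 - nwd(pred, target, c))


# Non-overlapping 4 x 4 boxes: the IoU term is saturated at 1, so only the
# NWD term tells the optimizer how far off the prediction is.
pred, target = (8, 2, 4, 4), (2, 2, 4, 4)
for name, r in [("MAR20, IoU:NWD = 1:1", 0.5), ("CORS-ADD, IoU:NWD = 1:9", 0.1)]:
    print(name, round(blended_box_loss(pred, target, r), 3))
```

Under this assumed form, lowering r shifts the regression signal toward NWD; one plausible reading of the reported results is that the small-object CORS-ADD dataset benefits from that shift more than MAR20 does.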
In future work, to increase the image processing speed of the model without reducing its accuracy, and to minimize computational resource consumption so that the model can be deployed in lightweight form, this article will consider replacing the C3 network in the detection part of YOLOv5s with the C3_DSConv network to speed up detection and make the model lighter.… …