Deep learning models have gained significant popularity in recent years, particularly in robotics and autonomous systems [1]. These models have achieved strong performance on tasks such as object detection (OD) and segmentation of image data. However, OD models usually require a large amount of labeled (annotated) image data for training, and manually labeling data from a custom environment is time-consuming and demands significant human resources [2]. Although methods such as semi-supervised learning (SSL) and active learning can predict labels from a small amount of labeled training data, they still require a human to examine the generated dataset manually because of the model's false positive detections.
Objectives:
- Improve the object detection model's performance on custom environment data
- Remove false positive detections using a clustering-based methodology to automate the process
Methodology
To improve the performance of OD models, it is crucial to remove false positives (FPs) from the training dataset, which is challenging to do manually in large datasets. To address this, a clustering-based method is proposed for automatic FP removal. The environment data is first collected using a camera and a 2D LiDAR sensor and stored in an MCAP log file, a container format for multimodal robotics data. Object detection is performed on the image data using a YOLO model. By combining the image-based bounding boxes with the corresponding LiDAR points, the 3D positions of the detected objects are estimated. To eliminate false positives, the DBSCAN clustering algorithm is applied to these positions, assuming that the largest cluster corresponds to the actual object, while outlier points are discarded. This filtered and refined dataset is then used to train the OD model, leading to more accurate and robust detection performance.
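The clustering step can be illustrated with a minimal Python sketch. It assumes the estimated 3D positions of all detections are already available as a NumPy array; the function name and parameter defaults are illustrative, and `eps`/`min_samples` would need tuning to the environment:

```python
import numpy as np
from sklearn.cluster import DBSCAN

def filter_false_positives(positions, eps=0.5, min_samples=5):
    """Keep only detections whose estimated 3D positions fall into the
    largest DBSCAN cluster; everything else is treated as a false positive.

    positions: (N, 3) array of estimated object positions in world coordinates.
    eps, min_samples: DBSCAN density parameters (environment-dependent).
    """
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(positions)

    # Label -1 marks DBSCAN noise points (outliers).
    cluster_labels = labels[labels != -1]
    if cluster_labels.size == 0:
        return np.zeros(len(positions), dtype=bool)  # no dense cluster found

    # The largest cluster is assumed to correspond to the actual object.
    largest = np.bincount(cluster_labels).argmax()
    return labels == largest

# Example: boolean mask selecting the detections to keep for training.
# keep = filter_false_positives(estimated_positions)
```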
Experimental Validation
To illustrate the approach, a 'Double 3' service robot from Double Robotics is driven through the given environment, and the object of interest for detection is another Double 3 robot. The YOLOv5n object detection model was chosen for training due to its efficiency on small devices. Initially, the robot collects data, including position, orientation, and video, during each drive session (called a "Job"), which is saved with timestamps. The data recorded during the first drive is manually annotated and used to train the OD model; once the model is trained, it can be used to extract datasets from subsequent drive jobs. The approach determines the true location of the object in the environment by checking its appearance in consecutive frames; in other words, the object's presence is confirmed by the number of detections at the estimated location. Concretely, the YOLOv5n network detects the object (Double 3), and the object's position is estimated from the roll, yaw, and pitch angles of the observing robot relative to the detected object, projecting the measured distance from the robot's own position along that viewing direction. The estimated points are then clustered with the DBSCAN density-based clustering algorithm, and the largest cluster is taken as the true position of the observed robot.
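A rough sketch of this projection step is given below. The function and its arguments are hypothetical; it assumes world-frame yaw and pitch angles of the line of sight toward the detection and a range measurement, e.g., from the LiDAR:

```python
import math

def estimate_object_position(robot_x, robot_y, robot_z, yaw, pitch, distance):
    """Project the measured distance from the observing robot's position
    along the viewing direction (yaw/pitch) toward the detected object.

    yaw, pitch: world-frame angles (radians) of the line of sight to the
    detection; distance: range to the object (e.g., from the LiDAR).
    """
    # Spherical-to-Cartesian projection of the line of sight.
    dx = distance * math.cos(pitch) * math.cos(yaw)
    dy = distance * math.cos(pitch) * math.sin(yaw)
    dz = distance * math.sin(pitch)
    return robot_x + dx, robot_y + dy, robot_z + dz

# One estimated point per detection; the collected points are then
# clustered with DBSCAN as described above.
```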
The presented approach has been successfully implemented and adapted to the given environment, removing the false positives in the extracted datasets autonomously after a few jobs/iterations. After each iteration, the trained model is deployed on the embedded device to validate its detection performance. However, some challenges remain. Currently, the approach has only been validated on stationary objects in the environment; when tested on moving objects, the methodology fails to capture some true predictions because their estimated positions do not form a sufficiently dense cluster. To overcome this challenge, an object tracking methodology has been suggested and is still under investigation. Further, the research could benefit Federated Learning, where OD models are trained on data from different environments and the overall performance is evaluated.
About me
Shivakrishna Karnati began his Master's in Applied Mathematics in 2021, during which he developed a strong interest in interpretable models, deep learning, and computer vision. Under the supervision of Prof. Dr.-Ing. Falk Langer, he recently completed his master's thesis at IAV GmbH. Currently, he is expanding his research focus to include Large Language Models (LLMs) and their practical applications in industry. Outside of academics, he enjoys playing basketball and is an active member of the Mittweida Friday Club.