Detection And Tracking of Sphere Markers

When a person watches a video, he or she can easily distinguish objects and recognize what they are. Our brain can even track multiple objects in real time. To this day, nobody has fully explained how the brain accomplishes this, or how it learns tasks that even the most advanced algorithms cannot match. Thanks to advances in technology, computers can now run the same algorithms thousands of times faster than they could only two decades ago (Piguet (2018)), and matching the speed of the human mind seems feasible. But speed is only the most attainable feature. We are also interested in understanding how we perform these tasks. Computers have a different physical structure than humans; even if we knew how we think, it might not be possible to make computers learn the same way we do.

There are many reasons why detection and tracking of objects are in high demand in both industry and research, but the overall goal is to let machines understand their surrounding environment as humans do. One novel application that detects visual features to learn the environment and enhance GPS accuracy is the Visual Positioning System (VPS), recently introduced by Kaware (2018) and used in « Google Maps » and « Waymo » autonomous cars. We are in the golden age of autonomous-car development: cars that can see obstacles, track them, and even predict their behavior in the near future. Detection and tracking of each part of the human body is what is needed to reconstruct its 3D model (Kazemi et al. (2013)). Such results can serve other applications such as tele-immersion, where a 3D model of each individual is reconstructed in a simulated environment such as a virtual conference. The main application we use to design our method and algorithm is the detection and tracking of markers attached to the shoulders, elbows and wrists of a physically impaired athlete driving a three-wheel racing wheelchair. A fourth-generation GoPro camera is installed above the front wheel and captures at 240 fps with a resolution of 1280×720. In the verification chapter, we compare our method with three other well-known algorithms under conditions that match our main application.

Template Matching 

Integrated HOG

Template matching has been the subject of much research since the 1960s (Guo et al. (2019)). Most older techniques were based on correlation matching methods and are suitable for « whole-to-whole » template matching (Zhang et al. (2017)). In more general circumstances, image matching is challenging because of complex backgrounds and noise. Zhang et al. (2017) proposed HOG (Histogram of Oriented Gradients) patterns for improved GPT (Global Projection Transformation) matching, gaining robustness against noise and background clutter through norm normalization. Experiments on the Graffiti dataset revealed that this approach achieves excellent matching compared with the original GPT correlation matching and with the combination of the Speeded Up Robust Features (SURF) descriptor and the Random Sample Consensus (RANSAC) method (Zhang et al. (2017)). Moreover, the computational cost of the suggested approach decreased dramatically.
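To make the HOG building block concrete, the following is a minimal NumPy sketch of a cell-wise HOG descriptor with the per-cell norm normalization mentioned above; the function name, cell size and bin count are illustrative choices of ours, not the exact configuration used by Zhang et al. (2017):

```python
import numpy as np

def hog_descriptor(img, cell=8, bins=9):
    """Histogram of Oriented Gradients for a grayscale image (H, W):
    one unsigned-orientation histogram per cell, L2-normalized per cell."""
    gx = np.zeros_like(img, dtype=float)
    gy = np.zeros_like(img, dtype=float)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]        # central differences
    gy[1:-1, :] = img[2:, :] - img[:-2, :]
    mag = np.hypot(gx, gy)                        # gradient magnitude
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0  # unsigned orientation

    H, W = img.shape
    ch, cw = H // cell, W // cell
    bin_idx = (ang / (180.0 / bins)).astype(int) % bins
    hist = np.zeros((ch, cw, bins))
    for i in range(ch):
        for j in range(cw):
            m = mag[i*cell:(i+1)*cell, j*cell:(j+1)*cell]
            b = bin_idx[i*cell:(i+1)*cell, j*cell:(j+1)*cell]
            hist[i, j] = np.bincount(b.ravel(), weights=m.ravel(),
                                     minlength=bins)
    # L2 norm normalization, the step that gives robustness to
    # illumination and background changes
    norm = np.linalg.norm(hist, axis=2, keepdims=True)
    return hist / np.maximum(norm, 1e-12)
```

On a synthetic horizontal ramp (all gradients pointing along x), every interior cell concentrates its weight in the 0° bin, which is the behavior a HOG-based matcher relies on.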

Memory Efficiency

Mun & Kim (2017) modified the GHT (Generalized Hough Transform) to decrease its computational complexity and memory requirements and to enhance its performance under rigid motion. In the suggested method, orientation and displacement are handled separately using a multi-stage structure, and the displacement accumulator, which uses more memory than the others, is downsampled without reducing detection accuracy. Additionally, an adaptive weighting scheme makes the estimated template position more trustworthy. Experimental results show that the suggested scheme has advantages in memory requirements and computational cost compared with the conventional GHT, and that pose estimation becomes more stable.
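The voting idea behind the GHT, and the memory pressure of its displacement accumulator, can be illustrated with a minimal pure-translation sketch; the function name and synthetic data are ours and this is not the multi-stage scheme of Mun & Kim (2017), whose contribution is precisely to downsample this accumulator:

```python
import numpy as np

def ght_detect(template_pts, image_pts, shape):
    """Minimal Generalized Hough Transform for pure translation:
    each image edge point votes for every reference-point location
    implied by the template's edge offsets."""
    ref = template_pts.mean(axis=0)          # template reference point
    offsets = template_pts - ref             # R-table, translation-only case
    acc = np.zeros(shape, dtype=np.int32)    # displacement accumulator
    for (y, x) in image_pts:
        ys = np.round(y - offsets[:, 0]).astype(int)
        xs = np.round(x - offsets[:, 1]).astype(int)
        ok = (ys >= 0) & (ys < shape[0]) & (xs >= 0) & (xs < shape[1])
        np.add.at(acc, (ys[ok], xs[ok]), 1)  # cast the votes
    # peak of the accumulator = most supported reference-point location
    return np.unravel_index(acc.argmax(), acc.shape)
```

Note that the accumulator spans the whole image, which is why downsampling it (as in the scheme above) saves so much memory.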

Enhanced Normalized Cross Correlation

Pontecorvo & Redding (2017) discuss a particular example of non-periodic translation symmetry. They observe consistent, overlapping regions of self-resemblance throughout a non-urban scene and propose a method that automatically detects various poles, or their shadows, without relying on a prior pole template or knowledge of its precise size. Using normalized cross-correlation, analogous areas across the entire photographed picture are found. By estimating the size of the pole and its given or inferred alignment, the resulting blobs can be refined. The technique then exposes the mutual self-similarity between similar patches by clustering together all image patches whose blobs mutually overlap. For non-urban areas, it is thus possible to detect identical or analogous poles. Experimental results on real aerial imagery show that this method can potentially detect almost any pole with only a small number of false alarms, and that it outperforms state-of-the-art template-matching methods (Pontecorvo & Redding (2017)). The following limitations should be considered while applying the algorithm:

– The algorithm to detect blobs and filter background objects must assume configurations of pole-shaped objects.
– The pole size should be assumed given a priori from the main image and the sensor metadata.
– If the captured picture contains few self-similar objects of interest, the self-similarity detection may not perform as expected (the worst-case scenario).
– Ideally, a minimum of 4 or 5 targets of interest should be visible in the image.
– There is no easy approach to sampling an H × W image densely, where HW correlation images have to be considered: the full (H × W × H × W) tensor would have to be stored. This requirement can be decreased by using a stride, but the stride has to be chosen carefully to make sure the target does not fall between the sampled pixels.
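The normalized cross-correlation at the heart of this detector can be sketched directly; the following is an illustrative NumPy version of the standard correlation-coefficient score (one entry per valid template position), not the authors' implementation:

```python
import numpy as np

def ncc_map(image, tpl):
    """Normalized cross-correlation (correlation coefficient) of a
    template against every valid position of a grayscale image.
    Scores lie in [-1, 1]; 1 means a perfect match."""
    th, tw = tpl.shape
    ih, iw = image.shape
    t = tpl - tpl.mean()                       # zero-mean template
    tnorm = np.sqrt((t * t).sum())
    out = np.zeros((ih - th + 1, iw - tw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            w = image[y:y+th, x:x+tw]
            wz = w - w.mean()                  # zero-mean window
            denom = tnorm * np.sqrt((wz * wz).sum())
            out[y, x] = (wz * t).sum() / denom if denom > 0 else 0.0
    return out
```

The double loop makes the H × W × H × W memory issue of the previous list tangible: correlating every patch against every other patch means one such map per sampled pixel, which is exactly what the stride trades off.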

In summary, Pontecorvo & Redding (2017) offered a different pole detector for airborne electro-optical imagery based on the idea of image-wide non-periodic translation symmetry. First, they computed the 4D tensor of thresholded correlation images for a subsample of image pixels, assuming that the camera viewing geometry is known along with the estimated pole size. Second, they established a sequence of filtering steps to eliminate from the thresholded correlation images any blobs not shaped and oriented like the expected pole. Finally, a clustering technique collects all pixels with a sufficient number of mutually overlapping filtered blobs. Applying this method to the detection of poles in two airborne images of non-urban scenes, they showed that the largest clusters detect most of the poles present in the image with just a few false alarms. Experiments show that this method detects more target objects with fewer false alarms than state-of-the-art template matching, which assumes a target template is provided a priori.

Table of Contents

INTRODUCTION
CHAPTER 1 LITERATURE REVIEW
1.1 Template Matching
1.1.1 Integrated HOG
1.1.2 Memory Efficiency
1.1.3 Enhanced Normalized Cross Correlation
1.2 Edge Detection
1.2.1 Canny Method
1.2.2 Customized Efficiency
1.3 Color Detection
1.3.1 Integration of Edge and Color
1.3.2 Multi-color Spaces Combination
1.4 Circle Detection
1.4.1 Gradient Hough Circle Transform
1.4.2 Threshold Segmentation
1.4.3 Reshaped Circles in Real 3D World
1.4.4 Soccer Robot Example
1.5 Corner Detection
1.5.1 3D Corner Neighborhood Estimation
1.5.2 Corner Detection And Template Matching Integration
1.5.3 Corner Detection in Curved Images
1.6 Tracking
1.6.1 Part-Base Tracker
1.6.2 3D Cloud Model
1.6.3 UAVs Collision Avoidance
1.6.4 Kernelized Correlation Filters
1.6.5 Extended KCF
1.7 Kalman Filter and Extended Kalman Filter
1.7.1 Kalman filter for tracking prediction and denoising
1.8 Conclusion
CHAPTER 2 TRACKING AND DETECTION OF A SPHERE MARKER
2.1 Introduction
2.2 Objective
2.3 Tracking
2.3.1 Introduction
2.3.2 Learning Tracker
2.3.2.1 Kernel Filters
2.3.2.2 Kernel Trick
2.3.3 Point Tracker
2.3.3.1 Results
2.4 Detection
2.4.1 Circle Hough Gradient
2.4.1.1 Error Function
2.4.1.2 Optimizing Function Accuracy
2.4.2 Color Detection
2.4.2.1 HSV vs RGB
2.4.3 Optimizing Selected Part
2.4.3.1 Conclusion and results
2.4.4 Verifying Lighting Endurance
CHAPTER 3 VERIFYING AND VALIDATION
3.1 Verifying True Detection
3.1.1 Cross Correlation
3.1.2 Sum of Absolute Differences
3.1.3 Correlation Coefficient
3.1.4 Conclusion and results
3.1.4.1 Voting Procedure
3.2 3D Positioning
3.2.1 Validation
3.3 Denoising And Prediction
3.3.1 Kalman Filter
3.3.2 Final Results
3.4 Verifying by Comparison
3.4.1 Experiment
3.4.2 Comparing Results
3.4.3 Conclusion And Results
CHAPTER 4 CONCLUSION
