“Distinctive Image Features from Scale-Invariant Keypoints,”
Lowe, IJCV 2004
The feature presented in this paper is well known as SIFT, and it is widely used in the computer vision community for matching objects across images. This local feature descriptor is invariant to scale and rotation, and partially invariant to affine transformation.
The whole method can be summarized in four stages:
1. Scale-space extrema detection
It is implemented efficiently by using a difference-of-Gaussian function (the difference of two Gaussian-smoothed images at nearby scales σ and kσ, separated by a constant factor k) to identify potential interest points that are invariant to scale and orientation.
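As a rough illustration of the difference-of-Gaussian idea, here is a minimal pure-Python sketch. The function names, the 3σ kernel truncation, the clamped borders, and the values σ = 1.6 and k = √2 are my own choices for the example, not the paper's implementation, which builds a full pyramid of octaves.

```python
import math

def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian weights, truncated at about 3 sigma (assumption)."""
    radius = int(math.ceil(3 * sigma))
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur(image, sigma):
    """Separable Gaussian blur; border pixels are repeated (clamped)."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    height, width = len(image), len(image[0])

    def clamp(value, upper):
        return max(0, min(value, upper))

    # Horizontal pass first, then vertical pass on the result.
    rows = [[sum(kernel[j + radius] * image[y][clamp(x + j, width - 1)]
                 for j in range(-radius, radius + 1))
             for x in range(width)] for y in range(height)]
    return [[sum(kernel[j + radius] * rows[clamp(y + j, height - 1)][x]
                 for j in range(-radius, radius + 1))
             for x in range(width)] for y in range(height)]

def difference_of_gaussian(image, sigma, k):
    """D = blur(image, k * sigma) - blur(image, sigma)."""
    fine = blur(image, sigma)
    coarse = blur(image, k * sigma)
    return [[c - f for c, f in zip(coarse_row, fine_row)]
            for coarse_row, fine_row in zip(coarse, fine)]

# Toy input: a single bright pixel on a dark background.
image = [[0.0] * 15 for _ in range(15)]
image[7][7] = 1.0
dog = difference_of_gaussian(image, sigma=1.6, k=math.sqrt(2))
```

For this toy image the response is strongest (most negative) at the bright pixel, while a perfectly flat image gives a response of zero everywhere, which is why extrema of the difference-of-Gaussian are good candidates for blob-like interest points.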
2. Keypoint localization
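Candidate extrema from the previous stage are selected by comparing each sample with its 26 neighbors: eight in the same scale and nine in each of the two adjacent scales, as shown in the following figure.
The candidates are then filtered by stability, which rejects low-contrast points (sensitive to noise) and poorly localized edge responses.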
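The two checks described above can be sketched as follows. This is only an illustration under my own assumptions: `dog_stack` is a hypothetical list of difference-of-Gaussian images at consecutive scales (each a list of rows), the caller keeps indices away from the borders, and the edge test uses the ratio of principal curvatures of the 2x2 Hessian with a default of 10, the value I recall from the paper's experiments. The sub-pixel refinement and the contrast threshold are left out.

```python
def is_scale_space_extremum(dog_stack, s, y, x):
    """True if sample (s, y, x) is strictly above or below all 26 neighbors."""
    center = dog_stack[s][y][x]
    neighbors = [dog_stack[s + ds][y + dy][x + dx]
                 for ds in (-1, 0, 1)
                 for dy in (-1, 0, 1)
                 for dx in (-1, 0, 1)
                 if (ds, dy, dx) != (0, 0, 0)]
    return center > max(neighbors) or center < min(neighbors)

def passes_edge_test(dog, y, x, edge_ratio=10.0):
    """Reject edge-like points using finite-difference second derivatives."""
    dxx = dog[y][x + 1] + dog[y][x - 1] - 2.0 * dog[y][x]
    dyy = dog[y + 1][x] + dog[y - 1][x] - 2.0 * dog[y][x]
    dxy = (dog[y + 1][x + 1] - dog[y + 1][x - 1]
           - dog[y - 1][x + 1] + dog[y - 1][x - 1]) / 4.0
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:
        # Curvatures have opposite signs or one is zero: not a stable point.
        return False
    return trace * trace / det < (edge_ratio + 1.0) ** 2 / edge_ratio
```

A blob-like point has similar curvature in both directions and passes the test, while a point on a straight edge has one curvature near zero and is rejected.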
3. Orientation assignment
An orientation histogram is formed from the gradient orientations of sample points within a region around the keypoint. The orientation histogram has 36 bins covering the 360 degree range of orientations. The highest peak in the histogram corresponds to the dominant direction, and the keypoint descriptor is then represented relative to that direction, so that rotation invariance is attained.
Any other local peak that is within 80% of the highest peak is also used to create a keypoint with that orientation. So if there are multiple orientations of similar magnitude, the keypoint is duplicated at the same location and scale with different orientation assignments.
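A minimal sketch of the orientation histogram and the 80% rule could look like this. The function name and the input format are hypothetical: `samples` is a list of (weight, angle in degrees) pairs, where the weight is assumed to be the gradient magnitude already multiplied by whatever spatial weighting is used. The sketch returns bin centers only and skips the finer interpolation of peak positions.

```python
def dominant_orientations(samples, num_bins=36, peak_ratio=0.8):
    """Return the bin-center angles of all local peaks within peak_ratio of the top."""
    bin_width = 360.0 / num_bins
    histogram = [0.0] * num_bins
    for weight, angle in samples:
        index = int((angle % 360.0) / bin_width) % num_bins
        histogram[index] += weight

    highest = max(histogram)
    orientations = []
    for i, value in enumerate(histogram):
        left = histogram[(i - 1) % num_bins]    # histogram is circular
        right = histogram[(i + 1) % num_bins]
        is_local_peak = value > left and value > right
        if value > 0 and is_local_peak and value >= peak_ratio * highest:
            orientations.append((i + 0.5) * bin_width)
    return orientations

# Two strong directions and one weak one: the weak one is ignored.
print(dominant_orientations([(1.0, 45.0), (0.9, 200.0), (0.3, 300.0)]))
```

Each returned angle would give rise to its own keypoint, which is how one location can end up with several descriptors.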
4. Keypoint descriptor
A keypoint descriptor is created by first computing the gradient magnitude and orientation at each image sample point in a region around the keypoint location, as shown on the left. These are weighted by a Gaussian window, indicated by the overlaid circle. These samples are then accumulated into orientation histograms summarizing the contents over 4x4 sub-regions, as shown on the right, with the length of each arrow corresponding to the sum of the gradient magnitudes near that direction within the region.
This figure shows a 2x2 descriptor array computed from an 8x8 set of samples, whereas the experiments in this paper use 4x4 descriptors computed from a 16x16 sample array.
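To make the layout concrete, here is a simplified sketch of the descriptor for a 16x16 sample array. It assumes the gradient orientations have already been rotated relative to the keypoint orientation, uses 8 orientation bins per sub-region (the setting in the paper, giving 4 x 4 x 8 = 128 values), and takes the Gaussian window sigma as half the window width. The function and argument names are mine, and the interpolation between neighboring bins and the clipping of large values are left out.

```python
import math

def sift_like_descriptor(magnitudes, orientations, cells=4, bins=8):
    """Build a cells x cells x bins histogram vector from a square sample array."""
    size = len(magnitudes)            # e.g. 16 samples per side
    cell_size = size // cells         # e.g. 4 samples per sub-region
    center = (size - 1) / 2.0
    sigma = size / 2.0                # Gaussian window: half the window width
    bin_width = 360.0 / bins
    descriptor = [0.0] * (cells * cells * bins)

    for y in range(size):
        for x in range(size):
            distance_sq = (x - center) ** 2 + (y - center) ** 2
            weight = math.exp(-distance_sq / (2.0 * sigma * sigma))
            bin_index = int((orientations[y][x] % 360.0) / bin_width) % bins
            cell_index = (y // cell_size) * cells + (x // cell_size)
            descriptor[cell_index * bins + bin_index] += weight * magnitudes[y][x]

    # Normalize to unit length to reduce the effect of contrast changes.
    norm = math.sqrt(sum(v * v for v in descriptor))
    return [v / norm for v in descriptor] if norm > 0 else descriptor

# Toy input: every sample has magnitude 1 and points at 90 degrees.
magnitudes = [[1.0] * 16 for _ in range(16)]
orientations = [[90.0] * 16 for _ in range(16)]
descriptor = sift_like_descriptor(magnitudes, orientations)
print(len(descriptor))  # 128
```

With this toy input only the third orientation bin of each sub-region is non-zero, and the sub-regions near the center get larger values because of the Gaussian window.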
Comments:
Merits
- The keypoint descriptors are highly distinctive, which allows a single feature to find its correct match with good probability in a large database of features.
- Many earlier matching methods are based on the Harris corner detector, which is very sensitive to changes in image scale, while SIFT overcomes this thorny problem.
Defects
- When an object recognition problem involves different objects with similar local characteristics, the performance of SIFT features declines.
- All the parameter settings are based on experimental results rather than on a derivation of an optimal solution.


