Aggregating local descriptors into a compact image representation,
Hervé Jégou et al., CVPR'10
The goal of this work is to search for the most similar images in a very large image database (million scale) while jointly optimizing three constraints: search accuracy, search efficiency, and memory usage.
The evaluation shows that the search accuracy is comparable to the bag-of-features approach for an image representation that fits in 20 bytes; searching a 10 million image dataset takes about 50 ms.
The whole procedure can be summarized in three stages:
Aggregate local image descriptors into a vector representation
They propose a descriptor, derived from both BOF and the Fisher kernel, that aggregates SIFT descriptors into a compact representation, termed VLAD (vector of locally aggregated descriptors).
Similar to the BOF feature, the idea of the VLAD descriptor is to accumulate, for each visual word ci, the differences x − ci of the descriptors x assigned to ci, where ci is a centroid obtained by the k-means algorithm.
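The accumulation step above can be sketched in a few lines of numpy. This is a minimal illustration, not the paper's implementation; the function name and the final L2 normalization are my own choices (the paper also normalizes the aggregated vector):

```python
import numpy as np

def vlad(descriptors, centroids):
    """Aggregate local descriptors into a VLAD vector.

    descriptors: (n, d) array of local descriptors (e.g. SIFT)
    centroids:   (k, d) visual words ci from k-means
    Returns a flattened (k*d,) L2-normalized VLAD vector.
    """
    k, d = centroids.shape
    # assign each descriptor to its nearest visual word
    dists = np.linalg.norm(descriptors[:, None, :] - centroids[None, :, :], axis=2)
    assign = dists.argmin(axis=1)
    v = np.zeros((k, d))
    for i in range(k):
        xs = descriptors[assign == i]
        if len(xs):
            v[i] = (xs - centroids[i]).sum(axis=0)  # accumulate residuals x - ci
    v = v.ravel()
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v
```

Note that, unlike a BOF histogram of size k, the VLAD vector has dimension k × d, which is why the dimensionality-reduction stage below matters.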
Dimensionality reduction of these vectors
The feature vectors are first reduced in dimension by a PCA projection; each reduced vector is then represented by its quantized version using ADC (asymmetric distance computation, a product-quantization method referenced from the authors' prior work), in which database vectors are quantized while the query is kept unquantized.
The indexing algorithm
After the quantized codes are constructed, an inverted file is built on top of them; the combined scheme is termed IVFADC, and search is carried out over this index as in a traditional IR task.
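The inverted-file search can be sketched as below. To keep the sketch short I store raw residuals in each list instead of PQ codes, so this only illustrates the routing and residual idea, not the full IVFADC scheme; the class and parameter names (`IVF`, `nprobe`) are my own:

```python
import numpy as np

class IVF:
    """Minimal inverted-file index sketch: a coarse quantizer routes each
    database vector to a list; vectors are stored as residuals from the
    coarse centroid (real IVFADC would PQ-encode these residuals)."""

    def __init__(self, coarse_centroids):
        self.C = coarse_centroids                 # (k, d) coarse codebook
        self.lists = [[] for _ in coarse_centroids]

    def add(self, ids, X):
        a = ((X[:, None] - self.C[None]) ** 2).sum(-1).argmin(1)
        for i, x, c in zip(ids, X, a):
            self.lists[c].append((i, x - self.C[c]))  # store residual

    def search(self, q, nprobe=2, topk=5):
        # visit only the nprobe closest coarse cells, not the whole database
        cells = ((q - self.C) ** 2).sum(1).argsort()[:nprobe]
        cand = []
        for c in cells:
            for i, r in self.lists[c]:
                cand.append((((q - self.C[c] - r) ** 2).sum(), i))
        cand.sort()
        return [i for _, i in cand[:topk]]
```

Scanning only a few inverted lists instead of the whole collection is what makes the reported ~50 ms search on 10 million images plausible.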
Comments:
The VLAD feature outperforms BOF at the same representation size: the computation is more efficient, and the memory cost is lower after the PCA dimensionality reduction.
After dimensionality reduction, however, mAP drops the most for VLAD in comparison with BOF and Fisher alone.