J. Wang et al., "Semi-Supervised Hashing for Scalable Image Retrieval," CVPR, 2010
Hashing methods are used to tackle problems such as search in large-scale data and retrieval of similar content.
Hashing methods can be divided into two main categories: unsupervised methods and supervised methods. While unsupervised methods do not require any labeled data and their parameters are easy to learn, in some vision problems similarity cannot be defined by a simple metric; in other words, semantic similarity may not be preserved by the image descriptor under the chosen similarity metric. This problem can be addressed by supervised hashing methods, but supervised learning processes tend to be much slower than unsupervised ones.
This paper proposes a novel semi-supervised hashing (SSH) method.
The idea is to label a fraction of the data pairs into two categories: M (neighbor pairs, e.g., pairs sharing the same semantic label) and C (non-neighbor pairs).
The goal of SSH is to learn hash functions that minimize the error on the labeled training data, while maximally satisfying the desirable properties of hashing, e.g., independence of bits and balanced partitioning.
Basic Formulation
The formulation is as follows: given a vector w_k ∈ R^D, the kth hash function is defined as

h_k(x) = sgn(w_k^T x + b_k),

where b_k = 0 since the data X can be normalized to have zero mean. One can then get the corresponding binary bit as

y_k = (1 + h_k(x)) / 2 ∈ {0, 1}.
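As a minimal sketch of this step, the following computes the sign-based hash values and the corresponding {0, 1} bits with NumPy; all variable names are illustrative, not taken from the authors' code:

```python
import numpy as np

def hash_bits(W, X):
    """Project D-dim points (columns of X) onto K directions (columns of W)
    and take the sign: returns a K x N matrix of +/-1 hash values."""
    H = np.sign(W.T @ X)
    H[H == 0] = 1  # break ties: map sgn(0) to +1
    return H

def binary_bits(H):
    """Map +/-1 hash values to {0, 1} bits via y_k = (1 + h_k) / 2."""
    return ((1 + H) // 2).astype(int)
```

Note that the data are assumed to be zero-centered, so no bias term b_k is needed.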
They define the following objective function to measure the empirical accuracy of a family of hash functions H on the labeled data:

J(H) = Σ_k [ Σ_{(x_i, x_j) ∈ M} h_k(x_i) h_k(x_j) − Σ_{(x_i, x_j) ∈ C} h_k(x_i) h_k(x_j) ].
By defining a matrix S ∈ R^{L×L} (L is the number of labeled data points) with S_ij = 1 for (x_i, x_j) ∈ M, S_ij = −1 for (x_i, x_j) ∈ C, and S_ij = 0 otherwise,
the above formulation can be expressed in a compact form:

J(H) = (1/2) tr( sgn(W^T X_l) S sgn(W^T X_l)^T ),

where X_l is the matrix of labeled data points and W = [w_1, ..., w_K].
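A hedged sketch of this compact form: the snippet below builds the pairwise label matrix S and evaluates the trace-based empirical accuracy (names such as `build_S` and `M_pairs` are my own, not from the paper):

```python
import numpy as np

def build_S(L, M_pairs, C_pairs):
    """S_ij = +1 for neighbor pairs (M), -1 for non-neighbor pairs (C),
    0 otherwise; symmetric by construction."""
    S = np.zeros((L, L))
    for i, j in M_pairs:
        S[i, j] = S[j, i] = 1.0
    for i, j in C_pairs:
        S[i, j] = S[j, i] = -1.0
    return S

def empirical_accuracy(W, Xl, S):
    """J(H) = 1/2 tr( sgn(W^T X_l) S sgn(W^T X_l)^T )."""
    H = np.sign(W.T @ Xl)
    H[H == 0] = 1
    return 0.5 * np.trace(H @ S @ H.T)
```

Each pair in M contributes positively when both points get the same bit, and each pair in C contributes positively when the bits differ, which is exactly the per-bit agreement count expressed as a trace.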
In addition, since each hash function should separate the dataset into two balanced parts, the objective is subject to the constraint Σ_i h_k(x_i) = 0 for each bit k.
Relaxing Objective Function
The concept of relaxing the objective function is quite intuitive: replace the "same sign" criterion with the signed magnitude of the projection. Similar points are then encouraged not only to have the same sign but also large projection magnitudes, while dissimilar points are pushed to have different signs and to be projected as far apart as possible.
The new objective function can then be rewritten as a function of W:

J(W) = (1/2) tr( W^T X_l S X_l^T W ).
And since it is hard to ensure perfectly balanced partitions, they replace the hard balancing constraint with a "soft" constraint that maximizes the variance of the bits.
With some derivation, they obtain a more compact form, since the data are pre-normalized to have zero mean: the bit-variance term relaxes to the variance of the projections. The overall objective function then becomes

J(W) = (1/2) tr( W^T X_l S X_l^T W ) + (η/2) tr( W^T X X^T W ) = (1/2) tr( W^T M W ),

where M = X_l S X_l^T + η X X^T, subject to the orthogonality constraint W^T W = I.
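Under the orthogonality constraint, maximizing a trace of this form is a standard eigenvalue problem: the optimal W consists of the top-K eigenvectors of M. The sketch below illustrates this (the function name and the default η are assumptions for illustration, not the authors' code):

```python
import numpy as np

def ssh_directions(X, Xl, S, K, eta=1.0):
    """Maximize 1/2 tr(W^T M W) s.t. W^T W = I, where
    M = Xl S Xl^T + eta * X X^T, by taking the top-K eigenvectors of M."""
    M = Xl @ S @ Xl.T + eta * (X @ X.T)   # symmetric since S is symmetric
    vals, vecs = np.linalg.eigh(M)        # eigenvalues in ascending order
    return vecs[:, ::-1][:, :K]           # top-K eigenvectors as columns of W
```

With S set to zero (no labels), this reduces to picking the maximum-variance directions of the data, i.e., a PCA-like solution, which matches the role of the unsupervised regularizer.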
To summarize, the first (supervised) term is the empirical accuracy of the learned hash functions on the pairwise labeled data, while the second (unsupervised) term is a regularizer that prefers directions maximizing the variance of the projections, subject to orthogonality constraints.
Relaxing Orthogonality Constraints
The orthogonality constraints force one to progressively pick directions that have very low variance, substantially reducing the quality of the lower bits. Depending on the application, it may therefore make sense to pick a direction that is not necessarily orthogonal to the previous directions but has higher variance as well as low empirical error on the labeled set.
So instead of imposing hard orthogonality constraints, they convert them into a penalty term added to the objective function, which finally becomes

J(W) = (1/2) tr( W^T M W ) − (ρ/4) ||W^T W − I||_F^2.
This can be solved using properties of matrix operations; I will skip those details here.
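To make the penalized objective concrete, here is a hedged sketch that evaluates it and maximizes it by plain gradient ascent. The paper solves this via matrix-analytic techniques rather than iteration, so this optimizer is purely illustrative, and all names and hyperparameter values are my own assumptions:

```python
import numpy as np

def penalized_objective(W, M, rho):
    """J(W) = 1/2 tr(W^T M W) - rho/4 * ||W^T W - I||_F^2."""
    G = W.T @ W - np.eye(W.shape[1])
    return 0.5 * np.trace(W.T @ M @ W) - 0.25 * rho * np.sum(G * G)

def grad_ascent(M, K, rho=1.0, lr=1e-3, steps=500, seed=0):
    """Illustrative iterative maximizer (not the paper's solver):
    gradient is M W - rho * W (W^T W - I)."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((M.shape[0], K))
    for _ in range(steps):
        grad = M @ W - rho * W @ (W.T @ W - np.eye(K))
        W = W + lr * grad
    return W
```

The gradient of the penalty pulls W^T W toward the identity without enforcing it exactly, which is precisely the "soft orthogonality" trade-off the paper argues for.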
This work proposes a novel method that reaches good accuracy with short hash codes at small Hamming radius. Compared with state-of-the-art methods such as LSH (Locality-Sensitive Hashing), SH (Spectral Hashing), and RBMs (Restricted Boltzmann Machines), SSH achieves better performance with fewer bits, and also requires less training time than these methods, especially RBMs.
As data scales become extremely large in the future, methods like this will become very influential, and the formulation process may be reused in similar works.
Experimental results on the one-million GIST dataset are reported in the paper.