2012年4月27日 星期五

[Paper Summary Lec_08-09] -1 Nonlinear dimensionality reduction by locally linear embedding

"Nonlinear dimensionality reduction by locally linear embedding," 
Roweis & Saul, Science, 2000.


This paper aims to find out the compact representation of data in low dimensional space called manifold. ( some locally Euclidean topological space )
Below is the figure 1 cropped from the paper to give the basic understanding of the idea of manifold. Coherent structure in the world leads to strong correlations between data. As shown in the figure, similarity of neighbor points on the surface can be understand by some “low dimensional” description.



The LLE algorithm can be summarized in the following figure, which is based on simple geometric intuitions. The work characterize the local structure of these patches by linear coefficients that reconstruct each data point from its neighbors.
Where the cost function

The weights Wij summarize the contribution of the jth data point to the ith reconstruction. It can be used to keep the geometry structure of data after dimensional reduction, each high-dimensional observation Xi is mapped to a low-dimensional vector Yi representing global internal coordinates on the manifold. This is done by choosing d-dimensional coordinates Yi to minimize the embedding cost function

Subject to constraints that make the problem well-posed, it can be minimized by solving a sparse N X N eigen value problem, whose bottom d nonzero eigenvectors provide an ordered set of orthogonal coordinates centered on the origin.


At last, the projection result of some toy example was shown; we can see the similarities have been preserved actually.


Comments:

Manifold methods can preserve the spatial similarity information that traditional PCA, LDA methods can not attend. 
The inference result propose by this work is easy to be implemented, conceptually intuitive and simple.
Also, LLE scales well with the intrinsic manifold dimensionality, d, and does not require a discretized gridding of the embedding space. 
Finally, LLE is likely to be even more useful in combination with other methods in data analysis and statistical learning, leading to lots of future potential of development with every kind of works.

沒有留言:

張貼留言