Why?
Vector embeddings are often used to represent the features of data points in a high-dimensional space. Cosine similarity, which measures the cosine of the angle between two vectors, is a commonly used metric for comparing them. Given two vectors u and v, their cosine similarity is defined as:
cos(u, v) = (u · v) / (||u|| ||v||)
where · denotes the dot product of two vectors, and ||.|| denotes the L2 norm of a vector.
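The formula above can be sketched directly in NumPy (NumPy and the example vectors are assumptions for illustration, not part of the original text):

```python
import numpy as np

def cosine_similarity(u, v):
    # cos(u, v) = (u · v) / (||u|| ||v||)
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

u = np.array([1.0, 0.0, 1.0])
v = np.array([1.0, 1.0, 0.0])
print(cosine_similarity(u, v))  # → 0.5
```

Here the dot product is 1 and each norm is √2, so the similarity is 1/2.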
Cosine similarities between all pairs of data points can be collected into a matrix, which is symmetric since the similarity between two points does not depend on their order. However, this representation captures only pairwise relationships, not the overall structure of the data, and may not be optimal for clustering or visualization.
On the other hand, Laplacian eigenvectors can be used to represent the structure of a graph. Given a graph G with n vertices, the Laplacian matrix L(G) of G can be defined as:
L(G) = D(G) - A(G)
where D(G) is the degree matrix of G, which is a diagonal matrix where the ith diagonal entry is the degree of vertex i, and A(G) is the adjacency matrix of G, which is a symmetric matrix where the (i,j) entry is 1 if vertices i and j are adjacent, and 0 otherwise.
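The construction L(G) = D(G) - A(G) can be sketched in a few lines of NumPy (the path graph used here is a hypothetical example, not from the original text):

```python
import numpy as np

# Adjacency matrix of a path graph on 4 vertices: 0 - 1 - 2 - 3
A = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

D = np.diag(A.sum(axis=1))  # degree matrix: ith diagonal entry = degree of vertex i
L = D - A                   # graph Laplacian
```

Each row of L sums to zero, since the diagonal degree entry cancels the off-diagonal adjacency entries in that row.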
The Laplacian matrix has several important properties, such as being positive semidefinite and having non-negative eigenvalues. The eigenvectors of L(G) can be used to cluster the vertices of G or to visualize the structure of G in a low-dimensional space.
In particular, the eigenvector corresponding to the second smallest eigenvalue of L(G), denoted v2 and known as the Fiedler vector, is often used in spectral clustering algorithms. The entries of v2 can be used to partition the vertices of G into two clusters: vertices with positive entries belong to one cluster and vertices with negative entries to the other.
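The sign-based bipartition described above can be sketched as follows (the graph of two triangles joined by a bridge edge is a hypothetical example chosen so that the two clusters are obvious by eye):

```python
import numpy as np

# Two triangles {0,1,2} and {3,4,5} joined by the single bridge edge (2,3)
A = np.zeros((6, 6))
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

L = np.diag(A.sum(axis=1)) - A      # graph Laplacian
eigvals, eigvecs = np.linalg.eigh(L)  # eigenvalues in ascending order
v2 = eigvecs[:, 1]                  # eigenvector of the second smallest eigenvalue

cluster = v2 > 0                    # sign of each entry assigns the vertex to a cluster
```

For this graph the signs of v2 separate the two triangles, recovering the natural two-cluster structure.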
Compared to cosine similarity matrices, Laplacian eigenvectors provide a more structured and interpretable representation of the data. They can also be used to cluster the data points or to embed them in a low-dimensional space, which is useful for applications such as image or text classification.