Spectral clustering


Spectral clustering[1] is a method that uses weighted graphs to partition a network into groups of different sizes and densities. Let $ G = (V,E) $ be an undirected graph with vertex set $ V = \{v_{1}, \ldots , v_{n} \} $ and edge set $ E \subseteq \{(v_{i}, v_{j}): v_{i}, v_{j} \in V \} $. The degree of a vertex $ v_i \in V $ is defined as

$ d_{i}=\sum_{j=1}^{n} w_{ij}, $

where $ w_{ij} $ is the weight of the edge between vertices $ v_i $ and $ v_j $; note that $ w_{ij}\geq 0 $ and, because the graph is undirected, $ w_{ij}=w_{ji} $.

Different similarity graphs

For a set of data points $ x_1, \ldots ,x_n $ with $ x_i \in \mathbb{R}^2 $, let the similarity graph be represented by $ G = (V,E) $, where each vertex $ v_i \in V $ represents a data point $ x_i $. Given $ s_{ij}\geq 0 $, i.e., some notion of similarity between all pairs of data points $ x_i $ and $ x_j $, two vertices are connected if the similarity value is positive or larger than a certain threshold; the edge is then weighted by $ s_{ij} $. There are several constructions that transform a given set $ x_1, \ldots ,x_n $ of data points with pairwise similarities $ s_{ij} $ or pairwise distances $ d_{ij} $ into a graph $ G=(V,E) $. We discuss two alternative ways and give an example for each. In particular, we take a data set and classify it into different groups through spectral clustering. The results obtained for different similarity graphs with the same set of data are shown in the following Figures. These figures show the relationships among artificially generated data and allow us to observe the advantages of the different similarity graphs.

The $ \varepsilon $-neighborhood approach

Given a positive $ \varepsilon $, all points at a distance less than $ \varepsilon $ from each other are connected. Since the distances between all connected points are roughly on the same scale (they are at most $ \varepsilon $ apart), weighting the edges would not incorporate more information about the data into the graph. Hence, the $ \varepsilon $-neighborhood graph is usually an unweighted graph. In the Figure below, points within a distance less than $ \varepsilon = 0.28 $ are joined by an edge. It shows that the middle points on each moon are strongly connected, while the points at the extremes are less so. In general, it is difficult to choose a parameter $ \varepsilon $ that is robust, especially when the points are distributed at varying distances from one another in different regions of the space. With a good choice of $ \varepsilon $, the algorithm outputs well-defined clusters. As the Figure illustrates, the data is grouped into eight clusters, represented by different colors.


An $ \varepsilon $-neighborhood graph with $ \varepsilon = 0.28 $ and 8 clusters.
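A minimal sketch of this construction, assuming scikit-learn is available; the `make_moons` generator stands in for the moon-shaped data in the Figure:

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.neighbors import radius_neighbors_graph

# Two interleaved half-moons, similar in spirit to the data in the figures.
X, _ = make_moons(n_samples=200, noise=0.05, random_state=0)

# Connect every pair of points within distance epsilon of each other;
# the result is an unweighted (0/1) sparse adjacency matrix.
eps = 0.28
A = radius_neighbors_graph(X, radius=eps, mode='connectivity',
                           include_self=False)
print(A.shape, A.nnz)  # n x n matrix, twice the number of edges
```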

The $ k $-nearest neighbor approach

Given a positive integer $ k $, vertex $ v_{i} $ is connected to vertex $ v_{j} $ if $ v_{j} $ is among the $ k $-nearest neighbors of $ v_{i} $. Note that this definition leads to a directed graph, as the neighborhood relationship is not symmetric. There are two ways of making it undirected:

- One way is to simply ignore the directions of the edges, that is, we connect $ v_{i} $ and $ v_{j} $ with an undirected edge if $ v_{i} $ is among the $ k $-nearest neighbors of $ v_{j} $ or $ v_{j} $ is among the $ k $-nearest neighbors of $ v_{i} $. The resulting graph is what is usually called the $ k $-nearest neighbor graph, shown in the Figure below. Vertices $ x_i $ and $ x_j $ are connected with an undirected edge if $ x_i $ is among the 6-nearest neighbors of $ x_j $ or $ x_j $ is among the 6-nearest neighbors of $ x_i $. The advantage of this approach is that even where many points lie close together, it does not generate excessive links (as can be seen for the points in the middle of each moon in the Figure). However, the $ k $-nearest neighbor approach can break the graph into several disconnected components and thereby yield a wrong number of connected components.


A $ k $-nearest neighbor graph with $ k=6 $ and 7 clusters.


- Another way is to connect vertices $ v_{i} $ and $ v_{j} $ if both $ v_{i} $ is among the $ k $-nearest neighbors of $ v_{j} $ and $ v_{j} $ is among the $ k $-nearest neighbors of $ v_{i} $. The resulting graph is called the mutual $ k $-nearest neighbor graph (shown in the Figure below). Compared to the other approaches, the mutual $ k $-nearest neighbor graph does not connect regions of different densities, which makes it appropriate for detecting clusters of different densities.


A mutual $ k $-nearest neighbor graph with $ k=6 $ and 7 clusters.

For the last two approaches, after connecting the appropriate vertices, we weight each edge by the similarity of its endpoints. Both constructions are sketched below.
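A sketch of the two symmetrization schemes, again on stand-in two-moons data; the variable names `knn` and `mutual_knn` are ours:

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.neighbors import kneighbors_graph

X, _ = make_moons(n_samples=200, noise=0.05, random_state=0)

# Directed k-nearest-neighbor relation: A[i, j] = 1 if x_j is among
# the 6 nearest neighbors of x_i.
A = kneighbors_graph(X, n_neighbors=6, mode='connectivity',
                     include_self=False)

# Symmetrize with "or": the usual k-nearest neighbor graph.
knn = ((A + A.T) > 0).astype(float)

# Symmetrize with "and": the mutual k-nearest neighbor graph.
mutual_knn = A.multiply(A.T)
```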

The unnormalized graph Laplacian

Let $ W = (w_{ij})_{i,j=1,\ldots,n} $ be the weighted adjacency matrix of the graph, and let the degree matrix $ \mathcal{D} $ be the diagonal matrix with the degrees $ d_{1}, \ldots , d_{n} $ on the diagonal. The unnormalized graph Laplacian matrix is defined as

$ L = \mathcal{D}-W. $
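A minimal numpy sketch of this definition, with a small hand-picked weight matrix as the example input:

```python
import numpy as np

# Small symmetric weighted adjacency matrix W (w_ij = w_ji, w_ii = 0).
W = np.array([[0.0, 1.0, 0.5],
              [1.0, 0.0, 0.2],
              [0.5, 0.2, 0.0]])

# Degree d_i = sum_j w_ij, collected into the diagonal degree matrix D.
D = np.diag(W.sum(axis=1))

# Unnormalized graph Laplacian.
L = D - W

# L is symmetric positive semidefinite; its smallest eigenvalue is 0.
print(np.linalg.eigvalsh(L))
```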

Unnormalized spectral clustering algorithm

To be able to formalize spectral clustering, we need to define the similarity matrix. Given a set of data points $ x_1, \ldots ,x_n $ and some notion of similarity $ s_{ij}=s(x_i, x_j) $, there is a similarity matrix $ S = (s_{ij})_{i,j=1, \ldots ,n} $ which is assumed to be symmetric and non-negative.
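The choice of similarity function is left open here; one common choice, a Gaussian similarity with a bandwidth parameter `sigma` (both the function name and the parameter are ours), could be sketched as:

```python
import numpy as np

def gaussian_similarity(X, sigma=1.0):
    """Symmetric, non-negative s_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))
```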


Input: Similarity matrix $ S \in \mathbb{R}^{n\times n} $ and number $ m $ of clusters to construct.

- Construct a similarity graph. Let $ W $ be its weighted adjacency matrix.

- Compute the unnormalized graph Laplacian $ L = \mathcal{D} - W $.

- Compute the first $ m $ eigenvectors $ u_{1}, \ldots , u_{m} $ of $ L $, i.e., those corresponding to the $ m $ smallest eigenvalues.

- Let $ U \in \mathbb{R}^{n\times m} $ be the matrix containing the vectors $ u_{1}, \ldots , u_{m} $ as columns.

- Let $ y_{i} \in \mathbb{R}^{m} $ be the vector corresponding to the $ i $-th row of $ U $, for $ i = 1, \dots , n $.

- Cluster the points $ (y_{i})_{i=1,\ldots,n} $ in $ \mathbb{R}^{m} $ into clusters $ C_{1}, \ldots ,C_{m} $, e.g., with the $ k $-means algorithm.

Output: Clusters $ A_{1}, \ldots , A_{m} $ with $ A_{i} = \{ j \mid y_{j} \in C_{i} \} $.

The algorithm returns the partitioned network. Points in different clusters are dissimilar to each other; that is, between-cluster similarities are low and within-cluster similarities are high.
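A self-contained sketch of the whole algorithm under the definitions above, using `numpy.linalg.eigh` for the eigenvectors and scikit-learn's `KMeans` for the final clustering step:

```python
import numpy as np
from sklearn.cluster import KMeans

def unnormalized_spectral_clustering(W, m):
    """Cluster the n vertices of a graph with weighted adjacency matrix W
    into m groups, following the algorithm above."""
    # Unnormalized Laplacian L = D - W.
    D = np.diag(W.sum(axis=1))
    L = D - W

    # Eigenvectors for the m smallest eigenvalues (eigh sorts ascending).
    eigvals, eigvecs = np.linalg.eigh(L)
    U = eigvecs[:, :m]  # n x m matrix with u_1, ..., u_m as columns

    # Row i of U is the embedded point y_i in R^m; cluster with k-means.
    labels = KMeans(n_clusters=m, n_init=10, random_state=0).fit_predict(U)
    return labels
```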


Applying spectral clustering

Crime response planning by linear programming gives the optimal police locations with regard to a set of crime locations from the Denver Open Data Catalog[2]. The analysis in the project focuses on murder and robbery data, but could be extended to other crimes. The goal is to minimize the expected distance between the crime location and the police location.

The Denver Open Data Catalog includes criminal offenses in the City and County of Denver for the previous five calendar years plus the current year to date. The data is based on the National Incident-Based Reporting System (NIBRS). The data is filtered in the preprocessing stage of a Python program, removing crime locations outside of the Denver area.

In the first part of the project, we implement and compare the discrete Wasserstein barycenter models [3] using murder data. Then, we include robbery data in the study. This increases the amount of input data for 2016 from 52 to 1204 cases. To be able to include different types of crime and work with a bigger data set, we apply spectral clustering to partition the locations of crime incidents in the Denver area into smaller parts. Then, we apply the discrete barycenter models to determine the optimal police location in each part.

The main result of the project is Python code that extracts the relevant data from the Denver Open Data Catalog, computes a spectral clustering, implements the discrete barycenter models for crime data in AMPL, and provides a visualization of the results using Google Maps. The repository is available at https://github.com/nataliavillegasfranco/barycenters.


Partitioned network for murder and robbery data using spectral clustering with 4 clusters.

Spectral clustering generates a network that captures the relationships between crime locations, using a similarity matrix $ S=(s_{ij})_{i,j=1,\dots ,n} $, where $ s_{ij}=s(x_i,x_j) $.

We apply spectral clustering to the robbery and murder data from the year 2016 and obtain the partition shown in the Figure above.
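As an illustration only, a run of this kind could be sketched with scikit-learn's built-in `SpectralClustering` (which uses a normalized Laplacian internally, unlike the unnormalized algorithm above); the `coords` array below is random placeholder data, not the actual Open Data Catalog records:

```python
import numpy as np
from sklearn.cluster import SpectralClustering

# `coords` stands in for the (longitude, latitude) pairs of the filtered
# 2016 murder and robbery records; here it is random placeholder data.
rng = np.random.default_rng(0)
coords = rng.uniform(low=(-105.1, 39.6), high=(-104.8, 39.9), size=(1204, 2))

# Partition the incident locations into 4 clusters; the nearest-neighbors
# affinity builds a k-nearest neighbor similarity graph internally.
model = SpectralClustering(n_clusters=4, affinity='nearest_neighbors',
                           n_neighbors=10, random_state=0)
labels = model.fit_predict(coords)
```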


Strongly Polynomial 2-Approximation with 4 clusters. Murder and robbery data for 2016 (12 months).

The Figure above shows the 2-approximation model[4] with 4 clusters, using murder and robbery data from January to December 2016 (12 months). The blue markers represent the suggested police locations.

References

  1. U. von Luxburg. "A tutorial on spectral clustering." Statistics and Computing, 17(4):395–416, 2007.
  2. Denver Open Data Catalog: data set of crimes that occurred in Denver since 2012.
  3. E. Anderes, S. Borgwardt, and J. Miller. "Discrete Wasserstein barycenters: optimal transport for discrete data." Optimization and Control (math.OC), 10 Aug 2015.
  4. S. Borgwardt. "Strongly Polynomial 2-Approximation of Discrete Wasserstein Barycenters."