Events

DMS Graduate Student Seminar

Time: Nov 29, 2023 (03:00 PM)
Location: 108 ACLC

Details:

Speaker: Rui Shi, Auburn University 

Title: Outlier Detection with Cluster Catch Digraphs (CCDs)

 

Abstract: Outlier detection is one of the most popular research topics and also a challenging task. Many outlier detection approaches based on different techniques have been developed. We propose approaches that are based on Cluster Catch Digraphs (CCDs).
 
CCDs are a family of clustering algorithms. The CCD algorithms are graph-based, density-based, and distribution-based approaches. They construct a certain number of hyperspheres to capture latent cluster structures. There are two main versions of CCDs: KS-CCDs and RK-CCDs. The latter ones are especially appealing as they do not require parameter input.
 
We develop two outlier detection algorithms that first utilize RK-CCDs to build hyperspheres for each cluster structure. Then, by constructing a Mutual Catch Graph (MCG) from the KS-CCD, outliers can be identified among those points that are not in any of the hyperspheres. We called these two approaches the Mutual catch graph with Cluster Catch Digraph (M-CCD) algorithm and the Fast Mutual catch graph with Cluster Catch Digraph (Fast M-CCD) algorithm. However, due to some shortcomings of RK-CCDs, they perform poorly with high dimensionality. To resolve this issue, we propose a new version of CCDs with the Nearest Neighbor Distance (NND) and refer to it as NN-CCD. Subsequently, we proposed another outlier detection algorithm based on NN-CCDs, which has much better performance with high dimensionality; we call this approach the Mutual catch graph with Nearest Neighbor Cluster Catch Digraph (M-NNCCD) algorithm. We also propose ways to adapt the above algorithm to the cases when the shapes of clusters are arbitrary; we call those approaches the Mutual catch graph with Flexible Cluster Catch Digraph (M-FCCD) algorithm and the Mutual catch graph with Flexible Nearest Neighbor Cluster Catch Digraph (M-FNNCCD) algorithm. Additionally, we offer an approach to calculate the outlying-ness scores for all the CCD-based algorithms.
 
Keywords: Outlier detection, Graph-based clustering, Cluster catch digraphs, \(k\)-nearest neighborhood, Mutual catch graphs, Nearest neighbor distance, Outlying-ness score.