Events

DMS Statistics and Data Science Seminar

Time: Oct 28, 2021 (02:00 PM)
Location: Zoom

Details:

Speaker: Da Yan, University of Alabama at Birmingham

Title: Large-Scale Graph Mining: From "Think Like a Vertex" to "Think Like a Task"


Abstract: Big graph processing systems such as Google's Pregel and Apache Spark's GraphX have become increasingly popular thanks to their emphasis on ease of programming. Unfortunately, these frameworks are predominantly designed for IO-bound execution and are only suitable for graph processing problems with low time complexity. Many graph mining problems, however, such as finding dense and frequent subgraph structures, have very high time complexity, and applying IO-bound systems to them yields catastrophic performance. This problem still receives too little attention in the big data mining community, and many researchers continue to use IO-bound systems for compute-heavy graph mining problems. In this talk, we explicitly categorize popular graph mining problems into IO-heavy and CPU-heavy categories and provide prior evidence that CPU-heavy graph mining problems should not be addressed using IO-bound systems, which can perform worse than even a serial algorithm. We then introduce two recent compute-intensive solutions, one for mining dense subgraph structures and one for mining frequent subgraph patterns, that satisfactorily address the IO-bound issue of existing systems. The key design is to expose an explicit task-based divide-and-conquer API to users, in contrast to the existing iterative computation paradigms. We will also show how to develop popular graph mining algorithms in these frameworks.