FlashGraph-ng
A new frontier in large-scale graph analysis and data mining

FlashGraph is an SSD-based graph analysis framework that we designed to process graphs with billions of vertices and hundreds of billions of edges, or even larger. We have extended FlashGraph to support more data structures, such as sparse and dense matrices, so it can now support a wide variety of data mining and machine learning algorithms. We refer to the entire data analysis framework as FlashGraph-ng.
The current implementation of FlashGraph-ng has four main components: SAFS, FlashGraph, FlashMatrix, and FlashR.
The figure below shows the architecture of FlashGraph-ng. At the bottom is SAFS, which sits on top of an array of SSDs and exposes a unified asynchronous I/O interface to the data analysis frameworks. On the left is FlashGraph, which exposes a vertex-centric programming interface that lets users express a variety of graph algorithms. FlashGraph includes a library of graph algorithms written in C++, and this library is integrated with R so that R users can invoke the graph algorithms directly from R. On the right, FlashMatrix provides both in-memory and external-memory vector and matrix implementations, as well as a small set of generalized operators for performing computation on these vectors and matrices. FlashMatrix has an optimizer that optimizes a sequence of operations so that an application achieves performance comparable to a manually optimized C/C++ implementation. FlashR integrates the generalized operators of FlashMatrix into R and reimplements the existing R matrix operations with FlashMatrix. In the future, we will provide an R compiler so that R users can implement user-defined functions in R and pass them to the generalized operators to perform the actual computation.
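To illustrate what a generalized operator taking user-defined functions can look like, here is a minimal C++ sketch of a generalized inner product, assuming small dense in-memory matrices stored as nested `std::vector`s. The names `Matrix`, `gen_inner_prod`, `mat_mul`, and `min_plus` are hypothetical and do not reflect FlashMatrix's actual API. The caller supplies the "multiply" and "add" functions, so one operator covers both ordinary matrix multiplication and, for example, the (min, +) product used in shortest-path computations.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <limits>
#include <vector>

// Hypothetical dense, in-memory, row-major matrix type (illustration only).
using Matrix = std::vector<std::vector<double>>;

// Hypothetical generalized inner product:
//   C[i][j] = init, then folded with add(C[i][j], mul(A[i][k], B[k][j])) for each k.
// Assumes every row of A has B.size() columns and all rows of B have equal length.
template <typename Mul, typename Add>
Matrix gen_inner_prod(const Matrix &A, const Matrix &B, Mul mul, Add add, double init) {
    const std::size_t rows = A.size();
    const std::size_t inner = B.size();
    const std::size_t cols = B.empty() ? 0 : B[0].size();
    Matrix C(rows, std::vector<double>(cols, init));
    for (std::size_t i = 0; i < rows; ++i)
        for (std::size_t j = 0; j < cols; ++j)
            for (std::size_t k = 0; k < inner; ++k)
                C[i][j] = add(C[i][j], mul(A[i][k], B[k][j]));
    return C;
}

// Ordinary matrix multiplication: mul = *, add = +, init = 0.
inline Matrix mat_mul(const Matrix &A, const Matrix &B) {
    return gen_inner_prod(A, B, std::multiplies<double>(), std::plus<double>(), 0.0);
}

// (min, +) product: mul = +, add = min, init = +infinity.
// Squaring a weighted adjacency matrix (0 on the diagonal, +infinity for
// missing edges) gives shortest-path distances over paths of at most two edges.
inline Matrix min_plus(const Matrix &A, const Matrix &B) {
    const double inf = std::numeric_limits<double>::infinity();
    return gen_inner_prod(A, B, std::plus<double>(),
                          [](double x, double y) { return std::min(x, y); }, inf);
}
```

For example, `mat_mul` of `{{1, 2}, {3, 4}}` and `{{5, 6}, {7, 8}}` yields `{{19, 22}, {43, 50}}`, while `min_plus(W, W)` on a weighted adjacency matrix `W` relaxes every pair of vertices through one intermediate vertex. This sketch ignores what makes the real setting hard, such as streaming external-memory matrix partitions from SSDs and fusing a sequence of such operators as the FlashMatrix optimizer does.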