FlashGraph-ng
A new frontier in large-scale graph analysis and data mining

FlashGraph is an SSD-based graph analysis framework that we designed to process graphs with billions of vertices and hundreds of billions of edges, or even larger. We have extended FlashGraph to support more data structures, such as sparse and dense matrices, so FlashGraph can now support a wide variety of data mining and machine learning algorithms. With FlashGraph-ng, we address the entire data analysis workflow.

The current implementation of FlashGraph-ng has four main components: SAFS, FlashGraph, FlashMatrix, and FlashR.

The figure below shows the architecture of FlashGraph-ng. At the bottom is SAFS, which sits on top of an array of SSDs and exposes a unified asynchronous I/O interface to the data analysis frameworks above it.

On the left is FlashGraph, which exposes a vertex-centric programming interface that lets users express a variety of graph algorithms. FlashGraph includes a library of graph algorithms written in C++, and this library is integrated with R so that R users can invoke the graph algorithms directly from R.

On the right, FlashMatrix provides both in-memory and external-memory implementations of vectors and matrices, along with a small set of generalized operators for computing on them. FlashMatrix also has an optimizer that optimizes a sequence of operations so that an application achieves performance comparable to a manually optimized C/C++ implementation. FlashR integrates the generalized operators of FlashMatrix into R and reimplements the existing R matrix operations with FlashMatrix. In the future, we will provide an R compiler so that R users can implement user-defined functions in R and pass them to the generalized operators to perform the actual computation.
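To make the idea of a vertex-centric programming interface concrete, here is a minimal in-memory sketch of breadth-first search written in that style: the user supplies a small per-vertex program that sees only its own state and edge list, and an engine repeatedly runs the active vertices and activates the neighbors they request. This is not FlashGraph's API. The names `graph`, `bfs_vertex`, `run`, and `run_bfs` are hypothetical, and the real framework reads adjacency lists from SSDs through SAFS rather than from a `std::vector`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, in-memory illustration of the vertex-centric model.
// None of these names are FlashGraph's API; FlashGraph itself fetches
// adjacency lists from SSDs through SAFS rather than from a std::vector.

using vertex_id = std::uint32_t;

struct graph {
    std::vector<std::vector<vertex_id>> out_edges;  // out_edges[v] = out-neighbors of v
};

// User-defined per-vertex program for breadth-first search. A vertex sees only
// its own state and its own edge list; it communicates by activating neighbors.
struct bfs_vertex {
    int level = -1;  // BFS level, -1 while unreached

    void run(int iter, const std::vector<vertex_id> &neighbors,
             std::vector<vertex_id> &activate) {
        if (level >= 0) return;  // reached in an earlier iteration: nothing to do
        level = iter;
        for (vertex_id n : neighbors) activate.push_back(n);
    }
};

// Minimal iterative engine: run every active vertex, then activate the
// (deduplicated) vertices they requested, until no vertex is active.
std::vector<int> run_bfs(const graph &g, vertex_id start) {
    const std::size_t n = g.out_edges.size();
    assert(start < n);
    std::vector<bfs_vertex> state(n);
    std::vector<vertex_id> active{start};
    for (int iter = 0; !active.empty(); ++iter) {
        std::vector<vertex_id> requested;
        for (vertex_id v : active) state[v].run(iter, g.out_edges[v], requested);
        // Each vertex runs at most once per iteration.
        std::vector<bool> seen(n, false);
        active.clear();
        for (vertex_id v : requested)
            if (!seen[v]) { seen[v] = true; active.push_back(v); }
    }
    std::vector<int> levels;
    levels.reserve(n);
    for (const bfs_vertex &s : state) levels.push_back(s.level);
    return levels;
}
```

For a four-vertex graph with edges 0→1, 0→2, 1→3, and 2→3, `run_bfs(g, 0)` returns the levels {0, 1, 1, 2}. Keeping the user code limited to one vertex at a time is what allows an engine to decide when and in what order to fetch each vertex's edge list, which matters when those lists live on SSDs.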

Architecture
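The description above does not list FlashMatrix's generalized operators or their signatures, so the following is only an illustrative sketch of what a generalized operator on matrices can look like: an inner product parameterized by two user-supplied functions, one that combines element pairs and one that accumulates the results. With multiplication and addition it is ordinary matrix multiplication; with addition and minimum it computes shortest paths. The names `dense_matrix`, `inner_prod`, and `min_plus` are hypothetical, and the sketch is in-memory only, with no external-memory storage and no optimizer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <limits>
#include <vector>

// Hypothetical dense, row-major, in-memory matrix; not FlashMatrix's layout.
struct dense_matrix {
    std::size_t nrow, ncol;
    std::vector<double> data;
    dense_matrix(std::size_t r, std::size_t c, double init = 0.0)
        : nrow(r), ncol(c), data(r * c, init) {}
    double &operator()(std::size_t i, std::size_t j) { return data[i * ncol + j]; }
    double operator()(std::size_t i, std::size_t j) const { return data[i * ncol + j]; }
};

// Generalized inner product: C(i, j) folds combine over k of mul(A(i, k), B(k, j)),
// starting from `init`. mul = *, combine = +, init = 0 gives ordinary
// matrix multiplication; other operator pairs give other algorithms.
template <typename Mul, typename Combine>
dense_matrix inner_prod(const dense_matrix &A, const dense_matrix &B,
                        Mul mul, Combine combine, double init) {
    assert(A.ncol == B.nrow && "inner dimensions must agree");
    dense_matrix C(A.nrow, B.ncol, init);
    // i-k-j loop order walks rows of B and C contiguously.
    for (std::size_t i = 0; i < A.nrow; ++i)
        for (std::size_t k = 0; k < A.ncol; ++k)
            for (std::size_t j = 0; j < B.ncol; ++j)
                C(i, j) = combine(C(i, j), mul(A(i, k), B(k, j)));
    return C;
}

// Example instantiation with (min, +): if D holds edge weights, +infinity
// for missing edges and 0 on the diagonal, min_plus(D, D) holds the
// shortest-path distances that use at most two edges.
inline dense_matrix min_plus(const dense_matrix &A, const dense_matrix &B) {
    return inner_prod(A, B, std::plus<double>(),
                      [](double x, double y) { return std::min(x, y); },
                      std::numeric_limits<double>::infinity());
}
```

For example, if `D` encodes edges 0→1 (weight 1), 1→2 (weight 2), and 0→2 (weight 5), then `min_plus(D, D)` gives a distance of 3 from vertex 0 to vertex 2, through vertex 1. Because the operator only fixes the access pattern and leaves the arithmetic to the caller, a single implementation can serve many algorithms.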