FlashGraph-ng
A new frontier in large-scale graph analysis and data mining
 All Classes Namespaces Functions Variables Typedefs Enumerations Enumerator Pages
Classes | Typedefs | Enumerations | Functions | Variables
fg Namespace Reference

Compute kmeans on matrix of features. More...

Classes

class  FG_vector
 FlashGraph vector that provides several parallelized methods when compared to an STL-vector.
NOTE: Not an STL-compatible data structure. This vector is also ideally used with numeric data types.
Methods marked with the keyword parallel are parallelized implementations. More...
 
class  FG_graph
 A user-friendly wrapper for FlashGraph's raw graph type. Very usefule when when utilizing FlashGraph pre-written/library algorithms. More...
 
class  graph_config
 
class  compute_vertex
 Class from which users' vertex-centric programs should inherit. Serial code written when implementing run*. methods here will be run in parallel within the graph engine. More...
 
class  compute_directed_vertex
 A directed version of the compute_vertex class that users inherit from when using the FlashGraph engine. More...
 
class  vertex_scheduler
 
class  vertex_filter
 When the graph engine starts, a user can use this filter to decide what vertices are activated for the first time. More...
 
class  vertex_initializer
 A user may be decide to initialize individual vertex state in a custom way not expressible via the vertex constructor. This provides that capability. More...
 
class  vertex_query
 Parallized query of the vertex state of all vertices in the graph. Each worker thread gets an instance of the query and the per-thread query results will be merged in the end. Inherit from this class to run queries in parallel. More...
 
class  graph_engine
 This is the class that coordinates how & where algorithms are run. It can be seen as the central organ of FlashGraph. More...
 
class  graph_header
 
class  graph_index
 This file contains a set of graph index implementation. The graph index is the in-memory container to keep all vertices of a graph.
More...
 
class  vertex_message
 
struct  local_vid_t
 
class  ts_edge_data
 
class  edge_count
 
class  page_vertex
 Vertex representation when in the page cache. More...
 
class  TS_page_vertex
 
class  page_directed_vertex
 
class  page_undirected_vertex
 This vertex class represents an undirected vertex in the page cache. More...
 
class  vertex_program
 
class  vertex_program_creater
 Extend/Override when defining a custom vertex program. The graph engine uses this to construct vertex programs for each worker thread. More...
 
class  vertex_program_impl
 The default implementation of a vertex program in the graph engine. More...
 
class  ts_vertex_request
 

Typedefs

typedef unsigned int vsize_t
 Basic data types used in FlashGraph.
 
typedef std::pair< int, struct
local_vid_t
vertex_loc_t
 

Enumerations

enum  directed_triangle_type
 Triangle computation type. More...
 
enum  edge_type { , IN_EDGE, OUT_EDGE, BOTH_EDGES, NUM_TYPES }
 Edge type of an edge in the graph. More...
 

Functions

template<class T , class ApplyFunc >
void multi_vec_apply (const std::vector< typename FG_vector< T >::ptr > &inputs, typename FG_vector< T >::ptr output, ApplyFunc apply)
 Apply a user defined function to multipl FG_vectors. parallel More...
 
FG_vector< vertex_id_t >::ptr compute_wcc (FG_graph::ptr fg)
 Compute all weakly connectected components of a graph. More...
 
FG_vector< vertex_id_t >::ptr compute_sync_wcc (FG_graph::ptr fg)
 Compute all weakly connectected components of a graph synchronously. The reason of having this implementation is to understand the performance of synchronous wcc and asynchronous wcc. More...
 
FG_vector< vertex_id_t >::ptr compute_ts_wcc (FG_graph::ptr fg, time_t start_time, time_t time_interval)
 Compute all weakly connectected components of a time-series graph in a specified time interval. More...
 
FG_vector< vertex_id_t >::ptr compute_scc (FG_graph::ptr fg)
 Compute all strongly connected components of a graph. More...
 
FG_vector< size_t >::ptr compute_directed_triangles (FG_graph::ptr fg, directed_triangle_type type)
 Compute the directed triangle count for each each vertex. Currently, it only counts the number of cycle triangles. More...
 
FG_vector< size_t >::ptr compute_undirected_triangles (FG_graph::ptr fg)
 Compute undirected triangle count for each vertex. More...
 
FG_vector< size_t >::ptr compute_local_scan (FG_graph::ptr)
 Compute the per-vertex local Scan Statistic. More...
 
FG_vector< std::pair
< vertex_id_t, size_t > >::ptr 
compute_topK_scan (FG_graph::ptr, size_t topK)
 Obtain the top K vertices with the largest local Scan Statistic value. More...
 
size_t estimate_diameter (FG_graph::ptr fg, int num_bfs, bool directed)
 Compute the diameter estimation for a graph. More...
 
FG_vector< float >::ptr compute_pagerank (FG_graph::ptr fg, int num_iters, float damping_factor)
 Compute the PageRank of a graph using the pull method where vertices request the data from all their neighbors each iteration. Tends to converge to stable values. More...
 
FG_vector< float >::ptr compute_pagerank2 (FG_graph::ptr, int num_iters, float damping_factor)
 Compute the PageRank of a graph using the push method where vertices send deltas of their PageRank to neighbors in the event their own PageRank changes. More...
 
in_mem_subgraph::ptr fetch_subgraph (FG_graph::ptr graph, const std::vector< vertex_id_t > &vertices)
 Fetch the clusters with the wanted cluster IDs. More...
 
FG_vector< size_t >::ptr compute_kcore (FG_graph::ptr fg, size_t k, size_t kmax=0)
 Compute the k-core/coreness of a graph. The algorithm will determine which vertices are between core k and kmax – all other vertices will be assigned to core 0. More...
 
FG_vector< vsize_t >::ptr get_degree (FG_graph::ptr fg, edge_type type)
 Get the degree of all vertices in the graph. More...
 
FG_vector< float >::ptr compute_transitivity (FG_graph::ptr fg)
 Compute the transitivity/clustering coefficient of a graph. More...
 
FG_vector< float >::ptr compute_betweenness_centrality (FG_graph::ptr fg, const std::vector< vertex_id_t > &vids)
 Compute the betweeenness centrality of a graph. More...
 
FG_vector< vsize_t >::ptr get_ts_degree (FG_graph::ptr fg, edge_type type, time_t start_time, time_t time_interval)
 Get the degree of all vertices in a specified time interval in a time-series graph. More...
 
std::pair< time_t, time_t > get_time_range (FG_graph::ptr fg)
 Get the time range in which the time-series graph is. More...
 
void compute_overlap (FG_graph::ptr fg, const std::vector< vertex_id_t > &vids, std::vector< std::vector< double > > &overlap_matrix)
 Get the neighborhood overlap of each pair of vertices in `vids'. More...
 
void compute_louvain (FG_graph::ptr fg, const uint32_t levels)
 Compute louvain clustering for a graph. More...
 
sem_kmeans_ret::ptr compute_sem_kmeans (FG_graph::ptr fg, const size_t k, const std::string init, const unsigned max_iters, const double tolerance)
 Compute Semi-External Memory kmeans. More...
 

Variables

const vertex_id_t MAX_VERTEX_ID = UINT_MAX
 

Detailed Description

Compute kmeans on matrix of features.

Parameters
matrixThe matrix who's row IDs are being clustered.
clustersThe cluster centers (means).
cluster_assignmentsWhich cluster each sample falls into.
cluster_assignment_countsHow many members each cluster has.
num_rowsThe number of rows in matrix.
nevThe number of eigenvalues / number of columns in matrix.
kThe number of clusters required.
max_itersThe maximum number of iterations of K-means to perform.
initThe type of initilization ["random", "forgy", "kmeanspp"]

Copyright 2014 Open Connectome Project (http://openconnecto.me) Written by Da Zheng (zheng.nosp@m.da19.nosp@m.36@gm.nosp@m.ail..nosp@m.com)

This file is part of FlashGraph.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Typedef Documentation

typedef std::pair<int, struct local_vid_t> fg::vertex_loc_t

The vertex location in a partition. first: the partition ID. second: the location in the partition.

Enumeration Type Documentation

Triangle computation type.

  • CYCLE triangles are defined for directed graphs and depend on the direction of each edge. All edges must be head to tail connections. E.g A --—> B ^ / | / | v C
  • ALL triangles. Edge direction is disregarded. E.g A --— B | / | / | / C

Edge type of an edge in the graph.

Enumerator
IN_EDGE 

No edge

OUT_EDGE 

In edges

BOTH_EDGES 

Out edges

NUM_TYPES 

Both in and out edges

Function Documentation

FG_vector<float>::ptr fg::compute_betweenness_centrality ( FG_graph::ptr  fg,
const std::vector< vertex_id_t > &  vids 
)

Compute the betweeenness centrality of a graph.

Parameters
fgThe FlashGraph graph object for which you want to compute.
vidsThe vertex IDs for which BC should be computed
Returns
A vector with an entry for each vertex in the graph's betweennesss centrality value.
FG_vector<size_t>::ptr fg::compute_directed_triangles ( FG_graph::ptr  fg,
directed_triangle_type  type 
)

Compute the directed triangle count for each each vertex. Currently, it only counts the number of cycle triangles.

Parameters
fgThe FlashGraph graph object for which you want to compute.
typeThe type of triangles you wish to count.
Returns
A vector that contains the number of triangles associated with each vertex in the graph.
FG_vector<size_t>::ptr fg::compute_kcore ( FG_graph::ptr  fg,
size_t  k,
size_t  kmax = 0 
)

Compute the k-core/coreness of a graph. The algorithm will determine which vertices are between core k and kmax – all other vertices will be assigned to core 0.

Parameters
fgThe FlashGraph graph object for which you want to compute.
kThe core value to be computed.
kmax(Optional) The kmax value. If omitted then all cores are computed i.e., coreness. This is not recommended for very large graphs.
Returns
An FG_vector containing the core of each vertex between k and kmax. All other vertices are assigned to core 0.
FG_vector<size_t>::ptr fg::compute_local_scan ( FG_graph::ptr  )

Compute the per-vertex local Scan Statistic.

Parameters
fgThe FlashGraph graph object for which you want to compute.
Returns
A vector with an entry for each vertex in the graph's local scan value.
void fg::compute_louvain ( FG_graph::ptr  fg,
const uint32_t  levels 
)

Compute louvain clustering for a graph.

Parameters
fgThe FlashGraph graph object for which you want to compute.
levelsThe number of levels of the hierarchy to do.
void fg::compute_overlap ( FG_graph::ptr  fg,
const std::vector< vertex_id_t > &  vids,
std::vector< std::vector< double > > &  overlap_matrix 
)

Get the neighborhood overlap of each pair of vertices in `vids'.

Parameters
fgThe FlashGraph graph object for which you want to compute.
vidsThe vertices whose neighborhood overlap is computed.
overlap_matrixA dense matrix that stores the overlap of each pair of vertices.
FG_vector<float>::ptr fg::compute_pagerank ( FG_graph::ptr  fg,
int  num_iters,
float  damping_factor 
)

Compute the PageRank of a graph using the pull method where vertices request the data from all their neighbors each iteration. Tends to converge to stable values.

Parameters
fgThe FlashGraph graph object for which you want to compute.
num_itersThe maximum number of iterations for PageRank.
damping_factorThe damping factor. Originally .85.
Returns
A vector with an entry for each vertex in the graph's PageRank value.
FG_vector<float>::ptr fg::compute_pagerank2 ( FG_graph::ptr  ,
int  num_iters,
float  damping_factor 
)

Compute the PageRank of a graph using the push method where vertices send deltas of their PageRank to neighbors in the event their own PageRank changes.

Parameters
fgThe FlashGraph graph object for which you want to compute.
num_itersThe maximum number of iterations for PageRank.
damping_factorThe damping factor. Originally .85.
Returns
A vector with an entry for each vertex in the graph's PageRank value.
FG_vector<vertex_id_t>::ptr fg::compute_scc ( FG_graph::ptr  fg)

Compute all strongly connected components of a graph.

Parameters
fgThe FlashGraph graph object for which you want to compute.
Returns
A vector with a component ID for each vertex in the graph.
sem_kmeans_ret::ptr fg::compute_sem_kmeans ( FG_graph::ptr  fg,
const size_t  k,
const std::string  init,
const unsigned  max_iters,
const double  tolerance 
)

Compute Semi-External Memory kmeans.

Parameters
fgThe FlashGraph graph object for which you want to compute.
kThe number of clusters.
initInitialization type [random, forgy, kmeanspp].
max_itersThe max number of iterations to compute for.
toleranceThe min fraction of changes from 1 iter to next required to converge.
FG_vector<vertex_id_t>::ptr fg::compute_sync_wcc ( FG_graph::ptr  fg)

Compute all weakly connectected components of a graph synchronously. The reason of having this implementation is to understand the performance of synchronous wcc and asynchronous wcc.

Parameters
fgThe FlashGraph graph object for which you want to compute.
Returns
A vector with a component ID for each vertex in the graph.
FG_vector<std::pair<vertex_id_t, size_t> >::ptr fg::compute_topK_scan ( FG_graph::ptr  ,
size_t  topK 
)

Obtain the top K vertices with the largest local Scan Statistic value.

Parameters
fgThe FlashGraph graph object for which you want to compute.
topKThe value for K used for the top K vertices.
Returns
A vector of std::pair with an entry for each vertex in the top K together with its value.
FG_vector< float >::ptr fg::compute_transitivity ( FG_graph::ptr  fg)

Compute the transitivity/clustering coefficient of a graph.

Compute transitivity of all vertices in the graph.

Parameters
fgThe FlashGraph graph object for which you want to compute.
Returns
A vector with an entry for each vertex in the graph's transitivity value.
Parameters
fgThe FlashGraph graph object for which you want to compute.
Returns
A vector with an transitivity value for each vertex.
FG_vector<vertex_id_t>::ptr fg::compute_ts_wcc ( FG_graph::ptr  fg,
time_t  start_time,
time_t  time_interval 
)

Compute all weakly connectected components of a time-series graph in a specified time interval.

Parameters
fgThe FlashGraph graph object for which you want to compute.
start_timeThe start time of the time interval.
time_intervalThe length of the time interval.
Returns
A vector with a component ID for each vertex in the graph.
FG_vector<size_t>::ptr fg::compute_undirected_triangles ( FG_graph::ptr  fg)

Compute undirected triangle count for each vertex.

Parameters
fgThe FlashGraph graph object for which you want to compute.
Returns
A vector that contains the number of triangles associated with each vertex in the graph.
FG_vector<vertex_id_t>::ptr fg::compute_wcc ( FG_graph::ptr  fg)

Compute all weakly connectected components of a graph.

Parameters
fgThe FlashGraph graph object for which you want to compute.
Returns
A vector with a component ID for each vertex in the graph.
size_t fg::estimate_diameter ( FG_graph::ptr  fg,
int  num_bfs,
bool  directed 
)

Compute the diameter estimation for a graph.

Parameters
fgThe FlashGraph graph object for which you want to compute.
Returns
The diameter estimate value.
in_mem_subgraph::ptr fg::fetch_subgraph ( FG_graph::ptr  graph,
const std::vector< vertex_id_t > &  vertices 
)

Fetch the clusters with the wanted cluster IDs.

Parameters
fgThe FlashGraph graph object for which you want to compute.
verticesThe vertices that the induced subgraph has.
Returns
A subgraph.
FG_vector<vsize_t>::ptr fg::get_degree ( FG_graph::ptr  fg,
edge_type  type 
)

Get the degree of all vertices in the graph.

Parameters
fgThe FlashGraph graph object for which you want to compute.
typeThe edge type: IN_EDGE, OUT_EDGE, BOTH_EDGES.
Returns
A vector with an entry for each vertex degree.
std::pair<time_t, time_t> fg::get_time_range ( FG_graph::ptr  fg)

Get the time range in which the time-series graph is.

Parameters
fgThe FlashGraph graph object for which you want to compute.
Returns
A pair of timestamp that defines the time range of the time-series graph.
FG_vector<vsize_t>::ptr fg::get_ts_degree ( FG_graph::ptr  fg,
edge_type  type,
time_t  start_time,
time_t  time_interval 
)

Get the degree of all vertices in a specified time interval in a time-series graph.

Parameters
fgThe FlashGraph graph object for which you want to compute.
typeThe edge type: IN_EDGE, OUT_EDGE, BOTH_EDGES.
start_timeThe start time of the time interval.
time_intervallength of the time interval.
Returns
A vector with an entry for each vertex degree.
template<class T , class ApplyFunc >
void fg::multi_vec_apply ( const std::vector< typename FG_vector< T >::ptr > &  inputs,
typename FG_vector< T >::ptr  output,
ApplyFunc  apply 
)

Apply a user defined function to multipl FG_vectors. parallel

Parameters
inputsA vector of FG_vectors that are the inputs.
outputA FG_vector that are the outputs.
applyThe user-defined function that will be applied to all vecotors.

Variable Documentation

const vertex_id_t fg::MAX_VERTEX_ID = UINT_MAX

Used to represent vertex IDs in graph