Spark GraphX
Author: Internet
Concept
GraphX is Apache Spark’s API for graphs and graph-parallel computation.
At a high level, GraphX extends the Spark RDD by introducing a new Graph abstraction: a directed multigraph with properties attached to each vertex and edge. To support graph computation, GraphX exposes a set of fundamental operators (e.g., subgraph, joinVertices, and aggregateMessages) as well as an optimized variant of the Pregel API. In addition, GraphX includes a growing collection of graph algorithms and builders to simplify graph analytics tasks.
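To make the Graph abstraction and the three operators named above more concrete, here is a minimal sketch. It assumes Spark with the GraphX module on the classpath and a local master; the object name PropertyGraphSketch, its helper methods, and the sample users are hypothetical and only serve the illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD

// Hypothetical example object; names and data are illustrative only.
object PropertyGraphSketch {

  // Directed multigraph: vertex property = user name, edge property = relationship label.
  def buildGraph(sc: SparkContext): Graph[String, String] = {
    val vertices: RDD[(VertexId, String)] = sc.parallelize(Seq(
      (1L, "alice"), (2L, "bob"), (3L, "carol")))
    val edges: RDD[Edge[String]] = sc.parallelize(Seq(
      Edge(1L, 2L, "follows"),
      Edge(1L, 2L, "emails"), // parallel edge between the same pair of vertices
      Edge(2L, 3L, "follows"),
      Edge(3L, 1L, "follows")))
    Graph(vertices, edges)
  }

  // aggregateMessages: every edge sends 1 to its destination vertex and the
  // messages are summed per vertex, which yields the in-degree.
  def inDegrees(graph: Graph[String, String]): Map[VertexId, Int] =
    graph.aggregateMessages[Int](ctx => ctx.sendToDst(1), _ + _).collect().toMap

  // subgraph: keep all vertices, but only the edges labelled "follows".
  def followsOnly(graph: Graph[String, String]): Graph[String, String] =
    graph.subgraph(epred = triplet => triplet.attr == "follows")

  // joinVertices: fold the computed in-degree back into each vertex property.
  def labelWithInDegree(graph: Graph[String, String]): Graph[String, String] = {
    val degrees = graph.aggregateMessages[Int](ctx => ctx.sendToDst(1), _ + _)
    graph.joinVertices(degrees)((_, name, degree) => s"$name:$degree")
  }

  def main(args: Array[String]): Unit = {
    // Local master is an assumption for the sketch; a real job would get it from spark-submit.
    val conf = new SparkConf().setMaster("local[*]").setAppName("graphx-property-graph-sketch")
    val sc = new SparkContext(conf)
    try {
      val graph = buildGraph(sc)
      println(s"in-degrees: ${inDegrees(graph)}")
      println(s"'follows' edges: ${followsOnly(graph).edges.count()}")
      labelWithInDegree(graph).vertices.collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Because the graph is a multigraph, the two edges from vertex 1 to vertex 2 are both kept, so vertex 2 ends up with an in-degree of 2.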
GraphX is a layer on top of Spark that provides a graph data structure composed of Spark RDDs, along with an API for operating on that structure. GraphX ships with the standard Spark distribution, and you use it through a combination of the GraphX-specific API and the regular Spark API.
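The following sketch illustrates what "composed of Spark RDDs" can look like in practice: the vertices, edges, and triplets of a graph are exposed as RDDs, so ordinary RDD operations such as filter, map, and sum apply to them. It assumes Spark with GraphX on the classpath and a local master; the object name GraphAsRddsSketch, its helpers, and the age/weight data are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD

// Hypothetical example object; names and data are illustrative only.
object GraphAsRddsSketch {

  // Vertex property = age, edge property = weight.
  def buildGraph(sc: SparkContext): Graph[Int, Double] = {
    val vertices: RDD[(VertexId, Int)] = sc.parallelize(Seq(
      (1L, 34), (2L, 15), (3L, 52)))
    val edges: RDD[Edge[Double]] = sc.parallelize(Seq(
      Edge(1L, 2L, 0.5), Edge(2L, 3L, 1.5), Edge(1L, 3L, 2.0)))
    Graph(vertices, edges)
  }

  // graph.vertices is an RDD of (VertexId, property) pairs: plain filter/map work on it.
  def adultIds(graph: Graph[Int, Double]): Seq[VertexId] =
    graph.vertices
      .filter { case (_, age) => age >= 18 }
      .map { case (id, _) => id }
      .collect().sorted.toSeq

  // graph.edges is an RDD of Edge objects: map to the weight, then use the regular RDD sum.
  def totalWeight(graph: Graph[Int, Double]): Double =
    graph.edges.map(_.attr).sum()

  // graph.triplets (GraphX-specific view: edge plus both endpoint properties)
  // combined with regular RDD filter/map: edges from an adult to a minor.
  def adultToMinorEdges(graph: Graph[Int, Double]): Seq[(VertexId, VertexId)] =
    graph.triplets
      .filter(t => t.srcAttr >= 18 && t.dstAttr < 18)
      .map(t => (t.srcId, t.dstId))
      .collect().toSeq

  def main(args: Array[String]): Unit = {
    // Local master is an assumption for the sketch.
    val conf = new SparkConf().setMaster("local[*]").setAppName("graphx-rdd-views-sketch")
    val sc = new SparkContext(conf)
    try {
      val graph = buildGraph(sc)
      println(s"adults: ${adultIds(graph)}")
      println(s"total edge weight: ${totalWeight(graph)}")
      println(s"adult -> minor edges: ${adultToMinorEdges(graph)}")
    } finally {
      sc.stop()
    }
  }
}
```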
GraphX is not a database. Instead, it’s a graph processing system, which is useful, for example, for fielding web service queries or performing one-off, long-running standalone computations. Because GraphX isn’t a database, it doesn’t handle updates and deletes the way graph databases such as Neo4j and Titan do.
Apache Giraph is another example of a graph processing system, but Giraph is tied to Hadoop MapReduce, which is comparatively slow.
GraphX, Giraph, and GraphLab are all separate implementations of the ideas expressed in the Google Pregel paper. Such graph processing systems are optimized for running algorithms on the entire graph in a massively parallel manner, as opposed to working with small pieces of a graph, as graph databases do.
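As an illustration of a whole-graph, Pregel-style computation, the sketch below computes single-source shortest paths with the pregel operator of GraphX: every vertex keeps its best known distance, and in each superstep edges send improved distances to their destinations until no vertex changes. It assumes Spark with GraphX on the classpath and a local master; the object name PregelShortestPathsSketch, its helpers, and the small weighted graph are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD

// Hypothetical example object; names and data are illustrative only.
object PregelShortestPathsSketch {

  // Edge property = edge length. Vertices are created from the edges with a dummy value.
  def buildGraph(sc: SparkContext): Graph[Double, Double] = {
    val edges: RDD[Edge[Double]] = sc.parallelize(Seq(
      Edge(1L, 2L, 1.0), Edge(2L, 3L, 2.0), Edge(1L, 3L, 5.0), Edge(3L, 4L, 1.0)))
    Graph.fromEdges(edges, 0.0)
  }

  def shortestPaths(graph: Graph[Double, Double], sourceId: VertexId): Map[VertexId, Double] = {
    // Distance 0 at the source, infinity everywhere else.
    val initial = graph.mapVertices((id, _) =>
      if (id == sourceId) 0.0 else Double.PositiveInfinity)

    val result = initial.pregel(Double.PositiveInfinity)(
      // Vertex program: keep the smaller of the current and the incoming distance.
      (_, dist, newDist) => math.min(dist, newDist),
      // Send message: propose a distance to the destination only if it is an improvement.
      triplet => {
        val candidate = triplet.srcAttr + triplet.attr
        if (candidate < triplet.dstAttr) Iterator((triplet.dstId, candidate))
        else Iterator.empty
      },
      // Merge messages: several proposals for one vertex collapse to the minimum.
      (a, b) => math.min(a, b))

    result.vertices.collect().toMap
  }

  def main(args: Array[String]): Unit = {
    // Local master is an assumption for the sketch.
    val conf = new SparkConf().setMaster("local[*]").setAppName("graphx-pregel-sketch")
    val sc = new SparkContext(conf)
    try {
      val distances = shortestPaths(buildGraph(sc), sourceId = 1L)
      distances.toSeq.sortBy(_._1).foreach { case (id, d) => println(s"$id -> $d") }
    } finally {
      sc.stop()
    }
  }
}
```

In this sample graph the direct edge from 1 to 3 has length 5.0, but the path through vertex 2 has length 3.0, so the computation settles on 3.0 for vertex 3 and 4.0 for vertex 4.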
To draw a comparison to the world of standard relational databases, graph databases like Neo4j are like OLTP (Online Transaction Processing), whereas graph processing systems like GraphX are like OLAP (Online Analytical Processing).
Source: https://blog.csdn.net/zhixingheyi_tian/article/details/120327503