Large graph mining systems

From WEGA

Mining large graphs can be done with custom software developed for each task. However, a number of large graph mining systems (Russian: Системы обработки больших графов) now exist, with more under development, that aim to make the process easier. These systems abstract away routine details and provide a higher-level interface for expressing algorithms that run on a graph. Three relevant properties of such systems are as follows.

Batch or online systems. A batch system must process the entire graph for any task, whereas an online system provides access to arbitrary regions of the graph more quickly.

Systems with adjacency or edge list. A system that allows adjacency access enables us to get all neighbors of a given node. A system that allows edge list access only gives us a set of edges.
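The difference between the two access modes can be illustrated with a minimal sketch. The toy graph, the function names, and the choice to treat edges as undirected are all assumptions made for illustration; they do not correspond to the API of any particular system.

```python
from collections import defaultdict

# Hypothetical toy graph, stored as an edge list of (u, v) pairs.
# Edges are treated as undirected in this sketch.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]


def neighbors_from_edge_list(edge_list, node):
    """Edge list access: finding neighbors requires scanning every edge."""
    found = set()
    for u, v in edge_list:
        if u == node:
            found.add(v)
        elif v == node:
            found.add(u)
    return found


def build_adjacency(edge_list):
    """Adjacency access: index the edges once so neighbors are a direct lookup."""
    adjacency = defaultdict(set)
    for u, v in edge_list:
        adjacency[u].add(v)
        adjacency[v].add(u)
    return adjacency


adjacency = build_adjacency(edges)
print(sorted(adjacency["c"]))                        # direct lookup
print(sorted(neighbors_from_edge_list(edges, "c")))  # full scan, same answer
```

Both calls return the same neighbor set. The edge list version does work proportional to the number of edges on every query, whereas the adjacency version pays that cost once when the index is built.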

Distributed or centralized systems. In a distributed graph mining system, each machine can directly access only the local region of the graph stored on it; the data needed to understand the remainder of the graph may be remote and difficult to access. A centralized system has a more holistic view of the graph.
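To make the locality issue concrete, the following sketch simulates a distributed layout inside a single process. The modulo partitioning rule, the constant `NUM_MACHINES`, and the helper names are hypothetical; real systems use their own partitioning and communication schemes.

```python
NUM_MACHINES = 2  # assumed cluster size for this sketch

# Hypothetical toy graph with integer node IDs, treated as undirected.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]


def owner(node):
    """Assumed partitioning rule: a node lives on machine (id mod NUM_MACHINES)."""
    return node % NUM_MACHINES


def partition(edge_list):
    """Give each simulated machine the adjacency sets of the nodes it owns."""
    machines = [dict() for _ in range(NUM_MACHINES)]
    for u, v in edge_list:
        machines[owner(u)].setdefault(u, set()).add(v)
        machines[owner(v)].setdefault(v, set()).add(u)
    return machines


def remote_neighbors(machines, node):
    """Neighbors of `node` whose own adjacency data is stored on another machine."""
    local = machines[owner(node)]
    return {n for n in local.get(node, set()) if owner(n) != owner(node)}


machines = partition(edges)
print(sorted(machines[0]))                  # nodes stored on machine 0
print(sorted(remote_neighbors(machines, 2)))  # neighbors of node 2 held elsewhere
```

Any algorithm that needs to expand from node 2 to the neighbors of its neighbors would have to fetch data from the other machine for each remote neighbor. That is the access cost a centralized system avoids.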

For instance, a MapReduce graph processing system is a batch, distributed system that provides either edge list or adjacency access; GraphLab is a distributed, online, adjacency system; and Ligra is an online, adjacency list, centralized system.
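As an illustration of the batch, edge list style of processing, the sketch below imitates the map, shuffle, and reduce phases in plain Python to compute node degrees. It is a single-process simulation under assumed function names, not the API of Hadoop or any other MapReduce framework.

```python
from collections import defaultdict


def map_edge(edge):
    """Map phase: each edge emits a count of 1 for both of its endpoints."""
    u, v = edge
    yield (u, 1)
    yield (v, 1)


def shuffle(pairs):
    """Shuffle phase: group all emitted values by key (here, by node)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_degree(node, counts):
    """Reduce phase: sum the counts for one node to obtain its degree."""
    return node, sum(counts)


def degree_job(edge_list):
    """Run the three phases over the whole edge list, as a batch system would."""
    pairs = (pair for edge in edge_list for pair in map_edge(edge))
    return dict(reduce_degree(node, counts) for node, counts in shuffle(pairs).items())


# Hypothetical toy graph, treated as undirected.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
degrees = degree_job(edges)
print(degrees)
```

The job reads every edge even if only one node's degree is wanted, which is the batch property described above.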

References

  • Bühlmann P., Drineas P., Kane M., van der Laan M. (eds.) Handbook of Big Data. CRC Press, 2016.