Classification of large graphs

'''Classification of large graphs''' (''Классификация больших графов'') is based on their size <math>n</math>, where <math>n</math> is the number of vertices.
 
 
'''Small graphs''' (under 10k vertices). At this size, standard algorithms run easily. For instance, computing all-pairs shortest paths takes <math>O(n^3)</math> time and <math>O(n^2)</math> memory. This is not a problem for any modern computer.
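As a minimal sketch of that trade-off (assuming a dense NumPy distance matrix; the function name and the toy graph are illustrative, not from the original text), Floyd–Warshall uses <math>O(n^3)</math> time and an <math>O(n^2)</math> distance matrix, both comfortable at this scale:

<syntaxhighlight lang="python">
import numpy as np

def all_pairs_shortest_paths(adj):
    """Floyd-Warshall: O(n^3) time, O(n^2) memory.

    adj is an n x n matrix of edge weights, with np.inf where
    there is no edge and 0 on the diagonal.
    """
    dist = adj.copy()
    n = dist.shape[0]
    for k in range(n):
        # Relax every pair (i, j) through intermediate vertex k.
        dist = np.minimum(dist, dist[:, [k]] + dist[[k], :])
    return dist

# A toy 4-vertex example (illustrative weights).
INF = np.inf
adj = np.array([[0,   3,   INF, 7],
                [8,   0,   2,   INF],
                [5,   INF, 0,   1],
                [2,   INF, INF, 0]], dtype=float)
print(all_pairs_shortest_paths(adj))
</syntaxhighlight>

Even at the top of this regime the memory stays manageable: a 10k × 10k float64 distance matrix is about 800 MB.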


'''Large small graphs''' (10k – 1M vertices). These are graphs where <math>O(n^2)</math>-time algorithms are possible, but <math>O(n^2)</math>-memory algorithms become prohibitive or impossible. These graphs are nevertheless more closely associated with small graphs, because there are many tasks, such as diameter computation, that can still be done exactly on them at the cost of some additional time. Two differences are worth noting: (1) the most important properties of graphs in this regime (and larger) are very different from the properties of small graphs, and (2) even when quadratic-time computations are possible, they can be challenging, and they can become prohibitive if they are used in an exploratory data analysis mode or as part of a large cross-validation computation.
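Exact diameter computation illustrates the regime well: a sketch along the following lines (names and the adjacency-list representation are assumptions, not from the original) spends <math>O(nm)</math> time by running a BFS from every vertex, but needs only <math>O(n+m)</math> memory, so it remains feasible when an <math>O(n^2)</math> distance matrix would not fit:

<syntaxhighlight lang="python">
from collections import deque

def diameter(adj_list):
    """Exact diameter via BFS from every vertex: O(n*m) time,
    O(n + m) memory. adj_list maps each vertex to a list of
    neighbours; assumes a connected, unweighted graph.
    """
    best = 0
    for source in adj_list:
        # Standard BFS recording hop distances from source.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj_list[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        best = max(best, max(dist.values()))
    return best

# A path graph 0-1-2-3 has diameter 3.
print(diameter({0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}))
</syntaxhighlight>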


'''Small large graphs''' (1M – 100M vertices). In many ways the transition between small and large graphs occurs around one million vertices. For instance, with a graph of five million vertices, algorithms that do <math>O(n^2)</math> computations are generally infeasible without specialized computing resources. That said, with appropriate attention to computational issues, graphs with between 1M and 100M vertices are reasonably easy to mine with fairly sophisticated techniques given modest computing resources. The basic reason for this is the extreme sparsity of real-world networks. Real-world graphs in this size regime typically have an average degree between 5 and 100. Thus, even a large real-world graph has at most a few billion edges. This would consume a few gigabytes of memory and could easily be tackled on a modern laptop or desktop computer with 32 GB of memory. For instance, computing a PageRank vector on a graph with 60M vertices takes a few minutes on a modern laptop.
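A minimal power-iteration PageRank sketch (assuming a scipy.sparse adjacency matrix; the function name and parameters are illustrative, and this is not the exact setup behind the timing above) shows why the memory footprint scales with the number of edges rather than with <math>n^2</math>:

<syntaxhighlight lang="python">
import numpy as np
import scipy.sparse as sp

def pagerank(A, alpha=0.85, tol=1e-8, max_iter=100):
    """Power-iteration PageRank on a sparse adjacency matrix.

    A is an n x n scipy.sparse matrix with A[i, j] = 1 for an
    edge i -> j. Memory is proportional to the number of edges,
    which is what keeps graphs of this size on a laptop.
    """
    n = A.shape[0]
    out_deg = np.asarray(A.sum(axis=1)).ravel()
    # Row-normalise the adjacency matrix into a transition matrix;
    # dangling vertices (out-degree 0) get their mass spread below.
    inv_deg = np.divide(1.0, out_deg, out=np.zeros(n), where=out_deg > 0)
    P = sp.diags(inv_deg) @ A
    x = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        dangling = x[out_deg == 0].sum()
        x_new = alpha * (x @ P + dangling / n) + (1 - alpha) / n
        if np.abs(x_new - x).sum() < tol:
            return x_new
        x = x_new
    return x

# A tiny 3-cycle: edges 0->1, 1->2, 2->0 (uniform PageRank).
A = sp.csr_matrix(([1, 1, 1], ([0, 1, 2], [1, 2, 0])), shape=(3, 3))
print(pagerank(A))
</syntaxhighlight>

Each iteration is a sparse matrix-vector product costing <math>O(m)</math>, so a few dozen iterations over tens of millions of vertices is minutes of work, consistent with the timing quoted above.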
