Social network graphs, like the ones captured by Facebook and Twitter exhibit smallworld characteristics [2][3]. In 2011, Facebook documented that among all Facebook users at the time of their research (721 million users with 69 billion friendship links), there is an average average shortest path distance of 4.74 between users. This simply means that on average, any two people in the world are separated by just five other people. It’s a small world indeed ! Formally, a smallworld network is defined to be a network where the typical distance L between two randomly chosen nodes grows proportionally to the logarithm of the number of nodes N in the network [4].
Consider the following scenario. You have a social network profile and you want someone to introduce you to the person in that profile. But luckily you are given the entire friendship graph captured by this social network. If there are mutual friends, then you just ask one of them to help you out. If not, you need some sequence of friend introductions to finally meet that person. What is the minimum number of intermediate friend introductions that you need to meet that person you are interested in ? This is equivalent to finding the shortest path in the social network graph between you and that person of interest. The solution is to run Breadth First Search on that social network graph with your profile as the starting vertex. The other interesting question is, if we have extra information about our graph exhibiting smallworld properties, can we make the exhaustive BreadthFirst Search (BFS) faster? The ideas expressed on this topic appeared in Beamer et al [1], where the authors optimized BFS for the number of edges traversed.
BreadthFirst Search:
BFS uses the idea of a frontier that separates the visited nodes from unvisited nodes. The frontier holds the nodes of the recently visited level and is used to find the next set of nodes to be visited. On every step of BFS, the current frontier is used to identify the next frontier from the set of unvisited nodes.
Figure 1. A simple graph
Looking at the example in the figure, the current frontier consists of the nodes 1, 2, 3, 4 and 5. The edges from these nodes are examined to find a node that has not been visited. In the above case node 2’s edges are used to mark H and add it to the next frontier. But note that even though H has been marked by 2, nodes 3, 4 and 5 still inspect H to see whether it is visited or not.
Pseudocode for Naive BFS [5] :
Input: A graph G = (V,E) containing V vertices and E edges and source vertex s
Output: parent: Array[Int], where parent[v] gives the the parent of v in the graph or 1 is if a parent does not exist
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27

class BFS(g: Graph) { val parent = ArrayBuffer.fill(g.numberVertices)(1).toArray def bfs(source: Int, updater: (Seq[Int], Array[Int]) => Seq[Int]) = { var frontier = Seq(source) parent(source) = 2 while (!frontier.isEmpty) { frontier = updater(frontier, parent) } } } trait TopDownUpdater extends FrontierUpdater { def update(frontier: Seq[Int], parents: Array[Int]): Seq[Int] = { val next = ArrayBuffer[Int]() frontier.foreach{ node => graph.getNeighbors(node).filter(parents(_) == 1).foreach { neighbor => next += neighbor parents(neighbor) = node } } next }

One of the observations of conventional BFS (henceforth referred to as topdown BFS) is that it always performs in the worst case complexity, i.e., O(V + E) where V and E are the number of vertices and number of edges respectively. For example, if a node v has p parents, then we just need to explore one edge from any p parents to v to check for connectivity. But topdown BFS checks all incoming edges to v.
The redundancy of these additional edge lookups is more pronounced when topdown BFS is run on graphs exhibiting smallworld properties. As a consequence of the definition of smallworld networks, the number of nodes increases exponentially with the effective diameter of the network, which result in large networks with very low diameters. The low diameter of these graphs forces them to have a larger number of nodes at a particular level and leads to topdown BFS visiting a larger number of nodes in every step, making the frontier very large. Traversing the edges of the nodes in a frontier is the major computation that is performed, and topdown BFS unfortunately ends up visiting all the outgoing edges from the frontier. Moreover, it has also been shown in [1] that most of the edge lookups from the frontier nodes end up in visited nodes (marked by some other parent), which gives further evidence that iterating through all edges from the frontier can be avoided.
The idea behind bottomup BFS [1] is to avoid visiting all the edges of the nodes in the frontier, which is a pretty useful thing to do for the reasons mentioned above. To accomplish this, bottomup BFS traverses the edges of the unvisited nodes to find a parent in the current frontier. If an unvisited node has at least one of its parents in the current frontier, then that node is added to the next frontier. To efficiently find if a node’s parent is present in the frontier, the frontier data structure is changed to a bitmap.
Figure 2. Bottom up BFS
In the above example, {H, I, J, K } are the unvisited nodes. However only nodes { J, H } have a neighbor in the current frontier and as a result the next frontier now becomes {H , J}. In the next iteration the set of unvisited nodes will be {I, K} and each of them have a parent in the current frontier which is {H, J}. So {I, K} will be visited and the search will complete in the next iteration since there will be no more nodes to be added to the next frontier, since all nodes will be visited.
Pseudocode for Bottomup BFS:
Input: A graph G = (V,E) containing V vertices and E edges and source vertex s
Output: parent: Array[Int], where parent[v] gives the the parent of v in the graph or 1 is if a parent does not exist
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30

trait DirectedSerialAncestorManager extends SerialAncestorManager{ var _graph: SerialDirectedGraph = _ def getAncestor(id: Int): IndexedSeq[Int] = { _graph.getParents(id) } def getVertices: IndexedSeq[Int] = (0 to _graph.numberVertices  1) } trait SBottomUpUpdater extends FrontierUpdater with SerialAncestorManager { def update(frontier: BitSet, parents: Array[Int]): Seq[Int] = { val next = mutable.BitSet() val vertices = getVertices val frontierSet = frontier.toSet (vertices.filter(parents(_) == 1)).foreach { node => val neighbors = (getAncestor(node)) neighbors.find(frontierSet) match { case Some(ancestor) => { parents(node) = ancestor next(node) = true } case None => None } } next.toBuffer } }

The major advantage to this approach is that the search for an unvisited node’s parent will terminate once any one parent is found in the current frontier. Contrast this with topdown BFS, which needs to visit all the neighbors of a node in the frontier during every step.
Topdown, Bottomup, or both?
When the frontier is large, you gain by performing bottomups BFS as it only examines some edges of the unvisited nodes. But when the frontier is small, it may not be advantageous to perform bottomup BFS, as apart from having to go over, it incurs the additional overhead of identifying the unvisited nodes. Smallworld networks usually start off with small frontiers in the initial step and have an exponential increase in the frontier size in the middle stages of the search procedure. These tradeoffs lead us to another approach for smallworld networks where we combine combine both topdown and bottomup BFS—hybrid BFS [1]. In hybrid BFS, the size of the frontier is used to define a heuristic, which is used to switch between the two approaches, topdown and bottomup. A thorough analysis of this heuristic is presented in [1].
How about parallelizing these approaches ?
When trying to parallelize the two approaches, observe that bottomup BFS is easier to parallelize than topdown BFS. For bottomup BFS, you can introduce parallelism in the stage where you populate the next frontier. Each of the unvisited nodes can be examined in parallel, and since every node just updates itself in the next data structure, it does not require the use of locks.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24

trait ParallelAncestorManager { def getAncestor(id: Int): ParSeq[Int] def getParVertices: ParSeq[Int] } trait PBottomUpUpdater extends FrontierUpdater with ParallelAncestorManager { def update(frontier: Seq[Int], parents: Array[Int]):Seq[Int] = { val next = BitSet() val frontierSet = frontier.toSet getParVertices.filter(parents(_) == 1).foreach { node => val parNeighbors = getAncestor(node) parNeighbors.find(x => frontierSet.contains(x)) match { case Some(ancestor) => { parents(node) = ancestor next(node) = true } case None => None } } next.toBuffer } }

On inspecting the topdown BFS pseudocode for sources of parallelism, observe that the nodes in the current frontier can be explored in parallel. The parallel topdown pseudocode is:
1
2
3
4
5
6
7
8
9
10
11
12
13

trait PTopDownUpdater extends FrontierUpdater { def update(frontier: Seq[Int], parents: Array[Int]): Seq[Int] = { val next = ArrayBuffer[Int]() frontier.par.foreach { node => graph.getNeighbors(node).filter(parents(_) == 1).foreach { neighbor => next += neighbor parents(neighbor) = node } } next }

In terms of correctness, the above pseudocode looks good, but there is a benign race condition introduced by updating parents and next. This may result in a node being added more than once, making it inefficient. But it does not affect the correctness of the algorithm. Cleaner code would have a synchronized block to ensure only one thread updates the frontier.
The hybrid approach combining the parallel versions of topdown and bottomup BFS provides one of the fastest single node implementation of Parallel BFS [1].
References:

Beamer, Scott, Krste Asanović, and David Patterson. “Directionoptimizing breadthfirst search.” Scientific Programming 21.3 (2013): 137148.

Ugander, Johan, et al. “The anatomy of the facebook social graph.” arXiv preprint arXiv:1111.4503 (2011).

Li, Jun, Shuchao Ma, and Shuang Hong. “Recommendation on social network based on graph model.” Control Conference (CCC), 2012 31st Chinese. IEEE, 2012.

Watts, Duncan J., and Steven H. Strogatz. “Collective dynamics of ‘smallworld’networks.” nature 393.6684 (1998): 440442.

Introduction to Algorithms (1990) by T H Cormen, C E Leiserson, R L Rivest