---
title: "Optimizing breadth-first search for social networks"
page_name: "Optimizing Breadth-First Search for Social Networks"
type: "blog"
slug: "optimizing-breadth-first-search-for-social-networks"
published_at: "2014-10-28"
modified_at: "2025-05-09"
url: "https://www.sumologic.com/blog/optimizing-breadth-first-search-for-social-networks"
canonical: "https://www.sumologic.com/blog/optimizing-breadth-first-search-for-social-networks"
markdown_url: "https://www.sumologic.com/blog/optimizing-breadth-first-search-for-social-networks.md"
lang: "en"
excerpt: "According to Facebook, on average, any two people in the world are separated by just five other people..."
---

[ All blogs ](https://www.sumologic.com/blog "blog")

# Optimizing Breadth-First Search for Social Networks

[Karthik Anantha Padmanab](#blog-author-block-181)

October 28, 2014

5 min read 

##### Table of contents

 

 

 

 

Social network graphs, like the ones captured by Facebook and Twitter exhibit small-world characteristics *\[2\]\[3\]*. In 2011, Facebook documented that among all Facebook users at the time of their research (721 million users with 69 billion friendship links), there is an average average shortest path distance of 4.74 between users. This simply means that on average, any two people in the world are separated by just five other people. It’s a small world indeed ! Formally, a small-world network is defined to be a network where the typical distance *L* between two randomly chosen nodes grows proportionally to the logarithm of the number of nodes N in the network *\[4\]*.

Consider the following scenario. You have a social network profile and you want someone to introduce you to the person in that profile. But luckily you are given the entire friendship graph captured by this social network. If there are mutual friends, then you just ask one of them to help you out. If not, you need some sequence of friend introductions to finally meet that person. What is the minimum number of intermediate friend introductions that you need to meet that person you are interested in ? This is equivalent to finding the shortest path in the social network graph between you and that person of interest. The solution is to run Breadth First Search on that social network graph with your profile as the starting vertex. The other interesting question is, if we have extra information about our graph exhibiting small-world properties, can we make the exhaustive Breadth-First Search (BFS) faster? The ideas expressed on this topic appeared in Beamer et al *\[1\],* where the authors optimized BFS for the number of edges traversed.

**Breadth-First Search:**

BFS uses the idea of a frontier that separates the visited nodes from unvisited nodes. The frontier holds the nodes of the recently visited level and is used to find the next set of nodes to be visited. On every step of BFS, the current frontier is used to identify the next frontier from the set of unvisited nodes.

***Figure 1. A simple graph***

Looking at the example in the figure, the current frontier consists of the nodes 1, 2, 3, 4 and 5. The edges from these nodes are examined to find a node that has not been visited. In the above case node 2’s edges are used to mark H and add it to the next frontier. But note that even though H has been marked by 2, nodes 3, 4 and 5 still inspect H to see whether it is visited or not.

**Pseudocode for Naive BFS \[5\] :**

**Input**: *A graph G = (V,E) containing V vertices and E edges and source vertex s*

**Output**: *parent: Array\[Int\], where parent\[v\] gives the the parent of v in the graph or -1 is if a parent does not exist*

nn

n n

nn

n

n [view raw](https://gist.github.com/treadstone90/a7ca1bd9e54e42b56ad4/raw/ffff5f93404112d055401509bf4e4bc242d1962a/top-down.scala)n [top-down.scala](https://gist.github.com/treadstone90/a7ca1bd9e54e42b56ad4#file-top-down-scala)n hosted with ❤ by [GitHub](https://github.com)n 

n

n

n')

One of the observations of conventional BFS (henceforth referred to as top-down BFS) is that it always performs in the worst case complexity, i.e., *O(|V| + |E|)* where *V* and *E* are the number of vertices and number of edges respectively. For example, if a node v has p parents, then we just need to explore one edge from any p parents to v to check for connectivity. But top-down BFS checks all incoming edges to v.

The redundancy of these additional edge lookups is more pronounced when top-down BFS is run on graphs exhibiting small-world properties. As a consequence of the definition of small-world networks, the number of nodes increases exponentially with the effective diameter of the network, which result in large networks with very low diameters. The low diameter of these graphs forces them to have a larger number of nodes at a particular level and leads to top-down BFS visiting a larger number of nodes in every step, making the frontier very large. Traversing the edges of the nodes in a frontier is the major computation that is performed, and top-down BFS unfortunately ends up visiting all the outgoing edges from the frontier. Moreover, it has also been shown in *\[1\]* that most of the edge lookups from the frontier nodes end up in visited nodes (marked by some other parent), which gives further evidence that iterating through all edges from the frontier can be avoided.

The idea behind bottom-up BFS \[1\] is to avoid visiting all the edges of the nodes in the frontier, which is a pretty useful thing to do for the reasons mentioned above. To accomplish this, bottom-up BFS traverses the edges of the unvisited nodes to find a parent in the current frontier. If an unvisited node has at least one of its parents in the current frontier, then that node is added to the next frontier. To efficiently find if a node’s parent is present in the frontier, the frontier data structure is changed to a bitmap.

***Figure 2. Bottom up BFS***

In the above example, *{H, I, J, K }* are the unvisited nodes. However only nodes *{ J, H }* have a neighbor in the current frontier and as a result the next frontier now becomes *{H , J}*. In the next iteration the set of unvisited nodes will be *{I, K}* and each of them have a parent in the current frontier which is *{H, J}*. So *{I, K}* will be visited and the search will complete in the next iteration since there will be no more nodes to be added to the next frontier, since all nodes will be visited.

**Pseudocode for Bottom-up BFS**:

**Input**: *A graph G = (V,E) containing V vertices and E edges and source vertex s*

**Output**: *parent: Array\[Int\], where parent\[v\] gives the the parent of v in the graph or -1 is if a parent does not exist*

nn

n n

nn

n

n [view raw](https://gist.github.com/treadstone90/5ee73a21a23a99b7a523/raw/ba4dbba1b0e8f19bfcabb5c117387176dba39533/bottom-up.scala)n [bottom-up.scala](https://gist.github.com/treadstone90/5ee73a21a23a99b7a523#file-bottom-up-scala)n hosted with ❤ by [GitHub](https://github.com)n 

n

[AI Instructions](https://www.sumologic.com/ai-instructions.md)
