Everyone gets bidirectional BFS wrong
People really need to stop blindly copying code from the Internet.
This post is about graphs and graph algorithms. Specifically, it’s about a common, simple algorithm that’s somehow so hard to get right that the first few pages of Google results are filled with wrong implementations.
Most of the post is about what graphs are and how some pathfinding algorithms work. If you don’t care about this, you can skip to the namedropping section (but you’ll miss out on some cool interactive visualizations).
The interactive figures in this post use HTML and CSS features that are still not supported by Safari. I’m working on fixing that, but in the meantime, please use a different browser.
Graphs 101
A graph is a bunch of things (called nodes or vertices) that are connected by links (called edges). Here are a few graphs, free of charge:
If we look closely, these three graphs can be characterized by two important properties:
- the first two are directed, meaning the edges have an intrinsic direction (e.g. “Eat breakfast” comes after “Wake up”), while the third is undirected, meaning the edges don’t have a direction (e.g. “Paris” is connected to “Berlin” and vice-versa).
- the last two contain cycles, meaning you can start at a node and follow the edges to come back to the same node, while the first doesn’t.
Additionally, the first one is a tree, or more pedantically a directed tree. In a tree, there exists at most one path between any two nodes.
Doing Stuff
A useful thing to do with graphs of things is to find paths between things, or more usefully, the shortest path between two things.
For example, in the first graph, this would be somewhat equivalent to trying to find a folder named “Documents” that is located somewhere on the disk (but whose exact location we do not know). The path would be the sequence of folders you need to go through to reach “Documents”, here C: → Users → Alice → Documents, or written as an (aptly-named) path, C:\Users\Alice\Documents. We would say that path has a length of 3, since we had to go down three levels to find our target. It’s easy to see, for this small graph, that this is the shortest path, and that it is unique.
Finding a path in the second graph is not really interesting since it’s just a circle. The shortest path between two nodes always consists of simply going around the circle from the first node to the second.
In the third graph, finding a path would be like finding a train route between two cities. For example, to go from Paris to Milan, you could take the train from Paris to Rome, then from Rome to Milan. The path would be Paris → Rome → Milan. This path has a length of 2. Of course, you could also go through Brussels and Berlin, that would be a valid path, but it would not be the shortest.
In real life, graphs can contain a multitude of additional information, like the cost of going from one node to another (which, here, could be the time it takes to travel between two cities, or the amount of fuel needed, or the cost of a train ticket, etc), in which case the graph is said to be weighted, and the shortest path ends up not simply being the path that has the fewest edges, but the one that has the smallest sum of costs.
In any case, here, we’ll mostly be interested in undirected, unweighted graphs, such as the third one here (so, assume this is what “graph” means from now on).
Searches & Traversals
An interesting (to me, hopefully to you) question we can ask ourselves about a given graph is: what are the possible ways to visit all the nodes in the graph? This is called a traversal.
There are two common ways to traverse a graph, which are kind of each other’s dual: depth-first search (DFS) and breadth-first search (BFS).
Try playing with the above demo to get a feel for how these two algorithms work. You can see that DFS explores the graph by going as deep as possible along each branch, before backtracking (“climbing back up the tree”) and exploring another branch. BFS, on the other hand, explores the graph by visiting all the nodes at a given depth before moving on to the next depth.
For readability, enqueued nodes are annotated with their order in the queue: a node annotated with “1” will be visited at the next step.
Even for graphs that contain cycles, such as the cities one, all nodes will be visited exactly once, because the algorithms keep a set of visited nodes to avoid visiting the same node multiple times.
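If you prefer code to prose, here’s a minimal sketch of both traversals in Python. The graph is a toy adjacency list loosely based on the cities example (the exact edges are mine, for illustration):

```python
from collections import deque

# A small undirected graph as an adjacency list (illustrative edges).
graph = {
    "Paris": ["Brussels", "Rome"],
    "Brussels": ["Paris", "Berlin"],
    "Berlin": ["Brussels", "Milan"],
    "Rome": ["Paris", "Milan"],
    "Milan": ["Berlin", "Rome"],
}

def dfs(graph, start):
    """Depth-first: go as deep as possible along each branch, then backtrack."""
    visited, order = set(), []
    def visit(node):
        visited.add(node)
        order.append(node)
        for neighbor in graph[node]:
            if neighbor not in visited:
                visit(neighbor)
    visit(start)
    return order

def bfs(graph, start):
    """Breadth-first: visit all the nodes at a given depth before going deeper."""
    visited, order = {start}, []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbor in graph[node]:
            if neighbor not in visited:
                visited.add(neighbor)  # mark when enqueued, so each node is queued once
                queue.append(neighbor)
    return order

print(dfs(graph, "Paris"))  # ['Paris', 'Brussels', 'Berlin', 'Milan', 'Rome']
print(bfs(graph, "Paris"))  # ['Paris', 'Brussels', 'Rome', 'Berlin', 'Milan']
```

In both cases, the visited set is what keeps cycles from causing infinite loops: each node is recorded once and never revisited.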
If it makes it easier for you to visualize, here are the graphs with the nodes annotated with the order in which they’ll be visited:
Finding Paths
As surprising as it may be, there is actually a quite simple algorithm to find the shortest path between two nodes in a graph: the BFS.
Think about it, the BFS visits the source node, then its neighbors, then its neighbors’ neighbors, and so on: one level of depth at a time. This means that to find the shortest path from the source to a destination, we just have to run a BFS starting from the source, and stop as soon as we reach the destination. The path we followed to reach the destination is necessarily the shortest.
If the nodes $a$ and $b$ are at a distance $D$ from each other, and we run a BFS from $a$, we’ll visit nodes at depth 0 (that’s just $a$), then nodes at depth 1 ($a$’s neighbors), and so on, and when we reach $b$, the depth we are currently visiting is the distance between $a$ and $b$, that is: $D$.
Seen from the other way around, it’s impossible to reach $b$ in more than $D$ levels, because if we’re at some level $L>D$, it means we’ve already visited the entirety of level $D$, which includes $b$.
Here’s a demo. If no node is highlighted and no path is displayed at the end, then it means no path exists between the two nodes.
Note: the algorithm stops as soon as the target is reached; i.e. as soon as it is added to the queue.
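In code, this amounts to a plain BFS that remembers how it reached each node and bails out as soon as the target shows up. Here’s a minimal sketch in Python, under the same assumptions as the traversal sketch above (unweighted graph stored as an adjacency list):

```python
from collections import deque

def shortest_path(graph, source, target):
    """BFS from `source`, stopping as soon as `target` is enqueued.
    Returns a shortest path as a list of nodes, or None if no path exists."""
    if source == target:
        return [source]
    parent = {source: None}            # also doubles as the visited set
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in parent:
                parent[neighbor] = node
                if neighbor == target:         # stop as soon as the target is reached
                    path = [target]
                    while parent[path[-1]] is not None:
                        path.append(parent[path[-1]])
                    return path[::-1]          # walk the parents back to the source
                queue.append(neighbor)
    return None                                # no path exists
```

For example, `shortest_path(graph, "Paris", "Milan")` on the toy cities graph above returns `['Paris', 'Rome', 'Milan']`.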
Finding Paths, Faster
The BFS is a great algorithm to find the shortest path between two nodes. So great, in fact, that it’s the optimal algorithm for the general case. That’s right. “But what about all those fast algorithms I’ve heard about, like Dijkstra’s or A*?” you might ask. Well, two things:
- Dijkstra’s algorithm is a BFS on steroids, that can handle giving different costs/weights to edges. Where the BFS visits neighbors in insertion order, Dijkstra’s uses a priority queue to order them based on the cost of the path (there’s a small sketch of this right after this list). On an unweighted graph, Dijkstra’s is equivalent to BFS.
- The A* algorithm relies on the programmer providing a heuristic function that estimates the cost of reaching the destination from a given node. If the heuristic is well-chosen, A* will be faster than BFS, sure, but for a lot of cases, that’s a big if. This is why I specifically said “general case”: for a given graph $G$ that you know nothing about, BFS is the best you can do. A* is for when you know more about the graph, and can exploit that knowledge to make the search faster.
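To make the “BFS on steroids” comparison concrete, here’s a minimal sketch in Python (the graph representations and function names are mine, for illustration only): the only structural change between the two is swapping the FIFO queue for a priority queue keyed by the cost of the path so far.

```python
import heapq
from collections import deque

def bfs_cost(graph, source, target):
    """Unweighted graph: graph[u] is a list of neighbors, every edge costs 1."""
    dist = {source: 0}
    queue = deque([source])                    # plain FIFO queue
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist.get(target)                    # None if unreachable

def dijkstra_cost(graph, source, target):
    """Weighted graph: graph[u] is a list of (neighbor, cost) pairs."""
    dist = {}
    heap = [(0, source)]                       # priority queue ordered by path cost
    while heap:
        cost, node = heapq.heappop(heap)
        if node in dist:
            continue                           # already settled with a cheaper path
        dist[node] = cost
        for neighbor, edge_cost in graph[node]:
            if neighbor not in dist:
                heapq.heappush(heap, (cost + edge_cost, neighbor))
    return dist.get(target)                    # None if unreachable
```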
My use case (large social graphs) doesn’t have good, fast-to-compute cost functions that could be used as heuristics for algorithms such as A*, so I’m sticking with BFS. So, is BFS the fastest we can do? There is actually a slight variation of BFS that can be a lot faster in some cases: the bidirectional BFS.
A bidirectional search, as the name suggests, is simply a search that is performed from both the source and the destination nodes at the same time. The idea is that the two searches will meet somewhere in the middle, and the path between the two meeting points will be the shortest path between the two nodes.
Formally, when we study graphs, we can introduce a measure called the branching factor, which is the average number of edges going out of a node in the graph. In a binary tree, the branching factor is 2, because each node has at most 2 children.
The total number of nodes in such a graph is approximated by $b^d$, where $b$ is the branching factor and $d$ is the depth of the tree.
When a standard BFS goes through a graph with branching factor $b$, it recursively visits all the neighbors of all the nodes, level by level. At level 0, it visits 1 node (the source). At level 1, it visits all the neighbors of the source, that is, $b$ (on average). At level 2, all the neighbors’ neighbors, that is, $b^2$. At level $L$, it visits $b^L$ nodes. In total, if you’re trying to find the path between two nodes that are, say, 10 edges apart, the BFS will visit $1 + b + b^2 + \ldots + b^{10}$ nodes, which is in the order of $b^{10}$. For a binary tree, that would be $2^{10} = 1024$ nodes.
Let’s say, now, that we’re starting one BFS from the source, and one from the destination, and at each level we advance one step in each. The two BFS will meet at some point, and they must meet exactly in the middle, because they advance at the same speed (and the shortest path between two points is the sum of the path from the source to the middle, and the path from the middle to the destination). If the source and destination are $D$ edges apart, the two BFS will meet after $D/2$ steps, and the total number of nodes visited by each BFS will be $1 + b + b^2 + \ldots + b^{D/2}$, which is in the order of $b^{D/2}$. For a binary tree, that would be $2^5 = 32$ nodes (by each BFS, so, $64$ visited nodes in total).
We got from $1024$ to $64$ visited nodes, that’s something! Specifically, from $O(b^{D})$ to $O(b^{D/2})$, which is an exponential speedup.
As a reference, finding the path between two nodes ~10 edges apart in my 2M nodes graph previously took about 2 seconds with a standard BFS, and only takes 8ms with a bidirectional BFS. That’s a 250x speedup.
Here, play with it:
Finding the Wrong Path
My explanation of the bidirectional BFS might sound like it’s correct, and the demo might look like it’s working, but there’s actually a small, minuscule, tiny mistake that makes it all wrong. Well, not all wrong, it works for the demo! But it can give the wrong answer, sometimes. Here, try it with this pathological case:
For this graph, the bidirectional BFS finds a correct path… but not a minimal one; it’s one edge longer than the shortest path.
The problem, specifically, is at this step:
At the next step, the node a1 will be visited. This will add a2 to the queue, and then a3 will be visited, which will stop the algorithm since a2 will be found to already have been seen (since it’ll already be in the queue at that point). The resulting path will be one node too long.
What happened here?
Well, to find the shortest path, it would have had to visit either b1 or b2 before visiting a2. At first glance, this may look like an ordering problem: should we visit a node’s neighbors in a certain order? How would we sort them?
No, the problem is deeper. The two keys to understanding the issue are:
- “[…] one level of depth at a time. This means that to find the shortest path from the source to a destination, we just have to run a BFS starting from the source, and stop as soon as we reach the destination. The path we followed to reach the destination is necessarily the shortest.”
- “The two BFS will meet at some point, and they must meet exactly in the middle, because they advance at the same speed.”
The BFS works because it visits all the nodes at a given depth before moving on to the next depth. This isn’t what our bidirectional BFS is doing! It’s stopping right after seeing a2 without having had the chance to see the edge between b1 and b2. We’re getting a wrong result because our algorithm is wrong.
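In code, the buggy shape looks roughly like this (a hedged reconstruction of the pattern, not any particular site’s code, using the same adjacency-list representation as before): each side pops a single node per step, and the search stops as soon as the two visited sets intersect.

```python
from collections import deque

def bidirectional_bfs_buggy(graph, source, target):
    """The common *incorrect* bidirectional BFS: one node per step on each
    side, stopping at the first intersection of the two visited sets."""
    parents_s, parents_t = {source: None}, {target: None}
    queue_s, queue_t = deque([source]), deque([target])

    def expand_one_node(queue, parents):
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)

    def build_path(meeting):
        path, node = [], meeting
        while node is not None:                 # walk back to the source...
            path.append(node)
            node = parents_s[node]
        path.reverse()
        node = parents_t[meeting]
        while node is not None:                 # ...then forward to the target
            path.append(node)
            node = parents_t[node]
        return path

    while queue_s and queue_t:
        expand_one_node(queue_s, parents_s)     # one node from the source side
        expand_one_node(queue_t, parents_t)     # one node from the target side
        common = parents_s.keys() & parents_t.keys()
        if common:                              # BUG: the current levels aren't
            return build_path(common.pop())     # finished, so this meeting point
    return None                                 # can be one level too deep
```

On the pathological graph given near the end of this post, this returns 0, 1, 3, 4, 6 instead of the shorter 0, 2, 5, 6.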
I wrote this in the previous section, while describing the bidirectional BFS:
- “Let’s say, now, that we’re starting one BFS from the source, and one from the destination, and at each level we advance one step in each.”
This sounds sensible, but I didn’t define what a “step” means. In the regular BFS, a step is just visiting the next node in the queue. But here, we need to consider a higher-level step: each BFS shouldn’t advance one node at a time, but one level of nodes (depth) at a time:
Here’s the same graph as above with a corrected algorithm:
We can also try the new algorithm on the example graphs. Two important points, though:
- for these, there are no cases I could find where the bad algorithm gives a path that is not the shortest, so you’ll get the same result with both versions (but the steps will differ)
- the incorrect version sometimes gets the result a bit faster than the correct one. That’s neat, but it’ll still give you incorrect results sometimes. A calculator that gives you the right answer really fast 99% of the time, but the wrong answer 1% of the time, is not a good calculator
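For reference, here’s what a corrected, level-at-a-time bidirectional BFS can look like in Python. This is a minimal sketch under the same assumptions as before (unweighted adjacency-list graph), not a drop-in replacement for any of the implementations linked below: the key difference from the buggy pattern is that each side finishes a whole level before we look for a meeting point.

```python
def bidirectional_bfs(graph, source, target):
    """Bidirectional BFS where each side advances one full level (depth) at a
    time; the intersection check only happens after a level is complete."""
    if source == target:
        return [source]
    parents_s, parents_t = {source: None}, {target: None}
    frontier_s, frontier_t = [source], [target]

    def expand_level(frontier, parents):
        """Replace the frontier with the next full level of unseen neighbors."""
        next_frontier = []
        for node in frontier:
            for neighbor in graph[node]:
                if neighbor not in parents:
                    parents[neighbor] = node
                    next_frontier.append(neighbor)
        return next_frontier

    def build_path(meeting):
        path, node = [], meeting
        while node is not None:                 # walk back to the source...
            path.append(node)
            node = parents_s[node]
        path.reverse()
        node = parents_t[meeting]
        while node is not None:                 # ...then forward to the target
            path.append(node)
            node = parents_t[node]
        return path

    while frontier_s and frontier_t:
        # One full level on the source side, then check for a meeting point.
        # Because levels are only checked once complete, any node seen from
        # both sides at this point lies on a shortest path.
        frontier_s = expand_level(frontier_s, parents_s)
        common = parents_s.keys() & parents_t.keys()
        if common:
            return build_path(common.pop())
        # Then one full level on the target side, and the same check.
        frontier_t = expand_level(frontier_t, parents_t)
        common = parents_s.keys() & parents_t.keys()
        if common:
            return build_path(common.pop())
    return None
```

On the pathological graph, this finds 0, 2, 5, 6 instead of the longer 0, 1, 3, 4, 6.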
Also, keep in mind that this is not a new insight: this SO answer details the exact same example as I did (though I found it after having written the better part of this post), and it’s responding to a question that cites an excerpt from the 2004 book “Artificial Intelligence: A Modern Approach” by Stuart Russell and Peter Norvig:
Bidirectional search is implemented by replacing the goal test with a check to see whether the frontiers of the two searches intersect; if they do, a solution has been found. It is important to realize that the first solution found may not be optimal, even if the two searches are both breadth-first; some additional search is required to make sure there isn’t a shortcut across the gap.
Someone is Wrong on the Internet
While working on a project of mine where I needed to implement some sort of pathfinding feature, I ended up wanting to implement the bidirectional BFS algorithm to replace the regular BFS I was using. A BFS is simple enough to implement, but I went the lazy route and just googled “bidirectional bfs implementation”. I clicked on a random legit-sounding link, translated the implementation to Rust, and it worked! And it was a lot faster, like I said earlier, so I was happy.
Days passed, but then, while fiddling with my project, I ended up stumbling upon a case where the path was wrong. Like, one node wrong — exactly the bug we just studied. I compared my code with the site I copied from, and I couldn’t see any mistake. Well, then, I googled again, and copied from another site. Different code structure, but same bug. So I googled again, and again. Same bug, everywhere. I started to suspect that reality was playing a prank on me. How could everyone be wrong?
I started methodically testing every bidirectional BFS implementation I could find. Most were wrong, a few were right. Among the wrong ones, quite a few were actually just copying code from a well-known CS website (and removing comments, and renaming variables, to make it look like their own… which is nothing more than plain old plagiarism), while some others were original-looking code, but with the same bug.
Here are the wrong results I found, along with date of the earliest archive.org crawl (which is a good enough approximation of when the code was first published). All of these were found on the first or second page of Google results. All of the links below contain wrong code:
- (C++, Java, Python, C#, JS) GeeksforGeeks (~2016)
- (C++) The Algorists (plagiarizing, 2020), also hosted on their old WordPress blog
- (Python) AskPython (plagiarizing, 2022)
- (Python) Educative (plagiarizing, 2022)
- (Java) Naukri Code 360 (plagiarizing, 2024)
- (C++) Educba (plagiarizing, 2020)
- (C++) An undergrad paper that unfortunately ended up using GeeksforGeeks’ implementation (with credits, 2020)
- (C++) At least 17 GitHub repositories (including some advertising themselves as CS/DSA lesson material, most with no credits)
- (Python) At least 31 GitHub repositories (same as above)
- (JS) Zachary Freeman on Medium (2023)
- (Java) OpenGenus (similar to GeeksforGeeks, 2020)
For fun, I also tested a few easily available LLMs with simple prompts. All of those I tested generated wrong (albeit original) code:
- (Python) ChatGPT 4o (has our bug)
- (C++) ChatGPT 4o (has our bug)
- (Python) Claude Haiku (has our bug, but also returns a completely wrong path sometimes)
- GitHub Copilot (Codex) guessed the wrong path not once but twice while trying to help me write this blogpost
If you want to run tests yourself, here is the pathological graph (both the one in the post and a symmetrical version):
- [(0, 1), (0, 2), (1, 3), (3, 4), (4, 6), (2, 5), (5, 6)]: correct path is 0, 2, 5, 6, wrong path is 0, 1, 3, 4, 6
  - or, in adjacency list form: { 0: [1, 2], 1: [0, 3], 2: [0, 5], 3: [1, 4], 4: [3, 6], 5: [2, 6], 6: [4, 5] }
- [(0, 1), (0, 2), (1, 4), (2, 3), (3, 5), (4, 6), (5, 6)]: correct path is 0, 1, 4, 6, wrong path is 0, 2, 3, 5, 6
  - or, in adjacency list form: { 0: [1, 2], 1: [0, 4], 2: [0, 3], 3: [2, 5], 4: [1, 6], 5: [3, 6], 6: [4, 5] }
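To make checking easier, here’s a small test harness, as a hedged sketch: `bidirectional_bfs` stands in for whatever implementation you want to test (for instance the level-at-a-time sketch from earlier), and the harness compares the returned path against a plain BFS on both graphs.

```python
from collections import deque
from itertools import combinations

def bfs_distance(graph, source, target):
    """Reference shortest-path length computed with a plain BFS."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return dist[node]
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return None                                  # unreachable

graphs = [
    {0: [1, 2], 1: [0, 3], 2: [0, 5], 3: [1, 4], 4: [3, 6], 5: [2, 6], 6: [4, 5]},
    {0: [1, 2], 1: [0, 4], 2: [0, 3], 3: [2, 5], 4: [1, 6], 5: [3, 6], 6: [4, 5]},
]

for graph in graphs:
    for source, target in combinations(graph, 2):
        path = bidirectional_bfs(graph, source, target)   # implementation under test
        assert all(b in graph[a] for a, b in zip(path, path[1:])), path
        assert len(path) - 1 == bfs_distance(graph, source, target), (source, target, path)
```

The buggy sketch from earlier fails the length assertion on the first graph’s (0, 6) pair; the level-at-a-time version passes on every pair.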
And here are those I could find who implemented it correctly (at least, as far as I know):
- (Python) shawnlyu on Medium (2020)
- (Python) Andrew McDowell on StackOverflow (2019)
- (C++) Rafael Glikis on GitHub (2018)
- (Pseudocode) Baeldung CS (2022)
I only went up to page 5 of Google results, but other implementations may appear with different search keywords.
Finding the Right Path, Faster
There’s an additional easy optimization we can add on top of bidirectional BFS. Right now, we’re alternating sides, with each side advancing one level at a time. But instead of alternating, it appears that advancing on whichever side has the fewest nodes in its queue accelerates the algorithm in a lot of cases. The intuition behind this is that:
- if a path exists, both sides will end up meeting at a midpoint that is on the shortest path
- advancing one level, no matter how many nodes we visit, will always get us closer to the midpoint by exactly one level
- if we’re getting one level closer, we might as well advance on the side that has the fewest queued nodes, since that means we’ll have fewer nodes to visit in the future (while still getting closer to the midpoint as fast as possible, but with less work)
On my test graph, this sped up some long paths by about 100 times: without the optimization, most paths take about 3-4ms but very long paths (>10 edges) can take up to 400ms. With the optimization, all queries take about 3-4ms.
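Applied to the level-at-a-time sketch from earlier (same assumptions, same helper functions), the change is tiny: the main loop stops alternating and instead expands whichever frontier is currently smaller.

```python
    while frontier_s and frontier_t:
        # Expand whichever side currently has the smaller frontier. It still
        # completes one full level, so the correctness argument is unchanged;
        # there's simply less work per level on average.
        if len(frontier_s) <= len(frontier_t):
            frontier_s = expand_level(frontier_s, parents_s)
        else:
            frontier_t = expand_level(frontier_t, parents_t)
        common = parents_s.keys() & parents_t.keys()
        if common:
            return build_path(common.pop())
    return None
```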
Takeaways
Don’t blindly copy algorithms from the Internet, even from reputable websites. Study them, and implement them. Except crypto.
All the graphs in this post were pre-rendered to SVG using Graphviz, and then animated and made interactive using vanilla JS code that you can inspect by viewing the source code of this post on GitHub.
While writing this post, I discovered:
- that CSS nesting works even better than I thought it did (CSS has come a long way)
- @container queries that made my life easier for handling light/dark-mode
- that .dot files can be parsed quite easily if you don’t care about doing things the Right Way
Thanks for reading.