The Basics of A* Pathfinding – part 1

The A* algorithm allows you to find the shortest path between two points with (impassable) obstacles between them. The key word here is “shortest”. Let’s take a look at what that means:

How A* “sees” it

Imagine we have the following:

(Images created using Daniel Cook’s awesome tile sets as well as Wulax’s sweet character sheets)

More precisely, the character on the left needs to get to the chests on the right:

For the sake of argument, the trees and stone cliffs are impassable obstacles (the character cannot walk through them). Now, there are multiple paths the character can take in order to get to the chests:

A* gives the shortest of these paths (the one marked with blue in the above picture). It does so by first separating the area in discrete chunks, like little square cells, which are then interpreted as “passable” or “impassable”. We’ll see later on that we can also declare cells as more or less preferable for passage (like swamp vs. road), but we won’t get into that right now. So, the above image might “look” to the A* algorithm as something like this:

where black marks the areas through which the character cannot walk (impassable obstacles) while the white parts mark the areas the character can walk through (passable terrain).

As we said, A* separates everything into discrete cells (usually square), where some cells are passable terrain, other are impassable obstacles. Then, after doing its thing, A* will tell which of the cells containing passable terrain make up the shortest path between two points. Before detailing on all this, here’s a JavaScript live demo of this:

The A* demo app can be tried here.

Here’s a quick tutorial on how it works:

Select “impassable obstacle” from the left menu, then, with that selected, drag around the area with your mouse (holding down the left mouse button). This will mark cells as obstacles (impassable) and they’ll become black. You can also select “(completely) passable terrain” from the left menu and reset some black cells back to white (aka, terrain through which we can walk) by dragging with the left mouse button over them.

Once you’re satisfied with the placement of obstacles, select “path start” from the left and click somewhere on the area to place the start of the path. A red square with an “S” on it will be created at the clicked location (cell). Similarly, select “path end” and click where you want the path to end. Now that both a start cell and an end cell have been placed, you should see the shortest path between them marked with red. At this point, selecting either “path start” or “path end” and placing either of them somewhere else will show the new path. You can even drag around with your mouse while having either “path start” or “path end” selected to see how the path changes in real time, though this might now run too smoothly on some machines (mostly due to the fact that the code used here for drawing is not very optimized). You’ll also notice than when a path is shown in red, there’s also some cells with a dot (“.”) in them. Those are the cells that were considered (explored) during the calculation of the path, but were found to not belong to the shortest path.

A*’s constituents

It’s vital to note at this point, that what we’re presenting here is only a subset of the whole A* algorithm, more precisely, we’re only describing A* in a specific context (finding the shortest path between obstacles for a video game). One of the important characteristics of this context is the fact that we’re assuming that the nodes are rectangular cells of a certain size placed inside a grid. This simplifications allows us to use certain optimizations which normally wouldn’t be globally available to A*: for example, we can estimate the distance between two cells when we don’t know what obstacles are between them using the Pythagorean theorem, and we can be sure that this estimation will never be bigger than the length of the actual path (obstacles and all) that exists between those 2 cells. (the Pythagorean theorem gives us the shortest possible distance between 2 points when we know their xy coordinates) and because of this nice feature, once we process a cell, we can set it aside and be sure that we don’t need to re-evaluate it again.

Now, said cells allows for 8 directions of movement. From one cell, we can, at most (if there are no obstacles) go: up, down, left, right and on any of the four diagonals in between. We assign a distance (cost) for walking straight (not diagonally) from one cell to an orthogonally adjacent one, in this particular example that distance is 3. Using the Pythagorean theorem we can calculate the distance/cost of getting to any of the diagonally adjacent cells: it’s 3 multiplied by the square root of 2 ( 3*sqrt(2) ) which is about 4.24.

This same formula is used to calculate the distance between any two cells: if the coordinates of the first cell are x1, y1 and those of the 2nd cell are x2, y2, then the distance between those two cells (if the straight distance between to orthogonally neighboring cells is 3) will be:

distanceBetweenCells = sqrt( ( 3*(x1-x2) )^2 + ( 3*(y1-y2) )^2 )

So, one of the notions of A* is the distance between cells. The calculations shown above, using the Pythagorean theorem are but one way to calculate said distance. Other formulas can be used as well, as we’ll see later on.

Now, the next notions will all be described at the same time, because they are pretty tied up to each-others.

For a given cell, A* will analyze its neighbors to determine the best path between 2 points. It will start with the neighbors of the first cell (the one containing the start position). Each cell in this setup can have at most 8 neighbor cells. There can be less than 8 if the cell is placed at some extreme of the analyzed area (like a corner, of one of the side rows/columns) or if any of the neighbors contain impassable obstacles (in which case they are ignored for the path analysis).

So the A* algorithm advances by getting all the available neighbors of a cell, calculating certain metrics for each of them, and then using those metric to decide which amongst them is the next cell to be analyzed, at which point this cell’s neighbors are gathered, their metrics are calculated, and so on and so forth, until either one of the analyzed cells is the one containing the destination (at which point we’ve successfully obtained the shortest path) or we run out of cells to analyze (meaning that the destination cell is unreachable, i.e. : it may be completely surrounded by impassable terrain, etc.). So another two A* specific notions are those of neighbors of a given cell, and metrics calculated for each neighbor cell.

Lets detail a bit the metrics. This is where A* starts to become less intuitive. Not to worry, we’ll soon see how these metrics (and all the other notions here) are actually used by doing a step-by-step description of an actual path obtained using A* calculations. So once A* gathers the neighbors of a cell, what it does is: calculate the cost (distance walked) from the start cell to this neighbor cell, when going through the currently analyzed cell (aka, when going through the cell who’s neighbor’s metrics we’re calculating right now). This score is called “g” and, once calculated, it is attached to the current neighbor cell along with the cell through which we’ve passed to get to this neighbor cell. This score doesn’t have a meaning without the parent cell, because it may be possible to get a better such score if going through a different (parent) cell. Each time such a pair of “g” score and parent is obtained, we’re also checking a list of other possible cells to see if this neighbor cell doesn’t already have a better “g” score with a different parent, aka, if there’s not a shorter path from the start cell to it. This list is called the “open list” because it is open to adding/removing of cells during the execution of the A* algorithm.

Another metric that’s calculated is a “guesstimation” of the distance from the current (neighbor) cell to the destination cell. This score is obtained by simply applying whatever distance calculation we’ve decided to use to the current and destination cells (and ignoring any possible obstacles that may exist between them). This value is called the “h” score (for Heuristics). Once this has been calculated for all neighbors, it contributes to deciding which cell we’re gonna analyze next: the “g” and “h” scores are summed up, the value is assigned to what’s called the “f” score. Once all neighbors have their “f” scores calculated, the one with lowest such “f” score value will be the next analyzed cell.

We’ve mentioned the notion of “open list” above. This list contains cells for which the metrics (g,h,f scores and parent) have been calculated, but which have not yet been themselves used (analyzed). By this we mean that they haven’t yet went through the process of getting their neighbors, etc etc. Once the metrics for a cell are calculated, it’s the value of the “g” score that decides whether or not it will be added to the open list for future analysis: if the cell is not already present on the open list, we simply add it. If it is already present on the open list, we check the g score. If it’s lower than our current g score for that cell, we leave it there. If however our newly calculated g score is lower, we replace the cell in the open list. What this translates to is that the open list should only contain cells who’s parents make for the shortest distance between it and the start cell. That’s because, when (if) the destination cell is reached, the actual path to it is reconstituted by going backwards to its parent cell, and its parent’s parent cell and so on until we get to the start cell.

Once a cell has been analyzed (namely, its neighbors have been obtained, etc etc.) the cell is placed in what’s called the “closed list”. This is simply so that if we stumble upon this cell again, we know to skip it from analysis (as, once one analysis on it has been performed, doing another one will not yield new information).

To recap, the last lot of A* notions presented above are:

– open list: used to hold cells which still need to be analyzed, aka cells who can potentially be part of the shortest path
– closed list: cells who have already been analyzed (And that’s all! The closed list does NOT contain the path cells exclusively. The cell’s parent, and parent’s parent, etc. are used to actually obtain the shortest path).
– g score: distance from start cell to current cell, when going through its parent. Used to determine if this cell, with its current parent deserves to be added to the open list for further analysis or if it should simply be disregarded (as a shorter path from start to it already exists).
– h score: estimation of current’s cell distance (cost) to destination cell. Used in conjunction with the g score to decide what cell to analyze next, aka, what cell is most promising to be the next cell is the shortest path.
– f score: see h score above. The sum of g and h scores.
– parent cell. The g score refers to the cost of getting from start cell to current cell VIA A CERTAIN PATH, aka via a certain set of cells. This set of cells is denoted by the parent cell (which in turn has a parent cell as well and so on and so forth up to when the parent is the start cell).

A step-by-step example

The A* algorithms is not very complex. Some of the things it does are however (perhaps…) a little counter-intuitive. Rather than doing a whole theory analysis of it, we’ll just look at a step-by-step proceedings of how A* actually performs the iterations when calculating a simple path:

As we said, A* “splits” the area into discrete cells. For this example, the cells are square. This leads to inaccuracies into modeling the exact terrain (maybe there’s some round cornered walls there, etc.). The inaccuracies can be diminished by splitting the area into smaller and smaller cells, but this will also increase computation time for the path. The number of cells in which the area is split (and which are then used to approximate the shape of the terrain) must be chosen so as to have a balance between how closely it resembles what the player sees as obstacles, and the time it takes to calculate the path(s). For our example, the cells are perfect squares. Also, although the actual area is 20×20 cells, for the path in this particular example, the calculations will not go outside a 5×5 cells area.

Let’s take a closer look at that 5×5 cells area. We’re gonna assign numbers to the rows and columns of cells that constitute it, going from top to bottom and from left to right:

Now, let’s take a look at the core code of the A* JavaScript implementation. This is the “getPath” function of the “AStar” object, which is declared in the AStar.js file:

var openList = new NodeOpenList(area.getWidth(),area.getHeight());
var closedList = new NodeList(area.getWidth(),area.getHeight());

var g = distanceCalculator.getDistance(new Node(destinationX,destinationY),new Node(startX,startY));
var finish = new Node(destinationX,destinationY,g);

var startNode = new Node(startX,startY,0,distanceCalculator.getDistance(new Node(startX,startY),finish));

openList.add(startNode);

var currentNode = null;	

var arrivedAtDestination = false;
while(openList.size!=0) {
	currentNode = openList.getLowestFNode();
				
	if(finish.equals(currentNode)) {
		arrivedAtDestination = true;
		break;
	}
	
	openList.remove(currentNode.x,currentNode.y);
	closedList.add(currentNode);
	
	this.processSucessors(currentNode, openList, closedList, finish, area, somewhatPassableGAddendum);
}

The above contains the high level operation of A*:

When the calculation of the shortest path between two points is demanded (by calling the “getPath” function with the necessary arguments, start position, destination position, terrain description, etc) what happens is

– the open and closed lists are created. They are initially empty.
– the Node object for the start and destination positions are created. The start node is added to the open list
– a while loop starts. This loop advances the path calculations, by getting the next nodes and analyzing them.
– the node with the lowest “f” score is extracted (removed) from the open list (on the very first step, this is the start node, since it’s the only one in the open list)
– if the node has the coordinates of the destination location, then we’re done looping and the shortest path has been obtained
– if not, we process it (via the “processSuccessor” function, which we’ll look at in a minute) and then add it to the closed list, so we know not to process it again. This processing may add new nodes to the open list (if any are available)
– this goes on until we’re out of nodes in the open list, at which point we can declare that the destination is unreachable.

Now let’s see what processing a node actually entails:

AStar.prototype.processSucessors = function(currentNode, openList, closedList, finish, area, somewhatPassableGAddendum) {
	//Get all neighbours of analyzed node, that DO NOT contain impassable terrain:
	var neighbours = this.distanceCalculator.getNeighbours(currentNode, area);
	
	for(var i = 0; i < neighbours.length; i++) {
		var neighbour = neighbours[i];
		
		//if neighbour node is already IN THE CLOSED LIST, skip it:
        var closedNode = closedList.searchByPosition(neighbour.x, neighbour.y);
        if (closedNode != null) {
        	continue;
        }
		
		//(...)
		//calculate the cost of getting to the neighbour THROUGH the currently analyzed node:
		var newG = currentNode.g + this.distanceCalculator.getDistance(currentNode, neighbour);
		
		//(...)
		
		//if the neighbour is alerady IN THE OPEN LIST *AND* IT'S THERE WITH A LOWER G COST, skip it
		//as this means that this neighbour is already present in some other path (has some other parent) which is better than
		//what we're investigating rigth now:
		var openNode = openList.searchByPosition(neighbour.x, neighbour.y);
        if (openNode != null && openNode.g <= newG) {
        	continue;
        }

        //if we're here, that means that this neighbour represents a node worth investigating as a potential member of the best path
		
		//if this neighbour is present in the open list, but with a worse (higher) g score, then remove it from the opened list
		//this means that this neighbour has made it to the open list already, but with a parent which constitutes for a path which is
		//longer (costlier) than if it were to go through the currently analysed node (have it as parent)
        if (openNode != null /* implicit: && openNode.g <= newG */) {
            openList.remove(neighbour.x,neighbour.y);
        }
        
		/* 
		 * at this point we know that this neighbour is:
		 * - not on the closed list
		 * - is walkable (does not contain impassable terrain)
		 * - either
		 *    - not on the open list
		 *    - on the open list, but with a g cost higher than if it were to pass through the currently analysed node 
		 *    (aka: if we replace it's current parent with the currently analyzed node, it will make for a less costly (shorter) path
		 * 
		 * Set it's g and h scores, set the currently analyzed node as its parent, and add it to the opened list:		 * 
		 */        
        neighbour.g = newG;
        neighbour.h = this.distanceCalculator.getDistance(neighbour, finish);
        neighbour.setParent(currentNode);
        //(...)
        openList.add(neighbour);        
	}
};

Let’s now go through the calculation of the path, using some visual aids:

– Initialisation

At the beginning of the “getPath” function, before getting into the loop, we initialize our two empty lists. We then create a node for the start coordinates, and add it to the open list. So our two lists look like this:

open list:
[x:0 y:2 g:0.00 h:12.00 f:12.00 parent:n/a]
closed list:
-empty-

– Iteration 1

The first iteration of the “while” loop in the “getPath” function is pretty straight forward: there’s only one node in the open list (the start node), so it will be extracted as per the “get the node with the lowest f score from the open list” rule. It’s added to the closed list. We’ll create neighbor nodes from all of its surrounding cells. Metrics are calculated for each of these neighbors (g, h and f scores, they also get the current node as their parent):

At the end of this iteration, the two lists look like this:

OpenList:
[x:0 y:1 g:3.00 h:12.37 f:15.37 parent:(x:0 y:2)]
[x:0 y:3 g:3.00 h:12.37 f:15.37 parent:(x:0 y:2)]
[x:1 y:1 g:4.24 h:9.49 f:13.73 parent:(x:0 y:2)]
[x:1 y:2 g:3.00 h:9.00 f:12.00 parent:(x:0 y:2)]
[x:1 y:3 g:4.24 h:9.49 f:13.73 parent:(x:0 y:2)]

ClosedList:
[x:0 y:2 g:0.00 h:12.00 f:12.00 parent:n/a]

The open list contains all of the start node’s neighbors, and the start node itself is the only element in the closed list.

– Iteration 2

The node with the lowest f score currently in the open list is the one at x=1, y=2. We remove it from the open list and add it to the closed list.

We then proceed to get all of its neighbor cells and create nodes for each of them, then calculate their metrics. It’s important to note that:

– the neighbor cells on its right (those at (2, 1), (2, 2) and (2, 3)) are ignored because they contain impassable terrain. They can’t ever take part in the shortest path.
– other than that, all other surrounding cells are taken, nodes are created from them and the g score is calculated for each, ignoring whether any of these cells have already went through the process in any prior steps. For this particular case, all these cells HAVE in fact been analyzed in the prior iteration as well. However back then we looked as them from the perspective of the start node at (0, 2), while now we are re-analyzing them from the perspective of the current node at (1, 2) to see if going through this node doesn’t make us reach any of these cells faster.

Now, none of the neighbor nodes of the current node are good candidates for the shortest path (when getting to them through the current node) because either:

– they’re already on the closed list. This is the case of the neighbor node at (0, 2). It has already been analyzed in the previous iteration.
– they’re already on the open list with better g scores than those obtained when getting to them through the current node. This is the case for all the other remaining neighbor nodes. We’re gonna take a look at what this means for only on of them, the one at (1, 1).

The current node’s g score is 3. That means that that’s how costly it is to get to it from the start node. In order to obtain the g score for its neighbor at (1, 1) when going there through the current node, we calculate the distance from the current node to the (1, 1) node, and then add it to the current node’s g score, and this value is the g score for the (1, 1) node. It’s 3 + 3 = 6. We then check the open list to see if the (1, 1) node isn’t already there. We find it there (from the previous iteration). That (1, 1) node already present in the open list has a g score of 4.24, and if we check it’s parent, we can see that it’s the node at (0, 2) – the start node. This means that it’s easier (less costly) to reach the (1, 1) node via the (0, 2) node, not via the (1, 2) node and it’s this more optimal version of the node that we keep (leave as-is in the open list). The same reason is analogously applied to the other remaining neighbors, making it so none of them are (re) added to the open list.

At the end if this iteration the lists look like this:

OpenList:
[x:0 y:1 g:3.00 h:12.37 f:15.37 parent:(x:0 y:2)]
[x:0 y:3 g:3.00 h:12.37 f:15.37 parent:(x:0 y:2)]
[x:1 y:1 g:4.24 h:9.49 f:13.73 parent:(x:0 y:2)]
[x:1 y:3 g:4.24 h:9.49 f:13.73 parent:(x:0 y:2)]

ClosedList:
[x:0 y:2 g:0.00 h:12.00 f:12.00 parent:n/a]
[x:1 y:2 g:3.00 h:9.00 f:12.00 parent:(x:0 y:2)]

The only thing that’s changed from the previous iteration is that the (1, 2) node has been analyzed, removed from the open list and placed on the closed list.

– Iteration 3

This iteration is a mix of the 2 previous ones: we’ll stumble upon some never-before-explored nodes (and immediately add them to the open list) as well as re-investigate some that are already on the open list (and find none better then what was there already):

The lowest f nodes currently on the open list are (1, 1) and (1, 3) each with an f score of 13.73… . In this situation, we can chose any of them as the next node to investigate. The way the search for the first lowest f node is implemented, the (1, 1) node will be selected. We remove it from the open list and add it to the closed list.

We get all of its neighbors, save those 2 that have impassable terrain on them. Form the 7 remaining neighbors, the 3 ones at the top are all previously unexplored, so we calculate their metrics and add them to the open list. Out of the remaining neighbors, after calculating their current g score, we notice that they all have better counterparts (with lower g score) already present on the open list, so we ignore them (in their present incarnation) or they’re on the closed list and we ignore them from the start.

At the end of the iteration, the lists look like this:

OpenList:
[x:0 y:0 g:8.49 h:13.42 f:21.90 parent:(x:1 y:1)]
[x:0 y:1 g:3.00 h:12.37 f:15.37 parent:(x:0 y:2)]
[x:0 y:3 g:3.00 h:12.37 f:15.37 parent:(x:0 y:2)]
[x:1 y:0 g:7.24 h:10.82 f:18.06 parent:(x:1 y:1)]
[x:1 y:3 g:4.24 h:9.49 f:13.73 parent:(x:0 y:2)]
[x:2 y:0 g:8.49 h:8.49 f:16.97 parent:(x:1 y:1)]

ClosedList:
[x:0 y:2 g:0.00 h:12.00 f:12.00 parent:n/a]
[x:1 y:1 g:4.24 h:9.49 f:13.73 parent:(x:0 y:2)]
[x:1 y:2 g:3.00 h:9.00 f:12.00 parent:(x:0 y:2)]

– Iteration 4

This time there’s only one node for the lowest f score, the one at (1, 3) with an f score of 13.73… :

It’s removed from the open list and added to the closed list. After wards, the treatment of it’s neighbors is pretty similar to those of the node from the previous iteration so we won’t detail it.

At the end of the iteration, the lists look like this:

OpenList:
[x:0 y:0 g:8.49 h:13.42 f:21.90 parent:(x:1 y:1)]
[x:0 y:1 g:3.00 h:12.37 f:15.37 parent:(x:0 y:2)]
[x:0 y:3 g:3.00 h:12.37 f:15.37 parent:(x:0 y:2)]
[x:0 y:4 g:8.49 h:13.42 f:21.90 parent:(x:1 y:3)]
[x:1 y:0 g:7.24 h:10.82 f:18.06 parent:(x:1 y:1)]
[x:1 y:4 g:7.24 h:10.82 f:18.06 parent:(x:1 y:3)]
[x:2 y:0 g:8.49 h:8.49 f:16.97 parent:(x:1 y:1)]
ClosedList:
[x:0 y:2 g:0.00 h:12.00 f:12.00 parent:n/a]
[x:1 y:1 g:4.24 h:9.49 f:13.73 parent:(x:0 y:2)]
[x:1 y:2 g:3.00 h:9.00 f:12.00 parent:(x:0 y:2)]
[x:1 y:3 g:4.24 h:9.49 f:13.73 parent:(x:0 y:2)]

– Iteration 5

This iteration has as lowest f node the one at (0, 1). The node seems promising because getting to it from the start node is very cheap (the g cost is 3, it’s right under the start node). However all of its neighbors are proven undesirable after their metrics are calculated: either they’re on the closed list, or their g score are worse than previous incarnations of them. There’s is one unexplored node at (0, 0) which will be added to the open list. However, we’ll see from the next iterations that we’ll never get back to the (0, 0) node, as its large f score (mostly due to its large h score) makes it uninteresting. There are more promising nodes which don’t stray that far from the start node while at the same time keeping away from the destination node as well as this node does.

At the end of the iteration, the lists look like this:

OpenList:
[x:0 y:0 g:6.00 h:13.42 f:19.42 parent:(x:0 y:1)]
[x:0 y:3 g:3.00 h:12.37 f:15.37 parent:(x:0 y:2)]
[x:0 y:4 g:8.49 h:13.42 f:21.90 parent:(x:1 y:3)]
[x:1 y:0 g:7.24 h:10.82 f:18.06 parent:(x:1 y:1)]
[x:1 y:4 g:7.24 h:10.82 f:18.06 parent:(x:1 y:3)]
[x:2 y:0 g:8.49 h:8.49 f:16.97 parent:(x:1 y:1)]
ClosedList:
[x:0 y:1 g:3.00 h:12.37 f:15.37 parent:(x:0 y:2)]
[x:0 y:2 g:0.00 h:12.00 f:12.00 parent:n/a]
[x:1 y:1 g:4.24 h:9.49 f:13.73 parent:(x:0 y:2)]
[x:1 y:2 g:3.00 h:9.00 f:12.00 parent:(x:0 y:2)]
[x:1 y:3 g:4.24 h:9.49 f:13.73 parent:(x:0 y:2)]

– Iteration 6

The lowest f node for this iteration is (0, 3). It’s a mirrored version of the node treated in the previous iteration, with a similar outcome, so we won’t detail it any further:

At the end of the iteration, the lists look like this:

OpenList:
[x:0 y:0 g:6.00 h:13.42 f:19.42 parent:(x:0 y:1)]
[x:0 y:4 g:6.00 h:13.42 f:19.42 parent:(x:0 y:3)]
[x:1 y:0 g:7.24 h:10.82 f:18.06 parent:(x:1 y:1)]
[x:1 y:4 g:7.24 h:10.82 f:18.06 parent:(x:1 y:3)]
[x:2 y:0 g:8.49 h:8.49 f:16.97 parent:(x:1 y:1)]
ClosedList:
[x:0 y:1 g:3.00 h:12.37 f:15.37 parent:(x:0 y:2)]
[x:0 y:2 g:0.00 h:12.00 f:12.00 parent:n/a]
[x:1 y:1 g:4.24 h:9.49 f:13.73 parent:(x:0 y:2)]
[x:1 y:2 g:3.00 h:9.00 f:12.00 parent:(x:0 y:2)]
[x:1 y:3 g:4.24 h:9.49 f:13.73 parent:(x:0 y:2)]
[x:0 y:3 g:3.00 h:12.37 f:15.37 parent:(x:0 y:2)]

– Iteration 7

This iteration selects the node at (2, 0) as it’s the one with the lowest f score on the open list right now: f=16.97 :

At the end of this iteration, the lists look like this:

OpenList:
[x:0 y:0 g:6.00 h:13.42 f:19.42 parent:(x:0 y:1)]
[x:0 y:4 g:6.00 h:13.42 f:19.42 parent:(x:0 y:3)]
[x:1 y:0 g:7.24 h:10.82 f:18.06 parent:(x:1 y:1)]
[x:1 y:4 g:7.24 h:10.82 f:18.06 parent:(x:1 y:3)]
[x:3 y:0 g:11.49 h:6.71 f:18.19 parent:(x:2 y:0)]
[x:3 y:1 g:12.73 h:4.24 f:16.97 parent:(x:2 y:0)]
ClosedList:
[x:0 y:1 g:3.00 h:12.37 f:15.37 parent:(x:0 y:2)]
[x:0 y:2 g:0.00 h:12.00 f:12.00 parent:n/a]
[x:1 y:1 g:4.24 h:9.49 f:13.73 parent:(x:0 y:2)]
[x:1 y:2 g:3.00 h:9.00 f:12.00 parent:(x:0 y:2)]
[x:1 y:3 g:4.24 h:9.49 f:13.73 parent:(x:0 y:2)]
[x:0 y:3 g:3.00 h:12.37 f:15.37 parent:(x:0 y:2)]
[x:2 y:0 g:8.49 h:8.49 f:16.97 parent:(x:1 y:1)]

– Iteration 8

At the end of this iteration, the lists are:

ClosedList:
[x:0 y:1 g:3.00 h:12.37 f:15.37 parent:(x:0 y:2)]
[x:0 y:2 g:0.00 h:12.00 f:12.00 parent:n/a]
[x:0 y:3 g:3.00 h:12.37 f:15.37 parent:(x:0 y:2)]
[x:1 y:1 g:4.24 h:9.49 f:13.73 parent:(x:0 y:2)]
[x:1 y:2 g:3.00 h:9.00 f:12.00 parent:(x:0 y:2)]
[x:1 y:3 g:4.24 h:9.49 f:13.73 parent:(x:0 y:2)]
[x:2 y:0 g:8.49 h:8.49 f:16.97 parent:(x:1 y:1)]
[x:3 y:1 g:12.73 h:4.24 f:16.97 parent:(x:2 y:0)]

– Iteration 9

When the loop begins for this iteration, it will find the destination node (4, 2) in the open list. This signals the fact the the path-finding part of the run has ended and that the shortest path to the destination exists.

– Backtracking to get path

In order to actually construct the ordered sequence of nodes that represent the shortest path, going from start to end node, what we need to do is backtrack from the destination node, up to its parent and then THAT parent’s parent and so on. Doing this will eventually make us stumble on to the start node (which has a null parent) at which point we stop. The path is then the reverse order of the above described backtrack:

Note that the end node from where the backtracking begins is the destination node as it has been added to the open list during our iterations.

Considerations on the implementations (it rhymes!)

The 3 A* implementations presented here: using JavaScript, Java and C, have as main goal to clearly illustrate how to implement the A* algorithms in various languages. They are not made to be used in professional path-finding libraries/game engines. What this means is that, in order to keep the code clear and easy to follow, its performance (optimization) is less than great. While the implementations are perfectly usable for scenarios involving relatively small areas and/or units, they will quickly bog down for large enough simulations (They’d work fine for some turn based game where only one unit moves at a time, or some real time game with just a handful of units like a 6 members party in an RPG, but will not work fast for something like Starcraft).

Here are the main areas in the code that suffer from this:

– The fact that we’re using separate object instances (structs in C’s case) for each cell of the area (the Node objects). This can be somewhat optimized by replacing them with a 2-dimensional (or even pseudo-2-dimensional stored as 1-dimensional) array having a “column” for each of the fields “g”, “h”, “f”, “parent”, etc. and a “line” for each node instance. Arguably, addressing array elements is cheaper than calling methods/functions on object instances.

– The way in which the OpenNodeList retrieves the Node with the lowest “f” score. Here’s a snippet:

/**
 * @private
 */
NodeOpenList.prototype.calculateLowestFNode = function() {
	if(this.nodePositionHash.length == 0) {
		return null;
	}
	
	var lowestFNode = null; 
	for(var x=0; x<this.nodePositionHash.length; x++) {
		var column = this.nodePositionHash[x];
		if(column!=null) {
			for(var y=0; y<column.length; y++) {
				var n = column[y];
				if(n!=null) {
					if(lowestFNode==null) {
						lowestFNode=n;
					} else {
						if(n.getF()<lowestFNode.getF()) {
							lowestFNode = n;
						}
					}
				}
			}
		}
	}
	return lowestFNode;
};

Yep, sadly all that does is simply iterate through all the nodes, and then return the one with the lowest “f” score. It’s used in conjunction with a a bit of caching (once calculated, a lowest “f” score node will be kept in a global field until the list is modified, at which point it will be recalculated), but still, all in all, it’s not the best way to do this. A better way would be to use a binary heap as described here.

– Also, for calculating the g score for the neighbors of a given cell: normally there’s only two possible values for this: one for the neighbors immediately to the left, right, up or down of the current cell. For these, the distance is the number we’ve chosen as the distance between 2 non-diagonal cells, namely 3. For the other 4 diagonal neighbors, Pythagorean theorem tells us that the distance is 3*sqrt(2) which is roughly 4.24. In order to underline that this is how these distances are calculated, the code still does the actual math each time to calculate these (using the DistanceCalculator object). This can be optimized by using 2 constants for the two possible values.

– The code uses doubles (Double-precision floating-point format) for the decimal values. This can be replaced with floats for a boost in performance, keeping in mind that for some scenarios this may also lead to not finding the shortest path between 2 points, but only a path.

– Even for other (between non-adjacent cells) distance calculations, the Pythagorean theorem can be replaced with some other means of approximating distance, which are less resource demanding (the Pythagorean theorem uses power and square root to do its thing) like the Manhattan Distance. Again, for certain scenarios this may lead to not getting the shortest possible path (still, if it exists, a path will be found). The code for distance calculations is abstracted in the “DistanceCalculator” object/class, which is passed as argument when computing a path so as to easily allow for new implementations of the distance calculation formula.

Finally, something the A* algorithm doesn’t do well at all is calculating the path to the closest reachable point of an unreachable destination. In order to compensate for this, when a path is calculated, we keep track of the node which is closest to the destination node. If that destination node proves to be unreachable, we return the path to the closest node. This was done as per the answer to this question.

The Code Examples

The A* algorithm described here has been implemented in 3 different languages (platforms): JavaScript, Java and C. The implementations are equivalent (with small differences accounting for each platform’s various idiosyncrasies). Each implementation only uses the default libraries that come with each respective platform (no extra 3rd party libraries are added).

The JavaScript code can be git-cloned from here:

https://gitlab.com/shivansite/jsAStar.git

It uses prototype-based objects. The actual A* algorithm code can be found in the “WebContent/js/astar” folder. The other folders contain the code for the small web example.

The Java code can be git-cloned from here:

https://gitlab.com/shivansite/jAStar.git

It uses class-based objects, and a few of the Java Lists for certain variables (which are declared as arrays in the JS and C implementations). Other than that, the structure of the code is similar to the other implementations.

The C code can be git-cloned from here:

https://gitlab.com/shivansite/cAStar.git

For C, the code is organized as follows: for each of the previously declared classes/prototypes (Node, Area, etc.) a C-style struct has been declared. The corresponding functions are still present here as well, the only difference is that instead of having them declared on their respective objects, they accept a pointer to the corresponding struct as argument. Otherwise, the overall organization of the code is similar to the one in the other implementations.

12 Responses to The Basics of A Pathfinding – part 1*

Pingback: The Basics of A* pathfinding – part 2 | Shivan's Blog
rockmarc says:

January 2, 2014 at 23:46

Very interesting post. Thanks. However A* doesn’t find always the shortest path, it finds an estimation of the shortest path (with low computation). Dijkstra always finds the shortest path because it computes every path possible (at a high computation cost).

- shivand says:
  
  January 3, 2014 at 23:02
  
  Thanks for the comment. I may be mistaken, but I was sure A* does indeed find the shortest path, meaning that for a given cost of stepping from one node (cell) to the other, A* will find the lowest cost path between two given cells. Is there some special context in which A* finds a path, but not the shortest one?
  
  - rockmarc says:
    
    January 4, 2014 at 02:21
    
    I have to find my examples again but basically it happens because A* stops when he found a path from start to goal (whether the path is the shortest or not) whereas Dijkstra stops when it compared every path possible and returns the shortest.
    For A*, it also depends if your way to compute your heuristic is good or not.
    
    I know A* had some trouble in a Constrained Delaunay Triangulation (so not a grid) in which the triangles don’t have the same size or shape and are constrained by the obstacles. Maybe it’s more efficient in a grid with tiles of the same size.
rockmarc says:

January 4, 2014 at 02:23

Ah yeah, for example this could happen if your heuristic is wrong:
http://i792.photobucket.com/albums/yy204/axaviar/Example_zpsfebe4af0.jpg?t=1379902085

- shivand says:
  
  January 4, 2014 at 12:02
  
  Totally agree with you. However, since I only discuss A* in the limited context of games path-finding, where the area is split in square cells and the Heuristic is done using Pythagoras, this will not happen. I think the rule is: if your heuristic estimation gives a distance shorter than any actual (real) path, then you’re good. If the heuristics can sometimes estimate a distance to be longer than the actual path to it is, then you have to redo that heuristic (aka, you have no closed list, you need to also check the h score (as you do the g score) each time you reach a cell with an already calculated h score).
  
VBA4all says:

August 18, 2014 at 10:08

Thanks for this article it’s really good and helpful!

One question though:

– how did you arrive at h = 9.49 for the cell (1,1)? What formula/method did you use to calculate the heuristics?

- VBA4all says:
  
  August 18, 2014 at 10:11
  
  Oh hold on, did you use :
  
  =SQRT((1-4)^2+(1-2)^2)*3
  
  = 9.4868….
  
  - VBA4all says:
    
    August 18, 2014 at 10:15
    
    Since you were rounding other results to 2 decimal places I thought the (1,1) should hav ebeen 9.49 but it’s so minor nearly unnoticeable ;)
shivand says:

August 19, 2014 at 09:21

The code doesn’t do any extra rounding. Also the precision is the default Javascript number precision.

However, in the screenshots for this post I did do a manual rounding of the values so that they’d all fit in the tiny cell shown in the image. It’s just a manual 2-decimals, half up rounding done for the sake of the screenshots.

- VBA4all says:
  
  August 19, 2014 at 09:41
  
  =SQRT((1-4)^2+(1-2)^2)*3 = 9.4868
  
  rounding up to 2 decimal places should be 9.49 but the screen shot says 9.48 so no offence but your manual rounding is off.
  
  Anyway, great article I really enjoyed reading it :)
  
  - shivand says:
    
    August 19, 2014 at 09:49
    
    I’m sorry, but I can’t find the place where the 9.48 is. I’ve re-looked at the screen-shots, everywhere the (1, 1) cell has a h score shown, its value is 9.49. Fair warning, I haven’t drank any coffee yet, so it might be because of that that I’m missing it :)