Kruskal's algorithm for generating perfect mazes

Kruskal’s algorithm is a greedy algorithm used to find a minimum spanning tree (MST) in a connected, weighted, and undirected graph. In the context of maze generation, it is used to create a structure where each cell is connected to others without cycles or unreachable regions. The result is a so-called “perfect maze” in which there is only one path from any point to any other.

Problem statement:

It is necessary to develop an algorithm for generating a labyrinth that satisfies the following conditions:

The labyrinth does not contain cycles;
Each area of the maze is accessible from any other;
There is exactly one path between every pair of points.

How the algorithm works:

Any algorithm for constructing an MST boils down to checking whether adding an edge creates a cycle. To do this, the Union-Find data structure is used . This algorithm allows you to efficiently track which vertices belong to the same set and merge them without creating cycles.

Kruskal’s algorithm is performed as follows:

All possible connections between cells of the maze are listed and given random weights.
All edges are sorted by weight in non-decreasing order.
Construction of spanning tree:
- The edge with the smallest weight is selected;
- It is checked whether it connects two different connected components (using Union-Find);
- If an edge does not create a cycle, it is added to the MST.
Iteration: The steps continue until (V - 1) edges have been added, where V is the number of vertices in the graph.

Step-by-step analysis of the example:

How does this algorithm work visually? Let’s assume we have a weighted, connected, undirected graph consisting of 6 vertices. Let’s write its edges in sorted order:

1 <—> 3 weight 1

3 <—> 4 weight 1

3 <—> 6 weight 2

1 <—> 2 weight 3

5 <—> 6 weight 4

1 <—> 6 weight 5

2 <—> 3 weight 5

4 <—> 5 weight 8

4 <—> 6 weight 9

Now we add edges to our tree one by one, of course avoiding the formation of cycles.

We add vertices further, we can notice that, for example, we cannot add an edge between vertices 1 and 6, since then a cycle will be formed. Then, after the second bypass, we get the following result:

As a result, we have the following subgraph. We have connected all the vertices with edges with the minimum possible weights, which means our minimum spanning tree is ready! :)

Then the total weight of the desired MST will be the sum of the weights of all the selected edges, namely 11

Implementation:

1. Basic representation of a labyrinth.

To implement Kruskal on such a problem, one must take into account the features of the matrix representation. When using Kruskal’s algorithm to generate a maze, one usually works with a base grid of cells, where each cell is a potential “room”, and between them there are walls. The original matrix is formed in such a way that each cell of the base grid is located at coordinates 2 × rows_cells + 1, 2 × cols_cells + 1. Between them there are potential passages, and the remaining positions are filled with walls.

2. Construction of a spanning tree.

A spanning tree is a subgraph that connects all nodes of a graph without cycles and with a minimum number of edges. Kruskal’s algorithm starts by forming a list of all edges that connect adjacent cells of the base grid. Each edge is assigned a random weight, and the list is then sorted. Union-Find is a data structure for efficiently tracking unions and membership of elements in different sets. It sequentially merges cells belonging to different connectivity components, and an edge is added only if it does not create a cycle.

3. Transforming the base grid into the final maze.

After constructing the spanning tree, a final maze matrix of size:

result_rows = 2⋅rows_cells − 1, result_cols = 2⋅cols_cells − 1.

In this case:

At positions 2 × rows_cells, 2 × cols_cells the base grid cells are copied.
Between the cells (at positions with odd and even indices) values are set that reflect the presence of passages.
A special case - a single-cell maze - is corrected by removing the extra wall.

4. Adjusting sizes.

Given an input of, say, 4 × 4, the basic logic results in a 3 × 3 maze. To ensure that the resulting maze meets the requirements, helper functions are used:

add_extra_row: Adds a row filled with walls to the end of the matrix, and opens passages in every second column.
add_extra_col: similarly adds a column, adjusting the positions to preserve the cell and wall layout rule.

Advantages and disadvantages

Advantages

Generating a “perfect” maze:

Fact: Kruskal’s algorithm constructs a spanning tree of a given graph. In the context of a maze, the vertices are cells and the edges are potential passages.

Proof: A spanning tree, by definition, does not contain cycles and connects all vertices of the graph. This means that there is exactly one unique path between any two cells. Thus, the generated labyrinth has no redundant passages, and its structure is unambiguous (there are no alternative routes), which is often called a “perfect” labyrinth.

Randomness and uniformity of distribution of passes:

Fact: When generating a maze, Kruskal assigns random weights to the edges, making the generated algorithms visually beautiful and uniform.

Proof: Since all possible edges are sorted by random weights, the probability of choosing any particular edge is the same. Thus, the resulting spanning tree (and hence the maze) is an equally likely choice from all possible spanning trees of the graph. This mathematically guarantees that the maze will be random and unpredictable, which is a desirable property for games and puzzles.

Disadvantages of Kruskal’s algorithm

Limited control over the structural characteristics of the labyrinth:

Fact: Kruskal’s algorithm generates an equiprobable spanning tree, which can sometimes be a disadvantage.

Proof: Mathematically, if you consider all possible spanning trees on a given graph, each is generated with equal probability. This means that there is no easy way to bias the distribution toward mazes with long corridors, many dead ends, or other desirable features without changing the basic algorithm. Some applications require precisely this kind of control over the “nature” of the maze, and an equally probable distribution of spanning trees may not be ideal.

Now we will compare with two more algorithms suitable for this task: Recursive Backtracker and Hunt and Kill

2. Time complexity

Kruskal’s algorithm is worse than Recursive Backtracker and Hunt and Kill in terms of speed, since sorting edges gives an additional logarithmic factor.

Kruskal: O(E Log⁡E), where E is the number of edges (approximately O(RC log⁡(RC)) for an R×C maze). Edge sorting dominates the running time.
Recursive Backtracker: O(RC) - visits all cells in one pass.
Hunt and Kill: O(RC) - also in one go.

3. Memory usage

Kruskal isn’t as memory efficient as Hunt and Kill , but is roughly equal to Recursive Backtracker at worst.

Kruskal: stores the edge list and the array for Union-Find, which requires O(RC) memory.
Recursive Backtracker: uses a recursion call stack, O(RC) in the worst case but less on average.
Hunt and Kill: O(1), as it works without a stack and additional structures.

4. Complexity of implementation

Recursive Backtracker is the easiest to understand and implement, Hunt and Kill is a bit more complex, and Kruskal is the most complex.

Kraskal: requires edge handling, sorting, Union-Find structure. Harder to implement.
Recursive Backtracker: simpler since it is a DFS with recursion.
Hunt and Kill: also quite simple, but less intuitive.