Weighted union and find operations in disjoint subsets data structure

Dikshant Dwivedi
5 min readMar 24, 2021

For the implementation of Kruskal’s algorithm, a minimum spanning tree is built adding one edge at a time. Along the way, we need to keep track of the connected components and also detect cycles. For achieving this, it is essential to understand disjoint set data structure and its find-union operation.

Disjoint subset data structure visual representation

On my YouTube channel, I have thoroughly explained what a disjoint subset data structure is, how it is represented visually and using array and also briefed through weighted-union and find operation. If the readers wish to have a thorough understanding, I encourage them to go through my video embedded at the end of the article. In this article, I will try to provide a more detailed implementation of weighted-union and find operation. I will also analyze the time complexity of each of these algorithms.

Let’s start by briefly explaining what a union operation is: In mathematics, union between a collection of sets is a set having all the elements of those sets.

Similar is the case in disjoint data structure. Union operation in disjoint data structure is known as weight union. The only question we need to answer is this:

When performing union in disjoint set data structure, which element will be the head/representative/parent node of the resulting disjoint set?

In the above example, suppose 2 is the head of subset A, and 6 is the head of subset B. Upon performing union both 2 and 6 are going to be present in the resultant subset, then out of the two which element will the head now? Here’s the answer:

When performing union in disjoint set data structure, the head of the disjoint set with more weight out of the two will become the final head of the result disjoint set.

By doing that final height of the tree is shorter helping in efficient implementation. So, for our example, since subset A has more weight than subset B, the head of the resultant set will be the head of subset A, here 2. This is the final set here:

The crown on the top of the elements refer to the head of a particular subset.

Let’s see how the union of these two subsets looks like visually:

Visual representation of union operation between two disjoint subsets

The two trees in the left represent subsets A and B which the right most tree represents the subset obtained after performing the union. When the union is performed, both the trees are merged. But since there can only be one head of the subset, the head node of tree with more weight, here node 2 in subset A becomes the head node of the final tree while the head node of the tree with lesser weight, here node 6 in subset B becomes the child node of the new head. Finally let’s look at array representation of this operation.

Array representation of union operation.

Since element 2 is now the parent node with weight 7, we have -7 at index 2. And since element 6 is now the child of element 2 along with elements 3,4 and 5, we have value 2 in the array. The interesting thing to note here is that for node 7 and 8, the value in array is still only positive meaning that the value is not representing the head node. What this represents is this:

Elements 7 and 8 have element 6 as parent, while element 6 has element 2 as a parent which is also its head, so the head of elements 7 and 8 is also element 2, but not the immediate parent.

From here, we have learnt that value in array for the non head element is of the immediate parent and not the head and just now what we did for 7 and 8 is search its head, also known as the find operation.

Find operation in a disjoint set data structure is used to find the representative or the head element of the subset. Let’s look at how we can perform the find operation for element 7.

For node 7, the value at index 7 is 6 representing the parent element 6 because it positive. In the tree representation too, it can be seen that node 7 traces to the parent node 6.

So, now we jump to index 6 where the value is 2, representing the parent element 2.

Then again at index 2, the value is negative representing the head. So, the final answer is 2.

Moving forward, we will now look at the algorithm of find operation. Basically, the algorithm involves finding the parent until the parent is itself the head of the subset.

If the height of the tree is very high, this find algorithm can have a high complexity. In the worst case, one might have to visit all the elements of the tree to find the head element which is not ideal.

Worst case: for finding the head of element 7 in the above tree, all the nodes need to be visited.

A better and ideal algorithm will be the collapsing find operation. for trees with skewed heights like in the above case, the algorithm only goes through the worst case scenario only once and optimize by complexity by collapsing the nodes to single head like this:

This algorithm can significantly improve the complexity of algorithm. The main idea of collapsing find is: if the immediate parent of an element isn't the head of the subset, then find the subset and collapse the element by setting its immediate parent to the head of the subset.

Finally, let’s look at the weighted union algorithm with collapsing find operation

For a better understanding of disjoint subset data structure, check out this video:

--

--