Sets


Sets


September 10, 2023

tutorial interviewing

At this point, you’ve seen techniques for solving interview questions that involve arrays, and you’ve expanded that by using techniques that use Map data structures.

Similar to Map data structures, Set data structures provide functions that run more efficiently than if you were using an array, so they’re very handy for improving the efficiency of your algorithms.

Set

Set data structures cannot contain any duplicates. They provide efficient functions for adding and removing elements, and for checking whether an element is already present.

Here’s an example:

Set<String> animals = new HashSet<>();

animals.add("cats");
animals.add("dogs");
animals.add("cats"); // duplicate! (or should I say dupli-cat)
animals.add("lizards");

System.out.println(animals.size()); // prints 3

// prints "cats" "dogs" "lizards"
for(String animal : animals){
  System.out.println(animal);
}

This code creates a Set and then fills it with animals. The code tries to add "cats" twice, but because sets cannot contain duplicates, it’s only added once.

Set data structures have a few hand functions:

  • add(value)
  • remove(value)
  • contains(value)

Unlike arrays and List data structures, Set data structures cannot be accessed by index. But you can loop over the values using an iterator, or an enhanced for loop like in the example.

Java offers a few implementations of the Set data structure:

  • HashSet uses the hashCode() function to internally store values. This is the most common implementation you’ll use.
  • LinkedHashSet is very similar to HashSet, except it maintains the insertion order of its entries. In other words, if you iterate over the values, they’ll be in the same order as you added them to the set. (With HashSet, the order is basically random.)
  • TreeSet maintains the natural ordering of its values. In other words, if you iterate over the values, they’ll be in alphabetical order for strings and numerical order for numbers. You can also specify a custom Comparator to sort Object values.

Algorithmic Complexity

The power of Set data structures come from the fact that add(value), remove(value), and contains(value) are very efficient. For HashSet and LinkedHashSet, they’re constant time O(1), and for TreeSet, they’re O(log n).

This makes Set data structures, especially HashSet, very handy for improving the efficiency of your algorithms.

Example: Contains Duplicates

Let’s start with an example question that we’ve seen a couple times already: can you take an array of integers, and whether the array contains any duplicates?

You might approach this with a brute force algorithm that uses nested for loops:

boolean containsDuplicate(int[] nums) {

  for (int i = 0; i < nums.length; i++) {
    for (int j = i + 1; j < nums.length; j++) {
      if (nums[i] == nums[j]) {
        return true;
      }
    }
  }

  return false;
}

This code iterates over every possible pair of elements, checks whether the pair contains two duplicate values, and returns true for the first match it finds. The nested for loop gives this an algorithmic complexity of O(n ^ 2).

Can you do better?

Rather than looking ahead in the array, you could look back at values you’ve already seen.

For example, for this array:

{3, 4, 1, 4, 2}

The algorithm that looks back goes something like this:

Index Value Seen So Far Have we seen value?
0 3 {} No, keep iterating
1 4 {3} No, keep iterating
2 1 {3, 4} No, keep iterating
3 4 {3, 4, 1} Yes, return true

Beware Hidden Complexity

To implement that algorithm, you need a way to track which elements you’ve seen so far, along with their indexes.

You might use an ArrayList:

boolean containsDuplicate(int[] nums) {

  List<Integer> seenSoFar = new ArrayList<>();

  for (int i = 0; i < nums.length; i++) {
    if (seenSoFar.contains(nums[i])) {
      return true;
    }

    seenSoFar.add(nums[i]);
  }

  return false;
}

At first glance, this looks like an improvement over the brute force solution. It’s only a single for loop, so that’s linear O(n) algorithmic complexity, right?

However, the code relies on the ArrayList#contains() function, both of which have a linear algorithmic complexity of O(n). And because the code calls this function inside a for loop, that means the algorithmic complexity is still O(n ^ 2).

HashSet to the Rescue

Improving the efficiency of your solution often comes down to choosing a data structure that has properties that help you get rid of the inefficient parts of your solution.

You need an efficient way to check whether a value is in a collection. Luckily, HashSet offers a contains() function that runs in constant O(1) time!

This is a single-line change:

boolean containsDuplicate(int[] nums) {

  Set<Integer> seenSoFar = new HashSet<>();

  for (int i = 0; i < nums.length; i++) {
    if (seenSoFar.contains(nums[i])) {
      return true;
    }

    seenSoFar.add(nums[i]);
  }

  return false;
}

The only change is that seenSoFar is now a HashSet. Although the change is small, this code now runs in linear O(n) time, thanks to the HashSet#contains() function running in constant O(1) time!

More Info

Check out these links for more info:

  • Set - JavaDocs

Practice Questions


Comments

Happy Coding is a community of folks just like you learning about coding.
Do you have a comment or question? Post it here!

Comments are powered by the Happy Coding forum. This page has a corresponding forum post, and replies to that post show up as comments here. Click the button above to go to the forum to post a comment!