At this point, you’ve seen techniques for solving interview questions that involve arrays, and you’ve expanded that by using techniques that use Map
data structures.
Similar to Map
data structures, Set
data structures provide functions that run more efficiently than if you were using an array, so they’re very handy for improving the efficiency of your algorithms.
Set
data structures cannot contain any duplicates. They provide efficient functions for adding and removing elements, and for checking whether an element is already present.
Here’s an example:
Set<String> animals = new HashSet<>();
animals.add("cats");
animals.add("dogs");
animals.add("cats"); // duplicate! (or should I say dupli-cat)
animals.add("lizards");
System.out.println(animals.size()); // prints 3
// prints "cats" "dogs" "lizards"
for(String animal : animals){
System.out.println(animal);
}
This code creates a Set
and then fills it with animals. The code tries to add "cats"
twice, but because sets cannot contain duplicates, it’s only added once.
Set
data structures have a few hand functions:
add(value)
remove(value)
contains(value)
Unlike arrays and List
data structures, Set
data structures cannot be accessed by index. But you can loop over the values using an iterator, or an enhanced for loop like in the example.
Java offers a few implementations of the Set
data structure:
HashSet
uses the hashCode()
function to internally store values. This is the most common implementation you’ll use.LinkedHashSet
is very similar to HashSet
, except it maintains the insertion order of its entries. In other words, if you iterate over the values, they’ll be in the same order as you added them to the set. (With HashSet
, the order is basically random.)TreeSet
maintains the natural ordering of its values. In other words, if you iterate over the values, they’ll be in alphabetical order for strings and numerical order for numbers. You can also specify a custom Comparator
to sort Object
values.The power of Set
data structures come from the fact that add(value)
, remove(value)
, and contains(value)
are very efficient. For HashSet
and LinkedHashSet
, they’re constant time O(1)
, and for TreeSet
, they’re O(log n)
.
This makes Set
data structures, especially HashSet
, very handy for improving the efficiency of your algorithms.
Let’s start with an example question that we’ve seen a couple times already: can you take an array of integers, and whether the array contains any duplicates?
You might approach this with a brute force algorithm that uses nested for loops:
boolean containsDuplicate(int[] nums) {
for (int i = 0; i < nums.length; i++) {
for (int j = i + 1; j < nums.length; j++) {
if (nums[i] == nums[j]) {
return true;
}
}
}
return false;
}
This code iterates over every possible pair of elements, checks whether the pair contains two duplicate values, and returns true for the first match it finds. The nested for loop gives this an algorithmic complexity of O(n ^ 2)
.
Can you do better?
Rather than looking ahead in the array, you could look back at values you’ve already seen.
For example, for this array:
{3, 4, 1, 4, 2}
The algorithm that looks back goes something like this:
Index | Value | Seen So Far | Have we seen value? |
---|---|---|---|
0 |
3 |
{} |
No, keep iterating |
1 |
4 |
{3} |
No, keep iterating |
2 |
1 |
{3, 4} |
No, keep iterating |
3 |
4 |
{3, 4, 1} |
Yes, return true |
To implement that algorithm, you need a way to track which elements you’ve seen so far, along with their indexes.
You might use an ArrayList
:
boolean containsDuplicate(int[] nums) {
List<Integer> seenSoFar = new ArrayList<>();
for (int i = 0; i < nums.length; i++) {
if (seenSoFar.contains(nums[i])) {
return true;
}
seenSoFar.add(nums[i]);
}
return false;
}
At first glance, this looks like an improvement over the brute force solution. It’s only a single for loop, so that’s linear O(n)
algorithmic complexity, right?
However, the code relies on the ArrayList#contains()
function, both of which have a linear algorithmic complexity of O(n)
. And because the code calls this function inside a for loop, that means the algorithmic complexity is still O(n ^ 2)
.
Improving the efficiency of your solution often comes down to choosing a data structure that has properties that help you get rid of the inefficient parts of your solution.
You need an efficient way to check whether a value is in a collection. Luckily, HashSet
offers a contains()
function that runs in constant O(1)
time!
This is a single-line change:
boolean containsDuplicate(int[] nums) {
Set<Integer> seenSoFar = new HashSet<>();
for (int i = 0; i < nums.length; i++) {
if (seenSoFar.contains(nums[i])) {
return true;
}
seenSoFar.add(nums[i]);
}
return false;
}
The only change is that seenSoFar
is now a HashSet
. Although the change is small, this code now runs in linear O(n)
time, thanks to the HashSet#contains()
function running in constant O(1)
time!
Check out these links for more info:
I posted a new article about techniques for using Set data structures in technical interviews.
Check it out here:
There can be only one.
Happy Coding is a community of folks just like you learning about coding.
Do you have a comment or question? Post it here!
Comments are powered by the Happy Coding forum. This page has a corresponding forum post, and replies to that post show up as comments here. Click the button above to go to the forum to post a comment!