At this point, you’ve seen techniques for solving interview questions that involve arrays, and you’ve expanded that by using techniques that use `Map`

data structures.

Similar to `Map`

data structures, `Set`

data structures provide functions that run more efficiently than if you were using an array, so they’re very handy for improving the efficiency of your algorithms.

`Set`

data structures cannot contain any duplicates. They provide efficient functions for adding and removing elements, and for checking whether an element is already present.

Here’s an example:

```
Set<String> animals = new HashSet<>();
animals.add("cats");
animals.add("dogs");
animals.add("cats"); // duplicate! (or should I say dupli-cat)
animals.add("lizards");
System.out.println(animals.size()); // prints 3
// prints "cats" "dogs" "lizards"
for(String animal : animals){
System.out.println(animal);
}
```

This code creates a `Set`

and then fills it with animals. The code tries to add `"cats"`

twice, but because sets cannot contain duplicates, it’s only added once.

`Set`

data structures have a few hand functions:

`add(value)`

`remove(value)`

`contains(value)`

Unlike arrays and `List`

data structures, `Set`

data structures **cannot** be accessed by index. But you can loop over the values using an iterator, or an enhanced for loop like in the example.

Java offers a few implementations of the `Set`

data structure:

`HashSet`

uses the`hashCode()`

function to internally store values. This is the most common implementation you’ll use.`LinkedHashSet`

is very similar to`HashSet`

, except it maintains the insertion order of its entries. In other words, if you iterate over the values, they’ll be in the same order as you added them to the set. (With`HashSet`

, the order is basically random.)`TreeSet`

maintains the*natural ordering*of its values. In other words, if you iterate over the values, they’ll be in alphabetical order for strings and numerical order for numbers. You can also specify a custom`Comparator`

to sort`Object`

values.

The power of `Set`

data structures come from the fact that `add(value)`

, `remove(value)`

, and `contains(value)`

are very efficient. For `HashSet`

and `LinkedHashSet`

, they’re constant time `O(1)`

, and for `TreeSet`

, they’re `O(log n)`

.

This makes `Set`

data structures, especially `HashSet`

, very handy for improving the efficiency of your algorithms.

Let’s start with an example question that we’ve seen a couple times already: can you take an array of integers, and whether the array contains any duplicates?

You might approach this with a brute force algorithm that uses nested for loops:

```
boolean containsDuplicate(int[] nums) {
for (int i = 0; i < nums.length; i++) {
for (int j = i + 1; j < nums.length; j++) {
if (nums[i] == nums[j]) {
return true;
}
}
}
return false;
}
```

This code iterates over every possible pair of elements, checks whether the pair contains two duplicate values, and returns true for the first match it finds. The nested for loop gives this an algorithmic complexity of `O(n ^ 2)`

.

Can you do better?

Rather than looking ahead in the array, you could look back at values you’ve already seen.

For example, for this array:

```
{3, 4, 1, 4, 2}
```

The algorithm that looks back goes something like this:

Index | Value | Seen So Far | Have we seen value? |
---|---|---|---|

`0` |
`3` |
`{}` |
No, keep iterating |

`1` |
`4` |
`{3}` |
No, keep iterating |

`2` |
`1` |
`{3, 4}` |
No, keep iterating |

`3` |
`4` |
`{3, 4, 1}` |
Yes, return true |

To implement that algorithm, you need a way to track which elements you’ve seen so far, along with their indexes.

You might use an `ArrayList`

:

```
boolean containsDuplicate(int[] nums) {
List<Integer> seenSoFar = new ArrayList<>();
for (int i = 0; i < nums.length; i++) {
if (seenSoFar.contains(nums[i])) {
return true;
}
seenSoFar.add(nums[i]);
}
return false;
}
```

At first glance, this looks like an improvement over the brute force solution. It’s only a single for loop, so that’s linear `O(n)`

algorithmic complexity, right?

However, the code relies on the `ArrayList#contains()`

function, both of which have a linear algorithmic complexity of `O(n)`

. And because the code calls this function inside a for loop, that means the algorithmic complexity is still `O(n ^ 2)`

.

Improving the efficiency of your solution often comes down to choosing a data structure that has properties that help you get rid of the inefficient parts of your solution.

You need an efficient way to check whether a value is in a collection. Luckily, `HashSet`

offers a `contains()`

function that runs in constant `O(1)`

time!

This is a single-line change:

```
boolean containsDuplicate(int[] nums) {
Set<Integer> seenSoFar = new HashSet<>();
for (int i = 0; i < nums.length; i++) {
if (seenSoFar.contains(nums[i])) {
return true;
}
seenSoFar.add(nums[i]);
}
return false;
}
```

The only change is that `seenSoFar`

is now a `HashSet`

. Although the change is small, this code now runs in linear `O(n)`

time, thanks to the `HashSet#contains()`

function running in constant `O(1)`

time!

Check out these links for more info:

- Set - JavaDocs

- Contains Duplicate
- Contains Duplicate II
- Intersection of Two Arrays
- Valid Anagram
- Longest Substring Without Repeating Characters
- Repeated DNA Sequences

I posted a new article about techniques for using Set data structures in technical interviews.

Check it out here:

There can be only one.

Happy Coding is a community of folks just like you learning about coding.

Do you have a comment or question? Post it here!

Comments are powered by the Happy Coding forum. This page has a corresponding forum post, and replies to that post show up as comments here. Click the button above to go to the forum to post a comment!