Happy Coding has a new (-ish) homepage! See it here: HappyCoding.io
I’ve wanted to redesign the homepage for a while now. The last site redesign was back in 2022, and it mostly focused on the directory structure of tutorials, along with the introduction of left and right navigation bars.
I pretty openly hate the idea of worrying about SEO or “engaging” “users”, but web design involves a weird cultural communication phenomenon, where people expect to see certain patterns in the sites they visit. If those patterns aren’t present, people tend to dismiss the site regardless of the content.
To see what I mean, take a look at CERN’s first website from 2001:
(Thank you Wayback Machine!)
What are your initial thoughts when you see this page? If you landed here from a search engine, would you click around more, or would you hit the back button? Why?
If you’re like most people, you’d assume that whoever made this website is an unprofessional amateur, and that the content on any subsequent pages isn’t worth your time. You’d likely hit the back button and keep looking for something that looks more “official”.
The funny thing is, CERN is super official, even back on this early webpage. It’s where the web was invented, for Tim’s sake!
But compare that to CERN’s current homepage:
Something about this feels more official. This is a real website. It’s got a big hero image, and you know you have to scroll down to get to a bunch of stylized sections that explain how important the page you’re looking at is. It’s even got a cookie notification!
Joking aside, I admit that this effect is real, and I’m not immune to it. As much as I wish “don’t judge a book by its cover” was true, and as much as I love what some folks are doing with the old web aesthetic (see Max Bittker and Everest Pipkin for examples), the fact is that following modern design patterns communicates something to the people looking at your website.
All of that has been bouncing around in my brain for a long time now, but I’ve been busy teaching and moving and procrastinating because honestly updating the homepage to be more “modern” sounded pretty boring.
But as I slowly emerged from my cave over the past few weeks, I started tinkering with the homepage. And once I start tinkering with something, I can’t really stop until it’s finished. (Sorry Genuary, I’ve got a homepage to redesign!)
Here’s what the homepage looked like before I made any changes:
At the risk of getting defensive in my own blog post, I don’t think this homepage is bad, but I don’t think it communicates what I want it to.
I had a rough plan that involved some combination of the following:
The first thing I tried was getting rid of the left nav and increasing the width of each section:
That felt like a move in the right direction, but I definitely wasn’t done yet.
Next I tried adding backgrounds to each individual section and spent way too much time playing with CSS gradients. I also added a couple new sections to the homepage, and I went back and forth on making each section fill the height of the screen, which ended up being a single line of CSS:
height: 100vh;
I played with a fancy scroll library called Locomotive Scroll, but that didn’t work on my phone, so I switched over to AOS. That worked, and it was surprisingly easy to use, but I’m still not sure if I love the effect.
I also used Discourse’s embedding feature to show recent forum posts on the homepage, and I decided to hard-code a few Etsy links instead of randomly generating them.
After all of that, the homepage looked like this:
I had pretty mixed feelings about this. It was maybe getting closer to feeling like a “real” webpage. But it didn’t feel quite right, and more importantly it didn’t feel like me. That’s a hard feeling to put into words, but it was pretty obvious to me that I was trying to copy patterns instead of doing my own thing.
Since the beginning, Happy Coding has used subtle background images (see the first screenshot above for an example). I generated the images myself using Processing, but I intentionally made them subtle so they weren’t too distracting.
Next, I tried switching all of this up, and I added a colorful background to the page itself, rather than on each individual section:
You can’t see this in the screenshot, but the background is actually animated. It’s a full-screen p5.js sketch positioned behind everything else. I really liked the effect, and I thought about showing a different random background animation whenever the page was loaded.
I debated with myself whether that would be too distracting, especially on long tutorial pages with lots of text, and whether it would work on every device and browser. I’ve tried pretty hard to keep Happy Coding’s footprint small, and having a couple million pixels animating in the background of every page felt like a violation of that principle.
In the end, I decided on a compromise: I created a few colorful animations, took screenshots, and used those screenshots as background images. This way the background is still interesting, but not distracting or CPU-intensive. I’m also going to invite other folks to contribute their own backgrounds- this was already a “feature” of Happy Coding, but it’ll be fun to revisit.
To help the page sections stand out from the background, I also increased their borders. I laughed a little at myself when I settled on a thickness of 5px. This is my tiny rebellion, because at my day job margins are always multiples of 4px, so choosing 5px felt like a petty way to prove I was doing this for myself. 🤘
You can see a lot of these changes on every page on Happy Coding- including this blog post! But here’s what the homepage looked like in the end:
I’m pretty happy with the end result. It’s not really what I originally pictured, but it feels more authentically mine than if I forced myself to conform to a pattern just for the sake of conformity. I know I’m overthinking all of this, but it’s mine to overthink.
I could keep tinkering with single-pixel differences, comparing screenshots and asking myself whether I’m making things better or worse with each change. But I’m at the “good enough, ship it” stage, so I’m shipping it!
I’ll probably spend some time playing with more backgrounds. I’d also love for other folks to contribute their own backgrounds, so I’m going to rewrite the guide for that as well. (Editor’s note: See How to Contribute a Background!)
I’d love to hear any feedback y’all have. Do the new backgrounds look okay, or are they too distracting? Do you notice any weirdness on any particular OS or browser? What would make Happy Coding feel more like a “real” website?
Happy Coding shows a random background image on every page. Those images are generated with code, and I’d love if other people contributed their own!
This guide walks through the process of creating and contributing your own background image.
Using your coding language of choice, write some code that generates a cool pattern.
I use Processing or p5.js. Processing tends to be much faster, which can be helpful if your animation takes time to fill the screen.
If you want some inspiration, check out the tutorials and examples on Happy Coding, or scroll to the bottom of any page to find a link to the source code for its background.
Tips
draw() function in a for loop
In p5.js, you can right-click the canvas and save your canvas as an image. In Processing, you can call the save() or saveFrame() functions to create an image file.
I’ve been creating image files with a resolution of 1920x1080.
By default, backgrounds will be stretched to fill the screen. Their aspect ratios will be preserved (so they won’t look distorted), but they might be cut off (so users with small screens might only see the top-left corner). Try resizing your browser window and watching how the background resizes to see what I mean.
I’m open to other ideas. If you want to use a different size, or if you want your background to repeat instead of stretch, let me know!
After you’re happy with your background image, submit it to Happy Coding!
There are two main ways to do that:
Your submission should include a few things:
If you’re curious about how the random background images work, check out the backgrounds.js file.
That file contains an array of objects. Each object specifies an image file (stored in the backgrounds directory) and a link to the code that generated the file.
Then on page load, the code picks a random object, and changes the background of the page to that object’s image file. It also adds a link to the footer of the page.
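As a rough illustration of that flow, here is a minimal sketch. It is not the actual backgrounds.js file: the field names (imageFile, codeLink), the file paths, the example.com links, and the function names are all assumptions made up for this example.

```javascript
// Hypothetical entries; the real backgrounds.js may use different field names.
const backgrounds = [
  { imageFile: 'backgrounds/circles.png', codeLink: 'https://example.com/circles' },
  { imageFile: 'backgrounds/squares.png', codeLink: 'https://example.com/squares' },
];

// Picks one entry at random. The random source is a parameter so it can be swapped out.
function pickBackground(list, random = Math.random) {
  return list[Math.floor(random() * list.length)];
}

// Sets the page background and adds a link to the generating code in the footer.
function applyBackground(doc, background) {
  doc.body.style.backgroundImage = `url(${background.imageFile})`;
  const link = doc.createElement('a');
  link.href = background.codeLink;
  link.textContent = 'Background source code';
  doc.querySelector('footer').appendChild(link);
}

// Only wire up the page load handler when running in a browser.
if (typeof document !== 'undefined') {
  window.addEventListener('load', () => {
    applyBackground(document, pickBackground(backgrounds));
  });
}
```

Keeping the random pick separate from the DOM changes makes the selection logic easy to try outside of a browser.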
If you made it this far, I’d love to see what kinds of background images you come up with!
Genuary is an event that provides a different prompt every day through January, and creative coders make generative art based on the prompt each day. This year, Genuary 6th’s prompt was “In the style of Vera Molnár”.
Vera Molnár was an early digital artist, and she created really interesting artwork back before creative coding was really a thing!
This sketch creates a design similar to some of her artwork, by randomly drawing squares in a grid.
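The original sketch lives in the p5.js editor, so the code below is not it. This is only a rough approximation of the idea, assuming one square per grid cell with a random size, a small random offset, and a slight random tilt. The generateSquares function and all of its numbers are made up for illustration.

```javascript
// Returns one square per grid cell, each randomly sized, nudged, and tilted.
function generateSquares(columns, rows, cellSize, random = Math.random) {
  const squares = [];
  for (let row = 0; row < rows; row++) {
    for (let col = 0; col < columns; col++) {
      squares.push({
        // Between 50% and 90% of the cell size.
        size: cellSize * (0.5 + 0.4 * random()),
        // Center of the cell, nudged by up to 5% of the cell size.
        x: (col + 0.5) * cellSize + (random() - 0.5) * cellSize * 0.1,
        y: (row + 0.5) * cellSize + (random() - 0.5) * cellSize * 0.1,
        // A small tilt, in radians.
        angle: (random() - 0.5) * 0.3,
      });
    }
  }
  return squares;
}

// p5.js entry points. These only run when the p5.js library is loaded.
function setup() {
  createCanvas(400, 400);
  noLoop();
}

function draw() {
  background(255);
  noFill();
  rectMode(CENTER);
  for (const s of generateSquares(10, 10, 40)) {
    push();
    translate(s.x, s.y);
    rotate(s.angle);
    square(0, 0, s.size);
    pop();
  }
}
```

Try changing the size range or the tilt to see how much disorder the grid can take before it stops reading as a grid.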
Click here to edit this code in the p5.js editor.
Happy new year! As has become tradition, this post reflects on the past year, and looks forward to the next one. For previous years, see also 2018, 2019, 2020, 2021, 2022, and 2023.
I feel like each year is defined by a couple big events that set the tone for the rest of the year, and this year’s big event was on January 20th, when Google laid off 12,000 people.
I was not “directly affected”, but I know many people who were. I was not under any false pretenses about my business relationship with Google, but the layoffs made it clear that the company values the opinions of billionaire shareholders and following the big tech herd more than it cares about its employees.
After the initial layoffs in January, Google has continued “quiet layoffs” that contribute to a constant paranoia: am I next? This feeling really set the tone for the rest of 2023, which I believe was the whole point of the layoffs. Many of my decisions and general attitude came from this feeling of uncertainty and helplessness, which was… not great for my mental health.
I don’t even disagree with the argument that Google has too many employees. A system is what it incentivizes, and for most of my time at Google, the only thing the company incentivized was growth without a real goal or plan. This isn’t unique to Google, as big tech in general has a more more more approach to capitalism.
But if you had asked me back in 2017 what I would do if I was suddenly in charge of Google, I would have said that we should freeze hiring, stop launching new features, and instead base all internal incentives on improving the quality of existing products.
January 2023: Google at night
I know that’s Monday morning quarterbacking, and there are a million reasons why I shouldn’t be in charge of Google. But the reason this bothers me so much is because nobody in a position of power has been held accountable for the system of incentives they created. Jokes about taking full responsibility aside, it would have gone a really long way if a decision maker stood up and said “I was the one who advocated for growth, and that was a mistake, and as a result I’m stepping down so somebody else can set the new direction”.
That didn’t happen. Instead, the only people affected by the mismanagement from the people at the top have been the people at the bottom. I’m so tired of reading emails from billionaires explaining that “macroeconomic headwinds” are forcing them to destroy the mental health of an entire industry, from the employees they unceremoniously fire over email, to the students struggling to find roles after they graduate. This is a choice, being made by humans. I am very aware of how privileged a problem this is to complain about, but the arbitrariness of it all has been extremely frustrating, and more than a little nerve-wracking this past year.
In last year’s new year post, I mentioned that I was thinking about switching teams within Google. That wasn’t an idle idea: two weeks after the new year, I was on the verge of moving to a very cool team.
When you think of Google employees, you probably imagine a bunch of software engineers writing code. And Google certainly has a ton of software engineers. But Google also employs many other roles: bus drivers, cafeteria workers, recruiters, administrative business partners, etc. The team I was moving to helped “non technical” Google employees learn how to code and land software engineering roles within the company. This is as close to a dream job as I’m going to find at Google, so I was pretty excited about it.
Then the layoffs happened. (I told you they set the tone for the rest of the year.)
That put my transfer on hold, and I had to plead with management to let it go through. Three months later they agreed, and I started on Google Developer Academy in April. The work was rewarding albeit hectic, and it felt like the culmination of all the “20%” stuff I had done in previous years. I met a lot of really interesting people, and it felt good to be doing something that helped other folks navigate the big tech industry.
May 2023: lil bee
But the layoffs continued to cast a shadow. Working on an internal mobility team when there wasn’t any internal mobility felt like working against Google instead of working with it. Towards the end of the year we were hoping to make a pivot but still maintain the human side of our work, but in the end the team was dissolved, and I was moved to an internal machine learning education team.
That brings me to my next major theme of 2023: machine learning pervading the tech industry. Over the past couple years, tech companies have become obsessed with large language models like ChatGPT. Pretty much every big tech company is spending hundreds of millions of dollars to build their own LLMs and launch new ML features and products.
I have, to put it mildly, some very mixed feelings about this. On one hand, the development of LLMs is genuinely fascinating, especially the questions it raises about how language works. I believe that information wants to be free, and I have to admit that tools like ChatGPT can make certain tasks much easier. But from the education side of things, I also believe that companies that are leaning into improving “worker productivity” with ML are in for a bit of a shock when they realize newer employees don’t really understand the systems they’re building. (Although I also recognize that previous generations could have said the same thing about modern languages, frameworks, and tools, not to mention Wikipedia and Stack Overflow.)
On the other hand, this is all built on stolen work (including content stolen from Happy Coding), and it raises a ton of ethical questions around misinformation, the value of human labor and creativity, the integrity of the internet, and how it’s an especially horrifying time to be a young girl. And despite big tech’s promises of democratizing data, the inevitable enshittification (my favorite new word of 2023) of these platforms will serve only to extract wealth from the masses, and move it into billionaire shareholder accounts.
I’m also bothered by something that I’m not sure how to put into words: who exactly is consenting to all of this? Should I have to give my consent for a multibillion dollar company to train its model on content I create in my spare time? Should workers have to consent for their employers to replace them with ML? I’m not afraid that ML is going to take my job from me, but I am afraid that some middle manager will think ML can take my job from me. I understand that technology is a driver of cultural change, but it feels wrong to me that these decisions are being made by a shockingly small number of the very privileged elite, who aren’t spending any time asking whether this is good for people as a whole.
August 2023: Sculpture in front of Google’s newest building on campus
Right now, the whole subject feels pretty Emperor’s New Clothes-y to me, as giant tech companies declare that machine learning is the future, and a weird technological FOMO prevents anyone from going against that narrative.
Google as a company is betting that machine learning will unlock the next money printer, now that Google search quality is getting worse. Nobody has suggested the alternative of actually improving Google search quality instead.
So now Google has gone from years of incentivizing growth, where every individual inside the company needed to show evidence that they helped grow the company - even if it didn’t make any sense for their role - to incentivizing ML, where every individual inside the company must show evidence that they’re working on “something ML related” - even if it doesn’t make any sense for their role.
And from the company’s perspective, maybe that’s the right play. If everyone works towards some theoretical ML-driven future, and if just 1% of them stumble upon the next money printer, then the company as a whole comes out on top. But what happens to the vast majority, the 99% who obeyed the top-down edicts and worked on ML that ends up going nowhere? Will there be yet another round of layoffs, or will the tech industry start holding its “leaders” accountable?
I don’t know the answers to those questions, but I think they’ll be a big part of my 2024.
On a happier note, I bought a house and moved to Oregon!
The process was long and stressful, because the house hunt began just as Google restricted remote work and tried to force everyone to return to the office. I spent most of 2023 searching for a home in Oregon, without knowing whether I’d be fired when I moved. More of that leftover stress caused by the layoffs.
But after months of uncertainty and pleading with the company, I was finally officially approved to work remotely, and in August I moved to Eugene, Oregon!
June 2023: Welcome to Eugene, Oregon
I love it here. I love my 100-year-old grandma house, even though many weekends in 2024 will involve fighting the invasive bamboo in the backyard. And I love this weird little Pacific Northwest artsy hippy town. I’m looking forward to exploring more of Eugene and Oregon in 2024.
I also continued teaching at Millersville in 2023. Previously, my classes were mostly asynchronous, meaning I recorded videos ahead of time, and students watched them on their own instead of attending class. But in 2023 my classes were fully synchronous, which meant I got a ton of practice giving more typical lectures. This was still over Zoom, but it felt like the logical next step towards teaching in person, so I’m glad I took it.
I taught a class on technical interviewing during the fall semester. I was pretty excited about this, because I figured this was a great way to help students from my hometown find their paths into big tech. But it was a brand new class, so I needed to create new content (written tutorials, presentations, lectures, assignments, class discussions, projects, etc) for it. I had planned to move in early summer, and then spend the rest of summer preparing all of that class material. Instead, I spent my entire summer making random trips up to Eugene for house hunting, and I didn’t move until the end of August- during the first week of class!
That meant I spent my fall semester working full time, writing brand new material for a class I hadn’t taught before, giving lectures three times a week, grading, unpacking, and spending every waking moment obsessing over the big tech job market and being paranoid for myself, my coworkers, and my students. For months at a time, I didn’t have a single day where I didn’t have too much to do, or where I wasn’t thinking about the perils of big tech. I don’t want to complain, but it became pretty obvious that this was too much for my mental health.
I’m taking the spring 2024 semester off from teaching. Only having one full-time job for a change will be a very welcome relief. I’m planning on using that time to think more about what teaching looks like for me going forward.
I’ve said before that I want to be Daniel Shiffman when I grow up, and that’s only partially a joke. I love how he combines teaching, creativity, and building new things to form a community that feels joyous and authentic. My main goal with both Happy Coding and teaching is to work towards my own version of that. I imagine teaching a class or two each semester, using that as motivation to create new content, releasing that content as written tutorials and YouTube videos, and building a community around folks following along.
And logistically, that’s pretty much what I’ve been doing. This year, I published 54 new articles on Happy Coding. To put that in perspective, if I physically printed them out, they’d take up roughly 200 pages. All but nine of those articles were for school. So I think the model of “teach a class, create new content, and post it online” works.
November 2023: Welcome to the Pacific Northwest
But I admit that something is missing. As much as I want to believe “if I build it, they will come”, and as much as I hate the tech industry’s obsession with shallow engagement metrics, if I’m being honest with myself I recognize that I can’t just keep putting stuff out there and hoping for the best.
I’m not really on social media anymore, so I haven’t done a great job of sharing the articles and projects I post on Happy Coding anywhere else. I did try sharing my technical interviewing lessons on LinkedIn, but LinkedIn’s feed of non-stop obnoxious ML clickbait was pretty soul-crushing. I also tried sharing on Mastodon, but that felt like yelling into the void. I’m a member of a few Discord servers, but I don’t want to spam my stuff there. I admit this is probably a me thing, and that I need to spend more time on building the Happy Coding community. I’m very open to suggestions!
I spent most of 2023 working, teaching, and moving, which included a bunch of weekend trips between California and Eugene. I also spent a weekend helping my friend move from California to Colorado. White-knuckling a giant U-Haul truck with shoddy steering through the mountains on a 20-hour overnight trip was type two fun for sure, but I’m really glad I did it.
At the end of the year, I also made a trip back to the east coast. It was nice to catch up with everyone and show my partner around my hometown, but I’m feeling pretty drained from all the socializing (not to mention eating nothing but pizza and french fries for the past three weeks).
I didn’t do much in terms of personal projects this year, other than these few:
That’s not nothing, but it’s much less than I’ve done in previous years. I also didn’t write any blog posts in 2023, other than last year’s new year post! I’m definitely hoping to spend more time on my own stuff in 2024.
That brings us to this year. I’m not into new year resolutions, but I do like to think about my goals and plans for the year.
July 2023: Passing through Utah at dawn
My main goal in 2024 is to get a better understanding of what it would look like to teach in person. Eugene has at least two colleges, and a bunch of community education programs. (I’m personally planning on taking a beekeeping class this year!) I want to figure out whether it’s feasible for me to teach here, and what I’d need to do to make that happen.
As part of that, I also want to figure out what I’m doing at Google. I’m starting on a new team (tomorrow!), and although I don’t love how it happened, it is a chance to try something different. I have very mixed feelings about ML and how it intersects with ethics and capitalism, but I’m hoping that the fuzzy space of “teaching ML inside Google” has some interesting corners. I don’t know if that will spill out into my personal stuff, or how permanent it is. But I’m going to give it a chance, while also staying open to other possibilities. I also wouldn’t be shocked if I get laid off- working on internal education does not feel very safe right now. So maybe all of this will be moot in a month! 🥲
I mentioned above that I’d like to spend more time on “just for fun” projects and blog posts, and I’d like to get better at working towards building the Happy Coding community. I’ve also been daydreaming about leaning more into the art side of things. The idea of selling my stuff at the weekly art fair in Eugene is pretty exciting.
June 2023: one more bee, as a treat
But the meta-goal encompassing all of the above is to figure out how it all fits together. Should I do something like go down to 80% time at Google and teach more in person? Or should I lean more into teaching virtually and spend more time on production quality? Should I try to get a job with a mission I believe in? Or should I just work at Google by day, collect my paycheck, and then do little art stuff on the side? Some combination of all of the above?
Aside from all of that, I’m also really looking forward to spending some time on my house. I’ve barely had time to unpack since moving in August, so it’ll be nice to do some nesting, tame the garden, make this place feel like home, and meet some friends here.
Last year, I said that one of my main goals was to be more intentional about connecting with other humans. I’ll give myself half credit for that one: I did a lot more human-ing in 2023 than in previous years, but I also didn’t reach out to people as much as I should have.
With that in mind, I’d love to hear from you. How was your 2023? What are you looking forward to in 2024?
I taught this class in fall 2023. It was loosely inspired by Stanford CS 9.
During this class, students learned about various data structures and algorithms, and practiced using them in technical interviewing questions.
Each week, the class focused on a different concept. The class met Monday, Wednesday, and Friday, where Mondays and Wednesdays were lectures, and Fridays were generally group discussions or activities. Weekly assignments included Leetcode problems and peer interviews.
Instead of a final exam, I met with every student individually for a mock interview, which I graded based on a rubric that I shared ahead of time.
Your interviews might include a linked list, but they'll definitely include a for loop.
During this week, we revisited algorithmic complexity and talked about the process for applying to jobs.
This was a short week, so I gave an impromptu lecture on machine learning!
During this week, we had an informal chat about entering the job market, and then we talked about continuing interview practice after the class ends.
System design is the process of creating a high-level end-to-end plan for an entire product or feature. It might involve a little coding, but it’s not as focused on the details as a Leetcode problem would be. Instead, system design interviews test your ability to break down a large project into individual steps, and your knowledge of how different technologies interact.
We’ve talked about driving the conversation during technical interviews, and that becomes even more important in system design interviews!
System design questions are often framed around what I think of as the ever-growing problem where the interviewer continually introduces more scale to the question. A system design conversation might go something like this:
Interviewer: How would you represent the data for a social media site?
Candidate: I might create a User class, and a user can follow other users, so maybe the User class contains a List of other User instances to represent following. Then I might create a Post class which could contain text and images, and a timestamp. The User class could also contain a List of Post instances. Then to create somebody’s feed, I could sort the posts of every user they follow and show the most recent posts.
Interviewer: What if you had so many users and posts that they wouldn’t fit in memory?
Candidate: In that case, I would store the users and posts in a database. Instead of storing everything in in-memory lists, I would implement logic that queried that database to return the data I needed to fulfill a single request.
Interviewer: What if you had so many users and posts that they wouldn’t fit onto a single hard drive?
Candidate: That means I’d have to shard my database. I might start by sharding by username, so users with usernames that start with A would be in one database, and users with usernames that start with B would be in another, etc. I could do the same thing for posts. If I did that, I’d need another layer of logic that routed queries for a particular username to the correct database shard.
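To make the candidate’s first answer more concrete, here is a minimal in-memory sketch of the User and Post classes and the feed logic. The field names, the buildFeed function, and the use of plain numbers as timestamps are assumptions for illustration, and a real interview answer could look very different.

```javascript
class Post {
  constructor(author, text, timestamp) {
    this.author = author;
    this.text = text;
    this.timestamp = timestamp;
  }
}

class User {
  constructor(username) {
    this.username = username;
    this.following = []; // other User instances
    this.posts = []; // Post instances written by this user
  }

  follow(other) {
    this.following.push(other);
  }

  addPost(text, timestamp) {
    this.posts.push(new Post(this.username, text, timestamp));
  }
}

// Builds a feed: gather the posts of every followed user, newest first.
function buildFeed(user, limit = 10) {
  return user.following
    .flatMap((followed) => followed.posts)
    .sort((a, b) => b.timestamp - a.timestamp)
    .slice(0, limit);
}

// Example usage.
const ada = new User('ada');
const bob = new User('bob');
const cy = new User('cy');
ada.follow(bob);
ada.follow(cy);
bob.addPost('first', 1);
cy.addPost('second', 2);
bob.addPost('third', 3);
const feed = buildFeed(ada, 2);
```

This version holds everything in memory and re-sorts on every request, which is exactly the kind of thing the interviewer’s follow-up questions push against.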
Your interviewer will always have a follow-up question that complicates your initial design. This might sound frustrating, but the rest of this tutorial talks about strategies you can use to drive the conversation.
Similar to how we’ve talked about how important it is to ask questions during technical interviews, the most important part of a system design interview is to establish the requirements of the problem.
Try to drive the conversation and offer your own ideas instead of just asking open-ended questions. Consider the difference between these two conversations:
Interviewer: How would you design a pizza delivery service?
Candidate: What are the requirements for this pizza delivery service?
Interviewer: How would you design a pizza delivery service?
Candidate: First, I’d start out by defining a set of requirements. Obviously, the most important functionality would be to allow users to order pizzas. Users should be able to provide their addresses, as well as their orders. Orders would include a size like small, medium, or large, as well as a list of toppings like pineapples, onions, and green peppers. Our service would also need to route these orders to kitchens and drivers. Can you think of anything I’m missing?
This is a bit contrived, but hopefully it shows the difference between asking for requirements and coming up with requirements.
Also notice that this phase of the conversation does not include any technical details. The goal is to establish requirements, not to dive immediately into the implementation or even the representation. Make sure you understand all of the requirements before you start designing your system!
Design patterns represent different ways of breaking down a system into related parts.
To be honest, I have mixed feelings about design patterns: I think it is important to understand that you can approach every problem in a number of different ways, but I also think that “thinking in design patterns” tends to be too prescriptive in the real world.
That being said, I personally experienced an “ah-ha” moment when I finally mapped the MVC (model-view-controller) design pattern to my understanding of how real-world projects are organized. In college, MVC was mostly a theory, but I didn’t “get it” until I had worked on quite a few systems myself.
MVC breaks the system down into three pieces:
I learned about MVC in college, and I remember not really understanding the point of the whole thing. Fast forward a few years, after I had worked on a few systems in the real world, and I finally understood the goal of the MVC design pattern.
I realized that it makes more sense to me if I think about it as the MCV design pattern:
In this understanding, the user interacts with the view to accomplish some goal. The view then interacts with the controller to translate that user goal to a technical process or request. The controller then interacts with the model to modify or request specific data. Finally, the view renders the data from the model so the user can take their next action.
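Here is one way that flow could look in code, reusing the pizza delivery example. This is a toy sketch rather than a prescribed structure: the class names and methods are invented for illustration, and real frameworks wire these pieces together in their own ways.

```javascript
// Model: owns the data.
class OrderModel {
  constructor() {
    this.orders = [];
  }

  add(order) {
    this.orders.push(order);
  }

  all() {
    return [...this.orders];
  }
}

// Controller: translates a user goal into a change to the model.
class OrderController {
  constructor(model) {
    this.model = model;
  }

  placeOrder(size, toppings) {
    if (!['small', 'medium', 'large'].includes(size)) {
      throw new Error(`Unknown size: ${size}`);
    }
    this.model.add({ size, toppings });
  }
}

// View: what the user interacts with. It calls the controller, then renders the model.
class OrderView {
  constructor(controller, model) {
    this.controller = controller;
    this.model = model;
  }

  submitOrder(size, toppings) {
    this.controller.placeOrder(size, toppings);
    return this.render();
  }

  render() {
    return this.model
      .all()
      .map((order) => `${order.size} pizza with ${order.toppings.join(', ')}`)
      .join('\n');
  }
}

// Example usage.
const model = new OrderModel();
const view = new OrderView(new OrderController(model), model);
const screen = view.submitOrder('large', ['onions', 'green peppers']);
```

Reading it top to bottom gives the MVC order, but following a single call to submitOrder traces the view, controller, model order described above.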
MVC (or MCV) isn’t the only design pattern out there, but I have found it especially useful to think about system design interviews in terms of MVC.
After you’ve established the requirements of your system, start thinking about how you’d store your data. This is probably a database with a table for each object you need to store.
You don’t need to draw out a full diagram, but list out the tables you’d need, and the fields they’d contain.
After you describe your database, a common follow-up question asks what you’d do if your database contained too much data to hold on a single hard drive. The general answer to this is to shard your database so that data is stored on multiple machines.
You can vertically shard your database by splitting your data into separate tables, and storing each table in its own database. In the pizza delivery example, you might have a Customer table and an Order table. You could also split your tables further: for example, you could store addresses directly in the Customer table, or you could split that data into its own Address table.
Alternatively, you can horizontally shard your database by splitting your data into rows, and storing different rows in particular databases. For example, you might store customers with names starting with A in one database, with B in another, and so on.
You can also geographically shard your data by storing data for particular regions in their own databases. This has the benefit of being able to physically locate your databases and servers closer to the users who will request that data.
No matter what your sharding approach, you’ll then need another layer that takes a request and routes it to the correct database shard. This doesn’t have to be very involved, but you should mention that you’ll need it.
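Here's a minimal sketch of that routing layer, assuming horizontal sharding by the first letter of a customer's name. The `ShardRouter` class and the shard names are hypothetical. A real router would hold connection details rather than strings.

```java
import java.util.List;

// Hypothetical routing layer: picks a database shard based on a customer's name.
class ShardRouter {
  // Placeholder shard names. A real system would store connection info instead.
  private final List<String> shards;

  ShardRouter(List<String> shards) {
    this.shards = shards;
  }

  // Routes names starting with A to the first shard, B to the second, and so on,
  // wrapping around if there are fewer shards than letters.
  // Assumes customerName is not empty.
  String shardFor(String customerName) {
    char first = Character.toUpperCase(customerName.charAt(0));
    int letterIndex = (first >= 'A' && first <= 'Z') ? first - 'A' : 0;
    return shards.get(letterIndex % shards.size());
  }
}
```

In an interview, it's usually enough to say that this layer exists and what key it routes on.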
After you talk through your database design, move on to design the business logic of your system.
This will often take the form of a REST API, where a server exposes endpoints that interact with your database.
Talk about what endpoints you need, based on the ways users can interact with your data. Acronyms like CRUDL (create, read, update, delete, list) can help remind you of the typical actions you should be thinking about.
The outcome of this phase of the design discussion is likely a drawing of your services, and the endpoints in each service.
Finally, now that you’ve fleshed out your database and server layers, you can talk about the user interface.
Is this a website? A mobile app? A desktop application? Does every user see the same thing, or do you need different UIs for different users?
Draw out a few mocks for the main views in your UI, and map them to the endpoints of your server. Talk about how the users interact with the UI, which interacts with the server, which interacts with the database.
Your mocks don’t have to be perfect! The goal is to communicate your overall design. You can always come back and talk about specific implementation details later.
Similar to database sharding, you can use load balancing to handle requests (either to your server, or to your UI if it’s web-based) when there are too many of them for a single computer to handle.
In other words, if your system needs to scale to handle many requests, you might add multiple servers that all talk to the same database. Then you’d add another layer (the load balancer) that routes requests to the least-busy server.
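As a rough sketch of the "least-busy" idea, the class below tracks how many requests each server is handling and routes each new request to the server with the fewest. The `LeastBusyLoadBalancer` name, the string server IDs, and the `finish` method are assumptions made up for this example. Real load balancers also deal with health checks and concurrency.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical load balancer that tracks how many requests each server is handling.
class LeastBusyLoadBalancer {
  private final Map<String, Integer> activeRequests = new HashMap<>();
  private final List<String> servers;

  LeastBusyLoadBalancer(List<String> servers) {
    this.servers = servers;
    for (String server : servers) {
      activeRequests.put(server, 0);
    }
  }

  // Picks the server with the fewest active requests and records the new request.
  // Ties go to whichever server appears first in the list.
  String route() {
    String leastBusy = servers.get(0);
    for (String server : servers) {
      if (activeRequests.get(server) < activeRequests.get(leastBusy)) {
        leastBusy = server;
      }
    }
    activeRequests.merge(leastBusy, 1, Integer::sum);
    return leastBusy;
  }

  // Call this when a server finishes handling a request.
  void finish(String server) {
    activeRequests.merge(server, -1, Integer::sum);
  }
}
```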
As you’re talking through your requirements and your system design, think about your failure modes. What functionality in your system is essential? What could you disable if you needed to?
For example, let’s say my pizza delivery system contains a service that autocompletes addresses, so new users can more easily enter their delivery addresses. That feature is nice to have, but if suddenly the whole system receives a ton of traffic, it might make sense to disable the address autocomplete feature to free up resources for more important services, like ordering pizzas.
The majority of a system design interview is about talking through a high-level approach to solving a large problem, like creating an entire product or feature. The conversation stays pretty general.
However, if you’re familiar with specific tech stacks, you can volunteer specific solutions to the above problems. If you know about a tool that helps with database sharding, then you should absolutely bring that up. Bonus points if the tech stack is what the company actually uses! (Which you can figure out by asking questions earlier in the interview process.)
You might use a stack like LAMP, or MEAN, or LYME. If you don’t know what those are, that’s okay! Focus on the parts of the stack that you do know: what would you actually use if your job was to implement the overall feature?
System design interviews are more about communicating an overall plan, and the “deliverable” is probably a series of diagrams that help you explain how the different pieces of your system fit together.
But you should be prepared to drill down into any part of the system, and write some code that accomplishes a specific task. For example, you might be asked to implement an example load balancer, or you might be asked to write the code for your UI.
You can help yourself by designing your system in ways that you’d be able to implement. You can also tell your interviewer which pieces you’d like to spend more time on, to highlight the parts of the stack that you’re more familiar with.
System design interviews can seem scary, because they require talking about pretty much every part of software engineering. But it’s impossible to be an expert in everything.
The good news is, you don’t have to be an expert in everything! System design interviews start at a very high level, and you can “hand wave” your way through a lot of it. You don’t need to be an expert in database sharding or load balancing or API implementation or UI design. But you do have to know that each of those pieces exist, and how they fit together.
The overall goal of a system design interview is to communicate the fact that you understand how a large system works together, and the tradeoffs that you might make at each level.
This guide introduces one more data structure: matrices, or two-dimensional arrays. We already talked about arrays, but matrices are so common in interview questions that they deserve their own guide.
Matrix questions broadly fall into two categories: questions involving iterating over or moving the elements of the matrix in a particular order, and questions treating the matrix as a search space.
Matrices are generally represented as a two-dimensional array, which I think of as a table:
| | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | A | B | C | D |
| 1 | E | F | G | H |
| 2 | I | J | K | L |
| 3 | M | N | O | P |
In this case, the table has four rows and four columns. You can reference each cell by its row number and column number. For example, the coordinate 1,2 references the cell at row 1 and column 2, which in this table is G.
That might feel counter-intuitive, because most people are accustomed to thinking in terms of x,y coordinates. If that’s how you think of coordinates, then you’d expect 1,2 in the above table to be J instead of G.
But think about it in terms of a 2D array:
String[][] matrix = {
{"A", "B", "C", "D"},
{"E", "F", "G", "H"},
{"I", "J", "K", "L"},
{"M", "N", "O", "P"},
};
// Prints "G"
System.out.println(matrix[1][2]);
If you run this code, you’ll see that matrix[1][2] points to G.
This is because in Java, a 2D array is really an array that contains arrays. That means you can do this:
String[][] matrix = {
{"A", "B", "C", "D"},
{"E", "F", "G", "H"},
{"I", "J", "K", "L"},
{"M", "N", "O", "P"},
};
String[] row = matrix[1]; // {"E", "F", "G", "H"}
String cell = row[2]; // "G"
// Prints "G"
System.out.println(cell);
This code creates a 2D array matrix, which is itself an array that contains four arrays. It then asks for the array at index 1 and stores that array in the row variable. Finally, it asks the row array for the element at index 2, which is G.
In other words, the first dimension of the 2D array represents the row, and the second dimension represents the column.
Keeping this in mind is trickier than you might expect!
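One habit that can help is naming your loop variables r and c instead of i and j, or x and y. Here's a small sketch that visits every cell row by row. The `MatrixWalk` class and `rowMajor` method are names made up for this example.

```java
class MatrixWalk {
  // Joins every cell into one string, visiting row by row, left to right.
  static String rowMajor(String[][] matrix) {
    StringBuilder cells = new StringBuilder();
    // The outer loop picks a row. The inner loop picks a column within that row.
    for (int r = 0; r < matrix.length; r++) {
      for (int c = 0; c < matrix[r].length; c++) {
        cells.append(matrix[r][c]);
      }
    }
    return cells.toString();
  }
}
```

Swapping the two loops would visit the cells column by column instead.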
A common type of interview question involves iterating over the elements in a matrix in a particular order, or moving the elements in a particular way.
For example, Spiral Matrix asks you to take a matrix and then return the values generated by iterating over the matrix in a spiral path.
{1, 2, 3, 6, 9, 8, 7, 4, 5}
You can do this with a series of for loops, but you have to think very carefully about off-by-one errors and different matrix sizes.
To help you think about a matrix problem, try starting with a small matrix.
Work through a couple examples until you notice a pattern, and then you can work towards solving the problem with arbitrary matrices.
Here’s the code I came up with:
public List<Integer> spiralOrder(int[][] matrix) {
int spiralHeight = matrix.length;
int spiralWidth = matrix[0].length;
int totalCells = spiralWidth * spiralHeight;
// The spiral moves in by 1 layer at a time.
int layer = 0;
List<Integer> path = new ArrayList<>();
// Loop until you visited all the cells.
while(path.size() < totalCells) {
path.addAll(spiralOneLayer(matrix, layer, spiralWidth, spiralHeight));
layer++;
// After each layer, the spiral decreases by 2 units in each dimension.
spiralHeight -= 2;
spiralWidth -= 2;
}
return path;
}
List<Integer> spiralOneLayer(int[][] matrix, int layer, int width, int height) {
List<Integer> path = new ArrayList<>();
// Calculate the boundaries of this spiral layer.
// topRow and leftColumn are just layer, but separating them makes the code more readable.
int topRow = layer;
int bottomRow = layer + height - 1;
int leftColumn = layer;
int rightColumn = layer + width - 1;
// Move from the top-left corner to the top-right corner.
for(int c = leftColumn; c <= rightColumn; c++){
path.add(matrix[layer][c]);
}
// Move from the top-right corner to the bottom-right corner.
for(int r = topRow + 1; r <= bottomRow; r++) {
path.add(matrix[r][rightColumn]);
}
// If this spiral layer is only a single row or column, then the spiral is complete.
if(width == 1 || height == 1) {
return path;
}
// Move from the bottom-right corner to the bottom-left corner.
for(int c = rightColumn - 1; c >= layer; c--) {
path.add(matrix[bottomRow][c]);
}
// Move from the bottom-left corner to the top-left corner.
for(int r = bottomRow - 1; r > layer; r--){
path.add(matrix[r][layer]);
}
return path;
}
This code uses a helper function to calculate one “layer” of the spiral, and then calls that helper function with decreasing sizes to calculate the whole spiral.
This is the code I came up with, but there are other ways to organize this code. How would you do it?
Matrices also lend themselves to another type of question, where a matrix is treated as a search space. These questions are very similar to graph problems, and algorithms like depth-first search, breadth-first search, and Dijkstra’s are still handy for solving them.
In fact, you can think of a matrix as a graph of interconnected nodes!
For example, take this matrix:
42 | 23 | 37 |
16 | 98 | 56 |
27 | 48 | 52 |
Conceptually, you can think of it as a graph, where each cell is a node with a value or a cost:
This graph contains nodes, where each node is connected to its neighbors. If you’re currently looking at a certain node, you can traverse to its neighbors. In a graph, you do that by navigating through the connected nodes. With a matrix, you traverse to neighboring cells.
For example, Flood Fill uses a matrix to represent a grid of colors. It asks you to start at a certain cell, and change all of the connected cells with that cell’s color to a different color.
This lends itself pretty naturally to depth-first search or breadth-first search. You can start at the original cell, and then traverse to each of its four neighbors. Whenever you encounter a neighbor with the original color, you change its color and then traverse to all of its neighbors.
Remember that depth-first search can be implemented with recursion or with a stack, and breadth-first search can be implemented with a queue.
Similar to graphs, with matrices you have to be careful that you don’t revisit cells you’ve already visited. Matrices have another gotcha: you shouldn’t traverse to cells that are outside the bounds of the matrix!
Here’s a solution to the Flood Fill problem:
// Helper class that represents a cell in the matrix.
class Cell {
// The Cell's row, i.e. its y coordinate.
int r;
// The Cell's column, i.e. its x coordinate.
int c;
Cell(int r, int c) {
this.r = r;
this.c = c;
}
// Equivalent Cell instances are added to a HashSet, so we need to override equals.
@Override
public boolean equals(Object other) {
Cell otherCell = (Cell) other;
return this.r == otherCell.r && this.c == otherCell.c;
}
// Equivalent Cell instances are added to a HashSet, so we need to override hashCode.
@Override
public int hashCode() {
return Objects.hash(r, c);
}
}
public int[][] floodFill(int[][] image, int sr, int sc, int color) {
// Track visited Cells to avoid traversing backwards.
Set<Cell> visited = new HashSet<>();
// Only fill Cells that match the original color.
int originalColor = image[sr][sc];
// Track the Cells that still need to be filled.
Deque<Cell> cellsToFill = new ArrayDeque<>();
// Add the starting Cell to the list of Cells to fill.
cellsToFill.addLast(new Cell(sr, sc));
while(!cellsToFill.isEmpty()) {
// Treat the Deque as a Queue to implement BFS.
Cell cell = cellsToFill.removeFirst();
// Don't traverse to already-visited Cells.
if(visited.contains(cell)){
continue;
}
// Don't traverse to Cells outside the matrix.
if(cell.r < 0 || cell.c < 0 || cell.r >= image.length || cell.c >= image[0].length) {
continue;
}
// Stop at Cells that don't contain the original color.
int cellColor = image[cell.r][cell.c];
if(cellColor != originalColor) {
continue;
}
// Set the current Cell's color.
image[cell.r][cell.c] = color;
// Traverse to all of this Cell's neighbors.
cellsToFill.addLast(new Cell(cell.r - 1, cell.c));
cellsToFill.addLast(new Cell(cell.r + 1, cell.c));
cellsToFill.addLast(new Cell(cell.r, cell.c - 1));
cellsToFill.addLast(new Cell(cell.r, cell.c + 1));
visited.add(cell);
}
return image;
}
This code uses a helper Cell class to implement breadth-first search to fill all of the matching connected cells with a new value.
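The same fill can also be written with recursive depth-first search. This is a sketch of one alternative, not a replacement for the solution above. The `FloodFillDfs` class and `fill` helper are names made up for this example. It skips the visited set by relying on the fact that a filled cell no longer matches the original color, which only works if the new color is different from the original.

```java
class FloodFillDfs {
  static int[][] floodFill(int[][] image, int sr, int sc, int color) {
    int originalColor = image[sr][sc];
    // If the start cell already has the new color, there is nothing to do.
    // Without this check, the recursion would never stop.
    if (originalColor != color) {
      fill(image, sr, sc, originalColor, color);
    }
    return image;
  }

  private static void fill(int[][] image, int r, int c, int originalColor, int color) {
    // Don't traverse to cells outside the matrix.
    if (r < 0 || c < 0 || r >= image.length || c >= image[0].length) {
      return;
    }
    // Stop at cells that don't contain the original color.
    // Cells that were already filled no longer match, so they stop here too.
    if (image[r][c] != originalColor) {
      return;
    }
    image[r][c] = color;
    // Traverse to all four neighbors.
    fill(image, r - 1, c, originalColor, color);
    fill(image, r + 1, c, originalColor, color);
    fill(image, r, c - 1, originalColor, color);
    fill(image, r, c + 1, originalColor, color);
  }
}
```

One tradeoff is that recursion can overflow the call stack on very large images, which the queue-based version avoids.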
Remember that dynamic programming is a technique where you store the results of calculations so you can reuse them instead of recalculating them over and over again.
This technique is handy for recursive algorithms, but it’s also handy for many matrix problems!
For example, Unique Paths II gives you a matrix that represents an environment that contains obstacles. A robot starts in the upper-left corner, and it can move to the right or down each turn to reach a goal in the lower-right corner.
You might consider using depth-first search or breadth-first search to calculate the number of paths. This animation shows a depth-first search that finds all of the possible paths from the upper-left corner to the lower-right corner:
This approach could work, but notice how many times you need to visit each cell. You end up counting the paths from a single cell multiple times! This algorithm incurs an algorithmic complexity of O(2 ^ n) where n is the number of cells. This works, but it’s not very efficient.
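To make that repeated work concrete, here's a sketch of the naive recursive search. The `NaivePathCounter` class is a name made up for this example, and it assumes the same input format as Unique Paths II, where 1 marks an obstacle.

```java
class NaivePathCounter {
  // Counts paths from cell (r, c) to the lower-right corner, moving only right or down.
  // A value of 1 in the grid marks an obstacle.
  static int countPaths(int[][] grid, int r, int c) {
    // Off the grid or blocked: no path this way.
    if (r >= grid.length || c >= grid[0].length || grid[r][c] == 1) {
      return 0;
    }
    // Reached the goal.
    if (r == grid.length - 1 && c == grid[0].length - 1) {
      return 1;
    }
    // Every call branches twice, so the same cells are recounted many times.
    return countPaths(grid, r, c + 1) + countPaths(grid, r + 1, c);
  }
}
```

Calling `countPaths(grid, 0, 0)` gives the right answer, but nothing is stored between calls, so the paths from any shared cell are recalculated every time the search reaches it.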
Instead, you can take advantage of the fact that there are two ways to get to a cell: from its left neighbor, or from its top neighbor. So the total number of paths to a cell is the total paths to its left neighbor, plus the total paths to its top neighbor. Think of a few example grids to see what I mean:
1 |
In a single-cell grid, there is only one way to get from the upper-left corner to the lower-right corner, because they’re already the same cell.
In a 2x2 grid, there are two paths. You can either move right and then down, or you can move down and then right.
Each cell in this table shows the number of paths leading to it:
1 | 1 |
1 | 2 |
Once you know the paths through a 2x2 grid, you can expand that to calculate the paths for a 3x3 grid:
1 | 1 | 1 |
1 | 2 | 3 |
1 | 3 | 6 |
To calculate the number of paths to a new cell, you can add the number of paths to its top neighbor to the number of paths to its left neighbor.
This means calculating new paths only requires adding a couple numbers, not searching the whole space over and over again!
1 | 1 | 1 | 1 |
1 | 2 | 3 | 4 |
1 | 3 | 6 | 10 |
1 | 4 | 10 | 20 |
You can use this approach to calculate larger grids, until you reach a grid size that matches the input.
1 | 1 | 1 | 1 | 1 |
1 | 2 | 3 | 4 | 5 |
1 | 3 | 6 | 10 | 15 |
1 | 4 | 10 | 20 | 35 |
1 | 5 | 15 | 35 | 70 |
This approach calculates the paths through an open matrix. To include obstacles, you can set the number of paths to any cells that contain obstacles to 0, so that subsequent cells will not count those invalid paths.
Putting all of this into code, it looks like this:
public int uniquePathsWithObstacles(int[][] obstacleGrid) {
// The upper-left cell is blocked, so no paths are possible.
if(obstacleGrid[0][0] == 1){
return 0;
}
// Calculate the number of paths to each cell.
int[][] pathsToCell = new int[obstacleGrid.length][obstacleGrid[0].length];
// Assume there is 1 path to the starting cell.
pathsToCell[0][0] = 1;
// Iterate over every cell in the grid.
for(int r = 0; r < obstacleGrid.length; r++) {
for(int c = 0; c < obstacleGrid[r].length; c++) {
// Skip the upper-left cell, since that was already calculated.
if(r == 0 && c == 0) {
continue;
}
// If this cell contains an obstacle, leave its value at 0.
if(obstacleGrid[r][c] == 1){
continue;
}
int pathsFromAbove = 0;
if(r > 0){
pathsFromAbove = pathsToCell[r-1][c];
}
int pathsFromLeft = 0;
if(c > 0) {
pathsFromLeft = pathsToCell[r][c-1];
}
pathsToCell[r][c] = pathsFromAbove + pathsFromLeft;
}
}
// After calculating the whole grid, return the value for the lower-right cell.
return pathsToCell[obstacleGrid.length - 1][obstacleGrid[0].length - 1];
}
Now the code loops over every cell in the matrix exactly once, for a runtime complexity of O(n ^ 2) for an n-by-n grid. That’s much better than O(2 ^ n)!
Node class to model relationships between data.
Specifically, you’ve seen linked lists, where each node contains a value and points to a single child node:
You’ve also seen trees, where each node can point to multiple child nodes:
Trees are good for modeling hierarchical data, or decision-making processes. In a tree, nodes have parent-child relationships.
Graphs are another node-based data structure, but the relationships between nodes are not hierarchical:
In this example, node A has three connected sibling nodes B, C, and D. Node B has two sibling nodes A and E. Node C has three sibling nodes A, E, and F, and so on.
Graphs can represent interconnected data, or physical spaces like streets or cities. It’s common to think about paths through graphs. In this example, there are several paths from A to G:
A -> B -> E -> G
A -> C -> E -> G
A -> C -> F -> G
A -> D -> F -> G
In the above example, each node is connected to a set of other nodes. Each connection is yes-or-no: two nodes are either connected, or they’re not. Node A is connected to nodes B, C, and D, but it’s not connected directly to nodes E, F, or G.
A common enhancement to graphs is to include a cost (also called a weight) for each connection. You can think of this as the distance between two cities, or how much work it takes to get from one condition to the next.
Now each path has an associated cost:
A -> B -> E -> G costs 10
A -> C -> E -> G costs 12
A -> C -> F -> G costs 8
A -> D -> F -> G costs 13
The exact meaning behind costs depends on the problem. Costs are commonly associated with distance, but you can also think of it as how much work it takes to go from one node to another, or how desirable different paths are. For example, Google Maps sometimes highlights biking paths that might have a longer distance but will require less uphill pedaling.
You can also assign costs or weights to the nodes (also called vertexes) themselves.
The above examples assume that if two nodes are connected, they’re connected both ways. For example, if A is connected to B, then B is connected to A.
That means you can construct longer paths through the graph:
A -> B -> E -> C -> F -> G costs 19
A -> D -> F -> C -> E -> G costs 20
This is called an undirected or bidirectional graph.
Graphs can also be directed, which means connections are one-way. In visualizations, directed connections are often indicated by arrows:
This example contains a few types of connections:
A to B
A and C
E to G costs 1 but G to E costs 4
Again, the meaning behind directionality in a graph depends on the problem you’re solving. You can think of directed connections as one-way streets, or as irreversible decisions.
A path inside a graph is considered a cycle if it starts and ends with the same node.
The above example contains a couple cycles:
A -> D -> F -> C -> A
C -> F -> G -> E -> C
Graphs containing cycles are called cyclic graphs, and graphs without cycles are acyclic.
Graphs that contain cycles require some extra care when traversing, to make sure you don’t get caught in an infinite loop. More on that below!
Graphs can be directed or undirected, and cyclic or acyclic.
Graphs that are both directed (connections are one-way) and acyclic (no path loops back on itself) are so useful that they have their own nickname: DAG, which stands for directed acyclic graph.
DAGs are graphs that model a hierarchical relationship between data. Does that sound familiar? That’s because trees are DAGs, with one extra requirement of each node only having a single parent.
This is a DAG, but not a tree:
Node D has two parent nodes, which means it’s not a tree.
This is a tree, which is a specific type of DAG:
In other words, all trees are DAGs, but not all DAGs are trees.
To go a level deeper, linked lists are also DAGs. In fact, you can think of linked lists as a specific kind of tree where each node only has a single child!
So far, we’ve focused on the concepts behind graphs, but we haven’t seen any code yet.
That’s because graphs can be represented in a ton of different ways.
You could use a Node class:
class Node {
Object value; // Could be int, String, etc.
List<Node> neighbors;
}
You could also represent costs for each connection:
class Node {
Object value; // Could be int, String, etc.
Map<Node, Integer> connections; // Map of neighbors to costs.
}
List<Node> graph;
The Node class could work for a directed graph, but for a bidirectional graph you’d need to make sure every connection was reflected in both nodes.
Alternatively, you could represent a graph by storing the connections as their own objects:
class Node {
Object value;
}
class Connection {
Node one;
Node two;
}
List<Connection> graph;
With that approach, you can also represent directionality and costs by modifying the Connection class:
class Connection {
Node source;
Node destination;
int cost;
}
Many Leetcode questions involving graphs provide the input in the form of a list of connections, usually 2D arrays where each inner array contains a source index, a destination index, and a cost.
int[][] graph = {
{2, 3, 17}, // Connection from node 2 to node 3 with 17 cost
{8, 1, 5}, // Connection from node 8 to node 1 with 5 cost
{5, 6, 2} // Connection from node 5 to node 6 with 2 cost
};
This example is directed and contains costs, but you’ll also encounter undirected graphs and graphs without costs in similar formats:
int[][] graph = {
{2, 3}, // Connection between node 2 and node 3
{8, 1}, // Connection between node 8 and node 1
{5, 6} // Connection between node 5 and node 6
};
The above examples use Node or Connection classes to represent a graph. These object-based representations are handy if you think in terms of objects!
Another way to represent a graph is with a 2D array, where each row in the array represents the paths from a node to the other nodes in the graph. This is called an adjacency matrix.
In other words, you’d fill out the costs in a table like this:
| From \ To | A | B | C | D | E |
|---|---|---|---|---|---|
| A | x | | | | |
| B | | x | | | |
| C | | | x | | |
| D | | | | x | |
| E | | | | | x |
Each row in the table represents a different node in the graph, and the cells in the row represent the cost of the connection from that node to another node. This table contains x values in the cells that connect a node to itself.
With this approach, your graph is a 2D array:
int[][] graph;
Each node in the graph is assigned an index. To check whether node 1 is connected to node 3, you can look it up in the array:
int costFromNodeOneToNodeThree = graph[1][3];
The exact values in the matrix depend on your problem and what type of graph you’re working with. You might use boolean values instead of costs. You might use 0 to represent a lack of a connection, or you might use infinity or null.
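Here's a sketch of filling in an adjacency matrix from a list of connections in the {source, destination, cost} format shown earlier. The `AdjacencyMatrixBuilder` class is a name made up for this example, and it assumes that 0 means "no connection", so every real connection needs a nonzero cost.

```java
class AdjacencyMatrixBuilder {
  // Builds a cost matrix from {source, destination, cost} triples.
  // Assumption for this sketch: 0 means "no connection", so costs must be nonzero.
  static int[][] build(int nodeCount, int[][] connections) {
    // Java fills new int arrays with 0, so every pair starts out unconnected.
    int[][] graph = new int[nodeCount][nodeCount];
    for (int[] connection : connections) {
      int source = connection[0];
      int destination = connection[1];
      int cost = connection[2];
      // Directed: only source -> destination is recorded.
      graph[source][destination] = cost;
    }
    return graph;
  }
}
```

For an undirected graph, you'd also set `graph[destination][source]` inside the loop.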
Storing the connections as a list of node or index pairs makes it hard to look up whether a node is connected to another node. But storing the connections in a 2D array requires O(n ^ 2) space, which is more than most graphs need.
Instead, you can convert your connection list into an adjacency list, which is a list of connections for each node.
It looks like this:
int[][] graph = {
{2, 3}, // Node 0 is connected to nodes 2 and 3.
{0}, // Node 1 is connected to node 0.
{0, 3}, // Node 2 is connected to nodes 0 and 3.
{1} // Node 3 is connected to node 1.
};
This example uses an array, but you could use a List<List<Integer>> or a Map<Integer, List<Integer>> as well.
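The conversion from a connection list to an adjacency list might look something like this. It's a minimal sketch, assuming directed connections and nodes numbered from 0 to nodeCount - 1. The `AdjacencyListBuilder` class is a name made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

class AdjacencyListBuilder {
  // Converts {source, destination} pairs into a list of neighbors for each node.
  // Assumption for this sketch: connections are directed, and nodes are
  // numbered 0 to nodeCount - 1.
  static List<List<Integer>> build(int nodeCount, int[][] connections) {
    List<List<Integer>> graph = new ArrayList<>();
    // Start every node with an empty list of neighbors.
    for (int i = 0; i < nodeCount; i++) {
      graph.add(new ArrayList<>());
    }
    for (int[] connection : connections) {
      graph.get(connection[0]).add(connection[1]);
      // For an undirected graph, you would also add the reverse connection:
      // graph.get(connection[1]).add(connection[0]);
    }
    return graph;
  }
}
```

Passing in the connections {0, 2}, {0, 3}, {1, 0}, {2, 0}, {2, 3}, and {3, 1} produces the same graph as the array above.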
You can also add a dimension for costs:
int[][][] graph = {
{{2, 7}, {3, 5}}, // Node 0 is connected to nodes 2 with cost 7 and 3 with cost 5.
{{0, 6}}, // Node 1 is connected to node 0 with cost 6.
{{0, 2}, {3, 9}}, // Node 2 is connected to nodes 0 with cost 2 and 3 with cost 9.
{{1, 8}} // Node 3 is connected to node 1 with cost 8.
};
Conceptually this is very similar to the Node object-based approach, but without going through objects. Both are valid, and you should use whichever approach makes the most sense to you.
As an example, take this graph:
This is a directed graph, where each connection has a cost.
You could represent it using a Node class:
class Node {
String name;
Map<Node, Integer> connections = new HashMap<>();
public Node(String name) {
this.name = name;
}
public void addConnection(Node neighbor, int cost){
connections.put(neighbor, cost);
}
}
Node a = new Node("A");
Node b = new Node("B");
Node c = new Node("C");
Node d = new Node("D");
Node e = new Node("E");
a.addConnection(b, 5);
a.addConnection(c, 3);
b.addConnection(c, 7);
b.addConnection(d, 2);
c.addConnection(a, 2);
c.addConnection(b, 7);
c.addConnection(e, 1);
d.addConnection(e, 9);
e.addConnection(a, 17);
e.addConnection(d, 9);
List<Node> graph = List.of(a, b, c, d, e);
Or you could use a Connection class:
class Connection {
String fromNode; // Could also be a Node class.
String toNode;
int cost;
public Connection(String fromNode, String toNode, int cost) {
this.fromNode = fromNode;
this.toNode = toNode;
this.cost = cost;
}
}
List<Connection> graph = new ArrayList<>();
graph.add(new Connection("A", "B", 5));
graph.add(new Connection("A", "C", 3));
graph.add(new Connection("B", "C", 7));
graph.add(new Connection("B", "D", 2));
graph.add(new Connection("C", "A", 2));
graph.add(new Connection("C", "B", 7));
graph.add(new Connection("C", "E", 1));
graph.add(new Connection("D", "E", 9));
graph.add(new Connection("E", "A", 17));
graph.add(new Connection("E", "D", 9));
Or you could use a table:
| From \ To | A | B | C | D | E |
|---|---|---|---|---|---|
| A | x | 5 | 3 | ∞ | ∞ |
| B | ∞ | x | 7 | 2 | ∞ |
| C | 2 | 7 | x | ∞ | 1 |
| D | ∞ | ∞ | ∞ | x | 9 |
| E | 17 | ∞ | ∞ | 9 | x |
int[][] graph = {
{0, 5, 3, 0, 0},
{0, 0, 7, 2, 0},
{2, 7, 0, 0, 1},
{0, 0, 0, 0, 9},
{17, 0, 0, 9, 0}
};
Or an adjacency list:
int[][][] graph = {
// Node A is connected to Node B with 5 cost, and Node C with 3 cost.
{{1, 5}, {2, 3}},
// Node B is connected to Node C with 7 cost, and Node D with 2 cost.
{{2, 7}, {3, 2}},
// Node C is connected to Node A with 2 cost, Node B with 7 cost, and Node E with 1 cost.
{{0, 2}, {1, 7}, {4, 1}},
// Node D is connected to Node E with 9 cost.
{{4, 9}},
// Node E is connected to Node A with 17 cost, and Node D with 9 cost.
{{0, 17}, {3, 9}}
};
This adjacency list is a 3D array, where each row in the array represents a node, and each subarray represents a connection and cost. Again, you could use a List or a Map instead of an array.
The important thing to understand is that all of these representations reflect the same underlying graph!
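For example, you can ask how much a particular path costs using any of these representations. Here's a sketch that uses the adjacency matrix, where nodes A through E map to indexes 0 through 4. The `PathCost` class is a name made up for this example, and returning -1 for a missing connection is an arbitrary choice.

```java
class PathCost {
  // Adds up the cost of following a path through an adjacency matrix.
  // Returns -1 if any step has no connection (0 in this matrix).
  static int cost(int[][] graph, int[] path) {
    int total = 0;
    for (int i = 0; i < path.length - 1; i++) {
      // Look up the cost from this node to the next node in the path.
      int stepCost = graph[path[i]][path[i + 1]];
      if (stepCost == 0) {
        return -1;
      }
      total += stepCost;
    }
    return total;
  }
}
```

With the matrix above, the path A -> B -> D -> E is `{0, 1, 3, 4}` and costs 5 + 2 + 9 = 16. The path A -> D returns -1, because A is not connected directly to D.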
Graphs are used for many types of problem:
These problems generally require building answers by traversing the graph. Similar to trees, you can traverse a graph depth-first or breadth-first, recursively or iteratively.
Remember that depth-first search can be implemented recursively or with a stack, and breadth-first search can be implemented using a queue. One catch is that if you’re working with a graph that contains cycles, you also need to keep track of the nodes you already visited to avoid infinite loops.
For example, say you wanted to write a function that checks whether a bidirectional graph contains a path from one node to another.
You could do that with an adjacency matrix:
public boolean containsPath(boolean[][] graph, boolean[] visited, int startNode, int endNode) {
// ...
}
This function takes four arguments:
graph is an adjacency matrix. If graph[2][3] is true, that means node 2 is connected to node 3.
visited is an array containing true for the node indexes that have already been visited. The first time this function is called, visited is filled with false values.
startNode is the index of the starting node.
endNode is the index of the ending node.
To implement depth-first search, you could call this function recursively to traverse the graph:
boolean containsPath(boolean[][] graph, boolean[] visited, int source, int destination) {
// Base case. If the source is the destination, the algorithm has found a path.
if(source == destination) {
return true;
}
// Base case. If the source was already visited, stop exploring in this direction.
if(visited[source]) {
return false;
}
// Mark the source node as visited.
visited[source] = true;
// Check for connections from source to other nodes.
for(int i = 0; i < graph[source].length; i++) {
// If source is connected to node i, try exploring in the direction of node i.
if(graph[source][i] && containsPath(graph, visited, i, destination)) {
return true;
}
}
// Couldn't find any paths to destination.
return false;
}
This recursive function has two base cases: one if the source and destination are equal, and one if source has already been visited. Then the code marks the current source as visited, and recursively calls the function with every connected neighbor.
Try stepping through this code with a few example graphs!
You can also do the same thing with an object-based graph:
class Node {
int index;
List<Node> neighbors;
}
boolean containsPath(List<Node> graph, Set<Node> visited, Node source, Node destination) {
// ...
}
Now, instead of an adjacency matrix, the function takes these arguments:
graph is a list of Node instances. Each Node stores its own connections.
visited is a Set that contains the already-visited nodes.
source is the starting node.
destination is the destination node.
The rest of the code is pretty similar to the adjacency matrix function, except now instead of looping through each index in the table row, the code loops through the current node’s neighbors:
boolean containsPath(List<Node> graph, Set<Node> visited, Node source, Node destination) {
// Base case. If the source is the destination, the algorithm has found a path.
if(source == destination) {
return true;
}
// Base case. If the source was already visited, stop exploring in this direction.
if(visited.contains(source)) {
return false;
}
// Mark the source node as visited.
visited.add(source);
// Explore each connection from the current node.
for(Node neighbor : source.neighbors) {
if(containsPath(graph, visited, neighbor, destination)) {
return true;
}
}
// Couldn't find any paths to destination.
return false;
}
And here’s the same problem, using breadth-first search:
boolean containsPath(List<Node> graph, Node source, Node destination) {
// Keep track of which nodes were visited already.
Set<Node> visited = new HashSet<>();
// Breadth-first search queue of nodes.
Queue<Node> nodeQueue = new ArrayDeque<>();
nodeQueue.add(source);
while(!nodeQueue.isEmpty()) {
Node currentNode = nodeQueue.remove();
// If the current node is the destination, the algorithm found a path.
if(currentNode == destination) {
return true;
}
// If the current node was already visited, skip it.
if(visited.contains(currentNode)){
continue;
}
// Mark the current node as visited.
visited.add(currentNode);
// Explore the current node's neighbors.
nodeQueue.addAll(currentNode.neighbors);
}
// Couldn't find any paths to destination.
return false;
}
Instead of using a Node class, you could use an adjacency list:
boolean containsPath(List<List<Integer>> graph, Set<Integer> visited, int source, int destination) {
// Base case. If the source is the destination, the algorithm has found a path.
if(source == destination) {
return true;
}
// Base case. If the source was already visited, stop exploring in this direction.
if(visited.contains(source)) {
return false;
}
// Mark the source node as visited.
visited.add(source);
// Explore each connection from the current node.
for(int neighbor : graph.get(source)) {
if(containsPath(graph, visited, neighbor, destination)) {
return true;
}
}
// Couldn't find any paths to destination.
return false;
}
Again, this code is very similar to the Node object-based approach, but it stores the connections in an adjacency list instead of a list of neighboring nodes.
With graph problems, half of the battle is taking the input data and converting it into a more usable format, like an adjacency matrix or a set of Node instances.
For example, this problem is on LeetCode as Find if Path Exists in Graph. The input of that function is an array of edges, where each edge connects two node indexes.
public boolean validPath(int n, int[][] edges, int source, int destination) {
// ...
}
By itself, that representation isn’t very useful. So the trick to the problem is converting the input into a more useful format.
This example builds an adjacency matrix and then calls the recursive function from above:
public boolean validPath(int n, int[][] edges, int source, int destination) {
// Build an adjacency matrix.
boolean[][] graph = new boolean[n][n];
for(int[] edge : edges){
graph[edge[0]][edge[1]] = true;
graph[edge[1]][edge[0]] = true;
}
// Call the recursive containsPath function.
return containsPath(graph, new boolean[n], source, destination);
}
This example builds a List of Node instances and then calls the depth-first search or breadth-first search function from before:
class Node {
int index;
List<Node> neighbors = new ArrayList<>();
public Node(int index) {
this.index = index;
}
public void addNeighbor(Node neighbor){
neighbors.add(neighbor);
}
}
public boolean validPath(int n, int[][] edges, int source, int destination) {
// Build a graph from Node instances.
List<Node> graph = new ArrayList<>();
for(int i = 0; i < n; i++) {
graph.add(new Node(i));
}
for(int[] edge : edges){
graph.get(edge[0]).addNeighbor(graph.get(edge[1]));
graph.get(edge[1]).addNeighbor(graph.get(edge[0]));
}
// Call the recursive containsPath function.
return containsPath(graph, new HashSet<>(), graph.get(source), graph.get(destination));
}
Finally, this example builds an adjacency list before calling the containsPath() function from above:
public boolean validPath(int n, int[][] edges, int source, int destination) {
// Build an adjacency list.
List<List<Integer>> graph = new ArrayList<>();
for(int i = 0; i < n; i++) {
graph.add(new ArrayList<>());
}
for(int[] edge : edges){
graph.get(edge[0]).add(edge[1]);
graph.get(edge[1]).add(edge[0]);
}
// Call the recursive containsPath function.
return containsPath(graph, new HashSet<>(), source, destination);
}
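The conversions above all hand off to the recursive function, but you can mix and match. As a rough sketch, here is one more combination that isn't shown above: the same adjacency list conversion, followed by an iterative breadth-first search over node indexes. The class name and the sample inputs in the usage note are made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

class ValidPathBfs {
    static boolean validPath(int n, int[][] edges, int source, int destination) {
        // Build an adjacency list, the same way as the example above.
        List<List<Integer>> graph = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            graph.add(new ArrayList<>());
        }
        for (int[] edge : edges) {
            graph.get(edge[0]).add(edge[1]);
            graph.get(edge[1]).add(edge[0]);
        }

        // Breadth-first search over node indexes.
        boolean[] visited = new boolean[n];
        Queue<Integer> queue = new ArrayDeque<>();
        queue.add(source);
        visited[source] = true;
        while (!queue.isEmpty()) {
            int current = queue.remove();
            if (current == destination) {
                return true;
            }
            for (int neighbor : graph.get(current)) {
                // Mark nodes when they are queued, so each node is queued at most once.
                if (!visited[neighbor]) {
                    visited[neighbor] = true;
                    queue.add(neighbor);
                }
            }
        }
        // The queue emptied without reaching the destination.
        return false;
    }
}
```

For a triangle like {{0, 1}, {1, 2}, {2, 0}} this returns true for source 0 and destination 2. For a graph split into two islands, it returns false when the source and destination are on different islands.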
In an interview environment, if having the graph in a particular format would make your life easier, ask your interviewer if the input can be in that format to start with. The worst case scenario is that they say no, but the best case scenario is that the problem becomes a lot easier to solve!
Linked lists are composed of nodes. Each node contains a value and points to a single child node:
Trees are very similar, except each node can point to multiple child nodes!
In this example, node A points to two child nodes: node B and node C. Node B has three child nodes D, E, and F. Node C has a single child node G, which has two child nodes H and I.
Putting this into code, you might represent a tree using this Node class:
class Node {
Object value; // Can be int, String, etc.
List<Node> children;
}
Often, each node in a tree has at most two children. That kind of tree is called a binary tree. Binary trees can be represented by this Node class:
class Node {
Object value; // Can be int, String, etc.
Node leftChild;
Node rightChild;
}
Trees are useful for storing hierarchical data like directory structures, XML or HTML content, family relationships, or phylogenetic categorization. They’re also useful for storing data in a way that’s easier to navigate, or for representing decision-making processes.
Most other data structures we’ve studied are handy because they improve the efficiency of specific types of problems. And although trees can improve the efficiency of certain types of problems, their main benefit is unlocking a new way of thinking about problems in the first place.
For example, let’s say I have this directory structure on my computer:
C:/
  Desktop/
    Code/
      HelloWorld.java
      Test.java
    cat.jpg
  Documents/
    post.txt
    resume.pdf
You could probably represent this using an array if you really wanted to. But a tree data structure is much more natural to work with!
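To make that concrete, here is a minimal sketch that stores the directory listing in a tree and prints it back out. It assumes a particular nesting (Code/ and cat.jpg inside Desktop/, the two documents inside Documents/), and the add(), render(), and countNodes() helpers are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class DirectoryTreeDemo {
    // A tree node with any number of children, like the general Node class above.
    static class Node {
        String value;
        List<Node> children = new ArrayList<>();

        Node(String value) {
            this.value = value;
        }

        // Hypothetical convenience method: adds a child and returns it.
        Node add(String childValue) {
            Node child = new Node(childValue);
            children.add(child);
            return child;
        }
    }

    // Builds the example directory structure, using the assumed nesting.
    static Node buildExample() {
        Node root = new Node("C:/");
        Node desktop = root.add("Desktop/");
        Node code = desktop.add("Code/");
        code.add("HelloWorld.java");
        code.add("Test.java");
        desktop.add("cat.jpg");
        Node documents = root.add("Documents/");
        documents.add("post.txt");
        documents.add("resume.pdf");
        return root;
    }

    // Appends an indented listing to out, two spaces per level of depth.
    static void render(Node node, int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(node.value).append("\n");
        for (Node child : node.children) {
            render(child, depth + 1, out);
        }
    }

    // Counts every file and directory in the tree.
    static int countNodes(Node node) {
        int count = 1;
        for (Node child : node.children) {
            count += countNodes(child);
        }
        return count;
    }

    public static void main(String[] args) {
        StringBuilder out = new StringBuilder();
        render(buildExample(), 0, out);
        System.out.print(out);
    }
}
```

Both helpers are only a few lines long because the recursion follows the shape of the data. Doing the same thing with a flat array would mean tracking depths or parent indexes by hand.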
Remember that if you have an array of sorted data, you can use binary search to find the index of an element in O(log n) time complexity.
Binary search trees are designed with the same idea in mind. A binary search tree is a tree where each node has a value and two children, but with a very important property: every value underneath the left child is less than the parent node’s value, and every value underneath the right child is greater than the parent node’s value.
class Node {
int value;
// Values under this node are less than value.
Node leftChild;
// Values under this node are greater than value.
Node rightChild;
}
Here’s an example:
Like the name suggests, binary search trees are designed to make searching easier. To find 45 in the above tree, you’d follow this algorithm:
45 is less than 50, so traverse to the left child.
45 is greater than 20, so traverse to the right child.
45 is greater than 30, so traverse to the right child.
45 matches this node’s value, so the search is done. And it only took 4 steps!
Binary search trees are very common in interview questions, so make sure you’re familiar with them!
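The search steps above can be sketched in code. This example only rebuilds the nodes on the search path (50, 20, 30, and 45), not the whole tree from the picture, and the insert() and stepsToFind() helpers are made up for illustration.

```java
class BstSearchDemo {
    static class Node {
        int value;
        // Values under this node are less than value.
        Node leftChild;
        // Values under this node are greater than value.
        Node rightChild;

        Node(int value) {
            this.value = value;
        }
    }

    // Hypothetical helper: inserts a value while keeping the binary search tree property.
    static Node insert(Node root, int value) {
        if (root == null) {
            return new Node(value);
        }
        if (value < root.value) {
            root.leftChild = insert(root.leftChild, value);
        } else if (value > root.value) {
            root.rightChild = insert(root.rightChild, value);
        }
        return root;
    }

    // Returns how many nodes the search looks at before finding target, or -1 if it's absent.
    static int stepsToFind(Node root, int target) {
        int steps = 0;
        Node current = root;
        while (current != null) {
            steps++;
            if (target == current.value) {
                return steps;
            }
            // Smaller values live on the left, larger values on the right.
            current = target < current.value ? current.leftChild : current.rightChild;
        }
        return -1;
    }

    // Builds only the path from the example: 50, then 20, then 30, then 45.
    static Node examplePath() {
        Node root = null;
        for (int value : new int[] {50, 20, 30, 45}) {
            root = insert(root, value);
        }
        return root;
    }

    public static void main(String[] args) {
        System.out.println(stepsToFind(examplePath(), 45)); // 4
    }
}
```

Each comparison throws away an entire subtree, which is the same idea as binary search on a sorted array. That only pays off when the tree is reasonably balanced. This four-node path is deliberately lopsided, so it isn't a good model for performance.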
Remember: Binary search trees are common, but not every tree is a binary search tree, or even a binary tree! Trees come in all shapes and sizes. 🌲🌳🌴🎄🎋
No matter what kind of tree you have, you’ll have to traverse it by navigating from parent nodes to child nodes. There are a few different ways to do this, and being able to talk about them is a great way to impress your interviewer.
Here are a couple approaches for traversing a tree:
With depth-first search, you search through an entire child subtree before you search another child subtree. In other words, you search down before you search across.
Depth-first search is often implemented using recursion:
class Node {
Object value;
List<Node> children;
}
void depthFirstSearch(Node node) {
// Base case, just in case.
if (node == null) {
return;
}
// Do something with the current node.
System.out.println(node.value);
// Explore each subtree completely.
for (Node child : node.children) {
depthFirstSearch(child);
}
}
You can also use a stack to implement depth-first search iteratively:
void depthFirstSearchIterative(Node head) {
Deque<Node> nodeStack = new ArrayDeque<>();
nodeStack.push(head);
while (!nodeStack.isEmpty()) {
Node node = nodeStack.pop();
// Do something with the current node.
System.out.println(node.value);
// Add each child to the stack.
for (Node child : node.children) {
nodeStack.push(child);
}
}
}
Depth-first search is handy when you know you need to search the entire tree.
Depth-first search traverses the tree by exploring one child’s subtree before exploring another child’s subtree.
But you can specify the order that you visit children. Here are a few common techniques:
The order that you visit nodes depends on the problem. In an interview, talk through your options and the pros and cons of each approach.
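As a rough illustration, the orderings people usually mean here are pre-order, in-order, and post-order. I'm assuming those three and a binary tree; the only difference between them is where the "do something with the current node" line sits relative to the recursive calls. The three-node example tree is made up.

```java
import java.util.ArrayList;
import java.util.List;

class TraversalOrderDemo {
    static class Node {
        int value;
        Node leftChild;
        Node rightChild;

        Node(int value, Node leftChild, Node rightChild) {
            this.value = value;
            this.leftChild = leftChild;
            this.rightChild = rightChild;
        }
    }

    // Pre-order: visit the node, then its left subtree, then its right subtree.
    static void preOrder(Node node, List<Integer> out) {
        if (node == null) {
            return;
        }
        out.add(node.value);
        preOrder(node.leftChild, out);
        preOrder(node.rightChild, out);
    }

    // In-order: visit the left subtree, then the node, then the right subtree.
    static void inOrder(Node node, List<Integer> out) {
        if (node == null) {
            return;
        }
        inOrder(node.leftChild, out);
        out.add(node.value);
        inOrder(node.rightChild, out);
    }

    // Post-order: visit both subtrees before the node itself.
    static void postOrder(Node node, List<Integer> out) {
        if (node == null) {
            return;
        }
        postOrder(node.leftChild, out);
        postOrder(node.rightChild, out);
        out.add(node.value);
    }

    // Hypothetical example tree: 2 at the root, 1 on the left, 3 on the right.
    static Node exampleTree() {
        return new Node(2, new Node(1, null, null), new Node(3, null, null));
    }

    public static void main(String[] args) {
        List<Integer> pre = new ArrayList<>();
        List<Integer> in = new ArrayList<>();
        List<Integer> post = new ArrayList<>();
        preOrder(exampleTree(), pre);
        inOrder(exampleTree(), in);
        postOrder(exampleTree(), post);
        System.out.println(pre);  // [2, 1, 3]
        System.out.println(in);   // [1, 2, 3]
        System.out.println(post); // [1, 3, 2]
    }
}
```

One thing worth mentioning in an interview: an in-order traversal of a binary search tree visits the values in sorted order.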
Breadth-first search traverses the nodes in each layer of the tree before traversing to the next layer.
Breadth-first search is generally implemented using a queue:
void breadthFirstSearch(Node head) {
Deque<Node> nodeQueue = new ArrayDeque<>();
nodeQueue.add(head);
while(!nodeQueue.isEmpty()){
Node node = nodeQueue.remove();
// Do something with the current node.
System.out.println(node.value);
// Add each child to the queue.
for(Node child : node.children) {
nodeQueue.add(child);
}
}
}
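To compare the two traversals side by side, here is a sketch that builds the example tree from earlier (A points to B and C, B points to D, E, and F, C points to G, and G points to H and I) and records the order each search visits the nodes. The varargs constructor and the list-collecting versions of the functions are made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class TreeOrderDemo {
    static class Node {
        String value;
        List<Node> children = new ArrayList<>();

        // Hypothetical convenience constructor: takes the value and any number of children.
        Node(String value, Node... kids) {
            this.value = value;
            for (Node kid : kids) {
                children.add(kid);
            }
        }
    }

    // The example tree described earlier in this tutorial.
    static Node exampleTree() {
        Node b = new Node("B", new Node("D"), new Node("E"), new Node("F"));
        Node c = new Node("C", new Node("G", new Node("H"), new Node("I")));
        return new Node("A", b, c);
    }

    // Recursive depth-first search that records each visited value.
    static void depthFirst(Node node, List<String> visited) {
        if (node == null) {
            return;
        }
        visited.add(node.value);
        for (Node child : node.children) {
            depthFirst(child, visited);
        }
    }

    // Queue-based breadth-first search that records each visited value.
    static List<String> breadthFirst(Node head) {
        List<String> visited = new ArrayList<>();
        Deque<Node> nodeQueue = new ArrayDeque<>();
        nodeQueue.add(head);
        while (!nodeQueue.isEmpty()) {
            Node node = nodeQueue.remove();
            visited.add(node.value);
            nodeQueue.addAll(node.children);
        }
        return visited;
    }

    public static void main(String[] args) {
        List<String> depthOrder = new ArrayList<>();
        depthFirst(exampleTree(), depthOrder);
        System.out.println(depthOrder);                  // [A, B, D, E, F, C, G, H, I]
        System.out.println(breadthFirst(exampleTree())); // [A, B, C, D, E, F, G, H, I]
    }
}
```

Depth-first search finishes everything under B before it ever looks at C, while breadth-first search visits B and C together because they're in the same layer.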
Tree problems come in many shapes and sizes, but you can approach them with a few techniques.
Remember that interviews are meant to be conversations, so try to talk through what you’re thinking. Even naming various techniques, like mentioning that you’re thinking about depth-first search vs breadth-first search, can score you bonus points!
That might sound a little broad, but I’m specifically talking about linked lists, trees, and graphs.
This tutorial walks through the first of those: linked lists!
Let’s say you have an ArrayList that contains 1000 elements:
List<String> list = new ArrayList<>();
for(int i = 0; i < 1000; i++) {
list.add(String.valueOf(i));
}
Now let’s say you want to remove the element at index 5:
int indexToRemove = 5;
list.remove(indexToRemove);
What’s the algorithmic complexity?
Removing an element from an ArrayList is linear O(n) time, because you need to shift every subsequent element to the previous index. That means if a list contains a million elements, you might need to move almost a million elements every time you delete one single element!
Similarly, let’s say you want to insert a new element at index 3:
int indexToInsert = 3;
String valueToInsert = "this little maneuver is gonna cost us O(n) time";
list.add(indexToInsert, valueToInsert);
What’s the algorithmic complexity?
Again, when you insert an element into an ArrayList, you incur linear O(n) complexity, because you need to shift every subsequent element up by an index.
You might think about using other data structures. You might use a HashSet, but they can’t contain duplicates and don’t maintain insertion order. You might use a Queue or a Stack, but they only provide access to the first and last elements.
This is where linked lists come in handy!
Linked lists are represented by a core object, generally called a Node. A linked list is defined by a head node, which contains a value and points to the next node.
class Node {
Node next;
Object value; // value can be int, String, etc
}
This might not look like much, but this lets you create data structures of arbitrary size, just from nodes that point to each other.
In this example, 42 is the head node, and its next value points to the node with a value of 67. That node points to the third node, which contains its own value and its own next node. That pattern continues for the whole list, until the last node (usually called the tail node) which has a next node that points to null.
The above example Node class sets up a singly linked list because each node points to a single next or child node.
Another common pattern is for each Node to also point to its previous or parent node.
Visualizing it looks like this:
And putting it into code looks like this:
class Node {
Node next;
Node prev;
Object value;
}
Circular linked lists are linked lists where the tail node points back to the head node:
This can be a handy pattern, but make sure the cycle is intentional! See the Two Pointers section below for more info.
In any case, now that you have a linked list, you can insert a new node by pointing its parent to the new node, and pointing the new node’s next to the node that was the parent’s next node.
That might sound confusing, but visualizing it looks like this:
In this example, a new value 362 is being inserted between 67 and 3.
And putting it into code looks like this:
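void insert(Node parent, Object newValue) {
  Node newNode = new Node();
  newNode.value = newValue;
  newNode.prev = parent;
  newNode.next = parent.next;
  // The parent might be the tail, which has no next node.
  if(parent.next != null) {
    parent.next.prev = newNode;
  }
  parent.next = newNode;
}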
Inserting a node is now constant O(1) time, because the only thing that changes is a couple references to next nodes.
Similar to adding a node, if you want to remove a Node, you can point its parent’s next at the node that comes after it.
Visualizing it looks like this:
In this example, the 3 node is being removed. The only thing that needs to change is the next node that the 67 node points to!
And putting it into code looks like this:
void remove(Node node) {
  Node parent = node.prev;
  Node child = node.next;
  // The tail has no child, and the head has no parent.
  if(child != null) {
    child.prev = parent;
  }
  if(parent != null) {
    parent.next = child;
  }
}
This assumes you have a doubly linked list. For an extra challenge, try it with a singly linked list! Hint: You can still do this in O(1) time!
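To see insertion and removal working together, here is a small doubly linked sketch that starts with 42, 67, and 3, inserts 362 after 67, and then removes 3. The Node constructor, the describe() helpers, and the null checks for the head and tail are my own additions for illustration.

```java
class DoublyLinkedDemo {
    static class Node {
        Node next;
        Node prev;
        Object value;

        // Hypothetical convenience constructor.
        Node(Object value) {
            this.value = value;
        }
    }

    // Inserts a new node directly after parent by updating a constant number of references.
    static void insert(Node parent, Object newValue) {
        Node newNode = new Node(newValue);
        newNode.prev = parent;
        newNode.next = parent.next;
        // The parent might be the tail, which has no next node.
        if (parent.next != null) {
            parent.next.prev = newNode;
        }
        parent.next = newNode;
    }

    // Unlinks node from its neighbors, handling the head and tail cases.
    static void remove(Node node) {
        if (node.next != null) {
            node.next.prev = node.prev;
        }
        if (node.prev != null) {
            node.prev.next = node.next;
        }
    }

    // Walks the next references and joins the values, like "42 -> 67 -> 3".
    static String describe(Node head) {
        StringBuilder out = new StringBuilder();
        for (Node node = head; node != null; node = node.next) {
            if (out.length() > 0) {
                out.append(" -> ");
            }
            out.append(node.value);
        }
        return out.toString();
    }

    // Walks the prev references instead, to check that they were updated too.
    static String describeBackward(Node tail) {
        StringBuilder out = new StringBuilder();
        for (Node node = tail; node != null; node = node.prev) {
            if (out.length() > 0) {
                out.append(" -> ");
            }
            out.append(node.value);
        }
        return out.toString();
    }

    // Builds the starting list: 42 -> 67 -> 3.
    static Node exampleList() {
        Node head = new Node(42);
        insert(head, 67);
        insert(head.next, 3);
        return head;
    }

    public static void main(String[] args) {
        Node head = exampleList();
        Node node67 = head.next;
        Node node3 = node67.next;
        insert(node67, 362);
        System.out.println(describe(head)); // 42 -> 67 -> 362 -> 3
        remove(node3);
        System.out.println(describe(head)); // 42 -> 67 -> 362
    }
}
```

Neither function contains a loop, which is why both run in constant time no matter how long the list is.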
Linked lists are composed of individual Node instances that point to each other. You can’t know ahead of time which Node will be 6th, 7th, 8th, etc, which means you can’t access a linked list by index.
(Well, you can, but it would require looping through every node until you get to that index!)
Instead, you’ll generally iterate through a linked list. That looks something like this:
// Assume head is the head Node of a linked list
Node node = head;
// Loop until you get to the tail
while(node != null) {
// Do something with each Node
System.out.println(node.value);
// Iterate to the next Node
node = node.next;
}
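Here is a runnable sketch of that loop, along with the "access by index" workaround mentioned above. The three-node list (42, 67, 3), the two-argument constructor, and the length() and getAt() helpers are made up for illustration.

```java
class LinkedListIterationDemo {
    static class Node {
        Node next;
        Object value;

        // Hypothetical convenience constructor: takes the value and the next node.
        Node(Object value, Node next) {
            this.value = value;
            this.next = next;
        }
    }

    // Builds the list 42 -> 67 -> 3, working backward from the tail.
    static Node exampleList() {
        return new Node(42, new Node(67, new Node(3, null)));
    }

    // Counts the nodes by iterating until the loop falls off the tail.
    static int length(Node head) {
        int count = 0;
        for (Node node = head; node != null; node = node.next) {
            count++;
        }
        return count;
    }

    // Returns the value at index, or null if the list is too short.
    // This takes O(n) time, because it has to walk past every earlier node.
    static Object getAt(Node head, int index) {
        Node node = head;
        for (int i = 0; i < index && node != null; i++) {
            node = node.next;
        }
        return node == null ? null : node.value;
    }

    public static void main(String[] args) {
        Node head = exampleList();
        System.out.println(length(head));   // 3
        System.out.println(getAt(head, 1)); // 67
    }
}
```

Compare getAt() to indexing into an array, which is constant time. That's the trade-off you accept in exchange for cheap insertion and removal.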
Linked lists are very common in technical interview questions. I think this is because linked lists require object oriented programming and work in pretty much every language.
Luckily, there are a few common patterns that come in handy when you’re working with linked lists!
The main benefit of linked lists is that you can modify the list (inserting and removing elements) by adjusting a constant number of references.
You saw insert() and remove() above:
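void insert(Node parent, Object newValue) {
  Node newNode = new Node();
  newNode.value = newValue;
  newNode.prev = parent;
  newNode.next = parent.next;
  // The parent might be the tail, which has no next node.
  if(parent.next != null) {
    parent.next.prev = newNode;
  }
  parent.next = newNode;
}
void remove(Node node) {
  Node parent = node.prev;
  Node child = node.next;
  // The tail has no child, and the head has no parent.
  if(child != null) {
    child.prev = parent;
  }
  if(parent != null) {
    parent.next = child;
  }
}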
If you find yourself iterating over every node in a list and shifting a ton of values around, ask yourself if you could get away with modifying a few references!
For example, Delete Node in a Linked List challenges you to delete a node in a singly linked list, without a reference to the previous node.
You might be tempted to iterate over every node, shift each value up, and then delete the last node:
class Solution {
public void deleteNode(ListNode node) {
// Keep track of the previous node so
// you can delete the tail node.
ListNode prevNode = node;
// Iterate over every node.
while(node.next != null) {
// Shift the next value up.
node.val = node.next.val;
// Remember the previous node.
prevNode = node;
// Iterate to the next node.
node = node.next;
}
// At this point, prevNode points to the
// tail node's parent. This deletes the
// tail node, which is no longer needed.
prevNode.next = null;
}
}
This is exactly what you’d do if you were working with arrays! But this incurs a linear O(n) complexity, because it iterates over the entire list.
Instead, you can “think in linked lists”: copy the next node’s value into the current node, and then change a single reference so the list skips over the next node:
class Solution {
public void deleteNode(ListNode node) {
node.val = node.next.val;
node.next = node.next.next;
}
}
Now this is constant O(1) time, because no matter how long the list is, you’re only doing a constant amount of work!
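Here is a quick sketch of that trick in action. The ListNode class below is a minimal stand-in for the one the problem provides (I'm assuming an int val and a next reference), and the fromValues() and describe() helpers and the sample values are made up for illustration.

```java
class DeleteNodeDemo {
    // Minimal stand-in for the ListNode class that the problem provides (assumed shape).
    static class ListNode {
        int val;
        ListNode next;

        ListNode(int val) {
            this.val = val;
        }
    }

    // Copies the next node's value into this node, then skips over the next node.
    // This relies on node not being the tail, because the tail has no next node to copy from.
    static void deleteNode(ListNode node) {
        node.val = node.next.val;
        node.next = node.next.next;
    }

    // Hypothetical helper: builds a singly linked list from the given values.
    static ListNode fromValues(int... values) {
        ListNode dummy = new ListNode(0);
        ListNode tail = dummy;
        for (int value : values) {
            tail.next = new ListNode(value);
            tail = tail.next;
        }
        return dummy.next;
    }

    // Hypothetical helper: joins the values, like "4 -> 1 -> 9".
    static String describe(ListNode head) {
        StringBuilder out = new StringBuilder();
        for (ListNode node = head; node != null; node = node.next) {
            if (out.length() > 0) {
                out.append(" -> ");
            }
            out.append(node.val);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        ListNode head = fromValues(4, 5, 1, 9);
        // Delete the node holding 5, without ever touching the head.
        deleteNode(head.next);
        System.out.println(describe(head)); // 4 -> 1 -> 9
    }
}
```

Strictly speaking, the node object that gets unlinked is the one after the node you passed in. From the outside the list looks the same as if the original node had been deleted, and that's all the problem cares about.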
That being said, there are plenty of problems where you need to iterate over every element in a linked list. Tasks like finding a particular node or value, reversing a linked list, or detecting cycles all require iterating over the linked list.
Depending on the problem, you might iterate over the list with a while or for loop, or you might use recursion.
For example, let’s say you wanted to write a function that determines whether a linked list contains a value.
Here’s an iterative approach:
boolean linkedListContainsValue(Node node, Object value) {
while (node != null) {
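// Objects.equals() (from java.util.Objects) compares contents; == only compares references.
if(Objects.equals(node.value, value)) {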
return true;
}
node = node.next;
}
return false;
}
This function uses a while loop to iterate over every node in the linked list, until it finds one with the target value. If the while loop exits, that means no node contains the value, so it returns false.
And here’s the same function, this time using recursion:
boolean linkedListContainsValue(Node node, Object value) {
if (node == null) {
return false;
}
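// Objects.equals() (from java.util.Objects) compares contents; == only compares references.
if (Objects.equals(node.value, value)) {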
return true;
}
return linkedListContainsValue(node.next, value);
}
This recursive function sets up two base cases: if node is null, that means the function recursed off the end of the list without finding the value. If the node’s value is the target, then we’ve obviously found the value. Otherwise, the function recursively calls itself with the next node in the list.
Similar to what we discussed in the recursion tutorial, choosing between recursion and iteration often comes down to personal preference and how your brain organizes information. Keep both approaches in mind as tools you might use to solve problems!
Back in the arrays tutorial we talked about a technique that involved using two indexes to iterate through an array instead of relying on a single index. This comes in handy for searching through and processing an array in a single pass, instead of needing to iterate multiple times. In other words, two pointer techniques let you process an array in linear O(n) time instead of quadratic O(n ^ 2) time.
You can use a similar approach with linked lists!
For example, there’s a common problem of cycles in linked lists, where a node’s next variable points to a previous node:
Cycles are a problem for linked lists, because they mean iterating over the list will never exit. How would you detect a cycle?
You might add a boolean visited property to your Node class, and set it to true as you iterate. If you get to a node with a visited property that’s already true, then you’ve detected a cycle.
boolean containsCycle(Node node) {
while(node != null){
if(node.visited){
return true;
}
node.visited = true;
node = node.next;
}
return false;
}
But what if you couldn’t modify the underlying data structure?
You might also use another data structure, like a HashSet that contains the nodes you’ve visited. If you get to a node that’s already in the HashSet, then you’ve detected a cycle.
boolean containsCycle(Node node) {
Set<Node> visited = new HashSet<>();
while(node != null){
if(visited.contains(node)){
return true;
}
visited.add(node);
node = node.next;
}
return false;
}
But what if you couldn’t create any extra data structures?
Instead, you could use two pointers that move through the linked list at different speeds: one slow, and one fast. If the fast pointer ever “laps” the slow pointer and reaches it again, then you know there’s a cycle!
public boolean hasCycle(Node head) {
  // slow moves one node at a time
  Node slow = head;
  // fast moves two nodes at a time
  Node fast = head;
  // iterate until fast falls off the end of the list
  while(fast != null && fast.next != null){
    // move by one node
    slow = slow.next;
    // move by two nodes
    fast = fast.next.next;
    // if fast catches up with slow, list contains a cycle
    if(slow == fast){
      return true;
    }
  }
  // if the loop exits, no cycle was found
  return false;
}
This runs in O(n) time, and does not require modifying the underlying data or relying on any other data structures!
More formally, this is called Floyd’s cycle-finding algorithm, but the overall idea of using two pointers applies to many problems!
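As one example of another problem the same idea applies to, here is a sketch that finds the middle node of a list in a single pass. This isn't cycle detection, just the same slow and fast pointers put to a different use. The Node class with an int value and the fromValues() helper are made up for illustration.

```java
class MiddleNodeDemo {
    static class Node {
        Node next;
        int value;

        Node(int value) {
            this.value = value;
        }
    }

    // Returns the middle node, or null for an empty list.
    // For lists with an even number of nodes, this returns the second of the two middle nodes.
    static Node middle(Node head) {
        Node slow = head;
        Node fast = head;
        // fast moves twice as quickly, so when it reaches the end, slow is halfway there.
        while (fast != null && fast.next != null) {
            slow = slow.next;
            fast = fast.next.next;
        }
        return slow;
    }

    // Hypothetical helper: builds a singly linked list from the given values.
    static Node fromValues(int... values) {
        Node head = null;
        // Build from the back so each new node can point at the one after it.
        for (int i = values.length - 1; i >= 0; i--) {
            Node node = new Node(values[i]);
            node.next = head;
            head = node;
        }
        return head;
    }

    public static void main(String[] args) {
        System.out.println(middle(fromValues(1, 2, 3, 4, 5)).value); // 3
        System.out.println(middle(fromValues(1, 2, 3, 4)).value);    // 3
    }
}
```

Without two pointers, you'd iterate once to count the nodes and then a second time to walk halfway. Both versions are O(n), but the two pointer version only needs one pass.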
Earlier, I said that linked lists are not built into the language, but rather rely on objects like the Node class.
That’s mostly true in the context of interviewing, because many questions are based on manipulating nodes directly. But it’s also an oversimplification!
Java does contain a few linked list data structures:
LinkedList uses an internal Node class to provide efficient insertion and deletion.
If you read the documentation for Java’s LinkedList class, you might expect the add(int index, E element) and remove(Object o) functions to provide efficient insertion and deletion. But that’s wrong!
Those functions do not run in constant O(1) time. They run in linear O(n) time! What’s going on?
It turns out that those functions really do two things: first, they iterate to the correct node, and then they perform the insertion or deletion. The insertion or deletion by itself is constant O(1) time, but the iteration costs linear O(n) time.
The benefits of Java’s LinkedList class are best taken advantage of in conjunction with another class: ListIterator.
For example, let’s say you wanted to take in a List of String values, and insert "world" after every occurrence of "hello" in the list.
void addWorld(ArrayList<String> list) {
for(int i = 0; i < list.size(); i++) {
String word = list.get(i);
if(word.equals("hello")) {
list.add(i + 1, "world");
}
}
}
What’s the algorithmic complexity?
This code iterates over the entire list, and then for each element it potentially calls list.add(i + 1, "world") which internally shifts every subsequent element to the next index. That gives the whole function an algorithmic complexity of quadratic O(n ^ 2) time!
Here’s the same thing with a LinkedList and a ListIterator:
void addWorld(LinkedList<String> list) {
ListIterator<String> iterator = list.listIterator();
while(iterator.hasNext()) {
String word = iterator.next();
if(word.equals("hello")) {
iterator.add("world");
}
}
}
Now the code uses a ListIterator to iterate over the entire list, and it calls iterator.add("world") which relies on the underlying efficiency of the LinkedList class. This gives the whole function an algorithmic complexity of linear O(n) time!
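Here is a runnable sketch of the ListIterator version, plus a matching removal function that uses iterator.remove() the same way. The removeWord() function and the sample words are made up for illustration.

```java
import java.util.Arrays;
import java.util.LinkedList;
import java.util.ListIterator;

class AddWorldDemo {
    // Inserts "world" after every "hello", doing constant work per element.
    static void addWorld(LinkedList<String> list) {
        ListIterator<String> iterator = list.listIterator();
        while (iterator.hasNext()) {
            String word = iterator.next();
            if (word.equals("hello")) {
                // Inserts right after the element that next() just returned.
                // The following call to next() skips over the inserted element.
                iterator.add("world");
            }
        }
    }

    // Hypothetical companion function: removes every occurrence of target.
    static void removeWord(LinkedList<String> list, String target) {
        ListIterator<String> iterator = list.listIterator();
        while (iterator.hasNext()) {
            if (iterator.next().equals(target)) {
                // Removes the element that next() just returned.
                iterator.remove();
            }
        }
    }

    public static void main(String[] args) {
        LinkedList<String> words = new LinkedList<>(Arrays.asList("hello", "there", "hello"));
        addWorld(words);
        System.out.println(words); // [hello, world, there, hello, world]
        removeWord(words, "hello");
        System.out.println(words); // [world, there, world]
    }
}
```

The iterator keeps track of its position in the list, so it never has to walk from the head to find where the change goes. That's what makes each add() and remove() call constant time here.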
Java also contains other classes that rely on linked lists behind the scenes. Specifically, LinkedHashSet and LinkedHashMap both use a linked list to maintain insertion order.
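As a small sketch of where that insertion order helps, here are two functions: one removes duplicates from a list while keeping the first occurrence of each word in place, and one counts words while remembering the order they first appeared. The function names and sample words are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class InsertionOrderDemo {
    // Removes duplicates, keeping each word at the position of its first occurrence.
    static List<String> dedupe(List<String> words) {
        // A LinkedHashSet ignores repeated values but iterates in insertion order.
        Set<String> seen = new LinkedHashSet<>(words);
        return new ArrayList<>(seen);
    }

    // Counts each word. The keys iterate in the order the words first appeared.
    static Map<String, Integer> countWords(List<String> words) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String word : words) {
            // Starts the count at 1, or adds 1 to the existing count.
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("pear", "apple", "pear", "fig");
        System.out.println(dedupe(words));     // [pear, apple, fig]
        System.out.println(countWords(words)); // {pear=2, apple=1, fig=1}
    }
}
```

With a plain HashSet or HashMap the results would contain the same values, but the iteration order wouldn't be guaranteed.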
These data structures can come in handy in interviews, but it’s more common to be asked to work with nodes directly!