On StackOverflow, I was asked what the Java hashCode method does, but I did not have enough room to answer. I decided to write a blog article about the topic. The hashCode method is a part of the Object class. This means that every single class in Java has a hashCode method. To explain what it does, I’m going to use an analogy:
Lets imagine you go to the mall and you’re trying to find a movie poster of your favorite movie. The store has one of those wall poster displays that looks like this:
You want to know if there is a poster for the movie, “Wall-e”, but you notice that the posters are not ordered in any way. How do you find it? Well, assuming you want to be thorough, you’d start at the beginning and flip through each frame until you find that poster. (To relate this back to computer science terms, the wall poster display is an array of Poster objects, and you are using a for loop to iterate through the array.)
This is an acceptable way to find a poster under certain conditions, but it does have some performance concerns. For example, what if there is no “Wall-e” poster? Well, you’d waste a lot of time looking through every poster only to realize that it’s not there.
Lets pretend that at this point you notice the outside edge of each poster frame has a number. The first poster frame is labeled “0”, the second is labeled “1”, the third is labeled “2”, and so on. You turn to the cashier and ask, “What do these numbers on the side mean?”
He replies, “If you’re looking for a particular poster, just tell me the name and I’ll tell you the number. That way you don’t have to flip through every frame and look at every poster. You can just skip right to the number”.
“OK”, you say, “I’m looking for a ‘Wall-e’ poster”.
In the blink of an eye he tells you, “That’s number 53”.
Because he told you this, you don’t have to flip through the display at all, you can see all the numbers on the side so you just skip to 53. You decide you also want to buy a poster for “The Usual Suspects” and he tells you that’s number 103. But when you flip to that part, you notice there’s no posters in that frame.
He shrugs and says, “I guess we’re either out of that one or we never had it”.
In this story, the cashier is acting just like the hashCode method. You ask him for a number based off of an object (like a “Wall-e” poster). That’s a part of what hashCode does: It gives you a number for an object. But as you’ve seen in the story, there’s a little more to it than that. I’ll go into depth about these details below:
The hashCode method must always return the same number for equal objects. For example, if I ask for “The Usual Suspects” once and I get one number and then I ask for it again and get another, I won’t know what to believe: If the poster in the display were missing the first time, that wouldn’t tell you anything because you could ask him again and he’d tell you a different number to look at. It’s because the number is always the same for “The Usual Suspects” that you know exactly where it should be. If the number changed every time, you’d never know if it’s missing or you have to ask again.
Another subtle rule about hashCode is that the numbers are allowed to overlap for different objects. For example, if you ask the cashier the number for “The Godfather” poster, he may say, “It’s number 53”.
“Wait a second”, you think, “isn’t that the same number for ‘Wall-e’?” You flip to 53 again and realize that there are two sides to each frame. “Wall-e” is on the left side, “The Godfather” is on the right. In Java, hashCode values are allowed to overlap just like this, too.
I just want to point out that the analogy breaks down a little bit here because the wall poster display can only put 2 posters in one frame. But in Java, you can have unlimited posters in one frame. This is because in Java, the “frame” could be an array or a list of any size you want. You really want to reduce the number of posters you put in one frame though because you have to look through each one. Using the example above, both “Wall-e” and “The Godfather” were in frame number 53. Imagine if there were thousands of things in frame number 53. It would take a very long time to look through all of them. It would be much quicker if only one poster were in number 53. That’s why a good hashCode method will have as little overlap as possible.
So you may be wondering, why does hashCode allow you to have overlapping numbers for different objects if it can take more time to find what you’re looking for when the objects overlap? The reason is because it’s actually pretty difficult to come up with a perfect hashCode algorithm that returns a different number for each unique object. But, it turns out it’s much easier to come up with a pretty good algorithm: One that usually returns a unique number for each unique object but sometimes overlaps.
In Java terms, you may be wondering what the wall poster display represents. It represents a certain kind of collection. A few of these are named things like HashSet and HashMap . These collections use the hashCode of the objects you pass into them just like wall poster display does. (Warning, the following is an oversimplification) If you put an object that has a hashCode that returns 5 into one of these collections, it will put that object inside a list of an internal array at index number 5. Then you can later ask the HashSet if it contains that object by running this code:
1 |
boolean result = hashSet.contains(myObject); |
Internally, the HashSet will see if it has anything in the list at index 5. If it does, it will iterate each element then check to see if objectPassedIntoTheContainsMethod.equals(myObjectAtIndex5) and return the result.
Conclusion
Hopefully, this gives you a good idea of what a hashCode does and why it is useful. My goal is to make this very easy to understand, but not 100% thorough. If you’d like to have a more in depth understanding of how hashCode works, I’m hoping you will find the other material out there more palatable now that you’ve read this article. If you have any questions, don’t hesitate to ask in a comment or send me an email.