How HashSet works internally in Java

Hello Friends,

In this tutorial,we will see how java.util.HashSet works internally.

I would strongly recommend anybody reading this tutorial to read first How HashMap works Internally in java

HashSet is backed by a HashMap.What this statement means is that whenever we create instance of HashSet,an instance of HashMap is created in background.

HashSet class has reference variable of HashMap class as instance variable,which is declared as below

private transient HashMap<E,Object> map;

So basically whenever we instantiate HashSet using any of it's constructor,this map will be instantiated with HashMap object.





Let's look at the constructors of HashSet to check what happens when we instantiate(create instance of HashSet class).

Constructor 1 : No argument Constructor
   /**
     * Constructs a new, empty set; the backing <tt>HashMap</tt> instance has
     * default initial capacity (16) and load factor (0.75).
     */
    public HashSet() {
         map = new HashMap<E,Object>();
    }

Example :
Set<String> set = new HashSet<String>();
So as we can see when we instantiate a HashSet,an instance of HashMap class is created in the background with default capacity of 16 and load factor of 0.75.

Constructor 2 : Single argument constructor which takes initial capacity as parameter.
   /**
     * Constructs a new, empty set; the backing <tt>HashMap</tt> instance has
     * the specified initial capacity and default load factor (0.75).
     *
     * @param      initialCapacity   the initial capacity of the hash table
     * @throws     IllegalArgumentException if the initial capacity is less
     *             than zero
     */
    public HashSet(int initialCapacity) {
         map = new HashMap<E,Object>(initialCapacity);
    }
 Example :
Set<String> set = new HashSet<String>(30);

This constructor will in turn call single argument constructor of HashMap class and create instance of HashMap with initial capacity of 30.

Constructor 3 :Two argument constructor which takes capacity and load factor.
   /**
     * Constructs a new, empty set; the backing <tt>HashMap</tt> instance has
     * the specified initial capacity and the specified load factor.
     *
     * @param      initialCapacity   the initial capacity of the hash map
     * @param      loadFactor        the load factor of the hash map
     * @throws     IllegalArgumentException if the initial capacity is less
     *             than zero, or if the load factor is nonpositive
     */
    public HashSet(int initialCapacity, float loadFactor) {
      map = new HashMap<E,Object>(initialCapacity, loadFactor);
    }
Example :
Set<String> set = new HashSet<String>(30,0.75f);
This constructor will in turn call two argument constructor of HashMap class with intial capacity as 30 and load factor as .75f.

Constructor 4 : This is a package private(default access) constructor which is NOT FOR INSANTIATING HASHSET.It is called from within LinkedHashSet Constructor.Three argument constructor which takes capacity,loadFactor and a dummy boolean value.  
   /**                                                                                 * Constructs a new, empty linked hash set.  (This package private
     * constructor is only used by LinkedHashSet.) The backing
     * HashMap instance is a LinkedHashMap with the specified initial
     * capacity and the specified load factor.
     *
     * @param      initialCapacity   the initial capacity of the hash map
     * @param      loadFactor        the load factor of the hash map
     * @param      dummy             ignored (distinguishes this
     *             constructor from other int, float constructor.)
     * @throws     IllegalArgumentException if the initial capacity is less
     *             than zero, or if the load factor is nonpositive
     */
    HashSet(int initialCapacity, float loadFactor, boolean dummy) {
      map = new LinkedHashMap<E,Object>(initialCapacity, loadFactor);
    }
Example :
Set<String> set = new LinkedHashSet<String>(30,0.75f,false);
This constructor is used for instantiating LinkedHashSet class. The third variable is just a dummy variable, as it does not have any significance and is used only to differentiate between two argument constructor of HashSet(Backed by HashMap) and LinkedHashSet(Backed by LinkedHashMap), so if you open LinkedHashSet class of java library, you will find following :  
   /**
     * Constructs a new, empty linked hash set with the specified initial
     * capacity and load factor.
     *
     * @param      initialCapacity the initial capacity of the linked hash set
     * @param      loadFactor      the load factor of the linked hash set
     * @throws     IllegalArgumentException  if the initial capacity is less
     *               than zero, or if the load factor is nonpositive
     */
    public LinkedHashSet(int initialCapacity, float loadFactor) {
        super(initialCapacity, loadFactor, true);
    }
and because LinkedHashSet extends HashSet class, super within LinkedHashSet constructor will call superclass constructor i.e constructor of HashSet class with three arguments.

Constructor 5 : Takes a Collection as parameter   
    /**                                                                               * Constructs a new set containing the elements in the specified
     * collection.  The <tt>HashMap</tt> is created with default load factor
     * (0.75) and an initial capacity sufficient to contain the elements in
     * the specified collection.
     *
     * @param c the collection whose elements are to be placed into this set
     * @throws NullPointerException if the specified collection is null
     */
    public HashSet(Collection<? extends E> c) {
         map = new HashMap<E,Object>(Math.max((int) (c.size()/.75f) + 1, 16));
         addAll(c);
    }
Example :
List<String> list = new ArrayList<String>();
list.add("a");
list.add("b");
list.add("c");
 
Set<String> set = new HashSet<String>(list);
for(String str : set1){
 System.out.println(str);
}
Output :
b
c
a

How to add elements in HashSet 

HashSet provides add method to add elements. Let us check how add method of HashSet works.

Below is the source code of add method of HashSet :
/**
*Adds the specified element to this set if it is not already present. More          *formally, adds the specified element e to this set if this set contains no *element e2 such that (e==null ? e2==null : e.equals(e2)). If this set already *contains the element, the call leaves the set unchanged and returns false.
*Parameters:
*e element to be added to this set
*Returns:true if this set did not already contain the specified element
*/
public boolean add(E e) {
     return map.put(e, PRESENT)==null;
}

Here PRESENT is defined in the HashSet Class as below
// Dummy value to associate with an Object in the backing Map
 private static final Object PRESENT = new Object();
So basically whenever we add an element in a HashSet,it will add that element as Key of underlying HashMap and a dummy value PRESENT(instance of Object class) is put as Value of HashMap.As HashMap does not allow duplicate Keys,HashSet achieves its uniqueness behavior because of this.

Now if we see HashMap class's put method's implementation,we can see that it returns null whenever an entry with unique key is made in HashMap

If Key to be added is not unique and already exist,put method simply replaces old value with new value and returns old value against key.Please check how put method of HashMap works internally.check How HashMap Works Internally In Java.

Below is the source code of put() method of HashMap class.
   /**
     * Associates the specified value with the specified key in this map.
     * If the map previously contained a mapping for the key, the old
     * value is replaced.
     *
     * @param key key with which the specified value is to be associated
     * @param value value to be associated with the specified key
     * @return the previous value associated with <tt>key</tt>, or
     *         <tt>null</tt> if there was no mapping for <tt>key</tt>.
     *         (A <tt>null</tt> return can also indicate that the map
     *         previously associated <tt>null</tt> with <tt>key</tt>.)
     */
    public V put(K key, V value) {
        if (key == null)
            return putForNullKey(value);
        int hash = hash(key.hashCode());
        int i = indexFor(hash, table.length);
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }

        modCount++;
        addEntry(hash, key, value, i);
        return null;
    }
Coming back to add method of HashSet, it has following code
map.put(e, PRESENT)==null;

which says that, if put method of underlying HashMap returns null then return true, which means element has been successfully added to HashSet and by now we know very well that put method of HashMap class returns null when Key we are adding is unique.

When we will add duplicate value in HashSet, put method will simply replace old value with new value and return old value, which of-course will not be null, so add method will return false depicting that duplicate element has not been added.

How to get elements from HashSet

There is no get() method as such in HashSet class.Probable reason can be that

1. HashSet is not an ordered collection,which means that order in which elements are retrieved    from HashSet is different from order in which we add elements in the HashSet,so it does not make sense to get element from HashSet on the basis of index like get(index i),which is used in case of ArrayList,Also as we know that HashSet uses HashMap as underlying data structure to store data and it is quite possible that different objects that we add to HashSet may lye at the same index of the underlying HashMap,because of same hash value.In that case  which object HashSet will return?

Note : ArrayList is ordered collection and each object is stored at the separate index of underlying array.To know How ArrayLists in Java work internally,please go through ArrayList features every Java developer should know

2.If we think of get(Object obj) ,which is used in case of HashMap class,this also does not make sense in case of HashSet,as in case of HashMap,we pass key as parameter to the get(Object obj) and retrieve value,but in case of  HashSet,we dont store key value pairs,but only values.

So how to retrieve elements from HashSet

We can retrieve elements of HashSet in traditional ways as below

1.Using Iterator
Set<String> set = new HashSet<String>();
set.add("GB");
set.add("BG");
set.add("GG");
Iterator<String> setItr  = set.iterator();
while(setItr.hasNext()){
 String str = (String)setItr.next();
 System.out.println(str);
}
2. Using enhanced for loop
Set<String> set = new HashSet<String>();
set.add("GB");
set.add("BG");
set.add("GG");
for(String str : set){
 System.out.println(str);
}
To know more about how to iterate over HashSet and other commonly used collections in java,please go through Various ways to iterate over commonly used collections viz ArrayList,HashMap,HashSet

How to remove elements from HashSet

HashSet provides remove(Object o) method to remove elements from it.
   /**
     * Removes the specified element from this set if it is present.
     * More formally, removes an element <tt>e</tt> such that
     * <tt>(o==null&nbsp;?&nbsp;e==null&nbsp;:&nbsp;o.equals(e))</tt>,
     * if this set contains such an element.  Returns <tt>true</tt> if
     * this set contained the element (or equivalently, if this set
     * changed as a result of the call).  (This set will not contain the
     * element once the call returns.)
     *
     * @param o object to be removed from this set, if present
     * @return <tt>true</tt> if the set contained the specified element
     */
    public boolean remove(Object o) {
       return map.remove(o)==PRESENT;
    }
So as you can see from above source code of remove(Object o) method,it calls remove(Object o) method of HashMap class to remove element as elements are actually stored in the underlying HashMap class.

Let us see source code of remove method of HashMap class :
   /**
     * Removes the mapping for the specified key from this map if present.
     *
     * @param  key key whose mapping is to be removed from the map
     * @return the previous value associated with <tt>key</tt>, or
     *         <tt>null</tt> if there was no mapping for <tt>key</tt>.
     *         (A <tt>null</tt> return can also indicate that the map
     *         previously associated <tt>null</tt> with <tt>key</tt>.)
     */
    public V remove(Object key) {
        Entry<K,V> e = removeEntryForKey(key);
        return (e == null ? null : e.value);
    }
 What this method is doing is, it is calling removeEntryForKey() method and if this method returns null then return null ,else it returns value associated with passed key.
   /**
     * Removes and returns the entry associated with the specified key
     * in the HashMap.  Returns null if the HashMap contains no mapping
     * for this key.
     */
    final Entry<K,V> removeEntryForKey(Object key) {
        int hash = (key == null) ? 0 : hash(key.hashCode());
        int i = indexFor(hash, table.length);
        Entry<K,V> prev = table[i];
        Entry<K,V> e = prev;

        while (e != null) {
            Entry<K,V> next = e.next;
            Object k;
            if (e.hash == hash &&
                ((k = e.key) == key || (key != null && key.equals(k)))) {
                modCount++;
                size--;
                if (prev == e)
                    table[i] = next;
                else
                    prev.next = next;
                e.recordRemoval(this);
                return e;
            }
            prev = e;
            e = next;
        }

        return e;
    }
So as we can see above remove() method of HashMap is calling removeEntryForKey() method in turn and passing key to it. Let us see what removeEntryForKey() method is doing :

1. Find hash value of the key.
 If key is null, hash value will always be zero.
 If key is not null then call the hashCode() method of key and get hash value. Pass this hash value to standard hash() method.
2. Pass hash value and table length(length of the array of Entry objects). to indexFor() method, which will return the index of Entry objects array where this entry is stored.
3. Now as we know that at every index there can be multiple entry objects (because they would be  having same hash value) maintained in the form of Linked list, we need to traverse through this linked list.
Entry<K,V> prev = table[i];
This will give entry object at the index that we found in step 2 and assign reference to that object to prev.     
Entry<K,V> e = prev;
 This will make e also to point same entry object which prev was pointing.
while (e != null) {
            Entry<K,V> next = e.next;
            Object k;
            if (e.hash == hash &&
                ((k = e.key) == key || (key != null && key.equals(k)))) {
                modCount++;
                size--;
                if (prev == e)
                    table[i] = next;
                else
                    prev.next = next;
                e.recordRemoval(this);
                return e;
            }
            prev = e;
            e = next;
        }

This while loop will loop through elements(entry objects) of LinkedList until it finds next variable of Entry object as null while traversing, which shows that it is end of LinkedList OR till the point where matching key has been found, in which case it will return and rest of LinkedList will not be traversed.
e.hash == hash
Within loop, as it traverse through linked list elements(Entry objects),it checks whether hash value of key of that Entry object is equal to the hash value which we calculated of the key object ,corresponding entry of which we want to remove.
(k = e.key) == key || (key != null && key.equals(k)))
Checks whether key ,corresponding to which we want to remove entry and key we are referring  currently while traversing are same objects OR (||) key corresponding to which we want to remove entry is not null and it is equal(in value) to the current key object we are traversing by calling overridden equals() method of key object.

Increase modCount, which is used to trace structural modifications on HashMap. Structural  modifications are those  that change the number of mappings in the HashMap or otherwise modify its internal structure.

 Decrease size by 1.

 If key is at fist position of the LinkedList, follwing if statement will be executed.    
if (prev == e)
            table[i] = next;
If key is found somewhere in middle or at last of LinkedList ,else will be executed.   
else
        prev.next = next;
recordRemoval() method has empty body, so does nothing. It is implemented in case of LinkedHashMap.

And with this we learnt How HashSet works internally in Java.Hope this article was helpful.If you have any question,suggestion,please feel free to give your comments in comments section..