Wednesday, August 3, 2016

Java HashSet and TreeSet

It says that hashset in java doesn't mantain an order but looking at my program
public static void main(String[] args) {
    HashSet<Integer> io=new HashSet<Integer>();

    Integer io1=new Integer(4);
    Integer io2=new Integer(5);
    Integer io3=new Integer(6);

    io.add(io2);
    io.add(io3);
    io.add(io1);

    System.out.println(io);
}
and execute it it give me an ordered set everytime i run it. Why this is happening?
Another question is: if i implement a treeset (like i did in the previous program but instead of hashset using treeset and intead of Integer using my class) i have to implement compareto ?

Answer by Eran:

HashSet doesn't maintain order, but it has to iterate over the elements in some order when you print them. HashSet is backed by a HashMap, which iterates over the elements in the order of the bins in which they are stored. In your simple example, 4,5,6 are mapped to bins 4,5,6 (since the hashCode of an integer is the value of the integer) so they are printed in ascending order.
If you tried to add 40,50,60 you would see a different order ([50, 40, 60]), since the default initial number of bins is 16, so the hash codes 40,50,60 will be mapped to bins 40%16 (8),50%16 (2) ,60%16 (12), so 50 is the first element iterated, followed by 50 and 60.
As for TreeSet<SomeCostumClass>, you can either implement Comparable<SomeCostumClass> in SomeCostumClass, or pass a Comparator<SomeCostumClass> to the constructor.

When to prefer TreeSet over HashSet

1.  Sorted unique elements are required instead of unique elements.The sorted list given by TreeSet is always in ascending order.

2.   TreeSet has greater locality than HashSet.

If two entries  are near by in the order , then TreeSet places them near each other in data structure and hence in memory, while HashSet spreads the entries all over memory  regardless of the keys they are associated to. 
     
As we know Data reads from the hard drive takes much more latency time than data read from the cache or memory. In case data needs to be read from hard drive than prefer TreeSet as it has greater locality than HashSet.

3. TreeSet uses Red- Black tree algorithm underneath to sort out the elements. When one need to perform read/write operations frequently , then TreeSet is a good choice.

No comments:

Post a Comment