首页 > > 程序设计 > Java技术 >

hashMap源码剖析

2019-08-26 06:32:03来源：博客园阅读 ()

hashMap源码剖析

hashMap源码阅读：

1，概述

2，HashMap核心成员变量

3，HashMap构造函数

4，HashMap核心方法

5，一些问题的理解

1，概述

搞java的人，都应该知道hashMap的底层数据结构是一个数组+链表（+红黑树）。

大体思路：首先是基于key做hash操作，然后与数组长度取模，定位到某个数组位置。如果冲突了（可能是hash冲突，或者是hash值与长度取模之后），就会在该数组位置再挂一个链表。jdk1.8以后当链表长度达到8之后，就转化为红黑树（因为链表的时间复杂度是N，红黑树是log N）,提升了性能。

以上是大概思路，其实jdk源码是优化的，比如hash算法，取模操作这些都是位运算来替代的。这篇文章暂时不涉及到树结构，后面有空再来聊聊树结构。画个简易图：

2，HashMap核心成员变量

先看看HashMap的核心成员，先了解下这些，后面源码部分都会看到。

#默认主的数据结构数组的初始化大小：16
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4
#数组的最大大小
static final int MAXIMUM_CAPACITY = 1 << 30;
#负载因子
static final float DEFAULT_LOAD_FACTOR = 0.75f;
#将链表转化为红黑树的临界值
static final int TREEIFY_THRESHOLD = 8;
#将红黑树转化为链表的临界值
static final int UNTREEIFY_THRESHOLD = 6;
#转变成树的数组的最小长度，小于该值则不会进行树化
static final int MIN_TREEIFY_CAPACITY = 64;
#主的数据结构数组
transient Node<K,V>[] table;
#map的键值对的集合
transient Set<Map.Entry<K,V>> entrySet; 
#map中键值对的数量
transient int size;
#用于统计map修改次数的计数器，用于fail-fast抛出ConcurrentModificationException
transient int modCount;
#数组长度大于该值，则数组会进行扩容
int threshold;
#负载因子
final float loadFactor;

3，HashMap构造函数

hashMap构造函数有4种：

3.1 无参构造函数

    public HashMap() {
        #初始化只是设置了默认的负载因子为0.75
        this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
    }

3.2 传初始化容量构造函数

    public HashMap(int initialCapacity) {
        #推荐使用这种（因为数组扩容，数据再重新分配是比较消耗性能的。）
        this(initialCapacity, DEFAULT_LOAD_FACTOR);
    }

3.3 传初始化容量和负载因子

    public HashMap(int initialCapacity, float loadFactor) {
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);
        this.loadFactor = loadFactor;
        #tableSizeFor()方法用于将传入的初始化容量转化为大于等于2的N次方的最接近初始化容量的数。比如传入7，threshold=8。threshold前面说过，这个数其实是当数组达到该长度，就会进行扩容
        this.threshold = tableSizeFor(initialCapacity);
    }

看看这个方法，tableSizeFor(initialCapacity)。

    /**
     * Returns a power of two size for the given target capacity.
     */
    static final int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

其实看jdk源代码，会发现运算绝大多数都是用的位运算，主要是为了提升性能。

|（按位或）：如

1101 | 0011 = 1111    0010 | 0101 = 0111

>>>(无符号右移运算符):如

0100 >>> 2 = 0001

算了。证明这个tableSizeFor()貌似没啥意义，作为码农，大概知道这个结论足够了。就跳过了。

3.4 传map转化为hashMap

    public HashMap(Map<? extends K, ? extends V> m) {
        this.loadFactor = DEFAULT_LOAD_FACTOR;
        putMapEntries(m, false);
    }

putMapEntries(m, false):

    final void putMapEntries(Map<? extends K, ? extends V> m, boolean evict) {
        #map的键值对的数量
        int s = m.size();
        if (s > 0) {
            if (table == null) { // pre-size
                float ft = ((float)s / loadFactor) + 1.0F;
                int t = ((ft < (float)MAXIMUM_CAPACITY) ?
                         (int)ft : MAXIMUM_CAPACITY);
                #设置threshold值
                if (t > threshold)
                    threshold = tableSizeFor(t);
            }
            else if (s > threshold)
                resize();
            #将map的值遍历出来赋值到hashMap的数据结构。下面的putVal()下面会重点讲，暂时跳过
            for (Map.Entry<? extends K, ? extends V> e : m.entrySet()) {
                K key = e.getKey();
                V value = e.getValue();
                putVal(hash(key), key, value, false, evict);
            }
        }
    }

4，hashMap核心方法

4.1 put()

public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}


  #这是hashMap重写的hash算法。先右移16位，然后将高低16位做^操作，可以减少hash冲突的概率。
    static final int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }
     
     final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        if ((tab = table) == null || (n = tab.length) == 0)
            #主数据结构数组如果为空，则需要扩容。所以第一次put操作肯定会执行resize()。
            n = (tab = resize()).length;
        #如果数组对应的角标没有值，执行newNode()方法。（这里其实就是用的&位运算符来达到取模的效果的）
        if ((p = tab[i = (n - 1) & hash]) == null)
            #就是创建一个Node,并指向它。这里相当就是只有数组的赋值操作。数组存放的元素就是Node,Node是hashMap的一个内部类
            tab[i] = newNode(hash, key, value, null);
        else {
            Node<K,V> e; K k;
            #这个判断条件其实就是判断传入的key是否相同（判断数组上的那个节点，不包含树和链表）。
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            #判断是否是数结构，是数结构则进行数的插入。后续会讲解数的数据结构
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            #走到这里说明数组该角标有值，也就是hash冲突了，所以这里是挂载链表操作。挂载链表就会遍历链表操作，如是否含有相同的key，有执行覆盖操作，没有就会遍历到链表最后然后挂载上去，而且还要判断是否需要把链表转化红黑树等等逻辑在里面
            else {
                for (int binCount = 0; ; ++binCount) {
                    #如果节点遍历到最后了，就创建一个新的Node挂载到链表最后
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        这里判断是否需要将链表转化为红黑树，这里可以看出来当链表长度为8时就会转化为红黑树了
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                           treeifyBin(tab, hash);
                        break;
                    }
                    #判断是否有key相同的情况（链表的节点，不包含数组那个节点，那个节点在前面就已经判断过了）
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            #这里是将相同的key的value做覆盖操作。这里onlyIfAbsent默认是false，就是要做覆盖。如果是true，就不会覆盖原有值。提出需求，怎么使用hashMap不覆盖相同key的value值。
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                #钩子函数，用于子类继承来实现，这里是空的。比如linkedHashMap就实现了。后面会说linkedHashMap
                afterNodeAccess(e);
                return oldValue;
            }
        }
        #修改计数器
        ++modCount;
        #如果map的键值对数量>threshold时，进行扩容
        if (++size > threshold)
            resize();
        #钩子函数，用于子类继承来实现，这里是空的。比如linkedHashMap就实现了。后面会说linkedHashMap
        afterNodeInsertion(evict); 
        return null; 
}

put()方法需要了解的点：

1，重写了hash算法(将hash值右移了16位，然后与原有的hash值做了^操作)，可以减少hash冲突；

2，将优化算出来的hash值与(数组长度-1)做了&操作，替代了取模的操作；

3，插入流程：

1)首先基于前面1,2,找到数组角标，如果没有值，就插入到数组位置。

2)如果数组位置有值了，表明就是冲突了。然后先判断是否是相同的key，有就做覆盖操作；没有的话先判断是否是数结构，是数结构就插入到数里面；不是数结构就遍历插入到链表里。注意这里是单向链表。

4，数组扩容是2倍操作，源码为：

            else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                     oldCap >= DEFAULT_INITIAL_CAPACITY)
                newThr = oldThr << 1; // double threshold

5，扩容以后，所有的元素会重新rehash()。之前没冲突的可能会冲突了，之前冲突的可能没冲突了。不过基于[hash值与(数组长度-1)做了&操作]的rehash，会有这么一个规律：之前在index位置的，重新rehash之后，要么在index位置，要么在index + oldCap位置。有兴趣的可以去研究下这个算法：

    final Node<K,V>[] resize() {
        Node<K,V>[] oldTab = table;
        int oldCap = (oldTab == null) ? 0 : oldTab.length;
        int oldThr = threshold;
        int newCap, newThr = 0;
        if (oldCap > 0) {
            if (oldCap >= MAXIMUM_CAPACITY) {
                threshold = Integer.MAX_VALUE;
                return oldTab;
            }
            else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                     oldCap >= DEFAULT_INITIAL_CAPACITY)
                newThr = oldThr << 1; // double threshold
        }
        else if (oldThr > 0) // initial capacity was placed in threshold
            newCap = oldThr;
        else {               // zero initial threshold signifies using defaults
            newCap = DEFAULT_INITIAL_CAPACITY;
            newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
        }
        if (newThr == 0) {
            float ft = (float)newCap * loadFactor;
            newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                      (int)ft : Integer.MAX_VALUE);
        }
        threshold = newThr;
        @SuppressWarnings({"rawtypes","unchecked"})
            Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
        table = newTab;
        if (oldTab != null) {
            for (int j = 0; j < oldCap; ++j) {
                Node<K,V> e;
                if ((e = oldTab[j]) != null) {
                    oldTab[j] = null;
                    if (e.next == null)
                        newTab[e.hash & (newCap - 1)] = e;
                    else if (e instanceof TreeNode)
                        ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                    else { // preserve order
                        Node<K,V> loHead = null, loTail = null;
                        Node<K,V> hiHead = null, hiTail = null;
                        Node<K,V> next;
                        do {
                            next = e.next;
                            if ((e.hash & oldCap) == 0) {
                                if (loTail == null)
                                    loHead = e;
                                else
                                    loTail.next = e;
                                loTail = e;
                            }
                            else {
                                if (hiTail == null)
                                    hiHead = e;
                                else
                                    hiTail.next = e;
                                hiTail = e;
                            }
                        } while ((e = next) != null);
                        if (loTail != null) {
                            loTail.next = null;
                            newTab[j] = loHead;
                        }
                        if (hiTail != null) {
                            hiTail.next = null;
                            newTab[j + oldCap] = hiHead;
                        }
                    }
                }
            }
        }
        return newTab;
    }

6，数据结构有数组，单向链表，红黑树。链表长度达到8就转化为红黑树。

7，预留了两个钩子函数，LinkedHashMap会实现的。

4.2 get()

    public V get(Object key) {
        Node<K,V> e;
        return (e = getNode(hash(key), key)) == null ? null : e.value;
    }

    final Node<K,V> getNode(int hash, Object key) {
        Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
        if ((tab = table) != null && (n = tab.length) > 0 &&
            (first = tab[(n - 1) & hash]) != null) {
            #这个其实就是判断传过来的key是否在数组这个数据结构，如果在就可以直接返回。
            if (first.hash == hash && // always check first node
                ((k = first.key) == key || (key != null && key.equals(k))))
                return first;
            #如果不在，进入到这里就表示要么在链表中，要么在树节点了
            if ((e = first.next) != null) {
                #在树节点遍历
                if (first instanceof TreeNode)
                    return ((TreeNode<K,V>)first).getTreeNode(hash, key);
                #在链表中遍历这个操作
                do {
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        return e;
                } while ((e = e.next) != null);
            }
        }
        return null;
    }

get()方法需要了解的点：

get()方法还是蛮简单的。流程如下：首先定位到数组的位置，如果存在，就判断是否是第一个元素，是的话就直接返回，因为这时该节点肯定不在树或者单向链表了。不是就去树和单向链表查询。

5,一些问题的理解

5.1 hashMap的成员变量entrySet的生成过程

分析完put操作之后，发现并没有给entrySet做赋值操作。一开始我一直以为是插入的时候顺便将键值对放在Set集合中，如果真的是这样，不是又多了一个数据结构，而且数据保存两份，显然不会这样，下面来分析下这个问题。

当你执行完map.entrySet(),返回一个EntrySet对象，这个对象是hashMap的内部类，它的迭代器方法：

        public final Iterator<Map.Entry<K,V>> iterator() {
            return new EntryIterator();
        }

返回一个EntryIterator对象，源码如下：

    final class EntryIterator extends HashIterator
        implements Iterator<Map.Entry<K,V>> {
        public final Map.Entry<K,V> next() { return nextNode(); }
    }

这里初始化的时候，会先初始化父类HashIterator,看看这个：

        HashIterator() {
            expectedModCount = modCount;
            Node<K,V>[] t = table;
            current = next = null;
            index = 0;
            if (t != null && size > 0) { // advance to first entry
                do {} while (index < t.length && (next = t[index++]) == null);
            }
        }

这里的table就是主的数据结构数组。所以分析下来能够总结出来：

entrySet是不存储数据的，是一个视图，遍历的时候还是去读取hashMap那些数据结构存储的数据的。

这里也有个小疑惑，测试代码如下

        HashMap<String,String> map=new HashMap<String,String>();

        map.put("zhangsan","18");
        map.put("lisi","25");
        map.put("wangwu","21");

        Set set=map.entrySet();

为啥执行到这一步的时候，set就已经有值了。何时赋值进去的呢？源码没找到，希望知道的说一下。

5.2,怎么使用hashMap不覆盖相同key的value值

这个问题前面提出来过。其实实现起来很简单，就是装饰模式。在put的时候做个判断，如果有相同的key，就不需要执行hashMap的put操作了。那么是如何判断key相同呢？看源码：

            if (first.hash == hash && // always check first node
                ((k = first.key) == key || (key != null && key.equals(k))))

可以看出来只要实现了hashCode(）方法和equals()方法就能完成操作。

而String自身实现了这两个方法，所以实现demo如下：

/**
 * 如何实现hashMap相同key,不会覆盖的操作
 */
class MyHashMap<K> extends HashMap<K, String> {
    @Override
    public String put(K key, String value) {
        //如果key相同就不执行操作
        if(containsKey(key)){
            return null;
        }
        //key不相同就执行hashMap的put()操作
        return super.put(key, value);
    }
}

public class TestHashMap {
    public static void main(String[] args) {
        MyHashMap<String> hashMap = new MyHashMap<String>();
        hashMap.put("zhangsan", "18");
        hashMap.put("zhangsan", "25");
        hashMap.put("zhangsan", "16");

        for (Map.Entry entry : hashMap.entrySet()) {
            System.out.println(entry);
        }
    }
}

结果为：

而如果是自定义的类，就需要再实现下hashCode()和equals()方法即可。例子：

/**
 * 如何实现hashMap相同key,不会覆盖的操作
 */
class MyHashMap<K> extends HashMap<K, String> {
    @Override
    public String put(K key, String value) {
        //如果key相同就不执行操作
        if (containsKey(key)) {
            return null;
        }
        //key不相同就执行hashMap的put()操作
        return super.put(key, value);
    }
}

/**
 * 假设身份证号相同就视为同一个人
 */
class Person {
    String name;
    String cardId;

    @Override
    public String toString() {
        return "Person{" +
                "name='" + name + '\'' +
                ", cardId='" + cardId + '\'' +
                '}';
    }

    public Person(String name, String cardId) {
        this.name = name;
        this.cardId = cardId;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }


    public String getCardId() {
        return cardId;
    }

    public void setCardId(String cardId) {
        this.cardId = cardId;
    }

    @Override
    public boolean equals(Object obj) {
        if (obj == this) {
            return true;
        }

        Person p = (Person) obj;
        if (this.getCardId().equals(p.getCardId())) {
            return true;
        }

        return false;
    }

    @Override
    public int hashCode() {
        return Objects.hash(cardId);
    }
}

public class TestHashMap {
    public static void main(String[] args) {
        MyHashMap<Person> hashMap = new MyHashMap<Person>();
        Person p1 = new Person("zhangsan", "123");
        Person p2 = new Person("lisi", "123");
        Person p3 = new Person("wangwu", "234");
        hashMap.put(p1, p1.getName());
        hashMap.put(p2, p2.getName());
        hashMap.put(p3, p3.getName());
        for (Map.Entry entry : hashMap.entrySet()) {
            System.out.println(entry);
        }
    }
}