redis – hash冲突(Redis – hash conflict)

hash table 也叫做时 “散列表”、哈希表

redis的数据结构也有用到这个数据结构。哈希表用的时数组支持下标随机访问数据的特性,所以哈希表其实就是数组得一种扩展,是由数组演化而来的。

通过hash函数得到的hash值有一下几个特点:

1、hash函数得到的 value值 是一个非负整数

2、如果key相同 通过hash函数得到的 value值肯定相同

3、如果key不相同的话,通过hash函数得到的value不一定不同(涉及到了哈希冲突问题)

相对于哈希冲突而言,业界比较著名的哈希算法函数 MD5、SHA、CRC 都不发完全避免哈希冲突的问题。数组得存储空间有限,也会加大哈希冲突的概率。

以下为解决hash冲突问题的方法:

1、开放寻址法

  开发寻址法的核心思想,hash(key)得到的hash值,在哈希表中找到下标值,发现已经被占用了,就会从存储位置下一个开始进行探测寻找,寻找没有被占用的空闲位置,

找到了位置以后将数据存入。如果发现找到尾部以后都被占用,则会从头开始继续寻址,直到找到为止。数据量大的话,性能会就下降

  查找操作:

      通过hash(key) 查找该hash值对应的数据时,会通过hash函数得到一个hash值,然后比较数组种下标为散列值得数据和要查找的元素是否相等,不相等则顺序往后查找,

如果遍历到空闲位置还没有找到的话就返回为 未找到。

      删除操作:

  在hash表中是否可直接删除一个数据,然后将hash表中直接置为空呢?!其实时不可以的,因为在查找数据的过程中,遇到空闲位置时,则认为该数据不存在hash表中,所以如果在hash表中删除一个

一个数据时将位置 设置为空,会造成查找数据出现问题。解决方案就是将删除的数据置位deleted ,这样在查找数据时就会跳过继续查找下一个位置。

2、链表法

     hash表有一个指标来表示hash冲突严重程度:

    hash表装载因子 = 填入表中的元素个数 / hash表长度;

    “hash表装载因子” 这个值越大表明hash冲突越严重,hash表的性能就会下降的。

下面一个比较常用的解决hash冲突的办法就是 链表法 。在hash表有一个概念叫做hash槽,每个hash槽会对应一个链表,所有的通过hash函数得到的散列值相同的元素会放在相同的hash槽位对应的链表中。

插入数据:

  通过hash函数查询到对应的hash槽位,将数据插入到对应链表的中。这个插入数据的时间复杂度时O(1)

查找、删除数据:

      查找数据、删除数据时,通过hash函数得到的哈希值,找到对应的哈希槽位,遍历槽位对应的链表,找到要删除或查找的数据进行对应的操作。

这个时间复杂度取决于hash槽位对应的链表的长度。

比如:元素个数为n个

        hash槽位有m个

那时间复杂度就是 O(n/m)

————————

Hash table is also called hash table

Redis’s data structure is also useful. The time array used by the hash table supports the characteristics of subscript random access to data, so the hash table is actually an extension of the array, which is evolved from the array.

The hash value obtained by the hash function has the following characteristics:

1. The value obtained by the hash function is a nonnegative integer

2. If the key is the same, the value obtained through the hash function must be the same

3. If the keys are different, the values obtained through the hash function are not necessarily different (hash conflicts are involved)

Compared with hash conflict, the well-known hash algorithm functions MD5, Sha and CRC in the industry do not completely avoid hash conflict. The limited storage space of the array will also increase the probability of hash conflict.

The following are the methods to solve the hash conflict problem:

1. Open addressing method

The core idea of developing the addressing method is to find the subscript value of the hash value obtained by hash (key) in the hash table. If it is found that it has been occupied, it will start from the next storage location to find the unoccupied free location,

After the location is found, the data is stored. If the tail is found to be occupied after it is found, addressing will continue from the beginning until it is found. If the amount of data is large, the performance will decline

Find operation:

When searching the data corresponding to the hash value through the hash (key), a hash value will be obtained through the hash function, and then compare whether the data with the index of the array as the hash value is equal to the element to be searched. If it is not equal, it will be searched later in order,

If the traversal to the free location has not been found, it will return as not found.

Delete operation:

Can you delete a data directly in the hash table and then set the hash table to empty?! In fact, it is not allowed, because in the process of finding data, when an idle location is encountered, it is considered that the data does not exist in the hash table, so if you delete one in the hash table

Setting the location to null when a data is, will cause problems in finding data. The solution is to set the deleted data to deleted, so that the next location will be skipped when looking for data.

2. Linked list method

The hash table has an indicator to indicate the severity of hash conflicts:

Hash table loading factor = number of elements filled in the table / hash table length;

The larger the “hash table loading factor”, the more serious the hash conflict, and the performance of the hash table will decline.

The following common method to solve hash conflicts is the linked list method. In the hash table, there is a concept called hash slot. Each hash slot will correspond to a linked list. All elements with the same hash value obtained through the hash function will be placed in the linked list corresponding to the same hash slot.

Insert data:

Query the corresponding hash slot through the hash function and insert the data into the corresponding linked list. The time complexity of inserting data is O (1)

Find and delete data:

When searching and deleting data, find the corresponding hash slot through the hash value obtained by the hash function, traverse the linked list corresponding to the slot, find the data to be deleted or searched, and perform the corresponding operation.

The time complexity depends on the length of the linked list corresponding to the hash slot.

For example, the number of elements is n

There are m hash slots

The time complexity is O (n / M)