c++ - How std::unordered_map is implemented
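This follows on from an earlier question of mine, "c++ unordered_map collision handling, resize and rehash". Looking back at it, I see that I am quite confused about how unordered_map is implemented, and I am sure many other people share that confusion. Based on what I know without having read the standard: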
Every unordered_map implementation stores a linked list to external
nodes in the array of buckets… No, that is not at all the most
efficient way to implement a hash map for most common uses.
Unfortunately, a small “oversight” in the specification of
unordered_map all but requires this behavior. The required behavior is
that iterators to elements must stay valid when inserting or deleting
other elements
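I was hoping someone could explain the implementation and how it corresponds to the C++ standard's definition (in terms of performance requirements). If it really is not the most efficient way to implement a hash map data structure, how could it be improved?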
Answer:
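The Standard effectively mandates that std::unordered_set and std::unordered_map implementations use open hashing (also called separate chaining), which means an array of buckets, each of which holds the head of a logical (and typically actual) list. The requirement is subtle: it is a consequence of the default maximum load factor being 1.0, together with the guarantee that the table will not be rehashed unless it grows beyond that load factor. That would be impractical without chaining, because with closed hashing (also called open addressing) collisions become overwhelming as the load factor approaches 1: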
23.2.5/15: The insert and emplace members shall not affect the validity of iterators if (N+n) < z * B, where N is the number of elements in the container prior to the insert operation, n is the number of elements inserted, B is the container's bucket count, and z is the container's maximum load factor.
amongst the Effects of the constructor at 23.5.4.2/1:
max_load_factor() returns 1.0.
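The two guarantees quoted above can be observed from user code. The following is a small sketch, not a conformance test: the helper name insert_without_rehash is made up for illustration, and it deliberately inserts fewer elements than were reserved so that the (N+n) < z * B condition holds on any conforming implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Hypothetical helper: reserves room for `reserved` elements, inserts
// `count` + 1 elements (which must stay below `reserved`), and reports
// whether the bucket array and an earlier iterator survived unchanged.
bool insert_without_rehash(std::size_t reserved, int count) {
    std::unordered_map<int, int> m;
    m.reserve(reserved);            // bucket_count() >= reserved / max_load_factor()

    m.emplace(-1, 0);
    auto it = m.find(-1);           // iterator taken before the bulk insert
    const std::size_t buckets_before = m.bucket_count();

    for (int i = 0; i < count; ++i) {
        m.emplace(i, i * i);        // stays under the load-factor limit
    }

    // Check the bucket count first: if a rehash had happened, `it`
    // would be invalid and must not be touched.
    return m.bucket_count() == buckets_before
        && it == m.find(-1)
        && it->second == 0;
}
```

Calling insert_without_rehash(128, 100) should return true, since 101 elements fit under a load factor of 1.0 with at least 128 buckets.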
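(To allow optimal iteration without passing over any empty buckets, GCC's implementation fills the buckets with iterators into a single singly-linked list holding all the values. Each bucket's iterator points to the element immediately before that bucket's elements, so that element's next pointer can be rewired if the bucket's last value is erased.)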
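The layout described in that parenthetical can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not libstdc++'s actual code: the class name ChainedIntSet and all of its members are hypothetical, the bucket count is fixed, and erase and rehash are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical sketch: all elements live in ONE singly-linked list, and
// buckets_[b] points at the node *before* bucket b's first element.
class ChainedIntSet {
    struct NodeBase { NodeBase* next = nullptr; };
    struct Node : NodeBase {
        int value;
        explicit Node(int v) : value(v) {}
    };

    NodeBase before_begin_;            // sentinel in front of the whole list
    std::vector<NodeBase*> buckets_;   // nullptr means the bucket is empty
    std::size_t size_ = 0;

    std::size_t bucket_of(int v) const {
        return std::hash<int>{}(v) % buckets_.size();
    }

public:
    explicit ChainedIntSet(std::size_t bucket_count)
        : buckets_(bucket_count, nullptr) {}
    ChainedIntSet(const ChainedIntSet&) = delete;
    ChainedIntSet& operator=(const ChainedIntSet&) = delete;
    ~ChainedIntSet() {
        for (NodeBase* p = before_begin_.next; p != nullptr;) {
            NodeBase* next = p->next;
            delete static_cast<Node*>(p);
            p = next;
        }
    }

    bool contains(int v) const {
        const std::size_t b = bucket_of(v);
        NodeBase* prev = buckets_[b];
        if (prev == nullptr) return false;
        // Walk only this bucket's run of nodes within the shared list.
        for (NodeBase* p = prev->next; p != nullptr; p = p->next) {
            const int value = static_cast<Node*>(p)->value;
            if (bucket_of(value) != b) break;   // reached the next bucket's run
            if (value == v) return true;
        }
        return false;
    }

    bool insert(int v) {
        if (contains(v)) return false;
        const std::size_t b = bucket_of(v);
        Node* node = new Node(v);
        if (buckets_[b] != nullptr) {
            // Non-empty bucket: splice in right after the "before" node.
            node->next = buckets_[b]->next;
            buckets_[b]->next = node;
        } else {
            // Empty bucket: put the node at the front of the global list.
            node->next = before_begin_.next;
            before_begin_.next = node;
            if (node->next != nullptr) {
                // The old front's bucket is now preceded by the new node.
                const int old_front = static_cast<Node*>(node->next)->value;
                buckets_[bucket_of(old_front)] = node;
            }
            buckets_[b] = &before_begin_;
        }
        ++size_;
        return true;
    }

    // Iteration follows the list and never visits an empty bucket.
    std::vector<int> values() const {
        std::vector<int> out;
        for (NodeBase* p = before_begin_.next; p != nullptr; p = p->next) {
            out.push_back(static_cast<Node*>(p)->value);
        }
        return out;
    }

    std::size_t size() const { return size_; }
};
```

Storing the predecessor rather than the first node is what lets a forward-only list be relinked when a bucket's elements are removed, at the cost of the extra bookkeeping visible in the empty-bucket branch of insert.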
Regarding the text you quote:
No, that is not at all the most efficient way to implement a hash map for most common uses. Unfortunately, a small “oversight” in the specification of unordered_map all but requires this behavior. The required behavior is that iterators to elements must stay valid when inserting or deleting other elements
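There is no "oversight"... what was done was very deliberate and done with full awareness. It is true that other compromises could have been struck, but the open hashing / chaining approach is a reasonable compromise for general use: it copes reasonably elegantly with collisions from mediocre hash functions, is not too wasteful with small or large key/value types, and handles arbitrarily many insert/erase pairs without gradually degrading performance the way many closed hashing implementations do.
As evidence of that awareness, from Matthew Austern's proposal: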
I’m not aware of any satisfactory implementation of open addressing in a generic framework. Open addressing presents a number of problems:
• It’s necessary to distinguish between a vacant position and an occupied one.
• It’s necessary either to restrict the hash table to types with a default constructor, and to construct every array element ahead of time, or else to maintain an array some of whose elements are objects and others of which are raw memory.
• Open addressing makes collision management difficult: if you’re inserting an element whose hash code maps to an already-occupied location, you need a policy that tells you where to try next. This is a solved problem, but the best known solutions are complicated.
• Collision management is especially complicated when erasing elements is allowed. (See Knuth for a discussion.) A container class for the standard library ought to allow erasure.
• Collision management schemes for open addressing tend to assume a fixed size array that can hold up to N elements. A container class for the standard library ought to be able to grow as necessary when new elements are inserted, up to the limit of available memory.
Solving these problems could be an interesting research project, but, in the absence of implementation experience in the context of C++, it would be inappropriate to standardize an open-addressing container class.
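Several of the points in that list (telling vacant slots from occupied ones, erasure, and the fixed-size array) show up even in a toy open-addressing table. The following is a minimal sketch using linear probing with tombstones; the class name ProbingIntSet is hypothetical, and this is one common textbook scheme, not the approach of any particular library.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical fixed-capacity set of ints using linear probing.
class ProbingIntSet {
    // A slot needs three states: erasing must leave a "Deleted" marker
    // (tombstone), otherwise later lookups would stop probing too early.
    enum class State : unsigned char { Empty, Occupied, Deleted };
    struct Slot {
        State state = State::Empty;
        int value = 0;   // requires a default-constructible element type
    };

    std::vector<Slot> slots_;
    std::size_t size_ = 0;

    std::size_t home(int v) const {
        return std::hash<int>{}(v) % slots_.size();
    }

public:
    explicit ProbingIntSet(std::size_t capacity) : slots_(capacity) {}

    bool contains(int v) const {
        std::size_t i = home(v);
        for (std::size_t probes = 0; probes < slots_.size(); ++probes) {
            const Slot& s = slots_[i];
            if (s.state == State::Empty) return false;   // end of probe chain
            if (s.state == State::Occupied && s.value == v) return true;
            i = (i + 1) % slots_.size();                 // try the next slot
        }
        return false;
    }

    // Returns false if the value is already present or the table is full;
    // this sketch never grows.
    bool insert(int v) {
        if (contains(v)) return false;
        std::size_t i = home(v);
        for (std::size_t probes = 0; probes < slots_.size(); ++probes) {
            Slot& s = slots_[i];
            if (s.state != State::Occupied) {            // Empty or tombstone
                s.state = State::Occupied;
                s.value = v;
                ++size_;
                return true;
            }
            i = (i + 1) % slots_.size();
        }
        return false;
    }

    bool erase(int v) {
        std::size_t i = home(v);
        for (std::size_t probes = 0; probes < slots_.size(); ++probes) {
            Slot& s = slots_[i];
            if (s.state == State::Empty) return false;
            if (s.state == State::Occupied && s.value == v) {
                s.state = State::Deleted;                // keep the chain intact
                --size_;
                return true;
            }
            i = (i + 1) % slots_.size();
        }
        return false;
    }

    std::size_t size() const { return size_; }
};
```

Tombstones accumulate under repeated insert/erase cycles and lengthen probe sequences until the table is rebuilt, which is one form of the gradual degradation mentioned earlier for closed hashing.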
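Specifically for insert-only tables whose data is small enough to store directly in the buckets, with a convenient sentinel value for unused buckets and a good hash function, a closed hashing approach may be roughly an order of magnitude faster and use dramatically less memory, but that is not a general-purpose case.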
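A full comparison and elaboration of hash table design options and their implications is off topic for S.O., as it is far too broad to address properly here.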
Tags: unordered-map, c, c11, hashmap. Source: https://codeday.me/bug/20190916/1806629.html