diff options
| author | mo khan <mo.khan@gmail.com> | 2020-08-03 19:07:51 -0600 |
|---|---|---|
| committer | mo khan <mo.khan@gmail.com> | 2020-08-03 19:07:51 -0600 |
| commit | f5303d4e0f398323d1d7ee74877d86bee357341b (patch) | |
| tree | 88f6c4295bc4e6fd6bb5e97e0d1a34deb3317f6e /doc | |
| parent | 83718e496d670cb2ae02236630b0ba24a06d523b (diff) | |
Hashing with Chaining
Diffstat (limited to 'doc')
| -rw-r--r-- | doc/mit-ocw/hash.md | 123 |
1 files changed, 123 insertions, 0 deletions
diff --git a/doc/mit-ocw/hash.md b/doc/mit-ocw/hash.md new file mode 100644 index 0000000..da989d3 --- /dev/null +++ b/doc/mit-ocw/hash.md @@ -0,0 +1,123 @@ +# Hashing + +https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-006-introduction-to-algorithms-fall-2011/lecture-videos/lecture-8-hashing-with-chaining/ + +movitation + +* database +* compilers and interpreters +* network router +* substring search +* string commonalities +* file/dir synchronization + +Direct Access table: + + ---------- +0 | | +1 | | +2 | | +3 | | +4 | | + | ... | + | ... | + | ... | + ---------- + +Hash requires mapping to an integer. + +ideally: + +* prehash hash(x) == hash(y) only when x == y + +badness: + +1. keys may not be non negative integers +1. gigantic memory hog + +hashing -> hatchet + +- reduce universe of all possible keys and reduce them down to some reasonable set of integers. +- idea: m = theta(n) + - size of table proportial to size of # of things. + + ----------- +0 | | | | | | + ---------- +1 | | + ---------- +2 | | + ---------- +3 | | + ---------- +4 | | + ---------- + | ... | + ---------- + | ... | + ---------- +m-1 | ... | + ---------- + +Collision: Multiple keys map to same slot in hash table. +* h(ki) == h(kj) but ki <> kj +* we can use chaining + +How to define? + +* function (h) +* all keys map to `h = { 0, ..., m-1 }` + +Two ways to deal with collisions: + +* Chaining: store collisions as a list. (i.e. linked list) +* Open addressing + +## Chaining + + ---------- +0 | | --> (item1) --> (item2) --> (nil) + ---------- +1 | | + ---------- +2 | | + ---------- +3 | | + ---------- +4 | | + ---------- + | ... | + ---------- + | ... | + ---------- +m-1 | ... | + ---------- + +Worst case: + +* theta(n) (i.e. all items have a collision) + +Simple uniform hashing: + +* Convenient for analysis. +* each key is equally likely to be hashed to any slot of the table, independent of where other keys hashing. + +Analysis: + +* expected length of a chain for n keys, m slots. + * n/m == alpha == load factor + * O(1 + alpha) + +## Hash functions: + +1. division method: + * `h(k) == k mod m` + * works okay if `m` is prime and not close to power of 2 or power of 10. +2. multiplication method: + * `h(k) = [(a*k) mod 2^w] >> (w - r)` +3. Universal hasing: + * `h(k) = [(ak+b) mod p] mod m + * `a`, `b` are random between 0 and `p-1`. p is a big prime number larger than the universe. + * {0, ..,p-1} + * worst case keys: k1 != k2 + * `1/m` |
