diff options
| author | mo khan <mo.khan@gmail.com> | 2020-08-03 21:18:31 -0600 |
|---|---|---|
| committer | mo khan <mo.khan@gmail.com> | 2020-08-03 21:18:31 -0600 |
| commit | b08b548837785b89af0ccca8f3c0b58aae75ea01 (patch) | |
| tree | 6b74aeb1cff34d48c61e447adfeb5306dbc4ed3e /doc/mit-ocw | |
| parent | f5303d4e0f398323d1d7ee74877d86bee357341b (diff) | |
Add notes on Karp-Rabin algorithm
Diffstat (limited to 'doc/mit-ocw')
| -rw-r--r-- | doc/mit-ocw/hash.md | 49 | ||||
| -rw-r--r-- | doc/mit-ocw/kr.md | 74 |
2 files changed, 118 insertions, 5 deletions
diff --git a/doc/mit-ocw/hash.md b/doc/mit-ocw/hash.md index da989d3..f1aaa13 100644 --- a/doc/mit-ocw/hash.md +++ b/doc/mit-ocw/hash.md @@ -1,8 +1,8 @@ # Hashing -https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-006-introduction-to-algorithms-fall-2011/lecture-videos/lecture-8-hashing-with-chaining/ +[video][mit-ocw] -movitation +## Motivation * database * compilers and interpreters @@ -15,16 +15,23 @@ Direct Access table: ---------- 0 | | + ---------- 1 | | + ---------- 2 | | + ---------- 3 | | + ---------- 4 | | + ---------- | ... | + ---------- | ... | + ---------- | ... | ---------- -Hash requires mapping to an integer. +> Hash requires mapping to an integer. ideally: @@ -116,8 +123,40 @@ Analysis: 2. multiplication method: * `h(k) = [(a*k) mod 2^w] >> (w - r)` 3. Universal hasing: - * `h(k) = [(ak+b) mod p] mod m + * `h(k) = [(ak+b) mod p] mod m` * `a`, `b` are random between 0 and `p-1`. p is a big prime number larger than the universe. - * {0, ..,p-1} + * `{0, ..,p-1}` * worst case keys: k1 != k2 * `1/m` + +## How to choose m? + +* Want `m = theta(n)` + * `0(1)` + +Idea: + +> Start small; grow/shrink as necessary + +E.g. + +* If n > m: grow table + +* grow table: + * m -> m' + * make table of size `m'` + * build new hash function `h'` + * rehash + * for item in T: + * t:insert(item) + * `m' = 2m` table doubling + +Amortization: +* operation takes `T(n) amortized` + * if `k` operations take <= `k*T(n)` time +* spread out the high cost so you pay a little bit, but more often. +* ~ Think of meaning `T(n)` on average, where average is taken over all the oeprations. + +> spread out the high cost so that it's cheap on average all the time. + +[mit-ocw]: https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-006-introduction-to-algorithms-fall-2011/lecture-videos/lecture-8-hashing-with-chaining/ diff --git a/doc/mit-ocw/kr.md b/doc/mit-ocw/kr.md new file mode 100644 index 0000000..43d663f --- /dev/null +++ b/doc/mit-ocw/kr.md @@ -0,0 +1,74 @@ +# Karp-Rabin string matching algorithm +* https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-006-introduction-to-algorithms-fall-2011/lecture-videos/lecture-9-table-doubling-karp-rabin/ + +* given two string `s` & `t`: does `s` occur as a substring of `t`? + +```plaintext +t: [ | | | | | | | | | | | | | | | | ] +s: [ | | | ] + s: [ | | | ] + s: [ | | | ] + s: [ | | | ] + s: [ | | | ] + s: [ | | | ] + s: [ | | | ] + s: [ | | | ] + s: [ | | | ] + s: [ | | | ] + s: [ | | | ] + s: [ | | | ] + s: [ | | | ] + s: [ | | | ] +``` + +Time: + * `theta(|s| * |t| - |s|) + * `theta(|s| * |t|) # length of s * length of t + +Can we use hashing to get this down to linear time? + +* We are looking for `O(|s|+|t|)` length of `s` + `t` is better than `s` times `t`. + +Rolling hash ADT: + +* `r.append(c)`: + * add char.c to end of x +* `r.skip(c)`: + * delete fist character of x (assuming it is c) + + +* `r` maintains a string `x` + * `r()` hash value of x = `h(x)` + +```python +for c in s: rs.append(c) +for c in t[:len(s)]: + rt.append(c) +if rs() == rt(): ... +for i in range(len(s), len(t)): + rt.skip(t[i - len(s)] + rt.append(t[i]) + if rs() == rt: + s == t[i-len(s) + 1: i + i] + if equal: + found match + else: + happens with probability <= 1/|s| +``` + +division method: +* `h(k) = k mod m` where `m` is a random prime >= |s| (prime is greater than length of s) +* treat string x as a multi digit number base is size of alphabet. + * ascii -> 256 + * unicode -> x + +```python +r.append(c): + u -> u * a + ord(c) + r -> r * a + ord(c) mod m + +r.skip(): + u -> u - c * a^|u|-1 +``` + +* https://www.youtube.com/watch?v=qQ8vS2btsxI |
