Name Description Size
memchr.rs ! Provides architecture independent implementations of `memchr` and friends. The main types in this module are [`One`], [`Two`] and [`Three`]. They are for searching for one, two or three distinct bytes, respectively, in a haystack. Each type also has corresponding double ended iterators. These searchers are typically slower than hand-coded vector routines accomplishing the same task, but are also typically faster than naive scalar code. These routines effectively work by treating a `usize` as a vector of 8-bit lanes, and thus achieves some level of data parallelism even without explicit vector support. The `One` searcher also provides a [`One::count`] routine for efficiently counting the number of times a single byte occurs in a haystack. This is useful, for example, for counting the number of lines in a haystack. This routine exists because it is usually faster, especially with a high match count, then using [`One::find`] repeatedly. ([`OneIter`] specializes its `Iterator::count` implementation to use this routine.) Only one, two and three bytes are supported because three bytes is about the point where one sees diminishing returns. Beyond this point and it's probably (but not necessarily) better to just use a simple `[bool; 256]` array or similar. However, it depends mightily on the specific work-load and the expected match frequency. 37698
mod.rs ! Contains architecture independent routines. These routines are often used as a "fallback" implementation when the more specialized architecture dependent routines are unavailable. 8348
packedpair
rabinkarp.rs ! An implementation of the [Rabin-Karp substring search algorithm][rabinkarp]. Rabin-Karp works by creating a hash of the needle provided and then computing a rolling hash for each needle sized window in the haystack. When the rolling hash matches the hash of the needle, a byte-wise comparison is done to check if a match exists. The worst case time complexity of Rabin-Karp is `O(m * n)` where `m ~ len(needle)` and `n ~ len(haystack)`. Its worst case space complexity is constant. The main utility of Rabin-Karp is that the searcher can be constructed very quickly with very little memory. This makes it especially useful when searching for small needles in small haystacks, as it might finish its search before a beefier algorithm (like Two-Way) even starts. [rabinkarp]: https://en.wikipedia.org/wiki/Rabin%E2%80%93Karp_algorithm 14803
shiftor.rs ! An implementation of the [Shift-Or substring search algorithm][shiftor]. [shiftor]: https://en.wikipedia.org/wiki/Bitap_algorithm 3125
twoway.rs ! An implementation of the [Two-Way substring search algorithm][two-way]. [`Finder`] can be built for forward searches, while [`FinderRev`] can be built for reverse searches. Two-Way makes for a nice general purpose substring search algorithm because of its time and space complexity properties. It also performs well in practice. Namely, with `m = len(needle)` and `n = len(haystack)`, Two-Way takes `O(m)` time to create a finder, `O(1)` space and `O(n)` search time. In other words, the preprocessing step is quick, doesn't require any heap memory and the worst case search time is guaranteed to be linear in the haystack regardless of the size of the needle. While vector algorithms will usually beat Two-Way handedly, vector algorithms also usually have pathological or edge cases that are better handled by Two-Way. Moreover, not all targets support vector algorithms or implementations for them simply may not exist yet. Two-Way can be found in the `memmem` implementations in at least [GNU libc] and [musl]. [two-way]: https://en.wikipedia.org/wiki/Two-way_string-matching_algorithm [GNU libc]: https://www.gnu.org/software/libc/ [musl]: https://www.musl-libc.org/ 33319