aho_corasick

Aho-Corasick algorithm, implemented in Elixir using Erlang's :digraph for the graph structure

Latest version: 0.0.1 registry icon
Maintenance score
0
Safety score
0
Popularity score
70
Check your open source dependency risks. Get immediate insight about security, stability and licensing risks.
Security
  Vulnerabilities
Version Suggest Low Medium High Critical
0.0.1 0 0 0 0 0

Stability
Latest release:

0.0.1 - This version may not be safe as it has not been updated for a long time. Find out if your coding project uses this component and get notified of any reported security vulnerabilities with Meterian-X Open Source Security Platform

Licensing

Maintain your licence declarations and avoid unwanted licences to protect your IP the way you intended.

MIT   -   MIT License

Not a wildcard

Not proprietary

OSI Compliant



AhoCorasick

Aho-Corasick is a cool algorithm. It is useful when you want to search for many things inside an input text all at once.

  • You start with your "dictionary" of terms you're searching for, and build a specialized graph structure representing those search terms (in linear time with the combined length of the dictionary terms).
  • Then you run it against your input text, and it will tell you where all of the search terms appeared (in linear time with the length of the input text plus the combined lengths of any matches found).

For more, see: https://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_algorithm

How to use this library

graph = AhoCorasick.new(["my", "dictionary", "terms"])

results = AhoCorasick.search(graph, "I wonder if any of the terms from my dictionary appear in this text, and if so, where?")

=> #MapSet<[{"dictionary", 37, 10}, {"my", 34, 2}, {"terms", 23, 5}]>

How to understand the results

The result set contains tuple elements in the following format:

{term_found, start_position, run_length}

Motivation

  • I think this is a cool algorithm and I have used it professionally (in Java code). I used it for searching for open source license text within source code files.
  • I wanted to write some Elixir code and thought this would be a fun challenge. It was.

Caveats

This is the first non-trivial thing I've written in Elixir. I'm not sure if I'm following style conventions, etc. I think it's a little weird that I used Erlang's :digraph to implement this. That means the graph structure is not immutable, it's stored in ETS. However, that did make for an interesting adventure into working with a native Erlang library within Elixir, as well as bridging the gap between functional code and imperative/mutable code.