heavykeeper

Heavykeeper algorithm for Top-K elephant flows

Latest version: 0.7.0 registry icon
Maintenance score
100
Safety score
100
Popularity score
6
Check your open source dependency risks. Get immediate insight about security, stability and licensing risks.
Security
  Vulnerabilities
Version Suggest Low Medium High Critical
0.7.0 0 0 0 0 0
0.6.9 0 0 0 0 0
0.6.8 0 0 0 0 0
0.6.7 0 0 0 0 0
0.6.6 0 0 0 0 0
0.6.5 0 0 0 0 0
0.6.4 0 0 0 0 0
0.6.3 0 0 0 0 0
0.6.2 0 0 0 0 0
0.6.1 0 0 0 0 0
0.6.0 0 0 0 0 0
0.5.1 0 0 0 0 0
0.5.0 0 0 0 0 0
0.4.0 0 0 0 0 0
0.3.1 0 0 0 0 0
0.3.0 0 0 0 0 0
0.2.7 0 0 0 0 0
0.2.6 0 0 0 0 0
0.2.5 0 0 0 0 0
0.2.4 0 0 0 0 0
0.2.3 0 0 0 0 0
0.2.2 0 0 0 0 0
0.2.1 0 0 0 0 0

Stability
Latest release:

0.7.0 - This version is safe to use because it has no known security vulnerabilities at this time. Find out if your coding project uses this component and get notified of any reported security vulnerabilities with Meterian-X Open Source Security Platform

Licensing

Maintain your licence declarations and avoid unwanted licences to protect your IP the way you intended.

MIT   -   MIT License

Not a wildcard

Not proprietary

OSI Compliant


Apache-2.0   -   Apache License 2.0

Not a wildcard

Not proprietary

OSI Compliant



heavykeeper-rs

Crates.io MIT / Apache 2.0 licensed Build Status

📖 Docs

Top-K Heavykeeper algorithm for Top-K elephant flows

This is based on the paper HeavyKeeper: An Accurate Algorithm for Finding Top-k Elephant Flows by Junzhi Gong, Tong Yang, Haowei Zhang, and Hao Li, Peking University; Steve Uhlig, Queen Mary, University of London; Shigang Chen, University of Florida; Lorna Uden, Staffordshire University; Xiaoming Li, Peking University

Example

See examples/basic.rs for a complete example, or examples/ip_files.rs for an example of counting top-k IP flows in a file.

Basic usage:

use heavykeeper::TopK;

// create a new TopK with k=10, width=1000, depth=4, decay=0.9
let mut topk: TopK<String> = TopK::new(10, 1000, 4, 0.9);

// add some items
topk.add("example item", 5);
topk.add("another item", 1);

// check the counts
for node in topk.list() {
    println!("{} {}", node.item, node.count);
}

Variants

The crate ships three top-K sketches that share the same public API (new / with_seed / with_hasher / builder / add / count / contains / list / merge):

Sketch Layout Insert throughput on Zipf(s=1.2), 1M Recall @ φ=0.0005
TopK depth independent rows × width buckets 21.0 Melem / s 0.942
BucketedTopK one bucket of depth cells per key 29.0 Melem / s 0.985
CuckooTopK per-bucket lobby + depth heavy slots, 2-bucket cuckoo 29.8 Melem / s 1.000

Numbers are from cargo bench --bench topk_vs_bucketed at K=100, width=4096, depth=4, decay=0.9 on u64 keys. Recall is from tests/accuracy_compare.rs (paper-style heavy-hitter test, φ = 0.0005, 1 M Zipf samples).

TopK is the canonical implementation from the paper, with its accuracy bounds. BucketedTopK and CuckooTopK are derived variants — they don't carry the paper's row-independence accuracy bounds, but the empirical accuracy on Zipf streams is competitive and often better.

Pick by workload:

  • TopK — when you want the published algorithm and its bounds.
  • BucketedTopK — best general-purpose insert throughput; closest to TopK's cost model with a single bucket per key.
  • CuckooTopK — best accuracy and throughput on heavy-hitter-skewed traffic (the elephant-flow use case). Each bucket has a single lobby cell with probabilistic decay plus depth non-decaying heavy slots; promoted items live in one of two cuckoo candidate buckets and are re-homed on collision via a kick chain (bound configurable via CuckooBuilder::max_kicks, default 8).

All three support seedable construction, custom hashers, and merge between compatible instances. Errors are returned via BuilderError/MergeError enums; the infallible constructors (new, with_seed, with_hasher) trust the caller. Bucket indexing uses an AND-mask fast-path when width is a power of two; pick power-of-two widths in production for the best per-add cost.

Real-world packet trace

examples/ip_files.rs runs all three sketches over a CAIDA-style trace (27.5 M packets, 1.03 M distinct 13-byte flow keys = src IP : src port → dst IP : dst port + protocol). Same K=1000, decay=0.95, equal cell budgets across variants:

Sketch Width × depth Throughput hit_ratio ARE on reported ARE on true top-K
TopK 16384 × 2 14.1 Mpps 0.9270 0.0050 0.0745
BucketedTopK 8192 × 4 18.1 Mpps 0.9860 0.0035 0.0129
CuckooTopK 8192 × (1 + 4 heavy) 17.0 Mpps 0.9990 0.0012 0.0012

Other HeavyKeeper Implementations

Name Language Github Repo
SegmentIO Go https://github.com/segmentio/topk
Aegis Go https://github.com/go-kratos/aegis/blob/main/topk/heavykeeper.go
Tomasz Kolaj Go https://github.com/migotom/heavykeeper
HeavyKeeper Paper C++ https://github.com/papergitkeeper/heavy-keeper-project
Jigsaw-Sketch C++ https://github.com/duyang92/jigsaw-sketch-paper/tree/main/CPU/HeavyKeeper
Redis Bloom Heavykeeper C https://github.com/RedisBloom/RedisBloom/blob/master/src/topk.c
Count-Min-Sketch Rust https://github.com/alecmocatta/streaming_algorithms
Sliding Window HeavyKeeper Go https://github.com/keilerkonzept/topk

Running

Word Count Example

A word count program that demonstrates the HeavyKeeper algorithm can be found at examples/word_count.rs.

Usage

cargo build --example word_count --release
target/release/examples/word_count -k 10 -w 8192 -d 2 -y 0.95 -f data/war_and_peace.txt

Running the basic example

cargo run --example basic --release

Running the IPv4 example

cargo run --example ip_files --release

Run the benchmarks

cargo bench

Benchmark the sample word count app

hyperfine 'target/release/examples/word_count -k 10 -w 8192 -d 2 -y 0.95 -f data/war_and_peace.txt'

Test Data

For information about test data format and how to obtain or generate test data, please see data/README.md.

License

This project is dual licensed under the Apache/MIT license.