Show HN: cuSBF – Faster GPU Bloom Filter for Sequence Data

A new GPU-accelerated Bloom filter implementation, cuSBF, achieves up to 234 times faster k-mer queries and 92 times faster insertions than CPU-based Super Bloom filters for DNA and protein sequence analysis. Developed for NVIDIA GPUs with compute capability 8.0 or higher, the header-only C++ library uses minimizer-based shard selection and findere false-positive reduction to optimize memory bandwidth for streaming sequence data. Benchmarks on an RTX PRO 6000 Blackwell GPU show cuSBF outperforming existing GPU filters, including GBBF, Cuckoo-GPU, TCF, and GQF, by 7.6 to 3,427 times depending on the filter and operation.

cuSBF is a high-performance GPU implementation of the [Super Bloom filter](https://www.biorxiv.org/content/10.64898/2026.03.17.712354v1.article-info), optimized for high-throughput batch k-mer insertion and query on nucleotide (DNA) and protein sequences, or any other sequence type as long as a valid alphabet is provided.

It exploits the streaming nature of sequence-derived k-mers: minimizers group consecutive k-mers that share the same minimizer into super-k-mers, and all k-mers of a super-k-mer are assigned to the same 256-bit memory shard. This amortizes random memory accesses across consecutive k-mer queries, reducing memory-bandwidth pressure. The findere scheme further reduces false positives by inserting overlapping s-mers and requiring a full run of consecutive s-mer matches before a k-mer is reported present.

- CUDA-accelerated batch k-mer insert and query from sequences
- Configurable k-mer length, minimizer width, s-mer width, and hash function count
- Minimizer-based shard selection for cache-efficient streaming queries
- Findere false-positive reduction via overlapping s-mer membership
- Header-only library design
- FASTA/FASTQ stream and file support

Benchmarks use `Config<31, 28, 16, 4>` on an NVIDIA RTX PRO 6000 Blackwell GPU. CPU Super Bloom runs on an Intel Xeon W9-3595X with 120 threads.
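As a rough illustration of the findere check with the benchmark configuration above, the sketch below assumes that `Config<31, 28, 16, 4>` means k = 31 and s = 28 (the parameter order is an assumption, inferred from the `cuSBF s=28` column in the results). Under that reading each k-mer query needs k - s + 1 = 4 consecutive s-mer hits, and s = k degenerates to a single plain lookup. The names `SmerSet`, `insert_smers`, `findere_contains`, and `run_length` are hypothetical and not part of cuSBF; an exact `std::unordered_set` stands in for the Bloom filter, and nothing here runs on the GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <unordered_set>

// Hypothetical stand-in for the Bloom filter: an exact set of s-mers.
// A real Bloom filter would answer "maybe present" and can give false positives.
using SmerSet = std::unordered_set<std::string>;

// Insert every overlapping s-mer of `seq` into the set.
void insert_smers(SmerSet& set, std::string_view seq, std::size_t s) {
    assert(s > 0);
    for (std::size_t i = 0; i + s <= seq.size(); ++i) {
        set.insert(std::string(seq.substr(i, s)));
    }
}

// findere-style query: report the k-mer as present only if every one of its
// k - s + 1 overlapping s-mers is present (a full run of consecutive hits).
bool findere_contains(const SmerSet& set, std::string_view kmer, std::size_t s) {
    assert(s > 0 && s <= kmer.size());
    for (std::size_t i = 0; i + s <= kmer.size(); ++i) {
        if (set.count(std::string(kmer.substr(i, s))) == 0) {
            return false;  // one missing s-mer rejects the whole k-mer
        }
    }
    return true;
}

// Number of consecutive s-mer hits required per k-mer (requires s <= k).
constexpr std::size_t run_length(std::size_t k, std::size_t s) {
    return k - s + 1;
}
```

For example, after `insert_smers(set, "ACGTACGGTA", 4)`, the 7-mer `ACGTACG` is accepted because all four of its 4-mers were inserted, while `ACGTACC` is rejected because `TACC` is missing. With an approximate filter, a single false-positive s-mer is not enough to produce a false-positive k-mer, since the whole run has to match.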
Compared against:

- [Super Bloom](https://github.com/EtienneC-K/SuperBloom) (CPU)
- [Blocked Bloom filter (GBBF)](https://github.com/NVIDIA/cuCollections) (GPU)
- [Cuckoo-GPU](https://github.com/tdortman/Cuckoo-GPU) (GPU)
- [Bulk Two-Choice Filter (TCF)](https://github.com/saltsystemslab/gpu-filters/tree/main/bulk-tcf) (GPU)
- [Counting Quotient Filter (GQF)](https://github.com/saltsystemslab/gpu-filters/tree/main/gqf) (GPU)

| Comparison | Insert | Query |
|---|---|---|
| cuSBF vs Super Bloom | 92× faster | 234× faster |
| cuSBF vs GBBF | 9.1× faster | 7.7× faster |
| cuSBF vs Cuckoo-GPU | 80× faster | 8.0× faster |
| cuSBF vs TCF | 12× faster | 52× faster |
| cuSBF vs GQF | 69× faster | 13× faster |

| Comparison | Insert | Query |
|---|---|---|
| cuSBF vs Super Bloom | 59× faster | 165× faster |
| cuSBF vs GBBF | 8.2× faster | 7.6× faster |
| cuSBF vs Cuckoo-GPU | 3427× faster | 7.8× faster |
| cuSBF vs TCF | 12× faster | 67× faster |
| cuSBF vs GQF | 42× faster | 11× faster |

| Bits/k-mer | cuSBF s=28 | cuSBF s=30 | cuSBF s=31 | GBBF |
|---|---|---|---|---|
| 21.4 | 0.848% | 0.951% | 1.593% | 3.069% |
| 85.7 | 0.091% | 0.107% | 0.210% | 0.126% |
| 342.6 | 0.0095% | 0.0114% | 0.0264% | 0.0273% |

- Linux x86_64 or aarch64 with an NVIDIA GPU and driver
- CUDA Toolkit = 13.1
- GCC or Clang host compiler (C++20)
- Meson and Ninja
- NVIDIA GPU with compute capability 8.0+ (Ampere, Lovelace, Hopper, Blackwell)

cuSBF is developed and tested on Linux only. WSL2 on Windows is a reasonable dev environment (see the [NVIDIA docs](https://docs.nvidia.com/cuda/wsl-user-guide/index.html)). Native Windows and macOS are not supported or tested. The build uses Linux-specific FASTX paths (for example `mmap`) and host tooling assumptions (GCC/Clang, GNU statement expressions in `CUSBF_TRY` / `CUSBF_UNWRAP`).

    meson setup build
    ninja -C build

When this repo is the root Meson project, `benchmarks`, `tests`, and `examples` build by default. As a subproject they are skipped unless you force them on.
| Option | Type | Default | Description |
|---|---|---|---|
| `benchmarks` | feature | `auto` | Google Benchmark binaries |
| `tests` | feature | `auto` | GoogleTest suite |
| `examples` | feature | `auto` | Example CLI |
| `param_sweep` | feature | `disabled` | Parameter-sweep binaries (large, see below) |
| `param_sweep_alphabet` | combo | `dna` | `dna` or `protein` when `param_sweep` is enabled |
| `large_fastx_tests` | feature | `disabled` | Large generated FASTX test (CUSBF_LARGE_FASTX env vars) |

Each feature option accepts `auto`, `enabled`, or `disabled`:

- `auto`: on for a standalone checkout, off when cuSBF is a subproject
- `enabled` / `disabled`: override regardless of project layout

**Important:** Enabling `param_sweep` builds many binaries (208 for the DNA alphabet). Leave it disabled unless you need that sweep.

    # Default standalone build
    meson setup build

    # Faster configure: library + examples only
    meson setup build -Dbenchmarks=disabled -Dtests=disabled

    # Subproject consumer forcing tests on
    meson setup build -Dtests=enabled

    # Parameter sweep
    meson setup build -Dparam_sweep=enabled
    meson setup build -Dparam_sweep=enabled -Dparam_sweep_alphabet=protein

Fallible APIs return cusbf::Result