cd /news/ai-infrastructure/show-hn-cusbf-faster-gpu-bloom-filte… · home topics ai-infrastructure article
[ARTICLE · art-15574] src=github.com pub= topic=ai-infrastructure verified=true sentiment=↑ positive

Show HN: cuSBF – Faster GPU Bloom Filter for Sequence Data

A new GPU-accelerated Bloom filter implementation, cuSBF, achieves up to 234 times faster k-mer queries and 92 times faster insertions compared to CPU-based Super Bloom filters for DNA and protein sequence analysis. Developed for NVIDIA GPUs with compute capability 8.0 or higher, the header-only C++ library uses minimizer-based shard selection and findere false-positive reduction to optimize memory bandwidth for streaming sequence data. Benchmarks on an RTX PRO 6000 Blackwell GPU show cuSBF outperforming existing GPU filters including GBBF, Cuckoo-GPU, TCF, and GQF by 7.6 to 3,427 times across various metrics.

read5 min publishedMay 27, 2026

cuSBF is a high-performance GPU implementation of the Super Bloom filter, optimized for high-throughput batch k-mer insertion and query on nucleotide (DNA) and protein sequences (or any other sequence type as long as a valid alphabet is provided).

It exploits the streaming nature of sequence-derived k-mers by using minimizers to group consecutive k-mers sharing the same minimiser into super-k-mers, assigning all k-mers of a super-k-mer to the same 256-bit memory shard. This amortizes random memory accesses across consecutive k-mer queries, reducing memory-bandwidth pressure. The findere scheme further reduces false positives dramatically by inserting overlapping s-mers and requiring a full run of consecutive s-mer matches.

  • CUDA-accelerated batch k-mer insert and query from sequences
  • Configurable k-mer length, minimiser width, s-mer width, and hash function count
  • Minimizer-based shard selection for cache-efficient streaming queries
  • Findere false-positive reduction via overlapping s-mer membership
  • Header-only library design
  • FASTA/FASTQ stream and file support

Benchmarks use Config<31, 28, 16, 4>

on an NVIDIA RTX PRO 6000 Blackwell GPU. CPU Super Bloom runs on an Intel Xeon W9-3595X with 120 threads.

Compared against:

CPU Super BloomGPU Blocked Bloom filter (GBBF)GPU Cuckoo-GPUGPU Bulk Two-Choice Filter (TCF)GPU Counting Quotient Filter (GQF)

Comparison Insert Query
cuSBF vs Super Bloom 92× faster 234× faster
cuSBF vs GBBF 9.1× faster 7.7× faster
cuSBF vs Cuckoo-GPU 80× faster 8.0× faster
cuSBF vs TCF 12× faster 52× faster
cuSBF vs GQF 69× faster 13× faster
Comparison Insert Query
cuSBF vs Super Bloom 59× faster 165× faster
cuSBF vs GBBF 8.2× faster 7.6× faster
cuSBF vs Cuckoo-GPU 3427× faster 7.8× faster
cuSBF vs TCF 12× faster 67× faster
cuSBF vs GQF 42× faster 11× faster

| Bits/k-mer | cuSBF s=28 | cuSBF s=30 | cuSBF s=31 | GBBF | |---|---|---|---|---| | 21.4 | 0.848% | 0.951% | 1.593% | 3.069% | | 85.7 | 0.091% | 0.107% | 0.210% | 0.126% | | 342.6 | 0.0095% | 0.0114% | 0.0264% | 0.0273% |

  • Linux (x86_64 or aarch64) with an NVIDIA GPU and driver
  • CUDA Toolkit >= 13.1
  • GCC or Clang host compiler (C++20)
  • Meson and Ninja
  • NVIDIA GPU with compute capability 8.0+ (Ampere, Lovelace, Hopper, Blackwell)

cuSBF is developed and tested on Linux only.

WSL2 on Windows with is a reasonable dev environment (SeeNVIDIA docs).Native Windows and macOS are not supported or tested. The build uses Linux-specific FASTX paths (for examplemmap

) and host tooling assumptions (GCC/Clang, GNU statement expressions inCUSBF_TRY

/CUSBF_UNWRAP

).

meson setup build
ninja -C build

When this repo is the root Meson project, benchmarks, tests, and examples build by default. As a subproject they are skipped unless you force them on.

Option Type Default Description
benchmarks
feature auto
Google Benchmark binaries
tests
feature auto
GoogleTest suite
examples
feature auto
Example CLI
param_sweep
feature disabled
Parameter-sweep binaries (large, see below)
param_sweep_alphabet
combo dna
dna or protein when param_sweep is enabled
large_fastx_tests
feature disabled
Large generated FASTX test (CUSBF_LARGE_FASTX_* env vars)

Each feature option accepts auto

, enabled

, or disabled

:

auto

— on for a standalone checkout, off when cuSBF is a subprojectenabled

/disabled

— override regardless of project layout

Important

Enabling param_sweep

builds many binaries (208 for the DNA alphabet). Leave it disabled unless you need that sweep.

meson setup build

meson setup build -Dbenchmarks=disabled -Dtests=disabled

meson setup build -Dtests=enabled

meson setup build -Dparam_sweep=enabled
meson setup build -Dparam_sweep=enabled -Dparam_sweep_alphabet=protein

Fallible APIs return cusbf::Result<T>

(a thin wrapper over cuda::std::expected<T, Error>

). Use return Err(error)

(cuda::std::unexpected<Error>

, deduces Result<T>

) or return Ok()

/ return {}

for Result<void>

. For success with a value, return value

is enough. Two helpers unwrap results:

Macro On failure Use when
CUSBF_TRY(expr)
Copies the error, then return cuda::std::unexpected<Error>(...) from the enclosing function
The caller returns Result (library glue, examples/cusbf-main )
CUSBF_UNWRAP(expr)
throw std::runtime_error(message())
Tests, main , or other code that does not return Result

Both work as statements or in initializers (auto x = CUSBF_UNWRAP(...)

). For full control (typed errors, exit codes), use if (!result)

instead.

#include <cusbf/filter.cuh>

using Config = cusbf::Config<31, 28, 16, 4>;

int main() {
    cusbf::filter<Config> filter(1 << 24);

    CUSBF_UNWRAP(filter.insert_sequence("ACGTACGTACGTACGTACGTACGTACGTACGT"));
    const auto hits = CUSBF_UNWRAP(filter.contains_sequence("ACGTACGTACGTACGTACGTACGTACGTACGT"));

    CUSBF_UNWRAP(filter.insert_fastx_file("reference.fasta"));
    const auto summary = CUSBF_UNWRAP(filter.query_fastx_file("queries.fastq"));

    (void)hits;
    (void)summary;
    return 0;
}

When the caller already returns Result

, use CUSBF_TRY

so failures propagate without exceptions:

[[nodiscard]] cusbf::Result<void> run(cusbf::filter<Config>& filter) {
    CUSBF_TRY(filter.insert_fastx_file("reference.fasta"));
    const auto summary = CUSBF_TRY(filter.query_fastx_file("queries.fastq"));
    (void)summary;
    return cusbf::Ok();
}

Async device APIs, record batches, and streaming FASTX callbacks follow the same pattern. filter.load_factor()

and filter.filter_bits()

are synchronous and do not return Result

.

if (const auto result = filter.query_fastx_file("queries.fastq"); !result) {
    const cusbf::Error& err = result.error();
    std::cerr << err.message() << '\n';
    if (const cusbf::FastxParseError* parse = err.as_fastx_parse()) {
        // parse->location.file / .line / .column
    }
    return 1;
}

CUSBF_CUDA_TRY

wraps CUDA runtime calls into Result<void>

; CUSBF_CUDA_CALL

/ CUSBF_CUDA_ABORT

are for throw/abort paths only.

The Config

template accepts the following parameters:

Parameter Description Default
K
k-mer length (max depends on alphabet) -
S
s-mer width for findere Bloom hash seed (1-K) -
M
Minimiser width for shard selection (1-K) -
HashCount
Number of independent Bloom hash functions (4,8,12,16) 4
CudaBlockSize
CUDA threads per block 256
Alphabet
Symbol encoding (DNA or protein) DnaAlphabet
#include <cusbf/filter.cuh>

using ProteinConfig = cusbf::Config<12, 10, 6, 4, 256, cusbf::ProteinAlphabet>;

[[nodiscard]] cusbf::Result<void> run_protein() {
    cusbf::filter<ProteinConfig> filter(1 << 24);
    CUSBF_TRY(filter.insert_sequence("ACDEFGHIKLMNPQRSTVWY"));
    const auto hits = CUSBF_TRY(filter.contains_sequence("ACDEFGHIKLMNPQRSTVWY"));
    (void)hits;
    return cusbf::Ok();
}
  • E. Conchon-Kerjan, T. Rouzé, L. Robidou, F. Ingels, and A. Limasset, “Super Bloom: Fast and precise filter for streaming k-mer queries,” bioRxiv, 2026, doi: 10.64898/2026.03.17.712354.
  • D. Jünger, K. Kristensen, Y. Wang, X. Yu, and B. Schmidt, “Optimizing Bloom Filters for Modern GPU Architectures.” 2025. [Online]. Available: https://arxiv.org/abs/2512.15595
── more in #ai-infrastructure 4 stories · sorted by recency
sponsored brought to you by zahid.host 4,200+ EU-deployed projects
reading about agents? ship yours in a single git push.

Run your AI side-project on zahid.host

EU-based hosting, git-push deploys, automatic HTTPS, no cold starts. Free tier with a custom domain — perfect for shipping the agent you just read about.

$git push zahid main
Live at https://your-agent.zahid.host
Get free account → Pricing
from €0/mo · no card required
LIVE [news/show-hn-cusbf-faster…] indexed:0 read:5min 2026-05-27 ·