Source: dev.to

How to analyze duplicate processing in an async flow

A developer analyzes duplicate processing in asynchronous flows, defining it as a side effect being applied more than once. The developer proposes an abstract model with evidence checking and atomic side-effect application, emphasizing that deduplication requires making evidence atomic with the effect, visible to all, and tied to a unique identifier.

3 min read · 1 view · published Jun 29, 2026

In one line: deduplication is about the evidence that a side effect has been applied. Make it atomic with the effect, visible to everyone, and tied to a unique identifier.

This is just how I think it through: not a tutorial, not the final answer. I'm sharing my reasoning, and I'd love to hear where it breaks.

Wait a minute, please. Let's step back on this topic and not get stuck on specific tech solutions yet.

So, my questions:

By the way, I tried to enumerate all the failure scenarios the typical implementations aim to prevent, but I gave up — there are too many possibilities to list them all. So I'll start from the essence of duplicate processing instead.

So, what is duplicate processing at its core?

It's not about how many times a message is delivered. It's about how many times the side effect is applied.

So at its root: duplicate processing means the same logical intent has its side effect applied more than once.

And one more thing: even when duplication happens, it only causes damage if the side effect is not idempotent. An idempotent side effect makes a duplicate harmless — but still wasteful, and real business logic is often hard to make idempotent. So idempotency isn't our goal here; the discussion below does not assume it.
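To make the difference concrete, here is a small sketch of how I picture it. The class, the method names (credit, markPaid) and the in-memory maps are all made up for illustration; they stand in for whatever real state the side effect touches.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory state, used only to contrast the two kinds of side effect.
class IdempotencyDemo {

    private final Map<String, Integer> balances = new HashMap<>();
    private final Map<String, String> statuses = new HashMap<>();

    // Non-idempotent: every application changes the result again.
    void credit(String accountId, int amount) {
        balances.merge(accountId, amount, Integer::sum);
    }

    // Idempotent: applying it twice leaves the same state as applying it once.
    void markPaid(String orderId) {
        statuses.put(orderId, "PAID");
    }

    int balanceOf(String accountId) {
        return balances.getOrDefault(accountId, 0);
    }

    String statusOf(String orderId) {
        return statuses.get(orderId);
    }

    public static void main(String[] args) {
        IdempotencyDemo demo = new IdempotencyDemo();

        // The same logical intent is processed twice.
        demo.credit("acct-1", 100);
        demo.credit("acct-1", 100);
        demo.markPaid("order-1");
        demo.markPaid("order-1");

        System.out.println(demo.balanceOf("acct-1")); // 200: the duplicate did damage
        System.out.println(demo.statusOf("order-1")); // PAID: the duplicate was harmless
    }
}
```

The duplicate credit doubles the balance, while the duplicate markPaid only repeats work. Most business logic looks more like credit than markPaid, which is why the rest of the discussion does not lean on idempotency.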

Now, instead of jumping to solutions, let's think the other way around: under what conditions does the side effect get applied more than once?

Here is what I think — it happens if any of these is true:

Notice something: each condition above is just a way the guarantee breaks. So if we flip them around, we get the boundaries that guarantee non-duplication.

Flipping the failure conditions, here are the boundaries:

After that, we can decide what the orchestrator and collaborators look like.

Collaborators:

Orchestrator:

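public class ConsumerHandler {

    private final EvidenceChecker checker;
    private final SideEffectHandler handler;
    private final OffsetCommitter committer;

    public ConsumerHandler(EvidenceChecker checker, SideEffectHandler handler, OffsetCommitter committer) {
        this.checker = checker;
        this.handler = handler;
        this.committer = committer;
    }

    public void consume(Message message, CommitHandle handle) {
        // log.info(...) goes here; the logging call is elided.

        // Has the side effect for this identifier already been applied?
        boolean alreadyApplied = checker.check(message.identifierKey());

        // Already applied: skip the work, just commit and return.
        if (alreadyApplied) {
            committer.commit(handle);
            return;
        }

        // Within the same atomic boundary, handle(...) must:
        // 1. apply the side effect (business logic)
        // 2. write the evidence record for this identifier
        handler.handle(message);

        // Commit through the handle that arrived with this message.
        committer.commit(handle);
    }
}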
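For anyone who wants to play with the model, here is one way the collaborators could be typed. Only the names EvidenceChecker, SideEffectHandler, OffsetCommitter, Message, CommitHandle and identifierKey come from the orchestrator; every signature, the InMemoryStore class and the CollaboratorSketch driver are my assumptions, and the synchronized methods are just an in-memory stand-in for a real atomic boundary. It uses records, so it assumes Java 16 or later.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical shapes for the collaborators; the signatures are assumptions.
record Message(String identifierKey, String payload) {}
record CommitHandle(long offset) {}

interface EvidenceChecker { boolean check(String identifierKey); }
interface SideEffectHandler { void handle(Message message); }
interface OffsetCommitter { void commit(CommitHandle handle); }

// In-memory stand-in for a store that keeps evidence and business state together.
class InMemoryStore implements EvidenceChecker, SideEffectHandler {
    private final Set<String> evidence = new HashSet<>();
    private int applied = 0;

    @Override
    public synchronized boolean check(String identifierKey) {
        return evidence.contains(identifierKey);
    }

    // The synchronized method plays the role of the atomic boundary:
    // no other thread can see the evidence without the effect, or the reverse.
    @Override
    public synchronized void handle(Message message) {
        if (!evidence.add(message.identifierKey())) {
            return; // evidence already exists, so the effect was applied before
        }
        applied++; // stands in for the business logic
    }

    synchronized int appliedCount() {
        return applied;
    }
}

class CollaboratorSketch {

    // Delivers the same message twice and reports how often the effect ran.
    static int deliverTwice() {
        InMemoryStore store = new InMemoryStore();
        OffsetCommitter committer = handle -> {
            // A real committer would acknowledge the message to the broker here.
        };

        Message message = new Message("order-42", "charge 10 EUR");
        for (long offset = 0; offset < 2; offset++) {
            if (!store.check(message.identifierKey())) {
                store.handle(message);
            }
            committer.commit(new CommitHandle(offset));
        }
        return store.appliedCount();
    }

    public static void main(String[] args) {
        System.out.println(deliverTwice()); // 1
    }
}
```

Two deliveries, one applied side effect: the delivery count and the effect count are decoupled by the evidence record, not by anything the broker does.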

Keep these in mind:

  • We haven't discussed any concrete tech (RDBMS, Redis) yet.
  • The early alreadyApplied check is a performance optimization, not a correctness guarantee. Even with an idempotent side effect, reprocessing a duplicate still wastes resources (CPU, DB calls, external requests), so the check lets us skip that work and return fast. But it does NOT prevent duplication itself: a check-then-act still has a race window. The real guarantee comes from the unique constraint that is enforced when the evidence record is written atomically with the side effect.
  • No matter what MQ we use (Kafka, RabbitMQ, or something else), the consumer always needs the message and a way to commit/confirm it; otherwise the MQ can't know the message was consumed. That's why consume(Message message, CommitHandle handle) is written like this.
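The race window in the second point is easier to see with a sketch. Below, ConcurrentHashMap.putIfAbsent is my in-memory stand-in for a unique constraint on the identifier; the class and method names are hypothetical. One thing it cannot show: in a real store the evidence write and the side effect would sit in one transaction, whereas here they are only guarded against concurrent consumers, not against a crash in between.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class UniqueEvidenceDemo {

    // Stand-in for an evidence table with a unique constraint on the identifier.
    private final ConcurrentHashMap<String, Boolean> evidence = new ConcurrentHashMap<>();
    private final AtomicInteger applied = new AtomicInteger();

    // Returns true only for the single caller that wins the insert.
    boolean applyOnce(String identifierKey) {
        // putIfAbsent is atomic: exactly one caller sees null for a given key.
        boolean won = evidence.putIfAbsent(identifierKey, Boolean.TRUE) == null;
        if (won) {
            applied.incrementAndGet(); // stands in for the side effect
        }
        return won;
    }

    // Lets several consumers race on the same identifier and counts applied effects.
    static int raceConsumers(int consumers) {
        UniqueEvidenceDemo demo = new UniqueEvidenceDemo();
        ExecutorService pool = Executors.newFixedThreadPool(consumers);
        CountDownLatch start = new CountDownLatch(1);
        for (int i = 0; i < consumers; i++) {
            pool.execute(() -> {
                try {
                    start.await(); // release all consumers at the same moment
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                demo.applyOnce("order-42");
            });
        }
        start.countDown();
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("consumers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return demo.applied.get();
    }

    public static void main(String[] args) {
        System.out.println(raceConsumers(8)); // 1
    }
}
```

If applyOnce were written as a separate contains check followed by a put, two consumers could both pass the check before either one writes, and the effect would run twice. Folding the check into the write is what closes that window.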

So far everything is still technology-agnostic — we went from the essence, to the failure conditions, to the boundaries, and finally to an abstract collaboration model. No Redis, no RDBMS yet.

The abstract model is clean. Reality usually isn't. The handler.handle(...) above still treats business logic as a black box, and that box might not be simple. When the side effect is more than one step, what does its evidence record look like then?

So I'll leave it here as a question:

What problems do you think are still hiding? What would you have to design or reason about next? And if you'd leave any comment to help refine this post, feel free to let me know — thanks in advance.

The point of this post: find the boundaries first, and every later solution has a place to fit.
