{"slug": "how-to-analyze-duplicate-processing-in-an-async-flow", "title": "How to analyze duplicate processing in an async flow", "summary": "A developer analyzes duplicate processing in asynchronous flows, defining it as a side effect being applied more than once. The developer proposes an abstract model with evidence checking and atomic side-effect application, emphasizing that deduplication requires making evidence atomic with the effect, visible to all, and tied to a unique identifier.", "body_md": "In one line: deduplication is about the **evidence that a side effect has been applied**. Make that evidence atomic with the effect, visible to everyone, and tied to a unique identifier.\n\nThis is just how I think it through, not a tutorial and not the final answer. I'm sharing my reasoning, and I'd love to hear where it breaks.\n\nWait a minute. Let's step back from this topic and not get stuck on specific tech solutions.\n\nSo, my questions:\n\nBy the way, I tried to enumerate all the failure scenarios that typical implementations aim to prevent, but I gave up — there are too many possibilities to list them all. So I'll start from the essence of duplicate processing instead.\n\nSo, what is duplicate processing at its core?\n\nIt's not about how many times a message is **delivered**. It's about how many times the **side effect** is **applied**.\n\nSo at its root: **duplicate processing means the same logical intent has its side effect applied more than once.**\n\nAnd one more thing: even when duplication happens, it only causes damage if the side effect is **not** idempotent. An idempotent side effect makes a duplicate harmless — but still wasteful, and real business logic is often hard to make idempotent. 
So idempotency isn't our goal here; the discussion below does not assume it.\n\nNow, instead of jumping to solutions, let's think the other way around: **under what conditions does the side effect get applied more than once?**\n\nHere is what I think — it happens if any of these is true:\n\nNotice something: each condition above is just a way the guarantee breaks. So if we **flip them around**, we get the boundaries that guarantee non-duplication.\n\nFlipping the failure conditions, here are the boundaries:\n\nAfter that, we can decide what the orchestrator and collaborators look like.\n\n**Collaborators:**\n\n**Orchestrator:**\n\n```\npublic class ConsumerHandler {\n\n    private EvidenceChecker checker;\n    private SideEffectHandler handler;\n    private OffsetCommitter committer;\n\n    public void consume(Message message, CommitHandle handle) {\n        log.info(...);\n\n        // Has the side effect for this identifier already been applied?\n        boolean alreadyApplied = checker.check(message.identifierKey());\n\n        // Already applied — skip the work, just commit and return.\n        if (alreadyApplied) {\n            committer.commit();\n            return;\n        }\n\n        handler.handle(message -> {\n            // Within the same atomic boundary:\n            // 1. apply the side effect (business logic)\n            // 2. write the evidence record for this identifier\n        });\n\n        committer.commit();\n    }\n}\n```\n\nA few things to note:\n\n- We haven't discussed any concrete tech (RDBMS, Redis) yet.\n- The early `alreadyApplied` check is a **performance optimization**, not a correctness guarantee. Even with an idempotent side effect, reprocessing a duplicate still wastes resources (CPU, DB calls, external requests), so the check lets us skip that work and return fast. But it does NOT prevent duplication itself: a check-then-act still has a race window. 
The real guarantee comes from the **unique constraint** when the evidence record is written atomically.\n- No matter what MQ we use (Kafka, RabbitMQ, or something else), the consumer always needs the **message** and **a way to commit/confirm** it; otherwise the broker can't know the message was consumed. That's why `consume(Message message, CommitHandle handle)` is written like this.\n\nSo far everything is still **technology-agnostic** — we went from the essence, to the failure conditions, to the boundaries, and finally to an abstract collaboration model. No Redis, no RDBMS yet.\n\nThe abstract model is clean. Reality usually isn't. The `handler.handle(...)` above still treats business logic as a black box, and that box might not be simple. When the side effect is more than one step, what does its evidence record look like then?\n\nSo I'll leave it here as a question:\n\n**What problems do you think are still hiding? What would you have to design or reason about next?** And if you have any comments that would help refine this post, feel free to share them. Thanks in advance.\n\nThe point of this post: **find the boundaries first, and every later solution has a place to fit.**", "url": "https://wpnews.pro/news/how-to-analyze-duplicate-processing-in-an-async-flow", "canonical_source": "https://dev.to/ethan-zhang-dev/how-to-analyze-duplicate-processing-in-an-async-flow-510j", "published_at": "2026-06-29 04:42:49+00:00", "updated_at": "2026-06-29 05:27:30.035730+00:00", "lang": "en", "topics": ["developer-tools"], "entities": [], "alternates": {"html": "https://wpnews.pro/news/how-to-analyze-duplicate-processing-in-an-async-flow", "markdown": "https://wpnews.pro/news/how-to-analyze-duplicate-processing-in-an-async-flow.md", "text": "https://wpnews.pro/news/how-to-analyze-duplicate-processing-in-an-async-flow.txt", "jsonld": "https://wpnews.pro/news/how-to-analyze-duplicate-processing-in-an-async-flow.jsonld"}}