{"slug": "integration-testing-on-jvm", "title": "Integration Testing on JVM", "summary": "Integration tests for JVM web services should be placed in a separate module to keep fast unit tests fast and ensure honest dependency boundaries. The environment—including databases, message brokers, and HTTP stubs—must start before the server, with all dependencies binding to ephemeral ports to avoid conflicts and enable parallel execution.", "body_md": "Unit tests tell me a function does what I think it does. They don’t tell me my service starts, binds its ports, reads its config, talks to a database, consumes from Kafka, and survives an LLM provider returning a 503 mid-stream. That second category is where most production incidents live, and it’s the one I care about most.\n\n**This post lays out the integration-testing process I’ve converged on for JVM web services. The examples come from three repositories you can clone and run: koog-spring-boot-assistant (Spring Boot + WebFlux), quarkus-assistant-demo (Quarkus), and Mokksy (for the Docker-image variant). They’re Kotlin because suspending functions and a fluent DSL make these tests pleasant to read — but everything here applies to Java too, and with virtual threads on Java 21+ you get the same ergonomics without coroutines.**\n\nAssume a typical service: a REST API, perhaps a WebSocket or messaging endpoint (Kafka/SQS), a database, and an outbound dependency or two — here, an LLM provider. The system under test (SUT) is a real, booted application, not a sliced `@WebMvcTest` context.\n\n## Put end-to-end tests in their own module\n\nThe decision that pays off most: integration tests live in a **separate module**, not in `src/test` alongside your unit tests.\n\nIn the koog repository the root `pom.xml` declares two modules:\n\n```\n<modules>\n    <module>app</module>\n    <module>integration-tests</module>\n</modules>\n```\n\nThe `integration-tests` module depends on `app` as a black box. 
It builds the application, then drives it from the outside over HTTP and WebSocket — the same surface a real client sees. No reaching into Spring beans, no `@MockBean`, no shared application-context tricks.\n\nThree reasons the separation earns its keep:\n\n- **The two suites run on different clocks.** Unit tests are cheap and run on every save. Integration tests boot a real app and cost real seconds. Mix them, and your fast feedback loop inherits the slow suite’s startup cost.\n- **Failsafe and Surefire already want this split.** Maven’s convention runs unit tests in `test` (Surefire) and integration tests in `verify` (Failsafe). `mvn verify` runs everything; `mvn test` stays fast.\n- **The dependency direction stays honest.** The test module can only see the public surface, which stops you from accidentally testing implementation details.\n\nOn Gradle, the same idea maps to a dedicated source set or a separate subproject. The module boundary is the point, not the build tool.\n\n## Bring the Environment up before the Server\n\nThere are two layers of infrastructure, and the order matters: **the Environment starts first, the Server second.**\n\nThe Environment aggregates everything the SUT depends on: a database, a Kafka or SQS simulator, an HTTP stub for third-party APIs ([WireMock](https://wiremock.org/) or [Mokksy](https://github.com/mokksy/mokksy)), and — for an AI service — an LLM simulator. 
In the koog repository the LLM side is [ai-mocks](https://github.com/mokksy/mokksy), Mokksy’s OpenAI-shaped mock server:\n\n```\nobject TestEnvironment {\n    val mockOpenai = MockOpenai(verbose = true)\n\n    init {\n        Awaitility.setDefaultTimeout(5.seconds.toJavaDuration())\n        Awaitility.setDefaultPollDelay(500.milliseconds.toJavaDuration())\n        Awaitility.setDefaultPollInterval(500.milliseconds.toJavaDuration())\n\n        System.setProperty(\"OPENAI_API_KEY\", \"dummyOpenAIKey\")\n        System.setProperty(\"spring.profiles.active\", \"test\")\n    }\n}\n```\n\nEvery dependency binds to an ephemeral port. Don’t hardcode `5432` or `9092` — let the OS assign a free port and read it back. This is what lets the full suite run on a laptop while Docker is busy with three other projects, and it’s a hard requirement for parallel CI.\n\n**Real downstreams belong in Testcontainers.** For dependencies you can’t fake faithfully — a real Postgres, a real Redis — the Environment starts them as containers and hands their mapped ports to the Server. Start them individually, or point Testcontainers at a `docker-compose.yml` so your test topology and your local-dev topology are the same file. One source of truth beats two that drift apart.\n\nFor Kafka, reach for **Redpanda** rather than the full Kafka image. It’s Kafka-API-compatible, starts in a second or two instead of waiting on a ZooKeeper/KRaft dance, and Testcontainers ships a first-class `RedpandaContainer`. On a suite where startup time is the budget, that swap alone buys back minutes.\n\nThe LLM simulator deserves a callout, because it changes what “integration test” even means for an AI service. A real model is slow, nondeterministic, and costs money per call. Mokksy lets me assert on the *request* the app sends and script the *response* — including token-by-token streaming and deliberate failures. 
A flaky, expensive dependency becomes a fast, deterministic one I fully control.\n\n## Simulate external services; test the real contract separately\n\nThat phrase — *one I fully control* — is the whole argument, and it’s worth dwelling on, because the alternative is a trap I’ve watched good teams fall into.\n\nAt one payment service provider, the integration suite ran against **real bank sandboxes**, and in a few places against production environments with designated test accounts. Those tests are genuinely valuable: they’re the only thing that catches real integration problems and silent API or behavior drift on the bank’s side — the contract changing under you without a changelog. I wouldn’t give that signal up.\n\nBut as your *primary* test suite, they’re a liability. You don’t control the external service, so when the sandbox is down for maintenance, your build is red and it’s not your fault. You can’t make it return a 500, time out, or respond slowly on demand, so the failure paths — the ones that matter most in a payment system — go untested. And the latency and flakiness leak straight into your wall-clock budget.\n\nSo I split the two concerns. The bulk of behavior — happy paths, business rules, and especially failure injection — runs against a simulator I control, on every PR. A small, clearly labeled set of **contract tests against the real sandbox** runs on a schedule (nightly, or pre-release), gated behind an environment flag so it never blocks a PR. The simulator tells me my service behaves correctly; the scheduled contract tests tell me the real service still matches the assumptions my simulator encodes. When the two disagree, that’s the drift you actually wanted to catch — and it surfaces in an isolated, expected place instead of randomly reddening someone’s unrelated PR.\n\nThis exact itch is why I built **Mokksy** and its LLM-focused layer, **AI-Mocks**. 
I wanted a mock server I could assert requests against, script precise responses for, and — crucially — instruct to fail, stall, or stream on command, which is what makes the failure-injection tests later in this post possible at all. A faithful simulator you own is worth more day to day than a real dependency you merely borrow.\n\n## Feed the Environment’s ports into the Server\n\nOnce the Environment is up, you hold a bag of bound ports. The Server needs them as configuration *before* it boots: set them as system properties or environment variables, read them through your normal config mechanism, and the app never knows it’s under test.\n\nThe koog demo does this in the `Server` initializer:\n\n```\nobject Server {\n    val port: Int\n        get() = (applicationContext as ReactiveWebServerApplicationContext)\n            .webServer.port\n\n    private var applicationContext: ApplicationContext\n\n    init {\n        System.setProperty(\n            \"ai.koog.openai.base-url\",\n            TestEnvironment.mockOpenai.baseUrl(),\n        )\n\n        applicationContext = SpringApplication.run(\n            Application::class.java,\n            \"--server.port=0\",\n            \"--spring.profiles.active=test\",\n        )\n    }\n    // ...\n}\n```\n\n`--server.port=0` applies the same ephemeral-port trick to the SUT itself, and `webServer.port` reads back whatever the OS assigned. The property is set *before* `SpringApplication.run` — configuration has to be in place before the context refreshes.\n\nRaw `System.setProperty` works, but it leaks: anything you set stays set for the rest of the JVM and can poison later tests. Two libraries close that gap:\n\n- [system-stubs](https://github.com/webcompere/system-stubs) scopes system properties and environment variables to a test or lifecycle, then restores them afterward. No bleed-over between tests.\n- [finchly](https://github.com/kpavlov/finchly) offers a small, typed helper for reading test configuration and env vars, instead of scattering `System.getenv` calls across the suite.\n\nFor deciding *whether* a test runs at all, **JUnit Pioneer** is the tool I reach for; the koog repository pulls it in. Use `@EnabledIfEnvironmentVariable` (built into JUnit Jupiter) to skip the LLM-hitting tests when no API key is present, Pioneer’s `@RetryingTest` for the genuinely network-bound cases, and environment-driven toggles to run the full matrix on CI but a fast subset locally. Gating is what keeps the local suite honest about its time budget.\n\n## Boot the Server once per JVM\n\nSpring Boot takes a few seconds to start. Booting it per test method would blow any budget you set, so the Server is a **JVM-wide singleton that boots in a static initializer** — it comes up exactly once, before JUnit instantiates any test class.\n\nIn Kotlin, `object` gives you this for free: a lazily-initialized singleton whose `init` block runs the first time the base test class references it. That’s why `TestEnvironment` and `Server` above are `object`s rather than classes. In Java, a `static final` field or a JUnit extension with a static-scoped store does the same job.\n\nRunning the SUT in the same JVM as the test buys a quietly enormous benefit: **you can debug the whole stack by running an integration test in debug mode.** Set a breakpoint in the test, set another deep inside a controller or service, hit debug, and both stop. No remote-debug agent, no attaching to a separate process, no port juggling — the test drives a real request through real application code, and you step straight through it. This single property has saved me more time than any other part of the setup.\n\n“Booted” and “ready to serve traffic” are not the same thing, though. 
The context can be up while the HTTP listener, the connection pool, or a Kafka consumer is still warming. So I never `Thread.sleep` and hope — I **probe a real endpoint with Awaitility** until it answers correctly:\n\n```\nfun awaitServerIsRunning() {\n    val chatClient = ChatClient(port)\n    await\n        .ignoreExceptions()\n        .until {\n            runBlocking { chatClient.version() == \"1.0\" }\n        }\n}\n```\n\n`ignoreExceptions()` is doing real work here: during startup the endpoint throws connection-refused, which is expected rather than a failure. Awaitility swallows those and keeps polling until the version endpoint returns the value that means “fully wired.” This is your readiness check, and it has the same shape as a Kubernetes readiness probe — a useful property, since you’re exercising the signal your orchestrator will rely on in production.\n\n## Wrap the SUT in a test client that reads like the domain\n\nOnce the Server answers, wrap the raw HTTP client in a small **test-client abstraction** — a DSL that speaks the language of the feature, not of HTTP. Tests should talk about *sending a message and getting an answer*, not about content-type headers and status codes.\n\nHere’s the chat client from the koog repository, trimmed to essentials:\n\n```\nclass ChatClient(val port: Int) : ChatSession {\n    private val client = HttpClient {\n        install(ContentNegotiation) { json() }\n    }\n\n    suspend fun sendMessage(\n        message: String,\n        requestId: String? = \"REQ_${Uuid.random().toHexString()}\",\n        expectedStatusCode: HttpStatusCode = HttpStatusCode.OK,\n    ): Answer {\n        val response = client.post(\"http://localhost:$port/api/chat\") {\n            contentType(ContentType.Application.Json)\n            setBody(ChatRequest(chatRequestId = requestId, message = message))\n        }\n        response.status shouldBe expectedStatusCode\n        val answer = response.body<Answer>()\n        answer.chatRequestId shouldBe requestId   // correlation check, always\n        return answer\n    }\n}\n```\n\nTwo details carry weight. First, the default `requestId` is a fresh UUID per call — the reason for that comes up shortly. Second, the client asserts the response echoes back the request ID it sent. That correlation check lives in the client, so every test gets it for free and no test can accept another’s answer by accident.\n\nI usually extract a `ChatSession` interface so the same tests run against both REST and WebSocket transports. The WebSocket client implements the same `sendMessage` / `sendMessageStreaming` contract, and the tests barely change.\n\n## Let the base class assert readiness\n\nA small abstract base class holds the shared wiring and the per-test guardrails:\n\n```\nabstract class AbstractIntegrationTest {\n    protected val mockOpenai = TestEnvironment.mockOpenai\n    protected val server = Server\n    protected val chatClient = ChatClient(server.port)\n\n    @BeforeEach\n    fun awaitServer() {\n        server.awaitServerIsRunning()\n    }\n\n    @AfterEach\n    fun afterEach() {\n        mockOpenai.verifyNoUnmatchedRequests()\n    }\n}\n```\n\n`@BeforeEach` re-asserts readiness — cheap once the server is up, and a loud failure if a previous test left things in a bad state. `@AfterEach` verifies the LLM mock saw no unexpected calls, which catches a whole class of bugs where the app makes a request you never anticipated. 
Keep this class small; it’s infrastructure, not a place for test logic.\n\n## Write tests that fit on one screen\n\nA test you can’t take in at a glance is a test you can’t trust when it goes red at 5pm. I optimize hard for readability: each test sets up its mocks, does one thing, and asserts on the result. If it spills past one screen, it’s doing too much. With JUnit 6 the suspend test methods stay flat — no nesting inside a `runTest { }` block.\n\n**Happy path** — script the success case and assert the answer:\n\n```\nclass AiChatPositiveTest : AbstractIntegrationTest() {\n    @Test\n    suspend fun `Should answer a Question`() {\n        val seed = nextInt()\n        val question = \"To be or not to be, $seed?\"\n        val expectedAnswer = \"It's a good question: $question\"\n\n        mockOpenai.moderation { inputContains(question) } responds { flagged = false }\n        mockOpenai.completion {\n            userMessageContains(question)\n        } respondsStream { responseFlow = flowOf(expectedAnswer) }\n\n        val response = chatClient.sendMessage(question)\n\n        response.message.trim() shouldBe expectedAnswer\n    }\n}\n```\n\n**Negative path** — the interesting failures are usually business rules, not crashes. Here moderation flags the input and the app must refuse gracefully:\n\n```\nmockOpenai.moderation { inputContains(question) } responds {\n    flagged = true\n    category(ModerationCategory.VIOLENCE, 0.9)\n}\n\nval response = chatClient.sendMessage(question)\nresponse.message.trim() shouldBe \"Forgive me, but your message defies our guidelines.\"\n```\n\n**Dependency failures** — your service should survive every dependency returning every error code. 
That’s what parameterized tests are for, and it’s where the LLM simulator earns its keep: you can’t easily make the real OpenAI API return a 418.\n\n```\n@ParameterizedTest\n@ValueSource(ints = [400, 401, 403, 404, 418, 500, 503])\nsuspend fun `Should handle LLM request failure`(errorStatusCode: Int) {\n    val question = \"To be or not to be, ${nextInt()}?\"\n\n    mockOpenai.moderation { inputContains(question) } responds { flagged = false }\n    mockOpenai.completion {\n        userMessageContains(question)\n    } respondsError { httpStatusCode = errorStatusCode }\n\n    val response = chatClient.sendMessage(question, expectedStatusCode = HttpStatusCode.OK)\n    response.message shouldBe \"Alas, I cannot help thee now.\"  // graceful degradation\n}\n```\n\nOne small class, seven failure modes, and a clear contract: a broken upstream never reaches the user as a 500.\n\n**Flow tests** — for streaming or multi-step interactions, assert on the *sequence* and its *timing*. The WebSocket test scripts a delay between chunks and checks the response actually streamed rather than arriving in one lump:\n\n```\nmockOpenai.completion { userMessageContains(question) } respondsStream {\n    responseFlow = expectedTokens.asFlow().onEach { delay(500.milliseconds) }\n}\n\nval (tokens, duration) = measureTimedValue {\n    wsClient.sendMessageStreaming(question).map { it.message }.toList()\n}\n\ntokens shouldBe expectedTokens\nduration shouldBeGreaterThanOrEqualTo (500.milliseconds * tokens.size)\n```\n\n## Don’t seed the database for anything your API can create\n\nA tempting shortcut is to insert rows straight into the database in `@BeforeEach` so the data is simply there. Resist it. 
**Anything your application can create through its API should be created through its API.** A row inserted behind the app’s back skips validation, skips events, and skips the exact code path a real client hits — so the test passes while the create endpoint quietly rots. Build the fixture by calling `POST /things`, and you exercise the creation path for free, every time.\n\nThe exception is data that is genuinely outside your service’s scope. You’re testing your service, not the identity provider, so a well-known test user, a fixed API key, or a seeded tenant that your auth layer expects to exist is fair to provision directly, or through the dependency’s own setup. The line is ownership: if your service owns the lifecycle of that data, create it through your service; if it merely consumes data another system owns, stub or seed it and move on.\n\n## Design every test for parallel execution\n\nThis is the hinge of the whole approach. Unit tests are cheap, so you can have thousands of independent ones. Integration tests are expensive, because the Environment takes seconds to come up — so the suite *must* run in parallel to stay inside budget. JUnit makes this a configuration flag:\n\n```\njunit.jupiter.execution.parallel.enabled=true\njunit.jupiter.execution.parallel.config.strategy=dynamic\njunit.jupiter.execution.parallel.mode.default=concurrent\n```\n\nThe moment tests run concurrently against a shared, long-lived SUT, **they will interfere with each other.** That’s not a risk to mitigate; it’s a property to design around. Two rules make it work.\n\n**Rule 1 — every test uses unique data, and verifies that uniqueness in the response.** The `seed = nextInt()` baked into every question above isn’t decoration; it guarantees test A never matches test B’s mock or reads test B’s answer. The request-ID correlation check in `ChatClient` is the other half: a test only accepts a response carrying *its own* ID. 
Unique in, verified unique out.\n\n**Rule 2 — never assume the size or contents of a shared collection.** If twenty tests create records concurrently, `list().size shouldBe 1` is a guaranteed flake. Assert that *your* record is present and *your* deleted record is absent — never the total count.\n\nThis reshapes the CRUD lifecycle test. You don’t assert on global state; you trace your own entity through it:\n\n1. **List** → your ID is *not* present (don’t assert the list is empty).\n2. **Create** → 201/202.\n3. **Get + List** → busy-wait until your ID appears.\n4. **Delete** → 202.\n5. **Get** → busy-wait until 404.\n6. **List** → your ID is gone.\n\n## Embrace eventual consistency instead of fighting it\n\nReal systems are asynchronous. A create returns `202 Accepted` and the write propagates afterward; an event fires and a projection updates a beat later. A test that does `create()` then immediately `get()` expecting `200` is testing a race — and it will lose that race on a loaded CI box.\n\nSo I build eventual consistency into the tests. Every read-after-write becomes a poll rather than a single assertion, with Awaitility handling the busy-wait under a sane timeout:\n\n```\n// Awaitility's atMost takes a java.time.Duration, hence toJavaDuration()\nawait.atMost(5.seconds.toJavaDuration()).untilAsserted {\n    val found = client.get(id)\n    found.status shouldBe HttpStatusCode.OK\n    found.body<Record>().requestId shouldBe myRequestId\n}\n```\n\nThis is slower per step than an instant assertion, and that’s fine: it’s correct, and it mirrors how clients actually consume your API. The time budget comes from parallelism, not from skipping the wait.\n\n## For messaging: drain into memory, search by predicate\n\nMessaging tests need one crucial twist over HTTP. The naive shape — “poll the topic, expect to see my event” — breaks under parallelism. If test A’s `poll()` happens to pull test B’s message, that message is consumed and gone; test B then waits forever for an event it will never see. 
The broker’s at-least-once guarantee can’t help you when your own test code drops the message on the floor.\n\nThe fix: **drain the topic continuously into an in-memory buffer, and let each test search that buffer by predicate.** Start one consumer per topic when the Environment comes up, run it on a background thread, and append every message to a lock-free concurrent queue. Tests then query the buffer, not the broker:\n\n```\nclass CapturingConsumer<T>(topic: String, bootstrap: String, parse: (String) -> T) {\n    // Lock-free, O(1) appends, weakly-consistent iteration that's safe to\n    // scan while the background thread is still writing.\n    private val messages = ConcurrentLinkedQueue<T>()\n\n    init {\n        thread(isDaemon = true, name = \"test-consumer-$topic\") {\n            val consumer = KafkaConsumer<String, String>(/* ... */).apply {\n                subscribe(listOf(topic))\n            }\n            while (!Thread.interrupted()) {\n                consumer.poll(Duration.ofMillis(200))\n                    .forEach { messages.add(parse(it.value())) }\n            }\n        }\n    }\n\n    // Awaitility's atMost takes a java.time.Duration, hence toJavaDuration()\n    fun awaitMessage(predicate: (T) -> Boolean): T =\n        await.atMost(10.seconds.toJavaDuration()).until(\n            { messages.firstOrNull(predicate) },\n            notNullValue(),\n        )!!\n}\n```\n\nThe test stays small and obvious:\n\n```\nchatClient.placeOrder(orderId = myId)\n\nval event = orderEvents.awaitMessage { it.orderId == myId }\nevent.status shouldBe \"PLACED\"\n```\n\nThree properties fall out for free: the broker stays drained, so nothing backs up; no message is lost, because the consumer never stops reading; and parallel tests don’t fight over `poll()` calls, because they all scan the same shared buffer and filter for their own request ID.\n\nWhen the SUT *consumes* a topic rather than producing one, flip the pattern: publish a test event into the topic, then poll the SUT’s API until the side 
effect appears. Either way you assert on what actually crossed the broker — not on an in-process publisher capture that proves only that your code called `publish()`.\n\n## When you can’t boot the app in-process: run it in a container\n\nThe in-process Server is my default, largely for that debugging benefit. But sometimes it isn’t an option — the app isn’t a JVM process you can call `SpringApplication.run` on, or you specifically want to test the **Docker image you’re about to ship**, not just the code inside it. The architecture survives the switch almost untouched: keep the Environment, keep the test client, keep the unique-data discipline, and change only *how the SUT comes up*.\n\nThis is exactly how Mokksy verifies its own published image. An abstract base class holds every behavioral test and exposes a single `getBaseUrl()`:\n\n```\n@TestInstance(TestInstance.Lifecycle.PER_CLASS)\npublic abstract class AbstractFileConfigIT {\n    protected abstract String getBaseUrl();\n\n    @Test\n    void post_withBodyMatch_returnsConfiguredStatusAndHeaders() throws Exception {\n        var response = post(\"/things\", \"{\\\"id\\\":\\\"42\\\"}\");\n        assertThat(response.statusCode()).isEqualTo(201);\n        assertThat(response.headers().firstValue(\"Location\")).hasValue(\"/things/42\");\n    }\n    // ...the rest of the contract tests\n}\n```\n\nOne subclass runs the server in-process. 
Another, [DockerJavaIT](https://github.com/mokksy/mokksy/blob/main/integration-tests/src/jvmTest/java/dev/mokksy/it/DockerJavaIT.java), runs the *actual built image* via Testcontainers and overrides nothing but the base URL:\n\n```\n@TestInstance(TestInstance.Lifecycle.PER_CLASS)\nclass DockerJavaIT extends AbstractFileConfigIT {\n\n    private final GenericContainer<?> container =\n        new GenericContainer<>(DockerImageName.parse(\"mokksy/server-jvm:snapshot\"))\n            .withImagePullPolicy(imageName -> false)          // use the locally built image\n            .withEnv(\"MOKKSY_CONFIG\", \"/config/it-stubs.yaml\")\n            .withCopyFileToContainer(\n                MountableFile.forClasspathResource(\"/it-stubs.yaml\"),\n                \"/config/it-stubs.yaml\")\n            .withExposedPorts(8080)\n            .waitingFor(Wait.forLogMessage(\".*Responding at.*\", 1)) // readiness, log-based\n            .withStartupTimeout(Duration.ofSeconds(10));\n\n    @BeforeAll  void beforeAll() { container.start(); }\n    @AfterAll   void afterAll()  { container.stop();  }\n\n    @Override\n    protected String getBaseUrl() {\n        return \"http://\" + container.getHost() + \":\" + container.getFirstMappedPort();\n    }\n}\n```\n\nSame tests, two runtimes. The in-process variant gives fast feedback and breakpoints; the Docker variant proves the image, the entrypoint, the config-file wiring, and the container’s own readiness signal all work. Note the readiness check is still explicit — `Wait.forLogMessage` here instead of an HTTP probe, but the same principle: don’t proceed until the SUT says it’s ready.\n\nThe pattern isn’t even JVM-specific. The same shape — an Environment of containerized dependencies, a SUT brought up once, a thin client speaking the domain, unique data per test, polling for eventual consistency — translates cleanly to a Node service tested with Vitest or Jest, or to anything else. 
The one thing you may give up off-JVM is single-process debugging across test and SUT; stepping a debugger across that boundary is a real convenience on the JVM and less certain elsewhere. The discipline carries over regardless.\n\n## The metric that governs the design: wall-clock time\n\nEverything above serves two numbers I treat as hard limits:\n\n- **Under 3-5 minutes** to run the full suite on a developer’s laptop.\n- **Under 10 minutes** on CI.\n\nCross those thresholds and people stop running tests locally and stop opening small PRs, because the feedback loop hurts. The whole architecture — separate module, boot-once Server, simulated dependencies over real ones, Redpanda over Kafka, aggressive parallelism — exists to defend those numbers. When the suite creeps toward the limit, the fix is almost always *more parallelism* or *a faster simulator*, rarely *fewer tests* or *longer sleeps*.\n\nThat’s the loop I keep returning to: a real booted app, real boundaries, simulated dependencies, unique data, eventual-consistency-aware assertions, and a clock I refuse to blow past. I’ve built this setup — and introduced it to teams — across a wide spread of domains: Forex high-frequency trading platforms, payment gateways, mobile payment providers, and high-scale communication providers such as Twilio. Different stacks, different latency and consistency demands; the pattern held up in every one of them.\n\nI’d be glad to hear how the Kotlin and JVM community handles the parts I still find awkward — particularly making `concurrent` mode reliable for *every* class rather than falling back to `same_thread`, and keeping LLM simulators faithful as provider APIs drift. 
If you have a sharper approach, tell me.", "url": "https://wpnews.pro/news/integration-testing-on-jvm", "canonical_source": "https://kpavlov.me/blog/integration-testing-on-jvm/", "published_at": "2026-06-18 08:49:24+00:00", "updated_at": "2026-06-18 09:23:56.626990+00:00", "lang": "en", "topics": ["developer-tools", "ai-tools"], "entities": ["Spring Boot", "Quarkus", "Mokksy", "WireMock", "Kafka", "Testcontainers", "Maven", "Gradle"], "alternates": {"html": "https://wpnews.pro/news/integration-testing-on-jvm", "markdown": "https://wpnews.pro/news/integration-testing-on-jvm.md", "text": "https://wpnews.pro/news/integration-testing-on-jvm.txt", "jsonld": "https://wpnews.pro/news/integration-testing-on-jvm.jsonld"}}