Announcing Mutation Testing in Haskell

Mutation testing is now generally available in the Haskell testing framework Sydtest, providing developers with an automated tool to verify test suite thoroughness by mutating code and checking whether tests detect the changes. The feature aims to address declining confidence in AI-generated code by establishing an objective, non-cheatable criterion for test coverage that operates independently of any project's subjective standards.

Mutation testing is now generally available in Sydtest (https://github.com/NorfairKing/sydtest). This is a major step towards a saner development workflow in the age of AI-generated code.

What is mutation testing?

Mutation testing aims to improve a test suite by automatically mutating code and asserting that the tests start failing.

Alternatively: mutation testing is like a type system for your tests. It asserts that the tests test the code thoroughly.

Example

Consider this simple function:

```haskell
canCastFireball :: Int -> Int -> Bool
canCastFireball level mana =
  level >= 5
    && mana >= 10
```

with a corresponding test suite:

```haskell
import Test.Syd

spec :: Spec
spec = do
  describe "canCastFireball" $ do
    it "allows powerful wizards" $
      canCastFireball 10 50 `shouldBe` True
    it "rejects exhausted powerful wizards" $
      canCastFireball 10 0 `shouldBe` False
    it "rejects weak wizards" $
      canCastFireball 1 10 `shouldBe` False
```

Would you say this is a good test suite for this code? How can you tell?

We could argue that a good test suite catches more of the mistakes you make. Mutation testing consists of simulating those mistakes and checking that the test suite would indeed catch them. On this example, it might generate a mutation like this:

```diff
 canCastFireball :: Int -> Int -> Bool
 canCastFireball level mana =
   level >= 5
-    && mana >= 10
+    && mana > 10
```

When we run the same test suite again, all of the tests still pass: no test uses a wizard of sufficient level with exactly 10 mana, which is the only case where `>=` and `>` disagree. This means that if you had made this exact mistake, your tests wouldn't have caught it. This is called a surviving mutation, and it is undesirable.

When a mutation survives, you can add a test to cover it. For example, this test covers it:

```haskell
spec :: Spec
spec = do
  describe "canCastFireball" $ do
    it "allows barely-energetic wizards" $
      canCastFireball 8 10 `shouldBe` True
```

Now when we run the test suite on the mutated code, this test fails, because `10 > 10` is `False`. This means that if you had made this exact mistake, the new test suite would have caught it. This is called a killed mutation, and it is desirable.
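To make the surviving/killed distinction concrete, here is a toy sketch in plain Haskell (no Sydtest involved) that swaps a few hand-picked comparison operators in for the `>=` in the mana check and runs the example's tests against each variant. All names in it (`canCastFireballWith`, `mutants`, `TestCase`, `Outcome`, `report`) are hypothetical, and this is not how Sydtest generates or runs mutations: a real engine mutates the source code and reruns the actual test suite.

```haskell
-- A toy sketch of the mutate-and-rerun loop, NOT Sydtest's implementation.
-- All names here are hypothetical and chosen for illustration only.

-- | The mana comparison is a parameter so that we can swap in mutants.
type Cmp = Int -> Int -> Bool

-- | canCastFireball with its mana comparison abstracted out.
canCastFireballWith :: Cmp -> Int -> Int -> Bool
canCastFireballWith manaCmp level mana = level >= 5 && manaCmp mana 10

-- | The unmutated function from the example.
canCastFireball :: Int -> Int -> Bool
canCastFireball = canCastFireballWith (>=)

-- | Hand-picked replacements for the original `>=`
-- (a real engine would derive candidates from the source code).
mutants :: [(String, Cmp)]
mutants = [(">", (>)), ("<=", (<=)), ("<", (<)), ("==", (==)), ("/=", (/=))]

-- | One test case from the example: its inputs and the expected result.
data TestCase = TestCase
  { tcName :: String
  , tcLevel :: Int
  , tcMana :: Int
  , tcExpected :: Bool
  }

originalSuite :: [TestCase]
originalSuite =
  [ TestCase "allows powerful wizards" 10 50 True
  , TestCase "rejects exhausted powerful wizards" 10 0 False
  , TestCase "rejects weak wizards" 1 10 False
  ]

extendedSuite :: [TestCase]
extendedSuite =
  originalSuite ++ [TestCase "allows barely-energetic wizards" 8 10 True]

-- | Does a single test pass against a given implementation?
passes :: (Int -> Int -> Bool) -> TestCase -> Bool
passes f tc = f (tcLevel tc) (tcMana tc) == tcExpected tc

data Outcome = Killed | Survived
  deriving (Show, Eq)

-- | A mutant survives if every test still passes against it; otherwise it is killed.
runMutant :: [TestCase] -> Cmp -> Outcome
runMutant suite cmp
  | all (passes (canCastFireballWith cmp)) suite = Survived
  | otherwise = Killed

-- | Outcome per mutant, or Nothing if the suite already fails on the
-- unmutated code (in which case the outcomes would be meaningless).
report :: [TestCase] -> Maybe [(String, Outcome)]
report suite
  | all (passes canCastFireball) suite =
      Just [(name, runMutant suite cmp) | (name, cmp) <- mutants]
  | otherwise = Nothing

main :: IO ()
main = do
  -- Just [(">",Survived),("<=",Killed),("<",Killed),("==",Killed),("/=",Killed)]
  print (report originalSuite)
  -- Just [(">",Killed),("<=",Killed),("<",Killed),("==",Killed),("/=",Killed)]
  print (report extendedSuite)
```

With only the original three tests, the `>` mutant survives; adding the barely-energetic test kills it. The sanity check in `report` reflects a general rule: mutation outcomes only mean something if the suite passes on the unmutated code.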
(Don't get me started on how confusing and violent this terminology is.)

A mutation testing engine automatically generates mutations and runs the corresponding tests. Ideally it would generate many mutations, none of which survive. For maximum assurance, you would cover every mutation. Realistically, you would disable some.

Why start mutation testing now?

I've been using a coding agent (Claude) for a while now and noticed that I have less and less confidence in the code it produces. This is not necessarily because it is less intelligent than I am (it often isn't), but rather because of the sheer volume of code it can produce in the same amount of time. I have good instructions in place to have it write tests, regression tests, and property tests, but it often ignores those instructions completely or writes useless tests. The only thing that really saves me is a non-AI-based CI system (https://nix-ci.com) that tells me when any of my checks fail.

So my aim was to produce a check that would fail if a change were insufficiently tested, without relying on any subjective criterion for what "sufficient testing" means. Mutation testing gives me a completely objective criterion that is independent of my project (it is defined in another repository, so my agent cannot cheat).

How can I try it?

Mutation testing is now officially available as part of Sydtest (https://github.com/NorfairKing/sydtest).

Nix Check

You can add a mutation check to your `flake.nix`'s checks like this:

```nix
checks.x86_64-linux.mutation = pkgs.haskellPackages.sydtest.mutationCheck {
  name = "my-mutation-check";
  packages = [ "my-package" "my-other-package" ];
};
```

Sydtest takes care of the rest and produces nice reports, both human-readable and machine-readable.
Here is an example entry from a machine-readable report:

```json
{
  "outcome": "uncovered",
  "mutation": {
    "id": ["Money.Amount", "Cmp", "801", "79", "92", "<", "1"],
    "operator": "Cmp",
    "original": ">",
    "replacement": "<",
    "module": "Money.Amount",
    "source_file": "src/Money/Amount.hs",
    "line": 801,
    "end_line": 801,
    "col_start": 79,
    "col_end": 92,
    "context_before": [
      "",
      "-- | Validate that an 'Amount' is strictly positive. I.e. not 'zero'.",
      "validateStrictlyPositive :: Amount -> Validation"
    ],
    "source_lines": [
      "validateStrictlyPositive amount = declare \"The Amount is strictly positive\" $ amount > zero"
    ],
    "mutated_lines": [
      "validateStrictlyPositive amount = declare \"The Amount is strictly positive\" $ amount < zero"
    ],
    "context_after": [],
    "covering_tests": {
      "really-safe-money-autodocodec-test": [],
      "really-safe-money-test": []
    },
    "timeout_micros": 30000000
  }
}
```

Disabling mutations

Sometimes you don't care whether a piece of code is fully mutation tested. A good example, in my opinion, is debug logging:

```haskell
doAThing = do
  logDebug "Doing a thing"
  doTheThing
```

Removing the `logDebug` line is a valid mutation, but I just don't care to test it. In this case I can add an annotation:

```haskell
{-# ANN doAThing ("DisableMutationsFor logDebug" :: String) #-}
doAThing = do
  logDebug "Doing a thing"
  doTheThing
```

There are other annotations available to disable mutations per module, per mutation, or per binding.

Conclusion

Mutation testing in Haskell is ready to try out. I'm already using it in NixCI (https://nix-ci.com), and the latest version of really-safe-money (https://github.com/NorfairKing/really-safe-money/tree/master) is already fully mutation tested.

Please let me know if you end up trying it. I'd love to nerd out about this.