# AI Coding Assistants vs. Codee — Insights on Fortran Correctness and Modernization

> Source: <https://fortran-lang.discourse.group/t/ai-coding-assistants-vs-codee-insights-on-fortran-correctness-and-modernization/10472?page=2#post_38>
> Published: 2026-06-23 14:35:00+00:00

A lot of the rewrites are ill inspired but have good intentions. I think it is worth to rewrite codes to adopt GPUs, since a lot of the original algorithms and paradigms might scale and work poorly on GPUs. I have been part of rewrites and ports and porting a codebase to GPUs without willing to rewrite the architecture ends up producing a difficult to work with result. It doesn’t result in the 20x speedup that you’d want and it might make the CPU slower or basically duplicate the codebase.

I feel that a good rewrite focuses on keeping the algorithms as much as possible while creating new APIs that use these algorithms in a safe way so that you can spend more time designing an architecture that will use the algorithms. For example, this bit for some PPM fluxes:

```
      do concurrent(j=1:ny, i=3:nx - 2) &
         local(dh_m1, dh_0, dh_p1, h_left, h_right)
         call ppm_limited_slope(bs%h(i - 2, j), bs%h(i - 1, j), bs%h(i, j), dh_m1)
         call ppm_limited_slope(bs%h(i - 1, j), bs%h(i, j), bs%h(i + 1, j), dh_0)
         call ppm_limited_slope(bs%h(i, j), bs%h(i + 1, j), bs%h(i + 2, j), dh_p1)
         h_left = 0.5_wp*(bs%h(i - 1, j) + bs%h(i, j)) - (dh_0 - dh_m1)/6.0_wp
         h_right = 0.5_wp*(bs%h(i, j) + bs%h(i + 1, j)) - (dh_p1 - dh_0)/6.0_wp
         call ppm_cell_limiter(bs%h(i, j), h_left, h_right)
         if (do_pos) call ppm_limit_pos(bs%h(i, j), h_left, h_right, h_min_pos)
         this%h_face_right_x%data(i, j, 1) = h_left
         this%h_face_left_x%data(i + 1, j, 1) = h_right
      end do
```

Before, the code for `ppm_limited_slope`

was inline in the do concurrent, repeated 3 times like:

```
      dh_left = h_i - h_im1
      dh_right = h_ip1 - h_i
      dh_centered = 0.5_wp*(dh_left + dh_right)
      if (dh_left*dh_right > 0.0_wp) then
         dh = sign(min(abs(dh_centered), &
                       2.0_wp*abs(dh_left), &
                       2.0_wp*abs(dh_right)), &
                   dh_centered)
      else
         dh = 0.0_wp
      end if
```

So I spent the time first refactoring so that I could create `pure subroutines`

that are callable from a do concurrent to get to the first bit of code. Then all the data needed to compute the fluxes can be just passed in by: `pure subroutine continuity_compute_fluxes_barotropic(grid, metrics, this, bs)`

The extent of object orientation for me is data structures that hold all necessary stuff for a computation. Their only type bound procedures are initialization, data transfer, finalization, data deletion:

```
   type :: continuity_t
      logical :: is_init = .false.
         !! True between `init` and `destroy`. 
      integer :: ppm_variant = PPM_VARIANT_H3_MONO
         !! Active PPM scheme variant.
      logical :: monotone = .true.
         !! Enforce monotonicity on the reconstructed face values.
      real(wp) :: h_min = 1.0e-6_wp
         !! Lower clip on cell-centred thickness during the update.
         !! Also used as the floor for `ppm_limit_pos` 
      real(wp) :: cfl_max = 0.5_wp
         !! Soft cap on per-face CFL before falling back to upwind.
      logical :: use_ppm_limit_pos = .false.
         !! Positivity limiter

      ! ---- Face-reconstruction workspace ----
      !
      !   h_face_left_x(i, j, k)  — value AT east face i extrapolated
      !                             from the LEFT-side cell (i-1, j, k),
      !                             i.e. h_R of cell i-1.
      !   h_face_right_x(i, j, k) — value AT east face i extrapolated
      !                             from the RIGHT-side cell (i, j, k),
      !                             i.e. h_L of cell i.
      !
      ! Upwind: the kernel picks left if u >= 0 (left cell donates),
      ! right if u < 0.
      type(scratch_3d_buffer_t) :: h_face_left_x
         !! Left-state thickness at east faces.
      type(scratch_3d_buffer_t) :: h_face_right_x
         !! Right-state thickness at east faces.
      type(scratch_3d_buffer_t) :: h_face_left_y
         !! Left-state thickness at north faces.
      type(scratch_3d_buffer_t) :: h_face_right_y
         !! Right-state thickness at north faces.
   contains
      procedure :: init => continuity_init
      procedure :: destroy => continuity_destroy
      procedure :: enter_data => continuity_enter_data
      procedure :: exit_data => continuity_exit_data
   end type continuity_t
```

So that I can then, for example use the `scratch_3d_buffer_t`

like:

```
      ! East-face shapes: (nx+1, ny, nz)
      call this%h_face_left_x%init(nx + 1, ny, nz, "continuity_h_face_left_x")
      call this%h_face_right_x%init(nx + 1, ny, nz, "continuity_h_face_right_x")
      ! North-face shapes: (nx, ny+1, nz)
      call this%h_face_left_y%init(nx, ny + 1, nz, "continuity_h_face_left_y")
      call this%h_face_right_y%init(nx, ny + 1, nz, "continuity_h_face_right_y")

      this%is_init = .true.
```

All the simplifications I’ve done make it so that my RK2 step looks like:

```
      do stage = 1, 2
         call run_stage_split(grid, metrics, dyn, eos, cor, ct, pgf, hv, bd, ss, &
                              va, hd, vd, vmix, ms, dt, n_inner, &
                              sf=sf, stage=stage, vcoord=vcoord, bc=bc, t=t, &
                              lateral_mix=lateral_mix, epbl=epbl, kshear=kshear)
      end do

      call rk2_average(ms)
      if (allocated(ms%tracers)) then
         do it = 1, size(ms%tracers)
            call rk2_average_field_3d(ms%tracers(it)%hTr0, ms%tracers(it)%hTr, &
                                      size(ms%tracers(it)%hTr, 1), &
                                      size(ms%tracers(it)%hTr, 2), &
                                      size(ms%tracers(it)%hTr, 3))
         end do
      end if
```

It is a bit useless to expose a public API for a continuity PPM solver, since it is quite specific. But then being able to expose a `do_one_dynamics_step(handle)`

API so that someone can call it from Python and have it run on the GPU? That is what a million dollar refactor should get you. All of this results in me being able to write an integration test that is super easy to read:

```
  pure subroutine ocean_dyn_step_barotropic(grid, metrics, dyn, cor, ct, bs, dt)
      !! Public only for the unit-test suite (no production module imports it);
      !! ignore when developing production code in other modules.

      type(hgrid_t), intent(in) :: grid
      type(ocean_metrics_t), intent(in) :: metrics
      type(ocean_dyn_t), intent(inout) :: dyn
      type(coriolis_adv_t), intent(inout) :: cor
      type(continuity_t), intent(inout) :: ct
      type(barotropic_cgrid_state_t), intent(inout) :: bs
      real(wp), intent(in) :: dt

      integer :: i, j, nx, ny, nx_face, ny_uface, nx_vface, ny_face

      nx = grid%nx_total
      ny = grid%ny_total
      nx_face = size(bs%u_face_x, 1)
      ny_uface = size(bs%u_face_x, 2)
      nx_vface = size(bs%v_face_y, 1)
      ny_face = size(bs%v_face_y, 2)

      ! ---- 1. Save u^n into h0 / u_face_x0 / v_face_y0 ----
      do concurrent(j=1:ny, i=1:nx)
         bs%h0(i, j) = bs%h(i, j)
      end do
      do concurrent(j=1:ny_uface, i=1:nx_face)
         bs%u_face_x0(i, j) = bs%u_face_x(i, j)
      end do
      do concurrent(j=1:ny_face, i=1:nx_vface)
         bs%v_face_y0(i, j) = bs%v_face_y(i, j)
      end do

      ! ---- 2. Stage 1: tendencies at u^n, FE step -> u^(1) ----
      call continuity_compute_fluxes_barotropic(grid, metrics, ct, bs)
      call coriolis_adv_compute_tendencies_barotropic(grid, metrics, cor, bs)
      call continuity_apply_fluxes_barotropic(bs, dt)
      call coriolis_adv_apply_tendencies_barotropic(cor, bs, dt)

      ! ---- 3. Stage 2: tendencies at u^(1), FE step -> u^(1) + dt*L(u^(1)) ----
      call continuity_compute_fluxes_barotropic(grid, metrics, ct, bs)
      call coriolis_adv_compute_tendencies_barotropic(grid, metrics, cor, bs)
      call continuity_apply_fluxes_barotropic(bs, dt)
      call coriolis_adv_apply_tendencies_barotropic(cor, bs, dt)

      ! ---- 4. RK2 average: u^(n+1) = 1/2 * (u^n + (u^(1) + dt*L(u^(1)))) ----
      do concurrent(j=1:ny, i=1:nx)
         bs%h(i, j) = 0.5_wp*(bs%h0(i, j) + bs%h(i, j))
      end do
      do concurrent(j=1:ny_uface, i=1:nx_face)
         bs%u_face_x(i, j) = 0.5_wp*(bs%u_face_x0(i, j) + bs%u_face_x(i, j))
      end do
      do concurrent(j=1:ny_face, i=1:nx_vface)
         bs%v_face_y(i, j) = 0.5_wp*(bs%v_face_y0(i, j) + bs%v_face_y(i, j))
      end do

      dyn%outer_step_count = dyn%outer_step_count + 1
   end subroutine ocean_dyn_step_barotropic
```

Which to me is the magic of Fortran, if you had gone the full object oriented way in C++ would you get (AI generated)…because a lot of the new rewrites focus on migrating to C++:

``` js
template <class Op>
concept BarotropicOperator = requires(Op op, const HGrid& g, const OceanMetrics& m,
                                      BarotropicCGridState& s, wp dt) {
    op.compute(g, m, s);
    op.apply(s, dt);
};

template <BarotropicOperator Cont, BarotropicOperator Cor>
class BarotropicRK2Stepper {
public:
    BarotropicRK2Stepper(Cont& cont, Cor& cor) : cont_(cont), cor_(cor) {}

    void step(const HGrid& grid, const OceanMetrics& metrics,
              OceanDyn& dyn, BarotropicCGridState& s, wp dt) {
        s.saveSnapshot();                              // 1. save u^n
        forwardEulerStage(grid, metrics, s, dt);       // 2. u* = u^n + dt L(u^n)
        forwardEulerStage(grid, metrics, s, dt);       // 3. u* + dt L(u*)
        s.averageWithSnapshot();                       // 4. Heun average
        ++dyn.outerStepCount;
    }
private:
    void forwardEulerStage(const HGrid& grid, const OceanMetrics& metrics,
                           BarotropicCGridState& s, wp dt) {
        cont_.compute(grid, metrics, s);
        cor_ .compute(grid, metrics, s);
        cont_.apply(s, dt);
        cor_ .apply(s, dt);
    }
    Cont& cont_;
    Cor&  cor_;
};
```

But now, unless you’re very familiar with C++ and objects this might read like a foreign language. Because each object abstracts away what compute is doing you get that indirection that is very nice for generality but a bit difficult to cope if you’re starting out. I think heavily from the academic perspective, new students often without experience at all.

I feel that the Fortran implementation has a simplicity to the reader that is difficult to get with C++ unless you write C++ that looks more akin to C. I feel that the Fortran can look like how someone would write a python code for this to prototype: (also AI generated)

``` python
def ocean_dyn_step_barotropic(grid, metrics, dyn, cor, ct, bs, dt):
    bs.h0[:]        = bs.h
    bs.u_face_x0[:] = bs.u_face_x
    bs.v_face_y0[:] = bs.v_face_y

    continuity_compute_fluxes_barotropic(grid, metrics, ct, bs)
    coriolis_adv_compute_tendencies_barotropic(grid, metrics, cor, bs)
    continuity_apply_fluxes_barotropic(bs, dt)
    coriolis_adv_apply_tendencies_barotropic(cor, bs, dt)
    # ... stage 2 ...

    bs.h[:]        = 0.5 * (bs.h0 + bs.h)
    bs.u_face_x[:] = 0.5 * (bs.u_face_x0 + bs.u_face_x)
    bs.v_face_y[:] = 0.5 * (bs.v_face_y0 + bs.v_face_y)
    dyn.outer_step_count += 1
```

So, basically my idea of a refactor/port/modernization is hide the complexity away through nice interfaces without abstracting too much that the algorithm gets lost. I like the idea of data structures carrying data used in computation and then use procedural style calls for this. I am sure you could rewrite my dynamics step to look nicer. My main goal is: write your code such that a new student can read it and compare it to the algorithm they see in a book/paper
