cd /news/developer-tools/ai-coding-assistants-vs-codee-insigh… · home topics developer-tools article
[ARTICLE · art-36618] src=fortran-lang.discourse.group ↗ pub= topic=developer-tools verified=true sentiment=· neutral

AI Coding Assistants vs. Codee — Insights on Fortran Correctness and Modernization

A developer refactored Fortran code for GPU acceleration by replacing inline PPM slope computations with pure subroutines callable from do concurrent loops, improving code clarity and enabling better GPU performance. The rewrite focused on preserving algorithms while creating safe APIs and data structures for efficient GPU execution.

read7 min views5 publishedJun 23, 2026
AI Coding Assistants vs. Codee — Insights on Fortran Correctness and Modernization
Image: Fortran-Lang (auto-discovered)

A lot of the rewrites are ill inspired but have good intentions. I think it is worth to rewrite codes to adopt GPUs, since a lot of the original algorithms and paradigms might scale and work poorly on GPUs. I have been part of rewrites and ports and porting a codebase to GPUs without willing to rewrite the architecture ends up producing a difficult to work with result. It doesn’t result in the 20x speedup that you’d want and it might make the CPU slower or basically duplicate the codebase.

I feel that a good rewrite focuses on keeping the algorithms as much as possible while creating new APIs that use these algorithms in a safe way so that you can spend more time designing an architecture that will use the algorithms. For example, this bit for some PPM fluxes:

      do concurrent(j=1:ny, i=3:nx - 2) &
         local(dh_m1, dh_0, dh_p1, h_left, h_right)
         call ppm_limited_slope(bs%h(i - 2, j), bs%h(i - 1, j), bs%h(i, j), dh_m1)
         call ppm_limited_slope(bs%h(i - 1, j), bs%h(i, j), bs%h(i + 1, j), dh_0)
         call ppm_limited_slope(bs%h(i, j), bs%h(i + 1, j), bs%h(i + 2, j), dh_p1)
         h_left = 0.5_wp*(bs%h(i - 1, j) + bs%h(i, j)) - (dh_0 - dh_m1)/6.0_wp
         h_right = 0.5_wp*(bs%h(i, j) + bs%h(i + 1, j)) - (dh_p1 - dh_0)/6.0_wp
         call ppm_cell_limiter(bs%h(i, j), h_left, h_right)
         if (do_pos) call ppm_limit_pos(bs%h(i, j), h_left, h_right, h_min_pos)
         this%h_face_right_x%data(i, j, 1) = h_left
         this%h_face_left_x%data(i + 1, j, 1) = h_right
      end do

Before, the code for ppm_limited_slope

was inline in the do concurrent, repeated 3 times like:

      dh_left = h_i - h_im1
      dh_right = h_ip1 - h_i
      dh_centered = 0.5_wp*(dh_left + dh_right)
      if (dh_left*dh_right > 0.0_wp) then
         dh = sign(min(abs(dh_centered), &
                       2.0_wp*abs(dh_left), &
                       2.0_wp*abs(dh_right)), &
                   dh_centered)
      else
         dh = 0.0_wp
      end if

So I spent the time first refactoring so that I could create pure subroutines

that are callable from a do concurrent to get to the first bit of code. Then all the data needed to compute the fluxes can be just passed in by: pure subroutine continuity_compute_fluxes_barotropic(grid, metrics, this, bs)

The extent of object orientation for me is data structures that hold all necessary stuff for a computation. Their only type bound procedures are initialization, data transfer, finalization, data deletion:

   type :: continuity_t
      logical :: is_init = .false.
         !! True between `init` and `destroy`. 
      integer :: ppm_variant = PPM_VARIANT_H3_MONO
         !! Active PPM scheme variant.
      logical :: monotone = .true.
         !! Enforce monotonicity on the reconstructed face values.
      real(wp) :: h_min = 1.0e-6_wp
         !! Lower clip on cell-centred thickness during the update.
         !! Also used as the floor for `ppm_limit_pos` 
      real(wp) :: cfl_max = 0.5_wp
         !! Soft cap on per-face CFL before falling back to upwind.
      logical :: use_ppm_limit_pos = .false.
         !! Positivity limiter

      ! ---- Face-reconstruction workspace ----
      !
      !   h_face_left_x(i, j, k)  — value AT east face i extrapolated
      !                             from the LEFT-side cell (i-1, j, k),
      !                             i.e. h_R of cell i-1.
      !   h_face_right_x(i, j, k) — value AT east face i extrapolated
      !                             from the RIGHT-side cell (i, j, k),
      !                             i.e. h_L of cell i.
      !
      ! Upwind: the kernel picks left if u >= 0 (left cell donates),
      ! right if u < 0.
      type(scratch_3d_buffer_t) :: h_face_left_x
         !! Left-state thickness at east faces.
      type(scratch_3d_buffer_t) :: h_face_right_x
         !! Right-state thickness at east faces.
      type(scratch_3d_buffer_t) :: h_face_left_y
         !! Left-state thickness at north faces.
      type(scratch_3d_buffer_t) :: h_face_right_y
         !! Right-state thickness at north faces.
   contains
      procedure :: init => continuity_init
      procedure :: destroy => continuity_destroy
      procedure :: enter_data => continuity_enter_data
      procedure :: exit_data => continuity_exit_data
   end type continuity_t

So that I can then, for example use the scratch_3d_buffer_t

like:

      ! East-face shapes: (nx+1, ny, nz)
      call this%h_face_left_x%init(nx + 1, ny, nz, "continuity_h_face_left_x")
      call this%h_face_right_x%init(nx + 1, ny, nz, "continuity_h_face_right_x")
      ! North-face shapes: (nx, ny+1, nz)
      call this%h_face_left_y%init(nx, ny + 1, nz, "continuity_h_face_left_y")
      call this%h_face_right_y%init(nx, ny + 1, nz, "continuity_h_face_right_y")

      this%is_init = .true.

All the simplifications I’ve done make it so that my RK2 step looks like:

      do stage = 1, 2
         call run_stage_split(grid, metrics, dyn, eos, cor, ct, pgf, hv, bd, ss, &
                              va, hd, vd, vmix, ms, dt, n_inner, &
                              sf=sf, stage=stage, vcoord=vcoord, bc=bc, t=t, &
                              lateral_mix=lateral_mix, epbl=epbl, kshear=kshear)
      end do

      call rk2_average(ms)
      if (allocated(ms%tracers)) then
         do it = 1, size(ms%tracers)
            call rk2_average_field_3d(ms%tracers(it)%hTr0, ms%tracers(it)%hTr, &
                                      size(ms%tracers(it)%hTr, 1), &
                                      size(ms%tracers(it)%hTr, 2), &
                                      size(ms%tracers(it)%hTr, 3))
         end do
      end if

It is a bit useless to expose a public API for a continuity PPM solver, since it is quite specific. But then being able to expose a do_one_dynamics_step(handle)

API so that someone can call it from Python and have it run on the GPU? That is what a million dollar refactor should get you. All of this results in me being able to write an integration test that is super easy to read:

  pure subroutine ocean_dyn_step_barotropic(grid, metrics, dyn, cor, ct, bs, dt)
      !! Public only for the unit-test suite (no production module imports it);
      !! ignore when developing production code in other modules.

      type(hgrid_t), intent(in) :: grid
      type(ocean_metrics_t), intent(in) :: metrics
      type(ocean_dyn_t), intent(inout) :: dyn
      type(coriolis_adv_t), intent(inout) :: cor
      type(continuity_t), intent(inout) :: ct
      type(barotropic_cgrid_state_t), intent(inout) :: bs
      real(wp), intent(in) :: dt

      integer :: i, j, nx, ny, nx_face, ny_uface, nx_vface, ny_face

      nx = grid%nx_total
      ny = grid%ny_total
      nx_face = size(bs%u_face_x, 1)
      ny_uface = size(bs%u_face_x, 2)
      nx_vface = size(bs%v_face_y, 1)
      ny_face = size(bs%v_face_y, 2)

      ! ---- 1. Save u^n into h0 / u_face_x0 / v_face_y0 ----
      do concurrent(j=1:ny, i=1:nx)
         bs%h0(i, j) = bs%h(i, j)
      end do
      do concurrent(j=1:ny_uface, i=1:nx_face)
         bs%u_face_x0(i, j) = bs%u_face_x(i, j)
      end do
      do concurrent(j=1:ny_face, i=1:nx_vface)
         bs%v_face_y0(i, j) = bs%v_face_y(i, j)
      end do

      ! ---- 2. Stage 1: tendencies at u^n, FE step -> u^(1) ----
      call continuity_compute_fluxes_barotropic(grid, metrics, ct, bs)
      call coriolis_adv_compute_tendencies_barotropic(grid, metrics, cor, bs)
      call continuity_apply_fluxes_barotropic(bs, dt)
      call coriolis_adv_apply_tendencies_barotropic(cor, bs, dt)

      ! ---- 3. Stage 2: tendencies at u^(1), FE step -> u^(1) + dt*L(u^(1)) ----
      call continuity_compute_fluxes_barotropic(grid, metrics, ct, bs)
      call coriolis_adv_compute_tendencies_barotropic(grid, metrics, cor, bs)
      call continuity_apply_fluxes_barotropic(bs, dt)
      call coriolis_adv_apply_tendencies_barotropic(cor, bs, dt)

      ! ---- 4. RK2 average: u^(n+1) = 1/2 * (u^n + (u^(1) + dt*L(u^(1)))) ----
      do concurrent(j=1:ny, i=1:nx)
         bs%h(i, j) = 0.5_wp*(bs%h0(i, j) + bs%h(i, j))
      end do
      do concurrent(j=1:ny_uface, i=1:nx_face)
         bs%u_face_x(i, j) = 0.5_wp*(bs%u_face_x0(i, j) + bs%u_face_x(i, j))
      end do
      do concurrent(j=1:ny_face, i=1:nx_vface)
         bs%v_face_y(i, j) = 0.5_wp*(bs%v_face_y0(i, j) + bs%v_face_y(i, j))
      end do

      dyn%outer_step_count = dyn%outer_step_count + 1
   end subroutine ocean_dyn_step_barotropic

Which to me is the magic of Fortran, if you had gone the full object oriented way in C++ would you get (AI generated)…because a lot of the new rewrites focus on migrating to C++:

template <class Op>
concept BarotropicOperator = requires(Op op, const HGrid& g, const OceanMetrics& m,
                                      BarotropicCGridState& s, wp dt) {
    op.compute(g, m, s);
    op.apply(s, dt);
};

template <BarotropicOperator Cont, BarotropicOperator Cor>
class BarotropicRK2Stepper {
public:
    BarotropicRK2Stepper(Cont& cont, Cor& cor) : cont_(cont), cor_(cor) {}

    void step(const HGrid& grid, const OceanMetrics& metrics,
              OceanDyn& dyn, BarotropicCGridState& s, wp dt) {
        s.saveSnapshot();                              // 1. save u^n
        forwardEulerStage(grid, metrics, s, dt);       // 2. u* = u^n + dt L(u^n)
        forwardEulerStage(grid, metrics, s, dt);       // 3. u* + dt L(u*)
        s.averageWithSnapshot();                       // 4. Heun average
        ++dyn.outerStepCount;
    }
private:
    void forwardEulerStage(const HGrid& grid, const OceanMetrics& metrics,
                           BarotropicCGridState& s, wp dt) {
        cont_.compute(grid, metrics, s);
        cor_ .compute(grid, metrics, s);
        cont_.apply(s, dt);
        cor_ .apply(s, dt);
    }
    Cont& cont_;
    Cor&  cor_;
};

But now, unless you’re very familiar with C++ and objects this might read like a foreign language. Because each object abstracts away what compute is doing you get that indirection that is very nice for generality but a bit difficult to cope if you’re starting out. I think heavily from the academic perspective, new students often without experience at all.

I feel that the Fortran implementation has a simplicity to the reader that is difficult to get with C++ unless you write C++ that looks more akin to C. I feel that the Fortran can look like how someone would write a python code for this to prototype: (also AI generated)

def ocean_dyn_step_barotropic(grid, metrics, dyn, cor, ct, bs, dt):
    bs.h0[:]        = bs.h
    bs.u_face_x0[:] = bs.u_face_x
    bs.v_face_y0[:] = bs.v_face_y

    continuity_compute_fluxes_barotropic(grid, metrics, ct, bs)
    coriolis_adv_compute_tendencies_barotropic(grid, metrics, cor, bs)
    continuity_apply_fluxes_barotropic(bs, dt)
    coriolis_adv_apply_tendencies_barotropic(cor, bs, dt)

    bs.h[:]        = 0.5 * (bs.h0 + bs.h)
    bs.u_face_x[:] = 0.5 * (bs.u_face_x0 + bs.u_face_x)
    bs.v_face_y[:] = 0.5 * (bs.v_face_y0 + bs.v_face_y)
    dyn.outer_step_count += 1

So, basically my idea of a refactor/port/modernization is hide the complexity away through nice interfaces without abstracting too much that the algorithm gets lost. I like the idea of data structures carrying data used in computation and then use procedural style calls for this. I am sure you could rewrite my dynamics step to look nicer. My main goal is: write your code such that a new student can read it and compare it to the algorithm they see in a book/paper

── more in #developer-tools 4 stories · sorted by recency
── more on @fortran 3 stories trending now
sponsored brought to you by zahid.host 4,200+ EU-deployed projects
reading about agents? ship yours in a single git push.

Run your AI side-project on zahid.host

EU-based hosting, git-push deploys, automatic HTTPS, no cold starts. Free tier with a custom domain — perfect for shipping the agent you just read about.

$git push zahid main
Live at https://your-agent.zahid.host
Get free account → Pricing
from €0/mo · no card required
LIVE [news/ai-coding-assistants…] indexed:0 read:7min 2026-06-23 ·