AI Coding Assistants vs. Codee — Insights on Fortran Correctness and Modernization

A developer refactored Fortran code for GPU acceleration by replacing inline PPM slope computations with pure subroutines callable from do concurrent loops, improving code clarity and enabling better GPU performance. The rewrite focused on preserving algorithms while creating safe APIs and data structures for efficient GPU execution.

A lot of the rewrites are ill inspired but have good intentions. I think it is worth to rewrite codes to adopt GPUs, since a lot of the original algorithms and paradigms might scale and work poorly on GPUs. I have been part of rewrites and ports and porting a codebase to GPUs without willing to rewrite the architecture ends up producing a difficult to work with result. It doesn’t result in the 20x speedup that you’d want and it might make the CPU slower or basically duplicate the codebase. I feel that a good rewrite focuses on keeping the algorithms as much as possible while creating new APIs that use these algorithms in a safe way so that you can spend more time designing an architecture that will use the algorithms. For example, this bit for some PPM fluxes: do concurrent j=1:ny, i=3:nx - 2 & local dh m1, dh 0, dh p1, h left, h right call ppm limited slope bs%h i - 2, j , bs%h i - 1, j , bs%h i, j , dh m1 call ppm limited slope bs%h i - 1, j , bs%h i, j , bs%h i + 1, j , dh 0 call ppm limited slope bs%h i, j , bs%h i + 1, j , bs%h i + 2, j , dh p1 h left = 0.5 wp bs%h i - 1, j + bs%h i, j - dh 0 - dh m1 /6.0 wp h right = 0.5 wp bs%h i, j + bs%h i + 1, j - dh p1 - dh 0 /6.0 wp call ppm cell limiter bs%h i, j , h left, h right if do pos call ppm limit pos bs%h i, j , h left, h right, h min pos this%h face right x%data i, j, 1 = h left this%h face left x%data i + 1, j, 1 = h right end do Before, the code for ppm limited slope was inline in the do concurrent, repeated 3 times like: dh left = h i - h im1 dh right = h ip1 - h i dh centered = 0.5 wp dh left + dh right if dh left dh right 0.0 wp then dh = sign min abs dh centered , & 2.0 wp abs dh left , & 2.0 wp abs dh right , & dh centered else dh = 0.0 wp end if So I spent the time first refactoring so that I could create pure subroutines that are callable from a do concurrent to get to the first bit of code. Then all the data needed to compute the fluxes can be just passed in by: pure subroutine continuity compute fluxes barotropic grid, metrics, this, bs The extent of object orientation for me is data structures that hold all necessary stuff for a computation. Their only type bound procedures are initialization, data transfer, finalization, data deletion: type :: continuity t logical :: is init = .false. True between init and destroy . integer :: ppm variant = PPM VARIANT H3 MONO Active PPM scheme variant. logical :: monotone = .true. Enforce monotonicity on the reconstructed face values. real wp :: h min = 1.0e-6 wp Lower clip on cell-centred thickness during the update. Also used as the floor for ppm limit pos real wp :: cfl max = 0.5 wp Soft cap on per-face CFL before falling back to upwind. logical :: use ppm limit pos = .false. Positivity limiter ---- Face-reconstruction workspace ---- h face left x i, j, k — value AT east face i extrapolated from the LEFT-side cell i-1, j, k , i.e. h R of cell i-1. h face right x i, j, k — value AT east face i extrapolated from the RIGHT-side cell i, j, k , i.e. h L of cell i. Upwind: the kernel picks left if u = 0 left cell donates , right if u < 0. type scratch 3d buffer t :: h face left x Left-state thickness at east faces. type scratch 3d buffer t :: h face right x Right-state thickness at east faces. type scratch 3d buffer t :: h face left y Left-state thickness at north faces. type scratch 3d buffer t :: h face right y Right-state thickness at north faces. contains procedure :: init = continuity init procedure :: destroy = continuity destroy procedure :: enter data = continuity enter data procedure :: exit data = continuity exit data end type continuity t So that I can then, for example use the scratch 3d buffer t like: East-face shapes: nx+1, ny, nz call this%h face left x%init nx + 1, ny, nz, "continuity h face left x" call this%h face right x%init nx + 1, ny, nz, "continuity h face right x" North-face shapes: nx, ny+1, nz call this%h face left y%init nx, ny + 1, nz, "continuity h face left y" call this%h face right y%init nx, ny + 1, nz, "continuity h face right y" this%is init = .true. All the simplifications I’ve done make it so that my RK2 step looks like: do stage = 1, 2 call run stage split grid, metrics, dyn, eos, cor, ct, pgf, hv, bd, ss, & va, hd, vd, vmix, ms, dt, n inner, & sf=sf, stage=stage, vcoord=vcoord, bc=bc, t=t, & lateral mix=lateral mix, epbl=epbl, kshear=kshear end do call rk2 average ms if allocated ms%tracers then do it = 1, size ms%tracers call rk2 average field 3d ms%tracers it %hTr0, ms%tracers it %hTr, & size ms%tracers it %hTr, 1 , & size ms%tracers it %hTr, 2 , & size ms%tracers it %hTr, 3 end do end if It is a bit useless to expose a public API for a continuity PPM solver, since it is quite specific. But then being able to expose a do one dynamics step handle API so that someone can call it from Python and have it run on the GPU? That is what a million dollar refactor should get you. All of this results in me being able to write an integration test that is super easy to read: pure subroutine ocean dyn step barotropic grid, metrics, dyn, cor, ct, bs, dt Public only for the unit-test suite no production module imports it ; ignore when developing production code in other modules. type hgrid t , intent in :: grid type ocean metrics t , intent in :: metrics type ocean dyn t , intent inout :: dyn type coriolis adv t , intent inout :: cor type continuity t , intent inout :: ct type barotropic cgrid state t , intent inout :: bs real wp , intent in :: dt integer :: i, j, nx, ny, nx face, ny uface, nx vface, ny face nx = grid%nx total ny = grid%ny total nx face = size bs%u face x, 1 ny uface = size bs%u face x, 2 nx vface = size bs%v face y, 1 ny face = size bs%v face y, 2 ---- 1. Save u^n into h0 / u face x0 / v face y0 ---- do concurrent j=1:ny, i=1:nx bs%h0 i, j = bs%h i, j end do do concurrent j=1:ny uface, i=1:nx face bs%u face x0 i, j = bs%u face x i, j end do do concurrent j=1:ny face, i=1:nx vface bs%v face y0 i, j = bs%v face y i, j end do ---- 2. Stage 1: tendencies at u^n, FE step - u^ 1 ---- call continuity compute fluxes barotropic grid, metrics, ct, bs call coriolis adv compute tendencies barotropic grid, metrics, cor, bs call continuity apply fluxes barotropic bs, dt call coriolis adv apply tendencies barotropic cor, bs, dt ---- 3. Stage 2: tendencies at u^ 1 , FE step - u^ 1 + dt L u^ 1 ---- call continuity compute fluxes barotropic grid, metrics, ct, bs call coriolis adv compute tendencies barotropic grid, metrics, cor, bs call continuity apply fluxes barotropic bs, dt call coriolis adv apply tendencies barotropic cor, bs, dt ---- 4. RK2 average: u^ n+1 = 1/2 u^n + u^ 1 + dt L u^ 1 ---- do concurrent j=1:ny, i=1:nx bs%h i, j = 0.5 wp bs%h0 i, j + bs%h i, j end do do concurrent j=1:ny uface, i=1:nx face bs%u face x i, j = 0.5 wp bs%u face x0 i, j + bs%u face x i, j end do do concurrent j=1:ny face, i=1:nx vface bs%v face y i, j = 0.5 wp bs%v face y0 i, j + bs%v face y i, j end do dyn%outer step count = dyn%outer step count + 1 end subroutine ocean dyn step barotropic Which to me is the magic of Fortran, if you had gone the full object oriented way in C++ would you get AI generated …because a lot of the new rewrites focus on migrating to C++: js template <class Op concept BarotropicOperator = requires Op op, const HGrid& g, const OceanMetrics& m, BarotropicCGridState& s, wp dt { op.compute g, m, s ; op.apply s, dt ; }; template <BarotropicOperator Cont, BarotropicOperator Cor class BarotropicRK2Stepper { public: BarotropicRK2Stepper Cont& cont, Cor& cor : cont cont , cor cor {} void step const HGrid& grid, const OceanMetrics& metrics, OceanDyn& dyn, BarotropicCGridState& s, wp dt { s.saveSnapshot ; // 1. save u^n forwardEulerStage grid, metrics, s, dt ; // 2. u = u^n + dt L u^n forwardEulerStage grid, metrics, s, dt ; // 3. u + dt L u s.averageWithSnapshot ; // 4. Heun average ++dyn.outerStepCount; } private: void forwardEulerStage const HGrid& grid, const OceanMetrics& metrics, BarotropicCGridState& s, wp dt { cont .compute grid, metrics, s ; cor .compute grid, metrics, s ; cont .apply s, dt ; cor .apply s, dt ; } Cont& cont ; Cor& cor ; }; But now, unless you’re very familiar with C++ and objects this might read like a foreign language. Because each object abstracts away what compute is doing you get that indirection that is very nice for generality but a bit difficult to cope if you’re starting out. I think heavily from the academic perspective, new students often without experience at all. I feel that the Fortran implementation has a simplicity to the reader that is difficult to get with C++ unless you write C++ that looks more akin to C. I feel that the Fortran can look like how someone would write a python code for this to prototype: also AI generated python def ocean dyn step barotropic grid, metrics, dyn, cor, ct, bs, dt : bs.h0 : = bs.h bs.u face x0 : = bs.u face x bs.v face y0 : = bs.v face y continuity compute fluxes barotropic grid, metrics, ct, bs coriolis adv compute tendencies barotropic grid, metrics, cor, bs continuity apply fluxes barotropic bs, dt coriolis adv apply tendencies barotropic cor, bs, dt ... stage 2 ... bs.h : = 0.5 bs.h0 + bs.h bs.u face x : = 0.5 bs.u face x0 + bs.u face x bs.v face y : = 0.5 bs.v face y0 + bs.v face y dyn.outer step count += 1 So, basically my idea of a refactor/port/modernization is hide the complexity away through nice interfaces without abstracting too much that the algorithm gets lost. I like the idea of data structures carrying data used in computation and then use procedural style calls for this. I am sure you could rewrite my dynamics step to look nicer. My main goal is: write your code such that a new student can read it and compare it to the algorithm they see in a book/paper