{"slug": "ai-coding-assistants-vs-codee-insights-on-fortran-correctness-and-modernization", "title": "AI Coding Assistants vs. Codee — Insights on Fortran Correctness and Modernization", "summary": "A developer refactored Fortran code for GPU acceleration by replacing inline PPM slope computations with pure subroutines callable from do concurrent loops, improving code clarity and enabling better GPU performance. The rewrite focused on preserving algorithms while creating safe APIs and data structures for efficient GPU execution.", "body_md": "A lot of the rewrites are ill-inspired but have good intentions. I think it is worth rewriting codes to adopt GPUs, since a lot of the original algorithms and paradigms might scale and work poorly on GPUs. I have been part of rewrites and ports, and porting a codebase to GPUs without being willing to rewrite the architecture ends up producing a result that is difficult to work with. It doesn’t give you the 20x speedup that you’d want, and it might make the CPU code slower or basically duplicate the codebase.\n\nI feel that a good rewrite focuses on keeping the algorithms as much as possible while creating new APIs that use these algorithms in a safe way, so that you can spend more time designing an architecture that will use the algorithms. 
For example, this bit for some PPM fluxes:\n\n```\n      do concurrent(j=1:ny, i=3:nx - 2) &\n         local(dh_m1, dh_0, dh_p1, h_left, h_right)\n         call ppm_limited_slope(bs%h(i - 2, j), bs%h(i - 1, j), bs%h(i, j), dh_m1)\n         call ppm_limited_slope(bs%h(i - 1, j), bs%h(i, j), bs%h(i + 1, j), dh_0)\n         call ppm_limited_slope(bs%h(i, j), bs%h(i + 1, j), bs%h(i + 2, j), dh_p1)\n         h_left = 0.5_wp*(bs%h(i - 1, j) + bs%h(i, j)) - (dh_0 - dh_m1)/6.0_wp\n         h_right = 0.5_wp*(bs%h(i, j) + bs%h(i + 1, j)) - (dh_p1 - dh_0)/6.0_wp\n         call ppm_cell_limiter(bs%h(i, j), h_left, h_right)\n         if (do_pos) call ppm_limit_pos(bs%h(i, j), h_left, h_right, h_min_pos)\n         this%h_face_right_x%data(i, j, 1) = h_left\n         this%h_face_left_x%data(i + 1, j, 1) = h_right\n      end do\n```\n\nBefore, the code for `ppm_limited_slope` was inline in the `do concurrent`, repeated three times, like:\n\n```\n      dh_left = h_i - h_im1\n      dh_right = h_ip1 - h_i\n      dh_centered = 0.5_wp*(dh_left + dh_right)\n      if (dh_left*dh_right > 0.0_wp) then\n         dh = sign(min(abs(dh_centered), &\n                       2.0_wp*abs(dh_left), &\n                       2.0_wp*abs(dh_right)), &\n                   dh_centered)\n      else\n         dh = 0.0_wp\n      end if\n```\n\nSo I first spent the time refactoring so that I could create `pure` subroutines that are callable from a `do concurrent`, which gets me to the first bit of code. Then all the data needed to compute the fluxes can just be passed in through: `pure subroutine continuity_compute_fluxes_barotropic(grid, metrics, this, bs)`\n\nThe extent of object orientation for me is data structures that hold everything necessary for a computation. Their only type-bound procedures are initialization, data transfer, finalization, and data deletion:\n\n```\n   type :: continuity_t\n      logical :: is_init = .false.\n         !! True between `init` and `destroy`. 
\n      integer :: ppm_variant = PPM_VARIANT_H3_MONO\n         !! Active PPM scheme variant.\n      logical :: monotone = .true.\n         !! Enforce monotonicity on the reconstructed face values.\n      real(wp) :: h_min = 1.0e-6_wp\n         !! Lower clip on cell-centred thickness during the update.\n         !! Also used as the floor for `ppm_limit_pos`.\n      real(wp) :: cfl_max = 0.5_wp\n         !! Soft cap on per-face CFL before falling back to upwind.\n      logical :: use_ppm_limit_pos = .false.\n         !! Positivity limiter\n\n      ! ---- Face-reconstruction workspace ----\n      !\n      !   h_face_left_x(i, j, k)  — value AT east face i extrapolated\n      !                             from the LEFT-side cell (i-1, j, k),\n      !                             i.e. h_R of cell i-1.\n      !   h_face_right_x(i, j, k) — value AT east face i extrapolated\n      !                             from the RIGHT-side cell (i, j, k),\n      !                             i.e. h_L of cell i.\n      !\n      ! Upwind: the kernel picks left if u >= 0 (left cell donates),\n      ! right if u < 0.\n      type(scratch_3d_buffer_t) :: h_face_left_x\n         !! Left-state thickness at east faces.\n      type(scratch_3d_buffer_t) :: h_face_right_x\n         !! Right-state thickness at east faces.\n      type(scratch_3d_buffer_t) :: h_face_left_y\n         !! Left-state thickness at north faces.\n      type(scratch_3d_buffer_t) :: h_face_right_y\n         !! Right-state thickness at north faces.\n   contains\n      procedure :: init => continuity_init\n      procedure :: destroy => continuity_destroy\n      procedure :: enter_data => continuity_enter_data\n      procedure :: exit_data => continuity_exit_data\n   end type continuity_t\n```\n\nSo that I can then, for example, use the `scratch_3d_buffer_t` like:\n\n```\n      ! 
East-face shapes: (nx+1, ny, nz)\n      call this%h_face_left_x%init(nx + 1, ny, nz, \"continuity_h_face_left_x\")\n      call this%h_face_right_x%init(nx + 1, ny, nz, \"continuity_h_face_right_x\")\n      ! North-face shapes: (nx, ny+1, nz)\n      call this%h_face_left_y%init(nx, ny + 1, nz, \"continuity_h_face_left_y\")\n      call this%h_face_right_y%init(nx, ny + 1, nz, \"continuity_h_face_right_y\")\n\n      this%is_init = .true.\n```\n\nAll the simplifications I’ve done make it so that my RK2 step looks like:\n\n```\n      do stage = 1, 2\n         call run_stage_split(grid, metrics, dyn, eos, cor, ct, pgf, hv, bd, ss, &\n                              va, hd, vd, vmix, ms, dt, n_inner, &\n                              sf=sf, stage=stage, vcoord=vcoord, bc=bc, t=t, &\n                              lateral_mix=lateral_mix, epbl=epbl, kshear=kshear)\n      end do\n\n      call rk2_average(ms)\n      if (allocated(ms%tracers)) then\n         do it = 1, size(ms%tracers)\n            call rk2_average_field_3d(ms%tracers(it)%hTr0, ms%tracers(it)%hTr, &\n                                      size(ms%tracers(it)%hTr, 1), &\n                                      size(ms%tracers(it)%hTr, 2), &\n                                      size(ms%tracers(it)%hTr, 3))\n         end do\n      end if\n```\n\nIt is a bit useless to expose a public API for a continuity PPM solver, since it is quite specific. But being able to expose a `do_one_dynamics_step(handle)` API so that someone can call it from Python and have it run on the GPU? That is what a million-dollar refactor should get you. All of this results in me being able to write an integration test that is super easy to read:\n\n```\n  pure subroutine ocean_dyn_step_barotropic(grid, metrics, dyn, cor, ct, bs, dt)\n      !! Public only for the unit-test suite (no production module imports it);\n      !! 
ignore when developing production code in other modules.\n\n      type(hgrid_t), intent(in) :: grid\n      type(ocean_metrics_t), intent(in) :: metrics\n      type(ocean_dyn_t), intent(inout) :: dyn\n      type(coriolis_adv_t), intent(inout) :: cor\n      type(continuity_t), intent(inout) :: ct\n      type(barotropic_cgrid_state_t), intent(inout) :: bs\n      real(wp), intent(in) :: dt\n\n      integer :: i, j, nx, ny, nx_face, ny_uface, nx_vface, ny_face\n\n      nx = grid%nx_total\n      ny = grid%ny_total\n      nx_face = size(bs%u_face_x, 1)\n      ny_uface = size(bs%u_face_x, 2)\n      nx_vface = size(bs%v_face_y, 1)\n      ny_face = size(bs%v_face_y, 2)\n\n      ! ---- 1. Save u^n into h0 / u_face_x0 / v_face_y0 ----\n      do concurrent(j=1:ny, i=1:nx)\n         bs%h0(i, j) = bs%h(i, j)\n      end do\n      do concurrent(j=1:ny_uface, i=1:nx_face)\n         bs%u_face_x0(i, j) = bs%u_face_x(i, j)\n      end do\n      do concurrent(j=1:ny_face, i=1:nx_vface)\n         bs%v_face_y0(i, j) = bs%v_face_y(i, j)\n      end do\n\n      ! ---- 2. Stage 1: tendencies at u^n, FE step -> u^(1) ----\n      call continuity_compute_fluxes_barotropic(grid, metrics, ct, bs)\n      call coriolis_adv_compute_tendencies_barotropic(grid, metrics, cor, bs)\n      call continuity_apply_fluxes_barotropic(bs, dt)\n      call coriolis_adv_apply_tendencies_barotropic(cor, bs, dt)\n\n      ! ---- 3. Stage 2: tendencies at u^(1), FE step -> u^(1) + dt*L(u^(1)) ----\n      call continuity_compute_fluxes_barotropic(grid, metrics, ct, bs)\n      call coriolis_adv_compute_tendencies_barotropic(grid, metrics, cor, bs)\n      call continuity_apply_fluxes_barotropic(bs, dt)\n      call coriolis_adv_apply_tendencies_barotropic(cor, bs, dt)\n\n      ! ---- 4. 
RK2 average: u^(n+1) = 1/2 * (u^n + (u^(1) + dt*L(u^(1)))) ----\n      do concurrent(j=1:ny, i=1:nx)\n         bs%h(i, j) = 0.5_wp*(bs%h0(i, j) + bs%h(i, j))\n      end do\n      do concurrent(j=1:ny_uface, i=1:nx_face)\n         bs%u_face_x(i, j) = 0.5_wp*(bs%u_face_x0(i, j) + bs%u_face_x(i, j))\n      end do\n      do concurrent(j=1:ny_face, i=1:nx_vface)\n         bs%v_face_y(i, j) = 0.5_wp*(bs%v_face_y0(i, j) + bs%v_face_y(i, j))\n      end do\n\n      dyn%outer_step_count = dyn%outer_step_count + 1\n   end subroutine ocean_dyn_step_barotropic\n```\n\nWhich to me is the magic of Fortran. I bring up C++ because a lot of the new rewrites focus on migrating to it. If you had gone the full object-oriented way in C++, you would get something like this (AI generated):\n\n``` cpp\ntemplate <class Op>\nconcept BarotropicOperator = requires(Op op, const HGrid& g, const OceanMetrics& m,\n                                      BarotropicCGridState& s, wp dt) {\n    op.compute(g, m, s);\n    op.apply(s, dt);\n};\n\ntemplate <BarotropicOperator Cont, BarotropicOperator Cor>\nclass BarotropicRK2Stepper {\npublic:\n    BarotropicRK2Stepper(Cont& cont, Cor& cor) : cont_(cont), cor_(cor) {}\n\n    void step(const HGrid& grid, const OceanMetrics& metrics,\n              OceanDyn& dyn, BarotropicCGridState& s, wp dt) {\n        s.saveSnapshot();                              // 1. save u^n\n        forwardEulerStage(grid, metrics, s, dt);       // 2. u* = u^n + dt L(u^n)\n        forwardEulerStage(grid, metrics, s, dt);       // 3. u* + dt L(u*)\n        s.averageWithSnapshot();                       // 4. 
Heun average\n        ++dyn.outerStepCount;\n    }\nprivate:\n    void forwardEulerStage(const HGrid& grid, const OceanMetrics& metrics,\n                           BarotropicCGridState& s, wp dt) {\n        cont_.compute(grid, metrics, s);\n        cor_ .compute(grid, metrics, s);\n        cont_.apply(s, dt);\n        cor_ .apply(s, dt);\n    }\n    Cont& cont_;\n    Cor&  cor_;\n};\n```\n\nBut now, unless you’re very familiar with C++ and objects, this might read like a foreign language. Because each object abstracts away what `compute` is doing, you get an indirection that is very nice for generality but a bit difficult to cope with if you’re starting out. I think heavily from the academic perspective, where new students often have no experience at all.\n\nI feel that the Fortran implementation has a simplicity for the reader that is difficult to get with C++ unless you write C++ that looks more akin to C. I feel that the Fortran can look like how someone would write Python code to prototype this (also AI generated):\n\n``` python\ndef ocean_dyn_step_barotropic(grid, metrics, dyn, cor, ct, bs, dt):\n    bs.h0[:]        = bs.h\n    bs.u_face_x0[:] = bs.u_face_x\n    bs.v_face_y0[:] = bs.v_face_y\n\n    continuity_compute_fluxes_barotropic(grid, metrics, ct, bs)\n    coriolis_adv_compute_tendencies_barotropic(grid, metrics, cor, bs)\n    continuity_apply_fluxes_barotropic(bs, dt)\n    coriolis_adv_apply_tendencies_barotropic(cor, bs, dt)\n    # ... stage 2 ...\n\n    bs.h[:]        = 0.5 * (bs.h0 + bs.h)\n    bs.u_face_x[:] = 0.5 * (bs.u_face_x0 + bs.u_face_x)\n    bs.v_face_y[:] = 0.5 * (bs.v_face_y0 + bs.v_face_y)\n    dyn.outer_step_count += 1\n```\n\nSo, basically, my idea of a refactor/port/modernization is to hide the complexity away behind nice interfaces without abstracting so much that the algorithm gets lost. I like the idea of data structures carrying the data used in a computation and then using procedural-style calls on them. 
I am sure you could rewrite my dynamics step to look nicer. My main goal is: write your code such that a new student can read it and compare it to the algorithm they see in a book/paper.", "url": "https://wpnews.pro/news/ai-coding-assistants-vs-codee-insights-on-fortran-correctness-and-modernization", "canonical_source": "https://fortran-lang.discourse.group/t/ai-coding-assistants-vs-codee-insights-on-fortran-correctness-and-modernization/10472?page=2#post_38", "published_at": "2026-06-23 14:35:00+00:00", "updated_at": "2026-06-24 00:13:22.595181+00:00", "lang": "en", "topics": ["developer-tools", "artificial-intelligence"], "entities": ["Fortran", "GPU", "PPM"], "alternates": {"html": "https://wpnews.pro/news/ai-coding-assistants-vs-codee-insights-on-fortran-correctness-and-modernization", "markdown": "https://wpnews.pro/news/ai-coding-assistants-vs-codee-insights-on-fortran-correctness-and-modernization.md", "text": "https://wpnews.pro/news/ai-coding-assistants-vs-codee-insights-on-fortran-correctness-and-modernization.txt", "jsonld": "https://wpnews.pro/news/ai-coding-assistants-vs-codee-insights-on-fortran-correctness-and-modernization.jsonld"}}