Lisa Lock, Scientific Editor
Robert Egan, Senior Editor
Climate and ocean models use a series of equations to represent complex natural processes. However, the equations used in these models are often derived from limited observations and a series of assumptions.
Machine learning could be a powerful approach for analyzing data to directly "discover" the equations that underlie large, complex systems such as the changing biogeochemistry of the ocean. Machine learning has yet to be fully tested for equation discovery, however, meaning its capabilities and potential shortcomings aren't fully known.
To better understand machine learning's applicability for equation discovery, Chengwang Wang and colleagues turned to an ocean biogeochemical model, examining whether machine learning could re-create known equations governing colloidal iron (a key part of the ocean iron cycle) from a relatively sparse data set.
The machine learning technique discovered equations that performed comparably to the original equations used in the model while simultaneously uncovering new information about the underlying data sets and the iron cycle in general. This result is a step toward validating the use of equation discovery for other similarly complex processes in the real world, according to the authors. The findings are published in the journal Geophysical Research Letters.
The authors used a kind of equation discovery called symbolic regression, in which a machine learning model starts from a set of mathematical operators and searches for the equations that best fit a particular data set. With this approach, the authors derived a suite of six equations describing how colloidal iron, which consists of microscopic suspended iron particles, behaves in the oceans. The equations discovered via symbolic regression differ from the known equations but are functionally simpler and reproduce large-scale patterns equally well, the authors say.
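To make the idea of symbolic regression concrete, here is a toy sketch in Python. It is not the authors' method or software: it uses a brute-force search over small expression trees, where practical tools typically use evolutionary or other guided searches, and it fits synthetic data. The variable names `a` and `b`, the target relation `a * b + a`, and the complexity penalty are all illustrative assumptions with no connection to the iron cycle.

```python
import itertools
import operator
import random

# Building blocks the search may combine (assumed for illustration).
OPERATORS = {"+": operator.add, "-": operator.sub, "*": operator.mul}
TERMINALS = ["a", "b"]  # hypothetical input variables


def build_expressions(depth):
    """Enumerate expression trees up to `depth` as nested tuples."""
    if depth == 0:
        return list(TERMINALS)
    smaller = build_expressions(depth - 1)
    trees = list(smaller)
    for symbol in OPERATORS:
        for left, right in itertools.product(smaller, repeat=2):
            trees.append((symbol, left, right))
    # Drop duplicates while keeping a stable order.
    return list(dict.fromkeys(trees))


def evaluate(tree, env):
    """Evaluate a tree given a dict mapping variable names to values."""
    if isinstance(tree, str):
        return env[tree]
    symbol, left, right = tree
    return OPERATORS[symbol](evaluate(left, env), evaluate(right, env))


def size(tree):
    """Count nodes; used as a simple measure of equation complexity."""
    if isinstance(tree, str):
        return 1
    return 1 + size(tree[1]) + size(tree[2])


def to_string(tree):
    """Render a tree as a readable formula."""
    if isinstance(tree, str):
        return tree
    symbol, left, right = tree
    return f"({to_string(left)} {symbol} {to_string(right)})"


def discover(samples, depth=2, complexity_penalty=1e-3):
    """Return the tree minimizing mean squared error plus a size penalty."""
    best_tree, best_score = None, float("inf")
    for tree in build_expressions(depth):
        mse = sum(
            (evaluate(tree, env) - target) ** 2 for env, target in samples
        ) / len(samples)
        score = mse + complexity_penalty * size(tree)
        if score < best_score:
            best_tree, best_score = tree, score
    return best_tree


# Synthetic "observations" generated from a made-up relation.
random.seed(0)
samples = []
for _ in range(50):
    env = {"a": random.uniform(0.0, 2.0), "b": random.uniform(0.0, 2.0)}
    samples.append((env, env["a"] * env["b"] + env["a"]))

best = discover(samples)
print(to_string(best))
```

The size penalty is what pushes the search toward simpler formulas when several candidates fit the data about equally well, which mirrors the article's point that discovered equations can be simpler than the originals while matching their behavior.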
The equations also offer new insights into iron cycling. For example, they do not include salinity, likely because that variable does not change much throughout the ocean. The equations additionally show that sampling the full water column produces better results than sampling at specific depths, a finding that can help guide future sampling work. Finally, the authors found that equations discovered from sparse data sets can be robust if colloidal iron is measured where existing dissolved iron samples have been taken.
The sparsity of existing colloidal iron observations highlights a need for future sampling to capture colloidal iron data throughout the water column and to expand coverage of undersampled ocean basins, they argue. Scientists with unpublished iron speciation data from GEOTRACES cruises can help this effort by sharing their data, they add.
Publication details
Chengwang Wang et al., Toward Using Equation Discovery to Generate Parameterizations of Biogeochemical Processes, Geophysical Research Letters (2026). DOI: 10.1029/2025gl121380
Journal information:
[Geophysical Research Letters](https://phys.org/journals/geophysical-research-letters/)
Provided by Eos
This story is republished courtesy of Eos, hosted by the American Geophysical Union.
Citation: Machine learning rediscovers equations governing ocean biogeochemistry (2026, June 24) retrieved 25 June 2026 from https://phys.org/news/2026-06-machine-rediscovers-equations-ocean-biogeochemistry.html