
Implementation of reinforcement learning in chemical reaction networks: application to phototaxis as curiosity-driven exploration

Researchers developed a framework linking partially observable Markov decision processes with biochemical reaction dynamics to model phototaxis in unicellular algae as an information-driven process. Using inverse reinforcement learning on Chlamydomonas trajectories, they showed that run-tumble behavior emerges as a curiosity-driven exploration strategy to reduce sensory ambiguity.

Published Jun 26, 2026

arXiv:2606.26168v1

Abstract: Living systems navigate environments using noisy and incomplete sensory signals. In unicellular algae, phototaxis is often modeled as a mechanistic run-tumble process driven by stimulus-response rules. However, such descriptions overlook how organisms actively sample their environment to reduce sensory ambiguity. From a minimal cognition perspective, we reframe this navigation as a subjective, information-driven sensorimotor process. To this end, we propose a framework linking a Partially Observable Markov Decision Process (POMDP) with biochemical reaction dynamics. Environmental variables are hidden, while the cell updates a minimal internal state from each observation through a memoryless Bayesian step. These internal dynamics balance orienting toward light with exploratory reorientation and can be implemented through Chemical-Reaction-Network Ordinary Differential Equations (CRN-ODEs). Our model includes a biophysical observation process for photoreception and a chemically computable polynomial bound on information gain. Using Inverse Reinforcement Learning (IRL) on 30 experimentally recorded Chlamydomonas trajectories, we infer the behavioral objective consistent with observed phototactic motion and benchmark the resulting dynamics with standard Stochastic Simulation Algorithm (SSA) baselines. Our model reproduces the empirical alignment-to-light distribution, comparable to objective SSA baselines on this dataset. Within this framework, run-tumble alternation emerges as an information-acquisition strategy: tumbling reorients the cell to sample new sensory configurations and resolve sensor ambiguity, demonstrating how intracellular biochemical networks can support adaptive information-seeking behavior in cellular navigation.
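For readers unfamiliar with the Stochastic Simulation Algorithm mentioned in the abstract, the following is a minimal sketch of a Gillespie-style simulation of a two-state run-tumble switch. It is not the authors' model or their SSA baseline: the function names `simulate_run_tumble` and `fraction_running` and the rates `k_tumble` and `k_run` are hypothetical, chosen only for illustration, and the sketch ignores light sensing, orientation, and any learned objective.

```python
import random


def simulate_run_tumble(k_tumble, k_run, t_end, seed=0):
    """Gillespie-style simulation of a two-state switch.

    Assumed (illustrative) reactions:
        run    -> tumble  at constant rate k_tumble
        tumble -> run     at constant rate k_run
    Returns a list of (time, state) pairs marking each state entry.
    """
    rng = random.Random(seed)
    t, state = 0.0, "run"
    history = [(t, state)]
    while True:
        # Only one reaction channel is open in each state, so the
        # waiting time is exponential with that channel's rate.
        rate = k_tumble if state == "run" else k_run
        t += rng.expovariate(rate)
        if t >= t_end:
            break
        state = "tumble" if state == "run" else "run"
        history.append((t, state))
    return history


def fraction_running(history, t_end):
    """Fraction of the interval [0, t_end] spent in the 'run' state."""
    total = 0.0
    # Pair each state entry with the next entry time (or t_end for the last).
    next_times = [t for t, _ in history[1:]] + [t_end]
    for (t0, state), t1 in zip(history, next_times):
        if state == "run":
            total += t1 - t0
    return total / t_end


if __name__ == "__main__":
    hist = simulate_run_tumble(k_tumble=1.0, k_run=4.0, t_end=2000.0)
    print(f"switch events: {len(hist) - 1}")
    print(f"fraction of time running: {fraction_running(hist, 2000.0):.3f}")
```

For constant rates, the long-run fraction of time spent running should approach `k_run / (k_run + k_tumble)`, which gives a simple sanity check on the simulation. In the framework described in the abstract, the switching behavior would instead depend on the cell's internal state and observations; that coupling is not modeled here.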
