04:00
2026-05-29
arxiv.org
machine-learning
Behavior-Aware Auxiliary Corrections for Off-Policy Temporal-Difference Prediction
Researchers introduced behavior-aware auxiliary corrections for off-policy temporal-difference learning, replacing the standard covariance matrix with the behavior Bellman matrix in the TDC and TDRC a…