Human-in-the-Loop Contextual Bandits for Short-Term Rental Dynamic Pricing: Structural Equivalence of Historical Warm-Up and Approval-Gated Live Learning

wpnews.pro


Article art-19865 · Source: arxiv.org · Published: 2026-06-03T04:00Z · Topic: machine-learning


Researchers have proposed a Human-in-the-Loop Gated Bandit (HITL-GB) framework for dynamic pricing in short-term rental (STR) markets, in which a contextual bandit algorithm generates price recommendations and a human agent accepts, modifies, or rejects each one before it is applied. The authors show that, under this approval constraint, historical pricing data collected under a prior deterministic policy is structurally equivalent to on-policy warm-up data for initialising the bandit's posterior. On real production data from one anonymised urban market (2 rooms, 1,461 nightly pricing episodes), the warm-up procedure compresses the effective cold-start period from roughly 150 episodes to roughly 30. The authors further argue that the result extends to any high-stakes domain where human approval is legally or operationally required, including clinical drug dosing, credit origination, content moderation, and radiological diagnosis, and conclude that in regulated industries mandatory human oversight is a statistical asset rather than a deployment constraint.
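The approval gate described above can be pictured with a small sketch. This is an illustration only, not the authors' implementation: the names `Decision`, `GateResult`, and `approval_gate` are hypothetical, and the handling of a rejection (no bandit price is applied) is an assumption, since the summary does not say what happens in that case.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Decision(Enum):
    """The three human responses named in the article."""
    ACCEPT = "accept"
    MODIFY = "modify"
    REJECT = "reject"


@dataclass
class GateResult:
    # Price that actually goes live; None means the recommendation was not applied.
    applied_price: Optional[float]
    decision: Decision


def approval_gate(recommended: float, decision: Decision,
                  override: Optional[float] = None) -> GateResult:
    """Resolve the bandit's recommendation against the human decision."""
    if decision is Decision.ACCEPT:
        return GateResult(recommended, decision)
    if decision is Decision.MODIFY:
        if override is None:
            raise ValueError("MODIFY requires an override price")
        return GateResult(override, decision)
    # Assumption: on REJECT nothing from the bandit is applied.
    return GateResult(None, decision)
```

For example, `approval_gate(120.0, Decision.MODIFY, override=135.0)` yields an applied price of 135.0, and that human-approved price is what a learner would later pair with the booking outcome.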

1 min read · published Jun 3, 2026

arXiv:2606.02595v1 · Announce Type: new

Abstract: Dynamic pricing in short-term rental (STR) markets presents a distinctive challenge for online learning algorithms: pricing decisions carry significant financial risk, operators require explainability, and market feedback is sparse (one booking outcome per listed night). We introduce the Human-in-the-Loop Gated Bandit (HITL-GB) framework, in which a contextual bandit algorithm generates price recommendations but a human agent retains authority to accept, modify, or reject each recommendation before it is applied. We show that under this approval constraint, historical pricing data -- collected under a prior deterministic policy -- is structurally equivalent to on-policy warm-up data for initialising the bandit's posterior, bypassing the weeks-to-months cold-start period that renders pure online bandit learning impractical in sparse-feedback markets. We formalise the approval-gated reward signal, derive a regularised ridge-regression warm-up procedure from historical episodes, and validate the approach on real STR production data (anonymised urban market, 2 rooms, April 2022 -- April 2026, 1,461 nightly pricing episodes). Our warm-up procedure compresses effective cold-start from ~150 episodes to ~30 episodes when initialising agents from the Hierarchical Factored Thompson Sampling (HF-TS) family. We further argue that the structural equivalence result is domain-agnostic: any high-stakes domain where human approval is legally or operationally required -- including clinical drug dosing, credit origination, content moderation, and radiological diagnosis -- satisfies the same conditions and benefits from the same warm-up strategy. In regulated industries, mandatory human oversight is thus a statistical asset rather than a deployment constraint.
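To make the ridge-regression warm-up idea concrete, here is a minimal sketch of how historical episodes could initialise a Gaussian posterior for a plain linear Thompson sampler. It is not the paper's HF-TS procedure: the linear reward model, the function names `ridge_warm_up` and `thompson_sample`, and the parameters `lam` and `noise_var` are all assumptions made for illustration.

```python
import numpy as np


def ridge_warm_up(X, y, lam=1.0, noise_var=1.0):
    """Fit a ridge regression on historical episodes and return a Gaussian posterior.

    X: (n_episodes, n_features) context features of past pricing episodes.
    y: (n_episodes,) observed rewards for those episodes.
    lam: ridge penalty (hypothetical default), equivalent to a zero-mean
         Gaussian prior with covariance (noise_var / lam) * I.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    d = X.shape[1]
    # Regularised Gram matrix: X^T X + lam * I
    gram = X.T @ X + lam * np.eye(d)
    # Ridge solution, which is also the posterior mean
    mean = np.linalg.solve(gram, X.T @ y)
    # Posterior covariance under the assumed Gaussian noise model
    cov = noise_var * np.linalg.inv(gram)
    return mean, cov


def thompson_sample(mean, cov, rng):
    """Draw one plausible weight vector from the warm-started posterior."""
    return rng.multivariate_normal(mean, cov)
```

A warm-started agent would call `ridge_warm_up` once on the historical log, then draw `thompson_sample(mean, cov, np.random.default_rng())` each night and score candidate prices with the sampled weights. The more historical episodes there are, the tighter the starting covariance, which is the intuition behind a shorter cold-start.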

source & further reading

Original article on arxiv.org: arxiv.org/abs/2606.02595


