A Three-Phase Foundation Model for Tax-Aware Personalized Portfolio Management

Researchers introduced a three-phase deep reinforcement learning system for personalized portfolio management that targets three limitations of earlier financial RL work: ticker lock-in, monolithic objectives, and static user models. Phase 1 pairs a self-supervised cross-asset encoder with a frozen Chronos time series foundation model; Phase 2 fine-tunes a Mixture-of-Experts actor-critic with PPO across six investment objectives; Phase 3 adds a LoRA personalization layer fine-tuned on each individual's brokerage transaction history. The system is designed for tax-aware, goal-adaptive portfolio management without retraining for new assets.

arXiv:2606.30997v1. Abstract: We present a three-phase deep reinforcement learning system for personalized portfolio management that addresses three limitations shared by all prior financial RL work: (1) ticker lock-in, (2) monolithic objectives, and (3) static user models. Phase 1 pretrains a ticker-identity-free cross-asset encoder via self-supervised learning on a multi-asset corpus, augmented by a frozen parallel branch using Chronos, a T5-based time series foundation model, fused via a learned gating mechanism. To our knowledge, this is the first application of a time series foundation model to portfolio management RL. The encoder generalizes to any publicly traded asset via a 50-dimensional observable metadata vector that requires no retraining for new tickers. Phase 2 fine-tunes a Mixture-of-Experts (MoE) portfolio actor-critic with PPO under an objective-conditioned reward that simultaneously serves six distinct investment goals sampled per episode: short-term alpha, short-term gain, long-term gain, capital preservation, tax-loss harvesting, and long-term-gains-only. The MoE architecture assigns each objective to a specialized expert head (momentum, growth, defensive, tax-aware), and a learned intent router blends experts based on the active objective and current market regime, which eliminates cross-objective gradient conflict. Phase 3 adds a lightweight personalization layer that is further adapted at inference time to each individual via a 76-parameter LoRA module fine-tuned on real brokerage transaction history, inferring investment objectives from revealed trading behavior rather than questionnaires. A natural language intent parser converts free-form goals directly into structured investment objective parameters.
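
The intent-routed expert blending described for Phase 2 can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the class name IntentRoutedMoE, the layer shapes, the random initialization, the use of a plain state vector as a stand-in for the market regime, and the softmax (long-only) portfolio head are all assumptions made for the example. It shows only the forward pass, in which a router conditioned on the active objective produces a gate over the four expert heads and the gate blends their outputs; PPO training, the critic, and the objective-conditioned reward are omitted.

```python
import math
import random

# The six objectives and four expert heads named in the abstract.
OBJECTIVES = [
    "short_term_alpha", "short_term_gain", "long_term_gain",
    "capital_preservation", "tax_loss_harvesting", "long_term_gains_only",
]
EXPERTS = ["momentum", "growth", "defensive", "tax_aware"]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def one_hot(objective):
    """Encode the active objective as a one-hot vector."""
    return [1.0 if o == objective else 0.0 for o in OBJECTIVES]


def init_linear(rng, n_out, n_in, scale=0.1):
    """Hypothetical random init for a dense layer: (weights, bias)."""
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(weights, bias, x):
    """Dense layer: one dot product per output row, plus bias."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


class IntentRoutedMoE:
    """Toy actor: a router gates four expert heads by objective and state."""

    def __init__(self, state_dim, n_assets, seed=0):
        rng = random.Random(seed)
        # Router sees the objective one-hot concatenated with the state
        # (the state is an assumed stand-in for the market regime).
        self.router = init_linear(rng, len(EXPERTS), len(OBJECTIVES) + state_dim)
        # Each expert maps the state to per-asset logits.
        self.experts = [init_linear(rng, n_assets, state_dim) for _ in EXPERTS]

    def route(self, objective, state):
        """Return the gate: a probability distribution over experts."""
        return softmax(linear(*self.router, one_hot(objective) + list(state)))

    def act(self, objective, state):
        """Blend expert logits with the gate and return portfolio weights."""
        gate = self.route(objective, state)
        expert_logits = [linear(w, b, state) for w, b in self.experts]
        n_assets = len(expert_logits[0])
        blended = [
            sum(g * logits[i] for g, logits in zip(gate, expert_logits))
            for i in range(n_assets)
        ]
        # Assumption: long-only weights that sum to one.
        return softmax(blended), gate


if __name__ == "__main__":
    policy = IntentRoutedMoE(state_dim=8, n_assets=5)
    state = [0.1] * 8
    for objective in ("tax_loss_harvesting", "short_term_alpha"):
        weights, gate = policy.act(objective, state)
        print(objective, [round(g, 3) for g in gate])
```

Because the gate depends on the objective, the same market state yields a different expert mixture for each goal. In a trained system of this kind, gradients for one objective would flow mainly through the experts its gate selects, which is the intuition behind reducing cross-objective interference.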