{"slug": "architecture-driven-shift-towards-a-lightweight-selector-for-capturing-the-of", "title": "Architecture-driven Shift: towards a lightweight selector for capturing the trends of logit shift", "summary": "Researchers have developed Architecture-driven Shift (ADS), a lightweight metric that captures logit shift trends in continual learning models using few data samples, addressing the high computational cost of traditional methods. The framework decouples logit shift into architecture and data dependencies, establishing a theoretical relationship between heterogeneous network structures and logit shift on prior tasks. Empirical tests across more than 175 architectures show a strong monotonic correlation with logit shift, enabling efficient model selection for reliable continual learning.", "body_md": "arXiv:2605.27469v1 Announce Type: new\nAbstract: Continual Learning (CL) is a practical paradigm for utilizing the power of deep pre-trained neural networks, but which pre-trained model best balances the plasticity-stability trade-off and therefore deserves to be chosen? The logit shift serves as a natural proxy because it directly measures how a model's outputs on prior tasks change in CL scenarios. However, obtaining the logit shift is computationally expensive, which hinders large-scale model selection. Existing theoretical analyses fail to offer an efficient alternative because they assume uniform hidden-layer widths, ignoring the structural heterogeneity (variable width and depth) of real-world architectures. This raises a critical question: what theoretical relationship can be identified between a heterogeneous architecture and the logit shift on prior tasks (the tasks the model has already been trained on)? To answer this question, we decouple the logit shift into an architecture dependency and a data dependency to establish our framework. The framework reveals that the combination of the two dependencies, which we define as Architecture-driven Shift (ADS), captures the logit-shift tendency well and is computable with only a few data samples.
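A minimal sketch of how a cheap per-model score could be checked against the logit shift tendency: measure a logit shift on prior-task data for each candidate model, then take Spearman's rank correlation between the score and the measured shift. The logit-shift definition used here (mean L2 norm of the per-sample logit change) and the helper names `logit_shift`, `average_ranks`, and `spearman_rs` are illustrative assumptions, not the paper's formulation; the abstract does not give the ADS formula, so the scores below are placeholders.

```python
import numpy as np


def logit_shift(logits_before, logits_after):
    # Assumed (hypothetical) definition: mean L2 norm of the per-sample change
    # in logits on prior-task data, before vs. after training on the new task.
    diff = np.asarray(logits_after, dtype=float) - np.asarray(logits_before, dtype=float)
    return float(np.linalg.norm(diff, axis=1).mean())


def average_ranks(values):
    # 1-based ranks; tied values share the average of their sorted positions.
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind='mergesort')
    n = len(values)
    ranks = np.empty(n, dtype=float)
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman_rs(x, y):
    # Spearman's r_s: Pearson correlation computed on the ranks.
    # Returns nan (with a RuntimeWarning) if either input is constant.
    rx = average_ranks(x)
    ry = average_ranks(y)
    rx -= rx.mean()
    ry -= ry.mean()
    return float((rx * ry).sum() / np.sqrt((rx ** 2).sum() * (ry ** 2).sum()))


# Hypothetical placeholder values, one entry per candidate model.
scores = [0.2, 0.5, 0.9, 1.4]   # ADS-like scores (not computed from the paper)
shifts = [0.1, 0.3, 0.35, 0.8]  # measured logit shifts (made up for illustration)
print(spearman_rs(scores, shifts))  # 1.0, since the relation is perfectly monotonic
```

A rank-based statistic is used because it only rewards monotonic agreement between score and shift, which is what a model selector needs: the ordering of candidates, not a linear fit.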
Specifically, for a model well optimized on prior tasks, a higher ADS is associated with a larger logit shift after training on the current task; this relationship is derived from three mechanistic components: (1) the scaling of the spectral norm of weight-matrix gradients with layer width, (2) the optimization path length of the new task, and (3) the asymptotic task conflict in wide networks. Extensive empirical results across more than 175 diverse architectures demonstrate a strong monotonic correlation between ADS and logit shift (the weakest Spearman's $r_s=0.731$). In practice, we show on three datasets across six scenarios that ADS can serve as a lightweight proxy for the expected calibration error, a widely used metric for reliable CL model selection.", "url": "https://wpnews.pro/news/architecture-driven-shift-towards-a-lightweight-selector-for-capturing-the-of", "canonical_source": "https://arxiv.org/abs/2605.27469", "published_at": "2026-05-28 04:00:00+00:00", "updated_at": "2026-05-28 04:28:59.085636+00:00", "lang": "en", "topics": ["machine-learning", "neural-networks", "artificial-intelligence", "ai-research"], "entities": [], "alternates": {"html": "https://wpnews.pro/news/architecture-driven-shift-towards-a-lightweight-selector-for-capturing-the-of", "markdown": "https://wpnews.pro/news/architecture-driven-shift-towards-a-lightweight-selector-for-capturing-the-of.md", "text": "https://wpnews.pro/news/architecture-driven-shift-towards-a-lightweight-selector-for-capturing-the-of.txt", "jsonld": "https://wpnews.pro/news/architecture-driven-shift-towards-a-lightweight-selector-for-capturing-the-of.jsonld"}}