When to Retrain after Drift: A Data-Only Test of Post-Drift Data Size Sufficiency¶
Venue: ICLR 2026 (Poster)
Authors: (not listed)
OpenReview: https://openreview.net/forum?id=05PqjBzN6S
Relevance¶
LLM score: 1/3 — The paper mentions low per-update time and memory overhead, but its main contribution is a data-sufficiency test for retraining after concept drift, not directly advancing energy-efficient training or Sutro Group priorities like data movement, sparsity, or hardware-aware kernels.
Keyword hits: locality
TLDR¶
(none provided)
Abstract¶
Sudden concept drift makes previously trained predictors unreliable, yet deciding when to retrain and what post-drift data size is sufficient is rarely addressed. We propose CALIPER, a detector- and model-agnostic, data-only test that estimates the post-drift data size required for stable retraining. CALIPER exploits state dependence in streams generated by dynamical systems: we run a single-pass weighted local regression over the post-drift window and track a one-step proxy error as a function of a locality parameter $\theta$. When an effective sample size gate is satisfied, a monotonically non-increasing trend in this error with increasing $\theta$ indicates that the data size is sufficiently informative for retraining. We also provide a theoretical analysis of our method, and we show that the algorithm has low per-update time and memory overhead. Across datasets from four heterogeneous domains, three learner families, and two detectors, CALIPER consistently matches or exceeds the best fixed data size for retraining while incurring negligible overhead and often outperforming incremental updates. CALIPER closes the gap between drift detection and data-sufficient adaptation in streaming learning.
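To make the abstract's test concrete, here is a minimal illustrative sketch of the idea, not the authors' implementation: sweep a kernel bandwidth $\theta$, compute a one-step proxy error from a kernel-weighted local regression on the post-drift window, apply an effective-sample-size gate, and declare the window sufficient when the error trend is non-increasing in $\theta$. All function names, the Gaussian kernel, the $\theta$ grid, and the ESS threshold are assumptions for illustration.

```python
import numpy as np

def one_step_proxy_error(x, theta):
    """One-step proxy error of a kernel-weighted local regression
    predicting x[t+1] from x[t] (illustrative stand-in, not CALIPER)."""
    preds = []
    for t in range(1, len(x) - 1):
        # Gaussian weights on state-space distance to the current state
        w = np.exp(-((x[:t] - x[t]) ** 2) / (2 * theta ** 2))
        if w.sum() == 0:
            continue
        # weighted average of the successors of similar past states
        preds.append((np.dot(w, x[1:t + 1]) / w.sum(), x[t + 1]))
    p, y = map(np.array, zip(*preds))
    return float(np.mean((p - y) ** 2))

def effective_sample_size(w):
    # Kish effective sample size of a weight vector
    return float(w.sum() ** 2 / (w @ w))

def data_size_sufficient(x, thetas=(0.1, 0.2, 0.5, 1.0), ess_min=10.0):
    """Data-only sufficiency check over a post-drift window x."""
    x = np.asarray(x, dtype=float)
    # ESS gate at the tightest locality (assumed gating rule)
    w = np.exp(-((x[:-1] - x[-1]) ** 2) / (2 * min(thetas) ** 2))
    if effective_sample_size(w) < ess_min:
        return False
    errs = [one_step_proxy_error(x, th) for th in thetas]
    # sufficiency: proxy error non-increasing as theta grows
    return all(e2 <= e1 + 1e-9 for e1, e2 in zip(errs, errs[1:]))
```

A single pass over the window per $\theta$ keeps the per-update cost low, which is consistent with the abstract's claim of negligible overhead; the specific kernel and trend tolerance here are placeholders.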
Keywords¶
Concept drift, Stream learning, Data sufficiency, Time series