Papers
Score 3
- 3 — ATLAS: Adaptive Transfer Scaling Laws for Multilingual Pretraining, Finetuning, and Decoding the Curse of Multilinguality — Directly advances scaling laws for efficient multilingual training, optimizing compute and data allocation.
- 3 — EnergyLLM-Bench: A Reproducible Benchmark for Energy and Carbon Footprint of Large Language Models — The paper introduces a reproducible benchmark for energy-efficient LLM evaluation, including low-precision settings.
- 3 — Scaling Laws for Fully Sparsely-Activated Large Language Models — The paper directly investigates sparsity in LLMs, deriving scaling laws for fully sparsely-activated models.
- 3 — Pruning with Occam's Razor — The paper directly advances energy-efficient training by integrating pruning with gradient descent.
- 3 — Understanding Dataset Distillation via Spectral Filtering — Directly advances energy-efficient training through dataset distillation, which reduces data requirements.
- 3 — Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention — Directly addresses low-precision training instability and proposes a fix, aligning with Sutro Group's low-precision priority.
- 3 — Efficient Resource-Constrained Training of Transformers via Subspace Optimization — Directly advances energy-efficient training by reducing memory and compute via subspace optimization.
- 3 — Toward Bit-Efficient Dataset Condensation: A General Framework — The paper introduces a low-precision quantization method for dataset condensation, directly reducing storage costs.
- 3 — Quantization with Purpose: Loss-Aware Bit Allocation for Gradient Compression — Directly advances energy-efficient training by reducing data movement via loss-aware gradient quantization (a generic sketch of this gradient-compression recipe follows this list).
- 3 — ESSA: Evolutionary Strategies for Scalable Alignment — The paper introduces a gradient-free, hardware-friendly alignment method using evolutionary strategies.
- 3 — Dataset Distillation via Committee Voting — Directly advances efficient training through dataset distillation, a core Sutro Group priority, reducing training data requirements.
- 3 — Efficient Fine-Tuning of Quantized Models via Adaptive Rank and Bitwidth — The paper directly advances low-precision/quantization and memory-efficient fine-tuning, aligning with Sutro Group's priorities.
- 3 — Efficient Fine-tuning with Decomposed Foundation Model — Directly advances memory- and compute-efficient fine-tuning through model decomposition.
- 3 — Regularization can make diffusion models more efficient — The paper leverages sparsity to reduce the computational complexity of diffusion models, directly aligning with the sparsity priority.
- 3 — Mitigating Non-IID Drift in Zeroth-Order Federated LLM Fine-Tuning with Transferable Sparsity — Directly advances sparsity and communication efficiency (data movement) in federated LLM fine-tuning.
- 3 — Fantastic Pretraining Optimizers and Where to Find Them — The paper directly evaluates optimizers for pretraining efficiency, a named Sutro Group priority.
- 3 — Rapid Training of Hamiltonian Graph Networks Using Random Features — Proposes a gradient-descent-free training method achieving a 150-600x speedup, directly advancing training efficiency.
- 3 — HiDivDrop: Vision Token Reduction in MLLMs via Late Injection and Differentiable Top-K — The paper directly advances energy-efficient training by sparsifying visual tokens, reducing data movement.
- 3 — The Markovian Thinker — The paper directly reduces data movement and memory footprint for reasoning models via a Markovian chunking of the reasoning context.
- 3 — Scaling with Collapse: Efficient and Predictable Training of LLM Families — Directly advances compute-efficient training and scaling laws, a named Sutro priority, by showing that loss curves collapse predictably across a model family.
- 3 — Rethinking JEPA: Compute-Efficient Video Self-Supervised Learning with Frozen Teachers — Directly improves training compute efficiency via a decoupled two-stage method, achieving better performance at lower training cost.
- 3 — Shadow loss: Memory-linear deep metric learning with anchor projection — The paper directly advances energy-efficient training by reducing the memory buffer from O(S*D) to O(S).
- 3 — Towards Distributed Neural Architectures — Directly advances sparsity and data-movement efficiency via learned dynamic routing and compute allocation.
- 3 — MAGNET: Multi-granular Adaptive Gradient-guided Knowledge Distillation for Pareto-Efficient Tuning — Directly advances energy-efficient training via knowledge distillation with gradient-guided sparsity.
- 3 — LD-MoLE: Learnable Dynamic Routing for Mixture of LoRA Experts — The paper directly advances energy-efficient training by introducing a dynamic routing mechanism for mixture-of-LoRA experts.
- 3 — Textual Equilibrium Propagation for Deep Compound AI Systems — Proposes a local learning method inspired by biologically-plausible Equilibrium Propagation, directly relevant to the group's priorities.
- 3 — Dynamic Rank Adjustment for Accurate and Efficient Neural Network Training — Directly advances energy-efficient training by reducing computational cost through low-rank training.
- 3 — FLARE: Fast Low-rank Attention Routing Engine — Proposes an efficient low-rank attention mechanism that reduces compute and memory and avoids materializing the full attention matrix.
- 3 — MT-DAO: Multi-Timescale Distributed Adaptive Optimizers with Local Updates — Directly tackles communication efficiency by reducing data movement in distributed training of large models.
- 3 — Quantize-then-Rectify: Accelerating VQ-VAE Training in Latent Feature Space — The paper directly advances energy-efficient training by dramatically reducing VQ-VAE training cost.
- 3 — Stochastic Layer-wise Learning: Scalable and Efficient Alternative to Backpropagation — Proposes a local learning method that eliminates backpropagation, reducing memory and data movement.
- 3 — DES-LOC: Desynced Low Communication Adaptive Optimizers for Foundation Models — The paper directly advances energy-efficient training by reducing communication costs in distributed training.
- 3 — A Recovery Guarantee for Sparse Neural Networks — The paper provides a sparse recovery guarantee for neural networks using iterative hard thresholding.
- 3 — Online Pseudo-Zeroth-Order Training of Neuromorphic Spiking Neural Networks — Directly advances biologically-plausible local learning and neuromorphic hardware-friendly training.
- 3 — Mutual Information Preserving Neural Network Pruning — Proposes a sparsity method (pruning) to reduce neural network resource requirements, directly aligned with the sparsity priority.
- 3 — SMixer: Rethinking Efficient-Training and Event-Driven SNNs — Directly advances energy-efficient training of spiking neural networks through spatial-temporal sparsity.
- 3 — Winner-Take-All Spiking Transformer for Language Modeling — The paper directly advances energy-efficient AI via sparse, spike-driven, softmax-free spiking transformers.
- 3 — NorMuon: Making Muon more efficient and scalable — Directly advances training efficiency and optimizer design, with a distributed implementation that addresses data movement.
- 3 — AMiD: Knowledge Distillation for LLMs with $\alpha$-mixture Assistant Distribution — The paper introduces a knowledge distillation framework to reduce computational and memory costs of LLMs.
- 3 — MoE-PHDS: One MoE checkpoint for flexible runtime sparsity — Directly addresses sparsity and training efficiency by enabling a single MoE checkpoint to serve multiple runtime sparsity levels.
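Several Score 3 entries (Quantization with Purpose, MT-DAO, DES-LOC) cut data movement by compressing gradients before they cross the interconnect. For orientation, here is a minimal sketch of the generic recipe, 8-bit uniform quantization with error feedback. It is an illustration only: the function names and the error-feedback variant are our own, not any listed paper's actual method.

```python
# Illustrative sketch only: generic 8-bit uniform gradient quantization with
# error feedback, in the spirit of the gradient-compression entries above.
# NOT any listed paper's method; all names here are hypothetical.
import numpy as np

def quantize_with_error_feedback(grad, residual, bits=8):
    """Quantize a gradient tensor to `bits`-bit integer codes, carrying the
    rounding error into the next step so compression noise is not lost."""
    corrected = grad + residual                 # fold in past rounding error
    scale = np.max(np.abs(corrected)) + 1e-12   # per-tensor symmetric scale
    levels = 2 ** (bits - 1) - 1                # 127 for 8 bits
    q = np.round(corrected / scale * levels)    # integer codes in [-levels, levels]
    dequant = q / levels * scale                # what the receiver reconstructs
    new_residual = corrected - dequant          # error fed back next iteration
    return q.astype(np.int8), scale, new_residual

# Toy usage: compress a fake gradient, reconstruct it on the "receiver" side.
rng = np.random.default_rng(0)
g = rng.normal(size=1000).astype(np.float32)
res = np.zeros_like(g)
codes, scale, res = quantize_with_error_feedback(g, res)
g_hat = codes.astype(np.float32) / 127 * scale
print("relative error:", np.linalg.norm(g - g_hat) / np.linalg.norm(g))
```

Sending the int8 codes plus one float scale per tensor is roughly a 4x reduction over float32 gradients; the error-feedback residual is what keeps the compression from biasing convergence.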
Score 2
- 2 — Larger Datasets Can Be Repeated More: A Theoretical Analysis of Multi-Epoch Scaling in Linear Regression — The paper analyzes multi-epoch training efficiency, quantifying when data reuse is effective.
- 2 — Modular Distillation Makes Small Models Think Like Big Ones — The paper's main contribution is a modular distillation framework that improves compute efficiency.
- 2 — Learning-Domain Decomposition: Interpreting Training Dynamics via Loss Vectors — The paper's data pruning method, enabling training with 5% of the data, directly contributes to training efficiency.
- 2 — Curriculum-Guided Layer Scaling for Language Model Pretraining — The paper's primary contribution is a compute-efficient pretraining method that reduces resource consumption.
- 2 — DSA: Efficient Inference For Video Generation Models via Distributed Sparse Attention — The paper introduces sparse attention as a core method for efficient inference, directly advancing sparsity, though for inference rather than training.
- 2 — Unified Multi-Teacher Distillation Across Hybrid Neural Architectures — The paper's main contribution is a multi-teacher distillation method that reduces training data requirements.
- 2 — Learning a Zeroth-Order Optimizer for Fine-Tuning LLMs — The paper proposes a learned zeroth-order optimizer to reduce memory consumption during LLM fine-tuning.
- 2 — Matrix-Free Least Squares Solvers: Values, Gradients, and What to Do With Them — The paper makes sparsity (enforcing weight sparsity on a 50M-parameter model) a main contribution, directly relevant to the sparsity priority.
- 2 — Critique-Guided Distillation for Efficient and Robust Language Model Reasoning — The paper proposes a knowledge distillation method that significantly reduces compute requirements (the underlying distillation recipe is sketched after this list).
- 2 — Boost Post-Training Quantization via Null Space Optimization for Large Language Models — The paper focuses on post-training quantization for LLMs, aligning with the Sutro Group's interest in quantization, though targeting inference rather than training.
- 2 — HOBA: Higher-Order Block-Diagonal Attention Unrolling for Transformer — The paper's main contribution is a sparse attention mechanism (block-diagonal) that reduces computational cost.
- 2 — Data-Efficient Training by Evolved Sampling — The paper's main contribution is dynamic data selection for training acceleration, a form of training efficiency.
- 2 — EAST: Early Action Prediction Sampling Strategy with Token Masking — The token masking procedure cuts memory usage and accelerates training, making training efficiency a main contribution.
- 2 — LoRA Meets Second-Order Optimization: Towards Optimal Low-Rank Updates — Proposes a second-order optimizer for low-rank fine-tuning that improves convergence and reduces training cost.
- 2 — Catalyst: Reveal the Geometry of Pruning by Reshaping Neural Network — The paper introduces a novel structured pruning regularization that is a core sparsity technique, directly relevant to the sparsity priority.
- 2 — MSAVQ: Multi-dimensional Sensitivity-Aware Vector Quantization for Ultra-Low-Bit Vision-Language Models — Quantization is a main contribution, aligning with the low-precision interest, but the paper targets inference deployment.
- 2 — Boomerang Distillation Enables Zero-Shot Model Size Interpolation — The paper's main contribution is a distillation method that reduces training cost by interpolating between model sizes.
- 2 — NIRVANA: Structured Pruning Reimagined for Large Language Models Compression — The paper's main contribution is structured pruning, directly related to the sparsity priority area.
- 2 — Generalization and Scaling Laws for Mixture-of-Experts Transformers — The paper's main contribution is theoretical scaling laws and generalization bounds for sparse (MoE) transformers.
- 2 — Listens like Mel: Boosting Latent Audio Diffusion with Channel Locality — Faster convergence directly improves training efficiency, making efficiency a main contribution, though in a domain-specific audio setting.
- 2 — AlignPrune: Robust Dynamic Data Pruning through Loss Trajectory Alignment — The paper advances dynamic data pruning, a technique for efficient training that reduces data usage.
- 2 — Attention and Compression is all you need for Controllably Efficient Language Models — The paper's main contribution is an efficient architecture that reduces compute and memory via compression.
- 2 — CoDA: From Text-to-Image Diffusion Models to Training-Free Dataset Distillation — The paper's main contribution is a training-free dataset distillation method that eliminates the costly optimization process.
- 2 — Asynchronous Matching with Dynamic Sampling for Multimodal Dataset Distillation — The paper advances dataset distillation, a technique to reduce training data size and improve efficiency.
- 2 — Exploring Knowledge Purification in Multi-Teacher Knowledge Distillation for LLMs — The paper proposes knowledge purification to reduce resource demands in multi-teacher distillation.
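Several Score 2 entries (the multi-teacher, critique-guided, and Boomerang distillation papers) build on the standard temperature-scaled distillation loss. A minimal, generic sketch of that common recipe follows; it illustrates the baseline technique, not any listed paper's implementation.

```python
# Illustrative sketch only: the standard temperature-scaled distillation loss
# that the distillation entries above build on. Generic, hypothetical code,
# not taken from any listed paper.
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on temperature-softened distributions,
    scaled by T^2 so its gradient magnitude matches the hard-label loss."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=-1)
    return (T ** 2) * kl.mean()

# Toy usage: a student whose logits roughly track the teacher's has low loss.
rng = np.random.default_rng(0)
teacher = rng.normal(size=(4, 10))
student = teacher + 0.1 * rng.normal(size=(4, 10))
print(distillation_loss(student, teacher))
```

The multi-teacher variants in the list effectively replace `teacher_logits` with some aggregate of several teachers; the critique-guided variant changes where the teacher signal comes from, not the loss shape.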
Score 1
- 1 — PersonalQ: Select, Quantize, and Serve Personalized Diffusion Models for Efficient Inference — The paper focuses on inference-time quantization for personalized diffusion models, tangentially related to the quantization priority.
- 1 — When to Retrain after Drift: A Data-Only Test of Post-Drift Data Size Sufficiency — The paper mentions low per-update time and memory overhead, but its main contribution is a data-sufficiency test for retraining decisions.
- 1 — LM-mixup: Text Data Augmentation via Language Model based Mixup — The paper addresses data efficiency through instruction distillation, which tangentially relates to the group's priorities.
- 1 — SMARAN: Closing the Generalization Gap with Performance Driven Optimization Method — The paper proposes an optimizer that adjusts learning rate based on performance, which could offer marginal training-efficiency gains.
- 1 — TiTok: Transfer Token-level Knowledge via Contrastive Excess to Transplant LoRA — The paper reduces overhead by avoiding a discriminator, but its main focus is knowledge transfer for LoRA transplantation.
- 1 — LayerDecompose: Exploring weight sharing for Large Language Model Compression — Focuses on post-training compression for deployment, not energy-efficient training or the Sutro Group's other priorities.
- 1 — Scaling Laws Meet Model Architecture: Toward Inference-Efficient LLMs — The paper focuses on inference efficiency (throughput) rather than training energy, and while it uses scaling laws, they target inference-efficient architectures.
- 1 — FitLight: Federated Imitation Learning for Plug-and-Play Autonomous Traffic Signal Control — Mentions model pruning for resource-constrained deployment, but the main contribution is federated imitation learning.
- 1 — ECMNet: Lightweight Semantic Segmentation with Efficient CNN-Mamba Network — The paper proposes a lightweight segmentation model focusing on inference efficiency (parameter count), not training.
- 1 — Disentangling Token Dependencies for Efficient Decoding in Diffusion Language Models — The paper focuses on inference efficiency for diffusion language models via knowledge distillation.
- 1 — MARS: Mamba-driven Adaptive Reordering Scheme for Semantic Occupancy Prediction in Autonomous Driving — The paper focuses on a task-specific architecture for autonomous driving that reduces memory usage and latency.
- 1 — Energy Efficient Language Models through Dynamic Sparsity — Paper focuses on inference efficiency through activation sparsity and quantization for deployment, not training.
- 1 — LoRA: The Past, Present, and Future — The paper focuses on parameter-efficient fine-tuning (LoRA variants), tangentially relevant to the group's priorities (the basic LoRA reparameterization is sketched at the end of this report).
- 1 — Training-Free Determination of Network Width via Neural Tangent Kernel — The paper addresses efficient model sizing to avoid overparameterization, which indirectly reduces training cost.
- 1 — Self-Correction via Task Distillation — The paper uses task distillation to improve self-correction, which incidentally reduces fine-tuning cost.
- 1 — One Stone Three Birds: Training-free Core-context-aware Attention for Efficient LLM Prefilling, Decoding, and KV Caching — The paper focuses on inference-time efficiency via training-free sparse attention, tangential to the group's training focus.
- 1 — Learning linear state-space models with sparse system matrices — The paper focuses on sparsity in linear state-space models for system identification, which is tangential to the group's priorities.
- 1 — PTQTP: Post-Training Quantization to Trit-Planes for Large Language Models — The paper focuses on post-training quantization for efficient inference, not on energy-efficient training.
- 1 — Vision as LoRA — Mentions efficiency via LoRA merging and distillation for training acceleration, but the main contribution is architectural.
- 1 — LeSTD: LLM Compression via Learning-based Sparse Tensor Decomposition — The paper focuses on post-training compression using sparsity, which is tangential to Sutro Group's training focus.
- 1 — Asymmetric Proximal Policy Optimization: mini-critics boost LLM reasoning — The paper mentions computational efficiency of critic training as a motivation for using lightweight mini-critics.
- 1 — Vulcan: Crafting Compact Class-Specific Vision Transformers For Edge Intelligence — The paper uses structured pruning for post-training model compression, touching on sparsity but not training efficiency.
- 1 — BDQ: Bidirectional Diagonal Quantization for LLMs — The paper focuses on post-training quantization for inference efficiency, not on energy-efficient training.
- 1 — Representation Finetuning for Continual Learning — The paper proposes a parameter-efficient finetuning method for continual learning, tangentially related to the group's priorities.
- 1 — Analyzing and Internalizing Complex Policy Documents for LLM Agents — Tangential: addresses inference efficiency via policy internalization, not energy-efficient training.
- 1 — Semantic Uncertainty Quantification of Hallucinations in LLMs: A Quantum Tensor Network Based Method — The paper touches on efficiency by evaluating robustness under quantization for resource-constrained settings.
- 1 — Emergent Discrete Controller Modules for Symbolic Planning in Transformers — Mentions a small FLOPs overhead and sparse application, but the main focus is on symbolic planning within transformers.
- 1 — PUM-Net: Plastic Unified Memory Network with Associative Interaction for Long-Context State Space Models — Tangential: mentions training cost reduction by avoiding sequence length inflation, but the main contribution is the memory architecture.
- 1 — You Do Not Fully Utilize Transformer's Representation Capacity — The paper mentions efficiency gains (lower perplexity per FLOP) but focuses on representation capacity.
- 1 — FastALM: Hierarchical Frame Q-Former for Effective Audio Modality Adaptation — The paper focuses on efficient inference for audio-language models by compressing speech features, which is tangential to training efficiency.
- 1 — A Separable Self-attention Inspired by the State Space Model for Computer Vision — The paper proposes an efficient separable self-attention with linear complexity, but its main contribution is a vision architecture, not training efficiency.
- 1 — Fast-dLLM v2: Efficient Block-Diffusion LLM — The paper primarily contributes to inference efficiency through block diffusion and hierarchical caching.
- 1 — ParoQuant: Pairwise Rotation Quantization for Efficient Reasoning LLM Inference — The paper focuses on post-training quantization for efficient LLM inference, not on training efficiency.
- 1 — Scalable Variational Bayesian Fine-Tuning of LLMs via Orthogonalized Low-Rank Adapter — The paper uses parameter-efficient fine-tuning (a LoRA variant) but its main contribution is uncertainty quantification.
- 1 — Generalised Flow Maps for Few-Step Generative Modelling on Riemannian Manifolds — The paper aims to improve inference-time efficiency (few-step sampling) rather than training efficiency.
- 1 — Personalization Under Value Conflict: Resolving Contradictory Preferences with Paired Fine-Tuning — The paper mentions reduced data requirements as a secondary benefit, but the main contribution is personalization under value conflict.
- 1 — AnyDepth: Depth Estimation Made Easy — The paper proposes a lightweight decoder and data filtering for depth estimation, touching on efficiency only incidentally.
- 1 — Entropy-Select: Training-Free Local Entropy Token Compression for Video LLMs — Tangential: the paper proposes token compression for inference efficiency in video LLMs, which touches on efficiency only at inference time.
- 1 — CoKV: Optimizing LLM Inference with Game-Theoretic Adaptive KV Cache — The paper addresses inference memory efficiency via KV cache optimization, which is tangential to the group's training focus.
- 1 — TSDINO: Teacher–Student Self-Distillation Framework for Robust Pre-training of Time-Series Foundation Models — The paper uses self-distillation for pre-training but does not focus on energy efficiency or data movement.
- 1 — Think Twice, Act Once: Token-Aware Compression and Action Reuse for Efficient Inference in Vision-Language-Action Models — Addresses inference efficiency via token pruning and action reuse, not training efficiency or any Sutro Group priority.
- 1 — DeFake: Data-Efficient Adaptation for Generalized Deepfake Detection — The paper focuses on data-efficient adaptation (few-shot learning for deepfake detection), which is tangential to the group's priorities.
- 1 — DoRAN: Stabilizing Weight-Decomposed Low-Rank Adaptation via Noise Injection and Auxiliary Networks — Focuses on parameter-efficient fine-tuning stabilization and sample efficiency, tangentially related to the group's priorities.
- 1 — Eliminating VAE for Fast and High-Resolution Generative Detail Restoration — The paper's primary contribution is inference acceleration and memory reduction via VAE elimination.
- 1 — pi-Flow: Policy-Based Few-Step Generation via Imitation Distillation — The paper uses distillation to accelerate inference via few-step generation, which touches on the group's distillation interest only at inference time.
- 1 — Routing Matters in MoE: Scaling Diffusion Transformers with Explicit Routing Guidance — MoE is efficiency-related, but the paper focuses on improving expert specialization for vision, not training efficiency.
- 1 — KAN or MLP? Point Cloud Shows the Way Forward — The paper introduces an efficient variant of KAN to reduce parameters and computational cost, but its focus is point-cloud modeling, not training efficiency.
- 1 — Partial-Correlation Learning for Large Language Models with Skip-Tuning — Skip-Tuning reduces fine-tuning data by using noncontiguous segments, potentially offering efficiency gains as a side benefit.
- 1 — LFQ: Logit-aware Final-block Quantization for Boosting the Generation Quality of Low-Bit Quantized LLMs — Paper focuses on post-training quantization for inference efficiency, which is tangentially related to the quantization priority.
- 1 — KDP: Simplifying Representation Dynamics in Kernel Space — The paper proposes a model compression method via layer pruning, which yields inference-efficiency gains.
- 1 — TreeSNNs: Temporal Resolution Ensembled SNNs for Neuromorphic Action Recognition — Mentions SNN energy efficiency as motivation but the main contribution is accuracy improvement via temporal-resolution ensembling.
- 1 — Mechanistic Interpretability of In-Context Learning Generalization through Structured Task Curriculum — Mentions data efficiency improvement through curriculum learning but the main contribution is mechanistic interpretability.
- 1 — Hyden: A Hybrid Dual-Path Encoder for Monocular Geometry of High-resolution Images — The paper emphasizes inference efficiency and uses self-distillation for label generation, but the main contribution is the encoder architecture.
- 1 — KnItLM: Weaving Knowledge into Instruction-Tuned LLMs via Continual Pre-Training and Merging — The paper focuses on knowledge ingestion and model merging, mentioning avoidance of instruction-tuning cost only in passing.
- 1 — Reduce What You Use: Input-Aware Matrix-Multiplication Pruning for LLMs — The paper proposes inference-time pruning for matrix multiplication, tangential to Sutro Group's focus on training efficiency.
- 1 — Self-Rewarding Rubric-Based Reinforcement Learning for Open-Ended Reasoning — The paper mentions faster and more resource-efficient training via a lightweight framework, but the main contribution is the self-rewarding RL method.
- 1 — When Reasoning Meets Compression: Understanding the Effects of LLMs Compression on Large Reasoning Models — The paper studies post-training compression (quantization, pruning, distillation) effects on reasoning models, not training efficiency.
- 1 — On Computational Limits and Provably Efficient Criteria of Visual Autoregressive Models: A Fine-Grained Complexity Analysis — Mentions an efficiency angle (sub-quadratic time for VAR models) but the main contribution is a fine-grained complexity analysis.
- 1 — Point-MoE: Large-Scale Multi-Dataset Training with Mixture-of-Experts for 3D Semantic Segmentation — Uses sparse mixture-of-experts (sparsity) but primarily addresses multi-dataset 3D segmentation scaling.
- 1 — Diversity of Transformer Layers: One Aspect of Parameter Scaling Laws — Paper analyzes layer diversity and parameter scaling laws from an interpretability perspective, tangential to the scaling-laws priority.
- 1 — Precise Attribute Intensity Control in Large Language Models via Targeted Representation Editing — The paper focuses on controlled text generation via representation editing, with efficiency mentioned only in passing.
- 1 — Are You Getting What You Pay For? Auditing Model Substitution in LLM APIs — Mentions quantization as a potential model substitution but the primary focus is on API integrity auditing.
- 1 — HT-Sparse: Training-Free Query-Guided Head–Token Sparsification for Long-Video Multimodal Inference — The paper proposes training-free inference sparsity for long-video multimodal models, touching on sparsity only at inference.
- 1 — DeCo-DETR: Decoupled Cognition DETR for efficient Open-Vocabulary Object Detection — Tangential: mentions efficiency by replacing costly text encoders, but the main focus is on decoupled open-vocabulary detection.
- 1 — DC-LLM: Hardware-Friendly LLM Weight Compression via Dynamic Linear Combination — Focuses on inference weight compression and data movement, not training efficiency; mentions low-precision methods only tangentially.
- 1 — Textual Steering Vectors Can Improve Visual Understanding in Multimodal Large Language Models — The paper mentions minimal computational overhead but focuses on interpretability and steering for multimodal LLMs.
- 1 — Look Back to Move Forward: Delay-Aware Instance Selection for Online Continual Learning — The paper reduces training budget via selective instance replay, a form of training efficiency, but that is not the main contribution.
- 1 — FlexTraj: Image-to-Video Generation with Flexible Point Trajectory Control — The paper mentions efficiency in convergence and inference as a minor benefit of its sequence-concatenation design.
- 1 — Remaining-data-free Machine Unlearning by Suppressing Sample Contribution — Mentions efficiency in unlearning but does not address training efficiency, data movement, sparsity, or quantization.
- 1 — PartInfer: Enabling LLM Inference On Edge Devices — The paper focuses on LLM inference efficiency via neuron-level sparsity for edge devices, which is tangential to training efficiency.
- 1 — RCStat: A Statistical Framework of Relative Contextualization in Transformers — The paper addresses inference-time efficiency via KV-cache compression, which is tangential to the group's training focus.
- 1 — LinearSR: Unlocking Linear Attention for Stable and Efficient Image Super-Resolution — The paper improves inference efficiency through linear attention but does not primarily address energy-efficient training.
- 1 — DenseMixer: Improving MoE Post-Training with Precise Router Gradient — Mentions MoE sparsity and training efficiency indirectly, but the main contribution is a refined router gradient.
- 1 — Cognitive Alignment in Personality Reasoning: Leveraging Prototype Theory for MBTI Inference — Uses LoRA for parameter-efficient fine-tuning, but the main contribution is cognitively aligned inference.
- 1 — Simple yet Effective Semi-supervised Knowledge Distillation from Vision-Language Models via Dual-Head Optimization — Mentions distillation and minimal overhead, but the main contribution is accuracy improvement in semi-supervised settings.
- 1 — QuantSparse: Comprehensively Compressing Video Diffusion Transformer with Model Quantization and Attention Sparsification — The paper focuses on compressing video diffusion transformers for inference using quantization and attention sparsification.
- 1 — Exploring Redundancy and Shared Representations for Transformer Models Optimization — The paper touches on efficiency via model compression but does not directly advance energy-efficient training.
- 1 — Spectral-Aware Sparse Communication and Entropy-Balanced Tasking in Multi-Agent Systems — The paper addresses communication sparsification and energy reduction in multi-agent coordination, not model training.
- 1 — Training-time Selection of Linear Vs. Softmax Attention in Layer-based Hybrid Transformers — The paper addresses inference-time memory efficiency (KV-cache reduction) via layer selection during training.
- 1 — DLM-One: Diffusion Language Models for One-Step Sequence Generation — The paper uses score distillation to achieve efficient one-step inference, which tangentially relates to the distillation interest.
- 1 — Surrogate Modeling of 3D Rayleigh-Bénard Convection with Equivariant Autoencoders — The paper focuses on sample and parameter efficiency for surrogate modeling, tangential to the Sutro Group's priorities.
- 1 — LUCID-3D: A Lightweight and Compatible Framework for Unified 3D Understanding and Generation — The paper mentions reducing training cost by leveraging pretrained models as a secondary benefit, but the main contribution is the unified 3D framework.
- 1 — HybridCoT: Interleaving Latent and Text Chain-of-Thought for Efficient Reasoning — The paper primarily improves inference efficiency through latent reasoning, with only incidental mention of training cost.
- 1 — Lookahead Tree-Based Rollouts for Enhanced Trajectory-Level Exploration in Reinforcement Learning with Verifiable Rewards — The paper focuses on improving trajectory diversity in RL for LLMs, claiming acceleration of policy learning only as a side benefit.
- 1 — LoRA-DA: Data-Aware Initialization for Low-Rank Adaptation via Asymptotic Analysis — The paper improves LoRA initialization to potentially reduce training steps, indirectly enhancing training efficiency.
- 1 — Beyond Homogeneous Attention: Memory-Efficient LLMs via Fourier-Approximated KV Cache — Addresses memory efficiency of the KV cache in inference, tangentially related to data movement and hardware efficiency.
- 1 — Enhancing LLMs for Knowledge Base Question Answering by Chain-of-Decomposition — Mentions efficient fine-tuning and reduced LLM calls via task decomposition, but training efficiency is not the focus.
- 1 — Trading Complexity for Expressivity: Theoretical Exploration of Linear and Causal Token Mixing Strategies — Mentions decoding speed and cache size as design tradeoffs, but the main contribution is a theoretical analysis of token mixing.
- 1 — TEL: A Thermodynamics-Inspired Layer for Adaptive, and Efficient Neural Learning — Efficiency is mentioned as a property (minimal overhead, fixed compute budget) but the core contribution is the layer design itself.
- 1 — BaNEL: Exploration Posteriors for Generative Modeling Using Only Negative Rewards — The paper addresses efficiency in terms of reducing reward evaluations, which is tangential to Sutro Group's priorities.
- 1 — Decoupling of Experts: A Knowledge-Driven Architecture for Efficient LLMs — The paper mentions efficiency in scaling but its main contribution is a knowledge-driven architecture.
- 1 — VideoMind: A Chain-of-LoRA Agent for Temporal-Grounded Video Reasoning — The Chain-of-LoRA mechanism for efficient role switching is a minor inference-efficiency aspect, but the core contribution is temporal-grounded video reasoning.
- 1 — Streaming Visual Geometry Transformer — The paper uses distillation for training and FlashAttention for inference, which touches on efficiency only incidentally.
- 1 — Activation-aware Probe-Query: Effective Key-Value Retrieval for Long-Context LLMs Inference — The paper addresses data movement and sparsity in KV cache eviction for inference, which is tangential to the group's training focus.
- 1 — One-Shot Multi-Label Causal Discovery in High-Dimensional Event Sequences — The paper mentions efficient parallelized causal discovery on GPUs as a secondary benefit, but the core contribution is causal discovery.
- 1 — Slimming the Giant: Efficient Structured Pruning for Adapter-Tuned SAM — Paper uses structured pruning for inference-time compression and latency gains, not for improving training efficiency.
- 1 — Scaling Weisfeiler–Leman Expressiveness Analysis to Massive Graphs with GPUs — The paper accelerates a graph algorithm on GPUs but does not focus on energy-efficient AI training.
- 1 — RS-MoE: Collaborative Compression for Mixture-of-Experts LLMs based on Low-Rank and Sparse Approximation — The paper addresses post-training compression using low-rank and sparse approximation, not directly advancing training efficiency.
- 1 — Vulnerability-Aware Parameter-Efficient Fine-Tuning for Enhanced Adversarial Robustness — The paper uses parameter-efficient fine-tuning (PEFT) for adversarial robustness, but the main contribution is robustness.
- 1 — Are EEG Foundation Models Worth It? Comparative Evaluation with Traditional Decoders in Diverse BCI Tasks — Mentions scaling laws tangentially but primarily benchmarks EEG foundation models, not advancing Sutro Group priorities.
- 1 — LoRA in the Right Place: Which Block to Tune in Parameter-Efficient Fine-Tuning? — The paper focuses on parameter-efficient fine-tuning placement for improving adaptation performance, not training efficiency.
- 1 — Sampling Complexity of TD and PPO in RKHS — The paper discusses sample efficiency but its main contribution is theoretical convergence analysis.
- 1 — A Brain-Inspired Gating Mechanism Unlocks Robust Computation in Spiking Neural Networks — The paper mentions SNNs as energy-efficient but focuses on noise robustness via a biologically-inspired gating mechanism.
- 1 — A Mathematical Framework for the Hierarchical Analysis of Neural Networks — Tangentially relevant via model compression, but the core contribution is a mathematical framework for hierarchical analysis.
- 1 — DTP: A Simple yet Effective Distracting Token Pruning Framework for Vision-Language Action Models — The paper proposes token pruning (a form of sparsity) but focuses on inference-time performance improvement.
- 1 — Spiking Graph Predictive Coding — Efficiency is mentioned as a side benefit of event-driven spiking computation, but the paper's main contribution is graph predictive coding.
- 1 — DBLP: Noise Bridge Consistency Distillation For Efficient And Reliable Adversarial Purification — Mentions fast inference via distillation, but the main contribution is adversarial purification, not efficiency.
- 1 — BlindSight: Harnessing Sparsity for Efficient Vision-Language Models — The paper leverages sparsity and a hardware-aware kernel for inference optimization, not training, making it tangential to the group's priorities.
- 1 — DistillMatch: Leveraging Knowledge Distillation from Vision Foundation Model for Multimodal Image Matching — The paper uses knowledge distillation to transfer features from a vision foundation model, with efficiency only a side benefit.
- 1 — Scaling Law for Catastrophic Forgetting via Gradient Products — The paper studies scaling laws for catastrophic forgetting, which is tangentially related to the group's scaling-laws priority.
- 1 — Stabilizing MoE Reinforcement Learning by Aligning Training and Inference Routers — The paper focuses on stabilizing RL training in MoE models by aligning routers, mentioning training efficiency only in passing.
- 1 — SCAR: Shapley Credit Assignment for More Efficient RLHF — The paper improves RLHF training efficiency through dense rewards and faster convergence, but its main contribution is Shapley-based credit assignment.
- 1 — DomED: Redesigning Ensemble Distillation for Domain Generalization — Mentions computational cost reduction via tailored data allocation but the main contribution is domain generalization.
- 1 — TASTE: Text-Aligned Speech Tokenization and Embedding for Spoken Language Modeling — The paper uses LoRA for parameter-efficient fine-tuning and reduces sequence length via tokenization.
- 1 — IOMM: Fast Pre-training of Unified Multimodal Models without Text-Image Pairs — The paper mentions training efficiency (fast pre-training, reduced GPU hours) but its main contribution is unified multimodal modeling without text-image pairs.
- 1 — Beyond Benchmarks: Understanding Mixture-of-Experts Models through Internal Mechanisms — Paper analyzes sparsity and expert utilization in MoE models, tangential to energy-efficient training.
- 1 — Conceptrol: Concept Control of Zero-shot Personalized Image Generation — Mentions no computational overhead but primarily focuses on personalized image generation control, not efficiency.
- 1 — A Learn-to-Optimize Approach for Coordinate-Wise Step Sizes for Quasi-Newton Methods — The paper improves optimizer step sizes for faster convergence, indirectly reducing training time and compute.
- 1 — NoLoRA: Nonlinear Low-Rank Adaptation for Parameter-Efficient Fine-Tuning — The paper focuses on improving fine-tuning expressiveness via nonlinear low-rank adaptation, mentioning efficiency only incidentally.
- 1 — Routing-Deconstructed LoRA in Federated Fine-Tuning — Paper focuses on federated LoRA, with a secondary mention of reducing communication cost.
- 1 — Learning What Matters: Prioritized Concept Learning via Relative Error-driven Sample Selection — The paper proposes a sample selection curriculum to improve data and compute efficiency, which is tangential to the group's core priorities.
- 1 — DeCoP: Enhancing Self-Supervised Time Series Representation with Dependency Controlled Pre-training — The paper reduces FLOPs as a side benefit, but its core contribution is time-series representation learning.
- 1 — BoundaryDPT: Pushing the Boundaries of Depth Pruning for Vision Transformers — The paper focuses on inference-time speedup via depth pruning, not energy-efficient training or data movement.
- 1 — QueryStream: Advancing Streaming Video Understanding with Query-Aware Pruning and Proactive Response — The paper focuses on inference-time token pruning for streaming video, not training efficiency, and is thus only tangentially relevant.
- 1 — LLMs as Scalable, General-Purpose Simulators For Evolving Digital Agent Training — The paper addresses data-efficient scaling for agent training via synthetic data, tangential to the group's training-efficiency focus.
- 1 — PRKV: Page Restruct KV Cache for High Accuracy and Efficiency LLM Generation — The paper focuses on inference efficiency through KV cache optimization, tangentially related to data movement.
- 1 — Unveiling the Scaling Law of PINNs under Non-Euclidean Geometry — The paper addresses optimization scaling challenges for PINNs, which tangentially relates to training efficiency.
- 1 — Critical attention scaling in long-context transformers — The paper addresses attention scaling for long contexts but does not focus on training efficiency, data movement, or sparsity.
- 1 — Qronos: Correcting the Past by Shaping the Future... in Post-Training Quantization — While quantization is a named priority, the paper focuses on post-training quantization for inference.
- 1 — Temporal superposition and feature geometry of RNNs under memory demands — The paper studies representational geometry and sparsity in RNNs under memory constraints, tangential to the sparsity priority.
- 1 — N-Gram Induction Heads for In-Context RL: Improving Stability and Reducing Data Needs — The paper's main contribution is improving in-context RL with n-gram induction heads, tangentially mentioning reduced data needs.
- 1 — Prompt, Predict, Correct: LLM-TrajEcho for Closed-Loop Trajectory Forecasting via Online Prompt Feedback — The paper uses LoRA for parameter-efficient fine-tuning, which marginally relates to training efficiency.
- 1 — MEDSPIKEFORMER: All Neurons Matter for Medical Image Segmentation — Tangentially mentions energy efficiency of spiking neural networks, but the main contribution is improved medical image segmentation.
- 1 — FLoRA-NA: Nearly Accurate Aggregation for Federated Low-Rank Adaptation — The paper addresses communication efficiency in federated learning, tangentially relevant to data movement.
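LoRA and its variants recur throughout the Score 1 list (LoRA: The Past, Present, and Future; LoRA-DA; NoLoRA; FLoRA-NA). For reference, here is a minimal sketch of the basic LoRA reparameterization; the dimensions, initialization, and scaling below are illustrative defaults, not drawn from any listed paper.

```python
# Illustrative sketch only: the basic LoRA reparameterization referenced by
# many Score 1 entries. W stays frozen; only the low-rank factors A and B
# train, cutting optimizer state and gradient traffic. Hypothetical values.
import numpy as np

d_out, d_in, r = 64, 64, 4
rng = np.random.default_rng(0)

W = rng.normal(size=(d_out, d_in))          # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01       # trainable down-projection
B = np.zeros((d_out, r))                    # trainable up-projection, zero init
alpha = 8.0                                 # scaling hyperparameter

def lora_forward(x):
    """y = W x + (alpha / r) * B (A x); at init the update term is zero."""
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
print(np.allclose(lora_forward(x), W @ x))  # True: zero-init update is a no-op
```

The zero initialization of B means fine-tuning starts exactly at the pretrained model, and only 2*r*d parameters per layer receive gradients, which is why so many of the entries above lean on it for cheap adaptation.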