Papers
Score 3
- 3 — ATLAS: Adaptive Transfer Scaling Laws for Multilingual Pretraining, Finetuning, and Decoding the Curse of Multilinguality — Directly advances scaling laws for efficient multilingual training, optimizing compute and data allocation.
- 3 — EnergyLLM-Bench: A Reproducible Benchmark for Energy and Carbon Footprint of Large Language Models — The paper introduces a reproducible benchmark for energy-efficient LLM evaluation, including low-precision settings.
- 3 — Scaling Laws for Fully Sparsely-Activated Large Language Models — The paper directly investigates sparsity in LLMs, deriving scaling laws for fully sparsely-activated models.
- 3 — Pruning with Occam's Razor — The paper directly advances energy-efficient training by integrating pruning with gradient descent.
- 3 — Understanding Dataset Distillation via Spectral Filtering — Directly advances energy-efficient training through dataset distillation, which reduces data requirements.
- 3 — Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention — Directly addresses low-precision training instability and proposes a fix, aligning with Sutro Group's low-precision priority.
- 3 — Efficient Resource-Constrained Training of Transformers via Subspace Optimization — Directly advances energy-efficient training by reducing memory and compute via subspace optimization.
- 3 — Toward Bit-Efficient Dataset Condensation: A General Framework — The paper introduces a low-precision quantization method for dataset condensation, directly reducing storage costs.
- 3 — Quantization with Purpose: Loss-Aware Bit Allocation for Gradient Compression — Directly advances energy-efficient training by reducing data movement via loss-aware gradient quantization (a generic sketch of this gradient-compression recipe follows this list).
- 3 — ESSA: Evolutionary Strategies for Scalable Alignment — The paper introduces a gradient-free, hardware-friendly alignment method using evolutionary strategies.
- 3 — Dataset Distillation via Committee Voting — Directly advances efficient training through dataset distillation, a core Sutro Group priority, reducing training data requirements.
- 3 — Efficient Fine-Tuning of Quantized Models via Adaptive Rank and Bitwidth — The paper directly advances low-precision/quantization and memory-efficient fine-tuning, aligning with Sutro Group's priorities.
- 3 — Efficient Fine-tuning with Decomposed Foundation Model — Directly advances memory- and compute-efficient fine-tuning through model decomposition.
- 3 — Regularization can make diffusion models more efficient — The paper leverages sparsity to reduce the computational complexity of diffusion models, directly aligning with the sparsity priority.
- 3 — Mitigating Non-IID Drift in Zeroth-Order Federated LLM Fine-Tuning with Transferable Sparsity — Directly advances sparsity and communication efficiency (data movement) in federated LLM fine-tuning.
- 3 — Fantastic Pretraining Optimizers and Where to Find Them — The paper directly evaluates optimizers for pretraining efficiency, a named Sutro Group priority.
- 3 — Rapid Training of Hamiltonian Graph Networks Using Random Features — Proposes a gradient-descent-free training method achieving a 150-600x speedup, directly advancing training efficiency.
- 3 — HiDivDrop: Vision Token Reduction in MLLMs via Late Injection and Differentiable Top-K — The paper directly advances energy-efficient training by sparsifying visual tokens, reducing data movement.
- 3 — The Markovian Thinker — The paper directly reduces data movement and memory footprint for reasoning models via a Markovian chunking of the reasoning context.
- 3 — Scaling with Collapse: Efficient and Predictable Training of LLM Families — Directly advances compute-efficient training and scaling laws, a named Sutro priority, by showing that loss curves collapse predictably across a model family.
- 3 — Rethinking JEPA: Compute-Efficient Video Self-Supervised Learning with Frozen Teachers — Directly improves training compute efficiency via a decoupled two-stage method, achieving better performance at lower training cost.
- 3 — Shadow loss: Memory-linear deep metric learning with anchor projection — The paper directly advances energy-efficient training by reducing the memory buffer from O(S*D) to O(S).
- 3 — Towards Distributed Neural Architectures — Directly advances sparsity and data-movement efficiency via learned dynamic routing and compute allocation.
- 3 — MAGNET: Multi-granular Adaptive Gradient-guided Knowledge Distillation for Pareto-Efficient Tuning — Directly advances energy-efficient training via knowledge distillation with gradient-guided sparsity.
- 3 — LD-MoLE: Learnable Dynamic Routing for Mixture of LoRA Experts — The paper directly advances energy-efficient training by introducing a dynamic routing mechanism for mixture-of-LoRA experts.
- 3 — Textual Equilibrium Propagation for Deep Compound AI Systems — Proposes a local learning method inspired by biologically-plausible Equilibrium Propagation, directly relevant to the group's priorities.
- 3 — Dynamic Rank Adjustment for Accurate and Efficient Neural Network Training — Directly advances energy-efficient training by reducing computational cost through low-rank training.
- 3 — FLARE: Fast Low-rank Attention Routing Engine — Proposes an efficient low-rank attention mechanism that reduces compute and memory and avoids materializing the full attention matrix.
- 3 — MT-DAO: Multi-Timescale Distributed Adaptive Optimizers with Local Updates — Directly tackles communication efficiency by reducing data movement in distributed training of large models.
- 3 — Quantize-then-Rectify: Accelerating VQ-VAE Training in Latent Feature Space — The paper directly advances energy-efficient training by dramatically reducing VQ-VAE training cost.
- 3 — Stochastic Layer-wise Learning: Scalable and Efficient Alternative to Backpropagation — Proposes a local learning method that eliminates backpropagation, reducing memory and data movement.
- 3 — DES-LOC: Desynced Low Communication Adaptive Optimizers for Foundation Models — The paper directly advances energy-efficient training by reducing communication costs in distributed training.
- 3 — A Recovery Guarantee for Sparse Neural Networks — The paper provides a sparse recovery guarantee for neural networks using iterative hard thresholding.
- 3 — Online Pseudo-Zeroth-Order Training of Neuromorphic Spiking Neural Networks — Directly advances biologically-plausible local learning and neuromorphic hardware-friendly training.
- 3 — Mutual Information Preserving Neural Network Pruning — Proposes a sparsity method (pruning) to reduce neural network resource requirements, directly aligned with the sparsity priority.
- 3 — SMixer: Rethinking Efficient-Training and Event-Driven SNNs — Directly advances energy-efficient training of spiking neural networks through spatial-temporal sparsity.
- 3 — Winner-Take-All Spiking Transformer for Language Modeling — The paper directly advances energy-efficient AI via sparse, spike-driven, softmax-free spiking transformers.
- 3 — NorMuon: Making Muon more efficient and scalable — Directly advances training efficiency and optimizer design, with a distributed implementation that addresses data movement.
- 3 — AMiD: Knowledge Distillation for LLMs with $\alpha$-mixture Assistant Distribution — The paper introduces a knowledge distillation framework to reduce computational and memory costs of LLMs.
- 3 — MoE-PHDS: One MoE checkpoint for flexible runtime sparsity — Directly addresses sparsity and training efficiency by enabling a single MoE checkpoint to serve multiple runtime sparsity levels.
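Several Score 3 entries (Quantization with Purpose, MT-DAO, DES-LOC) cut data movement by compressing gradients before they cross the interconnect. For orientation, here is a minimal sketch of the generic recipe, 8-bit uniform quantization with error feedback. It is an illustration only: the function names and the error-feedback variant are our own, not any listed paper's actual method.

```python
# Illustrative sketch only: generic 8-bit uniform gradient quantization with
# error feedback, in the spirit of the gradient-compression entries above.
# NOT any listed paper's method; all names here are hypothetical.
import numpy as np

def quantize_with_error_feedback(grad, residual, bits=8):
    """Quantize a gradient tensor to `bits`-bit integer codes, carrying the
    rounding error into the next step so compression noise is not lost."""
    corrected = grad + residual                 # fold in past rounding error
    scale = np.max(np.abs(corrected)) + 1e-12   # per-tensor symmetric scale
    levels = 2 ** (bits - 1) - 1                # 127 for 8 bits
    q = np.round(corrected / scale * levels)    # integer codes in [-levels, levels]
    dequant = q / levels * scale                # what the receiver reconstructs
    new_residual = corrected - dequant          # error fed back next iteration
    return q.astype(np.int8), scale, new_residual

# Toy usage: compress a fake gradient, reconstruct it on the "receiver" side.
rng = np.random.default_rng(0)
g = rng.normal(size=1000).astype(np.float32)
res = np.zeros_like(g)
codes, scale, res = quantize_with_error_feedback(g, res)
g_hat = codes.astype(np.float32) / 127 * scale
print("relative error:", np.linalg.norm(g - g_hat) / np.linalg.norm(g))
```

Sending the int8 codes plus one float scale per tensor is roughly a 4x reduction over float32 gradients; the error-feedback residual is what keeps the compression from biasing convergence.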
Score 2
- 2 — Larger Datasets Can Be Repeated More: A Theoretical Analysis of Multi-Epoch Scaling in Linear Regression — The paper analyzes multi-epoch training efficiency, quantifying when data reuse is effective.
- 2 — Modular Distillation Makes Small Models Think Like Big Ones — The paper's main contribution is a modular distillation framework that improves compute efficiency.
- 2 — Learning-Domain Decomposition: Interpreting Training Dynamics via Loss Vectors — The paper's data pruning method, enabling training with 5% of the data, directly contributes to training efficiency.
- 2 — Curriculum-Guided Layer Scaling for Language Model Pretraining — The paper's primary contribution is a compute-efficient pretraining method that reduces resource consumption.
- 2 — DSA: Efficient Inference For Video Generation Models via Distributed Sparse Attention — The paper introduces sparse attention as a core method for efficient inference, directly advancing sparsity, though for inference rather than training.
- 2 — Unified Multi-Teacher Distillation Across Hybrid Neural Architectures — The paper's main contribution is a multi-teacher distillation method that reduces training data requirements.
- 2 — Learning a Zeroth-Order Optimizer for Fine-Tuning LLMs — The paper proposes a learned zeroth-order optimizer to reduce memory consumption during LLM fine-tuning.
- 2 — Matrix-Free Least Squares Solvers: Values, Gradients, and What to Do With Them — The paper makes sparsity (enforcing weight sparsity on a 50M-parameter model) a main contribution, directly relevant to the sparsity priority.
- 2 — Critique-Guided Distillation for Efficient and Robust Language Model Reasoning — The paper proposes a knowledge distillation method that significantly reduces compute requirements (the underlying distillation recipe is sketched after this list).
- 2 — Boost Post-Training Quantization via Null Space Optimization for Large Language Models — The paper focuses on post-training quantization for LLMs, aligning with the Sutro Group's interest in quantization, though targeting inference rather than training.
- 2 — HOBA: Higher-Order Block-Diagonal Attention Unrolling for Transformer — The paper's main contribution is a sparse attention mechanism (block-diagonal) that reduces computational cost.
- 2 — Data-Efficient Training by Evolved Sampling — The paper's main contribution is dynamic data selection for training acceleration, a form of training efficiency.
- 2 — EAST: Early Action Prediction Sampling Strategy with Token Masking — The token masking procedure cuts memory usage and accelerates training, making training efficiency a main contribution.
- 2 — LoRA Meets Second-Order Optimization: Towards Optimal Low-Rank Updates — Proposes a second-order optimizer for low-rank fine-tuning that improves convergence and reduces training cost.
- 2 — Catalyst: Reveal the Geometry of Pruning by Reshaping Neural Network — The paper introduces a novel structured pruning regularization that is a core sparsity technique, directly relevant to the sparsity priority.
- 2 — MSAVQ: Multi-dimensional Sensitivity-Aware Vector Quantization for Ultra-Low-Bit Vision-Language Models — Quantization is a main contribution, aligning with the low-precision interest, but the paper targets inference deployment.
- 2 — Boomerang Distillation Enables Zero-Shot Model Size Interpolation — The paper's main contribution is a distillation method that reduces training cost by interpolating between model sizes.
- 2 — NIRVANA: Structured Pruning Reimagined for Large Language Models Compression — The paper's main contribution is structured pruning, directly related to the sparsity priority area.
- 2 — Generalization and Scaling Laws for Mixture-of-Experts Transformers — The paper's main contribution is theoretical scaling laws and generalization bounds for sparse (MoE) transformers.
- 2 — Listens like Mel: Boosting Latent Audio Diffusion with Channel Locality — Faster convergence directly improves training efficiency, making efficiency a main contribution, though in a domain-specific audio setting.
- 2 — AlignPrune: Robust Dynamic Data Pruning through Loss Trajectory Alignment — The paper advances dynamic data pruning, a technique for efficient training that reduces data usage.
- 2 — Attention and Compression is all you need for Controllably Efficient Language Models — The paper's main contribution is an efficient architecture that reduces compute and memory via compression.
- 2 — CoDA: From Text-to-Image Diffusion Models to Training-Free Dataset Distillation — The paper's main contribution is a training-free dataset distillation method that eliminates the costly optimization process.
- 2 — Asynchronous Matching with Dynamic Sampling for Multimodal Dataset Distillation — The paper advances dataset distillation, a technique to reduce training data size and improve efficiency.
- 2 — Exploring Knowledge Purification in Multi-Teacher Knowledge Distillation for LLMs — The paper proposes knowledge purification to reduce resource demands in multi-teacher distillation.
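Several Score 2 entries (the multi-teacher, critique-guided, and Boomerang distillation papers) build on the standard temperature-scaled distillation loss. A minimal, generic sketch of that common recipe follows; it illustrates the baseline technique, not any listed paper's implementation.

```python
# Illustrative sketch only: the standard temperature-scaled distillation loss
# that the distillation entries above build on. Generic, hypothetical code,
# not taken from any listed paper.
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on temperature-softened distributions,
    scaled by T^2 so its gradient magnitude matches the hard-label loss."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=-1)
    return (T ** 2) * kl.mean()

# Toy usage: a student whose logits roughly track the teacher's has low loss.
rng = np.random.default_rng(0)
teacher = rng.normal(size=(4, 10))
student = teacher + 0.1 * rng.normal(size=(4, 10))
print(distillation_loss(student, teacher))
```

The multi-teacher variants in the list effectively replace `teacher_logits` with some aggregate of several teachers; the critique-guided variant changes where the teacher signal comes from, not the loss shape.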
Score 1
- 1 — PersonalQ: Select, Quantize, and Serve Personalized Diffusion Models for Efficient Inference — The paper focuses on inference-time quantization for personalized diffusion models, tangentially related to the quantization priority.
- 1 — When to Retrain after Drift: A Data-Only Test of Post-Drift Data Size Sufficiency — The paper mentions low per-update time and memory overhead, but its main contribution is a data-sufficiency test for retraining decisions.
- 1 — LM-mixup: Text Data Augmentation via Language Model based Mixup — The paper addresses data efficiency through instruction distillation, which tangentially relates to the group's priorities.
- 1 — SMARAN: Closing the Generalization Gap with Performance Driven Optimization Method — The paper proposes an optimizer that adjusts learning rate based on performance, which could offer marginal training-efficiency gains.
- 1 — TiTok: Transfer Token-level Knowledge via Contrastive Excess to Transplant LoRA — The paper reduces overhead by avoiding a discriminator, but its main focus is knowledge transfer for LoRA transplantation.
- 1 — LayerDecompose: Exploring weight sharing for Large Language Model Compression — Focuses on post-training compression for deployment, not energy-efficient training or the Sutro Group's other priorities.
- 1 — Scaling Laws Meet Model Architecture: Toward Inference-Efficient LLMs — The paper focuses on inference efficiency (throughput) rather than training energy, and while it uses scaling laws, they target inference-efficient architectures.
- 1 — FitLight: Federated Imitation Learning for Plug-and-Play Autonomous Traffic Signal Control — Mentions model pruning for resource-constrained deployment, but the main contribution is federated imitation learning.
- 1 — ECMNet: Lightweight Semantic Segmentation with Efficient CNN-Mamba Network — The paper proposes a lightweight segmentation model focusing on inference efficiency (parameter count), not training.
- 1 — Disentangling Token Dependencies for Efficient Decoding in Diffusion Language Models — The paper focuses on inference efficiency for diffusion language models via knowledge distillation.
- 1 — MARS: Mamba-driven Adaptive Reordering Scheme for Semantic Occupancy Prediction in Autonomous Driving — The paper focuses on a task-specific architecture for autonomous driving that reduces memory usage and latency.
- 1 — Energy Efficient Language Models through Dynamic Sparsity — Paper focuses on inference efficiency through activation sparsity and quantization for deployment, not training.
- 1 — LoRA: The Past, Present, and Future — The paper focuses on parameter-efficient fine-tuning (LoRA variants), tangentially relevant to the group's priorities (the basic LoRA reparameterization is sketched at the end of this report).
- 1 — Training-Free Determination of Network Width via Neural Tangent Kernel — The paper addresses efficient model sizing to avoid overparameterization, which indirectly reduces training cost.
- 1 — Self-Correction via Task Distillation — The paper uses task distillation to improve self-correction, which incidentally reduces fine-tuning cost.
- 1 — One Stone Three Birds: Training-free Core-context-aware Attention for Efficient LLM Prefilling, Decoding, and KV Caching — The paper focuses on inference-time efficiency via training-free sparse attention, tangential to the group's training focus.
- 1 — Learning linear state-space models with sparse system matrices — The paper focuses on sparsity in linear state-space models for system identification, which is tangential to the group's priorities.
- 1 — PTQTP: Post-Training Quantization to Trit-Planes for Large Language Models — The paper focuses on post-training quantization for efficient inference, not on energy-efficient training.
- 1 — Vision as LoRA — Mentions efficiency via LoRA merging and distillation for training acceleration, but the main contribution is architectural.
- 1 — LeSTD: LLM Compression via Learning-based Sparse Tensor Decomposition — The paper focuses on post-training compression using sparsity, which is tangential to Sutro Group's training focus.
- 1 — Asymmetric Proximal Policy Optimization: mini-critics boost LLM reasoning — The paper mentions computational efficiency of critic training as a motivation for using lightweight mini-critics.
- 1 — Vulcan: Crafting Compact Class-Specific Vision Transformers For Edge Intelligence — The paper uses structured pruning for post-training model compression, touching on sparsity but not training efficiency.
- 1 — BDQ: Bidirectional Diagonal Quantization for LLMs — The paper focuses on post-training quantization for inference efficiency, not on energy-efficient training.
- 1 — Representation Finetuning for Continual Learning — The paper proposes a parameter-efficient finetuning method for continual learning, tangentially related to the group's priorities.
- 1 — Analyzing and Internalizing Complex Policy Documents for LLM Agents — Tangential: addresses inference efficiency via policy internalization, not energy-efficient training.
- 1 — Semantic Uncertainty Quantification of Hallucinations in LLMs: A Quantum Tensor Network Based Method — The paper touches on efficiency by evaluating robustness under quantization for resource-constrained settings.
- 1 — Emergent Discrete Controller Modules for Symbolic Planning in Transformers — Mentions a small FLOPs overhead and sparse application, but the main focus is on symbolic planning within transformers.
- 1 — PUM-Net: Plastic Unified Memory Network with Associative Interaction for Long-Context State Space Models — Tangential: mentions training cost reduction by avoiding sequence length inflation, but the main contribution is the memory architecture.
- 1 — You Do Not Fully Utilize Transformer's Representation Capacity — The paper mentions efficiency gains (lower perplexity per FLOP) but focuses on representation capacity.
- 1 — FastALM: Hierarchical Frame Q-Former for Effective Audio Modality Adaptation — The paper focuses on efficient inference for audio-language models by compressing speech features, which is tangential to training efficiency.
- 1 — A Separable Self-attention Inspired by the State Space Model for Computer Vision — The paper proposes an efficient separable self-attention with linear complexity, but its main contribution is a vision architecture, not training efficiency.
- 1 — Fast-dLLM v2: Efficient Block-Diffusion LLM — The paper primarily contributes to inference efficiency through block diffusion and hierarchical caching.
- 1 — ParoQuant: Pairwise Rotation Quantization for Efficient Reasoning LLM Inference — The paper focuses on post-training quantization for efficient LLM inference, not on training efficiency.
- 1 — Scalable Variational Bayesian Fine-Tuning of LLMs via Orthogonalized Low-Rank Adapter — The paper uses parameter-efficient fine-tuning (a LoRA variant) but its main contribution is uncertainty quantification.
- 1 — Generalised Flow Maps for Few-Step Generative Modelling on Riemannian Manifolds — The paper aims to improve inference-time efficiency (few-step sampling) rather than training efficiency.
- 1 — Personalization Under Value Conflict: Resolving Contradictory Preferences with Paired Fine-Tuning — The paper mentions reduced data requirements as a secondary benefit, but the main contribution is personalization under value conflict.
- 1 — AnyDepth: Depth Estimation Made Easy — The paper proposes a lightweight decoder and data filtering for depth estimation, touching on efficiency only incidentally.
- 1 — Entropy-Select: Training-Free Local Entropy Token Compression for Video LLMs — Tangential: the paper proposes token compression for inference efficiency in video LLMs, which touches on efficiency only at inference time.
- 1 — CoKV: Optimizing LLM Inference with Game-Theoretic Adaptive KV Cache — The paper addresses inference memory efficiency via KV cache optimization, which is tangential to the group's training focus.
- 1 — TSDINO: Teacher–Student Self-Distillation Framework for Robust Pre-training of Time-Series Foundation Models — The paper uses self-distillation for pre-training but does not focus on energy efficiency or data movement.
- 1 — Think Twice, Act Once: Token-Aware Compression and Action Reuse for Efficient Inference in Vision-Language-Action Models — Addresses inference efficiency via token pruning and action reuse, not training efficiency or any Sutro Group priority.
- 1 — DeFake: Data-Efficient Adaptation for Generalized Deepfake Detection — The paper focuses on data-efficient adaptation (few-shot learning for deepfake detection), which is tangential to the group's priorities.
- 1 — DoRAN: Stabilizing Weight-Decomposed Low-Rank Adaptation via Noise Injection and Auxiliary Networks — Focuses on parameter-efficient fine-tuning stabilization and sample efficiency, tangentially related to the group's priorities.
- 1 — Eliminating VAE for Fast and High-Resolution Generative Detail Restoration — The paper's primary contribution is inference acceleration and memory reduction via VAE elimination.
- 1 — pi-Flow: Policy-Based Few-Step Generation via Imitation Distillation — The paper uses distillation to accelerate inference via few-step generation, which touches on the group's distillation interest only at inference time.
- 1 — Routing Matters in MoE: Scaling Diffusion Transformers with Explicit Routing Guidance — MoE is efficiency-related, but the paper focuses on improving expert specialization for vision, not training efficiency.
- 1 — KAN or MLP? Point Cloud Shows the Way Forward — The paper introduces an efficient variant of KAN to reduce parameters and computational cost, but its focus is point-cloud modeling, not training efficiency.
- 1 — Partial-Correlation Learning for Large Language Models with Skip-Tuning — Skip-Tuning reduces fine-tuning data by using noncontiguous segments, potentially offering efficiency gains as a side benefit.
- 1 — LFQ: Logit-aware Final-block Quantization for Boosting the Generation Quality of Low-Bit Quantized LLMs — Paper focuses on post-training quantization for inference efficiency, which is tangentially related to the quantization priority.
- 1 — KDP: Simplifying Representation Dynamics in Kernel Space — The paper proposes a model compression method via layer pruning, which yields inference-efficiency gains.
- 1 — TreeSNNs: Temporal Resolution Ensembled SNNs for Neuromorphic Action Recognition — Mentions SNN energy efficiency as motivation but the main contribution is accuracy improvement via temporal-resolution ensembling.
- 1 — Mechanistic Interpretability of In-Context Learning Generalization through Structured Task Curriculum — Mentions data efficiency improvement through curriculum learning but the main contribution is mechanistic interpretability.
- 1 — Hyden: A Hybrid Dual-Path Encoder for Monocular Geometry of High-resolution Images — The paper emphasizes inference efficiency and uses self-distillation for label generation, but the main contribution is the encoder architecture.
- 1 — KnItLM: Weaving Knowledge into Instruction-Tuned LLMs via Continual Pre-Training and Merging — The paper focuses on knowledge ingestion and model merging, mentioning avoidance of instruction-tuning cost only in passing.
- 1 — Reduce What You Use: Input-Aware Matrix-Multiplication Pruning for LLMs — The paper proposes inference-time pruning for matrix multiplication, tangential to Sutro Group's focus on training efficiency.
- 1 — Self-Rewarding Rubric-Based Reinforcement Learning for Open-Ended Reasoning — The paper mentions faster and more resource-efficient training via a lightweight framework, but the main contribution is the self-rewarding RL method.
- 1 — When Reasoning Meets Compression: Understanding the Effects of LLMs Compression on Large Reasoning Models — The paper studies post-training compression (quantization, pruning, distillation) effects on reasoning models, not training efficiency.
- 1 — On Computational Limits and Provably Efficient Criteria of Visual Autoregressive Models: A Fine-Grained Complexity Analysis — Mentions an efficiency angle (sub-quadratic time for VAR models) but the main contribution is a fine-grained complexity analysis.
- 1 — Point-MoE: Large-Scale Multi-Dataset Training with Mixture-of-Experts for 3D Semantic Segmentation — Uses sparse mixture-of-experts (sparsity) but primarily addresses multi-dataset 3D segmentation scaling.
- 1 — Diversity of Transformer Layers: One Aspect of Parameter Scaling Laws — Paper analyzes layer diversity and parameter scaling laws from an interpretability perspective, tangential to the scaling-laws priority.
- 1 — Precise Attribute Intensity Control in Large Language Models via Targeted Representation Editing — The paper focuses on controlled text generation via representation editing, with efficiency mentioned only in passing.
- 1 — Are You Getting What You Pay For? Auditing Model Substitution in LLM APIs — Mentions quantization as a potential model substitution but the primary focus is on API integrity auditing.
- 1 — HT-Sparse: Training-Free Query-Guided Head–Token Sparsification for Long-Video Multimodal Inference — The paper proposes training-free inference sparsity for long-video multimodal models, touching on sparsity only at inference.
- 1 — DeCo-DETR: Decoupled Cognition DETR for efficient Open-Vocabulary Object Detection — Tangential: mentions efficiency by replacing costly text encoders, but the main focus is on decoupled open-vocabulary detection.
- 1 — DC-LLM: Hardware-Friendly LLM Weight Compression via Dynamic Linear Combination — Focuses on inference weight compression and data movement, not training efficiency; mentions low-precision methods only tangentially.
- 1 — Textual Steering Vectors Can Improve Visual Understanding in Multimodal Large Language Models — The paper mentions minimal computational overhead but focuses on interpretability and steering for multimodal LLMs.
- 1 — Look Back to Move Forward: Delay-Aware Instance Selection for Online Continual Learning — The paper reduces training budget via selective instance replay, a form of training efficiency, but that is not the main contribution.
- 1 — FlexTraj: Image-to-Video Generation with Flexible Point Trajectory Control — The paper mentions efficiency in convergence and inference as a minor benefit of its sequence-concatenation design.
- 1 — Remaining-data-free Machine Unlearning by Suppressing Sample Contribution — Mentions efficiency in unlearning but does not address training efficiency, data movement, sparsity, or quantization.
- 1 — PartInfer: Enabling LLM Inference On Edge Devices — The paper focuses on LLM inference efficiency via neuron-level sparsity for edge devices, which is tangential to training efficiency.
- 1 — RCStat: A Statistical Framework of Relative Contextualization in Transformers — The paper addresses inference-time efficiency via KV-cache compression, which is tangential to the group's training focus.
- 1 — LinearSR: Unlocking Linear Attention for Stable and Efficient Image Super-Resolution — The paper improves inference efficiency through linear attention but does not primarily address energy-efficient training.
- 1 — DenseMixer: Improving MoE Post-Training with Precise Router Gradient — Mentions MoE sparsity and training efficiency indirectly, but the main contribution is a refined router gradient.
- 1 — Cognitive Alignment in Personality Reasoning: Leveraging Prototype Theory for MBTI Inference — Uses LoRA for parameter-efficient fine-tuning, but the main contribution is cognitively aligned inference.
- 1 — Simple yet Effective Semi-supervised Knowledge Distillation from Vision-Language Models via Dual-Head Optimization — Mentions distillation and minimal overhead, but the main contribution is accuracy improvement in semi-supervised settings.
- 1 — QuantSparse: Comprehensively Compressing Video Diffusion Transformer with Model Quantization and Attention Sparsification — The paper focuses on compressing video diffusion transformers for inference using quantization and attention sparsification.
- 1 — Exploring Redundancy and Shared Representations for Transformer Models Optimization — The paper touches on efficiency via model compression but does not directly advance energy-efficient training.
- 1 — Spectral-Aware Sparse Communication and Entropy-Balanced Tasking in Multi-Agent Systems — The paper addresses communication sparsification and energy reduction in multi-agent coordination, not model training.
- 1 — Training-time Selection of Linear Vs. Softmax Attention in Layer-based Hybrid Transformers — The paper addresses inference-time memory efficiency (KV-cache reduction) via layer selection during training.
- 1 — DLM-One: Diffusion Language Models for One-Step Sequence Generation — The paper uses score distillation to achieve efficient one-step inference, which tangentially relates to the distillation interest.
- 1 — Surrogate Modeling of 3D Rayleigh-Bénard Convection with Equivariant Autoencoders — The paper focuses on sample and parameter efficiency for surrogate modeling, tangential to the Sutro Group's priorities.
- 1 — LUCID-3D: A Lightweight and Compatible Framework for Unified 3D Understanding and Generation — The paper mentions reducing training cost by leveraging pretrained models as a secondary benefit, but the main contribution is the unified 3D framework.
- 1 — HybridCoT: Interleaving Latent and Text Chain-of-Thought for Efficient Reasoning — The paper primarily improves inference efficiency through latent reasoning, with only incidental mention of training cost.
- 1 — Lookahead Tree-Based Rollouts for Enhanced Trajectory-Level Exploration in Reinforcement Learning with Verifiable Rewards — The paper focuses on improving trajectory diversity in RL for LLMs, claiming acceleration of policy learning only as a side benefit.
- 1 — LoRA-DA: Data-Aware Initialization for Low-Rank Adaptation via Asymptotic Analysis — The paper improves LoRA initialization to potentially reduce training steps, indirectly enhancing training efficiency.
- 1 — Beyond Homogeneous Attention: Memory-Efficient LLMs via Fourier-Approximated KV Cache — Addresses memory efficiency of the KV cache in inference, tangentially related to data movement and hardware efficiency.
- 1 — Enhancing LLMs for Knowledge Base Question Answering by Chain-of-Decomposition — Mentions efficient fine-tuning and reduced LLM calls via task decomposition, but training efficiency is not the focus.
- 1 — Trading Complexity for Expressivity: Theoretical Exploration of Linear and Causal Token Mixing Strategies — Mentions decoding speed and cache size as design tradeoffs, but the main contribution is a theoretical analysis of token mixing.
- 1 — TEL: A Thermodynamics-Inspired Layer for Adaptive, and Efficient Neural Learning — Efficiency is mentioned as a property (minimal overhead, fixed compute budget) but the core contribution is the layer design itself.
- 1 — BaNEL: Exploration Posteriors for Generative Modeling Using Only Negative Rewards — The paper addresses efficiency in terms of reducing reward evaluations, which is tangential to Sutro Group's priorities.
- 1 — Decoupling of Experts: A Knowledge-Driven Architecture for Efficient LLMs — The paper mentions efficiency in scaling but its main contribution is a knowledge-driven architecture.
- 1 — VideoMind: A Chain-of-LoRA Agent for Temporal-Grounded Video Reasoning — The Chain-of-LoRA mechanism for efficient role switching is a minor inference-efficiency aspect, but the core contribution is temporal-grounded video reasoning.
- 1 — Streaming Visual Geometry Transformer — The paper uses distillation for training and FlashAttention for inference, which touches on efficiency only incidentally.
- 1 — Activation-aware Probe-Query: Effective Key-Value Retrieval for Long-Context LLMs Inference — The paper addresses data movement and sparsity in KV cache eviction for inference, which is tangential to the group's training focus.
- 1 — One-Shot Multi-Label Causal Discovery in High-Dimensional Event Sequences — The paper mentions efficient parallelized causal discovery on GPUs as a secondary benefit, but the core contribution is causal discovery.
- 1 — Slimming the Giant: Efficient Structured Pruning for Adapter-Tuned SAM — Paper uses structured pruning for inference-time compression and latency gains, not for improving training efficiency.
- 1 — Scaling Weisfeiler–Leman Expressiveness Analysis to Massive Graphs with GPUs — The paper accelerates a graph algorithm on GPUs but does not focus on energy-efficient AI training.
- 1 — RS-MoE: Collaborative Compression for Mixture-of-Experts LLMs based on Low-Rank and Sparse Approximation — The paper addresses post-training compression using low-rank and sparse approximation, not directly advancing training efficiency.
- 1 — Vulnerability-Aware Parameter-Efficient Fine-Tuning for Enhanced Adversarial Robustness — The paper uses parameter-efficient fine-tuning (PEFT) for adversarial robustness, but the main contribution is robustness.
- 1 — Are EEG Foundation Models Worth It? Comparative Evaluation with Traditional Decoders in Diverse BCI Tasks — Mentions scaling laws tangentially but primarily benchmarks EEG foundation models, not advancing Sutro Group priorities.
- 1 — LoRA in the Right Place: Which Block to Tune in Parameter-Efficient Fine-Tuning? — The paper focuses on parameter-efficient fine-tuning placement for improving adaptation performance, not training efficiency.
- 1 — Sampling Complexity of TD and PPO in RKHS — The paper discusses sample efficiency but its main contribution is theoretical convergence analysis.
- 1 — A Brain-Inspired Gating Mechanism Unlocks Robust Computation in Spiking Neural Networks — The paper mentions SNNs as energy-efficient but focuses on noise robustness via a biologically-inspired gating mechanism.
- 1 — A Mathematical Framework for the Hierarchical Analysis of Neural Networks — Tangentially relevant via model compression, but the core contribution is a mathematical framework for hierarchical analysis.
- 1 — DTP: A Simple yet Effective Distracting Token Pruning Framework for Vision-Language Action Models — The paper proposes token pruning (a form of sparsity) but focuses on inference-time performance improvement.
- 1 — Spiking Graph Predictive Coding — Efficiency is mentioned as a side benefit of event-driven spiking computation, but the paper's main contribution is graph predictive coding.
- 1 — DBLP: Noise Bridge Consistency Distillation For Efficient And Reliable Adversarial Purification — Mentions fast inference via distillation, but the main contribution is adversarial purification, not efficiency.
- 1 — BlindSight: Harnessing Sparsity for Efficient Vision-Language Models — The paper leverages sparsity and a hardware-aware kernel for inference optimization, not training, making it tangential to the group's priorities.
- 1 — DistillMatch: Leveraging Knowledge Distillation from Vision Foundation Model for Multimodal Image Matching — The paper uses knowledge distillation to transfer features from a vision foundation model, with efficiency only a side benefit.
- 1 — Scaling Law for Catastrophic Forgetting via Gradient Products — The paper studies scaling laws for catastrophic forgetting, which is tangentially related to the group's scaling-laws priority.
- 1 — Stabilizing MoE Reinforcement Learning by Aligning Training and Inference Routers — The paper focuses on stabilizing RL training in MoE models by aligning routers, mentioning training efficiency only in passing.
- 1 — SCAR: Shapley Credit Assignment for More Efficient RLHF — The paper improves RLHF training efficiency through dense rewards and faster convergence, but its main contribution is Shapley-based credit assignment.
- 1 — DomED: Redesigning Ensemble Distillation for Domain Generalization — Mentions computational cost reduction via tailored data allocation but the main contribution is domain generalization.
- 1 — TASTE: Text-Aligned Speech Tokenization and Embedding for Spoken Language Modeling — The paper uses LoRA for parameter-efficient fine-tuning and reduces sequence length via tokenization.
- 1 — IOMM: Fast Pre-training of Unified Multimodal Models without Text-Image Pairs — The paper mentions training efficiency (fast pre-training, reduced GPU hours) but its main contribution is unified multimodal modeling without text-image pairs.
- 1 — Beyond Benchmarks: Understanding Mixture-of-Experts Models through Internal Mechanisms — Paper analyzes sparsity and expert utilization in MoE models, tangential to energy-efficient training.
- 1 — Conceptrol: Concept Control of Zero-shot Personalized Image Generation — Mentions no computational overhead but primarily focuses on personalized image generation control, not efficiency.
- 1 — A Learn-to-Optimize Approach for Coordinate-Wise Step Sizes for Quasi-Newton Methods — The paper improves optimizer step sizes for faster convergence, indirectly reducing training time and compute.
- 1 — NoLoRA: Nonlinear Low-Rank Adaptation for Parameter-Efficient Fine-Tuning — The paper focuses on improving fine-tuning expressiveness via nonlinear low-rank adaptation, mentioning efficiency only incidentally.
- 1 — Routing-Deconstructed LoRA in Federated Fine-Tuning — Paper focuses on federated LoRA, with a secondary mention of reducing communication cost.
- 1 — Learning What Matters: Prioritized Concept Learning via Relative Error-driven Sample Selection — The paper proposes a sample selection curriculum to improve data and compute efficiency, which is tangential to the group's core priorities.
- 1 — DeCoP: Enhancing Self-Supervised Time Series Representation with Dependency Controlled Pre-training — The paper reduces FLOPs as a side benefit, but its core contribution is time-series representation learning.
- 1 — BoundaryDPT: Pushing the Boundaries of Depth Pruning for Vision Transformers — The paper focuses on inference-time speedup via depth pruning, not energy-efficient training or data movement.
- 1 — QueryStream: Advancing Streaming Video Understanding with Query-Aware Pruning and Proactive Response — The paper focuses on inference-time token pruning for streaming video, not training efficiency, and is thus only tangentially relevant.
- 1 — LLMs as Scalable, General-Purpose Simulators For Evolving Digital Agent Training — The paper addresses data-efficient scaling for agent training via synthetic data, tangential to the group's training-efficiency focus.
- 1 — PRKV: Page Restruct KV Cache for High Accuracy and Efficiency LLM Generation — The paper focuses on inference efficiency through KV cache optimization, tangentially related to data movement.
- 1 — Unveiling the Scaling Law of PINNs under Non-Euclidean Geometry — The paper addresses optimization scaling challenges for PINNs, which tangentially relates to training efficiency.
- 1 — Critical attention scaling in long-context transformers — The paper addresses attention scaling for long contexts but does not focus on training efficiency, data movement, or sparsity.
- 1 — Qronos: Correcting the Past by Shaping the Future... in Post-Training Quantization — While quantization is a named priority, the paper focuses on post-training quantization for inference.
- 1 — Temporal superposition and feature geometry of RNNs under memory demands — The paper studies representational geometry and sparsity in RNNs under memory constraints, tangential to the sparsity priority.
- 1 — N-Gram Induction Heads for In-Context RL: Improving Stability and Reducing Data Needs — The paper's main contribution is improving in-context RL with n-gram induction heads, tangentially mentioning reduced data needs.
- 1 — Prompt, Predict, Correct: LLM-TrajEcho for Closed-Loop Trajectory Forecasting via Online Prompt Feedback — The paper uses LoRA for parameter-efficient fine-tuning, which marginally relates to training efficiency.
- 1 — MEDSPIKEFORMER: All Neurons Matter for Medical Image Segmentation — Tangentially mentions energy efficiency of spiking neural networks, but the main contribution is improved medical image segmentation.
- 1 — FLoRA-NA: Nearly Accurate Aggregation for Federated Low-Rank Adaptation — The paper addresses communication efficiency in federated learning, tangentially relevant to data movement.
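LoRA and its variants recur throughout the Score 1 list (LoRA: The Past, Present, and Future; LoRA-DA; NoLoRA; FLoRA-NA). For reference, here is a minimal sketch of the basic LoRA reparameterization; the dimensions, initialization, and scaling below are illustrative defaults, not drawn from any listed paper.

```python
# Illustrative sketch only: the basic LoRA reparameterization referenced by
# many Score 1 entries. W stays frozen; only the low-rank factors A and B
# train, cutting optimizer state and gradient traffic. Hypothetical values.
import numpy as np

d_out, d_in, r = 64, 64, 4
rng = np.random.default_rng(0)

W = rng.normal(size=(d_out, d_in))          # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01       # trainable down-projection
B = np.zeros((d_out, r))                    # trainable up-projection, zero init
alpha = 8.0                                 # scaling hyperparameter

def lora_forward(x):
    """y = W x + (alpha / r) * B (A x); at init the update term is zero."""
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
print(np.allclose(lora_forward(x), W @ x))  # True: zero-init update is a no-op
```

The zero initialization of B means fine-tuning starts exactly at the pretrained model, and only 2*r*d parameters per layer receive gradients, which is why so many of the entries above lean on it for cheap adaptation.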