Paper Digest: ICASSP 2026 Papers & Highlights
Note: ICASSP-2026 accepts more than 4,500 papers; this page includes only 500 of them, selected by our daily paper digest algorithm. Interested users can read all 4,500 ICASSP-2026 papers on a separate page, which takes quite some time to load.
To search for papers presented at ICASSP-2026 on a specific topic, use the search by venue (ICASSP-2026) service. To summarize the latest research published at ICASSP-2026 on a specific topic, use the review by venue (ICASSP-2026) service. If you are interested in browsing papers by author, we have a comprehensive list of ~5,700 authors (ICASSP-2026).
Since 2018, Paper Digest has built a foundation of data spanning decades of conferences, journals, and research topics. The platform features a daily digest service that sifts through tens of thousands of new papers, clinical trials, news articles, and community posts, filtering the noise to highlight what matters most to specific interests. Beyond daily updates, dozens of built-in research tools streamline the academic workflow, supporting efficient reading and writing, comprehensive literature reviews, and automated research report generation.
Paper Digest Team
New York City, New York, 10017
team@paperdigest.org
TABLE 1: Paper Digest: ICASSP 2026 Papers & Highlights
| # | Paper | Author(s) |
|---|---|---|
| 1 | Identifying Birdsong Syllables Without Labelled Data Highlight: In this work, we build the first fully unsupervised algorithm to decompose birdsong recordings into sequences of syllables. | M. Teng; J. Boussard; D. Rolnick; H. Larochelle |
| 2 | Quality Enhancement for Anomaly Detection Via Injective Linear Attention Highlight: While deep learning methods, including CNNs and Transformers, have achieved strong performance in this area, their effectiveness degrades on compressed inputs commonly encountered in real-world scenarios due to bandwidth and storage constraints. To address this, we propose an injective linear attention-based quality enhancement framework for anomaly detection. | Z. Ma; H. R. Tohidypour; P. Nasiopoulos; V. C. M. Leung |
| 3 | From Token to Line: Enhancing Code Generation with A Long-Term Perspective Highlight: This insight suggests that it is reasonable to treat each line of code as a fundamental processing unit and generate them sequentially. Inspired by this, we propose the LSR-MCTS algorithm, which leverages MCTS to determine the code line-by-line and select the optimal path. | T. Lu |
| 4 | Wavelet-Aware Anomaly Detection in Multi-Channel User Logs Via Deviation Modulation and Resolution-Adaptive Attention Highlight: These logs are often multi-channel, non-stationary, and anomalies are rare, making anomaly detection challenging. To address these issues, we propose a novel framework that integrates wavelet-aware modulation, multi-resolution wavelet decomposition, and resolution-adaptive attention for robust anomaly detection. | K. Kong; D. Liu; X. Jin; S. Xu; G. Geng |
| 5 | DVT-AD: Discriminative Vision Transformers for Scalable Unsupervised Anomaly Detection Via Simple Self-Distillation Highlight: In this paper, we introduce Discriminative Vision Transformers for Scalable Unsupervised Anomaly Detection via Simple Self-Distillation (DVT-AD), a simple yet highly effective self-distillation framework. | M. Wong; C. A. Da Costa Filho; G. Munro; O. Dukor; A. Judi; M. Lawson |
| 6 | Refgen: Reference-Guided Synthetic Data Generation for Anomalous Sound Detection Highlight: Building upon recent advances in text-to-audio (TTA) generation, we propose RefGEN, a reference-guided generative framework that introduces two key innovations. | W. Liang |
| 7 | An AMP-Based Asymptotic Analysis for Nonlinear One-Bit Precoding Highlight: The considered scheme employs a convex-relaxation-then-quantization (CRQ) approach to the well-known minimum mean square error (MMSE) model, which includes the classical one-bit precoder SQUID as a special case. To analyze its asymptotic behavior, we develop a novel analytical framework based on approximate message passing (AMP). | Z. Wu; J. Ma; Y. -F. Liu; B. Clerckx |
| 8 | Channel Estimation for Holographic MIMO Systems with Mutual Coupling Awareness Highlight: In contrast to traditional MIMO systems, the Toeplitz structure of MC matrix does not persist in HMIMO systems due to the more intricate impedance characteristics, leading to a substantial increase in the number of parameters to be estimated. To address this issue, we propose an approximate unitary diagonalization method for the MC matrix based on plane wave decomposition. | A. Tang; S. Song; C. -Y. Tsui; R. C. de Lamare; M. Debbah |
| 9 | SFN-Net: Integrating Spatial-Frequency Feature Fusion Into Deep Unfolding Network with NESTA for Compressive Sensing Highlight: However, traditional CS methods heavily rely on manual parameter tuning and often focus solely on spatial domain feature extraction, leading to potential information loss. To address these issues, we propose SFN-Net, a deep unfolding network based on NESTA with integrated spatial-frequency features. | T. Lu; H. Li; X. Yan; Y. Wu; L. Wei; S. Liu |
| 10 | RCAL: Reinforced Cross-Modal Alignment for Multimodal Sentiment Analysis with Sparse Visual Frames Highlight: We propose RCAL, a vision-centric framework explicitly designed for MSA under extreme visual sparsity. | X. Song; X. Tao; J. Wu; T. T. Khoei |
| 11 | HAM-SAM2: Enhancing SAM2 for Visual Object Tracking with Adaptive Motion Modeling and Hierarchical Memory Bank Highlight: We propose HAM-SAM2, a training-free framework that jointly enhances both motion and memory for robust visual object tracking. | K. Pan; G. Chen; W. Zhu; D. Zhao; T. Lu |
| 12 | Mongoose: Do We Need A Scanner for Vision Mamba? Highlight: However, these existing methods introduce computational overhead due to specialized scanning techniques. To address these issues, we propose Mongoose, a simplified SSM that eliminates scanning mechanisms. | B. N. Patro; V. S. Agneeswaran |
| 13 | PPDD: A Unified Push–Pull Adversarial Objective in Feature and Logit Spaces for Dataset Distillation Highlight: We propose PPDD, a unified Push–Pull objective that aggregates gradients into a single update: a Push term that maximizes reverse KL in logit space to mine low-density, high-uncertainty boundary regions, and two Pull terms that anchor fidelity via feature space MSE and semantic calibration. | H. Huang; Y. Zhang; J. Song; W. Zhao; P. Ren |
| 14 | FastEagle: Cascaded Drafting for Accelerating Speculative Decoding Highlight: We present FastEagle, a non-autoregressive cascaded drafter that emits an entire draft in a single forward pass. | H. Huang; J. Song; W. Zhao; P. Ren |
| 15 | SSUN: Symmetric Cross-Stage State Interaction Deep Unrolling Network for Hyperspectral and Multispectral Image Fusion Highlight: Although recent approaches have incorporated the deep unrolling network (DUN) to enable more explainable reconstruction, their performance remains constrained by weak cross-stage state dependencies between iterative steps. To handle this limitation, this paper proposes a symmetric cross-stage state interaction deep unrolling network (SSUN) for HS-MS image fusion, with a focus on enhancing long-range dependencies across successive stages. | X. Shen |
| 16 | WiRAG: Retrieval-Augmented Generation with Large Language Models (LLM) Framework for WiFi-Based Human Activity Recognition Highlight: In this paper, we propose WiRAG, a retrieval-augmented generation (RAG) with large language model (LLM) framework for WiFi-based human activity recognition (HAR). | X. Shen |
| 17 | DARL-CLIP: Density-Adaptive and Reinforcement Fine-Tuning CLIP for Cross-Scenario UAV Object Detection Highlight: This paper proposes DaRL-CLIP, a density-adaptive and reinforcement fine-tuning CLIP agent, to enable robust cross-condition generalization of UAV object detection under imbalanced scenario distributions and limited scene diversity. | C. Guo |
| 18 | FinUA: Generating Diverse User Interactions for Financial Dialogue Systems Through User Simulation Highlight: However, existing simulation methods are unsuitable for this domain, as they generate data that is homogenized, factually inaccurate, and stylistically monotonous. To address these shortcomings, we propose the Financial User Agent (FinUA), a novel simulator that incorporates two key mechanisms: a dialogue goal divergence mechanism to generate diverse and factually grounded goals, and a profile augmentation method to imbue simulated users with authentic linguistic habits and irrational behaviors. | S. Dou |
| 19 | Target Speaker Anonymization in Multi-Speaker Recordings Highlight: Moreover, current evaluation methodology does not allow us to accurately assess privacy protection and utility in this complex multi-speaker scenario. This work aims to bridge these gaps by exploring effective strategies for targeted speaker anonymization in conversational audio, highlighting potential problems in their development and proposing corresponding improved evaluation methodologies. | N. Tomashenko; J. Yamagishi; X. Wang; Y. Liu; E. Vincent |
| 20 | A Framework for Controlled Multi-Speaker Audio Synthesis for Robustness Evaluation of Speaker Diarisation Systems Highlight: Existing work focuses on using simulated data for training neural diarisation systems, but its suitability for evaluation and resemblance to real-world conversations is less studied. This paper presents a configurable synthesis framework to address these gaps. | S. Ramoji; V. K. Thoppe Ravindranath; T. Hain |
| 21 | Position-Invariant Fine-Tuning Of Speech Enhancement Models With Self-Supervised Speech Representations Highlight: This work frames the problem as a general limitation of self-supervised representation fine-tuning and investigates it through representation-guided SE. | A. Meghanani; T. Hain |
| 22 | DAAGNet: Depth-Adaptive Anchor Graph for Weakly-Supervised Crowd Counting Highlight: We observe that the cause of this problem lies in the widespread neglect of depth, a critical geometric prior. To address this issue, we introduce Depth-Adaptive Anchor Graph for weakly-supervised Crowd Counting (DAAGNet), which explicitly injects metric depth into the pipeline. | Y. Lei; X. Wang; L. Wang; Y. Wang; W. Liang |
| 23 | Zero-Shot TTS with Enhanced Audio Prompts: BSC Submission for The 2026 WildSpoof Challenge TTS Track Highlight: To handle acoustic noise, we implement a multi-stage enhancement pipeline using the Sidon model, which significantly outperforms standard Demucs in signal quality. | J. Giraldo; A. Peiró-Lilja; R. Zevallos; C. España-Bonet |
| 24 | DSVM-UNET: Enhancing VM-UNET With Dual Self-Distillation For Medical Image Segmentation Highlight: In this paper, we propose a simple yet effective approach to improve the model by Dual Self-distillation for VM-UNet (DSVM-UNet) without any complex architectural designs. | R. Shao |
| 25 | SinDiff: Spoken-to-Sign Language Generation with Transformer-Based Diffusion Model Highlight: We propose SinDiff, a transformer-based diffusion framework for spoken-driven SLG that leverages dynamic attention and global context modeling. | W. Liang; Y. Zhi; X. Xu |
| 26 | Maximizing Secure Energy Efficiency in UAV-Assisted Backscattering Networks Using Deep Reinforcement Learning Highlight: The scenario involves a single UAV and static backscatter devices (BDs) in the presence of multiple mobile eavesdroppers (Eavs) attempting to intercept the backscattered information from the BDs. To counteract this eavesdropping, we propose a novel artificial noise (AN) injection scheme to degrade Eavs’ links. | A. Mondal; D. Mishra; A. Al-Nahari; R. Jäntti |
| 27 | DistilMOS: Layer-Wise Self-Distillation for Self-Supervised Learning Model-Based MOS Prediction Highlight: In this study, we propose DistilMOS, a novel method that learns to predict not only MOS but also token IDs obtained by clustering the hidden representations of each layer in the pretrained SSL model. | J. Yang; W. Nakata; Y. Saito; H. Saruwatari |
| 28 | Sequential Geodesic Adaptation for Auditory Attention Detection Highlight: This issue is central to Task 2 of the EEG-AAD 2026 Challenge, which targets cross-session generalization by requiring models primarily trained on Audio-only data to decode attention in unseen Audio-Visual scenarios. To address this, we construct a unified Riemannian manifold that aligns covariance distributions from both modalities into a common geometric space, establishing a robust shared embedding via Gamma-band filtering (25-50 Hz) and Tangent Space projection, classified by a Hybrid Ensemble. | M. Finocchiaro; S. Calcagno; S. Palazzo; C. Spampinato; F. P. Salanitri |
| 29 | Dual-Guided Multi-Granularity Implicit Alignment Network for Medical Visual Question Answering Highlight: However, existing methods often neglect the challenge of multi-modal alignment due to the data heterogeneity. To address this issue, we propose a dual-guided multi-granularity implicit alignment network (Med-MGIA) that establishes cross-modal correlations without bounding box annotations. | Q. Teng; J. Chen; D. Yuan; Y. Liu; Z. Liu |
| 30 | Spatial-CLAP: Learning Spatially-Aware Audio–Text Embeddings for Multi-Source Conditions Highlight: The central challenge in modeling spatial information lies in multi-source conditions, where the correct correspondence between each sound source and its location is required. To tackle this problem, we propose Spatial-CLAP, which introduces a content-aware spatial encoder that enables spatial representations coupled with audio content. | K. Seki; Y. Okamoto; K. Yamaoka; Y. Saito; S. Takamichi; H. Saruwatari |
| 31 | Fusion of Transformer and CNN Attention Networks for Learned Image Compression Highlight: Although self-attention gifts Transformers the power to resurrect it, this mercy activates only when oceans of training data are provided; furthermore, the fully-connected web makes attention maps tremble at the slightest perturbation and, paradoxically, squander capacity on the very smooth, low-frequency substance that image compression craves. To overcome these drawbacks for learned image compression, we introduce the Fusion of Transformer and CNN Attention Networks (FTCAN). | J. Hu; J. Guo; X. Zhang; K. L. E. Law |
| 32 | Multi-View Frequency Alignment and State Space Parameter Fusion for Lightweight Camouflaged Object Detection Highlight: Existing methods typically adopt local frequency transforms to capture fine-grained textures but neglect global structural cues, and their conventional fusion strategies fail to align multimodal feature spaces effectively. To address these issues, we propose FASF-Net, a lightweight COD network that leverages multi-view frequency alignment and multimodal state space parameter fusion. | W. Liang; C. Chen; M. Yu; J. Du; S. Li; J. Xu |
| 33 | Diffusion-Link: Diffusion Probabilistic Model for Bridging The Audio-Text Modality Gap Highlight: We present Diffusion-Link, a diffusion-based modality-bridging module that generatively maps audio embeddings into the text-embedding distribution. | K. Nam; J. Choi; H. Lee; J. Heo; J. S. Chung |
| 34 | Robust CPD-Based DOA Estimation for Rotating Distributed Array Systems Under Inter-Node Calibration Error Highlight: Meanwhile, the rotating array configuration exacerbates the coupling of directional vectors, making it challenging for traditional matrix-based methods to achieve effective decoupling. To address these issues, we propose a robust canonical polyadic decomposition (CPD)-based DOA estimation algorithm that constructs tensor modeling for RDAS. | Z. Xu; C. Zhou; Z. Shi |
| 35 | Exploring Confidence As A Reward to Advance LLMS Reasoning Highlight: In this work, we systematically investigate Confidence-as-a-Reward (CRew), a simple, training-free method that utilizes token-level confidence in the model’s final answers as a reward signal, especially suitable for closed-ended tasks. | H. Du; B. Li; C. Xie; C. Gao; K. Chen; D. Tao |
| 36 | SWAN: Boosting Image Super-Resolution with Stochastic Wavelet Attention Highlight: We propose a Stochastic Wavelet Attention (SWA) mechanism that efficiently models global-local dependencies in both spatial and frequency domains. | S. Xiong |
| 37 | MMFast: Rethinking Vision-Language Interaction in Efficient MLLMs Highlight: This work investigates the fusion dynamics within auto-regressive MLLMs and reveals that critical fine-grained interactions occur predominantly in intermediate layers, while early and late layers exhibit significant redundancy. Motivated by these insights, we propose MMFast, a novel MLLM architecture that achieves a superior trade-off between efficiency and performance. | S. Xiong |
| 38 | Dissecting Performance Degradation in Audio Source Separation Under Sampling Frequency Mismatch Highlight: Audio processing methods based on deep neural networks are typically trained at a single sampling frequency (SF). | K. Imamura; T. Nakamura; K. Yatabe; H. Saruwatari |
| 39 | Content Adaptive Switchable Hyperprior Networks for Learned Image Compression Highlight: This poses a suboptimal solution due to the diverse image content typically found in compression tasks. We propose a coder-agnostic solution to this challenge which we call Content Adaptive Switchable Hyperprior networks (CASH). | S. Deniffel; J. Seiler; A. Kaup |
| 40 | Closed-Loop Co-Adaptive Retinal Coding with Joint Topological-Spectral Feature Learning Highlight: We present a closed-loop, co-adaptive framework that jointly optimizes spike train generation and decoding. | C. Qin |
| 41 | Denoising Diffusion Model for DOA Estimation Highlight: The proposed model is trained on signals with a fixed number of sources, yet can generalize to scenarios with a variable number of sources. | F. Qian; C. Zhou; Z. Shi |
| 42 | HILO: Hierarchical Feature Fusion Via Local-Global Attention for Multimodal Embeddings Highlight: However, due to their inherent linguistic bias, visual information is often underrepresented during cross-modal fusion, which limits their overall multimodal representation capability. To mitigate this issue, we propose HILO, a novel vision-language architecture specifically designed for multimodal embeddings. | X. Zuo |
| 43 | ClearGCD: Mitigating Shortcut Learning for Robust Generalized Category Discovery Highlight: We propose ClearGCD, a framework that suppresses reliance on non-semantic cues through two complementary mechanisms: Semantic View Alignment (SVA), generating strong augmentations via cross-class patch replacement while enforcing semantic consistency with weak augmentations, and Shortcut Suppression Regularization (SSR), maintaining an adaptive prototype bank that aligns known classes and separates potential novel ones. | K. Lyu |
| 44 | Sidon: Fast and Robust Open-Source Multilingual Speech Restoration for Large-Scale Dataset Cleansing Highlight: We introduce Sidon, a fast, open-source speech restoration model that converts noisy in-the-wild speech into studio-quality speech and scales to dozens of languages. | W. Nakata; Y. Saito; Y. Ueda; H. Saruwatari |
| 45 | Geneses: Unified Generative Speech Enhancement and Separation Highlight: To this end, we propose Geneses, a generative framework to achieve unified, high-quality SE–SS. | K. Asai; W. Nakata; Y. Saito; H. Saruwatari |
| 46 | Closed-Form Ziv-Zakai Bound for Compressive Time Delay Estimation Highlight: Compressed sensing (CS) techniques have been adopted to time delay estimation problems, allowing the utilization of wider band signals for improved performance. | S. Wen; Z. Zhang; C. Zhou; Z. Shi |
| 47 | HADEN: Hierarchical Attentive Alignment and Dual-Contrastive Enhancement Network for Multimodal Few-Shot Relation Extraction Highlight: Multimodal few-shot relation extraction (MM-FSRE) requires fusing textual and visual information to identify new relations under low-resource scenarios, but existing methods suffer from inadequate modal alignment and heavy data reliance. To address this, we propose the HADEN framework, which employs a CrossModal-Hierarchical Attention (CHA) module for dynamic alignment of multi-layer semantics and Dual-Perspective Contrastive Learning (DPCL) to enhance feature clustering. | Z. Ni; H. Li; Y. Sun |
| 48 | Compressive Recovery of Signals Defined On Perturbed Graphs Highlight: We present cross-validation based algorithms, along with recovery guarantees, for the novel ‘Compressive Perturbed Graph Recovery’ problem, where the signal is recovered from compressive measurements while correcting the graph perturbations. | S. Ghosh; A. Rajwade |
| 49 | SLAP: Scalable Language-Audio Pretraining with Variable-Duration Audio and Multi-Objective Training Highlight: Third, the standard contrastive training objective operates on global representations, which may hinder the learning of dense, fine-grained audio features. To address these challenges, we introduce Scalable Language-Audio Pretraining (SLAP), which scales language-audio pretraining to 109 million audio-text pairs with variable audio durations and incorporates multiple training objectives. | X. Mei |
| 50 | LAMB: LLM-Based Audio Captioning with Modality Gap Bridging Via Cauchy-Schwarz Divergence Highlight: However, prior approaches that project audio features into the LLM embedding space without considering cross-modal alignment fail to fully utilize these capabilities. To address this, we propose LAMB, an LLM-based audio captioning framework that bridges the modality gap between audio embeddings and the LLM text embedding space. | H. Lee; J. Choi; K. Nam; J. S. Chung |
| 51 | Training-Free Multimodal Guidance for Video to Audio Generation Highlight: In this work, we propose a novel training-free multimodal guidance mechanism for V2A diffusion that leverages the volume spanned by the modality embeddings to enforce unified alignment across video, audio, and text. | E. Grassucci; G. Galadini; G. Cicchetti; A. Uncini; F. Antonacci; D. Comminiello |
| 52 | Automatic Inter-Animal Alignment of Recorded Kinematic Trajectories Highlight: We propose a framework that couples bidirectional LSTM networks with an orthogonal Procrustes alignment to automatically detect movement onset and corrective turning points in non-human primate reaching tasks. | A. Markus; N. Sinha; Y. Prut; J. Goldberger |
| 53 | IODRESEARCH: Deep Research on Private Heterogeneous Data Via The Internet of Data Highlight: To this end, we propose IoDResearch (Internet of Data Research), a private data-centric Deep Research framework that operationalizes the Internet of Data paradigm. | Z. Shi; Z. Guo; X. Ma; G. Huang; Y. Ma; X. Jing |
| 54 | FastAV: Efficient Token Pruning for Audio-Visual Large Language Model Inference Highlight: In this work, we present FastAV, the first token pruning framework tailored for audio-visual large language models (AV-LLMs). | C. Jung; Y. Jang; S. Lee; J. S. Chung |
| 55 | Navigating Modality Uncertainty: Modality-Interaction Enhanced Mixture-of-Experts for Multi-Modal Knowledge Graph Completion Highlight: While existing methods design various multi-modal fusion mechanisms, they largely overlook the inherent disparities in modality quality, as entity modalities across triples differ in informativeness, uncertainty, and noise, and they fail to address sample-specific modality uncertainty, ultimately resulting in suboptimal performance. To address this limitation, we propose MIMoE, a Modality-Interaction Enhanced Mixture-of-Experts framework with an uncertainty-aware router that adaptively integrates heterogeneous modalities. | H. Shen |
| 56 | LP-CFM: Perceptual Invariance-Aware Conditional Flow Matching for Speech Modeling Highlight: The goal of this paper is to provide a new perspective on speech modeling by incorporating perceptual invariances such as amplitude scaling and temporal shifts. | D. Kwak; Y. Jang; J. S. Chung |
| 57 | RealCount: Robust Open-World Object Counting Via Duplex Contrastive Learning Highlight: We propose RealCount, a multimodal vision-language framework incorporating dual-stream prompt/image adapters and duplex query/input contrastive learning. | Z. Shi; R. Liu; J. Takahashi; S. Jiang |
| 58 | UNMIXX: Untangling Highly Correlated Singing Voices Mixtures Highlight: We introduce UNMIXX, a novel framework for multiple singing voices separation (MSVS). | J. Jung; J. -H. Kim; D. Kwak; J. Lee; J. Nam; J. S. Chung |
| 59 | Deconfusion CLIP Towards Robust Out-of-Distribution Detection Highlight: In this work, we propose a novel framework that employs a few-shot approach to first deconfuse CLIP and then learn prompts that can better distinguish between ID and OOD data. | Z. Xun; Z. Hu; L. Lan; W. Yang; G. Tang |
| 60 | Bayesian Jammer Localization with A Hybrid CNN and Path-Loss Mixture of Experts Highlight: We propose a hybrid Bayesian mixture-of-experts framework that fuses a physical path-loss (PL) model and a convolutional neural network (CNN) through log-linear pooling. | M. Jaramillo-Civill; L. González-Gudiño; T. Imbiriba; P. Closas |
| 61 | DPMM-CFL: Clustered Federated Learning Via Dirichlet Process Mixture Model Nonparametric Clustering Highlight: We propose DPMM-CFL, a CFL algorithm that places a Dirichlet Process (DP) prior over the distribution of cluster parameters. | M. Jaramillo-Civill; P. Wu; P. Closas |
| 62 | Dual-Guided Generative Frame Interpolation Highlight: In this work, we propose dual-guided generative frame interpolation (DGFI), a framework that integrates semantic guidance from vision-language models and flow guidance into a pre-trained diffusion-based image-to-video (I2V) generator. | Y. Wei; H. Amirpour; C. Timmerer |
| 63 | Semantic-Guided Modal Alignment for Multimodal Cardiovascular Disease Detection Highlight: Multimodal data can offer complementary insights from different perspectives, thereby enhancing detection accuracy. | G. Zhang; D. Liu; Y. Lu; H. Sun; B. Lin; Z. Shi |
| 64 | DA-VLM: Data Factory with Minimal Effort Using VLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, it often requires costly training or compromises performance. We address these limitations by proposing a novel automated pipeline that combines pre-trained ControlNet and Vision-Language Models to generate pixel-level labelled realistic images without additional training or manual annotations. |
J. Ye; J.-X. Zhong; Q. Xie; Y. Zhou; N. Trigoni; A. Markham; |
| 65 | Gram-Schmidt Feature Selection for Class Activation Maps Highlight: We introduce Gram-Schmidt Feature Selection CAM (GFS-CAM), a novel gradient-free approach that leverages Gram-Schmidt orthogonalization to construct disentangled visual explanations. |
K. Safavigerdini; B. Yaghooti; B. Sinopoli; K. Palaniappan; |
| 66 | Signal-Driven Joint Safety-Comfort Objective for Real-Time Trajectory Replanning on Rutted Roads Highlight: We present a signal-driven joint objective for real-time trajectory replanning on rutted roads. |
X. Shen; K. Li; H. Hu; Z. Zhang; N. Tang; |
| 67 | GCE-UQ: Quantifying and Decomposing Uncertainty in Graph Counterfactual Explanations Highlight: We introduce GCE-UQ, a framework that quantifies, decomposes, and mitigates uncertainty for GCEs. |
C. Guo; S. Xie; X. Zhang; |
| 68 | OF-SemWat: High-Payload Text Embedding for Semantic Watermarking of AI-Generated Images with Arbitrary Size Highlight: We propose a high-payload image watermarking method for textual embedding, where a semantic description of the image (which may also correspond to the input text prompt) is embedded inside the image. |
B. Tondi; A. Costanzo; M. Barni; |
| 69 | A Long-Form Single-Speaker Real-Time MRI Speech Dataset and Benchmark Abstract: We release the USC Long Single-Speaker (LSS) dataset containing real-time MRI video of the vocal tract dynamics and simultaneous audio obtained during speech production. This … |
S. Foley; |
| 70 | Two-Dimensional Tomographic Reconstruction from Projections with Unknown Angles and Unknown Spatial Shifts Highlight: In this paper, we present a technique for two-dimensional (2D) tomography in which both viewing angles and spatial shifts associated with the projections are unknown. |
S. J. Grampurohit; S. Mulleti; A. Rajwade; |
| 71 | Identity Leakage Through Accent Cues in Voice Anonymisation Highlight: We report a study of residual accent information involving multiple anonymisation systems. |
R. Bakari; O. L. Blouch; N. Gengembre; N. Evans; M. Panariello; |
| 72 | A Dataset of Robot-Patient and Doctor-Patient Medical Dialogues for Spoken Language Processing Tasks Highlight: This paper proposes MeDial-Speech, a novel speech dataset for training and evaluating Med-AIs that can carry out consultations with patients. |
H. Cuayáhuitl; G. Jang; |
| 73 | Interval-Aware Retrieval Framework For Speech-Based Automatic Alzheimer’s Detection Highlight: Existing systems typically insert symbolic pauses or attach acoustic features, followed by simple fusion, which weakens token-level alignment and lacks a normative reference for healthy timing. To address these issues, this paper proposes an interval-aware retrieval framework that explicitly incorporates temporal knowledge into speech-based AD detection. |
M. Gu; |
| 74 | Towards Reliable Time Series Forecasting Under Future Uncertainty: Ambiguity and Novelty Rejection Mechanisms Highlight: To enhance model reliability, we introduce a dual rejection mechanism combining ambiguity and novelty rejection. |
N. Feng; |
| 75 | DMP-TTS: Disentangled Multi-Modal Prompting for Controllable Text-to-Speech with Chained Guidance Highlight: We present DMP-TTS, a latent Diffusion Transformer (DiT) framework with explicit disentanglement and multi-modal prompting. |
K. Yin; |
| 76 | MFF-RVRDI: Multimodal Fusion Framework for Robust Video Recording Device Identification Highlight: We propose MFF-RVRDI, a multimodal framework that fuses video and audio features for robust device identification. |
W. Li; Y. Cao; X. Shen; |
| 77 | SPADE: Structured Pruning and Adaptive Distillation for Efficient LLM-TTS Highlight: The goal of this paper is to introduce SPADE, a framework for Structured Pruning and Adaptive Distillation for Efficient Large Language Model-based text-to-speech (LLM-TTS). |
T. D. Nguyen; J. Kim; J.-H. Kim; S. Choi; Y. Lim; J. S. Chung; |
| 78 | ResBiDet: Efficient Dual-Branch Small Object Detection for UAVs Under Resource-Constrained Conditions Highlight: Real-time Unmanned Aerial Vehicle (UAV) object detection is challenged by strict onboard resource constraints, particularly when processing small objects in high-resolution images where existing methods struggle to balance accuracy and efficiency. To address this fundamental constraint, this paper proposes a dual-branch lightweight detector (ResBiDet). |
C. Guo; Y. Li; J. Ma; Z. Fang; Z. Niu; H. Xu; |
| 79 | SchröMind: Mitigating Hallucinations in Multimodal Large Language Models Via Solving The Schrödinger Bridge Problem Highlight: Minor perturbations can shift attention from truthful to untruthful states, and the autoregressive nature of text generation often prevents error correction. To address this, we propose SchröMind, a novel framework that reduces hallucinations by solving the Schrödinger bridge problem. |
Z. Shi; R. Liu; S. Yu; S. Munakata; K. Shirahata; |
| 80 | Heatmap-to-SMPL Multi-View Radar Transformer for Multi-Person 3D Pose Estimation Highlight: In this paper, by retaining heatmap fidelity and simultaneously exploiting shape priors, we propose RHAMP: a Radar HeAtmap-to-SMPL Pose transformer for 3D human pose estimation. |
S. Kato; P. P. Wang; T. Fujihashi; A. Markham; |
| 81 | RADI: A Retrieval-Augmented Dynamic In-Context Learning Framework for AIGC Image Detection Highlight: While the sophisticated semantic reasoning capabilities of Multimodal Large Language Models (MLLMs) make them theoretically well-positioned for this challenge, their practical application is hampered by performance instability in zero-shot and few-shot contexts. To address this limitation, we propose RADI, a training-free Retrieval-Augmented Dynamic In-Context Detecting Framework. |
T. Bi; R. Ma; Y. Huang; Y. Wang; J. Liu; S. Zhang; |
| 82 | Dual Prototype Learning and Multi-Stream Perturbation for Robust Semi-Supervised Medical Image Segmentation Highlight: Despite significant progress in this field, current methods still face challenges such as insufficient feature space constraints and pseudo-label noise, which limit further improvement of model performance. To address these limitations, we propose a robust semi-supervised medical image segmentation method via dual prototype learning and multi-stream perturbation. |
G. Du; J. Xu; R. Wu; X. Zeng; S. Xiong; |
| 83 | The Synergistic Role of Audio and Large Video-Language Model in Source-Free Video Domain Adaptation Highlight: We propose a novel framework that unifies the complementary strengths of large video-language models (LVLMs) and rich audio features for source-free video unsupervised domain adaptation (SFVUDA) in action recognition. |
T. L. Liu; I. Stavness; M. Rochan; |
| 84 | Optimal QAM Constellation for Over-the-air Computation in The Presence of Heavy-Tailed Channel Noise Highlight: We seek QAM-like constellations that minimize the mean-squared error (MSE) of sum aggregation subject to an average-power constraint. |
S. Razavikia; D. Gündüz; C. Fischione; |
| 85 | A Novel Bayesian EM-Like Algorithm for Fast Compton Camera Imaging Highlight: This paper proposes a fast expectation-maximization (EM)-like algorithm for visualizing radioactive sources within a target region using maximum a posteriori (MAP) estimation and a Gaussian Markov random field (GMRF) model. |
N. Le; H. Snoussi; |
| 86 | LatentGuard: Robust Latent Watermarking for Deepfake Tracing and Forgery Localization Highlight: While existing watermarking methods have shown promise, their pixel-domain designs are inherently vulnerable to latent-space manipulations performed by modern generative models (e.g., Stable Diffusion). To address this fundamental limitation, we propose LatentGuard, a unified latent-space watermarking framework that simultaneously achieves robust ownership verification and precise forgery localization. |
P. Yu; J. Xie; X. Zhou; J. Fei; Z. Xia; |
| 87 | Decoding Neural Mechanisms of Emotional Processing in Tinnitus: ERP and Gamma-Band EEG Analysis Highlight: We investigated neural responses to negative emotion using EEG in 40 CST patients and 31 healthy controls (HC). |
J. Xia; |
| 88 | DebateCTI: Enhancing ATT&CK Technique Identification in CTI Reports Via A Role-Specialized Multi-Agent Debate Highlight: Accurately automating the analysis of Cyber Threat Intelligence (CTI) reports to identify MITRE ATT&CK techniques remains a critical challenge due to the labor-intensive nature of manual mapping and the limitations of existing NLP and LLM-based methods, which often suffer from hallucinations, incoherent reasoning, and knowledge isolation. To address these challenges, we propose DebateCTI, a novel framework that integrates a multi-agent debate mechanism with parameter-efficient fine-tuning. |
J. Xia; |
| 89 | EuleroDec: A Complex-Valued RVQ-VAE for Efficient and Robust Audio Coding Highlight: In this work we introduce an end-to-end complex-valued RVQ-VAE audio codec that preserves magnitude-phase coupling across the entire analysis-quantization-synthesis pipeline and removes adversarial discriminators and diffusion post-filters. |
L. Cerovaz; M. Mancusi; E. Rodolà; |
| 90 | Dense RGB-D SLAM for Endoscopic Surgery Via Quadratic Gaussian Splatting, EndoQS-SLAM Highlight: However, existing Simultaneous Localization and Mapping (SLAM) approaches are limited by sparse textures and difficulties in accurately representing anatomical structures. To address these challenges, we introduce EndoQS-SLAM, a novel dense RGB-D SLAM system that leverages Quadratic Gaussian Splatting (QGS) to achieve high-quality reconstruction. |
Z. Yang; J. Liu; X. Ding; W. Wei; P. Su; C. He; |
| 91 | Improving Anomalous Sound Detection with Attribute-Aware Representation from Domain-Adaptive Pre-Training Highlight: To address the challenge of missing attribute labels, this paper proposes an agglomerative hierarchical clustering method for the assignment of pseudo-attribute labels using representations derived from a domain-adaptive pre-trained model, which are expected to capture machine attribute characteristics. |
X. Fang; |
| 92 | WaterFlow: Explicit Physics-Prior Rectified Flow for Underwater Saliency Mask Generation Highlight: We propose WaterFlow, a rectified flow-based framework for underwater salient object detection that innovatively incorporates underwater physical imaging information as explicit priors directly into the network training process and introduces temporal dimension modeling, significantly enhancing the model’s capability for salient object identification. |
R. Li; S. Lian; H. Li; Y. Li; W. Wu; S. Kwong; |
| 93 | Fake Image Detection on Noise Residual Spectra Via Random-Feature Single-Layer Neural Networks Highlight: Compared to deep networks, this offers significant advantages in terms of complexity, while maintaining good accuracy. We show that, based on asymptotic formulas from random matrix theory, hyperparameter optimization can be conducted in closed form, thereby greatly reducing its computational cost. |
E. Mele; A. Coluccia; |
| 94 | HAD: Hybrid Adversarial Distillation Against Adversarial Attacks Highlight: To mitigate the inherent trade-off between robustness and accuracy, we propose a novel framework, Hybrid Adversarial Distillation (HAD). |
J. Zou; S. Zhang; M. Qiu; |
| 95 | Learning Reference-Guided Exposure Correction With Hybrid Illumination Characteristics Highlight: We present HICNet, a reference-guided exposure correction framework. |
H. Ren; Z. Bi; Z. Wan; H. Cheng; |
| 96 | DSPC: Dual-Stage Progressive Compression Framework for Efficient Long-Context Reasoning Highlight: However, most existing methods require training a small auxiliary model for compression, incurring a significant amount of additional computation. To avoid this, we propose a two-stage, training-free approach, called Dual-Stage Progressive Compression (DSPC). |
Y. Gao; Y. Lu; Z. Zhang; J. Nie; S. Yu; Q. Xuan; |
| 97 | Fine-Grained Hashing Via Center Similarity Guided Quantization Highlight: To this end, we propose a Fine-Grained Hashing method via Center Similarity Guided Quantization (FGH). |
C. He; H. Wei; |
| 98 | Multi-Task Learning For Speech Quality Assessment Using ASR-Derived Entropy Features Highlight: This work introduces a novel approach that exploits entropy computed from automatic speech recognition (ASR) model predictions as a quality indicator for non-reference speech quality assessment. |
T. D. Do; B. Thang Ta; V. H. Do; |
| 99 | FDCNet: Frequency Domain Channel Attention and Convolution for Lipreading Highlight: In lipreading, conventional frontend frameworks primarily extract features in the spatial domain, which limits their ability to process mixed-frequency visual signals containing both low-frequency macroscopic lip shapes and high-frequency details, leading to insufficient extraction of critical information. To address this challenge, we propose a frequency-domain collaborative network, FDCNet. |
Q. Yan; Q. Zhang; L. Zhang; L. Yu; L. Sheng; |
| 100 | AURA: YCbCr-Based Universal Raw-Reconstruction for Inverse ISP Highlight: Existing methods struggle to balance robustness and generalization. To address this, we propose AURA, a universal RAW reconstruction architecture that requires no camera metadata. |
H. Cheng; |
| 101 | Federated Clustering Without K: Adaptive Prototype Aggregation on Heterogeneous Data Highlight: Federated clustering algorithms offer a privacy-preserving approach to unsupervised learning; however, most standard federated clustering algorithms critically falter on realistic, non-independent and identically distributed (non-IID) data by imposing a fixed number of clusters (k) on all clients. To overcome this fundamental limitation, we propose Adaptive Prototype Aggregation Federated Clustering (APA-FC), a novel framework that eliminates the need for a preset k. |
G. He; Z. Wang; R. Zhang; B. Yan; R. Wang; F. Nie; |
| 102 | Efficient Segment Anything with Depth-Aware Fusion and Limited Training Data Highlight: We propose a lightweight RGB-D fusion framework that augments EfficientViT-SAM with monocular depth priors. |
Y. Zhou; X. Xie; P. Li; A. Kunz; A. Osman; X. Maldague; |
| 103 | FocalCodec-Stream: Streaming Low-Bitrate Speech Coding Via Causal Distillation Highlight: We present FocalCodec-Stream, a hybrid codec based on focal modulation that compresses speech into a single binary codebook at 0.55 – 0.80 kbps with a theoretical latency of 80 ms. Our approach combines multi-stage causal distillation of WavLM with targeted architectural improvements, including a lightweight refiner module that enhances quality under latency constraints. |
L. D. Libera; C. Subakan; M. Ravanelli; |
| 104 | STDiffusion: A Spatiotemporal Interpolation-Oriented Diffusion Model for Signal Series Latent Representation Generation Highlight: We propose STDiffusion, an interpolation-guided spatiotemporal diffusion framework that replaces forward noise addition with spatiotemporal attention interpolation and UNet denoising with a deterministic reverse predictor in latent space, explicitly coupling diffusion steps with physical time. |
H. Xiong; |
| 105 | Learning from Disagreement: A Group Decision Simulation Framework for Robust Medical Image Segmentation Highlight: We introduce a fundamentally new approach with our group decision simulation framework, which works by mimicking the collaborative decision-making process of a clinical panel. |
C. Zhong; |
| 106 | SightSound-R1: Cross-Modal Reasoning Distillation from Vision to Audio Language Models Highlight: Compared to the visual domain, one bottleneck is the lack of large-scale chain-of-thought audio data to teach LALM stepwise reasoning. To circumvent this data and modality gap, we present SightSound-R1, a cross-modal distillation framework that transfers advanced reasoning from a stronger LVLM teacher to a weaker LALM student on the same audio–visual question answering (AVQA) dataset. |
Q. Wang; X. Jiang; L. He; J. Wu; N. Mesgarani; |
| 107 | SA-SSL-MOS: Self-Supervised Learning MOS Prediction with Spectral Augmentation for Generalized Multi-Rate Speech Assessment Highlight: While self-supervised learning (SSL) models have been widely adopted in SQA to boost performance, a key limitation is that they are pre-trained on 16 kHz speech and therefore discard high-frequency information present in higher sampling rates. To address this issue, we propose a spectrogram-augmented SSL method that incorporates high-frequency features (up to 48 kHz sampling rate) through a parallel-branch architecture. |
F. Cao; |
| 108 | CARE-Agent: Multi-Agent Collaboration with Conflict-Aware Routing Mechanism for Diagnosis Prediction Highlight: Deep temporal models excel at capturing sequential patterns but offer limited interpretability, whereas large language models (LLMs) provide contextual clinical reasoning and explanation yet struggle with structured EHR inputs. To bridge these complementary strengths, we present CARE-Agent, a multi-Agent collaboration with a Conflict-Aware Routing mEchanism for accurate and reliable diagnosis prediction, which coordinates various deep predictors and LLMs. |
P. Zhan; |
| 109 | FlowIID: Single-Step Intrinsic Image Decomposition Via Latent Flow Matching Highlight: This makes them costly to combine with other models in real-world settings. To address this problem, we propose a flow matching-based solution. |
M. Singla; S. Kumari; S. Raman; |
| 110 | Enhancing Post-Training Quantization Via Future Activation Awareness Highlight: Although this method is efficient, it suffers from quantization bias and error accumulation, resulting in suboptimal and unstable quantization, especially when the calibration data is biased. To overcome these issues, we propose Future-Aware Quantization (FAQ), which leverages future-layer activations to guide quantization. |
Z. Lv; Z. Fan; Q. Tian; W. Zhang; Y. Zhuang; |
| 111 | Amplitude Optimization Driven Multi-OFDM Waveform Design with Good PMEPR and ISL Performances for Joint Radar and Communications Highlight: To this end, we formulate a trade-off objective that balances the ISL and PMEPR of OFDM and enforce constraints on IE accuracy, total transmit power, and per-subcarrier amplitude bounds. |
X. Xu; Y. Li; R. Tao; T. Shan; |
| 112 | Gaussian-Grounded Contextual Hierarchical Inference for Weakly Supervised Video Anomaly Detection Highlight: Existing multiple instance learning approaches often fail to capture the diversity and contextual dynamics of abnormal events, resulting in limited localization accuracy. To address this, we propose Gaussian-grounded Contextual Hierarchical Inference (GCHI), a novel framework that learns discriminative Gaussian-grounded feature representations via conditional normalizing flows, models long-range temporal dependencies through contextual aggregation, and performs joint coarse-to-fine anomaly inference by aligning visual features with textual semantics and anomaly prototypes. |
W. Zheng; T. Zhang; Z. Cui; C. Xu; |
| 113 | Feature Identification for Hierarchical Contrastive Learning Highlight: However, conventional classification approaches often neglect the relationships between classes at different hierarchy levels, leading to suboptimal performance. To address this limitation, we propose a novel hierarchical contrastive learning method that leverages a Gaussian Mixture Model and an attention mechanism to capture hierarchy-specific features. |
J. Ott; N. Vysotskaya; H. Sun; L. Servadei; R. Wille; |
| 114 | Automatic Music Mixing Using A Generative Model of Effect Embeddings Highlight: Here we introduce MEGAMI (Multitrack Embedding Generative Auto MIxing), a generative framework that models the conditional distribution of professional mixes given unprocessed tracks. |
E. Moliner; |
| 115 | C-Conformer: Channel-Augmented Conformer for Sound Event Localization and Detection Highlight: However, the Conformer is primarily designed for single-channel audio recognition and models only temporal dependencies, overlooking inter-channel features for multi-channel spatial perception. To address this, we propose the Channel-augmented Conformer (C-Conformer), which extends the Conformer by explicitly modeling both temporal and inter-channel relationships. |
C. He; S. Cheng; J. Bao; J. Liu; |
| 116 | Why Temporal Modeling Modules Fall Short in Temporally Sensitive Video-Text Retrieval Tasks Highlight: We propose the CLIP-Preserving Decoupled Temporal Branch (CP-DTB), which uses self-attention and convolution to construct temporal modeling modules, leverages video self-supervised learning to reduce temporal modeling difficulty, and decouples the temporal branch from CLIP’s visual branch to preserve CLIP’s initial feature space. |
C. He; B. Yang; Y. Pang; Y. Cao; |
| 117 | Span Pruning and Syntactic Awareness for Aspect Sentiment Triplet Extraction Highlight: In this paper, we propose SPASA, a model that combines span pruning and syntax awareness, divided into two stages: Named Entity Recognition (NER) and Relation Extraction (RE). |
B. Cui; W. Wang; S. Liu; |
| 118 | Benchmarking Music Autotagging with MGPHot Expert Annotations Vs. Generic Tag Datasets Highlight: Altogether, our contributions provide a more advanced benchmarking framework for future research in music understanding. |
P. Ramoneda; P. Alonso-Jiménez; S. Oramas; X. Serra; D. Bogdanov; |
| 119 | Synergistic Alignment Network for Robust Cross-Subject RSVP-EEG Decoding Highlight: We propose the Synergistic Alignment Network (SynAlign-Net), a novel framework for robust cross-subject decoding. |
B. Fu; W. Gu; F. Li; X. Cai; Y. Niu; |
| 120 | Multi-Band Frequency Prompt Tuning for Source-Free Cross-Domain Few-Shot Learning Highlight: Source-Free Cross-Domain Few-Shot Learning (SF-CDFSL) suffers from the absence of source-domain data and the scarcity of target-domain samples, which makes it challenging to transfer domain knowledge and to learn discriminative representations for novel classes in the target domain. To address these issues, we propose Multi-Band Frequency Prompt Tuning (MB-FPT), a prompt-based framework that simultaneously aligns domain information and enhances class discrimination. |
R. Wu; S. Xiong; |
| 121 | Active Inference Framework for Closed-Loop Sensing, Communication, and Control in UAV Systems Highlight: In this work, we introduce the active inference framework into SCC-enabled uncrewed aerial vehicle systems for joint state estimation, control, and sensing resource allocation. |
G. Pan; L. Bai; Z. Tian; H. Chen; M. Bennis; H. Wymeersch; |
| 122 | Attribute Driven W Space for Query Limited Face Template Inversion Highlight: Existing methods are effective under white box or high query settings, but practical systems are typically black box with restricted query access, as commonly found in real world APIs with strict query limits. To address this challenge, we propose an attribute driven $\mathcal{W}$ space framework leveraging the generative prior of StyleGAN3, which integrates attribute modulated attention, conditional layer normalization, and a low query supervision loss to enable identity preserving reconstruction in this constrained setting. |
Z. Zhou; Z. Shen; L. Dai; P. Yu; K. Gan; Z. Xia; |
| 123 | BINR: Live Video Broadcasting Quality Assessment Highlight: While the Quality of Experience (QoE) has been extensively studied for Video-on-Demand (VoD) services, the QoE of live broadcast videos remains relatively underexplored. In this paper, we address this gap by proposing a novel machine learning–based model for QoE prediction in live video broadcasting scenarios. |
H. Amirpour; M. Hamidi; W. Zhou; L. Atzori; C. Timmerer; |
| 124 | A Unified Four-Stage Dynamic Cycle for Robust Federated Fine-Tuning of Large Language Models Highlight: We present SplitLETS, a unified four-stage cycle that combines LoRA, SAM, and LETS. |
B. Tan; J. Ren; Y. Li; A. Chaddad; |
| 125 | Learning Depth Guidance for Camouflaged Object Detection Without Annotations Highlight: Despite progress in RGB-only unsupervised detection, deep unsupervised RGB-D COD remains largely unexplored. To our knowledge, we present the first unsupervised RGB-D COD framework built on a systematic pseudo-label mining pipeline with two components: Depth-Guided Layer Decomposition (DGLD), which extracts geometric structure from depth maps to produce coarse pseudo masks; and Uncertainty-aware Label Optimization (ULO), which refines them by estimating pixel-wise uncertainty from the depth source. |
T. Han; |
| 126 | HyperFedFS: Heterogeneous Federated Few-Shot Learning With Hypergraph-Driven Collaborative Aggregation Highlight: Federated few-shot learning faces dual challenges arising from data scarcity and client heterogeneity. We propose HyperFedFS, an end-to-end framework that addresses this problem through hypergraph-driven collaborative aggregation mechanisms. |
Q. Tan; Z. Wu; |
| 127 | DrivingScene: A Multi-Task Online Feed-Forward 3D Gaussian Splatting Method for Dynamic Driving Scenes Highlight: We propose DrivingScene, an online, feed-forward framework that reconstructs 4D dynamic scenes from only two consecutive surround-view images. |
Q. Hou; W. Sun; C. Zeng; C. Wang; H. Li; J. Cui; |
| 128 | Graphmd: A Two-Module Diffusion Framework for Smooth and Consistent Molecular Dynamics Highlight: However, their discrete noise-adding process limits the ability to capture smooth oscillatory behavior between consecutive frames and also poses challenges in maintaining spatial structural consistency and effectively processing molecular graph features. To address this, we propose a two-module approach: a molecular graph interaction module, enhanced with classical potential functions, and a diffusion module that uses the Discrete Cosine Transform (DCT) to better capture smooth molecular motions. |
G. Chang; Z. Si; J. Hu; Z. Duan; D. Guo; |
| 129 | SpeechCT-CLIP: Distilling Text-Image Knowledge to Speech for Voice-Native Multimodal CT Analysis Highlight: Yet, nearly all medical AI systems rely exclusively on written text. In this work, we address this gap by exploring the feasibility of learning visual-language representations directly from spoken radiology reports. |
L. Buess; |
| 130 | CIFC-MFD: End-To-End Multi-Face Forgery Detection Using Cross-Image Face Contrast Highlight: Although recent end-to-end detection frameworks have emerged, they are confined to analyzing features within a single image, thus failing to leverage relational information across different images. To address this, we propose the Cross-Image Face Contrast Multi-Face Forgery Detection (CIFC-MFD) framework. |
X. Zhou; J. Xie; P. Yu; C. Ou; J. Fei; Z. Xia; |
| 131 | Bridging SAR and Optical Domains: Synergizing Brownian Bridge Diffusion and Local Contrastive Learning for Image Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Yet, this task remains challenging due to intrinsic sensor limitations (e.g., speckle noise, complex scattering) and algorithmic constraints (e.g., GAN instability, structural degradation of the diffusion models). To address the above challenges, a novel approach named LCCBBDM (Local Contrastive Conditional Brownian-Bridge Diffusion Model) is proposed in this paper, which synergizes the conditional Brownian-bridge diffusion model with local contrastive learning. |
Z. Dai; C. Huo; Z. Ren; |
| 132 | Direct Rician-Domain Processing for Noise-Aware MRI Denoising and Microstructure Preservation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose a framework that operates directly in the Rician domain, with contributions including accurate Rician noise estimation, a noise-aware Block-Matching and 3D (BM3D) filtering that adapts denoising strength to local noise, and an adaptive Laplacian sharpening scheme guided by local variance and entropy to restore fine structures. |
S. Mirzaei; P. Nasiopoulos; K. Plataniotis; |
| 133 | Real-Time Markov Modeling for Single-Photon LiDAR: 1000× Acceleration and Convergence Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper presents the first non-sequential Markov modeling for the timestamp distribution. |
W. Zhang; H. K. Weerasooriya; P. Chennuri; S. H. Chan; |
| 134 | No-Reference Night-Time Image Quality Assessment Via Self-Supervised and Meta-Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present MESA-IQA, which combines self-supervised pre-training with meta-learning to reduce reliance on subjective labels. |
Y. Chen; Q. Sang; |
| 135 | A Conversational Entity Linking Method Based on Sentence Level and Token Level Dual Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Abstract: With the development of intelligent assistants, dialogue systems have become increasingly important. Understanding user utterances is crucial for promoting human-machine … |
H. Cheng; S. Li; H. Zhang; M. Fang; S. Liu; |
| 136 | RAFS: Retrieval-Augmented Few-shot CAD Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Despite recent advances, existing annotated data severely limits the applicability of deep learning approaches, and existing CAD retrieval techniques, which predominantly operate at the part level, often introduce retrieval noise that degrades segmentation performance. To overcome these challenges, we introduce a Retrieval-Augmented Few-shot CAD Segmentation Framework (RAFS), designed to effectively mitigate retrieval-induced noise while leveraging limited annotations in conjunction with input parts. |
Z. Xia; S. Zhao; C. Du; Z. Xiang; B. Cheng; |
| 137 | MCPO: Dynamic Masking and Multi-Comparison Policy Optimization Algorithm for LLM Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we introduce Dynamic Masking and Multi-Comparison Policy Optimization (MCPO), a novel framework designed to enhance the reasoning robustness of LLMs. |
F. Ding; B. Wang; Xiaoping-Zhang; W. Ding; |
| 138 | A User-Item Aware Encoding Framework for Short Video Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Traditional encoding methods from the Professional Generated Content (PGC) era face significant challenges due to two unique characteristics of user-generated short videos: massive daily uploads (reaching hundreds of millions) and heterogeneous content-consumer relationships (varying video quality and diverse consumer contexts). To address these challenges, we propose UIAE (User-Item Aware Encoding), a novel multiple bitrate ladder group encoding method comprising three key components: 1) Establishing user-item relationships via rule-based or DNN-based models; 2) Developing local optimization models maximizing Quality of Experience (QoE) for sub-populations based on contextual consumption patterns; 3) Deriving globally optimal encoding strategies through hierarchical model integration. |
W. Deng; H. Liu; B. Wang; X. Li; D. Fu; Z. Wang; |
| 139 | Precision Neural Networks: Joint Graph and Relational Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To make precision estimation task-aware, we formulate an optimization problem that jointly learns the network parameters and the precision matrix, and solve it via alternating optimization, by sequentially updating the network weights and the precision estimate. |
A. Cavallo; S. Rey; A. G. Marques; E. Isufi; |
| 140 | EmoDrive: An Emotion-Aware Vision-Language Model for Human-Centric Autonomous Driving Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper introduces EmoDrive, a Vision-Language Model (VLM)-based framework that integrates real-time facial emotion recognition with road scene perception to enable adaptive, human-centric driving. |
X. Zhang; Z. Z. Hu; C. Wang; X. Chen; Q. Qu; |
| 141 | Universal Denoising Patterns for Diffusion Image Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Considering that the fake images closely resemble real ones, we propose a feature separation loss to enhance the detector’s discrimination capacity. |
Y. Qian; Q. Cai; Y. Pan; T. Yao; Y. Chen; T. Mei; |
| 142 | A Data-Informed Adaptive Convolution Kernel Learning Method for Image Fusion Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we propose a novel data-informed adaptive convolution kernel learning method. |
L. Dai; |
| 143 | DTT-BSR: GAN-Based DTTNet with RoPE Transformer Enhancement for Music Source Restoration Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: The challenge lies in both separating overlapping sources and reconstructing signals degraded by production effects such as compression and reverberation. We therefore propose DTT-BSR, a hybrid generative adversarial network (GAN) combining a rotary positional embedding (RoPE) transformer for long-term temporal modeling with a dual-path band-split recurrent neural network (RNN) for multi-resolution spectral processing. |
S. Tan; |
| 144 | Magnet Tracking By A Magnetic Sensor Array with Interactive Multiple Model Estimation For Small-Scale Applications Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper proposes a passive magnetic source tracking method for potential medical applications that can operate in environments without line-of-sight. |
H. Hou; S. Xu; K. C. Ho; M. Cai; K. Doğançay; T. Xu; |
| 145 | TIWNet: A Template-Based Real-Time Image Watermarking Method Using Invertible Neural Network Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, image-dependent watermarking networks suffer from high embedding costs, while encoder-decoder-based template methods reduce these costs but introduce excessive redundancy that affects image visual quality. To address these limitations, we propose a template-based image watermarking method using the Invertible Neural Network (INN). |
P. Zhou; Y. Li; Y. Zhao; Y. Wu; S. Liu; |
| 146 | Lyapunov-Constrained Integral Reinforcement Learning for Stable Admittance Control in Non-Rigid Environments Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present a Lyapunov-Constrained Integral Reinforcement Learning (LC-IRL) framework for online admittance parameterization that embeds an energy-based Lyapunov projection into each policy update, enforcing a prescribed decay of the virtual mass–spring energy during learning. |
C. Xu; |
| 147 | Repeater-Assisted Massive MIMO Full-Duplex Communications Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We consider a wireless network comprising multiple single-antenna repeaters that amplify and instantaneously re-transmit received signals in a full-duplex (FD) communication setting. |
M. Mohammadi; D. Kudathanthirige; H. A. Suraweera; H. Quoc Ngo; M. Matthaiou; |
| 148 | Peeking Into The Future for Contextual Biasing Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we propose a contextual biasing method for attention based encoder decoder (AED) models using a list of candidate named entities. |
R. Selvakumar; C. Tseng; E. Kim; V. R. Apsingekar; Y. Tang; |
| 149 | Pseudo-Siamese Network for Planning in Target-Oriented Proactive Dialogues Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we propose a Forward-Focused Bidirectional Pseudo-Siamese Network (FF-BPSN) for dialogue path planning toward predefined dialogue targets. |
X. Kang; M. Li; Y. Zheng; F. Kong; |
| 150 | ST-HNTM: Joint Speech-Text Neural Topic Modeling on The Hypersphere Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we propose ST-HNTM, the first multimodal neural topic model that jointly integrates speech and text within a shared hyperspherical latent space. |
D. Guo; Z. Luo; N. Bouguila; W. Fan; |
| 151 | Compound-QA: A Benchmark for Evaluating LLMs on Compound Questions Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce Compound Question Synthesis (CQ-Syn) to build Compound-QA, a benchmark targeting questions composed of multiple interrelated sub-questions. |
Y. Hou; |
| 152 | Unlocking The Potential of Social Media Preference for Annotation-Efficient Large Language Model Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, existing Social Media Preference (SMP) datasets show a significant performance gap compared to benchmark preference datasets. To bridge this gap, we introduce FIR, an analytical framework that dissects an SMP dataset into three core components: Feedback, Instruction, and Response pairs. |
W. Zhang; W. Che; |
| 153 | Image-Pixel Realignment for Open-Vocabulary Semantic Segmentation Via Self-Training Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce STSeg, a novel framework that integrates pixel-level semantic alignment with adaptive self-training. |
A. Yang; Q. Liu; Y. Fan; Q. Zhou; |
| 154 | Clustering of Multisource Remote Sensing Data Via Low-Rank Tensor Learning with Spatial Constraints Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, existing clustering methods often struggle with limited spatial modeling, weak cross-source consistency, and poor scalability. To tackle these challenges, this paper proposes an innovative method called Clustering of Multisource Remote Sensing Data via Low-Rank Tensor Learning with Spatial Constraints (LRTSC). |
Z. Cao; |
| 155 | EATS2: Enabling Efficient and Accurate Trajectory Similarity Computation Via Self-Training Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Moreover, real-world datasets often suffer from sparsity in urban areas, further limiting the availability of sufficiently similar pairs to build robust training sets. To address these challenges, we propose EATS2, an Efficient and Accurate Trajectory Similarity Computation Framework via Self-training. |
Z. Cao; |
| 156 | DyWPE: Signal-Aware Dynamic Wavelet Positional Encoding for Time Series Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce Dynamic Wavelet Positional Encoding (DyWPE), a novel signal-aware framework that generates positional embeddings directly from input time series using the Discrete Wavelet Transform (DWT). |
H. Irani; V. Metsis; |
| 157 | Dual-Path Compression for Real-time Multimodal Clickbait Detection: Quantization and Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present a dual-path compression framework for multimodal clickbait detection, evaluated on a new fine-grained Chinese dataset (15,012 samples) that distinguishes three deception mechanisms: non-clickbait, curiosity gap, and content mismatch. |
H. Song; |
| 158 | Generative Spatiotemporal Modeling for Uncertainty Quantification in High-Dimensional Physical Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Traditional deterministic models fail to capture this, collapsing to blurry, physically implausible mean-state predictions and offering no measure of confidence. We introduce Prism, a generative spatiotemporal framework that directly addresses this by learning the probability distribution of future states. |
F. Liu; |
| 159 | Depth-Guided Metric-Aware Temporal Consistency for Monocular Video Human Mesh Recovery Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose a comprehensive depth-guided framework that achieves metric-aware temporal consistency through three synergistic components: A Depth-Guided Multi-Scale Fusion module that adaptively integrates geometric priors with RGB features via confidence-aware gating; A Depth-guided Metric-Aware Pose and Shape (D-MAPS) estimator that leverages depth-calibrated bone statistics for scale-consistent initialization; A Motion-Depth Aligned Refinement (MoDAR) module that enforces temporal coherence through cross-modal attention between motion dynamics and geometric cues. |
J. Cen; |
| 160 | HMD: Enhancing Vision Transformer Distillation Via Mask Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: ViTs inherently excel at mask reconstruction due to their flexible input processing and global attention mechanisms. To leverage this capability, we propose Hierarchical Mask Distillation (HMD), a novel distillation framework that integrates a mask reconstruction objective. |
X. Shi; K. Pu; J. Yan; B. Zheng; Z. Cheng; |
| 161 | Deformable Attention Graph Representation Learning for Histopathology Whole Slide Image Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we propose a novel GNN framework with deformable attention for pathology image analysis. |
M. Fu; |
| 162 | CSFusion: Flexible Multi-Modal Image Fusion Via Content-Style Cross Modulation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose CSFusion, a novel model framing fusion as a cross-modal bidirectional image style transfer problem. |
Y. Song; X. Ma; X. Cai; L. Duan; X. Xu; S. Wan; |
| 163 | TimeDiff: Leveraging Differential Domain Representations for Long Time Series Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, in many real-world scenarios, the detailed variations in time series (i.e., its differences) are critical for decision-making. To address this gap, we propose TimeDiff, a novel framework for long time series forecasting that enhances predictive accuracy by modeling in the differential domain. |
Y. Tao; |
| 164 | Multimodal Palpation Sensing for Precise Prediction of Breast Lump Hardness and Size Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce PalpationDataset, a dataset with detailed characterization of lump size and hardness, collected from controlled silicone phantoms and heterogeneous porcine tissue using force–motion signals. |
Y. Zang; |
| 165 | Summary of The Inaugural Music Source Restoration Challenge Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present the inaugural MSR Challenge, featuring objective evaluation using Multi-Mel-SNR, Zimtohrli, and FAD-CLAP on studio-produced mixtures, alongside subjective evaluation on real-world degraded recordings. |
Y. Zang; |
| 166 | DISCERN: Discrepancy Learning for Weakly Supervised Medical Image Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, their performance often degrades in medical imaging due to insufficient consideration of medical characteristics, such as distributional discrepancies, ambiguous boundaries, and structural interference. To address these issues, we propose an innovative discrepancy learning model, DISCERN, which harnesses distribution discrepancies to enhance the localization of medical regions of interest. |
G. Su; |
| 167 | CVaR-Aware Network Slicing for Tail Latency Under Tiered Deadlines Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present a conditional value-at-risk (CVaR)-aware network slicing framework that provides end-to-end resource isolation and explicitly optimizes the tail of the delay distribution while enforcing hard-deadline reliability targets. |
S. Niu; Q. Peng; Z. He; |
| 168 | TRM-UNet: An Efficient Event-Guided Motion Deblurring Network Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present TRM-UNet, a compact yet powerful U-shaped architecture for event-based deblurring. |
D. Fan; X. Tang; Q. Chen; F. Xu; |
| 169 | Towards Multi-View Hierarchical Video-to-Piano Generation with MIDI Guidance Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose a hierarchical V2P framework that introduces MIDI as an intermediate representation, with progressive MIDI prediction (pitch, velocity, sustain) guiding waveform synthesis. |
C. Liu; Z. Chen; G. Chen; C. Ding; N. Sebe; |
| 170 | Dual-Space Knowledge Distillation with Key-Query Matching for Large Language Models with Vocabulary Mismatch Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we systematically analyse the attention mechanism of DSKD-CMA through manual token alignment probing and heatmap visualisations, revealing both strengths and limitations. |
S. E. Tsiapali; C. -T. Do; K. Knill; |
| 171 | EndoCaver: Handling Fog, Blur and Glare in Endoscopic Images Via Joint Deblurring-Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose EndoCaver, a lightweight transformer with a unidirectional-guided dual-decoder architecture, enabling joint multi-task capability for image deblurring and segmentation while significantly reducing computational complexity and model parameters. |
Z. Wu; |
| 172 | IRIS: Low-Complexity High-Efficiency Neural Network Codec for Real-Time Audio Transmission Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper introduces IRIS (Internet Real-time Intelligent Streaming Codec): an end-to-end, low-complexity, low-latency neural audio codec. |
Z. Wu; |
| 173 | Intrinsic Semantic Consistency Enhancement for Robust Hierarchical Understanding in VLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: While effective, our work reveals that this approach often disrupts generalization on coarse-grained concepts and fails to correct the models’ inherent pre-training biases. To address this, we introduce the Intrinsic Semantic Consistency Enhancement (InCoe) framework. |
Z. Wu; |
| 174 | Diffusion-Based Natural Adversarial Perturbations Towards Segment Anything Model Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Although different adversarial attacks targeting SAM have been proposed, they have not paid sufficient attention to the stealthiness of malicious images crafted by attackers. In this paper, we introduce a diffusion-based approach that generates natural adversarial samples targeting SAM, such that the perturbed images remain imperceptibly natural to human observers while leading to incorrect segmentation. |
H. Xiao; |
| 175 | TEAMo: Trait and Emotion Aware Motion Generation in 3D Human Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Despite recent advances, existing approaches are largely confined to simplistic categorical style descriptors, failing to capture continuous personality traits and thus compromising emotional richness and psychological realism. To bridge this gap, we propose the Trait and Emotion Aware Motion generation framework (TEAMo), a psychologically grounded approach that explicitly integrates personality traits into the motion synthesis pipeline. |
B. Tang; D. Zhu; S. -G. Kuai; C. -L. Deng; |
| 176 | Int-MeanFlow: Few-Step Speech Generation with Integral Velocity Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, its direct application to TTS encounters challenges, including GPU memory overhead from Jacobian-vector products (JVP) and training instability due to self-bootstrap processes. To address these issues, we introduce IntMeanFlow, a framework for few-step speech generation with integral velocity distillation. |
W. Wang; R. Cao; Y. Guo; Z. Chen; K. Chen; Y. Huo; |
| 177 | Data-Driven Algorithms for Robust or Selective CFAR Detection in Colored Gaussian Noise Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we introduce two novel, low-complexity, learning-based CFAR detectors that leverage maximal invariant statistics as input features: (i) a lightweight neural network with residual encoder blocks, and (ii) a modified single-layer network with random features. |
A. Coluccia; E. Mele; A. Fascista; |
| 178 | ConfMamba-SAM: Structured State Space Modeling with Memory-Augmented Prompting for Automatic Brain Lesion Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Accurate and consistent brain lesion segmentation from clinical MRI and CT volumes remains challenging due to microscale lesions, low contrast, anisotropic resolution, and interslice discontinuities. To address these issues, we propose ConfMamba-SAM, an end-to-end, fully automatic segmentation framework that leverages a frozen foundation model backbone with lightweight, trainable adapters for efficient adaptation. |
Z. Cheng; |
| 179 | Grassmannian Kernel Framework for Site Effect Correction in Multi-Site fMRI Studies Related Papers Related Patents Related Grants Related Venues Related Experts View Save Abstract: Functional magnetic resonance imaging (fMRI) is widely used in neurobiology studies to investigate brain networks, and large-scale fMRI studies increasingly rely on data pooled … |
W. Wang; |
| 180 | Covariance Filters and Neural Networks Over Hilbert Spaces Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we take a first step by introducing a novel convolutional learning framework for signals defined over infinite-dimensional Hilbert spaces, centered on the (empirical) covariance operator. |
C. Battiloro; A. Cavallo; E. Isufi; |
| 181 | RLBR: Reinforcement Learning with Biasing Rewards for Contextual Speech Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper presents a novel fine-tuning method, Reinforcement Learning with Biasing Rewards (RLBR), which employs a specialized biasing words preferred reward to explicitly emphasize biasing words in the reward calculation. |
B. Ren; R. Fan; Y. Shen; W. Chen; J. Li; |
| 182 | Autoregressive-Gaussian Mixture Models: Efficient Generative Modeling of WSS Signals Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose a generative model that integrates Autoregressive (AR) parameterization into a Gaussian Mixture Model (GMM) for modeling Wide-Sense Stationary (WSS) processes. |
K. Klein; B. Böck; N. Turan; W. Utschick; |
| 183 | Precoder Design in Multi-User FDD Systems with VQ-VAE and GNN Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, by utilizing a vector quantized-variational autoencoder (VQ-VAE), we circumvent one of the key drawbacks of GMMs, i.e., that the number of GMM components scales exponentially with the number of feedback bits. |
S. Allaparapu; M. Baur; B. Böck; M. Joham; W. Utschick; |
| 184 | Multi-User Channel Estimation With One-Bit ADCs: A Semi-Blind Approach Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This comes with a high computational cost for large antenna arrays and many users. This article tackles this problem by deriving an approximation of the well-known arcsine law to formulate the multi-user channel estimation problem from a per-user perspective, as is standard practice in full-resolution systems. |
F. Weißer; W. Utschick; |
| 185 | Superpixel-Informed Continuous Low-Rank Tensor Representation for Multi-Dimensional Data Recovery Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, classical LRTR methods face two critical limitations: (1) they assume holistic data is low-rank, which is often violated in real-world scenarios with significant spatial variations; and (2) they are constrained to discrete meshgrid data, limiting flexibility. To overcome these limitations, we propose a Superpixel-informed Continuous Low-Rank Tensor Representation (SCTR) framework. |
Z. Wang; J. Wang; R. Zheng; Z. Wu; |
| 186 | RAVE: Retrieval and Scoring Aware Verifiable Claim Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present RAVE (Retrieval and Scoring Aware Verifiable Claim Detection), a framework that combines evidence retrieval with structured signals of relevance and source credibility. |
Y. Li; A. Zubiaga; |
| 187 | Securing INR-Based Steganography with Quantum Circuit-Driven Weight Initialization Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, existing methods rely on low-entropy seeds to locate hidden information, leaving them vulnerable to brute-force attacks. To address this, we propose a novel parameterized quantum circuit-based initialization scheme. |
Q. Song; H. Han; Z. Luo; J. Qi; R. Wan; |
| 188 | Anchor Field Consistency for Imperceptible Adversarial Attacks on 3D Point Clouds Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We attribute this to nearest-neighbor correspondence sensitivity when comparing clean and adversarial shapes. To address this, we propose Anchor Field Consistency (AFC), which evaluates the clean and adversarial shapes at the same anchors. |
K. Tang; Z. Cao; W. Peng; X. Wang; P. Zhu; Z. Tian; |
| 189 | DFFNet: Combining Similar and Different Dual Feature Flows to Achieve Multiple Weather Removal Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose DFFNet, a dual-stream network for all-in-one weather image restoration. |
S. Liu; K. Zuo; W. Xu; H. Xiao; |
| 190 | DELNet: Continuous All-in-one Weather Removal Via Dynamic Expert Library Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose DELNet, a continual learning framework for weather image restoration. |
S. Liu; K. Zuo; H. Xiao; |
| 191 | Multi-Modal Fake News Detection Via Intra-Calibrated Cross-Modal Fusion and Modality-Wise Attention Aggregation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To address these, we propose an Intra-calibrated Cross-modal Fusion and Modality-wise Attention Aggregation (ICFMA) method. |
L. Zhao; H. Wei; Y. Wang; C. He; |
| 192 | FedCADS: Robust Federated Learning Via Dual Distillation and Participation-Aware Optimization Under Non-IID Data Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, it still faces challenges related to incomplete utilization of the global model and errors induced by partial client participation. To address these challenges, we propose a novel FL paradigm, named FedCADS, which uses a dynamic dual distillation mechanism to effectively utilize the global model to guide local model training, achieving a multi-level client drift reduction. |
J. Lai; D. Li; F. Zhang; R. Wang; J. Hu; H. Cheng; |
| 193 | FW-VTON: Flattening-and-Warping for Person-to-Person Virtual Try-On Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present FW-VTON, a three-stage framework: (1) garment flattening to reconstruct a canonical, pose-agnostic garment from the source; (2) garment warping to align the flattened garment with the target pose; and (3) seamless integration onto the target person. |
Z. Wang; |
| 194 | ICPO: Illocution-Calibrated Policy Optimization for Multi-Turn Conversation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We find that standard post-training techniques like Reinforcement Learning with Verifiable Rewards (RLVR) exacerbate this issue by rewarding confident, direct answers, thereby inducing overconfidence and discouraging the model from seeking clarification. To address this, we propose Illocution-Calibrated Policy Optimization (ICPO), a novel training framework that sensitizes the model to instruction ambiguity. |
Z. Wang; |
| 195 | Decomposing Multilingual Representations: How Scale, Architecture, and Data Shape Functional Specialization Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper introduces a framework to dissect their internal representations, revealing a phenomenon we term Functional Specialization: the emergence of distinct neural circuits for language-specific form versus language-agnostic semantics. |
Z. Wang; |
| 196 | CodEOE: A Benchmark for Jointly Extracting Cross-Document Events and Opinions From Social Media Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Event extraction and opinion/sentiment analysis have been extensively studied in recent decades, but their joint study remains under-explored. To bridge this gap, we introduce a challenging Cross-Document Event-Opinion Extraction (CodEOE) task, which requires a model to extract event triggers and arguments as well as their associated opinions or sentiments by understanding cross-document long contexts. |
Z. Wang; |
| 197 | CTR-LoRA: Curvature-Aware and Trust-Region Guided Low-Rank Adaptation for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we introduce CTR-LoRA, a framework guided by curvature trust region that integrates rank scheduling with stability-aware optimization. |
Z. Wang; |
| 198 | Conjugate Relation Modeling For Few-Shot Knowledge Graph Completion Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Existing methods, however, struggle to capture complex relational patterns and mitigate data sparsity. To address these challenges, we propose a novel FKGC framework for conjugate relation modeling (CR-FKGC). |
Z. Wang; Q. Zeng; H. Duan; C. Cheng; M. Zou; Z. Wang; |
| 199 | CGNN+: A Graph Neural Instrumental Variable Framework for Robust Causal Inference in Networked Data Highlight: We propose CgNN+, a graph neural instrumental variable framework that treats the network topology as a structured source of instrumental variation. |
X. Du; |
| 200 | Hi-Former: A Hierarchical Transformer Pedestrian-Vehicle Detector Highlight: Traffic scene perception poses unique challenges due to occlusion, scale variation, and dense interactions among pedestrians and vehicles. To tackle these issues, we propose Hi-Former, a traffic-oriented transformer detector equipped with the Hierarchical Feature Interaction (HiFI) module, which leverages Channel-grouped Multi-head Attention (CHMA) with local positional encoding to reduce redundancy and enhance hierarchical feature interaction. |
B. Fang; W. Liu; H. Liu; F. Yan; T. Deng; |
| 201 | Physics-Driven 3D Gaussian Rendering for Zero-Shot MRI Super-Resolution Highlight: We propose a zero-shot MRI SR framework using explicit Gaussian representation to balance data requirements and efficiency. |
S. Liu; L. Zhang; W. Huang; Z. Zhang; Z. Wang; |
| 202 | FABEM: Frequency-Aware Boundary Enhancement Module for Small Object Detection Highlight: While DETR-based models have shown strong performance in terms of training efficiency and inference capability, these models struggle with preserving high-frequency boundary details during the downsampling process, leading to degraded boundary details essential for small object detection. To address this limitation, this paper introduces a frequency-aware boundary feature enhancement module (FABEM) tailored for DETR-like detection frameworks. |
S. Liu; |
| 203 | DualGuard: Two-Stage Alignment Preservation for Safe PEFT Highlight: We propose DualGuard, a holistic two-stage framework that ensures both behavioral stability and parameter integrity. |
S. Liu; |
| 204 | A Game-Theoretic Approach for Distributed MEC-Enabled Collaborative Inference in AIGC Networks Highlight: Specifically, we explore a distributed MEC-enabled AIGC network with heterogeneous GDMs, where multiple MUs each hold a local GDM and an ES hosts multiple GDMs. |
L. Ye; Z. Xiong; L. Gao; D. Niyato; |
| 205 | ReTools: Reflection-Enhanced Tool Invocation for Domain-Specific QA Highlight: Existing approaches such as ReAct and RestGPT partially mitigate these issues yet remain limited in handling multi-step reliability, iterative recovery, and domain robustness. To address these gaps, we propose ReTools, a Tree-of-Thoughts (ToT) based framework that integrates three modules: (1) task planning, which decomposes complex queries into executable subtasks; (2) tool planning, which selects tools and generates accurate parameters while supporting reflective correction; and (3) reflective iteration, which monitors execution results and adapts to domain-specific requirements. |
F. Dong; |
| 206 | Robust Unsupervised Set-Level Anomaly Detection for Small Test-Time Sets Highlight: We propose a method based on a PoolModel autoencoder, introducing an outputset oversampling (enlarging the decoder’s output set) to reduce score variance for small sets. |
R. Nakai; |
| 207 | PGSENet: Prior-Guided Spectrum Enhancement Network Highlight: This disrupts the continuity of high- and low-frequency information along certain directions in the spectrum, which often leads to insufficient detail recovery. To address this issue, we propose a Prior-Guided Spectrum Enhancement Network (PGSENet). |
T. Mei; Y. Hu; L. Chen; Y. Fang; Q. Lin; Y. Wu; |
| 208 | Asymmetric Region Denoising and Rotation Equivariant for Image Reflection Symmetry Detection Highlight: Asymmetric regions act as background clutter that disrupts symmetric pattern matching, while convolutional neural networks fail to preserve consistent transformations for symmetric features under image variations. To address these issues, we propose an Asymmetric Region Denoising Module (ARD) and Rotation Equivariant Feature Similarity Matching (REFSM) that effectively suppress asymmetric interference and extract refined symmetric patterns. |
D. Yin; R. Su; C. Zhao; F. Yu; |
| 209 | Inverse Halftoning Via Weighted Sobel Conditioned Diffusion Model Highlight: This paper proposes DiffSo, a weighted Sobel-conditioned diffusion model for high-fidelity inverse halftoning. |
S. Shen; J. Yao; D. Zhang; K. Tang; D. Zhao; Z. Gu; |
| 210 | Evaluating Compositional Structure in Audio Representations Highlight: We propose a benchmark for evaluating compositionality in audio representations. |
C. Chen; B. Steers; B. McFee; J. Bello; |
| 211 | Multi-Scale Generative Modeling for Fast Sampling Highlight: To this end, we propose multi-scale generative modeling in the wavelet domain that employs distinct strategies for handling low- and high-frequency bands. |
X. Xiao; |
| 212 | Surgical-CLIP: A Dual-Branch Temporal CLIP for Surgical Video Recognition Highlight: Existing methods mainly rely on a single temporal scale, which fails to capture both aspects. To overcome this limitation, we present Surgical-CLIP, a dual-branch extension of CLIP that learns complementary long- and short-horizon representations from surgical video. |
M. He; M. Zhang; W. Yuan; |
| 213 | Full-to-Missing Modality Knowledge Distillation for Multimodal 3D Semantic Segmentation Highlight: However, most studies assume complete multimodal inputs, ignoring that practical factors, e.g., camera failures, can cause missing modalities and severely degrade the performance of models. To tackle this problem, we propose a novel Full-to-Missing Modality Knowledge Distillation framework (FMKD) for multimodal 3D semantic segmentation under missing conditions. |
X. Cai; X. Ma; Y. Song; L. Duan; X. Xu; S. Wan; |
| 214 | Deep Spatio-Temporal Models for Decoding Purkinje Cell Activity in Tongue Movements Highlight: In this work, we investigate whether spike activity from Purkinje cells can be used to classify targeted licking behavior in mice. |
M. Zeeshan; L. Bina; L. W. J. Bosman; C. I. De Zeeuw; M. A. Siddiqi; M. Taj; |
| 215 | Staged Diffusion with Hybrid Mixture-of-Experts (MOE) for Multimodal Sentiment Analysis Highlight: Humans resolve such ambiguity through a learning-before-correction strategy: first aligning facial expressions, vocal tone, and speech, then using this knowledge to infer or correct meanings. To mimic this process, we propose SDHM (Staged Diffusion with Hybrid Mixture-of-Experts), a two-stage framework. |
K. Zheng; G. Sheng; |
| 216 | FS-LoRA: Fast and Slow Low-Rank Adaptation for Class Incremental Learning Highlight: In this paper, we propose a new LoRA-based CIL method, called Fast and Slow Low-Rank Adaptation (FS-LoRA). |
Y. Hou; X. Tong; K. Mu; |
| 217 | HREI: Hybrid Long-Short Retrieval and Efficient Inference for Knowledge Base Question Answering Highlight: We propose HREI, a novel low-resource KBQA framework that enhances retrieval accuracy and reasoning efficiency. |
S. Liu; X. Su; J. Li; Z. Duo; G. Gao; |
| 218 | Emilia-NV: A Non-Verbal Speech Dataset with Word-Level Annotation for Human-Like Speech Modeling Highlight: However, existing corpora rarely provide large-scale, word-level annotations of such cues, leaving them overlooked in conventional ASR and TTS research. Therefore, we present Emilia-NV, the first large-scale Mandarin corpus with word-level annotations for both lexical content and 18 paralinguistic vocalizations. |
H. Liao; |
| 219 | Accelerating Federated Learning Through Dropout of Renewable Neuron Parameters Highlight: We propose a novel Federated Dropout framework, Gecko, which lowers communication by dropping part of the model parameters and approximately reconstructs the full model after transmission, thus balancing efficiency and accuracy. The core of Gecko is a low-rank decomposition method with minimal information loss: critical parameters are fully kept in a retained matrix, while secondary parameters are reconstructed via their linear relations with the preserved ones. |
H. Liao; |
| 220 | Decorrelation-Enhanced Multiband Subband Adaptive Filtering for RIR Tracking in Sound Field Control Highlight: In this work, we propose a multiband structured subband adaptive filtering approach to RIR tracking, which effectively reduces the impact of input signal correlation due to colored excitation. |
J. Zhang; J. Xie; D. Shi; W. Zhang; J. Chen; J. Benesty; |
| 221 | Dual-Graph: Protocol Interaction-Aware Flow Representation for Accurate Unidirectional Encrypted Traffic Classification Highlight: In this paper, we propose Dual-Graph, a protocol interaction-aware representation framework for accurate unidirectional ETC. |
Z. Gu; |
| 222 | Task Vector in TTS: Toward Emotionally Expressive Dialectal Speech Synthesis Highlight: However, cross-style synthesis combining both dialect and emotion remains challenging and largely unexplored, mainly due to the scarcity of dialectal data with emotional labels. To address this, we propose Hierarchical Expressive Vector (HE-Vector), a two-stage method for Emotional Dialectal TTS. |
P. Feng; |
| 223 | Large Vision Models Can Solve Mental Rotation Problems Highlight: In this work, we present a systematic evaluation of ViT, CLIP, DINOv2, and DINOv3 across a range of mental-rotation tasks, from simple block structures similar to those used by Shepard and Metzler to study human cognition, to more complex block figures, three types of text, and photo-realistic objects. |
S. R. Mason; A. Gjølbye; P. C. Højbjerg; L. Tětková; L. K. Hansen; |
| 224 | GRNet: Graph Reconstruction Network for Robust Multimodal Sentiment Analysis Highlight: However, in real-world scenarios, missing modalities pose significant challenges. To address this, we propose a novel framework, Graph Reconstruction Network (GR-Net), which leverages temporal and neighbor alignment relationships in multimodal data to reconstruct missing information. |
Z. Xu; L. Tian; P. Zhang; X. Peng; H. Yao; |
| 225 | RGSC: Retrieve and Then Generate Image-Text Pairs from Semantic Concepts for Unsupervised Vision-Language Pre-Training Highlight: Existing UVLP approaches are mainly generation-based or retrieval-based: the former produces well-aligned but overly simplistic pairs, while the latter provides richer samples but suffers from weak alignment. To tackle these problems, we propose a method to Retrieve and then Generate image-text pairs from Semantic Concepts (RGSC). |
Z. Xu; W. Zhao; S. Ji; P. Zhang; K. Zhang; H. Yao; |
| 226 | Deep Dubbing: End-to-End Auto-Audiobook System with Text-to-Timbre and Context-Aware Instruct-TTS Highlight: While TTS boosts efficiency, it struggles with emotional expression, intonation control, and contextual scene adaptation. To address these challenges, we propose DeepDubbing, an end-to-end automated system for multi-participant audiobook production. |
Z. Dai; |
| 227 | Towards Building Speech Large Language Models for Multitask Understanding in Low-Resource Languages Highlight: This limitation arises from three factors: (1) existing commonly used speech encoders, like the Whisper family, underperform in low-resource languages and lack support for broader spoken language understanding tasks; (2) the ASR-based alignment paradigm requires training the entire SLLM, leading to high computational cost; (3) paired speech–text data in low-resource languages is scarce. To overcome these challenges in the low-resource language Thai, we introduce XLSR-Thai, the first self-supervised learning (SSL) speech encoder for Thai. |
M. Shao; |
| 228 | Inverse Rendering for High-Genus 3D Surface Meshes from Multi-View Images with Persistent Homology Priors Highlight: Reconstructing 3D objects from images is inherently an ill-posed problem due to ambiguities in geometry, appearance, and topology. This paper introduces collaborative inverse rendering with persistent homology priors, a novel strategy that leverages topological constraints to resolve these ambiguities. |
X. Gao; |
| 229 | From Past To Future: Leveraging Event Causality For Explainable Prediction With Large Language Models Highlight: We introduce CAPE (Causal-Aware event Prediction with Explanation), a comprehensive framework that uses natural-language events related to the target entity for open-ended prediction and generates causal explanations. |
X. Gao; |
| 230 | Meta-Reinforcement Learning with Contextual Bias Reduction Highlight: However, due to the uneven distribution of these historical data in different tasks, the context variable tends to memorize trajectory features from training rather than capturing essential task information, introducing bias and impairing the agent’s generalization ability. In response to this challenge, we examine this problem from a causal perspective and identify that contextual bias arises from the indirect effect via the indirect path from task to context variable. |
S. Lan; |
| 231 | SafeTR: Verifiable Semantic Tree-Ring Watermark for Diffusion Model Against Forgery Attacks Highlight: However, current methods are vulnerable to semantic-forgery attacks that exploit their static, global secret keys to transplant or reuse watermarked latents, breaking the assumption that detection implies authenticity. We propose SafeTR, a hardened framework that counters these attacks by replacing the static key with a dynamic, per-image index embedded in the latent space. |
J. Xie; P. Yu; J. Fei; X. Zhou; Z. Xia; |
| 232 | Deepfake-HMDE: Hierarchical Mixture of Deepfake Experts For Deepfake Detection Highlight: However, the straightforward application of MLLMs faces two key issues: (i) data heterogeneity among various deepfake methods, and (ii) insufficient robustness for different deepfake methods. To address these issues, we propose a hierarchical mixture-of-experts framework tailored for deepfake detection, i.e., Deepfake-HMDE. |
Z. Ren; J. Zhang; X. Feng; Y. Li; C. Chen; |
| 233 | NMGE: Nested Multi-Granularity Expert Groups for Complexity-Aware Routing in Multilingual Translation Highlight: In this work, we propose Nested Multi-Granularity Expert Groups (NMGE), a novel MoE architecture where experts are organized into groups of varying sizes in a nested structure. |
L. Shao; |
| 234 | Progressively Injecting Structural Semantics from The Frequency Domain Into Mamba for Accurate Curvilinear Structure Segmentation Highlight: While Mamba effectively models global dependencies in curvilinear structures, its sequence-based state-space modeling introduces the issue of structural fragmentation. To address this, we propose High-Frequency Refinement VMamba (HR-VMamba), a method that progressively injects structural semantics into Mamba to refine its representation. |
W. Cai; |
| 235 | PLA-Loss: Potential Label-Aware Training for Top-K Classification Highlight: However, existing training objectives that optimize for top-k accuracy are often semantically agnostic and tend to degrade top-1 performance. We introduce the Potential Label-Aware Loss (PLA-Loss) to address this trade-off. |
K. Wang; S. Jia; T. Ma; B. Cao; |
| 236 | Decision FusedConv: Efficient Offline Reinforcement Learning Via Fused State-Reward Encoding and Hybrid Temporal Convolution Highlight: Decision Transformer (DT) represents return-to-go, state, and action as independent tokens, resulting in inflated sequence length and quadratic attention cost. To address this inefficiency, we propose Decision FusedConv (DFC), which jointly encodes return and state to shorten sequences and employs a gated hybrid convolutional module that integrates global uniform and local heterogeneous convolutions. |
Z. Tian; |
| 237 | Domain-Adversarial EAT With LoRA Fine-Tuning For ESDD 2026 Challenge Highlight: Aiming at the generalization issue of detection models for unseen synthetic audio, we propose a solution combining LoRA fine-tuning, domain adversarial training, MoE (Mixture of Experts), and ArcFace loss. |
F. Wei; |
| 238 | Pianoroll-Event: A Novel Score Representation for Symbolic Music Highlight: Discrete-event representations achieve compact encoding but fail to adequately capture structural invariance and spatial locality. To address these complementary limitations, we propose Pianoroll-Event, a novel encoding scheme that describes pianoroll representations through events, combining structural properties with encoding efficiency while maintaining temporal dependencies and local spatial patterns. |
L. Qian; H. Gu; D. Li; B. Cao; Q. Liu; |
| 239 | DEOpt: Synergizing Large Language Models and Differential Evolution for Join Order Optimization Highlight: In this paper, we propose DEOpt, a novel framework synergizing large language models (LLMs) with differential evolution (DE). |
Q. Zhang; J. Yang; Y. Wu; X. Xu; S. Zhang; Z. Ding; |
| 240 | Marco-Voice: A Unified Framework for Expressive Speech Synthesis with Voice Cloning Highlight: The goal of this work is to address longstanding challenges in achieving highly expressive, controllable, and natural speech generation that faithfully preserves speaker identity across diverse linguistic and emotional contexts. |
F. Tian; |
| 241 | Demystifying The Roles of LLM Layers in Retrieval, Knowledge, and Reasoning Highlight: In this work, we present a systematic study of depth utilization across diverse dimensions, including evaluation protocols, task categories, and model architectures. |
X. Song; K. Wang; P. Li; L. Yin; S. Liu; |
| 242 | WaveSP-Net: Learnable Wavelet-Domain Sparse Prompt Tuning for Speech Deepfake Detection Highlight: However, this approach is not parameter-efficient and may lead to suboptimal generalization to realistic, in-the-wild data types. To address these limitations, we introduce a new family of parameter-efficient front-ends that fuse prompt-tuning with classical signal processing transforms. |
X. Xuan; X. Liu; W. Zhang; Y. -C. Lin; X. Lin; T. Kinnunen; |
| 243 | GenFacts-Generative Counterfactual Explanations for Multi-Variate Time Series Highlight: We introduce GenFacts, a generative framework based on a class-discriminative variational autoencoder. |
S. Seifi; A. Ibrahimi; T. Sukianto; C. Carbonelli; L. Servadei; R. Wille; |
| 244 | Super-Resolved Quantum Sensing: A Hardware-Algorithm Co-Design for Complex Bioimaging Highlight: In this paper, we develop an end-to-end hardware-algorithm co-design that achieves robust, high-resolution fluorescence imaging in diverse biological environments. |
Y. Tan; Z. Chu; |
| 245 | FreeAnimate: Training-Free Human Image Animation with Preview-Guided Denoising Highlight: In this work, we introduce FreeAnimate, a training-free framework that leverages the inherent capabilities of image diffusion models to enable temporal consistency, identity preservation, and background stability. |
Y. Zeng; Y. Shi; Z. Lu; Q. Liao; |
| 246 | EgoPressDiff: Multimodal Video Diffusion for Egocentric UV-Domain Hand-Pressure Estimation Highlight: We present EgoPressDiff, a conditional video diffusion framework that generates UV-pressure maps from visual input. |
Y. Zeng; Z. Gao; Y. Shi; Z. Lu; W. Yang; Q. Liao; |
| 247 | Edge Collaborative Gaussian Splatting with Integrated Rendering and Communication Highlight: Despite the nonconvexity of the problem, we propose an efficient penalty majorization minimization (PMM) algorithm to obtain the critical point solution. |
Y. Wan; |
| 248 | ABP-SAM: A SAM-Based Method with Auto Bbox Prompter for Medical Image Segmentation Highlight: These challenges stem from a domain gap caused by its training data and a costly reliance on manual prompts from clinicians, which complicates its practical application. To overcome this, we propose ABP-SAM, a method that integrates SAM with an Auto Bbox Prompter. |
Y. Xiong; X. Shu; H. Zhu; G. Guo; P. Wei; D. Yuan; |
| 249 | Dynamic Self-Distillation Former for Weakly Supervised Semantic Segmentation Highlight: To address the issues of pseudo-label noise and overfitting in weakly supervised semantic segmentation (WSSS), this paper proposes a Dynamic Self-Distillation Former (DSDF-WSSS) to improve model stability and segmentation accuracy under a Transformer architecture. |
F. Kong; J. Lu; |
| 250 | Compositional Image Synthesis with Inference-Time Scaling Highlight: Despite their impressive realism, modern text-to-image models still struggle with compositionality, often failing to render accurate object counts, attributes, and spatial relations. To address this challenge, we present a training-free framework that combines an object-centric approach with self-refinement to improve layout faithfulness while preserving aesthetic quality. |
M. Ji; S. Lee; N. Ahn; |
| 251 | Principle-Guided Multimodal Reasoning with Minimal Human Demonstrations Highlight: Despite their impressive capabilities, MLLMs still face challenges in fine-grained tasks, such as OCR in multilingual contexts or recognizing small or occluded objects, which limits their reliability in real-world applications. To overcome these limitations, we propose PrinM, a principle-guided multimodal reasoning framework that enhances MLLMs with specialized tool experts. |
C. Ji; |
| 252 | Ister: Linear Transformer for Efficient Multivariate Time Series Forecasting Highlight: However, their widespread adoption is hindered by the quadratic computational complexity of self-attention, which limits scalability on high-dimensional sequences. To address this challenge, we propose the Inverted Seasonal-Trend Decomposition Transformer (Ister), a novel architecture that enhances both predictive accuracy and computational efficiency. |
F. Cao; S. Yang; Z. Chen; Y. Liu; L. Cui; |
| 253 | MDBoost: A Multi-Dimensional Reweighting Framework for Robust Gradient Boosting Highlight: This issue leads to the catastrophically decreased performance of Boosting models, whose vulnerability stems from their intrinsic mechanism of focusing on high-error samples, failing to distinguish between genuinely hard-to-learn instances and those that are mislabeled. To address this critical gap, we propose a reweighting Boosting framework, MDBoosting, that assesses sample importance from three complementary perspectives. |
R. Zhang; G. He; J. Wang; R. Wang; Z. Wang; F. Nie; |
| 254 | Brain-HGCN: A Hyperbolic Graph Convolutional Network for Brain Functional Network Analysis Highlight: We propose Brain-HGCN, a geometric deep learning framework based on hyperbolic geometry, which leverages negatively curved space to model brain network hierarchy with high fidelity. |
J. Jia; |
| 255 | Geodesic Prototype Matching Via Diffusion Maps for Interpretable Fine-Grained Recognition Highlight: This mismatch is particularly detrimental to prototype-based interpretable fine-grained recognition, where even subtle semantic distinctions are crucial. To mitigate this issue, this work presents a novel paradigm for prototype-based recognition by grounding similarity in the intrinsic geometry of deep features. |
J. Jia; |
| 256 | VISA: Virtual Identity for Secure Face Anonymization Highlight: In this paper, we propose VISA (Virtual Identity for Secure face Anonymization), a novel framework for generating anonymized yet identifiable faces. |
X. Zeng; X. Hu; S. Li; X. Zhang; Z. Qian; |
| 257 | Frequency-Guided Multi-Level Reasoning for Scene Graph Generation in Video Highlight: This paper proposes the Frequency-guided Relational Multi-level Reasoning (FReMuRe) model, which enhances the modeling ability of long-tail relationships from a mechanism perspective. |
C. Li; Y. Duan; X. Tao; |
| 258 | PGFed: Prompt-Guided Distillation for Personalized Federated Learning with Model Heterogeneity Highlight: However, existing proxy-based methods still face a key challenge: although proxy models are homogeneous in architecture, the knowledge they contain is biased by heterogeneous client models. To address this challenge, we propose PGFed, a prompt-guided distillation MHPFL method. |
X. Yang; J. Feng; L. Zhong; L. Wang; B. Fang; Q. Liao; |
| 259 | Joint Multi-Dimensional Features and Academic Network Embedding for Author Name Disambiguation Highlight: In this study, we propose a method of author name disambiguation based on multi-dimensional feature fusion and academic network embedding. |
X. Ma; Z. Ban; |
| 260 | Lightweight and Generalizable Acoustic Scene Representations Via Contrastive Fine-Tuning and Distillation Highlight: We propose ContrastASC, which learns generalizable acoustic scene representations by structuring the embedding space to preserve semantic relationships between scenes, enabling adaptation to unseen categories without retraining. |
K. Yuan; |
| 261 | Adapting Diarization-Conditioned Whisper for End-to-End Multi-Talker Speech Recognition Highlight: We propose a speaker-attributed (SA) Whisper-based model for multi-talker speech recognition that combines target-speaker modeling with serialized output training (SOT). |
M. Kocour; M. Karafiat; A. Polok; D. Klement; L. Burget; J. Černocký; |
| 262 | Multi Stage Training with Dynamic Data Balancing for Multilingual Speech Recognition and Translation Highlight: Training large-scale multilingual speech models is often hindered by severe data imbalances across tasks, languages, and corpora. We introduce a systematic, multi-stage training framework to address this challenge. |
N. Koluguri; M. Sekoyan; N. Tadevosyan; N. Karpov; J. Balam; B. Ginsburg; |
| 263 | Moment-Based Posterior Sampling for Multi-Reference Alignment Highlight: We propose a Bayesian approach to the problem of multi-reference alignment – the recovery of signals from noisy, randomly shifted observations. |
A. Janson; J. Andén; |
| 264 | Deep Image Prior with L0 Gradient Regularizer for Image Smoothing Highlight: Because constructing a proper training dataset for image smoothing is challenging, we propose DIP-ℓ0, a deep image prior framework that incorporates the ℓ0 gradient regularizer. |
N. T. Tran; K. Bui; J. Xin; |
| 265 | Sparsity-Regularized Latent Diffusion Models for Radar Clutter Suppression Highlight: To address the limitations imposed by ground clutter on the detection of low, slow, and small (LSS) targets, this paper proposes a novel clutter suppression method based on a conditional Latent Diffusion Model (LDM). |
Z. Guo; H. Xu; Y. Quan; |
| 266 | Multimodal Room Impulse Response Generation Through Latent Rectified Flow Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Current methods suffer from two fundamental limitations: the scarcity of full-band RIR datasets and the inability of existing models to generate acoustically accurate responses from diverse input modalities. We present PromptReverb, a two-stage generative framework that addresses these challenges. |
A. Vosoughi; Y. Zang; Q. Yang; N. Paek; R. Leistikow; C. Xu; |
| 267 | An Effective Data Augmentation Method By Asking Questions About Scene Text Images Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose a VQA-inspired data augmentation framework that strengthens OCR training through structured question-answering tasks. |
X. Yao; L. Kang; |
| 268 | Training-Free Inference-Time Scaling for Audio Source Separation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We prove this approach guarantees improvement over one-step inference, provide error bounds based on model smoothness and metric robustness, and establish theoretical connections to denoising diffusion bridge models. |
Y. Zang; J. Li; Q. Kong; |
| 269 | FMSP-IR: Frequency Modulation and Structure Priors for All-in-One Image Restoration Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Despite recent advances, most methods remain restricted to spatial-domain modeling, thereby limiting their effectiveness in representing frequency-domain characteristics under complex degradations. To address this limitation, we propose an integrated framework, FMSP-IR, which incorporates two core modules: the adaptive frequency decoupling and modulation module (AFDM) and the structure-aware gating module (SAGM). |
Y. Tu; T. Hu; Q. Yan; |
| 270 | Improving Sign Language Translation Via Gloss Guided Temporal and Representation Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, temporal misalignment between sign videos and glosses, as well as the modality gap between visual and semantic representations, limit the effectiveness of G2T in joint training. To address these issues, we propose Gloss-guided Temporal and Representation Alignment (GTRA), a framework comprising three components: (1) Gloss-guided Temporal Alignment, which monotonically aligns visual features with gloss order; (2) Semantic-guided Representation Alignment, which introduces auxiliary supervision to align visual representations with semantic embeddings; (3) Gloss-text Data Augmentation, which expands gloss-text parallel data to further enhance the effectiveness of G2T training. |
J. Feng; Z. Liu; T. Shi; P. Liu; F. Shang; W. Feng; |
| 271 | WTRSS: Unleashing The Power of Wavelet Transform in Radar Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, existing deep learning models struggle to handle the specific characteristics of radar frequency maps: the data is anisotropic with low signal-to-noise ratio (SNR). Therefore, we propose WTRSS, an RSS method inspired by the wavelet transform. |
F. Chen; T. Tan; T. Li; Z. Lu; Q. Liao; |
| 272 | Planning-Oriented Adversarial Attack Against End-to-end Autonomous Driving Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, these attacks lack temporal coherence and fail to misguide the planning stage, resulting in limited attack capabilities. To address this problem, we propose a planning-oriented adversarial attack (PAA), which is a temporal-aware and trajectory-guided attack framework designed to maximize future collision rates. |
H. Tan; R. Li; J. Zhang; H. Zhang; D. Shao; Z. Gu; |
| 273 | Parallax-Aware Spatial Transformer: Fusing Physics and Learning for Terahertz Near-Field Localization Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Simulation results demonstrate PAST achieves millimeter-level and 0.06-degree accuracy, defining a new and highly efficient approach for THz near-field localization. |
Z. Zeng; C. Han; |
| 274 | MeanVC: Lightweight and Streaming Zero-Shot Voice Conversion Via Mean Flows Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this study, we propose MeanVC, a lightweight and streaming zero-shot VC approach. |
G. Ma; |
| 275 | The ICASSP 2026 Automatic Song Aesthetics Evaluation Challenge Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper summarizes the ICASSP 2026 Automatic Song Aesthetics Evaluation (ASAE) Challenge, which focuses on predicting the subjective aesthetic scores of AI-generated songs. |
G. Ma; |
| 276 | VStyle: A Benchmark for Voice Style Adaptation with Spoken Instructions Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, while most progress has focused on semantic accuracy and instruction following, the ability of SLMs to adapt their speaking style based on spoken instructions has received limited attention. We introduce Voice Style Adaptation (VSA), a new task that examines whether SLMs can modify their speaking style—such as timbre, prosody, or persona—following natural language spoken instructions. |
J. Zhan; |
| 277 | OCTIP: Compact Geography-Aware IP Embeddings for Nearest-Neighbor IP Signal Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We study Nearest-Neighbor IP Signal Retrieval, a problem that retrieves semantically related IP signals directly in the IP address space without relying on precise geographic coordinates, traffic logs, or active measurements. To address this challenge, we propose OCTIP, a compact OCTet-level IP encoder for IPv4, together with IPSPRE, a reproducible framework for geography-preserving evaluation. |
H. Feng; C. Wang; F. Niu; |
| 278 | PromptSID: A Self-Iterative Distillation Framework For Unsupervised Adaptation Of Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose PromptSID, a self-iterative distillation framework for unsupervised adaptation of VLMs. |
Y. Lin; X. Zhuang; J. Zhang; C. Li; Z. Huang; Y. Zou; |
| 279 | Self-Supervised Depth Map Super-Resolution Via Spectral-Bias-Aware Kolmogorov-Arnold Network Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we propose Spectral-bias-Aware KAN (SA-KAN), a novel self-supervised framework that resolves the inherent spectral bias in KAN for DSR. |
R. Jian; Y. Lyu; Y. Dai; |
| 280 | Continual Neural Network Retrieval for Ever-Expanding Model Zoo Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Given a library of pre-trained deep learning models, it is hard to find models appropriate to a task with a specific query dataset. |
Z. Shang; Y. Liu; E. Liu; A. Argyriou; H. Li; X. Gu; |
| 281 | GSTNET: A Geospatial-Temporal Graph Network for Group Person Re-Identification Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Here, we propose the Geospatial-temporal Graph Network (GstNet). |
P. Hu; J. Li; F. Hong; Y. Peng; J. Wu; R. Hu; |
| 282 | SVPO: A LLM Reinforcement Learning Method Based on Stepwise Value Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Methods like SPO and VAPO, which provide more granular supervision signals, introduce a significant amount of additional computational overhead. To address these limitations, we propose Stepwise Value Policy Optimization (SVPO), an efficient Reinforcement Learning (RL) algorithm based on step-level value estimation. |
Z. Zeng; Z. Ding; B. Zhang; M. Wan; C. Jiang; N. Ding; |
| 283 | Jointly Conditioned Diffusion Model for Multi-View Pose-Guided Person Image Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present jointly conditioned diffusion model (JCDM), a jointly conditioned diffusion framework that exploits multi-view priors. |
C. Xie; |
| 284 | 2I-Instruct: Generative Joint Empathy Detection and Empathy Intent Classification Via Inter-Task and Inter-Instance Interactions Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, the differing label sets of the two tasks prevent them from sharing the same decoder, limiting knowledge sharing during decoding. A generative method can fundamentally address this issue. |
L. Jiang; D. Wu; Z. Li; Y. Li; H. Huang; |
| 285 | Scattering Mechanism-Aware Deep Learning Framework for Polarimetric SAR Decomposition Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Traditional methods, relying on predefined bases or linear assumptions, often suffer from model complexity and rank deficiencies, while purely data-driven deep learning approaches, despite strong nonlinear fitting ability, lack physical interpretability. To address this, we propose a scattering mechanism-aware deep learning framework for PolSAR decomposition. |
S. Zhang; D. Zhuang; L. Zhang; B. Zou; |
| 286 | IDEAVATAR: Identity-Preserving Avatar Generation with Controllable Emotions Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose a novel 3D human generation framework conditioned on multimodal image and text inputs, enabling precise identity preservation and expressive facial control. |
T. Yuan; |
| 287 | Self-Supervised Learning with Efficient On-Device Training For Intra-Patient Cardiac Arrhythmia Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In cardiac arrhythmia detection, labeled electrocardiography (ECG) data is limited especially at the individual level. To address this, we propose a subject-dependent VICReg self-supervised learning framework with on-device training. |
Z. Zhong; C. Park; J. Gu; |
| 288 | Progressive Refinement Training for Low-Resource Neural Speech Coding and Enhancement Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we propose a novel Progressive Refinement (PR) strategy by leveraging a three-stage collaborative training framework. |
R. Hu; L. Yang; Y. Xu; Q. Hu; J. Lu; |
| 289 | CLG-MSTS: Contrastive Learning-Guided Multi-Scale Temporal-Spatial Network for Cross-Subject Emotion Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To this end, we propose a novel method called Contrastive Learning-Guided Multi-Scale Temporal-Spatial Network (CLG-MSTS) to learn subject-invariant representations. |
C. Li; J. Xin; Q. Shen; B. T. Dai; X. Liu; Z. Wang; |
| 290 | Ro-Bench: Large-Scale Robustness Evaluation of MLLMs with Text-Driven Counterfactual Videos Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we introduce Ro-Bench, the first benchmark for evaluating MLLMs on dynamic out-of-distribution (OOD) counterfactual video test sets. |
Z. Yang; J. Li; M. Diao; Y. Jing; K. Liang; |
| 291 | Multi-Stream Music Transformer for Multi-Dimension Automatic Song Aesthetics Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present a unified multi-stream music transformer for the ICASSP 2026 SongEval Challenge. |
X. Fan; G. Niu; |
| 292 | Quadratic Flow: Constant Acceleration As A Prior for Learning Better Velocity Field Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: While recent single-step generation approaches achieve efficient sampling by modeling the average velocity field, their global uniform-velocity constraint limits generalization to complex generation tasks. To mitigate this limitation, we propose Quadratic Flow, a generative method that dynamically models sampling trajectories through average velocity fields under the assumption of constant acceleration. |
Z. Wu; B. Sun; J. He; |
| 293 | Learning Graphical Models Under Low-Rank Factor Analysis Structure Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we propose an MLE problem formulation for learning graphical models considering jointly the low-rank FA structure and sparsity pattern on precision matrices. |
R. Zhou; J. Ying; W. Pu; L. Zhao; W. Wang; |
| 294 | A State-Dependent Markov Diffusion Process for Generative Speech Enhancement Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose a State-Dependent Markov Diffusion Process (SDMDP) with an adaptive transition rate that responds to the characteristics of input noise, thereby improving convergence and performance. |
Y. Iqbal; T. Zhang; A. Iqbal; X. Zhao; Y. Geng; |
| 295 | Scalable Bayesian Fine-Tuning of LLMs for Multi-Objective Bayesian Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: As the default choice for surrogate modeling in multi-objective Bayesian optimization (MOBO), Gaussian processes (GPs) struggle with irregular high-dimensional variables and non-stationary spaces. To alleviate these challenges, we adapt the prevailing large language model (LLM) as the surrogate model given its powerful capability in feature extraction based on large-scale pre-training. |
H. Xiang; H. Zhang; Q. Lu; |
| 296 | Sentinel Model As A Try: A Dual-Model Architecture for Defending Against Data Extraction Attacks in Retrieval-Augmented Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Moreover, existing defense strategies are no longer sufficient to counter attacks with enhanced capabilities and have failed to resolve the inherent conflict between security and usability. Therefore, we propose the Sentinel Model As a Try (SMAT), a dual-model defense architecture. |
J. Hu; |
| 297 | AuditGPT: A Multi-Agent Framework for Enhancing Static Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Similarly, emerging Large Language Models (LLMs)-based approaches are impeded by context window limitations, restricting the holistic, multi-file analysis essential for complex vulnerability detection. To address these challenges, we introduce AuditGPT, a multi-agent framework that orchestrates collaboration between LLMs and static analysis for comprehensive vulnerability detection. |
J. Hu; |
| 298 | Toward Non-Parameterized Time Series Embedding for Efficient Forecasting: A Dynamical System Perspective Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: According to the embedding theory, dynamical systems and time series can be mutually transformed using observation functions and numerical reconstruction techniques. |
J. Hu; |
| 299 | T-Mimi: A Transformer-Based Mimi Decoder for Real-Time On-Phone TTS Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, Mimi’s decoder, which employs a hybrid transformer and convolution architecture, introduces significant latency bottlenecks on edge devices due to the compute-intensive nature of deconvolution layers, which are not friendly for mobile-CPUs, such as the most representative framework XNNPACK [1]. This paper introduces T-Mimi, a novel modification of the Mimi codec decoder that replaces its convolutional components with a purely transformer-based decoder, inspired by the TS3-Codec architecture. |
H. Wu; |
| 300 | MCI-OTFusion: A Multimodal Model for MCI Detection and Cognitive Score Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose MCI-OTFusion, a multimodal framework that integrates speech and textual features to classify MCI and predict Mini-Mental State Examination (MMSE) scores. |
Y. Lin; |
| 301 | Joint Calibration and Direction-of-Arrival Estimation for Sparse Linear Arrays: Identifiability and Array Design Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we investigate the parameter identifiability problem for sparse linear arrays (SLAs) under certain stochastic assumptions. |
W. Zheng; Z. Yang; |
| 302 | Semantic-Aware Discrete Online Cross-Modal Hashing Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, most existing methods rely only on discrete labels, overlooking rich semantic information, and face challenges in discrete optimization and efficient updates. To address these issues, we propose a novel supervised OCMH method, Semantic-Aware Discrete Online Cross-Modal Hashing (SADOCH). |
Z. Yao; R. Zhai; L. Wang; G. Gu; |
| 303 | MSP-ReID: Hairstyle-Robust Cloth-Changing Person Re-Identification Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, treating the head holistically without distinguishing between face and hair leads to over-reliance on volatile hairstyle cues, causing performance degradation under hairstyle changes. To address this issue, we propose the Mitigating Hairstyle Distraction and Structural Preservation (MSP) framework. |
X. He; L. Wan; |
| 304 | Patch-Based Active Source-Free Domain Adaptation for Annotation-Efficient Medical Image Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To improve annotation efficiency while preserving data privacy, we propose a novel Active Source-Free Domain Adaptation (ASFDA) framework. |
J. Dong; Y. Zhang; Z. Zhang; L. Lin; Y. -W. Chen; R. Tong; |
| 305 | Vision Meets Language: Adaptive Joint Pruning for Efficient Multimodal Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Existing pruning methods alleviate redundancy but remain limited: attention-based strategies may discard task-critical regions, while text-guided approaches risk overlooking implicitly important information. To address this, we propose the first visual-text joint pruning framework, which integrates visual attention distributions with text-aware signals to more reliably identify and remove redundant tokens. |
G. Wu; |
| 306 | Discrepancy-Aware Disentangled Contrastive Learning for Multimodal Rumor Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose DMCL (Disentangled Contrastive Learning), a discrepancy-aware framework for multimodal rumor detection that explicitly models cross-modal inconsistencies through subspace disentanglement. |
K. Lu; H. Zhang; Y. Yang; C. Meng; G. Yin; B. Fang; |
| 307 | S2Voice: Style-Aware Autoregressive Modeling with Enhanced Conditioning for Singing Style Conversion Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present S2Voice, the winning system of the Singing Voice Conversion Challenge (SVCC) 2025 for both the in-domain and zero-shot singing style conversion tracks. |
Z. Wang; X. Xia; C. Huang; L. Xie; |
| 308 | Toward Robust Spatial Multi-Omics Integration: Spatial Multi-Omics Integration Via Alignment and Scheduled Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Spatially resolved multi-omics promises system-level insights into cellular state, regulation, and communication, yet robustly integrating heterogeneous modalities while preserving spatial boundaries remains challenging. |
K. Tan; |
| 309 | Low-Rank Weighted Amplitude and Phase Fusion for CSI-Fingerprint Localization Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper introduces LORWAPF, a low-rank weighted amplitude-phase fusion for indoor localization, which exploits the complementarity of distance and orientation information represented by CSI amplitude and phase. |
K. Tan; |
| 310 | Graph-Based Modeling of Heterogeneous Data Fusion with Enterprise Association Relationships: Enhancing Corporate Credit Rating Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose an innovative framework that integrates heterogeneous data sources—structured tabular data (financial and non-financial indicators) and corporate correlation graphs—into a unified dynamic graph learning model. |
B. Wen; Z. Yang; Y. Wang; L. Zhou; |
| 311 | Multi-View Spectral Clustering with Adaptive Regression Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Inspired by the prior that the ground-truth cluster assignment matrix of high-dimensional data can always be embedded in the linear space of the data, we propose the Multi-view spectral Clustering with Adaptive Regression (MCAR) framework. |
Q. Qiang; B. Zhang; Y. Hua; |
| 312 | Maximum Entropy-Based Efficient Fuzzy Graph Clustering Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Graph Clustering faces critical challenges in balancing computational efficiency and uncertainty quantification. To address these issues, we propose a novel fuzzy graph clustering framework, termed Maximum Entropy-Based Efficient Fuzzy Graph Clustering (MEFC), which establishes an explicit connection between graph clustering and fuzzy clustering under an anchor graph setting. |
Q. Qiang; B. Zhang; Y. Hua; |
| 313 | A Differential Denoising Transformer for Polyp Segmentation in Colonoscopy Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: While Transformer-based models have shown promise by modeling long-range dependencies, their dense self-attention (SA) mechanisms often suffer from attention noise and boundary over-smoothing. To address this, we propose DDT-Former, a novel Differential Denoising Transformer that integrates a multi-head Edge-aware Differential Attention (MH-EDA) module within a hierarchical encoder–decoder framework. |
Y. Tong; |
| 314 | From PowerSGD to PowerSGD+: Low-Rank Gradient Compression For Distributed Optimization With Convergence Guarantees Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, the convergence guarantees of PowerSGD remain unclear, particularly in stochastic settings. In this paper, we show that PowerSGD does not always converge to the optimal solution and provide a clear counter-example to support this finding. |
S. Xie; C. Chen; K. Yuan; |
| 315 | Top-1 Compression Suffices for Federated Unlearning with The Help of Adaptive Error Feedback Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To achieve the best of both worlds, we propose EFFACE, which adaptively selects a suitable compression strategy depending on the distortion to the true local stochastic gradient, aided with controllable error compensation. |
B. Xiao; S. Liu; Q. Ling; |
| 316 | Advancing LLM-Based Multi-Channel Multi-Speaker Speech Recognition with Global Cross-Channel Attention and Sentence-Ordered First-In First-Out Serialized Output Training Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: While multi-microphone input outperforms single-microphone setups in multi-speaker scenarios, multi-channel multi-speaker ASR remains challenging due to data scarcity, complex acoustic conditions, and cross-channel dependency modeling limitations. To address these challenges, we propose a novel framework integrating a Large Language Model (LLM) into multi-channel multi-speaker ASR for the first time. |
G. Wan; |
| 317 | UniKGLM: A Unified LLM-Driven Multi-Task Reasoning Framework for Knowledge Graph Completion Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Nevertheless, confining LLMs to isolated pipeline stages fails to unleash their full-chain cognitive potential. To address this limitation, we propose UniKGLM, a full-chain, multi-task reasoning framework integrating type inference, path semantic retrieval, and triple reranking, leveraging a text-to-text approach within a unified structure that fine-tunes LLM with LoRA. |
Z. Jiang; Z. Wang; |
| 318 | A Novel Multiscale Order-Frequency Spectral Correlation Estimator for Angle-Time Cyclostationary Signals Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, a fast and reliable multiscale OF-SC estimator, termed the order-frequency wavelet cyclic modulation spectrum (OF-WCMS), is proposed by incorporating the continuous wavelet transform. |
H. Ren; Z. Zhong; R. -B. Sun; X. Chen; |
| 319 | TrafficHTG: Revolutionizing Network Traffic Generation with Hierarchical Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, traditional simulation tools struggle to reproduce the detailed characteristics of real network traffic, while model-based generation approaches are often limited by their research objectives or model performance, typically resulting in either the inability to generate raw traffic or the production of low-quality synthetic traffic. To address these challenges, this paper proposes a hierarchical autoregressive architecture for traffic generation, named TrafficHTG, which leverages protocol-aware semantic segmentation and a hierarchical encoder-decoder mechanism to enable effective transfer of autoregressive models to the task of traffic generation. |
J. Qin; |
| 320 | TrafficMoE: Adaptive Multi-Perspective Feature Fusion for Enhancing Malicious Traffic General Detection Capability Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Although pre-training techniques can alleviate the problem of data scarcity, most existing methods pre-train or fine-tune models from scratch using packet-level information, which restricts them to learning only one-dimensional packet-level features and thus limits the model’s general detection capability across diverse types of attacks. To address these challenges, we propose TrafficMoE, which enhances traffic understanding in attack scenarios by integrating cross-attention mechanisms and position-dependent gating to jointly analyze traffic features extracted from multiple perspectives. |
J. Qin; |
| 321 | Multi-Scale Positivity Graph Transformer for Fine-Grained Image Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Save Abstract: Existing fine-grained image recognition (FGIR) methods mainly fuse scene text with high-level visual features to identify image, failing to capture the low- and middle-level … |
M. Duan; P. Zhang; J. Wang; L. Liu; T. Zhang; P. Shi; |
| 322 | Diffface-Edit: A Diffusion-Based Facial Dataset for Forgery-Semantic Driven Deepfake Detection Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Yet existing AI-generated face datasets seldom cover subtle, region-level manipulations, and the effect of splice attacks between real and edited samples on detectors remains underexplored; we term these detector-evasive samples. To fill these gaps, we introduce DiffFace-Edit, a dataset with over two million fake images and edits across eight facial regions with diverse single- and multi-region combinations. |
F. Ding; W. Yi; X. He; M. Xiao; J. Xu; J. Du; |
| 323 | Constrained Local Point Cloud Perturbations Using Adaptive Curvature for 3D Adversarial Attacks Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: 3D point cloud recognition models are highly susceptible to adversarial perturbations, whereas existing approaches often introduce visible distortions, suffer from weak transferability, and achieve limited attack success. To address these challenges, we propose a novel adversarial framework that constrains point perturbations through reversible transformation, employs hierarchical sampling to preserve structural keypoints, and refines perturbations using gradient-guided updates. |
Z. Xu; |
| 324 | ForgetMark: Stealthy Fingerprint Embedding Via Targeted Unlearning in Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce ForgetMark, a stealthy fingerprinting framework that encodes provenance via targeted unlearning. |
Z. Xu; |
| 325 | DNF: Dual-Layer Nested Fingerprinting for Large Language Model Intellectual Property Protection Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose Dual-Layer Nested Fingerprinting (DNF), a black-box method that embeds a hierarchical backdoor by coupling domain-specific stylistic cues with implicit semantic triggers. |
Z. Xu; |
| 326 | KinGuard: Hierarchical Kinship-aware Fingerprinting to Defend Against Large Language Model Stealing Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Conventional backdoor fingerprinting, however, is flawed by a stealth-robustness paradox: to be robust, these methods force models to memorize fixed responses to high-perplexity triggers, but this targeted overfitting creates detectable statistical artifacts. We resolve this paradox with KinGuard, a framework that embeds a private knowledge corpus built on structured kinship narratives. |
Z. Xu; |
| 327 | CZSRSSC: Continual Zero-Shot Remote Sensing Scene Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Save Abstract: Remote sensing scene classification, a core technology in fields such as disaster response, resource management, and urban planning, often faces challenges in real-world … |
Z. Xu; |
| 328 | Align to The Pivot: Dual Alignment with Self-Feedback for Multilingual Math Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We attribute the decline to the model’s inconsistent multilingual understanding and reasoning alignment. To address this, we present Pivot-Aligned Self-Feedback Multilingual Reasoning (PASMR), aiming to improve the alignment of multilingual math reasoning abilities in LLMs. |
C. Zhao; X. Huang; X. Han; S. Huang; C. Deng; J. Feng; |
| 329 | Structure-Aware Adversarial Purification: Dynamic Masking and Attribution Refinement In Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper presents a novel adversarial purification framework, Structure-Aware Attribution-guided Purification with Diffusion Models (SAAP-DM), which integrates multiple attribution-based explainability techniques into a diffusion process. |
C. Fan; W. Lu; D. Zhang; D. Zhao; Z. Gu; |
| 330 | KPMG: A Graphical Koopman-Mamba Approach for Financial Markets Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Moreover, current models suffer from disturbances and impurities embedded in the financial data. To address these challenges, we propose KPMG, an efficient architecture that integrates the strengths of Mamba and Graph Neural Networks. |
S. Xiong; C. Tang; F. Okubo; T. Minematsu; Y. Hu; A. Shimada; |
| 331 | GlucoMixer: An Efficient Glucose Monitoring Model with Mixers Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To strike a balance between accuracy and trustworthiness, we propose GlucoMixer, an Encoder-only architecture built predominantly with Mixer modules. |
S. Xiong; J. Wang; T. Sun; C. Tang; F. Okubo; A. Shimada; |
| 332 | GazeFormer-MoE: Context-Aware Gaze Estimation Via Clip and MoE Transformer Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present a semantics-modulated, multi-scale Transformer for 3D gaze estimation. |
X. Zhao; X. Chen; A. Chaddad; |
| 333 | NEWA: A Dual-Level FNIRS-Guided Distillation Framework for Enhancing EEG-Based BCIs Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To address these, we propose fNIRS-Guided EEG Channel Weight Generator and Contrastive Aligner (NEWA), a framework that integrates a two-layer knowledge distillation strategy within a decoupled training–inference architecture, leveraging multimodal data during training but relying solely on EEG for inference. |
K. Zeng; G. Cai; T. Ma; |
| 334 | CG-DMER: Hybrid Contrastive-Generative Framework for Disentangled Multimodal ECG Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Recent multimodal approaches that integrate ECGs with accompanying clinical reports show strong potential, but they still face two main concerns from a modality perspective: (1) intra-modality: existing models process ECGs in a lead-agnostic manner, overlooking spatial–temporal dependencies across leads, which restricts their effectiveness in modeling fine-grained diagnostic patterns; (2) inter-modality: the existing methods directly align ECG signals with clinical reports, introducing modality-specific biases due to the free-text nature of the reports. In light of these two issues, we propose CG-DMER, a contrastive-generative framework for disentangled multimodal ECG representation learning, powered by two appealing designs: (1) Spatial-temporal masked modeling is designed to better capture fine-grained temporal dynamics and inter-lead spatial dependencies by applying masking across both spatial and temporal dimensions and reconstructing the missing information. |
Z. Niu; |
| 335 | DART: A Dual-Modality Adaptive Representation with Divergence Training Framework for ZS-CIR Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Besides, they cannot match the various search requirements in different scenarios. To address these challenges, we propose DART, a Dual-modality Adaptive Representation with divergence Training framework for ZS-CIR. |
S. Liu; Y. Wang; J. Lin; Y. Wen; C. Yuan; |
| 336 | Averaging Is Not Enough: Preserving Client-Specific Knowledge in Federated PEFT with One-Round Aggregation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we identify the root cause as the incompatibility between PEFT methods and FL’s aggregation mechanism, where conventional averaging fails to preserve personalized client knowledge, leading to suboptimal performance and slower convergence. |
H. Cheng; J. Huang; Q. Liu; L. Zhang; |
| 337 | SAFE-IMM: Robust and Lightweight RADAR-Based Object Tracking on Mobile Platforms Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose SAFE-IMM, a lightweight IMM variant for tracking on mobile and resource-limited platforms with a safe covariance-aware gate that permits WTA only when the implied jump from the mixture to the winner is provably bounded. |
D. Mandaokar; B. Rinner; |
| 338 | CAST-ACF: Robust Generation and Evaluation for Multi-Granularity Timeline Summarization Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Second, current evaluation metrics have a bias that gives overly optimistic scores. To solve these problems, we introduce a new framework consisting of: (i) Chronology-windowed Abstractive Summarization for multi-granularity Timelines (CAST), an end-to-end generation method using a chronology-windowed approach; and (ii) ACF (Alignment/Coverage/Factuality), an improved evaluation method that assesses timelines without relying on the Hungarian algorithm to ensure fairness. |
Y. Ai; F. Kong; |
| 339 | Balancing Efficiency and Fidelity In Image Super-Resolution Via Attention-Enhanced Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Prevailing Transformer-based models provide strong global modeling capabilities but still struggle to capture spatial and channel dependencies as well as multi-scale textures, whereas lightweight CNNs and distillation methods reduce complexity at the cost of degraded reconstruction quality. To overcome these challenges, we propose the Efficient Separable Distillation Attention Network (ESDANet) that unifies blueprint separable convolutions with dual residual distillation for efficient feature compression. |
Y. Niu; X. Chen; J. Hua; H. Li; Z. Wang; |
| 340 | Class-Imbalanced Multi-view Clustering Via Synthetic Minority Over-Sampling Technique Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In addition, these methods fail to achieve coordinated alignment when integrating clustering distributions from different views. To address these issues, we propose a framework, called Class-imbalanced Multi-view Clustering via Synthetic Minority Over-Sampling Technique (CMC-SMOTE). |
W. Liu; J. Zhu; J. Tan; Y. Zhang; M. Miao; |
| 341 | Marking The Margin: Robust DNN Watermarking Against Removal Attacks Via Sculpting Decision Boundaries Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Unfortunately, existing watermarking methods are vulnerable to adaptive removal attacks, as their separable designs fail to transfer to surrogate models trained via model extraction. In this work, we propose MarginMark, a novel framework that embeds an inseparable geometric signature into the decision boundary, providing robustness against such attacks. |
D. Xue; Y. Liu; S. Xiao; Z. Kang; W. Li; J. Yang; |
| 342 | MSBench: Can Speech Language Models Generate Multi-Speaker Dialogues in One Pass? Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, existing evaluations that primarily focus on short, single-speaker utterances often neglect long-form, multi-speaker dialogues typical of podcasts and other real-world settings. To address this, we introduce MSBench, a benchmark designed to assess the ability of SLMs to generate natural, multi-speaker dialogues with semantic and paralinguistic cues. |
Z. Xu; T. Liu; H. Shen; M. Liu; L. Duan; |
| 343 | Constructing Composite Features for Interpretable Music-Tagging Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Combining multiple audio features can improve the performance of music tagging, but common deep learning-based feature fusion methods often lack interpretability. To address this problem, we propose a Genetic Programming (GP) pipeline that automatically evolves composite features by mathematically combining base music features, thereby capturing synergistic interactions while preserving interpretability. |
C. Xue; |
| 344 | Learnable Instance Attention Filtering for Adaptive Detector Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Moreover, existing attention filtering mechanisms are typically heuristic or teacher-driven, rather than learned with the student. To address these limitations, we propose Learnable Instance Attention Filtering for Adaptive Detector Distillation (LIAF-KD), a novel framework that introduces learnable instance selectors to dynamically evaluate and reweight instance importance during distillation. |
C. Liu; Q. Lan; Z. Ding; X. Chu; Q. Tian; |
| 345 | FEDPROTOALIGN: Federated Prototype Alignment Under Identity Inconsistency for Gait Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Beyond non-IID shifts from cross-camera viewpoints, federated gait recognition uniquely faces identity-space inconsistency, where the same person receives inconsistent labels across clients (e.g., cameras), degrading discriminative representations. To address these issues, we propose FedProtoAlign (FPA), a federated framework for unsupervised, identity-aware representation learning under disjoint identity spaces. |
C. Lin; |
| 346 | ECSA: Dual-Branch Emotion Compensation for Emotion-Consistent Speaker Anonymization Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, these methods often degrade the emotional information in speech, limiting their reliability in emotion-sensitive scenarios. To mitigate this issue, we propose an emotion-preserving speaker anonymization framework. |
C. Lin; |
| 347 | Digital Human-Assisted Smart Contract Vulnerability Detection Under Limited Sample Constraints Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To address the aforementioned challenges, we propose a digital human framework based on the large language model (LLM). |
P. Su; J. Hu; X. Yao; X. Cui; |
| 348 | Wavelet-Consistent Diffusion Posterior Sampling for Limited-Angle and Sparse-View CT Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we propose Wavelet-Consistent Diffusion Posterior Sampling (WCDPS), a framework built around a novel anchor refinement (AR) strategy. |
J. Hu; C. Fu; |
| 349 | Multi-Agent Deep Reinforcement Learning-Based IoV Secure Data Transmission Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we propose a multi-agent reinforcement learning-based secure transmission framework for IoV systems against a smart attacker that can perform various attack patterns. |
X. Lu; Z. Liu; D. Ren; Z. Liu; Y. Bu; |
| 350 | A Framework For Text-To-Semantic Segmentation Map Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Specifically, we propose a semantic segmentation map generation framework named TSeg, in which a low-to-high resolution strategy is designed for higher input consistency. |
X. Zheng; G. Jiang; S. Hou; W. Wang; |
| 351 | Unsupervised UAV Detection from Sparse Lidar Via Temporal Dispersion Signatures Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose SLTDS, a training-free framework that uses a Temporal Dispersion Signature (TDS) to separate moving targets from static and noisy background: static structures exhibit a wide temporal spread across scans, whereas UAVs form compact, transient spatiotemporal clusters. |
S. Yuan; Z. Qi; Z. Duan; Y. Li; B. Lou; |
| 352 | Emotion Recognition Based on EEG Neuroscience Features: Microstate and Source Microstate Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: For cross-subject recognition, the microstate method achieved accuracies of 89.73±1.38% on SEED and 86.64±2.38% (valence) / 83.47±1.85% (arousal) on DEAP. |
S. Hu; L. Ding; G. Hu; W. Yao; Y. Lin; Z. Lv; |
| 353 | Hyper-GST: A Dyadic EEG Based Graph-Swin Transformer Model for AIGC Deepfake Video Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We extracted inter-subject neural synchrony metrics, including Inter-Subject Correlation, Phase-Locking Value, and Coherence, as robust neuro-biomarkers for video authenticity. To decode these patterns, we propose Hyper-GST, a dual-branch model fusing a Swin Transformer and GNN. |
S. Hu; Z. Zha; Y. Fang; D. Jia; Z. Lv; |
| 354 | Heterogeneous Adversarial Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Existing HFL approaches face challenges in balancing personalization performance and data privacy protection. To address these limitations, we propose HFedAdv, a novel Heterogeneous Federated learning method that employs Adversarial learning for an effective balance of model personalization and generalization while protecting data privacy. |
W. Wang; L. Yi; G. Wang; X. Liu; |
| 355 | ITDS-SQL: Enhancing Text-to-SQL Parsing By In-Context Learning with Inference Time Data Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Existing data synthesis methods face two key limitations when applied to ICL: (1) they require the pre-generation of large-scale synthetic datasets for example retrieval, which is resource-intensive, and (2) they often introduce domain-irrelevant or query-intent-inconsistent natural language question (NLQ)–SQL pairs, leading to significant performance degradation. To address these challenges, we propose ITDS-SQL, an inference-time data synthesis framework for ICL augmentation in text-to-SQL tasks. |
J. Su; S. Duan; Y. Zhang; C. Liu; P. Han; |
| 356 | IEUOD: Improving Underwater Object Detection Via Shallow Feature Guidance from Underwater Image Enhancement Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we propose Image Enhancement Guided Underwater Object Detection (IEUOD), a novel training framework that uses enhanced images as supervisory signals rather than direct inputs. |
W. Ouyang; W. Qiu; X. Zhong; Z. Wu; |
| 357 | Respire-Mamba C-UNet: Consistency-Trained Autoencoder for High-Fidelity Respiratory Sound Compression Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose Respire-Mamba C-UNet, a unified framework that integrates a physiology-aware SincConv frontend with power-law scaling, a Pyramid-UNet encoder for multi-scale representation, and a consistency-trained UNet encoder–decoder equipped with a Temporal Mamba bottleneck, further enhanced by variance-preserving rescaling and per-band frequency gating. |
Rishabh; Y. Meena; D. Kumar; K. Singh; Nidhi; |
| 358 | Melos: Sentence-To-Section Training with Multi-Task Learning for LLM-Driven Song Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To address these, we propose a large language model (LLM)-based framework with a novel two-stage training strategy that progresses from sentence-level to section-level. |
D. Wu; J. Lu; B. Su; S. Lei; X. Cai; Z. Wu; |
| 359 | Q4Q: Quantum for Quantization in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose Q4Q (Quantum-computing for Quantization), a quantum-based method for efficient bitwidth allocation in model quantization. |
G. Li; Y. Li; J. Jia; T. Deng; Y. Tao; |
| 360 | SSG-DIT: A Spatial Signal Guided Framework for Controllable Video Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose SSG-DiT (Spatial Signal Guided Diffusion Transformer), an efficient framework for high-fidelity controllable video generation. |
P. Hu; Y. Gu; L. Luo; F. Ren; |
| 361 | Hashing-Baseline: Rethinking Hashing in The Age of Pretrained Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we introduce Hashing-Baseline, a strong training-free hashing method leveraging powerful pre-trained encoders that produce rich embeddings. |
I. Moummad; K. Zaher; L. Rauch; A. Joly; |
| 362 | Probabilistic Graphical Modeling for Biomedical Signal Completion with Non-Random Missingness on Patient Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Furthermore, valuable relational information encapsulated in patient similarity networks is often ignored. To address these challenges, we propose PSMR-MNAR, a novel probabilistic graphical model for biomedical signal completion. |
D. Xue; W. Lu; |
| 363 | VividTalker: A Modular Framework for Expressive 3D Talking Avatars with Controllable Gaze and Blink Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: The synthesis of believable 3D talking avatars is critically hampered by two fundamental obstacles: fragmented, inefficient pipelines and the absence of realistic nonverbal behaviors, leading to a lifeless dead-eye gaze. We present VividTalker, a unified framework that systematically addresses these twin challenges. |
H. Xiong; J. Zhang; Z. Wang; T. Pan; Q. Hu; |
| 364 | See What You Need: Query-Aware Visual Intelligence Through Reasoning-Perception Loops Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Mimicking human cognition, we propose CAVIA, a training-free framework that closes the loop between reasoning and perception through adaptive bidirectional dialogue. |
Z. Dong; |
| 365 | Contextual Clue Mining and Class Calibration for Weakly Supervised Video Anomaly Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Additionally, for multi-category anomaly classification, CLIP’s compact normal-class embedding potentially leads to inter-class confusion, causing misclassification of normal frames. To address these challenges, we propose the Contextual Clue Mining network for Weakly Supervised Video Anomaly Detection (C2M-VAD), a novel WS-VAD framework integrating two key components: a temporal selective kernel module that adaptively adjusts the receptive field via input-conditioned kernel fusion, and an anomaly-class calibration module that mitigates semantic confusion through residual correction of CLIP logits. |
S. Zhang; |
| 366 | AdaFlow: Efficient Long Video Editing Via Adaptive Attention Slimming and Keyframe Selection Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose a novel and training-free method, AdaFlow, towards efficient and effective long video editing. |
S. Zhang; |
| 367 | Relate: Enhance Composed Video Retrieval Via Minimal-Redundancy Hierarchical Collaboration Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To address these limitations, we face three key challenges: (1) hierarchical semantic modeling, (2) temporal sparsification, and (3) modification-driven aggregation. Based on this, we propose a minimal-Redundancy hiErarchical coLlaborATive nEtwork (RELATE). |
S. Zhang; |
| 368 | Whisper-MLA: Reducing GPU Memory Consumption of ASR Models Based on MHA2MLA Conversion Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, its Multi-Head Attention (MHA) mechanism results in significant GPU memory consumption due to the linearly growing Key-Value (KV) cache usage, which is problematic for many applications especially with long-form audio. To address this, we introduce Whisper-MLA, a novel architecture that incorporates Multi-Head Latent Attention (MLA) into the Whisper model. |
S. Zhang; |
| 369 | FTIN: Frequency-Time Integration Network for Inertial Odometry Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, high IMU sampling rates introduce substantial redundancy that impedes IO’s ability to attend to salient components, thereby creating an information bottleneck. To address this challenge, we propose a cross-domain IO framework that fuses information from the frequency and time domains. |
S. Zhang; |
| 370 | Robust In-Bed Human Pose and Shape Estimation from Pressure Images with Clinical Awareness Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce a clinically grounded dataset collected in simulated intensive care settings, featuring diverse bed configurations and precise 3D annotations. |
C. Fang; |
| 371 | ViTex: Visual Texture Control for Multi-Track Symbolic Music Generation Via Discrete Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We identify instrumentation, the choice of instruments and their roles, as a natural dimension of control in multi-track composition, and propose ViTex, a visual representation of instrumental texture. |
X. Yi; Q. He; G. Xia; Z. Wang; |
| 372 | PILED: Physics-Informed Low-Light Enhancement and Deblurring Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Specifically, we introduce a novel illumination-invariant regularizer derived from the Kubelka–Munk reflection theory, which leverages photometric invariant features to achieve reliable illumination recovery while preserving intrinsic reflectance and color properties. |
P. Yu; H. Yang; P. Jia; X. Tang; F. Xu; |
| 373 | Tips Over Tricks: Simple Prompts for Effective Zero-Shot Anomaly Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We revisit the backbone and use TIPS—a VLM trained with spatially aware objectives. |
A. Salehi; |
| 374 | Sequential Multiple Testing with Three Hypotheses and Known Number of Streams Following Each Hypothesis Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we consider the problem of testing the marginal distributions of multiple independent, sequentially observed data streams, where for each stream there are three hypotheses to select from. |
Y. Xing; Y. Chen; T. Qu; |
| 375 | Context-Aware Deep Hashing for Cross-Domain Image Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, existing methods typically focus on generating reliable pseudo-labels or semantic prototypes for target domain modeling, which primarily capture category-related cues but often overlook the valuable contextual information embedded in background regions. To address this limitation, we propose a Context-Aware Deep Hashing (CADH) method that explicitly exploits contextual relationships within images to enhance semantic richness and generate more discriminative hash codes. |
S. Xiao; H. Cui; X. Han; L. Zhao; F. Li; C. Zheng; |
| 376 | VoXtream: Full-Stream Text-To-Speech With Extremely Low Latency Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present VoXtream, a fully autoregressive, zero-shot streaming text-to-speech (TTS) system for real-time use that begins speaking from the first word. |
N. Torgashov; G. E. Henter; G. Skantze; |
| 377 | Box-Chain VLA: Explicit Reasoning-to-Action Interfaces for Generalizable Robotic Manipulation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This architectural separation creates a semantic gap between high-level planning and low-level control, limiting fine-grained grounding and robustness in complex, cluttered environments. To bridge this gap, we propose Chain-of-Boxes Reasoning VLA (Box-Chain VLA), a framework that unifies reasoning and action within a shared latent space. |
H. Huang; |
| 378 | Prior Knowledge Driven Multi-View Clustering Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In multi-view clustering tasks, graph construction and information fusion typically lack effective prior knowledge, which makes it difficult to adequately capture the internal structure of multi-view data and thereby degrades clustering performance. To address this issue, the Prior Knowledge Driven Multi-View Clustering (PKDMVC) model is introduced, which incorporates the first-order neighbor relationships of samples as prior knowledge to guide the clustering process. |
H. Xin; |
| 379 | DPAnet: Dual Pyramid Attention Network for Multivariate Time Series Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose the Dual Pyramid Attention Network (DPANet), a novel architecture that explicitly decouples and concurrently models temporal multi-scale dynamics and spectral multi-resolution periodicities. |
Q. Li; X. Zhang; S. Wang; W. Jia; |
| 380 | SimToken: A Simple Baseline for Referring Audio-Visual Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we propose a simple framework, SimToken, that integrates a multimodal large language model (MLLM) with the Segment Anything Model (SAM). |
D. Jin; Y. Zhou; J. Zhou; J. Ma; R. Guo; D. Guo; |
| 381 | Pixel-Patch Graph Regularized Group Sparse Representation for Single-Image Denoising Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Most GSR-based methods focus on preserving similarity but often ignore noise with spatial correlations, resulting in over-smoothing. In this paper, pixel-patch graph regularized group sparse representation (PPGR-GSR) is proposed to address this limitation. |
X. Hou; X. -Q. Jiang; S. Zhou; H. Feng; |
| 382 | Disaster-Affected Area Extraction Method Through Pixel Difference Convolution and Frequency-Domain Enhancement Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, applying these models to disaster area extraction presents significant challenges – small spatial scales, weak spectral differences, and ambiguous boundaries. To address these limitations, we propose FourierPDC-SAM, a novel SAM adaptation framework. |
W. Su; S. Qiu; R. Ju; |
| 383 | Characterizing The Confounding Effects of Cybersickness On Mental Workload And Stress Detection Performance During An Immersive Virtual Reality Driving Task Using Electroencephalography Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this study, we aim to characterize the confounding effects that cybersickness may have on measures of stress and workload. |
A. Tiwari; |
| 384 | TPP-LLM: Time Series Popularity Prediction Via LLM Empowered By Textual Prototype and Prompt Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose Time series Popularity Prediction via LLM empowered by textual prototype and prompt (TPP-LLM) that addresses the popularity prediction task by introducing dual-scale information enrichment for time series: Patch-level Text Prototype Retrieval and Sequence-level Prompt Generation. |
Z. Wang; Z. Xie; Y. Sun; W. Niu; C. Su; |
| 385 | Human Mesh Recovery from Partial Point Cloud Without Human Annotations Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Nevertheless, accurately recovering the complete 3D human meshes from such data remains a challenge. To address this, we propose segHMR to identify point-wise semantics of the partial point cloud in a canonical semantic space and then leverage semantic alignment and semantic correction to recover the mesh from the point cloud. |
C. Su; B. Jin; F. Zhang; S. Li; Z. Wang; |
| 386 | ParaAegis: Parallel Protection for Flexible Privacy-Preserved Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This lack of flexibility hinders the practical implementation. To address this, we introduce ParaAegis, a parallel protection framework designed to give practitioners flexible control over the privacy-utility-efficiency balance. |
Z. Wu; Y. Li; T. Liao; J. Lou; C. Chen; |
| 387 | Dynamic Spectrogram Analysis with Local-Aware Graph Networks for Audio Anti-Spoofing Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: The rapid proliferation of deepfake techniques has introduced diverse forgery artifacts, posing substantial challenges to audio anti-spoofing. This paper proposes a model that adaptively tailors feature extraction to the specific nature of these artifacts. |
Y. Li; C. Chen; D. Chen; N. Zeng; K. Zeng; |
| 388 | DBFT-SD: Weakly Supervised Multimodal Detection of Sensitive Audio-Visual Content Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: The rapid growth of online audio-visual content raises urgent demands for detecting sensitive information under weak supervision. We propose the Dynamic Bottleneck Fusion Transformer (DBFT) and its self-distilled variant (DBFT-SD). |
S. Xiao; X. Ji; H. Yan; X. Yu; |
| 389 | QCA-RAG: Efficient Retrieval for LLMs Via Query Complexity Awareness Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To address this issue, we propose QCA-RAG, a Query Complexity-Aware framework that dynamically adjusts retrieval behavior based on query complexity, enabling the LLM to generate high-quality responses with reduced retrieval overhead. |
Y. Zhu; L. Li; J. Liu; H. Chen; Z. Chen; L. Xi; |
| 390 | MCF-Net: A Mamba-based Efficient Network for Radar Jamming Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, such transformations introduce additional computational overhead during dataset generation and risk information loss. To tackle these problems, we propose the Mamba-based Complex Signal Fusion Network (MCF-Net), an efficient deep learning architecture and the first to apply the Mamba framework to radar jamming recognition. |
W. Zheng; J. Shi; Y. Li; Y. Xia; |
| 391 | SiNC: Similarity-Informed Network Calibration for Robust Enzyme-Substrate Prediction with Unreliable Negative Sampling Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Datasets are augmented by randomly sampling potential non-substrates to overcome the lack of negative data. To address this challenge, we propose the SiNC (Similarity-informed Network Calibration) framework, unifying similarity-aware data calibration, SMILEScore, with a tailored, noise-robust SiNC-Loss. |
Q. Zhai; Y. Deng; D. Li; W. Geng; J. Weng; |
| 392 | DC-Mamber: A Dual Channel Prediction Model Based on Mamba and Linear Transformer for Multivariate Time Series Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Transformers effectively model global dependencies but face quadratic complexity and weak local sensitivity, while Mamba, based on state space models (SSMs), achieves linear complexity and efficient long-range modeling yet struggles with parallel global contextual information. To address these limitations, we propose DC-Mamber, a dual-channel prediction model based on Mamba and linear Transformer. |
B. Fan; S. Ma; Y. -B. Zhao; G. Xu; |
| 393 | A Multi-Dimensional Feature Fusion and Multi-Level Domain Adaptation Network for Cross-Subject EEG Emotion Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, inter-subject variability in EEG data distribution causes substantial domain shifts, severely limiting model generalization across subjects. To address this challenge, we propose M2Net, a novel network that integrates multi-dimensional feature fusion and multi-level domain adaptation for cross-subject EEG emotion recognition. |
C. Xie; R. Chen; Z. Huang; J. Zhang; L. Qiu; J. Pan; |
| 394 | SDR-STE: Synergistic Disentanglement and Refinement for Photorealistic Scene Text Editing Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Existing methods often handle content and style in an implicit and entangled manner, leading to mutual interference between background reconstruction and text rendering, which can result in structural distortions and texture artifacts, particularly in scenes with complex backgrounds and high-frequency details. To address these issues, we propose a disentangled generation framework that explicitly separates structure repair from texture synthesis in the latent space, and decomposes coarse-grained generation and fine-grained enhancement in the image space, thereby establishing clear division of labor and synergy between background and foreground rendering. |
Z. Jia; J. Wang; R. Jin; K. Song; Z. Wang; |
| 395 | Modality-Aware Token Filtering and Common Feature Enhancement Network for Multi-Modal Vehicle Re-Identification Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we present the Modality-Aware Token Filtering and Common Feature Enhancement Network (MCNet). |
M. Deng; Y. Deng; Z. Tang; G. Xiao; |
| 396 | HD-PPT: Hierarchical Decoding of Content- and Prompt-Preference Tokens for Instruction-Based TTS Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Although instruction-based Text-to-Speech (Instruct-TTS) models have been proposed, these models still lack fine-grained control due to the modality gap between single-level text instructions and multilevel speech tokens. To address this limitation, we propose HD-PPT, a framework that transforms speech synthesis into a structured, hierarchical task. |
S. Nie; X. Xing; J. Xing; B. Liu; X. Xu; |
| 397 | Synergizing Large-Scale Music Representations and Metric-Based Meta-Learning For Few-Shot Song Aesthetics Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: With only 2k+ labeled songs, the task reduces to a few-shot regime, imposing a severe generalization burden. To tackle these challenges, we leverage tens of millions of tracks from NetEase Cloud Music to pre-train two music representation models from different perspectives, yielding holistic representations for accurate aesthetics scoring. |
J. Chen; X. Bai; X. Pan; X. Mu; |
| 398 | Multimodal Multi-Agent Empowered Legal Judgment Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Traditional methods often rely on statistical analyses or role-based simulations but struggle with multiple allegations and diverse evidence, and lack adaptability. In this paper, we introduce JurisMMA, a novel framework for LJP that effectively decomposes trial tasks, standardizes processes, and organizes them into distinct stages. |
Z. Kang; |
| 399 | Beyond Human Skeletons: Prompt-Guided Graph Matching for Multi-Limbed Pose Estimation in Artistic Imagery Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose a prompt-guided multi-instance framework combining visual prompts, cross-modal attention, and graph-based matching. |
Y. Xian; Y. Lee; Y. Xiang; T. Shen; D. Gao; |
| 400 | Mask-Free Thangka Restoration Via Retrieval-Guided Diffusion with Semantic and Structural Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose a retrieval-augmented, region-aware diffusion method that restores color, texture, and missing content without user masks. |
Y. Xian; T. Shen; Y. Xiang; Y. Lee; D. Gao; |
| 401 | Streaming Speech Recognition with Decoder-Only Large Language Models and Latency Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we propose a novel streaming ASR approach that integrates a read/write policy network with monotonic chunkwise attention (MoChA) to dynamically segment speech embeddings. |
G. Wan; W. Zhang; J.-X. Zhang; S. Xiong; J. Gao; Z. Ye; |
| 402 | Non-Homogeneous Haze Removal Based on Deep Unfolding Network for Remote Sensing Images Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To solve the challenge of removing non-homogeneous dense haze from remote sensing images, we propose a novel alternating iterative dehazing algorithm based on the atmospheric scattering model. |
Y. Zhan; Q. Yu; J. Nie; J. Liu; Z. Wang; |
| 403 | Coarse-to-fine Trajectory Prediction Via Time-Aware Interaction Predictor and Conditional Diffusion-based Refiner Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Existing methods typically rely on history-based interaction modeling, which often fails to capture potential future impacts among agents, leading to inaccurate predictions. To address this, we propose TAIDR, a two-stage framework that combines Time-Aware future Interaction modeling with Diffusion-based Refinement. |
G. Zheng; J. Lin; Z. Liu; Z. Li; F. Rong; S. Su; |
| 404 | Task-Aware Modality-as-Experts Fusion of NIR and Microscopic Image for Textile Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We tackle the long-standing need for fast, non-destructive textile analysis by pairing a new method with a new resource. |
J. Kim; M. Chi; |
| 405 | Bridging Academia and Industry: Large-Scale NIR Signal Foundation for Robust Multi-Task and Real-World Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present FabricSpectra, a large, real-world foundation dataset for fabric component analysis. |
J. Kim; M. Chi; |
| 406 | StyMam: A Mamba-Based Generator for Artistic Style Transfer Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: SD-based methods reduce such issues but often fail to preserve content structures and suffer from slow inference. To address these issues, we revisit GANs and propose a Mamba-based generator, termed StyMam, to produce high-quality stylized images without introducing artifacts and disharmonious patterns. |
Z. Hong; |
| 407 | Disentangled Structure Prior Propagation for Guided Depth Super-Resolution Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, the redundant texture details in color images often interfere with the super-resolution process. To address this challenge, we propose the Disentangled Structure Prior Propagation Network (DSPPNet), which includes the Structure-Texture Disentangler (STD) module that isolates and purifies structure-specific features, and the Structure Prior Propagation (SPP) module that propagates purified structural priors across the network to guide depth reconstruction. |
X. Sun; H. Li; X. Ye; R. Xu; |
| 408 | RISC-V Microarchitecture Information Leakage Attack Via Transient Execution Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: The proposed attack framework comprehensively analyzes cache security and timing side-channel attacks across various microarchitectures. |
J. Wang; R. Zhai; Y. Wang; C. Liang; B. Cui; |
| 409 | Semantic and Temporal-Aware Distillation for Class-Incremental Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To solve the problems, we propose two lightweight and plug-and-play enhancement mechanisms for CIL models. |
D. Guo; X. Wang; Y. Zheng; X. Wang; Y. Cui; |
| 410 | Tensorformer-Based Multimodal Depression Detection from Concurrent Gait Patterns and Physiological Signals Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Depression detection remains challenging as current systems often rely on active sensing modalities that require explicit participant engagement, thereby limiting ecological validity and introducing social desirability bias. To address this limitation, we propose Tensorformer-MG, a novel multimodal framework that integrates passive gait monitoring with physiological signals for unobtrusive depression assessment. |
C. Fu; |
| 411 | Query-Scalable Few-Shot Semantic Segmentation Via In-Context Variational Inference Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Few-shot semantic segmentation (FSS) faces critical challenges in scaling to diverse query images, as existing methods often struggle to generalize across varying query distributions with limited support samples, especially when query sets exhibit large intra-class variations or increasing complexity. To address this, we propose a novel framework for Query-Scalable Few-Shot Semantic Segmentation via In-Context Variational Inference. |
Z. Xing; S. Chen; W. Tan; B. Yan; |
| 412 | Hybrid Hierarchical-Pyramid Transformer Cascade Class-Aware Selector for Colorectal Lesion Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To this end, this work proposes a new small deep learning model for accurate small object detection and classification. |
W. Fan; H. Fang; G. Luo; X. Luo; |
| 413 | Fed-MET: Memory-Efficient Elastic Training in Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we propose Fed-MET (Federated Memory-Efficient Elastic Training), a new FL framework that enables elastic training across multiple memory-constrained devices by freely choosing trainable NN modules. |
C. Miao; T. Chang; M. Wu; Y. Zha; J. Peng; X. Wang; |
| 414 | Qwen-Simplify: Exploring Sentence Simplification Via Qwen-Based Reinforcement Learning Paradigm Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we explore the Qwen-based RL paradigm in the context of sentence simplification. |
P. Zhou; G. Li; X. Huang; |
| 415 | DIMO: Dual-Strategy Learning for Ambiguous Samples in Class-Imbalanced Facial Expression Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Existing approaches often overlook this issue, thereby compromising model performance. To address this issue, we propose the DIscriminative Margin Optimizer (DIMO), which enhances the discriminative capability for ambiguous samples and improves overall robustness through a dual-strategy design. |
L. Zeng; L. Luo; Y. Gu; F. Ren; |
| 416 | PSQ-PMC: A Hardware-Friendly Quantization Scheme for Spike-Based Neural Radiance Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Despite significantly reducing energy consumption during inference, the membrane potential of spike neurons occupies a large portion of memory resources. To address this, we propose a hardware-friendly quantization scheme that is tailored for the spike-based NeRF model. |
R. Lin; J. Li; Z. Meng; P. Zhou; |
| 417 | ParaGSE: Parallel Generative Speech Enhancement with Group-Vector-Quantization-Based Neural Speech Codec Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Recently, generative speech enhancement has garnered considerable interest; however, existing approaches are hindered by excessive complexity, limited efficiency, and suboptimal speech quality. To overcome these challenges, this paper proposes a novel parallel generative speech enhancement (ParaGSE) framework that leverages a group vector quantization (GVQ)-based neural speech codec. |
F. Liu; Y. Ai; |
| 418 | Robust Supervised Learning for Ballistocardiogram Quality Assessment Under Limited Inter-Rater Agreement Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose a machine learning framework for ordinal regression of BCG signal quality (poor, unreliable, borderline, good, and excellent), leveraging time-frequency features from empirical mode decomposition and Welch’s method. |
M. S. Islam; |
| 419 | MFF-NET: Image Manipulation Localization Method Based on Multi-Scale Feature Fusion Network Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper presents a multi-scale feature fusion network (MFF-Net) that integrates global noise features via Transformers and local artifact cues via CNNs. |
S. Wang; S. Chen; Q. Wu; L. Cao; Y. Xing; |
| 420 | PE-Sleuth: Program-Level Semantics and Static Feature Fusion for Interpretable Ransomware Detection with LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Existing PE-based static detection methods suffer from limited feature coverage, weak generalization, and poor interpretability. To address these challenges, we propose PE-Sleuth, a framework that fuses program-level semantics with static features, while leveraging large language models (LLMs) for classification and rationale generation. |
H. Dai; |
| 421 | AccelGS: An Acceleration Framework for Large-Scale 3D Gaussian Splatting Training Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose AccelGS, an acceleration framework for large-scale 3DGS training. |
Y. Kou; |
| 422 | BSMP-SENet: Band-Split Magnitude-Phase Network for Speech Enhancement Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we propose a frequency-aware magnitude–phase speech enhancement framework that incorporates learnable subband decomposition, multi-scale temporal modeling, and adaptive cross-band integration within a compact backbone. |
X. Ju; |
| 423 | A Lightweight Semantic Segmentation System for 3D Medical Image Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To tackle the slow speed and high energy consumption of semantic segmentation models for 3D medical images, we propose a lightweight system based on FPGA. |
X. Wu; Y. Xue; F. Qiao; Q. Song; |
| 424 | OrthoVAD: Weakly Supervised Video Anomaly Detection Via Prototype Orthogonality Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This leads to severe confusion between normal and abnormal features in the representation space. To address this challenge, this paper proposes a novel framework named OrthoVAD. |
T. Zhu; |
| 425 | A Framework for Bipartite Graph Structure Learning Through Eigenvector Partitioning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we propose a novel bipartite GL framework that directly infers bipartite structures by leveraging the characteristic eigenstructure of bipartite adjacency matrices. |
X. Shi; A. Jiang; R. Yang; Y. Tang; M. Li; Y. Zhu; |
| 426 | Imperceptible Adversarial Example Generation Controlled By High-Frequency Signal Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper proposes a High-Frequency Signal Controlled Generative Adversarial Network (HiF-GAN), which improves the imperceptibility of adversarial examples by extracting high-frequency signal from images and constraining perturbations to high-frequency regions. |
Q. Zhang; Z. Ying; X. Zhang; Q. Li; S. Meng; |
| 427 | VisualPrism: Disperse-and-Focus Token Compression Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Existing token compressors often aggregate features indiscriminately, leading to the loss of critical global context and semantic drift under high compression. To this end, this paper introduces VisualPrism, a prior-guided-then-compress framework for visual token compression, inspired by fovea–periphery coordination in human vision. |
R. He; |
| 428 | Sustainable Incentive for Model Trading in Decentralized and Personalized Federated Learning Via DAG-Blockchain Consensus Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose a Directed Acyclic Graph (DAG)-blockchain-based trading framework with sustainable incentives. |
P. Hao; Z. Liu; J. Liu; G. Sun; |
| 429 | Gaussian Locality Prior For Contrast–Reconstruction Learning: State–Space Model-Based Time–Series Anomaly Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We revisit unsupervised TSAD with a simple recipe: inject a learnable Gaussian Locality Prior into similarity/attention logits to encode temporal clustering [3], [4], and train with two signals—a two-view KL consistency aligning dependency distributions [5], [6] and a lightweight reconstruction loss providing amplitude/local-shape cues [7]. |
T. Han; Y. Li; Q. Xiong; S. Zheng; J. Guo; |
| 430 | CardioBridge-DM: Bridging Cross-Cohort Heart Sound Synthesis Via Rhythm-Aware Semi-Supervised Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Robust AI for cardiac auscultation is hampered by data fragmentation, creating a trade-off between small, richly-annotated datasets and large, weakly-labeled ones. We introduce CardioBridge-DM, a semi-supervised diffusion framework that bridges this gap by synthesizing high-fidelity, controllable phonocardiograms (PCGs). |
C. Xu; S. Li; H. Wang; |
| 431 | Disentangling Physiology from Fidelity: Latent-Guided Diffusion Models for Cross-Modal Cardiac Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce MambaDiff-VAE, a principled framework for bidirectional ECG-PCG synthesis. |
C. Xu; S. Li; W. Xuan; H. Wang; |
| 432 | AdaGrad-Fusion: Adaptive Gradient Fusion for Memory-Efficient ECG Foundation Model Fine-Tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Despite their potential in electrocardiogram (ECG) analysis, foundation models face a deployment crisis due to prohibitive computational and memory overhead during fine-tuning. To address this, we propose AdaGrad-Fusion (AGF), a novel framework for memory-efficient fine-tuning. |
C. Xu; Y. Zhao; Z. Zhang; H. Wang; |
| 433 | Bridging The Gap: Transforming Natural Language Questions Into SQL Queries Via Abstract Query Pattern and Contextual Schema Markup Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We identify two important gaps: the structural mapping gap and the lexical mapping gap. To tackle these two gaps, we propose PAS-SQL, an efficient SQL generation pipeline based on LLMs, which alleviates gaps through Abstract Query Pattern (AQP) and Contextual Schema Markup (CSM). |
Y. Kong; H. Hu; D. Zhang; Z. Xu; W. Wang; |
| 434 | A Competition-Cooperation Graph Adversarial Augmentation Learning with Application to Brain Disease Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose the Signed Graph Adversarial Augmentation Contrastive Learning Network (SGA-CLNet), a competition–cooperation framework for brain disease detection. |
M. Yuan; J. Wang; W. Xiong; J. Li; T. Xu; M. Shao; |
| 435 | AERIS-RTDetR: Ultrasound-Aware Real-Time Detection with Orthogonal Aniso-Scale Blocks And Echogenicity-Guided Fusion Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce Anisotropic Echo-Reliability Integrated Sonographic Real-Time DETR, AERIS-RTDetR, which fuses anisotropy-aware priors with echo-calibrated reliability. |
F. Liu; J. Wang; Q. Zhang; Y. Zhang; H. Pan; |
| 436 | Adaptive Metaheuristic-Optimized Stochastic Resonance Network for DOA Estimation in Low-SNR Underwater Environments Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper proposes an adaptive metaheuristic-optimized SR network (AMO-SRN) that enhances signal detection while preserving phase consistency. |
T. Zhang; J. Liu; Z. Zou; |
| 437 | S-PRESSO: Ultra Low Bitrate Sound Effect Compression with Diffusion Autoencoders and Offline Quantization Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we present S-PRESSO, a 48 kHz sound effect compression model that produces both continuous and discrete embeddings at ultra-low bitrates, down to 0.096 kbps, via offline quantization. |
Z. Lahrichi; G. Hadjeres; G. Richard; G. Peeters; |
| 438 | Bayesian Uncertainty-Aware MRI Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose a novel framework for joint magnetic resonance image reconstruction and uncertainty quantification using under-sampled k-space measurements. |
A. K. Eldaly; M. Figini; D. C. Alexander; |
| 439 | ReVIS: Reconstructing The Individual Visual Perception Via EEG Signals Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To address these, we propose ReVIS, a self-attention mechanism that captures cross-channel and cross-temporal interactions while incorporating an adaptive gating mechanism to dynamically modulate attention weights based on the corresponding EEG signals. |
Y. Tan; X. Yin; X. Chen; G. Zhang; C. Xu; |
| 440 | ILSA: Information Loss-Guided Sparsity Allocation for Pruning Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose ILSA, an Information Loss-guided Sparsity Allocation framework that employs virtual pruning to estimate perturbations and evaluates layer sensitivity via KL divergence, cosine similarity, and L2 distance. |
L. Li; Y. Wang; Z. Wang; F. Bao; |
| 441 | Joint Learning of Deterministic and Stochastic Parameters of Sparse Bayesian Neural Networks for Probabilistic Image Registration Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper proposes a sparse BNN framework for probabilistic image registration. |
Y. Hua; X. Yang; Y. Zhao; |
| 442 | Score-Guided Motion Planning: Learning The Gradient Field of Promising Regions Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, a score-guided sampling framework, ScorePlanner, is proposed to address the critical challenge of sampling inefficiency in sampling-based motion planning. |
S. Wang; Q. Wu; Q. Huang; Z. Cheng; |
| 443 | Curvature-Driven Synchrosqueezing Transform: A Fine-Scale Bidirectional Method for Time-Frequency Representation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we propose the Curvature-Driven Synchrosqueezing Transform (CSST), which extends the bidirectional Synchrosqueezing Transform (BSST) framework by replacing block-wise direction estimation with pixel-wise curvature minimization, achieving finer directional resolution and continuous energy compression. |
C. Xu; V. Bruni; D. Vitulano; Y. Liao; |
| 444 | A Data-Driven Framework for Personal Sound Zone Control Addressing Loudspeaker Nonlinearities Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This nonlinearity corrupts the conventionally measured acoustic transfer functions (ATFs) and invalidates the linear control assumptions upon which these systems are built. To address these dual failure points, we propose a complete, two-stage, data-driven framework. |
L. Zhou; C. Gong; C. Huang; H. Liu; L. Gan; L. Shi; |
| 445 | Refining Cross-Modal Contradiction Via Iterative Focusing for Multimodal Sarcasm Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To this end, this paper proposes a novel Iterative Contradiction Focusing Network (ICF) that dynamically locates and amplifies the core semantic conflicts between image and text through an iterative reasoning process. |
Z. Wang; B. Wang; P. Liu; L. Xu; |
| 446 | FED: A Fine-Grained Enhanced Dual-Routing Network for Multimodal Sarcasm Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To this end, we propose a novel framework, Fine-grained Enhanced Dual-routing (FED) for the detection of multimodal sarcasm. |
Z. Wang; B. Wang; F. Xu; Z. Yu; P. Liu; L. Xu; |
| 447 | TF-MAMBANET: A Temporal and Frequency Fused Bidirectional Mamba Architecture for PPG Foundation Model Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper proposes TF-MambaNet, a foundation model for PPG signals. |
Z. Bao; Y. Benezeth; F. Yang; Y. Zhang; H. Wang; C. Li; |
| 448 | Diffusion Bridges with Dual Conditioning and Trajectory Consistency for Multi-Contrast MRI Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, existing methods independently predict each timestep without effectively utilizing intermediate predictions, causing temporal inconsistency and reduced structural fidelity. To address this, we propose dual conditioning and trajectory consistency strategies. |
X. Pei; R. Ding; Z. Zeng; X. Wen; K. Guo; |
| 449 | Federated Camouflaged Poisoning Attack in Federated Unlearning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose FedCPA, a camouflaged poisoning attack that keeps models benign during federated training and activates only after FU removes the camouflage carrier. |
W. Lai; Q. Yan; S. Liang; K. Zhong; |
| 450 | DFMAD: Data-Free Backdoor Defense for Federated Learning Via Multi-Teacher Adversarial Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose Data-Free Backdoor Defense for FL via Multi-Teacher Adversarial Distillation (DFMAD), which requires no real data. |
K. Zhong; Q. Yan; W. Lai; |
| 451 | Towards Robust Dysarthric Speech Recognition: LLM-Agent Post-ASR Correction Beyond WER Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This mismatch is particularly problematic for dysarthric speech, where articulatory imprecision and disfluencies can cause severe semantic distortions. To bridge this gap, we introduce a Large Language Model (LLM)-based agent for post-ASR correction: a Judge–Editor over the top-k ASR hypotheses that keeps high-confidence spans, rewrites uncertain segments, and operates in both zero-shot and fine-tuned modes. |
X. Zheng; S. Dong; B. Phukon; M. Hasegawa-Johnson; C. D. Yoo; |
| 452 | Residual Diffusion with Fused Accelerated Shared Distribution and Frequency-Adaptive Selection for Unified Image Restoration Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, their iterative sampling process still leaves considerable room for acceleration and efficiency improvements. To overcome these limitations, we propose RDiFAS-FA, which is a novel diffusion-based unified restoration framework. |
C. Li; |
| 453 | AISHELL6-Whisper: A Chinese Mandarin Audio-Visual Whisper Speech Dataset with Speech Recognition Baselines Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present AISHELL6-Whisper, a large-scale open-source audio-visual whisper speech dataset, featuring 30 hours each of whisper speech and parallel normal speech, with synchronized frontal facial videos. |
C. Li; |
| 454 | ICASSP 2026 URGENT Speech Enhancement Challenge Related Papers Related Patents Related Grants Related Venues Related Experts View Save Abstract: The ICASSP 2026 URGENT Challenge advances the series by focusing on universal speech enhancement (SE) systems that handle diverse distortions, domains, and input conditions. This … |
C. Li; |
| 455 | Homomorphic Convolution Reimagined: Eliminating Rotation Bottlenecks for Practical Privacy-Preserving CNN Inference Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present a homomorphic convolution framework that substantially reduces rotation cost. |
C. Li; |
| 456 | AG-Fusion: Adaptive Gated Multimodal Fusion for 3D Object Detection in Complex Scenes Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose a novel Adaptive Gated Fusion (AG-Fusion) approach that selectively integrates cross-modal knowledge by identifying reliable patterns for robust detection in complex scenes. |
S. Liu; C. Xu; Y. Li; Q. Wang; D. Shi; |
| 457 | CellExch: A Multimodal Feature-Based Framework for End-to-End Prediction of LRI-Mediated Cell-Cell Communication Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To assist with CCC analysis, a computational framework named CellExch is proposed. |
H. Xia; B. Ji; S. Peng; |
| 458 | ArchAgent: Scalable Legacy Software Architecture Recovery with LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present ArchAgent, a scalable agent-based framework that combines static analysis, adaptive code segmentation, and LLM-powered synthesis to reconstruct multiview, business-aligned architectures from cross-repository codebases. |
R. Pan; B. Mao; T. Ma; Z. Ling; |
| 459 | RSCC-Diff: A Novel Generative Paradigm Empowers Differential-Loss-Guided MLLM for Remote Sensing Change Captioning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, its potential remains constrained by the lack of domain-specific fine-tuning data. To address this challenge, we propose RSCC-Diff, a three-stage MLLM framework for RSICC. |
S. Yang; J. Zhang; W. Yin; F. Fang; G. Zhang; H. Song; |
| 460 | Supervised Makeup Transfer with A Curated Dataset: Decoupling Identity and Makeup Features for Enhanced Transformation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Existing methods often suffer from limited datasets, poor disentanglement between identity and makeup features, and weak controllability. To address these issues, we make three contributions. First, we construct a curated high-quality dataset using a train–generate–filter–retrain strategy that combines synthetic, realistic, and filtered samples to improve diversity and fidelity. |
Q. Pan; Y. Wu; X. Zhao; L. Xie; G. Sun; R. Liang; |
| 461 | LePER: Label-Free Edge Polarity Reweighting for Heterophily Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose LePER, a lightweight, label-free edge polarity reweighting scheme that computes a local-contrast polarity from centered node features and reweights edges in a single preprocessing pass. |
G. Fan; M. Zhang; M. Zhao; L. Pan; |
| 462 | IM-RACG: Information Density-Based Adaptive Masking Strategy for Retrieval-Augmented Code Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This ignorance leads to the inclusion of redundant or counterproductive code segments that can degrade generation quality. To tackle this, we propose IM-RACG (Information density-based adaptive Masking strategy for RACG). |
C. Shi; M. Gao; B. Li; Y. Fan; Z. Gao; |
| 463 | User-Level Safety Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This limits the effectiveness of using Large Language Models (LLMs) for particular professions in their work or research. To overcome this issue, we introduce a novel task called User-Level Safety Alignment (ULSA), which requires LLMs to customize their safety alignment to match specific roles, providing tailored responses accordingly. |
Z. Zhang; L. Jing; |
| 464 | Advancing Fine-Grained Sentiment Analysis in Complex Contexts: A New Benchmark and Interpretation-Enhanced Approach Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper aims to advance research on fine-grained sentiment analysis (FSA) in complex contexts. |
G. Xie; |
| 465 | WiFi-Gen: High-Resolution Indoor Imaging from WiFi Signals Using Generative AI Related Papers Related Patents Related Grants Related Venues Related Experts View Save Abstract: Indoor imaging is a critical task for robotics and internet-of-things. WiFi as an omnipresent signal is a promising candidate for carrying out passive imaging and synchronizing … |
J. Shi; B. Zhang; A. Dubey; R. Murch; L. Jing; |
| 466 | HSRI: High-Fidelity Shape Representation with Image Guidance Highlight: Existing approaches often compress shapes via variational autoencoders (VAEs) and rely on latent diffusion models to infer representations from single images, but VAEs struggle to capture intricate geometric details. To address this limitation, we propose HSRI, a cross-modal variational autoencoder (CVAE) that encodes 3D shapes into a latent space and decodes them to predict occupancy values of query points. |
P. Liu; H. Xiao; X. Wen; L. Wang; F. Sun; |
| 467 | CAMA: Character-Aware Masking and Alignment for Self-Supervised STR Highlight: We propose CAMA, a novel framework that integrates linguistic awareness into pre-training by synergizing masked image modeling (MIM) and contrastive learning (CL). |
X. Gong; Y. Xue; |
| 468 | ORSc: Object-Aware Reinforcement with Semantic Consistency for Hallucination Mitigation in MLLMs Highlight: Multimodal Large Language Models (MLLMs) have demonstrated remarkable capabilities in visual-language understanding but suffer significantly from object hallucinations, generating descriptions that contain objects inconsistent with the visual input. We present ORSc (Object-aware Reinforcement with Semantic Consistency), a novel framework that addresses this fundamental challenge through three key innovations: a self-supervised Object-aware Self-Verification (OSV) mechanism that eliminates external detector dependency by leveraging the model’s internal attention patterns and hidden state dynamics, providing formal guarantees on verification accuracy; a Semantic Consistency Reinforcement (SCR) module employing multi-relational Graph Attention Networks to explicitly model object relationships with theoretical guarantees on representation stability; and a Dynamic Layer-wise Semantic Fusion (DLSF) strategy that integrates knowledge from preceding layers guided by information-theoretic measures of semantic consistency. |
J. He; X. Shi; H. Xie; Y. Zhang; M. Shang; |
| 469 | Dynamic Attention-Aware Shaping for Out-of-Distribution Detection Highlight: In this work, we propose Dynamic Attention-Aware Shaping (DAAS), a post-hoc and dynamic method that enhances OOD detection performance. |
J. He; H. Xie; X. Shi; Y. Wang; M. Shang; |
| 470 | TSAR: Scalable Time Series Forecasting Meets Next-Scale Autoregressive Modeling Highlight: We propose Time Series AutoRegressive modeling (TSAR), a conditional next-scale autoregressive framework that formulates forecasting as a coarse-to-fine process. |
H. Yang; Y. Bian; R. C. Qiu; Z. Ling; |
| 471 | MeshRF: Residual Fusion of Vertices, Edges, and Faces for Mesh Understanding Highlight: However, the inherent complexity and irregularity of the mesh data present significant challenges for neural networks. To address these challenges, we propose MeshRF, a novel lightweight network that extracts local topological features from multiple dimensions, including vertices, edges, and faces of a mesh, and then performs cross-dimensional fusion by specialized residual convolution and fusion modules. |
G. Zheng; L. Yuan; Y. Han; H. Duan; J. Zhang; G. Zhai; |
| 472 | The Speech Analysis for Neurodegenerative Diseases Challenge Highlight: Participants were required to design models capable of detecting and classifying the severity of voice disorders, i.e., dysarthria (Task 1), and predicting the progression of the neurodegenerative disease by forecasting the worsening of dysarthria (Task 2) based on vocal signal analysis. |
G. Sannino; |
| 473 | Cancer of Unknown Primary Prediction Via Semantic Prompting and Tumor Environment-Aware Patch Selection Highlight: Additionally, in weakly supervised settings, these methods focus solely on high-attention tumor-containing patches, overlooking cellular characteristics in non-affected regions. To overcome these limitations, we propose a novel framework for predicting lymph node metastasis of unknown primary, which explores both tumor and non-tumor patches under the guidance of textual descriptions of the overall tumor environment. |
Q. Jia; Q. Bo; S. Yao; Y. Liu; L. Sun; Y. Zhu; |
| 474 | IPI2: Mitigating Indirect Prompt Injections on Unmanned Aerial Vehicle Agents Using Physical Invariants Highlight: In this paper, we present IPI2, a defense framework that mitigates IPI attacks by leveraging UAVs’ physical invariant properties. |
Q. Zhong; S. Liu; J. Liu; K. Pan; Y. Cheng; W. Xu; |
| 475 | HarmoniFuse: A Component-Selective and Prompt-Adaptive Framework for Multi-Task Speech Language Modeling Highlight: Such designs risk task interference and performance degradation, especially under limited data conditions. To address these limitations, we propose HarmoniFuse, a component-selective and prompt-adaptive framework for multi-task speech language modeling. |
Y. Si; R. Yang; Y. Gao; J. Feng; C. Deng; S. Zhang; |
| 476 | TAML: Task-Aware Metric-Driven Meta Learning for Few-Shot Action Recognition Highlight: With limited samples and high-capacity backbones, adapting these models to learn transferable metric features remains challenging. To address this, we propose TAML, a task-level meta-learning paradigm that constructs meta-metric tasks to explicitly learn metric-relevant representations and enhance generalization. |
W. Wu; Y. Zhang; S. Zheng; X. Zhu; Y. Chen; Y. Dang; |
| 477 | Adverse Effect Removal Network Via Unsupervised Weather Type Transfer Highlight: Images captured in traffic scenes under adverse weather such as rain, fog, snow, and nighttime often suffer from low visibility, posing challenges for autonomous driving and urban surveillance. To address this, we propose AERNet, an unsupervised all-weather image enhancement network that restores adverse-weather images to clear conditions. |
Y. Tang; B. Fang; C. Zhao; H. Liu; F. Yan; T. Deng; |
| 478 | Compact Representation Learning for Multimodal Drug-Drug Interaction Event Prediction Highlight: In this paper, we propose CompactDDI, a novel framework that learns compact intra- and inter-modality representations for improved DDIE prediction. |
Z. Guo; |
| 479 | MAGE: A Coarse-to-Fine Speech Enhancer with Masked Generative Model Highlight: In this paper, we introduce MAGE, a Masked Audio Generative Enhancer that advances generative speech enhancement through a compact and robust design. |
T. H. Pham; T. Dat Nguyen; P. T. Tran; J. S. Chung; D. D. Nguyen; |
| 480 | EBAD-GS: Deblurring Gaussian Splatting with Event-Driven Bundle Adjustment Highlight: We propose Deblurring Gaussian Splatting with Event-driven Bundle Adjustment (EBAD-GS), a method for high-fidelity 3D scene reconstruction from dual-modal data comprising event streams and severely blurred images. |
Y. Deng; Y. Wang; C. Tang; J. Fan; R. Xiao; J. Lv; |
| 481 | A Neural Operator for Spatiotemporal Significant Wave Height Prediction Based on Spectral Residual Region Partitioning Highlight: Data-driven methods provide an attractive and computationally efficient alternative to numerical models for spatiotemporal wave height prediction, as they directly learn input–output mappings in an end-to-end manner and thereby reduce computational costs. |
N. Song; X. Li; J. Wu; M. Ye; X. Liang; J. Nie; |
| 482 | Asynchronous SSVEP-BCI Recognition Via Multi-Start-Point Slice Ensembles and Hard Voting Highlight: However, existing asynchronous SSVEP detection approaches face two major limitations: (1) dependence on a single-window decision that fails to utilize temporal information fully; and (2) a single-output mechanism that inadequately distinguishes between intentional control (IC) and non-intentional control (NC) states, resulting in high false positive rates (FPRs). This paper proposes Multi-Start-Point Slice Ensembles with Hard Voting (MSPHV) to alleviate these issues. |
H. Wu; Y. Tu; D. Wu; |
| 483 | Magnitude-Aware Semantics-Regularized Test-Time Adaptation for Medical Image Segmentation Highlight: We present Magnitude-Aware Semantics-Regularized (MASR), a unified TTA framework that stabilizes optimization and enhances semantic discrimination. |
G. Yu; |
| 484 | A Complex-Domain Coil-Slice Sensitive (CSS) Noise Simulation for MRI: Enabling Strong Generalization in Real Denoising Scenarios Highlight: In this work, a Complex-domain Coil-Slice Sensitive (CSS) noise simulation method is proposed, which integrates coil-sensitive noise activation, slice-sensitive noise level adjustment, and complex-domain noise injection with coil-slice adaptivity. |
J. Wu; J. Guan; X. Tang; K. Tong; F. Ai; X. Zeng; |
| 485 | Benchmarking Gaslighting Attacks Against Speech Large Language Models Highlight: In this paper, we introduce gaslighting attacks, strategically crafted prompts designed to mislead, override, or distort model reasoning as a means to evaluate the vulnerability of Speech LLMs. |
J. Wu; B. Zhu; X. Zou; Q. Zhang; X. Fang; P. Zhou; |
| 486 | Revisiting The Connection Between MCCA-Genvar and IVA-G: Role of Orthogonality and Deflation Highlight: We revisit this connection and demonstrate that the main difference between these methods is in fact not orthogonality but deflation, which is inherent to most mCCA objective functions, including genvar. To show this, we introduce orthogonal IVA-G (o-IVA-G) and deflationary orthogonal IVA-G (d-o-IVA-G) and compare them with IVA-G and mCCA-genvar in simulations inspired by the functional Magnetic Resonance Imaging (fMRI) subgroup identification problem. |
I. Lehmann; B. Gabrielson; T. Adali; |
| 487 | When Differential Privacy Meets Wireless Federated Learning: An Improved Analysis for Privacy and Convergence Highlight: However, foundational questions on how to precisely characterize privacy loss remain open, and existing work is further limited by convergence analyses that rely on restrictive convexity assumptions or ignore the effect of gradient clipping. To overcome these issues, we present a comprehensive analysis of privacy and convergence for DPWFL with general smooth non-convex loss objectives. |
Y. Chen; H. Liang; X. Tu; |
| 488 | mVI: High-Resolution Roadside Vehicle Imaging By mmWave Highlight: To fill the gap, we propose mVI, the first mmWave-based roadside vehicle imaging system, which reforms the blurry vehicle mmWave point cloud into a clear image utilizing the diffusion model. |
W. Xu; |
| 489 | Power Consumption Reduction in ELAA-Assisted ISAC Systems Highlight: In this paper, we consider power consumption reduction in extremely large antenna arrays (ELAAs) for integrated sensing and communication (ISAC) applications. |
X. Cao; M. Mohammadi; H. Q. Ngo; M. Matthaiou; |
| 490 | Tabular Synthesis Based on Bi-Directional Feedback Conditional Diffusion Models Highlight: However, obtaining high-quality synthetic tabular data remains an ongoing challenge due to the complex interplay between continuous and discrete variables, which often leads to the loss of important correlations and the introduction of spurious relationships in the generated data. To address this challenge, we introduce BiFeD, a Bi-directional Feedback Conditional Diffusion model tailored for generating both continuous and discrete variables. |
Q. Zhang; Y. Tang; J. Tian; Y. He; L. Xu; S. Liu; |
| 491 | Multi-View Crowd Counting with Self-Supervised Learning Highlight: In this work, we propose SSLCounter, a novel self-supervised learning (SSL) framework for MVC that leverages neural volumetric rendering to alleviate the reliance on large-scale annotated datasets. |
H. Mo; X. Zhang; T. Shi; Z. Wu; |
| 492 | Understanding Generalization in Decentralized Learning: A Time-Uniform and Topology-Aware Analysis Highlight: These contradict the empirical observation that the generalization performance typically stabilizes over training time and depends on network topology. To bridge this gap, in this paper we establish the first non-divergent and topology-aware generalization guarantee for decentralized learning with convex losses, using the tool of perturbed gradient analysis. |
H. Ye; T. Sun; Q. Ling; |
| 493 | Neon: One-Shot Text-To-Video Tuning Via Noise Latent Dynamics Highlight: In this work, we propose NEON, a diffusion-based method to model the Noise latEnt dynamics for One-shot text-to-video tuNing. |
X. Xiao; S. Lei; Y. Hu; |
| 494 | HCL-CSC: Hierarchical Contrastive Learning with IDS-Aware Character Similarity for Chinese Spelling Correction Highlight: We propose HCL-CSC, a novel framework with three key innovations: (1) IDS tree-based character similarity modeling that combines structural decomposition with confusion dictionaries to capture precise morphological relationships, (2) multi-granularity contrastive learning operating simultaneously at character, sequence, and consistency levels for comprehensive representation learning, and (3) confusion-aware dynamic hard negative mining that intelligently adapts sample selection based on model confidence and character confusion patterns. |
S. Wang; C. Tong; L. Jiang; |
| 495 | Activity Recognition Using Inaudible Acoustic FMCW Highlight: In this paper, we leverage Frequency-Modulated Continuous Wave (FMCW) of inaudible acoustic signals to achieve accurate and generalizable activity recognition as well as static target detection. |
R. Zhou; |
| 496 | Make Your Move: Make Your 3D Contents By Adapting Multi-View Diffusion Models to External Editing Highlight: In this paper, we propose a training-free, plug-and-play scheme that aligns edited assets with their original geometry in a single inference run. |
W. Wang; H. Xu; J. Meng; H. Wang; |
| 497 | A Query-Based End-to-End Transformer for Third-Person Human Gaze Analysis Via Joint Fine-Tuning Strategy Highlight: In this work, we propose a query-based end-to-end Transformer capable of modeling temporal human gaze behaviors to perform diverse gaze analysis tasks. |
S. Ye; Y. Huang; Z. Wang; F. Fang; G. Zhang; H. Song; |
| 498 | Latent Variable Estimation Via Kernel and Graph for Gaussian Process Regression Highlight: This paper introduces a novel Gaussian Process Regression (GPR) algorithm that leverages kernel regression to embed input data into a latent variable space, thereby enforcing structural consistency between the covariance of the latent variables and that of the outputs, as represented by their respective graph Laplacian matrices. |
X. Miao; |
| 499 | Multi-Scale Adaptive Neighborhood Awareness Transformer for Graph Fraud Detection Highlight: However, the inherent inductive bias of GNNs, including the homogeneity assumption and the limited global modeling ability, hinders the effectiveness of these models. To address these challenges, we propose Multi-scale Neighborhood Awareness Transformer (MANDATE), which alleviates the inherent inductive bias of GNNs. |
J. Lv; Q. Du; Y. Zhang; Y. Han; S. Li; |
| 500 | Explaining Face Verification Decisions with Pairwise Facial Feature Explanation Highlight: Intrinsic facial features, such as the eyes, mouth, and eyebrows, are more intuitive for humans, indicating that explanations focusing on these facial features may align better with human perception. Inspired by this, we propose the FS-PFFE interpretability framework for face verification, providing pairwise facial feature explanations with improved comprehensibility over pixel-wise methods. |
S. Li; Q. Du; J. Zhang; J. Lv; |
This table only includes 500 papers selected by our daily digest algorithm. To continue with the full list (~4,500 papers), please visit Paper Digest: ICASSP-2026 (Full List).