Paper Digest: ICASSP 2026 Papers & Highlights

May 4, 2026 | July 23, 2026 | admin

The IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) is one of the top signal processing conferences in the world. In 2026, it is to be held in Barcelona, Spain. To help the community quickly catch up on the work presented at this conference, the Paper Digest team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights to quickly get the main idea of each paper.

Literature review on a topic

Generate a written review of ICASSP-2026 research on any topic, with each claim cited to specific papers.

Browse & explore

Browse ~5,700 authors (ICASSP-2026), or explore the “Best Paper” Digest, which lists the most influential ICASSP papers of recent years.

Note: ICASSP-2026 accepted more than 4,500 papers; this page includes only 500 of them, selected by our daily paper digest algorithm. Interested users can read all 4,500 ICASSP-2026 papers on a separate page, which takes quite some time to load.

Since 2018, Paper Digest has built a foundation of data spanning decades of conferences, journals, and research topics. The platform features a daily digest service that sifts through tens of thousands of new papers, clinical trials, news articles, and community posts, filtering the noise to highlight what matters most to specific interests. Beyond daily updates, dozens of built-in research tools streamline the academic workflow, supporting efficient reading and writing, comprehensive literature reviews, and automated research report generation.

Paper Digest Team
New York City, New York, 10017
team@paperdigest.org

TABLE 1: Paper Digest: ICASSP 2026 Papers & Highlights

#	Paper	Author(s)
1	Quality Enhancement for Anomaly Detection Via Injective Linear Attention Highlight: While deep learning methods, including CNNs and Transformers, have achieved strong performance in this area, their effectiveness degrades on compressed inputs commonly encountered in real-world scenarios due to bandwidth and storage constraints. To address this, we propose an injective linear attention-based quality enhancement framework for anomaly detection.	Z. Ma; H. R. Tohidypour; P. Nasiopoulos; V. C. M. Leung;
2	Dual-Guided Generative Frame Interpolation Highlight: In this work, we propose dual-guided generative frame interpolation (DGFI), a framework that integrates semantic guidance from vision-language models and flow guidance into a pre-trained diffusion-based image-to-video (I2V) generator.	Y. Wei; H. Amirpour; C. Timmerer;
3	BINR: Live Video Broadcasting Quality Assessment Highlight: While the Quality of Experience (QoE) has been extensively studied for Video-on-Demand (VoD) services, the QoE of live broadcast videos remains relatively underexplored. In this paper, we address this gap by proposing a novel machine learning–based model for QoE prediction in live video broadcasting scenarios.	H. Amirpour; M. Hamidi; W. Zhou; L. Atzori; C. Timmerer;
4	DVT-AD: Discriminative Vision Transformers for Scalable Unsupervised Anomaly Detection Via Simple Self-Distillation Highlight: In this paper, we introduce Discriminative Vision Transformers for Scalable Unsupervised Anomaly Detection via Simple Self-Distillation (DVT-AD), a simple yet highly effective self-distillation framework.	M. Wong; C. A. Da Costa Filho; G. Munro; O. Dukor; A. Judi; M. Lawson;
5	RCAL: Reinforced Cross-Modal Alignment for Multimodal Sentiment Analysis with Sparse Visual Frames Highlight: We propose RCAL, a vision-centric framework explicitly designed for MSA under extreme visual sparsity.	X. Song; X. Tao; J. Wu; T. T. Khoei;
6	Zero-Shot TTS with Enhanced Audio Prompts: BSC Submission for The 2026 WildSpoof Challenge TTS Track Highlight: To handle acoustic noise, we implement a multi-stage enhancement pipeline using the Sidon model, which significantly outperforms standard Demucs in signal quality.	J. Giraldo; A. Peiró-Lilja; R. Zevallos; C. España-Bonet;
7	SpeechCT-CLIP: Distilling Text-Image Knowledge to Speech for Voice-Native Multimodal CT Analysis Highlight: Yet, nearly all medical AI systems rely exclusively on written text. In this work, we address this gap by exploring the feasibility of learning visual-language representations directly from spoken radiology reports.	L. Buess;
8	Heatmap-to-SMPL Multi-View Radar Transformer for Multi-Person 3D Pose Estimation Highlight: In this paper, by retaining heatmap fidelity and simultaneously exploiting shape priors, we propose RHAMP: a Radar HeAtmap-to-SMPL Pose transformer for 3D human pose estimation.	S. Kato; P. P. Wang; T. Fujihashi; A. Markham;
9	DA-VLM: Data Factory with Minimal Effort Using VLMs Highlight: However, it often requires costly training or compromises performance. We address these limitations by proposing a novel automated pipeline that combines pre-trained ControlNet and Vision-Language Models to generate pixel-level labelled realistic images without additional training or manual annotations.	J. Ye; J.-X. Zhong; Q. Xie; Y. Zhou; N. Trigoni; A. Markham;
10	Identifying Birdsong Syllables Without Labelled Data Highlight: In this work, we build the first fully unsupervised algorithm to decompose birdsong recordings into sequences of syllables.	M. Teng; J. Boussard; D. Rolnick; H. Larochelle;
11	SSUN: Symmetric Cross-Stage State Interaction Deep Unrolling Network for Hyperspectral and Multispectral Image Fusion Highlight: Although recent approaches have incorporated the deep unrolling network (DUN) to enable more explainable reconstruction, their performance remains constrained by weak cross-stage state dependencies between iterative steps. To handle this limitation, this paper proposes a symmetric cross-stage state interaction deep unrolling network (SSUN) for HS-MS image fusion, with a focus on enhancing long-range dependencies across successive stages.	X. Shen;
12	WiRAG: Retrieval-Augmented Generation with Large Language Models (LLM) Framework for WiFi-Based Human Activity Recognition Highlight: In this paper, we propose WiRAG, a retrieval-augmented generation (RAG) with large language model (LLM) framework for WiFi-based human activity recognition (HAR).	X. Shen;
13	An AMP-Based Asymptotic Analysis for Nonlinear One-Bit Precoding Highlight: The considered scheme employs a convex-relaxation-then-quantization (CRQ) approach to the well-known minimum mean square error (MMSE) model, which includes the classical one-bit precoder SQUID as a special case. To analyze its asymptotic behavior, we develop a novel analytical framework based on approximate message passing (AMP).	Z. Wu; J. Ma; Y.-F. Liu; B. Clerckx;
14	Channel Estimation for Holographic MIMO Systems with Mutual Coupling Awareness Highlight: In contrast to traditional MIMO systems, the Toeplitz structure of the MC matrix does not persist in HMIMO systems due to the more intricate impedance characteristics, leading to a substantial increase in the number of parameters to be estimated. To address this issue, we propose an approximate unitary diagonalization method for the MC matrix based on plane wave decomposition.	A. Tang; S. Song; C.-Y. Tsui; R. C. de Lamare; M. Debbah;
15	SWAN: Boosting Image Super-Resolution with Stochastic Wavelet Attention Highlight: We propose a Stochastic Wavelet Attention (SWA) mechanism that efficiently models global-local dependencies in both spatial and frequency domains.	S. Xiong;
16	MMFast: Rethinking Vision-Language Interaction in Efficient MLLMs Highlight: This work investigates the fusion dynamics within auto-regressive MLLMs and reveals that critical fine-grained interactions occur predominantly in intermediate layers, while early and late layers exhibit significant redundancy. Motivated by these insights, we propose MMFast, a novel MLLM architecture that achieves a superior trade-off between efficiency and performance.	S. Xiong;
17	SLAP: Scalable Language-Audio Pretraining with Variable-Duration Audio and Multi-Objective Training Highlight: Third, the standard contrastive training objective operates on global representations, which may hinder the learning of dense, fine-grained audio features. To address these challenges, we introduce Scalable Language-Audio Pretraining (SLAP), which scales language-audio pretraining to 109 million audio-text pairs with variable audio durations and incorporates multiple training objectives.	X. Mei;
18	Multi Stage Training with Dynamic Data Balancing for Multilingual Speech Recognition and Translation Highlight: Training large-scale multilingual speech models is often hindered by severe data imbalances across tasks, languages, and corpora. We introduce a systematic, multi-stage training framework to address this challenge.	N. Koluguri; M. Sekoyan; N. Tadevosyan; N. Karpov; J. Balam; B. Ginsburg;
19	PPDD: A Unified Push–Pull Adversarial Objective in Feature and Logit Spaces for Dataset Distillation Highlight: We propose PPDD, a unified Push–Pull objective that aggregates gradients into a single update: a Push term that maximizes reverse KL in logit space to mine low-density, high-uncertainty boundary regions, and two Pull terms that anchor fidelity via feature space MSE and semantic calibration.	H. Huang; Y. Zhang; J. Song; W. Zhao; P. Ren;
20	FastEagle: Cascaded Drafting for Accelerating Speculative Decoding Highlight: We present FastEagle, a non-autoregressive cascaded drafter that emits an entire draft in a single forward pass.	H. Huang; J. Song; W. Zhao; P. Ren;
21	FinUA: Generating Diverse User Interactions for Financial Dialogue Systems Through User Simulation Highlight: However, existing simulation methods are unsuitable for this domain, as they generate data that is homogenized, factually inaccurate, and stylistically monotonous. To address these shortcomings, we propose the Financial User Agent (FinUA), a novel simulator that incorporates two key mechanisms: a dialogue goal divergence mechanism to generate diverse and factually grounded goals, and a profile augmentation method to imbue simulated users with authentic linguistic habits and irrational behaviors.	S. Dou;
22	DSVM-UNET: Enhancing VM-UNET With Dual Self-Distillation For Medical Image Segmentation Highlight: In this paper, we propose a simple yet effective approach to improve the model by Dual Self-distillation for VM-UNet (DSVM-UNet) without any complex architectural designs.	R. Shao;
23	SA-SSL-MOS: Self-Supervised Learning MOS Prediction with Spectral Augmentation for Generalized Multi-Rate Speech Assessment Highlight: While self-supervised learning (SSL) models have been widely adopted in SQA to boost performance, a key limitation is that they are pre-trained on 16 kHz speech and therefore discard high-frequency information present in higher sampling rates. To address this issue, we propose a spectrogram-augmented SSL method that incorporates high-frequency features (up to 48 kHz sampling rate) through a parallel-branch architecture.	F. Cao;
24	PhysiGen: Integrating Collision-Aware Physical Constraints for High-Fidelity Human-Human Interaction Generation Highlight: In this paper, we propose a general-purpose and computationally efficient optimization strategy named PhysiGen to explicitly integrate collision-aware physical constraints for human-human interaction generation.	N. Lei;
25	HADEN: Hierarchical Attentive Alignment and Dual-Contrastive Enhancement Network for Multimodal Few-Shot Relation Extraction Highlight: Multimodal few-shot relation extraction (MM-FSRE) requires fusing textual and visual information to identify new relations under low-resource scenarios, but existing methods suffer from inadequate modal alignment and heavy data reliance. To address this, we propose the HADEN framework, which employs a CrossModal-Hierarchical Attention (CHA) module for dynamic alignment of multi-layer semantics and Dual-Perspective Contrastive Learning (DPCL) to enhance feature clustering.	Z. Ni; H. Li; Y. Sun;
26	DyWPE: Signal-Aware Dynamic Wavelet Positional Encoding for Time Series Transformers Highlight: We introduce Dynamic Wavelet Positional Encoding (DyWPE), a novel signal-aware framework that generates positional embeddings directly from input time series using the Discrete Wavelet Transform (DWT).	H. Irani; V. Metsis;
27	Signal-Driven Joint Safety-Comfort Objective for Real-Time Trajectory Replanning on Rutted Roads Highlight: We present a signal-driven joint objective for real-time trajectory replanning on rutted roads.	X. Shen; K. Li; H. Hu; Z. Zhang; N. Tang;
28	Dual Prototype Learning and Multi-Stream Perturbation for Robust Semi-Supervised Medical Image Segmentation Highlight: Despite significant progress in this field, current methods still face challenges such as insufficient feature space constraints and pseudo-label noise, which limit further improvement of model performance. To address these limitations, we propose a robust semi-supervised medical image segmentation method via dual prototype learning and multi-stream perturbation.	G. Du; J. Xu; R. Wu; X. Zeng; S. Xiong;
29	Staged Diffusion with Hybrid Mixture-of-Experts (MOE) for Multimodal Sentiment Analysis Highlight: Humans resolve such ambiguity through a learning-before-correction strategy: first aligning facial expressions, vocal tone, and speech, then using this knowledge to infer or correct meanings. To mimic this process, we propose SDHM (Staged Diffusion with Hybrid Mixture-of-Experts), a two-stage framework.	K. Zheng; G. Sheng;
30	Multi-Band Frequency Prompt Tuning for Source-Free Cross-Domain Few-Shot Learning Highlight: Source-Free Cross-Domain Few-Shot Learning (SF-CDFSL) suffers from the absence of source-domain data and the scarcity of target-domain samples, which makes it challenging to transfer domain knowledge and to learn discriminative representations for novel classes in the target domain. To address these issues, we propose Multi-Band Frequency Prompt Tuning (MB-FPT), a prompt-based framework that simultaneously aligns domain information and enhances class discrimination.	R. Wu; S. Xiong;
31	AURA: YCbCr-Based Universal Raw-Reconstruction for Inverse ISP Highlight: Existing methods struggle to balance robustness and generalization. To address this, we propose AURA, a universal RAW reconstruction architecture that requires no camera metadata.	H. Cheng;
32	NMGE: Nested Multi-Granularity Expert Groups for Complexity-Aware Routing in Multilingual Translation Highlight: In this work, we propose Nested Multi-Granularity Expert Groups (NMGE), a novel MoE architecture where experts are organized into groups of varying sizes in a nested structure.	L. Shao;
33	GLAP: General Contrastive Audio-Text Pretraining Across Domains and Languages Highlight: Current CLAP methods enable sound and music retrieval in English, ignoring multilingual spoken content. To address this, we introduce general language audio pretraining (GLAP), which expands CLAP with multilingual and multi-domain abilities.	H. Dinkel;
34	FocalCodec-Stream: Streaming Low-Bitrate Speech Coding Via Causal Distillation Highlight: We present FocalCodec-Stream, a hybrid codec based on focal modulation that compresses speech into a single binary codebook at 0.55–0.80 kbps with a theoretical latency of 80 ms. Our approach combines multi-stage causal distillation of WavLM with targeted architectural improvements, including a lightweight refiner module that enhances quality under latency constraints.	L. D. Libera; C. Subakan; M. Ravanelli;
35	Generalizable Detection of Audio Deepfakes Highlight: We present our comprehensive study aimed at enhancing the generalization capabilities of audio deepfake detection models.	J. A. Lopez; G. Stemmer; H. C. Maruri;
36	Vision Meets Language: Adaptive Joint Pruning for Efficient Multimodal Models Highlight: Existing pruning methods alleviate redundancy but remain limited: attention-based strategies may discard task-critical regions, while text-guided approaches risk overlooking implicitly important information. To address this, we propose the first visual-text joint pruning framework, which integrates visual attention distributions with text-aware signals to more reliably identify and remove redundant tokens.	G. Wu;
37	Channel Prediction Under Network Distribution Shift Using Continual Learning-Based Loss Regularization Highlight: This work addresses catastrophic forgetting in channel prediction by proposing a continual learning framework based on loss regularization.	M. A. Mohsin;
38	Conditional Prior-Based Non-Stationary Channel Estimation Using Accelerated Diffusion Models Highlight: This work proposes conditional prior-based diffusion for channel estimation, which learns a history-conditioned score to denoise noisy channel snapshots.	M. A. Mohsin;
39	Compositional Image Synthesis with Inference-Time Scaling Highlight: Despite their impressive realism, modern text-to-image models still struggle with compositionality, often failing to render accurate object counts, attributes, and spatial relations. To address this challenge, we present a training-free framework that combines an object-centric approach with self-refinement to improve layout faithfulness while preserving aesthetic quality.	M. Ji; S. Lee; N. Ahn;
40	Multi-View Spectral Clustering with Adaptive Regression Highlight: Inspired by the prior that the ground truth cluster assignment matrix of high-dimensional data can always be embedded in the linear space of the data, we propose the Multi-view spectral Clustering with Adaptive Regression (MCAR) framework.	Q. Qiang; B. Zhang; Y. Hua;
41	Maximum Entropy-Based Efficient Fuzzy Graph Clustering Highlight: Graph Clustering faces critical challenges in balancing computational efficiency and uncertainty quantification. To address these issues, we propose a novel fuzzy graph clustering framework, termed Maximum Entropy-Based Efficient Fuzzy Graph Clustering (MEFC), which establishes an explicit connection between graph clustering and fuzzy clustering under an anchor graph setting.	Q. Qiang; B. Zhang; Y. Hua;
42	HILO: Hierarchical Feature Fusion Via Local-Global Attention for Multimodal Embeddings Highlight: However, due to their inherent linguistic bias, visual information is often underrepresented during cross-modal fusion, which limits their overall multimodal representation capability. To mitigate this issue, we propose HILO, a novel vision-language architecture specifically designed for multimodal embeddings.	X. Zuo;
43	FEDPROTOALIGN: Federated Prototype Alignment Under Identity Inconsistency for Gait Recognition Highlight: Beyond non-IID shifts from cross-camera viewpoints, federated gait recognition uniquely faces identity-space inconsistency, where the same person receives inconsistent labels across clients (e.g., cameras), degrading discriminative representations. To address these issues, we propose FedProtoAlign (FPA), a federated framework for unsupervised, identity-aware representation learning under disjoint identity spaces.	C. Lin;
44	ECSA: Dual-Branch Emotion Compensation for Emotion-Consistent Speaker Anonymization Highlight: However, these methods often degrade the emotional information in speech, limiting their reliability in emotion-sensitive scenarios. To mitigate this issue, we propose an emotion-preserving speaker anonymization framework.	C. Lin;
45	Domain-Adversarial EAT With LoRA Fine-Tuning For ESDD 2026 Challenge Highlight: Aiming at the generalization issue of detection models for unseen synthetic audio, we propose a solution combining LoRA fine-tuning, domain adversarial training, MoE (Mixture of Experts), and ArcFace loss.	F. Wei;
46	MMRQA: Signal-Enhanced Multimodal Large Language Models for MRI Quality Assessment Highlight: Traditional approaches face fundamental trade-offs: signal-based methods like MRIQC provide quantitative metrics but lack semantic understanding, while deep learning approaches achieve high accuracy but sacrifice interpretability. To address these limitations, we introduce the Multimodal MRI Quality Assessment (MMRQA) framework, pioneering the integration of multimodal large language models (MLLMs) with acquisition-aware signal processing.	F. Jia;
47	A State-Dependent Markov Diffusion Process for Generative Speech Enhancement Highlight: We propose a State-Dependent Markov Diffusion Process (SDMDP) with an adaptive transition rate that responds to the characteristics of input noise, thereby improving convergence and performance.	Y. Iqbal; T. Zhang; A. Iqbal; X. Zhao; Y. Geng;
48	Robust CPD-Based DOA Estimation for Rotating Distributed Array Systems Under Inter-Node Calibration Error Highlight: Meanwhile, the rotating array configuration exacerbates the coupling of directional vectors, making it challenging for traditional matrix-based methods to achieve effective decoupling. To address these issues, we propose a robust canonical polyadic decomposition (CPD)-based DOA estimation algorithm that constructs tensor modeling for RDAS.	Z. Xu; C. Zhou; Z. Shi;
49	Dual-Guided Multi-Granularity Implicit Alignment Network for Medical Visual Question Answering Highlight: However, existing methods often neglect the challenge of multi-modal alignment due to the data heterogeneity. To address this issue, we propose a dual-guided multi-granularity implicit alignment network (Med-MGIA) that establishes cross-modal correlations without bounding box annotations.	Q. Teng; J. Chen; D. Yuan; Y. Liu; Z. Liu;
50	Flowiid: Single-Step Intrinsic Image Decomposition Via Latent Flow Matching Highlight: This makes them costly to combine with other models in real-world settings. To address this problem, we propose a flow matching-based solution.	M. Singla; S. Kumari; S. Raman;
51	Fed-MET: Memory-Efficient Elastic Training in Federated Learning Highlight: In this work, we propose Fed-MET (Federated Memory-Efficient Elastic Training), a new FL framework that enables elastic training across multiple memory-constrained devices by freely choosing trainable NN modules.	C. Miao; T. Chang; M. Wu; Y. Zha; J. Peng; X. Wang;
52	Contextual Biasing for ASR in Speech LLM with Common Word Cues and Bias Word Position Prediction Highlight: In this paper, we explore contextual biasing in SLLM based on acoustic cues associated with a set of common words whose pronunciations are partially similar to those of the target bias words.	S. Novitasari; T. Fukuda; G. Kurata; G. Saon;
53	Deformable Attention Graph Representation Learning for Histopathology Whole Slide Image Analysis Highlight: In this work, we propose a novel GNN framework with deformable attention for pathology image analysis.	M. Fu;
54	MFF-RVRDI: Multimodal Fusion Framework for Robust Video Recording Device Identification Highlight: We propose MFF-RVRDI, a multimodal framework that fuses video and audio features for robust device identification.	W. Li; Y. Cao; X. Shen;
55	Hashing-Baseline: Rethinking Hashing in The Age of Pretrained Models Highlight: In this work, we introduce Hashing-Baseline, a strong training-free hashing method leveraging powerful pre-trained encoders that produce rich embeddings.	I. Moummad; K. Zaher; L. Rauch; A. Joly;
56	Towards Building Speech Large Language Models for Multitask Understanding in Low-Resource Languages Highlight: This limitation arises from three factors: (1) existing commonly used speech encoders, like the Whisper family, underperform in low-resource languages and lack support for broader spoken language understanding tasks; (2) the ASR-based alignment paradigm requires training the entire SLLM, leading to high computational cost; (3) paired speech–text data in low-resource languages is scarce. To overcome these challenges in the low-resource language Thai, we introduce XLSR-Thai, the first self-supervised learning (SSL) speech encoder for Thai.	M. Shao;
57	FDCNet: Frequency Domain Channel Attention and Convolution for Lipreading Highlight: In lipreading, conventional frontend frameworks primarily extract features in the spatial domain, which limits their ability to process mixed-frequency visual signals containing both low-frequency macroscopic lip shapes and high-frequency details, leading to insufficient extraction of critical information. To address this challenge, we propose a frequency-domain collaborative network, FDCNet.	Q. Yan; Q. Zhang; L. Zhang; L. Yu; L. Sheng;
58	Universal Denoising Patterns for Diffusion Image Detection Highlight: Considering that the fake images closely resemble real ones, we propose a feature separation loss to enhance the detector’s discrimination capacity.	Y. Qian; Q. Cai; Y. Pan; T. Yao; Y. Chen; T. Mei;
59	Revisiting The Connection Between MCCA-Genvar and IVA-G: Role of Orthogonality and Deflation Highlight: We revisit this connection and demonstrate that the main difference between these methods is in fact not orthogonality but deflation, which is inherent to most mCCA objective functions, including genvar. To show this, we introduce orthogonal IVA-G (o-IVA-G) and deflationary orthogonal IVA-G (d-o-IVA-G) and compare them with IVA-G and mCCA-genvar in simulations inspired by the functional Magnetic Resonance Imaging (fMRI) subgroup identification problem.	I. Lehmann; B. Gabrielson; T. Adali;
60	Efficient Segment Anything with Depth-Aware Fusion and Limited Training Data Highlight: We propose a lightweight RGB-D fusion framework that augments EfficientViT-SAM with monocular depth priors.	Y. Zhou; X. Xie; P. Li; A. Kunz; A. Osman; X. Maldague;
61	Closed-Form Ziv-Zakai Bound for Compressive Time Delay Estimation Highlight: Compressed sensing (CS) techniques have been applied to time delay estimation problems, allowing the utilization of wider band signals for improved performance.	S. Wen; Z. Zhang; C. Zhou; Z. Shi;
62	S-PHiNe: Physics-Informed Multichannel Speech Enhancement Using Spectro-Spatial Fusion for Low-SNR Conditions Highlight: This study proposes S-PHiNe, a neural physics-informed framework integrating deep graph convolutional network (DGCN)-based spatial embedding for steering vector estimation, a complex beamforming neural network (CBNN) for spectral mask estimation, and physics-informed MVDR beamforming within end-to-end physics-constrained training.	S. Afrifa; T. Zhang; P. Appiahene; V. Varadarajan; Y. Geng;
63	KPMG: A Graphical Koopman-Mamba Approach for Financial Markets Highlight: Moreover, current models suffer from disturbances and impurities embedded in the financial data. To address these challenges, we propose KPMG, an efficient architecture that integrates the strengths of Mamba and Graph Neural Networks.	S. Xiong; C. Tang; F. Okubo; T. Minematsu; Y. Hu; A. Shimada;
64	GlucoMixer: An Efficient Glucose Monitoring Model with Mixers Highlight: To strike a balance between accuracy and trustworthiness, we propose GlucoMixer, an Encoder-only architecture built predominantly with Mixer modules.	S. Xiong; J. Wang; T. Sun; C. Tang; F. Okubo; A. Shimada;
65	Interval-Aware Retrieval Framework For Speech-Based Automatic Alzheimer’s Detection Highlight: Existing systems typically insert symbolic pauses or attach acoustic features, followed by simple fusion, which weakens token-level alignment and lacks a normative reference for healthy timing. To address these issues, this paper proposes an interval-aware retrieval framework that explicitly incorporates temporal knowledge into speech-based AD detection.	M. Gu;
66	LVD-GS: Gaussian Splatting SLAM for Dynamic Scenes Via Hierarchical Explicit-Implicit Representation Collaboration Rendering Highlight: However, existing methods often rely on a single representation scheme, which limits their performance in large-scale dynamic outdoor scenes and leads to cumulative pose errors and scale ambiguity. To address these challenges, we propose LVD-GS, a novel LiDAR-Visual 3D Gaussian Splatting SLAM system.	W. Zhu;
67	Learning Reference-Guided Exposure Correction With Hybrid Illumination Characteristics Highlight: We present HICNet, a reference-guided exposure correction framework.	H. Ren; Z. Bi; Z. Wan; H. Cheng;
68	Recovering Performance in Speech Emotion Recognition from Discrete Tokens Via Multi-Layer Fusion and Paralinguistic Feature Integration Highlight: This paper presents a comprehensive investigation of discrete tokens for SER.	E. Sun; A. R. Naini; C. Busso;
69	Denoising Diffusion Model for DOA Estimation Highlight: The proposed model is trained on signals with a fixed number of sources, yet can generalize to scenarios with a variable number of sources.	F. Qian; C. Zhou; Z. Shi;
70	Diffusion-Link: Diffusion Probabilistic Model for Bridging The Audio-Text Modality Gap Highlight: We present Diffusion-Link, a diffusion-based modality-bridging module that generatively maps audio embeddings into the text-embedding distribution.	K. Nam; J. Choi; H. Lee; J. Heo; J. S. Chung;
71	ReTools: Reflection-Enhanced Tool Invocation for Domain-Specific QA Highlight: Existing approaches such as ReAct and RestGPT partially mitigate these issues yet remain limited in handling multi-step reliability, iterative recovery, and domain robustness. To address these gaps, we propose ReTools, a Tree-of-Thoughts (ToT) based framework that integrates three modules: (1) task planning, which decomposes complex queries into executable subtasks; (2) tool planning, which selects tools and generates accurate parameters while supporting reflective correction; and (3) reflective iteration, which monitors execution results and adapts to domain-specific requirements.	F. Dong;
72	A Game-Theoretic Approach for Distributed MEC-Enabled Collaborative Inference in AIGC Networks Highlight: Specifically, we explore a distributed MEC-enabled AIGC network with heterogeneous GDMs, where multiple MUs each hold a local GDM and an ES hosts multiple GDMs.	L. Ye; Z. Xiong; L. Gao; D. Niyato;
73	IODRESEARCH: Deep Research on Private Heterogeneous Data Via The Internet of Data Highlight: To this end, we propose IoDResearch (Internet of Data Research), a private data-centric Deep Research framework that operationalizes the Internet of Data paradigm.	Z. Shi; Z. Guo; X. Ma; G. Huang; Y. Ma; X. Jing;
74	Generative Spatiotemporal Modeling for Uncertainty Quantification in High-Dimensional Physical Systems Highlight: Traditional deterministic models fail to capture this, collapsing to blurry, physically implausible mean-state predictions and offering no measure of confidence. We introduce Prism, a generative spatiotemporal framework that directly addresses this by learning the probability distribution of future states.	F. Liu;
75	Multimodal Multi-Agent Empowered Legal Judgment Prediction Highlight: Traditional methods often rely on statistical analyses or role-based simulations but face challenges with multiple allegations, diverse evidence, and lack adaptability. In this paper, we introduce JurisMMA, a novel framework for LJP that effectively decomposes trial tasks, standardizes processes, and organizes them into distinct stages.	Z. Kang;
76	Ister: Linear Transformer for Efficient Multivariate Time Series Forecasting Highlight: However, their widespread adoption is hindered by the quadratic computational complexity of self-attention, which limits scalability on high-dimensional sequences. To address this challenge, we propose the Inverted Seasonal-Trend Decomposition Transformer (Ister), a novel architecture that enhances both predictive accuracy and computational efficiency.	F. Cao; S. Yang; Z. Chen; Y. Liu; L. Cui;
77	SS-JDSC: Single-Speaker Japanese Dysarthric Speech Corpus Highlight: Existing dysarthric-speech corpora have facilitated research progress, but suffer from two major limitations: restricted language coverage and limited data per speaker. In this paper, the SS-JDSC, the first open-source corpus of Japanese dysarthric speech for automatic speech recognition (ASR), is presented to address these challenges.	A. Ogasawara; S. Takamichi; J. Yang; G. Suenaga; Y. Tan;
78	BSMP-SENet: Band-Split Magnitude-Phase Network for Speech Enhancement Highlight: In this work, we propose a frequency-aware magnitude–phase speech enhancement framework that incorporates learnable subband decomposition, multi-scale temporal modeling, and adaptive cross-band integration within a compact backbone.	X. Ju;
79	LAMB: LLM-Based Audio Captioning with Modality Gap Bridging Via Cauchy-Schwarz Divergence Highlight: However, prior approaches that project audio features into the LLM embedding space without considering cross-modal alignment fail to fully utilize these capabilities. To address this, we propose LAMB, an LLM-based audio captioning framework that bridges the modality gap between audio embeddings and the LLM text embedding space.	H. Lee; J. Choi; K. Nam; J. S. Chung;
80	SPADE: Structured Pruning and Adaptive Distillation for Efficient LLM-TTS Highlight: The goal of this paper is to introduce SPADE, a framework for Structured Pruning and Adaptive Distillation for Efficient Large Language Model-based text-to-speech (LLM-TTS).	T. D. Nguyen; J. Kim; J. -H. Kim; S. Choi; Y. Lim; J. S. Chung;
81	Securing INR-Based Steganography with Quantum Circuit-Driven Weight Initialization Highlight: However, existing methods rely on low-entropy seeds to locate hidden information, leaving them vulnerable to brute-force attacks. To address this, we propose a novel parameterized quantum circuit-based initialization scheme.	Q. Song; H. Han; Z. Luo; J. Qi; R. Wan;
82	Training Quantized Spiking Neural Networks with Low-Bit Gradients Highlight: Through analysis, we reveal the gradient channel heterogeneity issue that causes severe quantization error. To mitigate this, we propose dual-path gradient quantization (DP-GQ), applying distinct gradient quantization strategies to the synaptic weight pathway and input spike pathway.	X. Deng;
83	Predicting Emotions in Dialogue Responses By Modeling Implicit Factors Highlight: In this paper, we propose a novel framework that integrates personality traits and initial emotion, and also infers user emotion, scene, and topic from the dialogue context as intermediate factors for emotion prediction. This multi-factor approach enables more accurate and context-aware emotional response generation.	Q. Dai;
84	MeanVC: Lightweight and Streaming Zero-Shot Voice Conversion Via Mean Flows Highlight: In this study, we propose MeanVC, a lightweight and streaming zero-shot VC approach.	G. Ma;
85	The ICASSP 2026 Automatic Song Aesthetics Evaluation Challenge Highlight: This paper summarizes the ICASSP 2026 Automatic Song Aesthetics Evaluation (ASAE) Challenge, which focuses on predicting the subjective aesthetic scores of AI-generated songs.	G. Ma;
86	Learning Explicitly Conditioned Sparsifying Transforms Highlight: Unlike existing approaches from the literature, our paper presents a new sparsifying transform model that explicitly controls both the data representation quality and the condition number of the learned transforms.	A. Pătrașcu; C. Rusu; P. Irofti;
87	CosAge: Federated Learning with Gradient Summaries for Centralized Client Selection Highlight: Within this framework, we propose COSAGE, a hybrid centralized policy that combines Age of Information (AoI) with gradient dissimilarity computed from a proxy update via the cos4 metric.	H. Asgari; S. Rini; A. Munari;
88	Parameter Adaptation in Hidden Markov Models with Equal Exit Probabilities Highlight: We propose Parameter-Adaptive α-HMM, which jointly infers α and the hidden state.	H. Pu; D. Sui; S. Vlaski; S. Leng;
89	DISCERN: Discrepancy Learning for Weakly Supervised Medical Image Segmentation Highlight: However, their performance often degrades in medical imaging due to insufficient consideration of medical characteristics, such as distributional discrepancies, ambiguous boundaries, and structural interference. To address these issues, we propose an innovative discrepancy learning model, DISCERN, which harnesses distribution discrepancies to enhance the localization of medical regions of interest.	G. Su;
90	DARL-CLIP: Density-Adaptive and Reinforcement Fine-Tuning CLIP for Cross-Scenario UAV Object Detection Highlight: This paper proposes DaRL-CLIP, a density-adaptive and reinforcement fine-tuning CLIP agent, to enable robust cross-condition generalization of UAV object detection under imbalanced scenario distributions and limited scene diversity.	C. Guo;
91	WaterFlow: Explicit Physics-Prior Rectified Flow for Underwater Saliency Mask Generation Highlight: We propose WaterFlow, a rectified flow-based framework for underwater salient object detection that innovatively incorporates underwater physical imaging information as explicit priors directly into the network training process and introduces temporal dimension modeling, significantly enhancing the model’s capability for salient object identification.	R. Li; S. Lian; H. Li; Y. Li; W. Wu; S. Kwong;
92	Bridging SAR and Optical Domains: Synergizing Brownian Bridge Diffusion and Local Contrastive Learning for Image Translation Highlight: Yet, this task remains challenging due to intrinsic sensor limitations (e.g., speckle noise, complex scattering) and algorithmic constraints (e.g., GAN instability, structural degradation of the diffusion models). To address the above challenges, a novel approach named LCCBBDM (Local Contrastive Conditional Brownian-Bridge Diffusion Model) is proposed in this paper, which synergizes the conditional Brownian-bridge diffusion model with local contrastive learning.	Z. Dai; C. Huo; Z. Ren;
93	A Conversational Entity Linking Method Based on Sentence Level and Token Level Dual Evaluation Abstract: With the development of intelligent assistants, dialogue systems have become increasingly important. Understanding user utterances is crucial for promoting human-machine …	H. Cheng; S. Li; H. Zhang; M. Fang; S. Liu;
94	RADI: A Retrieval-Augmented Dynamic In-Context Learning Framework for AIGC Image Detection Highlight: While the sophisticated semantic reasoning capabilities of Multimodal Large Language Models (MLLMs) make them theoretically well-positioned for this challenge, their practical application is hampered by performance instability in zero-shot and few-shot contexts. To address this limitation, we propose RADI, a training-free Retrieval-Augmented Dynamic In-Context Detecting Framework.	T. Bi; R. Ma; Y. Huang; Y. Wang; J. Liu; S. Zhang;
95	Mitigating Data Replication in Text-to-Audio Generative Diffusion Models Through Anti-Memorization Guidance Highlight: A persistent challenge in generative audio models is data replication, where the model unintentionally generates parts of its training data during inference. In this work, we address this issue in text-to-audio diffusion models by exploring the use of anti-memorization strategies.	F. Messina; F. Ronchini; L. Comanducci; P. Bestagini; F. Antonacci;
96	Image-Pixel Realignment for Open-Vocabulary Semantic Segmentation Via Self-Training Highlight: We introduce STSeg, a novel framework that integrates pixel-level semantic alignment with adaptive self-training.	A. Yang; Q. Liu; Y. Fan; Q. Zhou;
97	PE-Sleuth: Program-Level Semantics and Static Feature Fusion for Interpretable Ransomware Detection with LLMs Highlight: Existing PE-based static detection methods suffer from limited feature coverage, weak generalization, and poor interpretability. To address these challenges, we propose PE-Sleuth, a framework that fuses program-level semantics with static features, while leveraging large language models (LLMs) for classification and rationale generation.	H. Dai;
98	Federated Camouflaged Poisoning Attack in Federated Unlearning Highlight: We propose FedCPA, a camouflaged poisoning attack that keeps models benign during federated training and activates only after FU removes the camouflage carrier.	W. Lai; Q. Yan; S. Liang; K. Zhong;
99	DFMAD: Data-Free Backdoor Defense for Federated Learning Via Multi-Teacher Adversarial Distillation Highlight: We propose Data-Free Backdoor Defense for FL via Multi-Teacher Adversarial Distillation (DFMAD), which requires no real data.	K. Zhong; Q. Yan; W. Lai;
100	Decoding Neural Mechanisms of Emotional Processing in Tinnitus: ERP and Gamma-Band EEG Analysis Highlight: We investigated neural responses to negative emotion using EEG in 40 CST patients and 31 healthy controls (HC).	J. Xia;
101	DebateCTI: Enhancing ATT&CK Technique Identification in CTI Reports Via A Role-Specialized Multi-Agent Debate Highlight: Accurately automating the analysis of Cyber Threat Intelligence (CTI) reports to identify MITRE ATT&CK techniques remains a critical challenge due to the labor-intensive nature of manual mapping and the limitations of existing NLP and LLM-based methods, which often suffer from hallucinations, incoherent reasoning, and knowledge isolation. To address these challenges, we propose DebateCTI, a novel framework that integrates a multi-agent debate mechanism with parameter-efficient fine-tuning.	J. Xia;
102	Transfer Learning for Paediatric Sleep Apnoea Detection Using Physiology-Guided Acoustic Models Highlight: This paper proposes a transfer learning framework that adapts acoustic models pretrained on adult sleep data to paediatric OSA detection, incorporating SpO2-based desaturation patterns to enhance model training.	C. Niu;
103	In-Sync: Adaptation of Speech Aware Large Language Models for ASR with Word Level Timestamp Predictions Highlight: In this work, we extend an existing speech-aware language model to predict timestamps directly alongside transcripts.	X. Fan; V. Sunder; S. Thomas; M. Hasegawa-Johnson; B. Kingsbury; G. Saon;
104	Towards 2D Texture Binding Via Personalized Text-to-Image Generation Based on Texture-Object Decoupling Highlight: While fine-tuning diffusion models can embed specific textures into the modifier of text conditions, existing methods struggle on unseen objects. To overcome this, we propose Texture-Object Decoupling (TOD), which incorporates a multi-view texture rendering module to learn explicit object-texture mappings.	B. Xiao;
105	Anchor Field Consistency for Imperceptible Adversarial Attacks on 3D Point Clouds Highlight: We attribute this to nearest-neighbor correspondence sensitivity when comparing clean and adversarial shapes. To address this, we propose Anchor Field Consistency (AFC), which evaluates the clean and adversarial shapes at the same anchors.	K. Tang; Z. Cao; W. Peng; X. Wang; P. Zhu; Z. Tian;
106	Surgical-CLIP: A Dual-Branch Temporal CLIP for Surgical Video Recognition Highlight: Existing methods mainly rely on a single temporal scale, which fails to capture both aspects. To overcome this limitation, we present Surgical-CLIP, a dual-branch extension of CLIP that learns complementary long- and short-horizon representations from surgical video.	M. He; M. Zhang; W. Yuan;
107	Phase Optimization Driven Waveform Design with Good Correlation and Information Embedding Performances for Joint Radar-Communications Highlight: Then, we introduce the metric of directional angular distance to quantify the phase offset between interval centers and waveform phases to be optimized. Building on this, we subsequently propose an ISL-versus-phase minimization-based optimization framework, incorporating a series of phase mapping and manipulation constraints.	T. Peng; Y. Li; R. Tao;
108	SchröMind: Mitigating Hallucinations in Multimodal Large Language Models Via Solving The Schrödinger Bridge Problem Highlight: Minor perturbations can shift attention from truthful to untruthful states, and the autoregressive nature of text generation often prevents error correction. To address this, we propose SchröMind, a novel framework reducing hallucinations via solving the Schrödinger bridge problem.	Z. Shi; R. Liu; S. Yu; S. Munakata; K. Shirahata;
109	Semantic-Guided Modal Alignment for Multimodal Cardiovascular Disease Detection Highlight: Multimodal data can offer complementary insights from different perspectives, thereby enhancing detection accuracy.	G. Zhang; D. Liu; Y. Lu; H. Sun; B. Lin; Z. Shi;
110	RealCount: Robust Open-World Object Counting Via Duplex Contrastive Learning Highlight: We propose RealCount, a multimodal vision-language framework incorporating dual-stream prompt/image adapters and duplex query/input contrastive learning.	Z. Shi; R. Liu; J. Takahashi; S. Jiang;
111	Navigating Modality Uncertainty: Modality-Interaction Enhanced Mixture-of-Experts for Multi-Modal Knowledge Graph Completion Highlight: While existing methods design various multi-modal fusion mechanisms, they largely overlook the inherent disparities in modality quality, as entity modalities across triples differ in informativeness, uncertainty, and noise, and they fail to address sample-specific modality uncertainty, ultimately resulting in suboptimal performance. To address this limitation, we propose MIMoE, a Modality-Interaction Enhanced Mixture-of-Experts framework with an uncertainty-aware router that adaptively integrates heterogeneous modalities.	H. Shen;
112	On The Foundational Condition for Non-Contact Vibration Measurement Using Phase-Based Microwave Interferometry Highlight: The mathematical foundation of phase unwrapping is first-order or higher-order finite difference. Based on this finding, we derive the amplitude alias-free criterion, which involves sampling rate, carrier wavelength, and vibration behavior.	J. Cao; Z. Yang; A. K. Nandi;
113	Multi-Polynomial Phase Signal Parameter Estimation Using Time-Frequency Decomposition and Time-Series Representations Highlight: In this paper, we propose a phase unwrapping-based parameter estimation method to deal with multi-component PPSs with unknown polynomial degrees.	J. Cao; Z. Yang; A. K. Nandi;
114	FEDCADS: Robust Federated Learning Via Dual Distillation and Participation-Aware Optimization Under Non-IID Data Highlight: However, it still faces challenges related to incomplete utilization of the global model and errors induced by partial client participation. To address these challenges, we propose a novel FL paradigm, named FedCADS, which uses a dynamic dual distillation mechanism to effectively utilize the global model to guide local model training, achieving a multi-level client drift reduction.	J. Lai; D. Li; F. Zhang; R. Wang; J. Hu; H. Cheng;
115	TEAMo: Trait and Emotion Aware Motion Generation in 3D Human Highlight: Despite recent advances, existing approaches are largely confined to simplistic categorical style descriptors, failing to capture continuous personality traits and thus compromising emotional richness and psychological realism. To bridge this gap, we propose the Trait and Emotion Aware Motion generation framework (TEAMo), a psychologically grounded approach that explicitly integrates personality traits into the motion synthesis pipeline.	B. Tang; D. Zhu; S. -G. Kuai; C. -L. Deng;
116	QCA-RAG: Efficient Retrieval for LLMs Via Query Complexity Awareness Highlight: To address this issue, we propose QCA-RAG, a Query Complexity-Aware framework that dynamically adjusts retrieval behavior based on query complexity, enabling the LLM to generate high-quality responses with reduced retrieval overhead.	Y. Zhu; L. Li; J. Liu; H. Chen; Z. Chen; L. Xi;
117	Decision FusedConv: Efficient Offline Reinforcement Learning Via Fused State-Reward Encoding and Hybrid Temporal Convolution Highlight: Decision Transformer (DT) represents return-to-go, state, and action as independent tokens, resulting in inflated sequence length and quadratic attention cost. To address this inefficiency, we propose Decision FusedConv (DFC), which jointly encodes return and state to shorten sequences and employs a gated hybrid convolutional module that integrates global uniform and local heterogeneous convolutions.	Z. Tian;
118	GraDeRAG: Black-Box Semantic Path Injection Attacks on Graph RAG Systems Highlight: Existing attacks are either graph-agnostic, failing to manipulate reasoning paths, or limited to data poisoning, unsuitable for inference-time scenarios. Motivated by these limitations, we propose GraDeRAG, a black-box evasion attack framework targeting Graph RAG.	G. Zhao;
119	Dual-Criterion Sample Selection for Noisy Labels: Integrating Neighborhood Prediction Divergence and Loss Values Highlight: We propose a novel sample selection strategy that jointly considers Neighborhood Prediction Divergence (NPD) and loss values to more reliably identify clean samples.	Q. Rong; L. Zhang; L. Yuan; G. Li;
120	LPCVAE: A Conditional VAE with Long-Term Dependency and Probabilistic Time-Frequency Fusion for Time Series Anomaly Detection Highlight: We propose a Conditional Variational AutoEncoder with Long-term dependency and Probabilistic time-frequency fusion, named LPCVAE.	H. Cheng; W. Mu; F. Liu; W. Zhu; C. Ma;
121	Closed-Loop Co-Adaptive Retinal Coding with Joint Topological-Spectral Feature Learning Highlight: We present a closed-loop, co-adaptive framework that jointly optimizes spike train generation and decoding.	C. Qin;
122	OCTIP: Compact Geography-Aware IP Embeddings for Nearest-Neighbor IP Signal Retrieval Highlight: We study Nearest-Neighbor IP Signal Retrieval, a problem that retrieves semantically related IP signals directly in the IP address space without relying on precise geographic coordinates, traffic logs, or active measurements. To address this challenge, we propose OCTIP, a compact OCTet-level IP encoder for IPv4, together with IPSPRE, a reproducible framework for geography-preserving evaluation.	H. Feng; C. Wang; F. Niu;
123	Noise-Robust Video Salient Object Detection in Spike Streams Highlight: In spike camera shooting, stray light accumulation, thermal noise, and light intensity loss make pixel-generated spike intervals inaccurately reflect original light source info, posing two video salient object detection (VSOD) challenges in spike streams: 1) Noisy spikes hinder long-term video info capture; 2) Noise makes sparse spikes struggle to capture salient object texture details. To address these issues, we propose a noise-robust VSOD model based on spiking neural networks (SNNs) with a gradual fusion strategy: the spatial-channel cross-perception module (SCPM) enhances attention to salient regions and filters spatial noise; the local deformable cross-attention module (LDCM) strengthens local feature correlations for temporal denoising; the global information-enhanced self-attention module (GISM) models global context and extracts fine-grained textures.	A. Mao; Y. Fang; J. Yan; P. L. Callet; Z. Liu;
124	UMV: A Mixture-Of-Experts Vision Transformer with Multi-Spectrogram Fusion for Underwater Ship Noise Classification Highlight: We propose the Underwater Mixture-of-Experts Vision Transformer (UMV), which integrates Short-Time Fourier Transform (STFT) spectrograms, Mel spectrograms, and MFCCs through a convolutional fusion module, and incorporates a Top-k sparse Mixture-of-Experts mechanism into the Vision Transformer encoder.	H. Zhang; G. Wu;
125	Task Vector in TTS: Toward Emotionally Expressive Dialectal Speech Synthesis Highlight: However, cross-style synthesis combining both dialect and emotion remains challenging and largely unexplored, mainly due to the scarcity of dialectal data with emotional labels. To address this, we propose Hierarchical Expressive Vector (HE-Vector), a two-stage method for Emotional Dialectal TTS.	P. Feng;
126	Beyond Global Emotion: Fine-Grained Emotional Speech Synthesis with Dynamic Word-Level Modulation Highlight: While effective for global emotion expression, these approaches fail to capture dynamic shifts within a sentence. To address this limitation, we introduce Emo-FiLM, a fine-grained emotion modeling framework for LLM-based TTS.	S. Wang; A. Chen; T. Zhao;
127	MFF-NET: Image Manipulation Localization Method Based on Multi-Scale Feature Fusion Network Highlight: This paper presents a multi-scale feature fusion network (MFF-Net) that integrates global noise features via Transformers and local artifact cues via CNNs.	S. Wang; S. Chen; Q. Wu; L. Cao; Y. Xing;
128	CausalRAP: Causal Graph-Driven Retrieval Augmented Long-Horizon Task Planning for Large Language Models Highlight: Retrieval-Augmented Generation (RAG) can provide task examples as non-parametric knowledge, but action-level semantic retrieval often brings irrelevant task trajectories since it lacks action-state causal relationships. To address this, we propose CausalRAP, a Causal graph–driven Retrieval Augmented framework for long-horizon task Planning.	M. Ye; Y. Gao; M. Liu; S. Li; N. Guan;
129	FMSP-IR: Frequency Modulation and Structure Priors for All-in-One Image Restoration Highlight: Despite recent advances, most methods remain restricted to spatial-domain modeling, thereby limiting their effectiveness in representing frequency-domain characteristics under complex degradations. To address this limitation, we propose an integrated framework, FMSP-IR, which incorporates two core modules: the adaptive frequency decoupling and modulation module (AFDM) and the structure-aware gating module (SAGM).	Y. Tu; T. Hu; Q. Yan;
130	A Data-Informed Adaptive Convolution Kernel Learning Method for Image Fusion Highlight: In this paper, we propose a novel data-informed adaptive convolution kernel learning method.	L. Dai;
131	PTSE-T: Presentation Target Speaker Extraction Using Unaligned Text Cues Highlight: In this paper, by contrast, we condition the TSE algorithm on semantic cues extracted from limited and unaligned text contents, such as condensed points from a presentation slide.	Z. Jiang;
132	Sequential Multiple Testing with Three Hypotheses and Known Number of Streams Following Each Hypothesis Highlight: In this work, we consider the problem of testing the marginal distributions of multiple independent, sequentially observed data streams, where for each stream there are three hypotheses to select from.	Y. Xing; Y. Chen; T. Qu;
133	Dual-Perspective Multimodal Sentiment Analysis with MoE Fusion: Representation Learning Via Semantic Resonance and Divergence Highlight: However, challenges remain in handling redundant inter-modal noise and the lack of flexible fusion strategies. To address these issues, we propose a dual-perspective multimodal sentiment analysis framework with mixture of experts fusion, based on semantic resonance and divergence (DPMSA-MoE).	K. Sun; Y. Guo; J. Wang; X. Deng;
134	TF-MAMBANET: A Temporal and Frequency Fused Bidirectional Mamba Architecture for PPG Foundation Model Highlight: This paper proposes TF-MambaNet, a foundation model for PPG signals.	Z. Bao; Y. Benezeth; F. Yang; Y. Zhang; H. Wang; C. Li;
135	Nethira: A Heterogeneity-Aware Hierarchical Pre-Trained Model for Network Traffic Classification Highlight: However, existing pre-trained models struggle with the gap between traffic heterogeneity (i.e., hierarchical traffic structures) and input homogeneity (i.e., flattened byte sequences). To address this gap, we propose Nethira, a heterogeneity-aware pre-trained model based on hierarchical reconstruction and augmentation.	C. Lin; W. Zhang; H. Luo; X. Meng; Y. Zhang;
136	Exploiting Scatterers for Sensing Security in ISAC Systems Highlight: Specifically, we propose an Angle-of-Arrival (AoA) deception scheme in which scatterers are deliberately illuminated with higher probing power than the targets of interest, aiming to deceive potential Eavesdroppers (Eves) with sensing capability into misidentifying scatterers as targets.	J. Chen; X. Lei; C. Masouros;
137	TIWNet: A Template-Based Real-Time Image Watermarking Method Using Invertible Neural Network Highlight: However, image-dependent watermarking networks suffer from high embedding costs, while encoder-decoder-based template methods reduce these costs but introduce excessive redundancy that affects image visual quality. To address these limitations, we propose a template-based image watermarking method using the Invertible Neural Network (INN).	P. Zhou; Y. Li; Y. Zhao; Y. Wu; S. Liu;
138	Quantum-Inspired Frequency Attenuation for Enhanced Targeted Fabrication Attacks in Object Detection Highlight: In this paper, we propose a novel Quantum-Inspired Frequency Attenuation (QIFA) method, drawing inspiration from the barrier-penetration view of the quantum tunneling effect.	H. Han; Q. Song; J. Qi; R. Wan;
139	Joint Learning of Deterministic and Stochastic Parameters of Sparse Bayesian Neural Networks for Probabilistic Image Registration Highlight: This paper proposes a sparse BNN framework for probabilistic image registration.	Y. Hua; X. Yang; Y. Zhao;
140	SDR-STE: Synergistic Disentanglement and Refinement for Photorealistic Scene Text Editing Highlight: Existing methods often handle content and style in an implicit and entangled manner, leading to mutual interference between background reconstruction and text rendering, which can result in structural distortions and texture artifacts, particularly in scenes with complex backgrounds and high-frequency details. To address these issues, we propose a disentangled generation framework that explicitly separates structure repair from texture synthesis in the latent space, and decomposes coarse-grained generation and fine-grained enhancement in the image space, thereby establishing a clear division of labor and synergy between background and foreground rendering.	Z. Jia; J. Wang; R. Jin; K. Song; Z. Wang;
141	Inverse Halftoning Via Weighted Sobel Conditioned Diffusion Model Highlight: This paper proposes DiffSo, a weighted Sobel-conditioned diffusion model for high-fidelity inverse halftoning.	S. Shen; J. Yao; D. Zhang; K. Tang; D. Zhao; Z. Gu;
142	Towards Robust Dysarthric Speech Recognition: LLM-Agent Post-ASR Correction Beyond WER Highlight: This mismatch is particularly problematic for dysarthric speech, where articulatory imprecision and disfluencies can cause severe semantic distortions. To bridge this gap, we introduce a Large Language Model (LLM)-based agent for post-ASR correction: a Judge–Editor over the top-k ASR hypotheses that keeps high-confidence spans, rewrites uncertain segments, and operates in both zero-shot and fine-tuned modes.	X. Zheng; S. Dong; B. Phukon; M. Hasegawa-Johnson; C. D. Yoo;
143	Amplitude Optimization Driven Multi-OFDM Waveform Design with Good PMEPR and ISL Performances for Joint Radar and Communications Highlight: To this end, we formulate a trade-off objective that balances the ISL and PMEPR of OFDM and enforce constraints on IE accuracy, total transmit power, and per-subcarrier amplitude bounds.	X. Xu; Y. Li; R. Tao; T. Shan;
144	HFGNet: Mitigating Boundary Distortion for Sonar Image Segmentation with High Frequency Guidance Strategy Highlight: Further, in segmentation tasks, high frequency components have recently proved vulnerable to distortion during downsampling at early stages. To overcome these limitations, we propose a High Frequency Guidance (HFG) strategy for the sonar image segmentation task.	H. Zhu;
145	Beyond Attention: Adapting Segment Anything with Frequency and Structural Priors Highlight: While parameter-efficient adapters have emerged as a promising solution, they often perform generic feature injection, failing to address a fundamental limitation of Vision Transformers (ViTs): their inherent bias towards low-frequency global information, which leads to suboptimal performance on tasks requiring perception of high-frequency details like fine boundaries and textures. To bridge this gap, we propose FDPAdapter, which explicitly enriches SAM with crucial frequency-domain and structural priors.	Y. Gao; B. Fu; Y. Shi; Y. Cao; L. Shi;
146	FastAV: Efficient Token Pruning for Audio-Visual Large Language Model Inference Highlight: In this work, we present FastAV, the first token pruning framework tailored for audio-visual large language models (AV-LLMs).	C. Jung; Y. Jang; S. Lee; J. S. Chung;
147	UNMIXX: Untangling Highly Correlated Singing Voices Mixtures Highlight: We introduce UNMIXX, a novel framework for multiple singing voices separation (MSVS).	J. Jung; J. -H. Kim; D. Kwak; J. Lee; J. Nam; J. S. Chung;
148	LP-CFM: Perceptual Invariance-Aware Conditional Flow Matching for Speech Modeling Highlight: The goal of this paper is to provide a new perspective on speech modeling by incorporating perceptual invariances such as amplitude scaling and temporal shifts.	D. Kwak; Y. Jang; J. S. Chung;
149	OptimUS: Optimization-Based Unlimited Sampling Algorithm Highlight: However, a general optimization framework for arbitrary regularizers has been lacking. We address this by casting USF recovery as an auto-regressive problem using first-order differences and signal-prior regularization.	J. Bacca; B. Monroy;
150	AERIS-RTDetR: Ultrasound-Aware Real-Time Detection with Orthogonal Aniso-Scale Blocks And Echogenicity-Guided Fusion Highlight: We introduce Anisotropic Echo-Reliability Integrated Sonographic Real-Time DETR, AERIS-RTDetR, which fuses anisotropy-aware priors with echo-calibrated reliability.	F. Liu; J. Wang; Q. Zhang; Y. Zhang; H. Pan;
151	Symphony Rendering: Midi and Composer-Conditioned Auto Orchestration with Flow-Matching Transformers Highlight: We introduce symphony rendering, a conditional orchestration system that converts a melody, either from MIDI or a piano solo, into full symphonic audio in the style of a target composer.	J. Lei; Q. Kong;
152	A Universal Framework for Disentangling Subject-Specific Signatures in EEG Signals Highlight: We propose a universal neural framework that disentangles subject-specific features from state-dependent components in raw EEG signals.	Z. Pei; Z. Li; Q. Li; X. Wu;
153	Exploring Confidence As A Reward to Advance LLMS Reasoning Highlight: In this work, we systematically investigate Confidence-as-a-Reward (CRew), a simple, training-free method that utilizes token-level confidence in the model’s final answers as a reward signal, especially suitable for closed-ended tasks.	H. Du; B. Li; C. Xie; C. Gao; K. Chen; D. Tao;
154	PGSENet: Prior-Guided Spectrum Enhancement Network Highlight: This disrupts the continuity of high- and low-frequency information along certain directions in the spectrum, which often leads to insufficient detail recovery. To address this issue, we propose a Prior-Guided Spectrum Enhancement Network (PGSENet).	T. Mei; Y. Hu; L. Chen; Y. Fang; Q. Lin; Y. Wu;
155	ConfMamba-SAM: Structured State Space Modeling with Memory-Augmented Prompting for Automatic Brain Lesion Segmentation Highlight: Accurate and consistent brain lesion segmentation from clinical MRI and CT volumes remains challenging due to microscale lesions, low contrast, anisotropic resolution, and interslice discontinuities. To address these issues, we propose ConfMamba-SAM, an end-to-end, fully automatic segmentation framework that leverages a frozen foundation model backbone with lightweight, trainable adapters for efficient adaptation.	Z. Cheng;
156	Principle-Guided Multimodal Reasoning with Minimal Human Demonstrations Highlight: Despite their impressive capabilities, MLLMs still face challenges in fine-grained tasks, such as OCR in multilingual contexts or recognizing small or occluded objects, which limits their reliability in real-world applications. To overcome these limitations, we propose PrinM, a principle-guided multimodal reasoning framework that enhances MLLMs with specialized tool experts.	C. Ji;
157	SCI-GR: Sequential Controllable Inpainting-Based Generative Replay For Class-Incremental Object Detection Highlight: While generative replay (GR) avoids storing real data by synthesizing old samples, existing GR approaches often fail in multi-object generation scenarios, yielding artifacts and semantic inconsistency. To address this issue, we propose SCI-GR, the first sequential controllable inpainting-based generative replay framework for CIOD.	N. Xue;
158	PSQ-PMC: A Hardware-Friendly Quantization Scheme for Spike-Based Neural Radiance Highlight: Despite significantly reducing energy consumption during inference, the membrane potential of spike neurons occupies a large portion of memory resources. To address this, we propose a hardware-friendly quantization scheme that is tailored for the spike-based NeRF model.	R. Lin; J. Li; Z. Meng; P. Zhou;
159	Cancer of Unknown Primary Prediction Via Semantic Prompting and Tumor Environment-Aware Patch Selection Highlight: Additionally, in weakly supervised settings, these methods focus solely on high-attention tumor-containing patches, overlooking cellular characteristics in non-affected regions. To overcome these limitations, we propose a novel framework for predicting lymph node metastasis of unknown primary, which explores both tumor and non-tumor patches under the guidance of textual descriptions of the overall tumor environment.	Q. Jia; Q. Bo; S. Yao; Y. Liu; L. Sun; Y. Zhu;
160	Asymmetric Region Denoising and Rotation Equivariant for Image Reflection Symmetry Detection Highlight: Asymmetric regions act as background clutter that disrupts symmetric pattern matching, while convolutional neural networks fail to preserve consistent transformations for symmetric features under image variations. To address these issues, we propose an Asymmetric Region Denoising Module (ARD) and Rotation Equivariant Feature Similarity Matching (REFSM), which effectively suppress asymmetric interference and extract refined symmetric patterns.	D. Yin; R. Su; C. Zhao; F. Yu;
161	Task-Aware LLM Council with Adaptive Decision Pathways for Complex Task Support Highlight: We propose Task-Aware LLM Council (TALC), a decision framework that organizes multiple LLMs into a profiled expert council and combines specialization-aware routing with adaptive planning.	W. Zhu; L. Yu; H. -R. Yao; Z. Tang; K. Yue;
162	FedPLA: Prototype-Aligned Low-Rank Adaptation for Multimodal Federated Learning Highlight: This paper proposes FedPLA, a parameter-efficient framework that optimizes multimodal federated learning through global semantic prototype alignment.	C. Li; R. Gu;
163	BrainBLIP: Bootstrapping Language-Image Pretraining from MRI and Cognition Alignment to Diagnose Multiple Brain Disorders Highlight: Thus, this study proposes a bootstrapping language-image pretraining model from brain structure and cognition alignment (BrainBLIP).	C. Zhao; J. Sui; R. Jiang; D. Zhang; V. D. Calhoun; S. Qi;
164	Optimizing Speech Language Models for Acoustic Consistency Highlight: We study speech language models that use semantic initialization and planning losses for robust and consistent generation.	M. Rohanian; M. Krauthammer;
165	Graphmd: A Two-Module Diffusion Framework for Smooth and Consistent Molecular Dynamics Highlight: However, their discrete noise-adding process limits the ability to capture smooth oscillatory behavior between consecutive frames and also poses challenges in maintaining spatial structural consistency and effectively processing molecular graph features. To address this, we propose a two-module approach: a molecular graph interaction module, enhanced with classical potential functions, and a diffusion module that uses the Discrete Cosine Transform (DCT) to better capture smooth molecular motions.	G. Chang; Z. Si; J. Hu; Z. Duan; D. Guo;
166	Clustering of Multisource Remote Sensing Data Via Low-Rank Tensor Learning with Spatial Constraints Highlight: However, existing clustering methods often struggle with limited spatial modeling, weak cross-source consistency, and poor scalability. To tackle these challenges, this paper proposes an innovative method called Clustering of Multisource Remote Sensing Data via Low-Rank Tensor Learning with Spatial Constraints (LRTSC).	Z. Cao;
167	EATS2: Enabling Efficient and Accurate Trajectory Similarity Computation Via Self-Training Highlight: Moreover, real-world datasets often suffer from sparsity in urban areas, further limiting the availability of sufficiently similar pairs to build robust training sets. To address these challenges, we propose EATS2, an Efficient and Accurate Trajectory Similarity Computation Framework via Self-training.	Z. Cao;
168	Score-Guided Motion Planning: Learning The Gradient Field of Promising Regions Highlight: In this paper, a score-guided sampling framework, ScorePlanner, is proposed to address the critical challenge of sampling inefficiency in sampling-based motion planning.	S. Wang; Q. Wu; Q. Huang; Z. Cheng;
169	Dual-Branch Spatial-Lighting Network for Photometric Stereo Highlight: This study addresses the challenge of reconciling global contextual understanding with pixel-wise detail preservation in photometric stereo measurements.	X. Tian; Y. Jin; Z. Zhang; P. Liu; F. Ni; D. Cheng;
170	Prior Knowledge Driven Multi-View Clustering Highlight: In multi-view clustering tasks, graph construction and information fusion typically lack effective prior knowledge, which makes it difficult to adequately capture the internal structure of multi-view data and thereby degrades clustering performance. To address this issue, the Prior Knowledge Driven Multi-View Clustering (PKDMVC) model is introduced, which incorporates the first-order neighbor relationships of samples as prior knowledge to guide the clustering process.	H. Xin;
171	UniKGLM: A Unified LLM-Driven Multi-Task Reasoning Framework for Knowledge Graph Completion Highlight: Nevertheless, confining LLMs to isolated pipeline stages fails to unleash their full-chain cognitive potential. To address this limitation, we propose UniKGLM, a full-chain, multi-task reasoning framework integrating type inference, path semantic retrieval, and triple reranking, leveraging a text-to-text approach within a unified structure that fine-tunes an LLM with LoRA.	Z. Jiang; Z. Wang;
172	OrthoVAD: Weakly Supervised Video Anomaly Detection Via Prototype Orthogonality Learning Highlight: This leads to severe confusion between normal and abnormal features in the representation space. To address this challenge, this paper proposes a novel framework named OrthoVAD.	T. Zhu;
173	MCPO: Dynamic Masking and Multi-Comparison Policy Optimization Algorithm for LLM Reinforcement Learning Highlight: In this work, we introduce Dynamic Masking and Multi-Comparison Policy Optimization (MCPO), a novel framework designed to enhance the reasoning robustness of LLMs.	F. Ding; B. Wang; Xiaoping-Zhang; W. Ding;
174	P-SAM: Parallel Semantic Decoding of SAM for Domain-Driven Prompt Generation in Pore Segmentation Highlight: While the Segment Anything Model (SAM) demonstrates powerful general segmentation capabilities, its reliance on domain-specific prompts limits its automated application in specialized scenarios. To address these challenges, we propose a Parallel-decoding SAM framework (P-SAM).	D. Li; H. Zhang; Q. Xia;
175	A Competition-Cooperation Graph Adversarial Augmentation Learning with Application to Brain Disease Detection Highlight: We propose the Signed Graph Adversarial Augmentation Contrastive Learning Network (SGA-CLNet), a competition–cooperation framework for brain disease detection.	M. Yuan; J. Wang; W. Xiong; J. Li; T. Xu; M. Shao;
176	MC-LExt: Multi-Channel Target Speaker Extraction with Onset-Prompted Speaker Conditioning Mechanism Highlight: However, DOA-based approaches depend on explicit direction estimation and are sensitive to microphone array geometry, while methods based on speaker embeddings model speaker identity in an implicit manner and may degrade in noisy-reverberant conditions. To address these limitations, we propose multi-channel listen to extract (MC-LExt), a simple but highly-effective framework for MC-TSE.	T. Ling; S. He; P. Shen; Z. -Q. Wang;
177	Sing2Song: An Accompaniment Generation System Based on Solo Singing Highlight: To this end, we propose Sing2Song, a hybrid accompaniment generation system that accepts solo singing audio and produces fully orchestrated accompaniment.	S. H. Choi;
178	Advancing Fine-Grained Sentiment Analysis in Complex Contexts: A New Benchmark and Interpretation-Enhanced Approach Highlight: This paper aims to advance research on fine-grained sentiment analysis (FSA) in complex contexts.	G. Xie;
179	ORSc: Object-Aware Reinforcement with Semantic Consistency for Hallucination Mitigation in MLLMs Highlight: Multimodal Large Language Models (MLLMs) have demonstrated remarkable capabilities in visual-language understanding but suffer significantly from object hallucinations, generating descriptions that contain objects inconsistent with the visual input. We present ORSc (Object-aware Reinforcement with Semantic Consistency), a novel framework that addresses this fundamental challenge through three key innovations: a self-supervised Object-aware Self-Verification (OSV) mechanism that eliminates external detector dependency by leveraging the model’s internal attention patterns and hidden state dynamics, providing formal guarantees on verification accuracy; a Semantic Consistency Reinforcement (SCR) module employing multi-relational Graph Attention Networks to explicitly model object relationships with theoretical guarantees on representation stability; and a Dynamic Layer-wise Semantic Fusion (DLSF) strategy that integrates knowledge from preceding layers guided by information-theoretic measures of semantic consistency.	J. He; X. Shi; H. Xie; Y. Zhang; M. Shang;
180	Dynamic Attention-Aware Shaping for Out-of-Distribution Detection Highlight: In this work, we propose Dynamic Attention-Aware Shaping (DAAS), a post-hoc and dynamic method that enhances OOD detection performance.	J. He; H. Xie; X. Shi; Y. Wang; M. Shang;
181	Benchmarking Gaslighting Attacks Against Speech Large Language Models Highlight: In this paper, we introduce gaslighting attacks, strategically crafted prompts designed to mislead, override, or distort model reasoning as a means to evaluate the vulnerability of Speech LLMs.	J. Wu; B. Zhu; X. Zou; Q. Zhang; X. Fang; P. Zhou;
182	Deep Spatio-Temporal Models for Decoding Purkinje Cell Activity in Tongue Movements Highlight: In this work, we investigate whether spike activity from Purkinje cells can be used to classify targeted licking behavior in mice.	M. Zeeshan; L. Bina; L. W. J. Bosman; C. I. De Zeeuw; M. A. Siddiqi; M. Taj;
183	Discrepancy-Aware Disentangled Contrastive Learning for Multimodal Rumor Detection Highlight: We propose DMCL (Disentangled Contrastive Learning), a discrepancy-aware framework for multimodal rumor detection that explicitly models cross-modal inconsistencies through subspace disentanglement.	K. Lu; H. Zhang; Y. Yang; C. Meng; G. Yin; B. Fang;
184	Deepfake-HMDE: Hierarchical Mixture of Deepfake Experts For Deepfake Detection Highlight: However, the straightforward application of MLLMs faces two key issues: (i) data heterogeneity among various deepfake methods, and (ii) insufficient robustness for different deepfake methods. To address these issues, we propose a hierarchical mixture-of-experts framework tailored for deepfake detection, i.e., Deepfake-HMDE.	Z. Ren; J. Zhang; X. Feng; Y. Li; C. Chen;
185	TimeDiff: Leveraging Differential Domain Representations for Long Time Series Forecasting Highlight: However, in many real-world scenarios, the detailed variations in time series (i.e., its differences) are critical for decision-making. To address this gap, we propose TimeDiff, a novel framework for long time series forecasting that enhances predictive accuracy by modeling in the differential domain.	Y. Tao;
186	Incremental Feature Analysis for Reading Pattern Detection: A Systematic Evaluation of Behavioral Indicators in L2 Eye-Tracking Highlight: This study presents a feature ablation framework for empirical discovery of cognitive strategies in second-language (L2) reading.	M. S. Hossain; A. Tashk; C. M. A. Ilyas; F. Kabir; P. Bækgaard;
187	Multi-Scale Task-Aware EEG Representation Learning for Cognitive State Recognition Highlight: Although deep learning methods have achieved remarkable progress in this area, most existing approaches overlook the heterogeneous encoding patterns across different brain regions and the multi-scale dynamics of neural activity, which limits the reliability and generalization of EEG representations. To address this issue, we propose MSTA-EEGNet, a multi-scale task-aware network for EEG-based cognitive state recognition.	S. Wang;
188	Augment-And-Regularize: Toward Reliable Semi-Supervised Domain Generalization Highlight: (2) leveraging scarce labeled samples together with abundant unlabeled data to learn transferable representations despite domain shifts. To address these issues, we propose ARise, an Augment-and-Regularize framework for SSDG.	S. Wang;
189	Task-Aware Modality-as-Experts Fusion of NIR and Microscopic Image for Textile Analysis Highlight: We tackle the long-standing need for fast, non-destructive textile analysis by pairing a new method with a new resource.	J. Kim; M. Chi;
190	Bridging Academia and Industry: Large-Scale NIR Signal Foundation for Robust Multi-Task and Real-World Modeling Highlight: We present FabricSpectra, a large, real-world foundation dataset for fabric component analysis.	J. Kim; M. Chi;
191	A User-Item Aware Encoding Framework for Short Video Highlight: Traditional encoding methods from the Professional Generated Content (PGC) era face significant challenges due to two unique characteristics of user-generated short videos: massive daily uploads (reaching hundreds of millions) and heterogeneous content-consumer relationships (varying video quality and diverse consumer contexts). To address these challenges, we propose UIAE (User-Item Aware Encoding), a novel multiple bitrate ladder group encoding method comprising three key components: 1) establishing user-item relationships via rule-based or DNN-based models; 2) developing local optimization models that maximize Quality of Experience (QoE) for sub-populations based on contextual consumption patterns; 3) deriving globally optimal encoding strategies through hierarchical model integration.	W. Deng; H. Liu; B. Wang; X. Li; D. Fu; Z. Wang;
192	Qwen-Simplify: Exploring Sentence Simplification Via Qwen-Based Reinforcement Learning Paradigm Highlight: In this work, we explore the Qwen-based RL paradigm in the context of sentence simplification.	P. Zhou; G. Li; X. Huang;
193	Towards Reliable Time Series Forecasting Under Future Uncertainty: Ambiguity and Novelty Rejection Mechanisms Highlight: To enhance model reliability, we introduce a dual rejection mechanism combining ambiguity and novelty rejection.	N. Feng;
194	DTT-BSR: Gan-Based Dttnet With Rope Transformer Enhancement For Music Source Restoration Highlight: The challenge lies in both separating overlapping sources and reconstructing signals degraded by production effects such as compression and reverberation. We therefore propose DTT-BSR, a hybrid generative adversarial network (GAN) combining a rotary positional embedding (RoPE) transformer for long-term temporal modeling with a dual-path band-split recurrent neural network (RNN) for multi-resolution spectral processing.	S. Tan;
195	SVPO: A LLM Reinforcement Learning Method Based on Stepwise Value Estimation Highlight: Methods like SPO and VAPO, which provide more granular supervision signals, introduce a significant amount of additional computational overhead. To address these limitations, we propose Stepwise Value Policy Optimization (SVPO), an efficient Reinforcement Learning (RL) algorithm based on step-level value estimation.	Z. Zeng; Z. Ding; B. Zhang; M. Wan; C. Jiang; N. Ding;
196	Generating Training Targets for Real-World Speech Enhancement Via Close-to-Distant Microphone Projection Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: While such data are often generated through simulation, the mismatch between simulated and real recordings significantly limits SE accuracy. To address this issue, we propose Close-to-Distant microphone Projection (C2D projection), a method that generates paired data from real recordings captured by close and distant microphones.	T. Nakatani; R. Ikeshita; N. Kamo; M. Delcroix; S. Araki;
197	HCL-CSC: Hierarchical Contrastive Learning with IDS-Aware Character Similarity for Chinese Spelling Correction Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose HCL-CSC, a novel framework with three key innovations: (1) IDS tree-based character similarity modeling that combines structural decomposition with confusion dictionaries to capture precise morphological relationships, (2) multi-granularity contrastive learning operating simultaneously at character, sequence, and consistency levels for comprehensive representation learning, and (3) confusion-aware dynamic hard negative mining that intelligently adapts sample selection based on model confidence and character confusion patterns.	S. Wang; C. Tong; L. Jiang;
198	Algebraic Covariance Matrix Reconstruction for Sparse Arrays Using Newton’s Identities Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper introduces a fast reconstruction method using Newton’s identities (NI) that establishes a direct algebraic relationship between the known and missing entries of the covariance matrix’s first column.	X. Heng; B. Tang; Z. Chen; Y. Yang; L. Chen; Y. Sun;
199	Deep Dubbing: End-to-End Auto-Audiobook System with Text-to-Timbre and Context-Aware Instruct-TTS Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: While TTS boosts efficiency, it struggles with emotional expression, intonation control, and contextual scene adaptation. To address these challenges, we propose DeepDubbing, an end-to-end automated system for multi-participant audiobook production.	Z. Dai;
200	HiCT: High-Precision 3D CBCT Reconstruction From A Single X-ray Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose HiCT, a two-stage framework that first generates geometrically consistent multi-view projections from a single panoramic image using a video diffusion model, and then reconstructs high-fidelity CBCT from the projections using a ray-based dynamic attention network and an X-ray sampling strategy.	W. Ma; J. Liu; Z. Xiao; Z. Wang; F. Yang; Z. Liu;
201	Shared-Weights Extender and Gradient Voting for Neural Network Expansion Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In parallel, we introduce the Steepest Voting Distributor (SVoD), a gradient-based method for allocating neurons across layers during deep network expansion.	N. Chatzis; I. Kordonis; E. Theodosis; P. Maragos;
202	Automatic Inter-Animal Alignment of Recorded Kinematic Trajectories Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose a framework that couples bidirectional LSTM networks with an orthogonal Procrustes alignment to automatically detect movement onset and corrective turning points in non-human primate reaching tasks.	A. Markus; N. Sinha; Y. Prut; J. Goldberger;
203	Quadrature Over-the-Air-Computing for Multimodal Dual-Stream Signal Processing Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose a novel quadrature over-the-air computing (Q-OTAC) framework that enables the simultaneous computation of two independent functions and/or data streams within a single transmission.	H. S. Rou; K. Ando; G. T. Freitas de Abreu; D. González G.;
204	Privacy-Aware Design of Distributed MIMO ISAC Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: For this purpose, we introduce an adversarial model where a malicious user exploits the interference from ISAC signals to extract sensing information.	H. Åkesson; M. Gomes; D. P. M. Osorio;
205	Diffusion-Based Natural Adversarial Perturbations Towards Segment Anything Model Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Although different adversarial attacks targeting SAM have been proposed, they have not paid sufficient attention to the stealthiness of malicious images crafted by attackers. In this paper, we introduce a diffusion-based approach that generates natural adversarial samples targeting SAM, such that the perturbed images remain imperceptibly natural to human observers while leading to incorrect segmentation.	H. Xiao;
206	CP-Guard: Continual Preference Alignment for Copyright Protection Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Current safe aligning solutions frequently neglect out-of-distribution (OOD) scenarios crucial for copyright protection, and exacerbate catastrophic forgetting on copyright-restricted datasets. To address these concerns, we propose CP-Guard, an innovative framework that combines continual preference alignment with backdoor mechanisms for robust copyright protection.	M. Gou; Z. Yao; H. Ma; S. Zhan; F. He;
207	Lightweight Image Super-Resolution Via Efficient Shift Convolution and Edge-Enhanced Attention Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, the majority of these approaches are not practical yet for real-world applications due to their excessive computational complexity and heavy memory consumption. In this work, we propose a lightweight image super-resolution network, Efficient Shift Convolution and Edge-enhanced Attention (ESCEA), where the Efficient Shift Convolution (ESC) is a method to simulate a larger convolutional kernel by shifting smaller ones, and therefore improves the SR performance by expanding the receptive field without increasing the computational complexity.	R. Zuo; L. Chen; Z. Yang;
208	Query-Scalable Few-Shot Semantic Segmentation Via In-Context Variational Inference Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Few-shot semantic segmentation (FSS) faces critical challenges in scaling to diverse query images, as existing methods often struggle to generalize across varying query distributions with limited support samples, especially when query sets exhibit large intra-class variations or increasing complexity. To address this, we propose a novel framework for Query-Scalable Few-Shot Semantic Segmentation via In-Context Variational Inference.	Z. Xing; S. Chen; W. Tan; B. Yan;
209	Jointly Conditioned Diffusion Model for Multi-View Pose-Guided Person Image Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present the jointly conditioned diffusion model (JCDM), a diffusion framework that exploits multi-view priors.	C. Xie;
210	Improving Anomalous Sound Detection with Attribute-Aware Representation from Domain-Adaptive Pre-Training Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To address the challenge of missing attribute labels, this paper proposes an agglomerative hierarchical clustering method for the assignment of pseudo-attribute labels using representations derived from a domain-adaptive pre-trained model, which are expected to capture machine attribute characteristics.	X. Fang;
211	BadReasoner: Planting Tunable Overthinking Backdoors Into Large Reasoning Models for Fun or Profit Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we identify a previously unexplored attack vector against LRMs, which we term overthinking backdoors.	B. Yi;
212	SafeGrad: Gradient Surgery for Safe LLM Fine-Tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We diagnose that this failure stems from conflicting gradients, where the user-task update directly undermines the safety objective. To resolve this, we propose SafeGrad, a novel method that employs gradient surgery.	B. Yi;
213	A Learning-Based Automotive Sound Field Reproduction Method Using Plane-Wave Decomposition and Multi-Position Constraint Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Achieving sound field reproduction (SFR) with high sound quality and accurate spatial localization in automotive cabins is particularly challenging due to complex acoustics and constrained loudspeaker layouts. This paper proposes a learning-based method to address this challenge, integrating a spatial domain physics-informed constraint based on plane-wave decomposition (PWD) with a multi-position control strategy.	Y. Qian; X. Wu; T. Qu;
214	Span Pruning and Syntactic Awareness for Aspect Sentiment Triplet Extraction Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we propose SPASA, a model that combines span pruning and syntax awareness, divided into two stages: Named Entity Recognition (NER) and Relation Extraction (RE).	B. Cui; W. Wang; S. Liu;
215	CVaR-Aware Network Slicing for Tail Latency Under Tiered Deadlines Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present a conditional value-at-risk (CVaR)-aware network slicing framework that provides end-to-end resource isolation and explicitly optimizes the tail of the delay distribution while enforcing hard-deadline reliability targets.	S. Niu; Q. Peng; Z. He;
216	Constrained Local Point Cloud Perturbations Using Adaptive Curvature for 3D Adversarial Attacks Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: 3D point cloud recognition models are highly susceptible to adversarial perturbations, whereas existing approaches often introduce visible distortions, suffer from weak transferability, and achieve limited attack success. To address these challenges, we propose a novel adversarial framework that constrains point perturbations through reversible transformation, employs hierarchical sampling to preserve structural keypoints, and refines perturbations using gradient-guided updates.	Z. Xu;
217	ForgetMark: Stealthy Fingerprint Embedding Via Targeted Unlearning in Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce ForgetMark, a stealthy fingerprinting framework that encodes provenance via targeted unlearning.	Z. Xu;
218	DNF: Dual-Layer Nested Fingerprinting for Large Language Model Intellectual Property Protection Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose Dual-Layer Nested Fingerprinting (DNF), a black-box method that embeds a hierarchical backdoor by coupling domain-specific stylistic cues with implicit semantic triggers.	Z. Xu;
219	KinGuard: Hierarchical Kinship-aware Fingerprinting to Defend Against Large Language Model Stealing Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Conventional backdoor fingerprinting, however, is flawed by a stealth-robustness paradox: to be robust, these methods force models to memorize fixed responses to high-perplexity triggers, but this targeted overfitting creates detectable statistical artifacts. We resolve this paradox with KinGuard, a framework that embeds a private knowledge corpus built on structured kinship narratives.	Z. Xu;
220	CZSRSSC: Continual Zero-Shot Remote Sensing Scene Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Save Abstract: Remote sensing scene classification, a core technology in fields such as disaster response, resource management, and urban planning, often faces challenges in real-world …	Z. Xu;
221	Patch-Based Active Source-Free Domain Adaptation for Annotation-Efficient Medical Image Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To improve annotation efficiency while preserving data privacy, we propose a novel Active Source-Free Domain Adaptation (ASFDA) framework.	J. Dong; Y. Zhang; Z. Zhang; L. Lin; Y. -W. Chen; R. Tong;
222	TrafficHTG: Revolutionizing Network Traffic Generation with Hierarchical Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, traditional simulation tools struggle to reproduce the detailed characteristics of real network traffic, while model-based generation approaches are often limited by their research objectives or model performance, typically resulting in either the inability to generate raw traffic or the production of low-quality synthetic traffic. To address these challenges, this paper proposes a hierarchical autoregressive architecture for traffic generation, named TrafficHTG, which leverages protocol-aware semantic segmentation and hierarchical encoder-decoder mechanism to enable effective transfer of autoregressive models to the task of traffic generation.	J. Qin;
223	TrafficMoE: Adaptive Multi-Perspective Feature Fusion for Enhancing Malicious Traffic General Detection Capability Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Although pre-training techniques can alleviate the problem of data scarcity, most existing methods pre-train or fine-tune models from scratch using packet-level information, which restricts them to learning only one-dimensional packet-level features and thus limits the model’s general detection capability across diverse types of attacks. To address these challenges, we propose TrafficMoE, which enhances traffic understanding in attack scenarios by integrating cross-attention mechanisms and position-dependent gating to jointly analyze traffic features extracted from multiple perspectives.	J. Qin;
224	Adaptive World Model with Latent Generation Algorithm for Deep Reinforcement Learning in Portfolio Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Deep Reinforcement Learning (DRL) approaches to financial portfolio optimization often struggle with data inefficiency and a lack of strategic foresight due to the complex and non-stationary nature of markets. To overcome these limitations, we propose AWMLG, a novel DRL algorithm based on an Adaptive World Model with Latent Generation.	F. Gu; Z. Jiang; Á. F. García-Fernández; J. Su; H. Li;
225	MoSA: Motion-Guided Semantic Alignment for Dynamic Scene Graph Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, existing methods struggle with fine-grained relationship modeling, semantic representation utilization, and the ability to model tail relationships. To address these issues, this paper proposes a motion-guided semantic alignment method for DSGG (MoSA).	X. Wang; B. Zhang; C. Wang; G. He;
226	Precision Neural Networks: Joint Graph and Relational Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To make precision estimation task-aware, we formulate an optimization problem that jointly learns the network parameters and the precision matrix, and solve it via alternating optimization, by sequentially updating the network weights and the precision estimate.	A. Cavallo; S. Rey; A. G. Marques; E. Isufi;
227	Rethinking Pseudo-Labeling: A Unified Dual-CCL Framework for Robust Semi-Supervised Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Semi-Supervised Semantic Segmentation (SSS) aims to ease the annotation burden of dense pixel-wise labels, while still facing difficulties in maintaining accurate pseudo-labels and minimizing prediction uncertainty. To address these issues, we propose a unified framework, Dual-Cycle Consistency Learning (DUAL-CCL), which enhances both probabilistic stability and semantic reliability.	Q. Xiong; X. Li; K. Wang; C. Hao; K. Liu;
228	High-Frequency-Aware Omni-Aggregation Transformer for Image Super-Resolution Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we propose the High-Frequency-Aware Omni-Aggregation Transformer (HFAT), which leverages high-frequency priors to enhance attention to fine details and performs omni-aggregation by incorporating features across multiple dimensions and scales.	W. Zhang; Y. Liu; J. Yang; T. Pan; D. Zeng;
229	MELT: Improve Composed Image Retrieval Via The Modification Frequentation-Rarity Balance Network Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To address these limitations, we confront two key challenges: asymmetric rare semantic localization and robust similarity estimation under hard negative samples. To solve these challenges, we propose the Modification frEquentation-rarity baLance neTwork (MELT).	G. Qiu;
230	Online Neural Fusion of Distortionless Differential Beamformers for Robust Speech Enhancement Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, ACC often fails in highly non-stationary scenarios, such as rapidly moving interference, since its adaptive updates cannot reliably track rapid changes. To overcome this limitation, we propose a frame-online neural method for multi-beam fusion that estimates the combination weights more efficiently.	Y. Qian;
231	Efficient Multi-LoRA Deployment Via Shared KV-Cache with Task-Adaptive Tokens Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose SKV-TAT (Shared KV-cache with Task-Adaptive Tokens), a novel framework that decouples context processing from task adaptation to enable efficient multi-LoRA deployment.	J. Ou; J. Guo; S. Jiang; W. Tian;
232	A Unified Four-Stage Dynamic Cycle for Robust Federated Fine-Tuning of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present SplitLETS, a unified four-stage cycle that combines LoRA, SAM, and LETS.	B. Tan; J. Ren; Y. Li; A. Chaddad;
233	Enhancing Post-Training Quantization Via Future Activation Awareness Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Although this method is efficient, it suffers from quantization bias and error accumulation, resulting in suboptimal and unstable quantization, especially when the calibration data is biased. To overcome these issues, we propose Future-Aware Quantization (FAQ), which leverages future-layer activations to guide quantization.	Z. Lv; Z. Fan; Q. Tian; W. Zhang; Y. Zhuang;
234	MSBench: Can Speech Language Models Generate Multi-Speaker Dialogues in One Pass? Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, existing evaluations that primarily focus on short, single-speaker utterances often neglect long-form, multi-speaker dialogues typical of podcasts and other real-world settings. To address this, we introduce MSBench, a benchmark designed to assess the ability of SLMs to generate natural, multi-speaker dialogues with semantic and paralinguistic cues.	Z. Xu; T. Liu; H. Shen; M. Liu; L. Duan;
235	PoemCraft: Multimodal Poetry Generation with Prosody-Guided Refinement and Biased Attention Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose PoemCraft, a multimodal poetry generation framework with prosody-guided refinement and biased attention.	T. Lyu; L. Ge; Z. Song; Y. Zhu;
236	Target Speaker Anonymization in Multi-Speaker Recordings Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Moreover, current evaluation methodology does not allow us to accurately assess privacy protection and utility in this complex multi-speaker scenario. This work aims to bridge these gaps by exploring effective strategies for targeted speaker anonymization in conversational audio, highlighting potential problems in their development and proposing corresponding improved evaluation methodologies.	N. Tomashenko; J. Yamagishi; X. Wang; Y. Liu; E. Vincent;
237	An Efficient Neural Network for Modeling Human Auditory Neurograms for Speech Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present a compact convolutional encoder that approximates the Bruce mean-rate pathway and maps audio to a multi-frequency neurogram.	E. Zohar; I. Nelken; B. Rafaely;
238	EuleroDec: A Complex-Valued RVQ-VAE for Efficient and Robust Audio Coding Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work we introduce an end-to-end complex-valued RVQ-VAE audio codec that preserves magnitude-phase coupling across the entire analysis-quantization-synthesis pipeline and removes adversarial discriminators and diffusion post-filters.	L. Cerovaz; M. Mancusi; E. Rodolà;
239	Information-Seeking Transmit Beamforming for Cognitive Ultrasound Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Here, we propose a fully parameterized adaptive transmit strategy that optimizes the (broadband) near-field response of the transducer array in situ.	B. Federici; R. J. van Sloun; M. Mischi;
240	Plug-and-Play Temporal Fourier Embedding for Robust Long-Horizon Traffic Flow Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we present Temporal Fourier Embedding (TFE), a lightweight and model-agnostic feature augmentation that injects explicit periodic priors directly at the input level.	P. Wang; H. Sun; K. Hui;
241	Support-Conditioned Dynamic Convolution for Few-Shot Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To this end, we propose a Support-Conditioned Dynamic Convolution (SCDC) module within the Meta-RCNN framework, which injects support information into mid-level query features.	H. Ding; W. Zhuo; Z. Tang; L. Shen;
242	Scalable Bayesian Fine-Tuning of LLMs for Multi-Objective Bayesian Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: As the default choice for surrogate modeling in multi-objective Bayesian optimization (MOBO), Gaussian processes (GPs) struggle with irregular high-dimensional variables and non-stationary spaces. To alleviate these challenges, we adapt the prevailing large language model (LLM) as the surrogate model given its powerful capability in feature extraction based on large-scale pre-training.	H. Xiang; H. Zhang; Q. Lu;
243	Fine-Grained Text-to-Image Synthesis with Semantic Refinement Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This work presents a new diffusion-based method that favors fine-grained synthesis with semantic refinement.	X. Song; J. Sun; Y. Zhang; L. Wang; Q. Li; Z. Sun;
244	Learning Vocal-Tract Area And Radiation With A Physics-Informed Webster Model Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present a physics-informed voiced backend renderer for singing-voice synthesis.	M. Lu; J. D. Reiss;
245	Gaussian-Grounded Contextual Hierarchical Inference for Weakly Supervised Video Anomaly Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Existing multiple instance learning approaches often fail to capture the diversity and contextual dynamics of abnormal events, resulting in limited localization accuracy. To address this, we propose Gaussian-grounded Contextual Hierarchical Inference (GCHI), a novel framework that learns discriminative Gaussian-grounded feature representations via conditional normalizing flows, models long-range temporal dependencies through contextual aggregation, and performs joint coarse-to-fine anomaly inference by aligning visual features with textual semantics and anomaly prototypes.	W. Zheng; T. Zhang; Z. Cui; C. Xu;
246	Complex-Aware Semi-Supervised Modulation Recognition Via Latent Adversarial Training Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present a complex-valued framework that couples a multi-codebook VQ-GAN tokenizer with virtual adversarial training in the latent space.	X. Deng; K. Jin;
247	FW-VTON: Flattening-and-Warping for Person-to-Person Virtual Try-On Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present FW-VTON, a three-stage framework: (1) garment flattening to reconstruct a canonical, pose-agnostic garment from the source; (2) garment warping to align the flattened garment with the target pose; and (3) seamless integration onto the target person.	Z. Wang;
248	ICPO: Illocution-Calibrated Policy Optimization for Multi-Turn Conversation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We find that standard post-training techniques like Reinforcement Learning with Verifiable Rewards (RLVR) exacerbate this issue by rewarding confident, direct answers, thereby inducing overconfidence and discouraging the model from seeking clarification. To address this, we propose Illocution-Calibrated Policy Optimization (ICPO), a novel training framework that sensitizes the model to instruction ambiguity.	Z. Wang;
249	Decomposing Multilingual Representations: How Scale, Architecture, and Data Shape Functional Specialization Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper introduces a framework to dissect their internal representations, revealing a phenomenon we term Functional Specialization: the emergence of distinct neural circuits for language-specific form versus language-agnostic semantics.	Z. Wang;
250	CodEOE: A Benchmark for Jointly Extracting Cross-Document Events and Opinions From Social Media Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Event extraction and opinion/sentiment analysis have been extensively studied in recent decades, but their joint study remains under-explored. To bridge this gap, we introduce a challenging Cross-Document Event-Opinion Extraction (CodEOE) task, which requires a model to extract event triggers and arguments as well as their associated opinions or sentiments by understanding long cross-document contexts.	Z. Wang;
251	CTR-LoRA: Curvature-Aware and Trust-Region Guided Low-Rank Adaptation for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we introduce CTR-LoRA, a framework guided by curvature trust region that integrates rank scheduling with stability-aware optimization.	Z. Wang;
252	Conjugate Relation Modeling For Few-Shot Knowledge Graph Completion Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Existing methods, however, struggle to capture complex relational patterns and mitigate data sparsity. To address these challenges, we propose a novel FKGC framework for conjugate relation modeling (CR-FKGC).	Z. Wang; Q. Zeng; H. Duan; C. Cheng; M. Zou; Z. Wang;
253	Mongoose: Do We Need A Scanner for Vision Mamba? Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, these existing methods introduce computational overhead due to specialized scanning techniques. To address these issues, we propose Mongoose, a simplified SSM that eliminates scanning mechanisms.	B. N. Patro; V. S. Agneeswaran;
254	ADRec: Training An Autonomous Decision-Making Recommendation Agent Through Behavior Cloning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose ADRec (Autonomous Decision-making Recommendation agent), trained through behavior cloning from GPT-4 demonstrations.	Y. Yang; L. Li; D. Zeng;
255	A Multi-Dimensional Feature Fusion and Multi-Level Domain Adaptation Network for Cross-Subject EEG Emotion Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, inter-subject variability in EEG data distribution causes substantial domain shifts, severely limiting model generalization across subjects. To address this challenge, we propose M2Net, a novel network that integrates multi-dimensional feature fusion and multi-level domain adaptation for cross-subject EEG emotion recognition.	C. Xie; R. Chen; Z. Huang; J. Zhang; L. Qiu; J. Pan;
256	Tips Over Tricks: Simple Prompts for Effective Zero-Shot Anomaly Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We revisit the backbone and use TIPS—a VLM trained with spatially aware objectives.	A. Salehi;
257	Active Inference Framework for Closed-Loop Sensing, Communication, and Control in UAV Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we introduce the active inference framework into SCC-enabled uncrewed aerial vehicle systems for joint state estimation, control, and sensing resource allocation.	G. Pan; L. Bai; Z. Tian; H. Chen; M. Bennis; H. Wymeersch;
258	Towards Multi-View Hierarchical Video-to-Piano Generation with MIDI Guidance Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose a hierarchical V2P framework that introduces MIDI as an intermediate representation, with progressive MIDI prediction (pitch, velocity, sustain) guiding waveform synthesis.	C. Liu; Z. Chen; G. Chen; C. Ding; N. Sebe;
259	Discrete-Continuous Fusion With Adaptive Hierarchical Features For Audio Deepfake Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Existing detection methods often rely on unimodal acoustic features, failing to capture nuanced synthetic patterns or generalize to unseen neural attacks. To address these limitations, we propose an end-to-end framework integrating Hybrid Audio Tagging (HAT) and Hierarchical Residual Connection (HRC) modules into the Whisper architecture.	J. Cui; B. Yu; S. Qin;
260	Secure Backscattering with Non-Colluding Jammer and Eavesdropper Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper investigates a novel physical-layer security framework for secure data collection in multi-tag monostatic BackCom systems under hybrid attacks.	T. Zhang; D. Mishra; J. Yuan; A. Seneviratne;
261	Linear WMMSE MIMO Precoding with One-Bit DACs Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we propose novel one-bit linear WMMSE precoders for sum rate maximization, applicable to both dynamic and fixed power reallocation scenarios.	R. Hou;
262	ProKWS: Personalized Keyword Spotting Via Collaborative Learning of Phonemes and Prosody Highlight: This paper presents ProKWS, a novel framework integrating fine-grained phoneme learning with personalized prosody modeling.	J. Pan; Y. Zhang; K. Huang;
263	Non-Line-of-Sight Vehicle Detection Via Audio-Visual Fusion Highlight: In this work, we propose a scene-aware acoustic perception network that integrates audio and visual signals for occluded vehicle detection.	H. Wang; H. Yu; R. Zhang; W. Zhou; J. Xi;
264	DrivingScene: A Multi-Task Online Feed-Forward 3D Gaussian Splatting Method for Dynamic Driving Scenes Highlight: We propose DrivingScene, an online, feed-forward framework that reconstructs 4D dynamic scenes from only two consecutive surround-view images.	Q. Hou; W. Sun; C. Zeng; C. Wang; H. Li; J. Cui;
265	Sparsity Induction for Accurate Post-Training Pruning of Large Language Models Highlight: We propose Sparsity Induction, which promotes models toward higher sparsity at both distribution and feature levels before pruning, to push the limits of PTS.	M. Jiang; Z. Li; X. Liu; J. Zhang; M. Chen; Q. Gu;
266	Wavelet-Aware Anomaly Detection in Multi-Channel User Logs Via Deviation Modulation and Resolution-Adaptive Attention Highlight: These logs are often multi-channel and non-stationary, and anomalies are rare, making anomaly detection challenging. To address these issues, we propose a novel framework that integrates wavelet-aware modulation, multi-resolution wavelet decomposition, and resolution-adaptive attention for robust anomaly detection.	K. Kong; D. Liu; X. Jin; S. Xu; G. Geng;
267	Pull-Pushing Canny Edge Extraction Highlight: Edge detection in a noisy setting is a hard-to-work-around problem in computer vision, where the core difficulty lies in discriminating the meaningful edge from the overwhelming noise. For this task, we propose an unadorned method, called PPCE, modifying the classic Canny Edge by using a Pull-Push non-local means (PP-NLM) denoising framework. In detail, PP-NLM approximates the NLM using a multi-scale filtering and its fine-to-coarse process makes a denoised thumbnail image.	Y. Shao; Y. Qian;
268	Mixture-of-Experts Framework for Field-of-View Enhanced Signal-Dependent Binauralization of Moving Talkers Highlight: We propose a novel mixture of experts framework for field-of-view enhancement in binaural signal matching.	M. Mittal;
269	PRISM-UNET: A Physics-Guided, Projection-Regularized, Informed State-Space Multiresolution UNet for Medical Image Translation Highlight: Cross-modality image translation aids clinical capabilities: low-to-high dose CT generation reduces radiation exposure, while synthesizing different MRI sequences (T1w, T2w, FLAIR) from a single acquisition significantly reduces scan time and cost. To address this critical need, we propose PRISM-UNet, a GAN architecture built on a UNet-inspired framework with a Mamba-integrated generator that leverages state-space modeling and linear scaling.	P. K. Singh; I. Ul Haq Gulzar; S. Singh; A. Nigam;
270	Analytical Framework for Wireless Localisation Using Terahertz Backscattering Tags Highlight: In this study, we propose a novel bistatic range-based localisation framework using THz Backscattering Communication (BackCom) technology, providing a device-free wireless tracking solution that has a low deployment cost and is well suited to indoor localisation applications.	S. Liang; Y. Deng; M. Ahmed; D. Mishra; S. Atakaramians; A. Seneviratne;
271	Adaptive Few-Shot Channel State Information Physical Layer Authentication for LEO Constellations Highlight: To this end, this paper proposes a physical-layer CSI-based multi-stage dynamic trust authentication method.	H. Yang; D. Zou; X. Hou; S. Wang;
272	GT-ARN: A Graph-Tokenized Adaptive Retentive Network for EEG Decoding Highlight: Conventional approaches rely solely on self-attention without domain priors or noise suppression, and construct tokens by convolution based on Euclidean proximity, which does not necessarily reflect true inter-channel dependencies. To address these limitations, we propose the Graph-Tokenized Adaptive Retentive Network (GT-ARN), a framework tailored for EEG decoding.	B. Long; T. Chen; Y. Hu; M. Wu; Q. Chen; L. Qiu;
273	SFGNet: Semantic and Frequency Guided Network for Camouflaged Object Detection Highlight: In this work, we propose a novel Semantic and Frequency Guided Network (SFGNet), which incorporates semantic prompts and frequency-domain features to capture camouflaged objects and improve boundary perception.	D. Wang; H. Zhao; X. Shen; S. Miao;
274	DefenseMEL: Enhancing Adversarial Robustness of Multimodal Entity Linking with Multimodal Large Language Models Highlight: Experimental results show that current MLLMs generally lack sufficient robustness against visual perturbations in MEL tasks. Based on this finding, we propose an MEL method based on MLLMs and dynamic retrieval enhancement, DefenseMEL.	F. Wang; J. Xu; M. Tian; M. Hu; Z. Luo; X. Bai;
275	Dual-Path Latent Diffusion with Multi-Task Interaction for CT-to-PET Translation and Tumor Segmentation Highlight: In this paper, we propose DuLDiff, a novel Dual-path Latent Diffusion framework for both CT-to-PET translation and CT-based tumor segmentation.	P. Zeng; X. Zeng; B. Zhang;
276	Ro-Bench: Large-Scale Robustness Evaluation of MLLMs with Text-Driven Counterfactual Videos Highlight: In this paper, we introduce Ro-Bench, the first benchmark for evaluating MLLMs on dynamic out-of-distribution (OOD) counterfactual video test sets.	Z. Yang; J. Li; M. Diao; Y. Jing; K. Liang;
277	AVO-65: A Large-Scale Hierarchical Audio-Visual Object Dataset Highlight: We present a large-scale, high-quality audiovisual dataset centered on audio-visual objects.	Z. Yao; G. Zhang; L. Wang; D. Zhu;
278	GCE-UQ: Quantifying and Decomposing Uncertainty in Graph Counterfactual Explanations Highlight: We introduce GCE-UQ, a framework that quantifies, decomposes, and mitigates uncertainty for GCEs.	C. Guo; S. Xie; X. Zhang;
279	CLG-MSTS: Contrastive Learning-Guided Multi-Scale Temporal-Spatial Network for Cross-Subject Emotion Recognition Highlight: To this end, we propose a novel method called Contrastive Learning-Guided Multi-Scale Temporal-Spatial Network (CLG-MSTS) to learn subject-invariant representations.	C. Li; J. Xin; Q. Shen; B. T. Dai; X. Liu; Z. Wang;
280	Efficient Exposure Fusion Via Fine-Tuning A Low-Light Enhancement Model Highlight: To address this challenge, we introduce transferable learning priors from ill-posed reconstruction tasks such as Low-light Image Enhancement (LLIE) and reformulate MEF as a high-exposure-prior–guided low-light enhancement paradigm, thereby achieving effective cross-exposure information fusion while preserving detail-recovery capability. Based on this paradigm, we propose a simple yet effective LoRA-MEF network that leverages a Low-Rank Adaptation (LoRA) strategy to fine-tune a pre-trained LLIE model.	H. Wang; T. Hu; Y. Zhang; Q. Yan;
281	Averaging Is Not Enough: Preserving Client-Specific Knowledge in Federated PEFT with One-Round Aggregation Highlight: In this work, we identify the root cause as the incompatibility between PEFT methods and FL’s aggregation mechanism, where conventional averaging fails to preserve personalized client knowledge, leading to suboptimal performance and slower convergence.	H. Cheng; J. Huang; Q. Liu; L. Zhang;
282	Progressively Injecting Structural Semantics from The Frequency Domain Into Mamba for Accurate Curvilinear Structure Segmentation Highlight: While Mamba effectively models global dependencies in curvilinear structures, its sequence-based state-space modeling introduces the issue of structural fragmentation. To address this, we propose High-Frequency Refinement VMamba (HR-VMamba), a method that progressively injects structural semantics into Mamba to refine its representation.	W. Cai;
283	Detecting Oscillating Singularities with The Weak Scaling Exponent Highlight: In this work, we propose and study a criterion for detecting oscillating singularities.	H. Wendt; S. Jaffard; P. Abry;
284	SKIN-PAS: Skin Lesion Segmentation Through Parameter-Efficient Adaptation of SAMv2 Highlight: We propose a parameter-efficient framework that adapts the frozen SAMv2 visual encoder using lightweight adapters and a multi-scale decoder, reducing encoder learnable parameters by 98.3%.	M. Moradi; M. Moradi; S. Palazzo; A. Borji; C. Spampinato;
285	S2Voice: Style-Aware Autoregressive Modeling with Enhanced Conditioning for Singing Style Conversion Highlight: We present S2Voice, the winning system of the Singing Voice Conversion Challenge (SVCC) 2025 for both the in-domain and zero-shot singing style conversion tracks.	Z. Wang; X. Xia; C. Huang; L. Xie;
286	WTRSS: Unleashing The Power of Wavelet Transform in Radar Semantic Segmentation Highlight: However, existing deep learning models struggle to handle the specific characteristics of radar frequency maps: the data is anisotropic with a low signal-to-noise ratio (SNR). Therefore, we propose WTRSS, an RSS method inspired by the wavelet transform.	F. Chen; T. Tan; T. Li; Z. Lu; Q. Liao;
287	STDiffusion: A Spatiotemporal Interpolation-Oriented Diffusion Model for Signal Series Latent Representation Generation Highlight: We propose STDiffusion, an interpolation-guided spatiotemporal diffusion framework that replaces forward noise addition with spatiotemporal attention interpolation and UNet denoising with a deterministic reverse predictor in latent space, explicitly coupling diffusion steps with physical time.	H. Xiong;
288	Semantic-Aware Discrete Online Cross-Modal Hashing Highlight: However, most existing methods rely only on discrete labels, overlooking rich semantic information, and face challenges in discrete optimization and efficient updates. To address these issues, we propose a novel supervised OCMH method, Semantic-Aware Discrete Online Cross-Modal Hashing (SADOCH).	Z. Yao; R. Zhai; L. Wang; G. Gu;
289	Enhancing Debate Dialogue Generation Via Dual-Dimensional Reflection and Refinement Highlight: We propose a Dual-dIMensional rEflection and Refinement (DIMER) framework that explicitly models argument engagements and iteratively improves responses.	Y. Sun; Y. Huang; B. Liang; M. Yang; R. Xu;
290	Physically Deployable 3D Omnidirectional Infrared Adversarial Patches Highlight: We present a physically deployable 3D omnidirectional adversarial patch for vehicles to address the robustness limits of 2D infrared attacks under viewpoint changes.	W. Dong; B. Li; H. Wang; H. Chen; A. Peng;
291	Superpixel-Informed Continuous Low-Rank Tensor Representation for Multi-Dimensional Data Recovery Highlight: However, classical LRTR methods face two critical limitations: (1) they assume holistic data is low-rank, which is often violated in real-world scenarios with significant spatial variations; and (2) they are constrained to discrete meshgrid data, limiting flexibility. To overcome these limitations, we propose a Superpixel-informed Continuous Low-Rank Tensor Representation (SCTR) framework.	Z. Wang; J. Wang; R. Zheng; Z. Wu;
292	HREI: Hybrid Long-Short Retrieval and Efficient Inference for Knowledge Base Question Answering Highlight: We propose HREI, a novel low-resource KBQA framework that enhances retrieval accuracy and reasoning efficiency.	S. Liu; X. Su; J. Li; Z. Duo; G. Gao;
293	Space-Time ARC Abstraction for UAV Network Reconfiguration Under Adversarial Electro-Optical Disruption Highlight: We propose a reconnection framework featuring verifiable space-time arc abstraction and distributed convergence.	X. Wang; C. Zhang; Y. Zhang; X. Hou; S. Wang;
294	From Silent Flows to Speaking Guardians: LLM-Enhanced Framework for IoT Anomaly Detection Highlight: This paper presents FlowGuard, an LLM-enhanced framework for graph-based IoT network anomaly detection.	H. Ding;
295	BeepBeep: Leveraging Structural Attenuation for Robust Device-to-Device Authentication Highlight: We propose BeepBeep, a novel authentication method leveraging structural attenuation coefficients as physical fingerprints.	Z. Yan; Y. Zhou; Y. Li; W. Jin;
296	Diffusion-Aided Extreme Video Compression with Lightweight Semantics Guidance Highlight: This paper proposes a video compression framework that integrates generative priors to drastically reduce bit-rate while maintaining reconstruction fidelity.	M. Zhang; H. Wu; R. Jin; D. Gündüz; K. Mikolajczyk;
297	Adapting Diarization-Conditioned Whisper for End-to-End Multi-Talker Speech Recognition Highlight: We propose a speaker-attributed (SA) Whisper-based model for multi-talker speech recognition that combines target-speaker modeling with serialized output training (SOT).	M. Kocour; M. Karafiat; A. Polok; D. Klement; L. Burget; J. Černocký;
298	The Speech Analysis for Neurodegenerative Diseases Challenge Highlight: Participants were required to design models capable of detecting and classifying the severity of voice disorders, i.e., dysarthria (Task 1), and predicting the progression of the neurodegenerative disease by forecasting the worsening of dysarthria (Task 2) based on vocal signal analysis.	G. Sannino;
299	Assessing Speech Quality Metrics for Evaluation of Neural Audio Codecs Under Clean Speech Conditions Highlight: However, for neural codecs, it is often unclear which metrics provide reliable quality estimates. To address this, we evaluated 47 objective metrics by correlating their scores with subjective listening scores for clean speech across 17 codec conditions.	W. Mack;
300	Bringing Multimodal Foundation Models to Hearing Aids Highlight: This would allow us to unleash the promise of FMs in the domain of hearing aids by offloading their execution to external devices (e.g., smartphones), which are queried periodically to analyse the background acoustic scene. We present a series of experiments that aim to put this hypothesis to the test.	A. Triantafyllopoulos; I. Tsangko; B. Schuller;
301	Bayesian Uncertainty-Aware MRI Reconstruction Highlight: We propose a novel framework for joint magnetic resonance image reconstruction and uncertainty quantification using under-sampled k-space measurements.	A. K. Eldaly; M. Figini; D. C. Alexander;
302	Moments Matter: Posterior Recovery in Poisson Denoising Via Log-Networks Highlight: We propose a new strategy for Poisson denoising based on training a log-network.	S. Shoushtari; E. P. Chandler; U. S. Kamilov;
303	Near-Optimal Online Gain Control for Modulo Analog-to-Digital Converters Highlight: However, practical operation may be hindered by imperfect analog gain control, an intrinsic aspect of the architecture. To address this limitation, we propose an online gain control algorithm that adaptively adjusts the scaling gain of the converter, achieving robust, near-optimal performance under gain uncertainty.	O. Lev; A. Weiss;
304	On Optimization of Poles for Adaptive Fourier Decomposition-Inspired Neural Layers Highlight: In this work, we integrate Adaptive Fourier Decomposition (AFD) with Blaschke-type bases into a neural operator architecture for solving forward and inverse PDEs.	Z. Song; Z. Jiang;
305	EHDN: An Enhanced Homography Decomposition Network for Robust Planar Object Tracking Highlight: However, existing tracking models often suffer from limited robustness when dealing with complex deformations or long-term tracking scenarios. To address this, we propose an Enhanced Homography Decomposition Network (EHDN).	X. Liu; L. Zhang; Y. Wu; C. Zhao;
306	Sequential and Simultaneous Optimization of Microphone Array Geometry and Region-of-Interest Beamforming Highlight: This paper presents a sequential approach to simultaneously optimize a microphone array geometry and its corresponding beamformer weights.	G. Itzhak; S. Doclo; I. Cohen;
307	Plug-in-and-Play Dual-Domain Mixup Augmentation Network for PET Denoising at Multiple Dose Levels Highlight: Some multi-dose methods can address this issue but require scans from multiple doses as training samples, which is impractical and resource-intensive. To address these limitations, we propose a Dual-Domain Mixup Augmentation Network (D2MA-Net) for robust multi-dose PET denoising using only training samples of a single dose level.	P. Zeng; B. Zhang; H. Zhang;
308	ProDistill: A Progressive Prompting Framework for Fine-Grained VLM Distillation Highlight: This is suboptimal as it overlooks the prerequisite of adapting to the new domain’s general distribution before tackling the fine-grained task. To address this, we propose ProDistill, a progressive two-stage framework that explicitly models this dependency.	S. Luo; C. Meng; H. Zhang; Z. Gan; C. Ouyang;
309	T-Mimi: A Transformer-Based Mimi Decoder for Real-Time On-Phone TTS Highlight: However, Mimi’s decoder, which employs a hybrid transformer and convolution architecture, introduces significant latency bottlenecks on edge devices due to the compute-intensive nature of deconvolution layers, which are not friendly to mobile-CPU frameworks such as the most representative one, XNNPACK [1]. This paper introduces T-Mimi, a novel modification of the Mimi codec decoder that replaces its convolutional components with a purely transformer-based decoder, inspired by the TS3-Codec architecture.	H. Wu;
310	SP-UNet: Robust Single-Snapshot DOA Estimation Via Signal Manifold Recovery Abstract: Single-snapshot Direction of Arrival (DOA) estimation for low-cost, imperfect arrays in low SNR environments remains a critical challenge. Model-based methods, such as …	Z. Cao; C. Lin; F. Bu;
311	Joint Calibration and Direction-of-Arrival Estimation for Sparse Linear Arrays: Identifiability and Array Design Highlight: In this paper, we investigate the parameter identifiability problem for sparse linear arrays (SLAs) under certain stochastic assumptions.	W. Zheng; Z. Yang;
312	Heterogeneous Self-Supervised Acoustic Pre-Training With Local Constraints Highlight: In this paper, we propose a new self-supervised pre-training approach to dealing with heterogeneous data.	X. Cui; A. F. M. Saif; B. Kingsbury; T. Chen;
313	Demystifying The Roles of LLM Layers in Retrieval, Knowledge, and Reasoning Highlight: In this work, we present a systematic study of depth utilization across diverse dimensions, including evaluation protocols, task categories, and model architectures.	X. Song; K. Wang; P. Li; L. Yin; S. Liu;
314	Proficiency-Aware Adaptation and Data Augmentation for Robust L2 ASR Highlight: Using the Common European Framework of Reference for Languages (CEFR)-graded Speak & Improve corpus, we show that naïve fine-tuning of Whisper reduces the average word error rate (WER) but simultaneously widens performance disparities and disproportionately harms lower-proficiency learners. To address this, we propose two strategies: (i) proficiency-aware multitask learning, jointly optimizing ASR with proficiency classification, and (ii) targeted augmentation, applying spectrogram masking to low-proficiency speech to counter imbalance.	L. Sun; C. Zhu; S. Shi;
315	Unsupervised UAV Detection from Sparse Lidar Via Temporal Dispersion Signatures Highlight: We propose SLTDS, a training-free framework that uses a Temporal Dispersion Signature (TDS) to separate moving targets from static and noisy background: static structures exhibit a wide temporal spread across scans, whereas UAVs form compact, transient spatiotemporal clusters.	S. Yuan; Z. Qi; Z. Duan; Y. Li; B. Lou;
316	DFFNet: Combining Similar and Different Dual Feature Flows to Achieve Multiple Weather Removal Highlight: We propose DFFNet, a dual-stream network for all-in-one weather image restoration.	S. Liu; K. Zuo; W. Xu; H. Xiao;
317	DELNet: Continuous All-in-one Weather Removal Via Dynamic Expert Library Highlight: We propose DELNet, a continual learning framework for weather image restoration.	S. Liu; K. Zuo; H. Xiao;
318	Multi-Course Integration Framework Based on Subject Knowledge Graphs Highlight: Existing studies have optimized individual course structures but remain limited in modeling cross-course semantic associations, detecting content overlaps, and achieving integration, thereby hindering large-scale curriculum reconstruction. To address this, this paper proposes a multi-course integration framework based on disciplinary knowledge graphs.	D. Yu; Y. Zhuang;
319	Residual Diffusion with Fused Accelerated Shared Distribution and Frequency-Adaptive Selection for Unified Image Restoration Highlight: However, their iterative sampling process still leaves considerable room for acceleration and efficiency improvements. To overcome these limitations, we propose RDiFAS-FA, which is a novel diffusion-based unified restoration framework.	C. Li;
320	AISHELL6-Whisper: A Chinese Mandarin Audio-Visual Whisper Speech Dataset with Speech Recognition Baselines Highlight: We present AISHELL6-Whisper, a large-scale open-source audio-visual whisper speech dataset, featuring 30 hours each of whisper speech and parallel normal speech, with synchronized frontal facial videos.	C. Li;
321	ICASSP 2026 URGENT Speech Enhancement Challenge Abstract: The ICASSP 2026 URGENT Challenge advances the series by focusing on universal speech enhancement (SE) systems that handle diverse distortions, domains, and input conditions. This …	C. Li;
322	Homomorphic Convolution Reimagined: Eliminating Rotation Bottlenecks for Practical Privacy-Preserving CNN Inference Highlight: We present a homomorphic convolution framework that substantially reduces rotation cost.	C. Li;
323	CP Loss: Channel-Wise Perceptual Loss for Time Series Forecasting Highlight: This often leads to a failure to capture channel-specific dynamics such as sharp fluctuations or trend shifts. To address this, we propose a Channel-wise Perceptual Loss (CP Loss).	Y. Zha;
324	The Achilles’ Heel of Angular Margins: A Chebyshev Polynomial Fix for Speaker Verification Highlight: Furthermore, the formulation fails to generate a sufficiently sharp gradient for hard-to-classify examples. We address these issues by proposing ChebyAAM, a loss that replaces the arccos operation with its Chebyshev polynomial approximation.	Y. Wang; Y. Liu; C. Xiao; C. Lin;
325	Enhancing Domain Generation Through Pluggable Style Randomization Highlight: However, the limited diversity of styles in the source domain often fails to capture real-world distributions. To address this, we propose Feature Style Randomization (FSR) to reduce sensitivity to style-domain information.	B. Xiao; J. Xu; M. Yang; M. Wang; X. Zhang;
326	Temporal-Aware Heterogeneous Graph Reasoning with Multi-view Fusion for Temporal Question Answering Highlight: We propose a novel framework with temporal-aware question encoding, multi-hop graph reasoning, and multi-view heterogeneous information fusion.	W. Wen;
327	A Data-Driven Framework for Personal Sound Zone Control Addressing Loudspeaker Nonlinearities Highlight: This nonlinearity corrupts the conventionally measured acoustic transfer functions (ATFs) and invalidates the linear control assumptions upon which these systems are built. To address these dual failure points, we propose a complete, two-stage, data-driven framework.	L. Zhou; C. Gong; C. Huang; H. Liu; L. Gan; L. Shi;
328	MBE4D: Multi-View Background-Editable 4D Generation from A Single Image Highlight: We present MBE4D, a novel framework for producing dynamic 4D content from a single input image.	X. Chen; D. Yin; F. Yu; X. Li; Z. Tian;
329	DMP-TTS: Disentangled Multi-Modal Prompting for Controllable Text-to-Speech with Chained Guidance Highlight: We present DMP-TTS, a latent Diffusion Transformer (DiT) framework with explicit disentanglement and multi-modal prompting.	K. Yin;
330	TEXTS-Diff: Texts-Aware Diffusion Model for Real-World Text Image Super-Resolution Highlight: In addition, datasets consisting of isolated text samples limit the quality of background reconstruction. To address these limitations, we construct Real-Texts, a large-scale, high-quality dataset collected from real-world images, which covers diverse scenarios and contains natural text instances in both Chinese and English.	H. He; X. Zhan; Y. Bai; R. Lan; L. Sun; X. Chu;
331	Listening To UAV: 3D Trajectory Estimation Via Acoustic Transformer Highlight: Traditional multi-stage methods suffer from error accumulation and require extensive parameter tuning, while existing deep learning approaches often use suboptimal acoustic representations that discard essential physical information. To address these limitations, we propose an end-to-end framework that integrates physical principles into the learning process.	L. Yin;
332	MSCT: Differential Cross-Modal Attention for Deepfake Detection Highlight: However, traditional multi-modal forgery detection methods suffer from insufficient feature extraction and modal alignment deviation. To address this, we propose a multi-scale cross-modal transformer encoder (MSCT) for deepfake detection.	F. Wei; M. Liu; Y. Wang; J. Wang; S. Zhao; N. Li;
333	Training-Free Multimodal Guidance for Video to Audio Generation Highlight: In this work, we propose a novel training-free multimodal guidance mechanism for V2A diffusion that leverages the volume spanned by the modality embeddings to enforce unified alignment across video, audio, and text.	E. Grassucci; G. Galadini; G. Cicchetti; A. Uncini; F. Antonacci; D. Comminiello;
334	Balancing Efficiency and Fidelity In Image Super-Resolution Via Attention-Enhanced Distillation Highlight: Prevailing Transformer-based models provide strong global modeling capabilities but still struggle to capture spatial and channel dependencies as well as multi-scale textures, whereas lightweight CNNs and distillation methods reduce complexity at the cost of degraded reconstruction quality. To overcome these challenges, we propose the Efficient Separable Distillation Attention Network (ESDANet) that unifies blueprint separable convolutions with dual residual distillation for efficient feature compression.	Y. Niu; X. Chen; J. Hua; H. Li; Z. Wang;
335	Acoustic Non-Stationarity Objective Assessment with Hard Label Criteria for Supervised Learning Models Highlight: In this paper, a novel Hard Label Criteria (HLC) algorithm is proposed to generate global non-stationarity labels for acoustic signals, enabling supervised learning strategies to be trained as stationarity estimators.	G. Zucatelli; R. Barioni; G. Dantas;
336	GRNet: Graph Reconstruction Network for Robust Multimodal Sentiment Analysis Highlight: However, in real-world scenarios, missing modalities pose significant challenges. To address this, we propose a novel framework, Graph Reconstruction Network (GR-Net), which leverages temporal and neighbor alignment relationships in multimodal data to reconstruct missing information.	Z. Xu; L. Tian; P. Zhang; X. Peng; H. Yao;
337	RGSC: Retrieve and Then Generate Image-Text Pairs from Semantic Concepts for Unsupervised Vision-Language Pre-Training Highlight: Existing UVLP approaches are mainly generation-based or retrieval-based: the former produces well-aligned but overly simplistic pairs, while the latter provides richer samples but suffers from weak alignment. To tackle these problems, we propose a method to Retrieve and then Generate image-text pairs from Semantic Concepts (RGSC).	Z. Xu; W. Zhao; S. Ji; P. Zhang; K. Zhang; H. Yao;
338	Pianoroll-Event: A Novel Score Representation for Symbolic Music Highlight: Discrete-event representations achieve compact encoding but fail to adequately capture structural invariance and spatial locality. To address these complementary limitations, we propose Pianoroll-Event, a novel encoding scheme that describes pianoroll representations through events, combining structural properties with encoding efficiency while maintaining temporal dependencies and local spatial patterns.	L. Qian; H. Gu; D. Li; B. Cao; Q. Liu;
339	A Novel Multiscale Order-Frequency Spectral Correlation Estimator for Angle-Time Cyclostationary Signals Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, a fast and reliable multiscale OF-SC estimator, termed the order-frequency wavelet cyclic modulation spectrum (OF-WCMS), is proposed by incorporating the continuous wavelet transform.	H. Ren; Z. Zhong; R. -B. Sun; X. Chen;
340	See What You Need: Query-Aware Visual Intelligence Through Reasoning-Perception Loops Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Mimicking human cognition, we propose CAVIA, a training-free framework that closes the loop between reasoning and perception through adaptive bidirectional dialogue.	Z. Dong;
341	EndoCaver: Handling Fog, Blur and Glare in Endoscopic Images Via Joint Deblurring-Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose EndoCaver, a lightweight transformer with a unidirectional-guided dual-decoder architecture, enabling joint multi-task capability for image deblurring and segmentation while significantly reducing computational complexity and model parameters.	Z. Wu;
342	IRIS: Low-Complexity High-Efficiency Neural Network Codec for Real-Time Audio Transmission Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper introduces IRIS (Internet Real-time Intelligent Streaming Codec): an end-to-end, low-complexity, low-latency neural audio codec.	Z. Wu;
343	Intrinsic Semantic Consistency Enhancement for Robust Hierarchical Understanding in VLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: While effective, our work reveals that this approach often disrupts generalization on coarse-grained concepts and fails to correct the models’ inherent pre-training biases. To address this, we introduce the Intrinsic Semantic Consistency Enhancement (InCoe) framework.	Z. Wu;
344	MAGF-UIENet: A Multiscale Attention Guided Fusion Network for Underwater Image Enhancement Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose MAGF-UIENet, a hierarchical enhancement network that explicitly aligns with underwater degradation characteristics.	C. Zhao; X. Chen;
345	Physics-Driven 3D Gaussian Rendering for Zero-Shot MRI Super-Resolution Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose a zero-shot MRI SR framework using explicit Gaussian representation to balance data requirements and efficiency.	S. Liu; L. Zhang; W. Huang; Z. Zhang; Z. Wang;
346	Learning Light Field Implicit Neural Representations for Arbitrary-Scale Spatial-Angular Super-Resolution Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: The practical utility of Light Field Super-Resolution (LFSR) is fundamentally limited by the rigid, fixed-scale design of existing methods. To break this limitation, we propose Light Field Arbitrary-Scale Spatial-Angular Super-Resolution (LF-ASSR), a novel framework that learns a continuous representation for arbitrary-scale decoding.	W. Xia;
347	High-Resolution Contrastive Framework for Generalizable AI-Generated Image Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: A critical challenge for current detectors is their limited generalization ability, which is primarily caused by the loss of high-frequency information during down-sampling and the entanglement of features in the latent space. To address these issues, we propose the High-resolution Contrastive Framework (HCF).	W. Wei;
348	Robust Open-World Object Detection Through Evidential Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: A fundamental challenge is the semantic ambiguity where unknown objects are absorbed into the background class during training, impairing their subsequent identification. To address this, we propose the Decoupled Evidential Detector (DEED), which is based on a novel decouple-and-unify strategy.	G. Zhao; H. Xia; X. Li; S. Xia;
349	Synergistic Alignment Network for Robust Cross-Subject RSVP-EEG Decoding Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose the Synergistic Alignment Network (SynAlign-Net), a novel framework for robust cross-subject decoding.	B. Fu; W. Gu; F. Li; X. Cai; Y. Niu;
350	EHLM-Gen: Explicit Hierarchical And Localized Visual Context Modeling For Histopathology Report Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Moreover, integrating multi-scale information and bridging the semantic gap between pathological patterns and diagnostic texts remain challenging. To address these issues, we propose EHLM-Gen, a novel HRG framework featuring an Explicit Hierarchical and Localized Modeling (EHLM) module.	C. Ye;
351	Accelerating Vehicular Federated Learning Via Convergence-Aware Hierarchical Scheduling Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose a Hierarchical Decoupling Scheduling (HDS) framework that aligns network decisions with long-term learning objectives.	J. Zhu; X. Gao; Y. Wang; H. Yang; S. Wang;
352	Privacy-Concealing Cooperative Perception for BEV Scene Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we propose a novel Privacy-Concealing Cooperation (PCC) framework for Bird’s Eye View (BEV) semantic segmentation.	S. Wang; L. Li; M. Santos; G. Wang;
353	Continual Neural Network Retrieval for Ever-Expanding Model Zoo Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Given a library of pre-trained deep learning models, it is hard to find models appropriate to a task with a specific query dataset.	Z. Shang; Y. Liu; E. Liu; A. Argyriou; H. Li; X. Gu;
354	SHAPE: A Submodular-Homotopic Atlas Parcellation Encoder with Contrastive Learning for Individualized Brain Mapping Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, existing deep learning parcellation methods often lack principled seed selection and overlook neuroscientific priors such as inter-hemispheric homotopy. To address these limitations, we propose a submodular-homotopic atlas parcellation encoder (SHAPE) with contrastive learning.	D. Wen; T. Adali; D. Zhang; V. D. Calhoun; S. Qi;
355	TRM-UNet: An Efficient Event-Guided Motion Deblurring Network Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present TRM-UNet, a compact yet powerful U-shaped architecture for event-based deblurring.	D. Fan; X. Tang; Q. Chen; F. Xu;
356	MotionBeat: Motion-Aligned Music Representation Via Embodied Contrastive Learning and Bar-Equivariant Contact-Aware Encoding Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose MotionBeat, a framework for motion-aligned music representation learning.	X. Wang; H. Wang; W. Cai;
357	Robust In-Bed Human Pose and Shape Estimation from Pressure Images with Clinical Awareness Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce a clinically grounded dataset collected in simulated intensive care settings, featuring diverse bed configurations and precise 3D annotations.	C. Fang;
358	TTCE: Tracing Time Cycles for Temporal Knowledge Graph Embeddings Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Existing TKGE models face two key limitations: they often fail to adequately capture periodic temporal patterns, and they tend to overlook the independence between semantic and temporal information in facts, which restricts their modeling accuracy. To address these limitations, we propose TTCE, a novel TKGE method that maps time to angles to better capture periodic patterns.	Q. Jiang; X. Su; G. Gao;
359	SSG-DiT: A Spatial Signal Guided Framework for Controllable Video Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose SSG-DiT (Spatial Signal Guided Diffusion Transformer), an efficient framework for high-fidelity controllable video generation.	P. Hu; Y. Gu; L. Luo; F. Ren;
360	AURA: A Stegaformer-Based Scalable Deep Audio Watermark with Extreme Robustness Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Current deep learning-based audio watermarking methods suffer from multiple limitations: insufficient robustness, audible noise, and limited capacity. To address these limitations, we propose Adaptive Universal Robust Audiomark (AURA), an audio watermark framework utilizing our novel Stegaformer module.	L. Li; L. Jin; Y. Wang; H. Sun; Z. Hu; C. Maple;
361	2I-Instruct: Generative Joint Empathy Detection and Empathy Intent Classification Via Inter-Task and Inter-Instance Interactions Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, the differing label sets of the two tasks prevent them from sharing the same decoder, limiting knowledge sharing during decoding. A generative method can fundamentally address this issue.	L. Jiang; D. Wu; Z. Li; Y. Li; H. Huang;
362	TAML: Task-Aware Metric-Driven Meta Learning for Few-Shot Action Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: With limited samples and high-capacity backbones, adapting these models to learn transferable metric features remains challenging. To address this, we propose TAML, a task-level meta-learning paradigm that constructs meta-metric tasks to explicitly learn metric-relevant representations and enhance generalization.	W. Wu; Y. Zhang; S. Zheng; X. Zhu; Y. Chen; Y. Dang;
363	Compressing KV Cache for Long-Context LLM Inference with Inter-Layer Attention Similarity Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose POD (Proximal tokens over Distant tokens), a novel framework that allocates memory based on token importance: proximal tokens are fully preserved, while distant tokens are retained in a compact, shared form rather than discarded.	D. Ma;
364	FoodClip: Advancing Food Analysis Via Large-Scale Pre-Training Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Intelligent food analysis is a key technology for health management, yet the application of large-scale pre-training is hindered by critical gaps: the scarcity of quality food data, the inability of standard architectures to capture fine-grained details, and noise from generic vocabularies. To overcome these barriers, we introduce a comprehensive solution.	D. Ma;
365	Print2Volume: Synthetic OCT-based 3D Fingerprint Volume Generator Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper introduces Print2Volume, a novel framework for generating synthetic OCT-based 3D fingerprints.	Q. Miao; H. Wang; K. Qian; H. Sun; Y. Zhang; Y. Dang;
366	Entropy-Guided Data-Efficient Training for Multimodal Reasoning Reward Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we identify a strong correlation between response entropy and accuracy, indicating that entropy can serve as a reliable and unsupervised proxy for annotation noise and sample difficulty. Based on this insight, we propose a novel Entropy-Guided Training (EGT) approach for multimodal reasoning reward models, which combines two strategies: (1) entropy-guided data curation to mitigate the impact of unreliable samples, and (2) an entropy-guided training strategy that progressively introduces more complex examples.	S. Yang; T. Huang; H. Wen; Y. Wang; L. Chen; X. Chu;
367	Bayesian Channel Estimation with Diffusion Probabilistic Priors Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose a generalized maximum a posteriori (MAP) channel estimation framework that exploits diffusion probabilistic priors for downlink massive multiple-input multiple-output (MIMO) systems.	C. Jin; Q. Shi; Y. Gu;
368	MeshRF: Residual Fusion of Vertices, Edges, and Faces for Mesh Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, the inherent complexity and irregularity of the mesh data present significant challenges for neural networks. To address these challenges, we propose MeshRF, a novel lightweight network that extracts local topological features from multiple dimensions, including vertices, edges, and faces of a mesh, and then performs cross-dimensional fusion by specialized residual convolution and fusion modules.	G. Zheng; L. Yuan; Y. Han; H. Duan; J. Zhang; G. Zhai;
369	Learnable Instance Attention Filtering for Adaptive Detector Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Moreover, existing attention filtering mechanisms are typically heuristic or teacher-driven, rather than learned with the student. To address these limitations, we propose Learnable Instance Attention Filtering for Adaptive Detector Distillation (LIAF-KD), a novel framework that introduces learnable instance selectors to dynamically evaluate and reweight instance importance during distillation.	C. Liu; Q. Lan; Z. Ding; X. Chu; Q. Tian;
370	Dual-Graph: Protocol Interaction-Aware Flow Representation for Accurate Unidirectional Encrypted Traffic Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we propose Dual-Graph, a protocol interaction-aware representation framework for accurate unidirectional ETC.	Z. Gu;
371	A Data-Centric Framework for Scientific Natural Language Inference Via LLM-Driven Information-Theoretic Augmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose DCA, a data-centric framework that prioritizes better data over more data.	Z. Cheng; H. Cai; H. Sun; Y. Li; Y. Zhang;
372	Parametric Modeling and Localization of Spatially Distributed Targets in OFDM-MIMO Radar Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper proposes a parametric modeling and localization framework for spatially distributed targets in orthogonal frequency-division multiplexing multiple-input multiple-output (OFDM-MIMO) radar systems.	Y. Liu; H. Gao; M. S. Greco; F. Gini;
373	Non-Asymptotic Performance Analysis of DOA Estimation Based on Real-Valued Root-MUSIC Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper presents a systematic theoretical performance analysis of the Real-Valued root-MUSIC (RV-root-MUSIC) algorithm under non-asymptotic conditions.	J. Liu; W. Zhao; Q. Wang; X. Meng; M. Greco; F. Gini;
374	Ziv-Zakai Bound For Distributed-Array-Based DOA Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Moreover, the high sidelobes in the beam-pattern make the approximations used in conventional ZZB derivations overly relaxed. In this work, we address these challenges and derive the ZZB for multi-target distributed DOA estimation.	Z. Zhang; Z. Shi; M. S. Greco; F. Gini;
375	Align to The Pivot: Dual Alignment with Self-Feedback for Multilingual Math Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We attribute the decline to the model’s inconsistent multilingual understanding and reasoning alignment. To address this, we present Pivot-Aligned Self-Feedback Multilingual Reasoning (PASMR), aiming to improve the alignment of multilingual math reasoning abilities in LLMs.	C. Zhao; X. Huang; X. Han; S. Huang; C. Deng; J. Feng;
376	Semantic Token-Guided Generative Latent Coding for Ultra-Low Bitrate Image Compression Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Existing image compression methods tend to suffer from texture distortion, structural blurring, and loss of semantic details at ultra-low bitrates. To address these issues, this paper proposes a Semantic Token-guided Generative Latent Coding (STGLC) framework for ultra-low bitrate image compression, which leverages high-level semantic information to guide latent representations toward preserving semantic structures.	P. He; D. Gao; Y. Wang; J. Li; M. Yang; G. Shi;
377	Cross-Representation Benchmarking in Time-Series Electronic Health Records for Clinical Outcome Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present the first systematic benchmark to compare EHR representation methods, including multivariate time-series, event streams, and textual event streams for LLMs.	T. Chen; M. Zhu; Z. Luo; T. Zhu;
378	Beyond Lips: Integrating Gesture and Lip Cues for Robust Audio-Visual Speaker Extraction Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we move beyond lip-centric approaches and propose SeLG, a model that integrates both lip and upper-body gesture information for robust speaker extraction.	Z. Pan; X. Qian; S. Zhao; K. Zhou; B. Ma;
379	Low-Resource Speech-Based Early Alzheimer's Detection Via Cross-Lingual and Few-Shot Transfer Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To address the urgent need for early Alzheimer’s disease detection in low-resource speech settings, we present the first cross-lingual screening study covering English, Mandarin Chinese, Spanish, and Greek. By integrating these datasets, we establish a multilingual benchmark and propose a parameter-efficient fine-tuning framework that combines layer-wise analysis of Wav2Vec2.0 with Low-Rank Adaptation (LoRA).	Y. Shao; B. Mei; H. Huo; T. Fang;
380	Structured Pruning Via Multi-Observation Iterative Hard Thresholding Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Furthermore, these methods typically exhibit limited architectural flexibility, restricting their applicability beyond convolutional neural networks (CNNs). To address these limitations, we propose a unified structured pruning framework based on Multi-Observation Iterative Hard Thresholding (MO-IHT), which enables consistent and architecture-agnostic pruning across both CNNs and Vision Transformers (ViTs).	H. Yang;
381	Toward Robust Node-Level Graph OOD Generalization with Semantic Awareness Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Out-of-distribution (OOD) generalization has advanced in handling unknown distribution shifts, but it remains relatively unexplored due to 1) the distribution shifts on graphs often occurring simultaneously on node attributes and graph topology and 2) capturing invariant information amid diverse distribution shifts proving a formidable challenge. To overcome these obstacles, in this paper, we introduce a novel framework named GLIDER, comprising two key components.	Q. Tian; C. Zhao; M. Shao; W. Wang; D. Li;
382	Dual-Domain 3D Mesh Watermarking with Adaptive Vertex Grouping Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper proposes a dual-domain 3D mesh watermarking algorithm using adaptive vertex grouping to tackle challenges in topological sensitivity, geometric invariance, and perceptual balance.	S. Ren; W. Deng; M. Liu; T. Cong; S. Wu;
383	Unsupervised Lexicon Learning from Speech Is Limited By Representations Rather Than Clustering Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We combine a range of self-supervised speech features (continuous/discrete, frame/word-level) with different clustering methods (k-means, hierarchical, graph-based) on English and Mandarin data.	D. Slabbert; S. Malan; H. Kamper;
384	Towards Noise-Robust Speech Inversion Through Multi-Task Learning with Speech Enhancement Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose a unified framework that integrates Speech Enhancement (SE) and SI models through shared SSL-based speech representations.	S. Tabatabaee; C. Espy-Wilson;
385	Class-Imbalanced Multi-view Clustering Via Synthetic Minority Over-Sampling Technique Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In addition, these methods fail to achieve coordinated alignment when integrating clustering distributions from different views. To address these issues, we propose a framework, called Class-imbalanced Multi-view Clustering via Synthetic Minority OverSampling Technique (CMC-SMOTE).	W. Liu; J. Zhu; J. Tan; Y. Zhang; M. Miao;
386	IADP-SNN: Integer Activation Dropping Spiking Neural Network for Underwater Acoustic Communication Signal Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper proposes an energy-efficient spiking neural network (SNN) for underwater acoustic communication (UAC) signal recognition.	Y. Gong; Y. Fang; M. Huang; L. Ma; Y. Tu; H. Feng;
387	Multi-Agent Deep Reinforcement Learning-Based IoV Secure Data Transmission Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we propose a multi-agent reinforcement learning-based secure transmission framework for IoV systems against a smart attacker that can perform various attack patterns.	X. Lu; Z. Liu; D. Ren; Z. Liu; Y. Bu;
388	Repeater-Assisted Massive MIMO Full-Duplex Communications Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We consider a wireless network comprising multiple single-antenna repeaters that amplify and instantaneously re-transmit received signals in a full-duplex (FD) communication setting.	M. Mohammadi; D. Kudathanthirige; H. A. Suraweera; H. Quoc Ngo; M. Matthaiou;
389	Asynchronous SSVEP-BCI Recognition Via Multi-Start-Point Slice Ensembles and Hard Voting Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, existing asynchronous SSVEP detection approaches face two major limitations: (1) dependence on a single-window decision that fails to utilize temporal information fully; and (2) a single-output mechanism that inadequately distinguishes between intentional control (IC) and non-intentional control (NC) states, resulting in high false positive rates (FPRs). This paper proposes Multi-Start-Point Slice Ensembles with Hard Voting (MSPHV) to alleviate these issues.	H. Wu; Y. Tu; D. Wu;
390	AMGHI-CR: Adaptive Mask-Guided High-Order Interaction Network for Cloud Removal Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, existing methods do not fully utilize the characteristics of the two modalities during feature extraction and fusion, leading to suboptimal performance. To address this issue, we design a cloud removal network that employs a cloud mask to guide feature extraction from both optical and SAR images, while leveraging a fusion module to integrate complementary information.	Y. Zhang; T. Hu; H. Wang; Q. Yan;
391	Memory Footprint Images: A U-Net Approach for Advanced Cache Prefetching Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper reframes prefetching as a fine-grained visual prediction task, representing memory access history as 2D intra-page footprint images to explicitly capture complex spatial locality. We employ a Micro-UNet architecture to learn these visual textures and predict future accesses.	J. Ren; H. Ren; X. Li;
392	M2FNet: Multi-Level Modality-Fused Network for Robust Fingerprint and Finger Vein Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, most existing methods rely on shallow or single-layer fusion, which limits complementarity and is prone to noise. To address this, we propose M2FNet, a hierarchical fusion framework.	W. Miao; X. Zhao; H. Ren; X. Li; J. Ren;
393	Irregular Multivariate Time Series Modeling Via Latent Graph-Guided Gaussian Process Priors Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, these approaches neglect correlations among variables, which are intrinsic to multivariate data. To address this limitation, we propose a probabilistic matrix factorization model that jointly captures temporal dynamics and inter-variable correlations through Gaussian processes and graph-based priors.	S. Li; S. Fang; L. Cheng; A. Thiéry; S. Theodoridis;
394	SDGF: Fusing Static and Multi-Scale Dynamic Correlations for Multivariate Time Series Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Existing methods are limited in modeling these multi-scale dependencies and struggle to capture their intricate and evolving nature. To address this challenge, this paper proposes a novel Static-Dynamic Graph Fusion network (SDGF), whose core lies in capturing multi-scale inter-series correlations through a dual-path graph structure learning approach.	S. Wang; X. Zhangx; Q. Li; J. Cao; Z. Tan;
395	Sustainable Incentive for Model Trading in Decentralized and Personalized Federated Learning Via DAG-Blockchain Consensus Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose a Directed Acyclic Graph (DAG)-blockchain-based trading framework with sustainable incentives.	P. Hao; Z. Liu; J. Liu; G. Sun;
396	Sample Efficient Experience Replay in Non-Stationary Environments Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Traditional experience replay (ER) methods that prioritize solely by temporal difference error (TD-error) conflate policy-induced distribution shift with exogenous environment changes, yielding suboptimal sampling and slow adaptation. To address this limitation, we introduce the Discrepancy of Environment (DoE), a metric that isolates the impact of environment transitions on value estimation.	T. Duan;
397	SURE-Med: Systematic Uncertainty Reduction for Enhanced Reliability in Medical Report Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Third, contextual uncertainty from unverified historical reports causes factual hallucinations, impacting reliability. To address these issues, we propose SURE-Med, a unified framework that reduces uncertainty across three key areas: visual, distributional, and contextual.	Y. Gu; Y. Fan; L. Xu; P. Peng; X. Hu;
398	Towards Practical Differential Privacy for Diffusion-Based Dataset Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Diffusion models excel in dataset distillation but pose significant privacy risks due to their strong memorization of training data.	P. Yao; Y. Qian; Y. Hao;
399	Disentangled Structure Prior Propagation for Guided Depth Super-Resolution Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, the redundant texture details in color images often interfere with the super-resolution process. To address this challenge, we propose the Disentangled Structure Prior Propagation Network (DSPPNet), which includes the Structure-Texture Disentangler (STD) module that isolates and purifies structure-specific features, and the Structure Prior Propagation (SPP) module that propagates purified structural priors across the network to guide depth reconstruction.	X. Sun; H. Li; X. Ye; R. Xu;
400	Covariance Filters and Neural Networks Over Hilbert Spaces Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we take a first step by introducing a novel convolutional learning framework for signals defined over infinite-dimensional Hilbert spaces, centered on the (empirical) covariance operator.	C. Battiloro; A. Cavallo; E. Isufi;
401	Unsupervised TBD-MIG Detectors in Nonhomogeneous Clutter Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper develops an unsupervised manifold projection method for enhancing total Bregman divergence (TBD)-based matrix information geometry (MIG) detectors in nonhomogeneous clutter environments.	X. Hua; J. Zhou; H. Wu; Z. Yang; Y. Cheng; H. Wang;
402	Beyond Action Segmentation: An IMU-based Fitness Tracking System with Cycle-Level Granularity Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: While existing IMU-based Human Action Recognition (HAR) systems achieve reliable inter-action segmentation, their capability to identify intra-action cycle transitions remains limited, despite the critical need for such cycle-level temporal granularity in many applications like fitness tracking. To fill this gap, this paper introduces a novel IMU-based fitness tracking framework integrating: (1) Duration-Adaptive Heatmaps (DAH) for effective motion transition modeling, (2) transition-consistency data augmentation (TCDA), and (3) a multi-task learning architecture that jointly optimizes human action recognition, human action segmentation, and the targeted fine-grained action cycle recognition (ACR).	T. Li;
403	SLM-SS: Speech Language Model for Generative Speech Separation Highlight: In this work, we propose SLM-SS, a novel approach that applies speech language models to SS, aiming to enhance the intelligibility and coherence of the separated signals.	T. Li;
404	GVNP-GS: Geometry-Anchored and View-Aware Neural Proxies for Sparse-View Gaussian Splatting Highlight: This paper presents GVNP-GS, a novel neural proxy framework for sparse-view 3D Gaussian reconstruction.	T. Bai; H. Chen; Z. Qiu; Z. Dai;
405	Activity Recognition Using Inaudible Acoustic FMCW Highlight: In this paper, we leverage Frequency-Modulated Continuous Wave (FMCW) of inaudible acoustic signals to achieve accurate and generalizable activity recognition as well as static target detection.	R. Zhou;
406	DiffDMCA-Net: A Difference-Aware Registration Framework with Depthwise Coordinate Attention Highlight: This paper proposes DiffDMCA-Net, a difference-aware 3D medical image registration framework designed to enhance the alignment of anatomical structures across subjects and modalities.	Z. Xue; K. He; D. Xu; J. Gong;
407	Analytic Incremental Learning for Sound Source Localization with Imbalance Rectification Highlight: These often lead to catastrophic forgetting, significantly degrading the localization accuracy. To mitigate these issues, we propose a unified framework with two key innovations.	Z. Fan; Y. Chen; Q. Zhang; K. Chen; X. Qian;
408	Design of Differential Microphone Arrays Via A 3D Spatial Difference Operator Highlight: This paper presents a more flexible 3D DMA and beamformer design method according to the principle of finite difference approximations, offering new insights into the microphone design and soundfield measurement.	K. Zhao; X. Zhao; G. Huang; J. Chen; J. Benesty; Z. Cvetković;
409	MIRAGE: Noise-Aware Bayesian Calibration with Mutual Information for Reliable RAG Highlight: We propose MIRAGE, a robust uncertainty calibration framework for RAG.	L. Xie; J. Cai; W. Yang; H. -N. Dai; H. Wang;
410	From ECG Signals to Diagnostic Reports: A Unified Framework with Multimodal Encoder and Fine-Tuned LLM for Automated Report Generation Highlight: We thus develop ECG-ReportGen, an ECG-focused MLLM that integrates our dual-channel encoder.	S. Wang; Z. Pang; S. Wang; X. Xu;
411	EDITS: Enhancing Dataset Distillation with Implicit Textual Semantics Highlight: In this paper, we propose EDITS, a novel framework that exploits the implicit textual semantics within the image data to achieve enhanced distillation.	Q. Xia; J. Du; G. Lu; Z. Shu; J. Wang;
412	Melos: Sentence-To-Section Training with Multi-Task Learning for LLM-Driven Song Generation Highlight: To address these, we propose a large language model (LLM)-based framework with a novel two-stage training strategy that progresses from sentence-level to section-level.	D. Wu; J. Lu; B. Su; S. Lei; X. Cai; Z. Wu;
413	An Adaptive Sampling Method Based on Reinforcement Learning for Wind Power Forecasting Under Extreme Weather Highlight: However, existing WPF methods often compromise overall model performance when adapting to extreme events, as they tend to overfit to rare, extreme samples. To address this challenge, we propose an adaptive sampling method (ARS-WPF) based on reinforcement learning (RL), which enhances forecasting for extreme events without sacrificing accuracy under normal weather.	R. Guo;
414	Dual Contrastive Document Clustering with Multi-Representation Highlight: To address those problems, we propose the Dual Contrastive Document Clustering with Multi-Representation (DCCMR) framework, which is designed to learn complementary representations and achieve structure consensus, thereby enhancing clustering quality.	W. Ding; R. Huang; R. Bai; Z. Cheng; H. Ding;
415	Respire-Mamba C-UNet: Consistency-Trained Autoencoder for High-Fidelity Respiratory Sound Compression Highlight: We propose Respire-Mamba C-UNet, a unified framework that integrates a physiology-aware SincConv frontend with power-law scaling, a Pyramid-UNet encoder for multi-scale representation, and a consistency-trained UNet encoder–decoder equipped with a Temporal Mamba bottleneck, further enhanced by variance-preserving rescaling and per-band frequency gating.	Rishabh; Y. Meena; D. Kumar; K. Singh; Nidhi;
416	GSTNET: A Geospatial-Temporal Graph Network for Group Person Re-Identification Highlight: Here, we propose the Geospatial-temporal Graph Network (GstNet).	P. Hu; J. Li; F. Hong; Y. Peng; J. Wu; R. Hu;
417	Dynamic Balanced Cross-Modal Attention with Gated Sequence Restoration: Towards Robust Multimodal Sentiment Analysis Highlight: However, practical MSA applications suffer from accuracy degradation due to modality incompleteness and modality imbalance. To address these challenges, we propose the Dynamic Balanced Cross-modal Attention with Gated Sequence Restoration (DBCA-GSR) framework.	R. Geng; Q. Sun; H. Cao; X. Wang;
418	GRASP: Group-Shapley Feature Selection for Patients Highlight: We introduce GRASP, a novel framework that couples Shapley value–driven attribution with group-L21 regularization to extract compact and non-redundant feature sets.	Y. Luo; S. Li; Z. Cao;
419	ALFM: Adaptive Local Feature Mining of Vision-Language Models for Out-of-Distribution Detection Highlight: Most existing zero-shot OOD detection methods rely on the matching similarity between global visual features and in-distribution (ID) class prompts, and exhibit limited utilization of local features. To address this, we propose a post-hoc method, Adaptive Local Feature Mining (ALFM), which dynamically provides more critical local information to help distinguish ID samples from OOD samples.	Y. Ge; S. Feng; B. Zhang; S. Li; B. Bao; C. Wang;
420	Pixel-Patch Graph Regularized Group Sparse Representation for Single-Image Denoising Highlight: Most GSR-based methods focus on preserving similarity but often ignore noise with spatial correlations, resulting in over-smoothing. In this paper, pixel-patch graph regularized group sparse representation (PPGR-GSR) is proposed to address this limitation.	X. Hou; X. -Q. Jiang; S. Zhou; H. Feng;
421	Seeing You in The Noise: Achieving Degraded Object Detection with Positive Text Guidance Highlight: To build a robust degradation-oriented detector, we propose SYNO (Seeing You in the NOise), which introduces positive text guidance into an improved DETR-based detection unit.	X. Lv; Y. Liu; H. Yang; Y. Guo; F. Wang;
422	CADD: Condition-Anchor Dataset Distillation Highlight: In this work, we propose CADD (Condition Anchor-based Dataset Distillation), a novel framework that shifts the distillation process into the condition space of an image-to-image diffusion model.	J. Cao; Y. Liu; C. Lang; L. Jiang;
423	Vision KAN: Towards An Attention-Free Backbone for Vision with Kolmogorov-Arnold Networks Highlight: In this work, we introduce Vision KAN (ViK), an attention-free backbone inspired by the Kolmogorov-Arnold Networks.	Z. Yang; J. Zhang; X. Luo; X. Wu; Z. Lu; L. Shen;
424	Latent Temporal Discrepancy As Motion Prior: A Loss-Weighting Strategy for Dynamic Fidelity in T2V Highlight: Unfortunately, existing diffusion models rely on static loss for all scenarios, constraining their ability to capture complex dynamics. To address this issue, we introduce Latent Temporal Discrepancy (LTD) as a motion prior to guide loss weighting.	M. Wu;
425	Feasibility of Ectopic Beat Detection and Count Estimation from Smartwatch-Based Photoplethysmography Highlight: In this study, we developed a pipeline using free-living smartwatch PPG data for ectopic beat detection and count estimation.	B. Lu;
426	AccelGS: An Acceleration Framework for Large-Scale 3D Gaussian Splatting Training Highlight: We propose AccelGS, an acceleration framework for large-scale 3DGS training.	Y. Kou;
427	ALMA-Chor: Leveraging Audio-Lyric Alignment with Mamba for Chorus Detection Highlight: In this paper, we propose ALMA-Chor, an end-to-end framework that jointly models audio and lyrics for chorus detection.	R. Bao;
428	Q4Q: Quantum for Quantization in Large Language Models Highlight: We propose Q4Q (Quantum-computing for Quantization), a quantum-based method for efficient bitwidth allocation in model quantization.	G. Li; Y. Li; J. Jia; T. Deng; Y. Tao;
429	Dithered 1-bit Quantization and Sparse Reconstruction for Near-Field 3D Millimeter-Wave Imaging Highlight: In contrast, 1-bit quantization is attractive due to its simple hardware and reduced data volume, yet the severe information loss significantly limits imaging performance. To address this issue, this paper proposes a near-field 1-bit 3D MMW imaging method that integrates dithering quantization with sparse recovery.	S. Ge; A. Yang; X. Zhang; N. Jiang; X. Huang;
430	Specular-Aware Lambert Reconstruction for Monocular Endoscopic Depth Estimation Highlight: We propose an optical-flow-guided Lambert surface reconstruction for robust highlight suppression and depth recovery.	J. Shan; S. Zhang; T. Wang; J. Wu; X. Qin;
431	Temporal Distillation for Music Representation Learning Highlight: Developing music foundation models faces key challenges: acquiring massive datasets, designing effective learning methods, and capturing long-term dependencies. To address these issues, we propose Harmonia, a novel self-distillation framework designed to introduce a strong temporal inductive bias.	S. Wei; B. Zhu;
432	Integrating Speaker Embeddings and LLM-Derived Semantic Representations for Streaming Speaker Diarization Highlight: Given the superior capability of large language models (LLMs) in long-context understanding, we employ an LLM for long-range semantic modeling.	T. Cheng;
433	TRJSCC: Text-Guided ROI-Aware Deep Joint Source-Channel Coding Highlight: However, existing methods are unable to guarantee the fidelity of Regions of Interest (ROI) under severe bandwidth constraints and adverse channel conditions. To address this issue, we propose a text-guided ROI-aware JSCC framework, leveraging textual cues to direct the encoder’s focus on the ROI and incorporate semantic priors for improved robustness.	R. Fan; S. Ma; X. Xie; G. Shi;
434	When Mamba Meets KAN: A Hybrid Learning Network for Electric Vehicle Charging Demand Prediction Highlight: In this paper, we propose HyKANet, a novel hybrid spatio-temporal prediction framework that integrates multi-scale temporal modeling with advanced nonlinear decoding.	M. Hao; Q. Ren;
435	A Data Driven Design for Optimal Sampled Synchronization of Chaotic Systems Highlight: In this paper, a novel sample-and-partial-hold synchronization scheme is considered, in which the sampling period and synchronization hold duration are jointly optimized.	B. Tu; Z. Qu; M. A. Simaan;
436	Bridging Legal Expertise and LLMs: A Cooperative Logical Reasoning Framework for Sentencing Recommendation Highlight: We propose Sentencing Recommendation Logical Reasoning, a novel framework that integrates key factual elements with computable mathematical logic to enhance the adaptability of LLMs to legal reasoning tasks.	W. Shen; H. Liu; J. Yin; J. Deng;
437	GazeFormer-MoE: Context-Aware Gaze Estimation Via Clip and MoE Transformer Highlight: We present a semantics-modulated, multi-scale Transformer for 3D gaze estimation.	X. Zhao; X. Chen; A. Chaddad;
438	A Centralized Planning With Decentralized Execution Framework for Counter-UAV Operations in Urban Environments Highlight: This paper presents a Centralized Planning with Decentralized Execution (CPDE) framework for counter-UAV operations in complex urban environments.	S. Cong; Z. Jiang; Y. Huang; T. Pan; C. Yu;
439	VARDet: Visual Autoregressive Multi-Scale Prediction and Clip-Guided Semantics for UAV Small-Object Detection Highlight: Detecting small, crowded, and semantically similar objects in Unmanned Aerial Vehicle (UAV) imagery remains challenging. To address these issues, we present VARDet, a transformer-based detector that performs visual autoregressive prediction across scales and detection heads.	S. Wang; Z. Ou; H. Zhang;
440	Foreground-Enhanced Coarse-to-Fine Detection for UAV Small Objects Highlight: Object detection is pivotal for enhancing Unmanned Aerial Vehicle (UAV) remote sensing capabilities, yet challenges persist in detecting objects amid sparse foreground regions and small, crowded instances. To address these issues, we propose One Stage Detects Twice (OSDT), which employs a one-stage detection strategy augmented by two key components: a Foreground Heatmap Generation (FHG) module and a Learnable Foreground Enhancement (LFE) module.	S. Wang; Z. Ou; H. Zhang;
441	Dynabits: Token Aware Weight-Activation Quantization for Large Vision-Language Models Highlight: Post-training quantization (PTQ) reduces memory and latency, yet existing methods struggle with three challenges: (i) activation outliers that are hard to quantize, (ii) weight degradation under low-bit quantization, and (iii) distinct activation ranges of visual and textual tokens. We present Dynabits, a PTQ framework that addresses these issues.	S. Wang; Z. Ou; H. Zhang;
442	Magnet Tracking By A Magnetic Sensor Array with Interactive Multiple Model Estimation For Small-Scale Applications Highlight: This paper proposes a passive magnetic source tracking method for potential medical applications that can operate in environments without line-of-sight.	H. Hou; S. Xu; K. C. Ho; M. Cai; K. Doğançay; T. Xu;
443	EntroLLM: Entropy Encoded Weight Compression for Efficient Large Language Model Inference on Edge Devices Highlight: We propose EntroLLM, a compression framework combining mixed quantization and entropy coding to reduce storage while preserving accuracy.	A. Sanyal; G. Datta; P. Mukherjee; S. P. Chinchali; M. Orshansky;
444	FWF-Net: A Learnable Fourier-Wavelet Fusion Network for PDE Operator Learning Highlight: While frequency domain techniques reduce the computational burden of integral operators, existing methods remain insufficient for complex and varying physical fields because they focus solely on network architecture improvements. To bridge this gap, we propose the Fourier-Wavelet Fusion Network (FWF-Net).	X. Meng; M. Zou; Z. Gan; S. Leng;
445	RiskFuzz: Risk-Guided Fuzzing for Deep Learning Libraries Highlight: Existing fuzzing methods for deep learning libraries mainly focus on generating inputs that satisfy API constraints or trigger bugs, while neglecting the varying risks across APIs. Under limited time budgets, uniform testing leads to insufficient exploration of high-risk APIs and wasted effort on low-risk ones. To address this challenge, we present RiskFuzz, the first fuzzing approach that allocates different testing time and mutation strategies across APIs according to their risk scores.	Z. Pan; X. Chen; Z. Zhang; L. Chen; G. Shi;
446	ARCHI-TTS: A Flow-Matching-Based Text-to-Speech Model with Self-Supervised Semantic Aligner and Accelerated Inference Highlight: Although diffusion-based, non-autoregressive text-to-speech (TTS) systems have demonstrated impressive zero-shot synthesis capabilities, their efficacy is still hindered by two key challenges: the difficulty of text-speech alignment modeling and the high computational overhead of the iterative denoising process. To address these limitations, we propose ARCHI-TTS, which features a dedicated semantic aligner to ensure robust temporal and semantic consistency between text and audio.	C. Wu; J. Deng; Z. Liu; Z. Dai; H. He; Q. Kong;
447	A Medical Multimodal Diagnostic Framework Integrating Vision-Language Models and Logic Tree Reasoning Highlight: We propose a diagnostic framework built upon LLaVA that combines vision-language alignment with logic-regularized reasoning.	Z. Zang;
448	The Muse Benchmark: Probing Music Perception and Auditory Relational Reasoning in Audio LLMs Highlight: We introduce the Music Understanding and Structural Evaluation (MUSE) Benchmark, consisting of 10 tasks designed to probe fundamental music perception skills.	B. J. Carone; I. R. Roman; P. Ripollés;
449	Simplicial Gaussian Models: Representation and Inference Highlight: In this work, we propose the simplicial Gaussian model (SGM), which extends Gaussian PGMs to simplicial complexes.	L. Marinucci; G. D’Acunto; P. Di Lorenzo; S. Barbarossa;
450	Learning Consistent Causal Abstraction Networks Highlight: We propose an efficient search procedure that solves the local problems with SPECTRAL, our iterative method with closed-form updates that is suitable for positive definite and semidefinite covariance matrices.	G. D’Acunto; P. Di Lorenzo; S. Barbarossa;
451	Topological Signal Processing for 3D Point Cloud Data Highlight: Our goal in this paper is to apply the topological signal processing (TSP) framework to the analysis of 3D Point Clouds (PCs) represented on simplicial complexes.	T. Cattai; S. Sardellitti; S. Colonnese; S. Barbarossa;
452	AR&D: A Framework for Retrieving and Describing Concepts for Interpreting AudioLLMs Highlight: We introduce the first mechanistic interpretability framework for AudioLLMs, leveraging sparse autoencoders (SAEs) to disentangle polysemantic activations into monosemantic features.	T. Faisal; T. D. Huy; S. Pan; J. Stoddard; Z. Liao;
453	DIMO: Dual-Strategy Learning for Ambiguous Samples in Class-Imbalanced Facial Expression Recognition Highlight: Existing approaches often overlook this issue, thereby compromising model performance. To address this issue, we propose the DIscriminative Margin Optimizer (DIMO), which enhances the discriminative capability for ambiguous samples and improves overall robustness through a dual-strategy design.	L. Zeng; L. Luo; Y. Gu; F. Ren;
454	Self-Supervised Learning with Efficient On-Device Training For Intra-Patient Cardiac Arrhythmia Detection Highlight: In cardiac arrhythmia detection, labeled electrocardiography (ECG) data is limited, especially at the individual level. To address this, we propose a subject-dependent VICReg self-supervised learning framework with on-device training.	Z. Zhong; C. Park; J. Gu;
455	Coarse-to-fine Trajectory Prediction Via Time-Aware Interaction Predictor and Conditional Diffusion-based Refiner Highlight: Existing methods typically rely on history-based interaction modeling, which often fails to capture potential future impacts among agents, leading to inaccurate predictions. To address this, we propose TAIDR, a two-stage framework that combines Time-Aware future Interaction modeling with Diffusion-based Refinement.	G. Zheng; J. Lin; Z. Liu; Z. Li; F. Rong; S. Su;
456	Quadratic Flow: Constant Acceleration As A Prior for Learning Better Velocity Field Highlight: While recent single-step generation approaches achieve efficient sampling by modeling the average velocity field, their global uniform-velocity constraint limits generalization to complex generation tasks. To mitigate this limitation, we propose Quadratic Flow, a generative method that dynamically models sampling trajectories through average velocity fields under the assumption of constant acceleration.	Z. Wu; B. Sun; J. He;
457	CERF: Communication-Efficient and Retraining-Free Collaborative Perception Highlight: However, most existing methods rely on transmitting and fusing dense feature maps for collaboration, which incurs inevitable communication overhead and heterogeneity challenges, limiting their practicality for real-world deployment. To address these challenges, we propose CERF, a novel Communication-Efficient and Retraining-Free framework for open heterogeneous collaborative perception.	J. Hao;
458	PSCC-Net: A Siamese Network Framework for Pseudo-Video Temporal Modeling and Spatiotemporal Fusion in Remote Sensing Change Detection Highlight: We propose PSCC-Net, a novel framework that reformulates change detection as a video understanding problem.	H. Dong; Y. Wang; X. Li; Y. Zhang;
459	Contextual Relationship Feature-Enhanced Steganalysis for Social Texts Highlight: Most existing research focuses on analyzing individual media features, neglecting the social connectivity aspects that are crucial for identifying steganographic texts in fragmented social media environments. To address this gap, we propose a feature-enhanced linguistic steganalysis model that integrates improved node features, optimized message passing, and robust feature fusion.	K. Huang; X. Zhang; W. Wu; Y. Wei; Z. Yang;
460	HuntingLLM: Risk-Driven Automated Red Teaming with Adaptive Attack Agents Highlight: We propose a risk-driven, end-to-end automated red-teaming framework for LLMs that unifies three components: a Risk Prototype Generator that compiles a dual-axis taxonomy (content/application risks) and system goals into goal-aligned prototypes; a feedback-adaptive attack engine that tunes strategy operators to improve success and efficiency under small query budgets; and a two-stage evaluation pipeline (rule priors + fine-tuned LLM judge) with a unified Red Team Evaluation Record (RTER).	Q. Feng; H. Wang;
461	Inverse Rendering for High-Genus 3D Surface Meshes from Multi-View Images with Persistent Homology Priors Highlight: Reconstructing 3D objects from images is inherently an ill-posed problem due to ambiguities in geometry, appearance, and topology. This paper introduces collaborative inverse rendering with persistent homology priors, a novel strategy that leverages topological constraints to resolve these ambiguities.	X. Gao;
462	From Past To Future: Leveraging Event Causality For Explainable Prediction With Large Language Models Highlight: We introduce CAPE (Causal-Aware event Prediction with Explanation), a comprehensive framework that uses natural-language events related to the target entity for open-ended prediction and generates causal explanations.	X. Gao;
463	Refgen: Reference-Guided Synthetic Data Generation for Anomalous Sound Detection Highlight: Building upon recent advances in text-to-audio (TTA) generation, we propose RefGEN, a reference-guided generative framework that introduces two key innovations.	W. Liang;
464	HACG: Contribution-Based Dynamic Grouping with Hierarchical Graph Attention for Multi-Agent Cooperation Highlight: Conventional multi-agent reinforcement learning (MARL) methods that rely on flat cooperative structures often struggle to achieve intricate, hierarchical coordination, which is critical for effective multi-agent collaboration. To address this, we propose HACG, a framework for Hierarchical Graph Attention with Contribution-based Dynamic Grouping.	T. Wei; Z. Wang; C. Yi; S. Chen; L. Lu; X. Gu;
465	Hadamard Tensor Ring for Efficient Low-Rank Fine-Tuning Highlight: We propose Hadamard Tensor Ring (HTR), a novel parameter-efficient fine-tuning method that leverages tensor ring decomposition with Hadamard product operations for large-scale pre-trained models.	H. Tong; G. Xu; Y. Chen;
466	Robust Supervised Learning for Ballistocardiogram Quality Assessment Under Limited Inter-Rater Agreement Highlight: We propose a machine learning framework for ordinal regression of BCG signal quality (poor, unreliable, borderline, good, and excellent), leveraging time-frequency features from empirical mode decomposition and Welch’s method.	M. S. Islam;
467	Emotional Dimension Control in Language Model-Based Text-To-Speech: Spanning A Broad Spectrum of Human Emotions Highlight: Emotional text-to-speech (TTS) systems struggle to capture the full spectrum of human emotions due to the inherent complexity of emotional expressions and the limited coverage of existing emotion labels. To address this, we propose a language model-based TTS framework that synthesizes speech across a broad range of emotional styles.	K. Zhou; Y. Zhang; D. Ng; S. Zhao; H. Wang; B. Ma;
468	Aerogspnet: Graph Signal Processing for Multi-Task Aerodynamic Prediction Highlight: We propose AeroGSPNet, a graph signal processing framework that models vehicle surface point clouds or meshes as graph signals for efficient multi-task aerodynamic prediction.	J. Wu; X. Feng; Z. Ji;
469	Localizing Speech Deepfakes Beyond Transitions Via Segment-Aware Learning Highlight: Thus, we propose Segment-Aware Learning (SAL), a framework that encourages models to focus on the internal structure of segments.	Y. Mao; W. Huang; Y. Qian;
470	ILSA: Information Loss-Guided Sparsity Allocation for Pruning Large Language Models Highlight: We propose ILSA, an Information Loss-guided Sparsity Allocation framework that employs virtual pruning to estimate perturbations and evaluates layer sensitivity via KL divergence, cosine similarity, and L2 distance.	L. Li; Y. Wang; Z. Wang; F. Bao;
471	M2DP: A Multi-Scale Association Learning Framework for Multi-Category Demand Prediction Under Public Health Emergencies Highlight: In this paper, we propose M2DP, a novel multi-scale association learning framework for multi-category demand prediction under PHEs.	X. Zhang; L. Lin; K. Xia; Y. Feng; Q. Zhang; S. Wang;
472	DLCRR: Differential Learning and Causal Representation Restoration Model for Event Causality Identification Highlight: However, existing methods often perform poorly due to the lack of explicit causal words and are easily misled by textual interference information. To address this issue, we propose a Differential Learning and Causal Representation Restoration Model for Event Causality Identification (DLCRR).	R. Zhao;
473	Spiking Neural Networks for Ordinal Regression Highlight: Our primary contribution is the successful deployment of SNNs for ordinal regression tasks, demonstrating competitive performance comparable to traditional deep learning approaches while achieving substantial reductions in energy consumption.	W. Ma; A. F. Sequeira; J. S. Cardoso;
474	S3-3DGS: Steering Spherical-Harmonic Subspaces for Secure 3DGS Watermarking Highlight: As 3D Gaussian Splatting (3DGS) gains adoption in VR and digital twins, its point-based design is prone to unauthorized manipulation, and existing 2D and NeRF watermarking methods offer limited structural protection. To address these challenges, we propose a Steering Spherical-harmonic Subspace framework (S3-3DGS).	W. Ma; Y. He; Y. Guo; L. Shen; J. Wang;
475	Prototype-Guided Cross-Modal Contrastive Learning for Continual Audio-Visual Sound Separation Highlight: This task is challenged by catastrophic forgetting, where new learning overwrites prior knowledge, and cross-modal interference, where features from different categories or modalities entangle in the representation space, reducing discriminability. In this paper, we propose Prototype-guided Cross-modal Contrastive Learning (PGCCL) to address these issues.	W. Ma; H. Wen; Z. Gao; Q. Xu; K. Xu;
476	TLD-PGD: Two-Stage Low Frequency Degradation Adversarial Attack in Hyperspectral Image Classification Highlight: This paper proposes the Two-Stage Low-Frequency Degradation Projected Gradient Descent (TLD-PGD) framework to address this.	Z. You; Y. Liu; S. Xiao; W. Li; Y. Zhang;
477	Adaptively Weighted Multi-Modal Joint Entropy with Dynamic Allocation and Fault-Tolerant Fusion for Industrial Diagnostics Highlight: This paper proposes the Adaptive Weighted Multi-modal Joint Entropy (AWMJE), which quantifies the complexity of joint signals through dynamic weighting of modality discriminability, signal-to-noise ratio (SNR), and operational parameters.	Y. Niu;
478	ACAVCaps: Enabling Large-Scale Training for Fine-Grained and Diverse Audio Understanding Highlight: However, progress in this domain is hindered by existing datasets, which lack the scale and descriptive granularity required to train truly versatile models. To address this gap, we introduce ACAVCaps, a new large-scale, fine-grained, and multi-faceted audio captioning dataset.	Y. Niu;
479	A Robust Method for Gear Failure Detection and Severity Estimation Based on Multi-Sensor Physical Feature Fusion and Domain Adaptation Highlight: This study proposes a novel robust fault severity regression algorithm tailored for cross-domain conditions.	Y. Niu;
480	DMM-JA: A Dynamic Multimodal Fusion and Multi-Scale Modeling Framework with Jump-Awareness for Industrial Equipment RUL Prediction Highlight: In this paper, we propose DMM-JA, a dynamic multimodal fusion and jump-aware model for RUL prediction.	Y. Niu;
481	DiffEmoTalk: Audio-Driven Facial Animation with Fine-Grained Emotion Control Via Diffusion Models Highlight: In this paper, we propose DiffEmoTalk, a novel framework that generates diverse and expressive animations with detailed emotion control.	K. Gao; Y. Zhu; J. Liu; X. Wang; X. Jin; J. Nie;
482	EEG and Eye-Tracking Driven Dynamic Target Speaker Extraction with Spontaneous Attention Switching Highlight: Previous studies have primarily focused on static target speaker scenarios, ignoring the dynamic target speaker switching common in multi-speaker environments. To address this limitation, we propose a novel neuro-inspired multimodal framework that effectively tackles dynamic target speaker extraction.	X. Wang;
483	Up to 36x Speedup: Mask-Based Parallel Inference Paradigm for Key Information Extraction in MLLMs Highlight: However, their reliance on autoregressive inference, which generates outputs sequentially, creates a significant efficiency bottleneck, especially as KIE tasks often involve extracting multiple, semantically independent fields. To overcome this limitation, we introduce PIP: a Parallel Inference Paradigm for KIE.	X. Wang;
484	Enhancing Cross-View Geo-Localization Generalization Via Global-Local Consistency and Geometric Equivariance Highlight: In this paper, we propose EGS, a novel CVGL framework designed to enhance cross-domain generalization.	X. Wang;
485	FoRSe: A Retrieval-Augmented Framework for Time Series Forecasting Highlight: However, existing methods still face challenges in fully exploiting historical information and optimizing the retrieval module. To overcome this limitation, we propose FoRSe, a retrieval-augmented forecasting framework.	X. Wang;
486	Where, Not What: Compelling Video LLMs to Learn Geometric Causality for 3D-Grounding Highlight: We propose What Where Representation Reforming (W2R2), a training framework that reshapes internal representations without modifying the inference-time architecture.	Y. Zhong;
487	PinDefectNet: A Transformer Framework for Detecting Defects in Millimeter-Scale Power Line Locking Pins Highlight: However, existing unmanned aerial vehicle (UAV)-based inspection methods struggle to detect these tiny components due to extreme scale changes and cluttered backgrounds. To tackle these issues, we propose PinDefectNet, a transformer-based framework for accurate pin detection and defect classification.	X. Ge; H. Gao; C. Wang; X. Xu; J. Xiong; H. Hu;
488	Parallax-Aware Spatial Transformer: Fusing Physics and Learning for Terahertz Near-Field Localization Highlight: Simulation results demonstrate PAST achieves millimeter-level and 0.06-degree accuracy, defining a new and highly efficient approach for THz near-field localization.	Z. Zeng; C. Han;
489	Rotationally-Invariant AMP for Compressed Sensing with Multiple Measurement Vectors Highlight: We introduce RIAMP-DF-MMV, an approximate message passing (AMP) algorithm whose Onsager terms are linear combinations of divergence-free (DF) components from past iterates.	S. Luo; S. Liu; J. Ma; C. Xu; X. Wang;
490	IBPCodec: A Low-Bitrate Lightweight Speech Codec With Inter-Band Prediction Highlight: In this paper, we propose a low-bitrate lightweight speech codec with inter-band prediction, called IBPCodec.	P. Zhou; X. Chen; P. Lu; J. Wang; S. Zhao;
491	Dual Correlation Adaptive Hierarchical Spatio-Temporal Transformer for Stock Price Forecasting Highlight: In this paper, we propose DCAH, a Dual Correlation Adaptive Hierarchical Spatio-Temporal Transformer, which introduces decomposition into stock price forecasting.	S. Wang; W. Yan; Y. Tan;
492	Gaussian Locality Prior For Contrast–Reconstruction Learning: State–Space Model-Based Time–Series Anomaly Detection Highlight: We revisit unsupervised TSAD with a simple recipe: inject a learnable Gaussian Locality Prior into similarity/attention logits to encode temporal clustering [3], [4], and train with two signals—a two-view KL consistency aligning dependency distributions [5], [6] and a lightweight reconstruction loss providing amplitude/local-shape cues [7].	T. Han; Y. Li; Q. Xiong; S. Zheng; J. Guo;
493	Low-Bandwidth High-Fidelity Speech Transmission with Generative Latent Joint Source-Channel Coding Highlight: This poses severe challenges for high-fidelity speech transmission over bandwidth-limited weak networks. To tackle this, we propose a new Generative Latent JSCC (GL-JSCC) framework.	G. Li; S. Yao; S. Wang; Z. Liu; K. Niu; J. Dai;
494	Sparkling Together: Joint Editing for Multi-Accessory Virtual Try-On Highlight: We propose a unified diffusion framework for accessory try-on.	Z. Xu; X. Li; J. Zhang; J. Wan; C. Chen; J. Wu;
495	Neuro-Symbolic Reachability Reasoning for Physically Grounded Embodied Question Answering Highlight: In this paper, we present a neuro-symbolic framework for physically grounded EQA that incorporates physical constraints into the reasoning process through differentiable fuzzy logic.	X. Qi; J. Cao; C. Fan; H. Luo; J. A. McCann; H. Wang;
496	Message Passing-Based Parallel Multi-Target Joint Detection and Estimation in Distributed Passive MIMO Radar Highlight: This paper proposes a message passing-based parallel inference method for multi-target joint detection and parameter estimation in distributed passive multiple-input multiple-output (MIMO) radar.	B. Li; J. Li; Q. Guo; H. Kang; X. Wang;
497	DPFAN: Dual-Path Feature-Adaptive Network for KPI Anomaly Detection Highlight: We propose a Dual-Path Feature-Adaptive Network (DPFAN) that resolves feature heterogeneity through separate processing paths.	Y. Zhang; H. Zheng; S. Jian; Y. Yuan; K. Lu;
498	Lagrangian Deep Learning for Private RIS-aided Localization: An Active Sensing Approach Highlight: In this paper, we study user localization in the presence of adversarial position estimation.	G. Stamatelis; G. C. Alexandropoulos;
499	Integrating Stacked Intelligent MetaSurfaces and Power Control for Dynamic Edge Inference Via Over-the-Air Neural Networks Highlight: This paper introduces a novel framework for Edge Inference (EI) that bypasses the conventional practice of treating the wireless channel as noise.	K. Stylianopoulos; G. C. Alexandropoulos;
500	Joint Active RIS Configuration and User Power Control for Localization: A Neuroevolution-Based Approach Highlight: A novel multi-agent algorithm for the joint control of the RIS phase configuration and the user transmit power is presented, which is based on a hybrid approach integrating NeuroEvolution (NE) and supervised learning.	G. Stamatelis; H. Chen; H. Wymeersch; G. C. Alexandropoulos;

This table only includes 500 papers selected by our daily digest algorithm. To continue with the full list (~4,500 papers), please visit Paper Digest: ICASSP-2026 (Full List).