4, TITLE: MemRL: Self-Evolving Agents Via Runtime Reinforcement Learning on Episodic Memory AUTHORS: SHENGTAO ZHANG et al. CATEGORY: cs.CL [cs.CL] HIGHLIGHT: While Large Language Models possess strong reasoning capabilities, they struggle to emulate this self-evolution: fine-tuning is computationally expensive and prone to catastrophic forgetting, while existing memory-based methods rely on passive semantic matching that often retrieves noise. To address these challenges, we propose MemRL, a framework that enables agents to self-evolve via non-parametric reinforcement learning on episodic memory.
5, TITLE: UniCorn: Towards Self-Improving Unified Multimodal Models Through Self-Generated Supervision AUTHORS: RUIYAN HAN et al. CATEGORY: cs.CV [cs.CV, cs.AI] HIGHLIGHT: We formalize this discrepancy as Conduction Aphasia, a phenomenon where models accurately interpret multimodal inputs but struggle to translate that understanding into faithful and controllable synthesis. To address this, we propose UniCorn, a simple yet elegant self-improvement framework that eliminates the need for external data or teacher supervision.
Tracking Results:
1*, TITLE: Current Agents Fail to Leverage World Model As Tool for Foresight AUTHORS: CHENG QIAN et al. CATEGORY: cs.AI [cs.AI, cs.CL, cs.LG] HIGHLIGHT: Agents built on vision-language models increasingly face tasks that demand anticipating future states rather than relying on short-horizon reasoning. Generative world models offer a promising remedy: agents could use them as external simulators to foresee outcomes before acting.
5*, TITLE: Rethinking Table Pruning in TableQA: From Sequential Revisions to Gold Trajectory-Supervised Parallel Search AUTHORS: YU GUO et al. CATEGORY: cs.CL [cs.CL] HIGHLIGHT: However, existing pruning methods typically rely on sequential revisions driven by unreliable critique signals, often failing to detect the loss of answer-critical data. To address this limitation, we propose TabTrim, a novel table pruning framework which transforms table pruning from sequential revisions to gold trajectory-supervised parallel search.
8*, TITLE: SoK: Privacy Risks and Mitigations in Retrieval-Augmented Generation Systems AUTHORS: Andreea-Elena Bodea ; Stephen Meisenbacher ; Alexandra Klymenko ; Florian Matthes CATEGORY: cs.CR [cs.CR, cs.CL] HIGHLIGHT: Numerous recent works have explored various aspects of privacy risks in RAG systems, from adversarial attacks to proposed mitigations. With the goal of surveying and unifying these works, we ask one simple question: What are the privacy risks in RAG, and how can they be measured and mitigated?
9*, TITLE: FLEx: Language Modeling with Few-shot Language Explanations AUTHORS: Adar Avsian ; Christopher Richardson ; Anirudh Sundar ; Larry Heck CATEGORY: cs.CL [cs.CL, cs.LG] HIGHLIGHT: Natural language explanations can help correct these errors, but collecting them at scale may be infeasible, particularly in domains where expert annotators are required. To address this issue, we introduce FLEx (Few-shot Language Explanations), a method for improving model behavior using a small number of explanatory examples.
10*, TITLE: Doc-PP: Document Policy Preservation Benchmark for Large Vision-Language Models AUTHORS: Haeun Jang ; Hwan Chang ; Hwanhee Lee CATEGORY: cs.CL [cs.CL] HIGHLIGHT: In this paper, we introduce Doc-PP (Document Policy Preservation Benchmark), a novel benchmark constructed from real-world reports requiring reasoning across heterogeneous visual and textual elements under strict non-disclosure policies.
13*, TITLE: PCoA: A New Benchmark for Medical Aspect-Based Summarization With Phrase-Level Context Attribution AUTHORS: BOHAO CHU et al. CATEGORY: cs.CL [cs.CL] HIGHLIGHT: Verifying system-generated summaries remains challenging, as effective verification requires precise attribution to the source context, which is especially crucial in high-stakes medical domains. To address this challenge, we introduce PCoA, an expert-annotated benchmark for medical aspect-based summarization with phrase-level context attribution.
17*, TITLE: Role of AI Recommendation on Consumer Behavior AUTHORS: Vineesh A R ; Mr. Rahul K R ; Reshma S SOURCE: International Journal of Advanced Research in Science Communication and Technology HIGHLIGHT: The present study aims to examine the role of AI recommendation systems in influencing consumer behavior, with special emphasis on consumer decision-making, purchase intention, perceived usefulness, satisfaction, personalization, and trust in digital marketplaces.
19*, TITLE: IntroLM: Introspective Language Models Via Prefilling-Time Self-Evaluation AUTHORS: Hossein Hosseini Kasnavieh ; Gholamreza Haffari ; Chris Leckie ; Adel N. Toosi CATEGORY: cs.CL [cs.CL, cs.AI, cs.LG] HIGHLIGHT: We propose IntroLM, a method that enables causal language models to predict their own output quality during the prefilling phase without affecting generation using introspective tokens.
Daily Papers (sorted by potential impact and then category):
22, TITLE: Training-Free Adaptation of New-Generation LLMs Using Legacy Clinical Models AUTHORS: SASHA RONAGHI et al. CATEGORY: cs.CL [cs.CL, cs.AI] HIGHLIGHT: We propose Cross-Architecture Proxy Tuning (CAPT), a model-ensembling approach that enables training-free adaptation of state-of-the-art general-domain models using existing clinical models.
25, TITLE: Vat Photopolymerization‐based Bioprinting: Shaping Next‐generation Tissues with Light AUTHORS: Wei Long Ng ; Carlos T. B. Paula ; Arménio C. Serra ; Jorge F. J. Coelho ; Paulo Bartolo SOURCE: Interdisciplinary Medicine HIGHLIGHT: This review presents a comprehensive overview of recent advances in VP‐based bioprinting, organized around core themes of photopolymerization chemistry, printing modalities, bio‐ink design, and biomedical applications.
28, TITLE: CPGPrompt: Translating Clinical Guidelines Into LLM-Executable Decision Support AUTHORS: RUIQI DENG et al. CATEGORY: cs.AI [cs.AI] HIGHLIGHT: Previous approaches, such as rule-based systems, face significant limitations, including poor interpretability, inconsistent adherence to guidelines, and narrow domain applicability. To address this, we develop and validate CPGPrompt, an auto-prompting system that converts narrative clinical guidelines into large language models (LLMs).
29, TITLE: HearSay Benchmark: Do Audio LLMs Leak What They Hear? AUTHORS: JIN WANG et al. CATEGORY: cs.CL [cs.CL] HIGHLIGHT: This paper takes the first step to investigate whether ALLMs inadvertently leak user privacy solely through acoustic voiceprints and introduces HearSay, a comprehensive benchmark constructed from over 22,000 real-world audio clips.
31, TITLE: Unveiling The Potential of Spin–orbit Torque in A Magnetic Single Layer for Advancing Spintronics Application AUTHORS: ZEYU HAN et al. SOURCE: Applied Physics Reviews HIGHLIGHT: However, conventional SOT devices face efficiency constraints like interfacial spin scattering, limited spin-diffusion lengths, and complexity, driving interest in single-layer SOT switching. Given that research on single-layer SOT systems is still in its early stages and the underlying physical mechanisms remain complex and not fully understood, this review aims to consolidate recent key advances in the field.
33, TITLE: Strategic Management of Urban Services Using Artificial Intelligence in The Development of Sustainable Smart Cities—Managerial and Legal Challenges AUTHORS: Tomáš Peráček ; Michal Kaššaj SOURCE: Sustainability HIGHLIGHT: At the same time, the question arises as to how legal and strategic frameworks can support the use of artificial intelligence in a way that contributes to environmental, social and economic sustainability in line with the objectives of the European Union. The aim of this scientific study is to examine the interdisciplinary use of artificial intelligence, data management and sustainability at the European Union level, including support instruments such as regulatory initiatives and funding programs, and to assess their implementation in relation to smart cities.
34, TITLE: E5-omni: Explicit Cross-modal Alignment for Omni-modal Embeddings AUTHORS: Haonan Chen ; Sicheng Gao ; Radu Timofte ; Tetsuya Sakai ; Zhicheng Dou CATEGORY: cs.CL [cs.CL, cs.AI, cs.CV] HIGHLIGHT: In practice, this causes three common issues: (i) similarity logits have modality-dependent sharpness, so scores are not on a consistent scale; (ii) in-batch negatives become less effective over time because mixed-modality batches create an imbalanced hardness distribution; as a result, many negatives quickly become trivial and contribute little gradient; and (iii) embeddings across modalities show mismatched first- and second-order statistics, which makes rankings less stable. To tackle these problems, we propose e5-omni, a lightweight explicit alignment recipe that adapts off-the-shelf VLMs into robust omni-modal embedding models.
36, TITLE: Improving Robustness in X-ray Image Classification Through Attention Mechanisms in Convolutional Neural Networks AUTHORS: ZAENAB ALAMMAR et al. SOURCE: PeerJ Computer Science HIGHLIGHT: However, interpreting these images reliably is challenging due to a lack of labelled data, inherent image noise, and the lack of explainable artificial intelligence (AI). This research aims to improve the robustness against noise, accuracy, and interpretability of musculoskeletal radiograph classification by addressing these key challenges.
40, TITLE: Soft Contextualized Encoder For User Defined Text Classification AUTHORS: Charu Maheshwari ; Vyas Raina CATEGORY: cs.LG [cs.LG, cs.AI] HIGHLIGHT: We propose a soft-contextualized encoder architecture for UDTC which contextualizes each candidate label with the label set and a static soft prompt representation of the input query.
41, TITLE: Causal Data Augmentation for Robust Fine-Tuning of Tabular Foundation Models AUTHORS: Magnus Bühler ; Lennart Purucker ; Frank Hutter CATEGORY: cs.LG [cs.LG] HIGHLIGHT: We propose CausalMixFT, a method that enhances fine-tuning robustness and downstream performance by generating structurally consistent synthetic samples using Structural Causal Models (SCMs) fitted on the target dataset.
42, TITLE: CLAP: Contrastive Latent Action Pretraining for Learning Vision-Language-Action Models from Human Videos AUTHORS: CHUBIN ZHANG et al. CATEGORY: cs.RO [cs.RO, cs.CV] HIGHLIGHT: Existing Latent Action Models attempt to leverage video data but often suffer from visual entanglement, capturing noise rather than manipulation skills. To address this, we propose Contrastive Latent Action Pretraining (CLAP), a framework that aligns the visual latent space from videos with a proprioceptive latent space from robot trajectories.
44, TITLE: LinkD: AutoRegressive Diffusion Model for Mechanical Linkage Synthesis AUTHORS: Yayati Jadhav ; Amir Barati Farimani CATEGORY: cs.LG [cs.LG] HIGHLIGHT: We introduce an autoregressive diffusion framework that exploits the dyadic nature of linkage assembly by representing mechanisms as sequentially constructed graphs, where nodes correspond to joints and edges to rigid links.
46, TITLE: Unlocking The Pre-Trained Model As A Dual-Alignment Calibrator for Post-Trained LLMs AUTHORS: Beier Luo ; Cheng Wang ; Hongxin Wei ; Sharon Li ; Xuefeng Du CATEGORY: cs.LG [cs.LG] HIGHLIGHT: In particular, we show that calibration errors arise from two regimes: (i) confidence drift, where final confidence inflates despite largely consistent intermediate decision processes, and (ii) process drift, where intermediate inference pathways diverge. Guided by this diagnosis, we propose Dual-Align, an unsupervised post-hoc framework for dual alignment in confidence calibration.
48, TITLE: IndexTTS 2.5 Technical Report AUTHORS: YUNPEI LI et al. CATEGORY: cs.SD [cs.SD, cs.AI] HIGHLIGHT: In prior work, we introduced IndexTTS 2, a zero-shot neural text-to-speech foundation model comprising two core components: a transformer-based Text-to-Semantic (T2S) module and a non-autoregressive Semantic-to-Mel (S2M) module, which together enable faithful emotion replication and establish the first autoregressive duration-controllable generative paradigm.
49, TITLE: O-Researcher: An Open Ended Deep Research Model Via Multi-Agent Distillation and Agentic RL AUTHORS: YI YAO et al. CATEGORY: cs.CL [cs.CL, cs.AI] HIGHLIGHT: The performance gap between closed-source and open-source large language models (LLMs) is largely attributed to disparities in access to high-quality training data. To bridge this gap, we introduce a novel framework for the automated synthesis of sophisticated, research-grade instructional data.
51, TITLE: Simulated Students in Tutoring Dialogues: Substance or Illusion? AUTHORS: Alexander Scarlatos ; Jaewook Lee ; Simon Woodhead ; Andrew Lan CATEGORY: cs.CL [cs.CL, cs.CY] HIGHLIGHT: Surprisingly, little work has been done to ensure or even measure the quality of simulated students. In this work, we formally define the student simulation task, propose a set of evaluation metrics that span linguistic, behavioral, and cognitive aspects, and benchmark a wide range of student simulation methods on these metrics.
54, TITLE: Local Gradient Regulation Stabilizes Federated Learning Under Client Heterogeneity AUTHORS: Ping Luo ; Jiahuan Wang ; Ziqing Wen ; Tao Sun ; Dongsheng Li CATEGORY: cs.LG [cs.LG, cs.DC] HIGHLIGHT: Here, we show that client heterogeneity destabilizes FL primarily by distorting local gradient dynamics during client-side optimization, causing systematic drift that accumulates across communication rounds and impedes global convergence.
56, TITLE: ImLoc: Revisiting Visual Localization with Image-based Representation AUTHORS: Xudong Jiang ; Fangjinhua Wang ; Silvano Galliani ; Christoph Vogel ; Marc Pollefeys CATEGORY: cs.CV [cs.CV] HIGHLIGHT: In this work, we revisit visual localization with a 2D image-based representation and propose to augment each image with estimated depth maps to capture the geometric structure.
63, TITLE: Towards A Mechanistic Understanding of Propositional Logical Reasoning in Large Language Models AUTHORS: Danchun Chen ; Qiyao Yan ; Liangming Pan CATEGORY: cs.AI [cs.AI, cs.LG] HIGHLIGHT: While prior mechanistic studies focus on identifying task-specific circuits, they leave open the question of what computational strategies LLMs employ for propositional reasoning. We address this gap through comprehensive analysis of Qwen3 (8B and 14B) on PropLogic-MI, a controlled dataset spanning 11 propositional logic rule categories across one-hop and two-hop reasoning.
64, TITLE: VisionSpeak Object Detection and Narration System AUTHORS: Prof. Plasin Francis Dias ; K P Chinmayi ; Mahima Hanchinal ; Anurag Dindalkopp ; Neha Khan SOURCE: International Journal of Scientific Research in Engineering and Management HIGHLIGHT: Using a laptop webcam and the pre-trained YOLOv8s model, our system achieves 22 FPS on consumer hardware (Intel Core i3) with 81% average precision across seven common indoor objects.
68, TITLE: Image‐Based Deep Learning Models for Stock Predictions: Combining Line, Candlestick, and Bar Charts AUTHORS: Wei‐Chao Lin ; Ming‐Chang Wang ; Chih‐Fong Tsai ; Jui‐Pin Hsu SOURCE: Journal of Forecasting HIGHLIGHT: In this paper, three types of image patterns are compared, specifically, line charts with trading volume information represented by a bar chart, candlestick charts with trading volume information, and a mixed type of image with two other related technical indicators, that is, MACD and RSI.
72, TITLE: AirNav: A Large-Scale Real-World UAV Vision-and-Language Navigation Dataset with Natural and Diverse Instructions AUTHORS: HENGXING CAI et al. CATEGORY: cs.CL [cs.CL] HIGHLIGHT: Existing Unmanned Aerial Vehicle (UAV) Vision-Language Navigation (VLN) datasets face issues such as dependence on virtual environments, lack of naturalness in instructions, and limited scale. To address these challenges, we propose AirNav, a large-scale UAV VLN benchmark constructed from real urban aerial data, rather than synthetic environments, with natural and diverse instructions.
73, TITLE: PointWorld: Scaling 3D World Models for In-The-Wild Robotic Manipulation AUTHORS: WENLONG HUANG et al. CATEGORY: cs.RO [cs.RO, cs.AI, cs.CV] HIGHLIGHT: We introduce PointWorld, a large pre-trained 3D world model that unifies state and action in a shared 3D space as 3D point flows: given one or few RGB-D images and a sequence of low-level robot action commands, PointWorld forecasts per-pixel displacements in 3D that respond to the given actions.
74, TITLE: RedBench: A Universal Dataset for Comprehensive Red Teaming of Large Language Models AUTHORS: Quy-Anh Dang ; Chris Ngo ; Truong-Son Hy CATEGORY: cs.CL [cs.CL] HIGHLIGHT: However, existing red teaming datasets suffer from inconsistent risk categorizations, limited domain coverage, and outdated evaluations, hindering systematic vulnerability assessments. To address these challenges, we introduce RedBench, a universal dataset aggregating 37 benchmark datasets from leading conferences and repositories, comprising 29,362 samples across attack and refusal prompts.
75, TITLE: ComfySearch: Autonomous Exploration and Reasoning for ComfyUI Workflows AUTHORS: JINWEI SU et al. CATEGORY: cs.AI [cs.AI] HIGHLIGHT: However, the large number of components in ComfyUI and the difficulty of maintaining long-horizon structural consistency under strict graph constraints frequently lead to low pass rates and workflows of limited quality. To tackle these limitations, we present ComfySearch, an agentic framework that can effectively explore the component space and generate functional ComfyUI pipelines via validation-guided workflow construction.
76, TITLE: PaperAudit-Bench: Benchmarking Error Detection in Research Papers for Critical Automated Peer Review AUTHORS: SONGJUN TU et al. CATEGORY: cs.CL [cs.CL] HIGHLIGHT: In this paper, we introduce PaperAudit-Bench, which consists of two components: (1) PaperAudit-Dataset, an error dataset covering both errors identifiable within individual sections and those requiring cross-section reasoning, designed for controlled evaluation under long-context settings; and (2) PaperAudit-Review, an automated review framework that integrates structured error detection with evidence-aware review generation to support critical assessment.
78, TITLE: Numerical Investigation of Double Diffusion of NEPCM Around Oscillating Cylinders in A Curved Cavity Using ISPH and Machine Learning AUTHORS: Munirah Aali Alotaibi ; Weaam Alhejaili ; Samiyah Almalki ; Abdelraheem M. Aly SOURCE: International Journal of Numerical Methods for Heat & Fluid Flow HIGHLIGHT: This paper aims to investigate transient double-diffusive convection and phase change of nano-encapsulated phase-change materials (NEPCM) in a porous curved cavity with two oppositely oscillating cylinders and to quantify how oscillatory actuation and boundary conditions govern heat and mass transfer.
80, TITLE: Challenges and Opportunities in Machine Learning for Light‐Emitting Polymers AUTHORS: Tian Tian ; Yinyin Bao SOURCE: Macromolecular Rapid Communications HIGHLIGHT: Yet this multiscale flexibility also creates a vast and complex design space, where the interplay of monomer choice, polymer architecture, and processing methods makes it impossible to exhaustively map their structure–property relationships by empirical means. In this perspective, we review the development of recent design strategies in LEPs, highlighting the key experimental challenges they reveal, and discuss how data‐driven approaches, particularly machine learning, can help navigate this complexity and accelerate the discovery and optimization of next‐generation LEPs.
84, TITLE: IDESplat: Iterative Depth Probability Estimation for Generalizable 3D Gaussian Splatting AUTHORS: WEI LONG et al. CATEGORY: cs.CV [cs.CV, cs.AI] HIGHLIGHT: Existing methods typically rely solely on a single warp to estimate depth probability, which hinders their ability to fully leverage cross-view geometric cues, resulting in unstable and coarse depth maps. To address this limitation, we propose IDESplat, which iteratively applies warp operations to boost depth probability estimation for accurate Gaussian mean prediction.
85, TITLE: Neural Network–based Approach for Improving The Evaluation of Antibody–antigen Docking Poses AUTHORS: Alessandro Meta ; Giancarlo Ruocco ; Edoardo Milanetti SOURCE: Frontiers in Physics HIGHLIGHT: Here, we present a protocol based on multiple minimal neural network (NN)–based approaches, trained on a set of carefully selected physicochemical features, to discriminate docking decoy poses (structurally distant from the experimental complex) from native-like poses (structurally close to the native conformation) within a specific class of biologically relevant protein–protein complexes, namely antibody–antigen systems in which the antigen is a protein.
88, TITLE: I2E: From Image Pixels to Actionable Interactive Environments for Text-Guided Image Editing AUTHORS: JINGHAN YU et al. CATEGORY: cs.CV [cs.CV] HIGHLIGHT: This paradigm is severely limited by 1) the implicit coupling of planning and execution, 2) the lack of object-level control granularity, and 3) the reliance on unstructured, pixel-centric modeling. To address these limitations, we propose I2E, a novel "Decompose-then-Action" paradigm that revisits image editing as an actionable interaction process within a structured environment.
91, TITLE: Physics-Informed Gaussian Process Regression for The Constitutive Modeling of Concrete: A Data-Driven Improvement to Phenomenological Models AUTHORS: CHENYANG LI et al. CATEGORY: cs.LG [cs.LG, cond-mat.mtrl-sci] HIGHLIGHT: Understanding and modeling the constitutive behavior of concrete is crucial for civil and defense applications, yet widely used phenomenological models such as the Karagozian & Case concrete (KCC) model depend on empirically calibrated failure surfaces that lack flexibility in model form and associated uncertainty quantification. This work develops a physics-informed framework that retains the modular elastoplastic structure of the KCC model while replacing its empirical failure surface with a constrained Gaussian Process Regression (GPR) surrogate that can be learned directly from experimentally accessible observables.
93, TITLE: Performance Analysis of Explainable Deep Learning-Based Intrusion Detection Systems for IoT Networks: A Systematic Review AUTHORS: Taiwo Blessing Ogunseyi ; Gogulakrishan Thiyagarajan ; Honggang He ; Vinay Bist ; Zhengcong Du SOURCE: Sensors HIGHLIGHT: Although explainable artificial intelligence (XAI) has been increasingly adopted to enhance interpretability, its impact on detection performance and computational efficiency in resource-constrained IoT environments remains insufficiently understood. This systematic review investigates the performance of an explainable deep learning-based IDS for IoT networks by analyzing trade-offs among detection accuracy, computational overhead, and explanation quality.
95, TITLE: Computational Learning Theories: A Mixed-Methods Framework for AI Enhanced Educational Research AUTHORS: David Gibson ; Dirk Ifenthaler SOURCE: International Journal of Technology in Teaching and Learning HIGHLIGHT: This article proposes a computational mixed-methods approach as a necessary evolution in educational research methodology, encompassing a three-level hierarchical framework that integrates individual, social, and cultural learning processes through network-based modeling.
99, TITLE: GAMBIT: A Gamified Jailbreak Framework for Multimodal Large Language Models AUTHORS: Xiangdong Hu ; Yangyang Jiang ; Qin Hu ; Xiaojun Jia CATEGORY: cs.CV [cs.CV] HIGHLIGHT: If a model can think like a human, can we influence its cognitive-stage decisions so that it proactively completes a jailbreak? To validate this idea, we propose GAMBIT (Gamified Adversarial Multimodal Breakout via Instructional Traps), a novel multimodal jailbreak framework that decomposes and reassembles harmful visual semantics, then constructs a gamified scene that drives the model to explore, reconstruct intent, and answer as part of winning the game.