Paper Digest: NAACL 2025 Papers & Highlights
Note: NAACL-2025 accepted more than 700 papers; this page includes only 500 of them, selected by our daily paper digest algorithm. Interested users can choose to read all ~800 NAACL-2025 papers on a separate page, which takes quite some time to load.
To search for papers presented at NAACL-2025 on a specific topic, use the search by venue (NAACL-2025) service. To summarize the latest research published at NAACL-2025 on a specific topic, use the review by venue (NAACL-2025) service. If you are interested in browsing papers by author, we have a comprehensive list of ~3,800 authors (NAACL-2025). Additionally, you may want to explore our “Best Paper” Digest (NAACL), which lists the most influential NAACL papers since 2000.
We’ve developed a service, NAACL-2025 Research, that synthesizes the latest findings from NAACL 2025 into comprehensive reports. For instance, we’ve generated a report on Advances in Question Answering: Insights from NAACL 2025 Papers. We encourage interested users to use this service to create tailored reports on other emerging topics.
As a pioneer in the field since 2018, Paper Digest has curated thousands of such lists, drawing on years of accumulated data across decades of conferences and research topics. To ensure you never miss a breakthrough, our daily service sifts through tens of thousands of new papers, clinical trials, news articles, and community posts every day, delivering only what matters most to your specific interests. Beyond discovery, Paper Digest offers built-in research tools to help users read articles, write articles, get answers, conduct literature reviews, and generate research reports more efficiently.
Paper Digest Team
New York City, New York, 10017
TABLE 1: Paper Digest: NAACL 2025 Papers & Highlights
| # | Paper | Author(s) |
|---|---|---|
| 1 | Vision-Language Models Can Self-Improve Reasoning Via Reflection. Highlight: To this end, we propose a simple yet effective self-training framework, R3V, which iteratively enhances the model’s Vision-language Reasoning by Reflecting on CoT Rationales. | Kanzhi Cheng; Li YanTao; Fangzhi Xu; Jianbing Zhang; Hao Zhou; Yang Liu; |
| 2 | FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions. Highlight: In this work, we study the use of instructions in IR systems. | Orion Weller; Benjamin Chang; Sean MacAvaney; Kyle Lo; Arman Cohan; Benjamin Van Durme; Dawn Lawrie; Luca Soldaini; |
| 3 | Benchmarking Distributional Alignment of Large Language Models. Highlight: This notion of distributional alignment is complex, as there is significant variation in the types of attributes that are simulated. Prior works have underexplored the role of three critical variables—the question domain, steering method, and distribution expression method—which motivates our contribution of a benchmark explicitly addressing these dimensions. | Nicole Meister; Carlos Guestrin; Tatsunori Hashimoto; |
| 4 | Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward. Highlight: While previous studies have explored using large multimodal models (LMMs) as reward models for guiding preference modeling, their ability to accurately assess the quality of generated responses and their alignment with video content has not been conclusively demonstrated. This paper introduces a novel framework that utilizes detailed video captions as a proxy of video content, enabling language models to incorporate this information as supporting evidence for scoring video Question Answering (QA) predictions. | Ruohong Zhang; Liangke Gui; Zhiqing Sun; Yihao Feng; Keyang Xu; Yuanhan Zhang; Di Fu; Chunyuan Li; Alexander G Hauptmann; Yonatan Bisk; Yiming Yang; |
| 5 | The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism. Highlight: This limits our understanding of LLM performance variability in real-world applications. Our study addresses this issue by exploring key questions about the performance differences between greedy decoding and sampling, identifying benchmarks’ consistency regarding non-determinism, and examining unique model behaviors. | Yifan Song; Guoyin Wang; Sujian Li; Bill Yuchen Lin; |
| 6 | The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models. Highlight: Additionally, these benchmarks tend to focus disproportionately on specific capabilities such as instruction following, leading to coverage bias. To overcome these limitations, we introduce the BiGGen Bench, a principled generation benchmark designed to thoroughly evaluate nine distinct capabilities of LMs across 77 diverse tasks. | Seungone Kim; Juyoung Suk; Ji Yong Cho; Shayne Longpre; Chaeeun Kim; Dongkeun Yoon; Guijin Son; Yejin Cho; Sheikh Shafayat; Jinheon Baek; Sue Hyun Park; Hyeonbin Hwang; Jinkyung Jo; Hyowon Cho; Haebin Shin; Seongyun Lee; Hanseok Oh; Noah Lee; Namgyu Ho; Se June Joo; Miyoung Ko; Yoonjoo Lee; Hyungjoo Chae; Jamin Shin; Joel Jang; Seonghyeon Ye; Bill Yuchen Lin; Sean Welleck; Graham Neubig; Moontae Lee; Kyungjae Lee; Minjoon Seo; |
| 7 | Design2Code: Benchmarking Multimodal Code Generation for Automated Front-End Engineering. Highlight: This can enable a new paradigm of front-end development in which multimodal large language models (MLLMs) directly convert visual designs into code implementations. In this work, we construct Design2Code – the first real-world benchmark for this task. | Chenglei Si; Yanzhe Zhang; Ryan Li; Zhengyuan Yang; Ruibo Liu; Diyi Yang; |
| 8 | Cross-lingual Transfer of Reward Models in Multilingual Alignment. Highlight: In this work, we investigate the cross-lingual transfer of RMs trained in diverse languages, primarily from English. | Jiwoo Hong; Noah Lee; Rodrigo Martínez-Castaño; César Rodríguez; James Thorne; |
| 9 | Unfamiliar Finetuning Examples Control How Language Models Hallucinate. Highlight: Large language models are known to hallucinate, but the underlying mechanisms that govern how models hallucinate are not yet fully understood. In this work, we find that unfamiliar examples in the models’ finetuning data – those that introduce concepts beyond the base model’s scope of knowledge – are crucial in shaping these errors. | Katie Kang; Eric Wallace; Claire Tomlin; Aviral Kumar; Sergey Levine; |
| 10 | In-Context Learning with Long-Context Models: An In-Depth Exploration. Highlight: We study the behavior of in-context learning (ICL) at this extreme scale on multiple datasets and models. | Amanda Bertsch; Maor Ivgi; Emily Xiao; Uri Alon; Jonathan Berant; Matthew R. Gormley; Graham Neubig; |
| 11 | CartesianMoE: Boosting Knowledge Sharing Among Experts Via Cartesian Product Routing in Mixture-of-Experts. Highlight: In this paper, inspired by collective matrix factorization to learn shared knowledge among data, we propose CartesianMoE, which implements more effective knowledge sharing among experts in a multiplication-like manner. | Zhenpeng Su; Xing W; Zijia Lin; Yizhe Xiong; Minxuan Lv; Guangyuan Ma; Hui Chen; Songlin Hu; Guiguang Ding; |
| 12 | Stronger Models Are Not Always Stronger Teachers for Instruction Tuning. Highlight: However, existing approaches typically assume that larger or stronger models are stronger teachers for instruction tuning, and hence simply adopt larger models as response generators to the synthetic instructions. In this paper, we challenge this commonly-adopted assumption. | Zhangchen Xu; Fengqing Jiang; Luyao Niu; Bill Yuchen Lin; Radha Poovendran; |
| 13 | Mitigating Hallucinations in Multi-modal Large Language Models Via Image Token Attention-Guided Decoding. Highlight: In this paper, we delve into the intrinsic characteristics of hallucination from the perspective of interaction between input and output tokens. | Xinhao Xu; Hui Chen; Mengyao Lyu; Sicheng Zhao; Yizhe Xiong; Zijia Lin; Jungong Han; Guiguang Ding; |
| 14 | Representing Rule-based Chatbots with Transformers. Highlight: In this work, we propose using ELIZA, a classic rule-based chatbot, as a setting for formal, mechanistic analysis of Transformer-based chatbots. | Dan Friedman; Abhishek Panigrahi; Danqi Chen; |
| 15 | Towards Knowledge Checking in Retrieval-augmented Generation: A Representation Perspective. Highlight: This work aims to provide a systematic study on knowledge checking in RAG systems. | Shenglai Zeng; Jiankun Zhang; Bingheng Li; Yuping Lin; Tianqi Zheng; Dante Everaert; Hanqing Lu; Hui Liu; Hui Liu; Yue Xing; Monica Xiao Cheng; Jiliang Tang; |
| 16 | ComPO: Community Preferences for Language Model Personalization. Highlight: Recent studies have raised concerns that aggregating such diverse and often contradictory human feedback to finetune models results in generic models that generate outputs not preferred by many user groups, as they tend to average out styles and norms. To address this issue, we draw inspiration from recommendation systems and propose ComPO, a method to personalize preference optimization in LMs by contextualizing the probability distribution of model outputs with the preference provider. | Sachin Kumar; Chan Young Park; Yulia Tsvetkov; Noah A. Smith; Hannaneh Hajishirzi; |
| 17 | From Distributional to Overton Pluralism: Investigating Large Language Model Alignment. Highlight: We analyze two aspects of post-alignment distributional shift of LLM responses. | Thom Lake; Eunsol Choi; Greg Durrett; |
| 18 | Extracting and Understanding The Superficial Knowledge in Alignment. Highlight: This leads to the question: Is alignment predominantly superficial? In this paper, we delve into this question and provide a quantitative analysis. | Runjin Chen; Gabriel Jacob Perin; Xuxi Chen; Xilun Chen; Yan Han; Nina S. T. Hirata; Junyuan Hong; Bhavya Kailkhura; |
| 19 | A Probabilistic Framework for LLM Hallucination Detection Via Belief Tree Propagation. Highlight: We describe Belief Tree Propagation (BTProp), a probabilistic framework for LLM hallucination detection. | Bairu Hou; Yang Zhang; Jacob Andreas; Shiyu Chang; |
| 20 | Fact, Fetch, and Reason: A Unified Evaluation of Retrieval-Augmented Generation. Highlight: To this end, we propose FRAMES (Factuality, Retrieval, And reasoning MEasurement Set), a high-quality evaluation dataset designed to test LLMs’ ability to provide factual responses, assess retrieval capabilities, and evaluate the reasoning required to generate final answers. | Satyapriya Krishna; Kalpesh Krishna; Anhad Mohananey; Steven Schwarcz; Adam Stambler; Shyam Upadhyay; Manaal Faruqui; |
| 21 | ResearchAgent: Iterative Research Idea Generation Over Scientific Literature with Large Language Models. Highlight: Meanwhile, novel, impactful research often stems from both a deep understanding of prior work, and a cross-pollination of ideas across domains and fields. To enhance the productivity of researchers, we propose ResearchAgent, which leverages the encyclopedic knowledge and linguistic reasoning capabilities of Large Language Models (LLMs) to assist them in their work. | Jinheon Baek; Sujay Kumar Jauhar; Silviu Cucerzan; Sung Ju Hwang; |
| 22 | Racing Thoughts: Explaining Contextualization Errors in Large Language Models. Highlight: For example, a model may incorrectly respond “Yes” if it has not properly contextualized “bank” as a geographical feature rather than a financial institution. We propose the LLM Race Conditions Hypothesis as an explanation of contextualization errors of this form. | Michael A. Lepori; Michael Curtis Mozer; Asma Ghandeharioun; |
| 23 | CodeTree: Agent-guided Tree Search for Code Generation with Large Language Models. Highlight: However, on challenging coding tasks with extremely large search space, current agentic approaches still struggle with multi-stage planning, generating, and debugging. To address this problem, we propose CodeTree, a framework for LLM agents to efficiently explore the search space in different stages of the code generation process. | Jierui Li; Hung Le; Yingbo Zhou; Caiming Xiong; Silvio Savarese; Doyen Sahoo; |
| 24 | IdentifyMe: A Challenging Long-Context Mention Resolution Benchmark for LLMs. Highlight: Recent evaluations of LLMs on coreference resolution have revealed that traditional output formats and evaluation metrics do not fully capture the models’ referential understanding. To address this, we introduce IdentifyMe, a new benchmark for mention resolution presented in a multiple-choice question (MCQ) format, commonly used for evaluating LLMs. | Kawshik Manikantan; Makarand Tapaswi; Vineet Gandhi; Shubham Toshniwal; |
| 25 | AEGIS2.0: A Diverse AI Safety Dataset and Risks Taxonomy for Alignment of LLM Guardrails. Highlight: Currently, there is a clear lack of high-quality, human-annotated datasets that address the full spectrum of LLM-related safety risks and are usable for commercial applications. To bridge this gap, we propose a comprehensive and adaptable taxonomy for categorizing safety risks, structured into 12 top-level hazard categories with an extension to 9 fine-grained subcategories. | Shaona Ghosh; Prasoon Varshney; Makesh Narsimhan Sreedhar; Aishwarya Padmakumar; Traian Rebedea; Jibin Rajan Varghese; Christopher Parisien; |
| 26 | What Did I Do Wrong? Quantifying LLMs’ Sensitivity and Consistency to Prompt Engineering. Highlight: Developers who want to include these models in their software stack, however, face a dreadful challenge: debugging LLMs’ inconsistent behavior across minor variations of the prompt. We therefore introduce two metrics for classification tasks, namely *sensitivity* and *consistency*, which are complementary to task performance. | Federico Errica; Davide Sanvito; Giuseppe Siracusano; Roberto Bifulco; |
| 27 | On The Analysis and Distillation of Emergent Outlier Properties in Pre-trained Language Models. Highlight: We show that emergent outlier dimensions contribute significantly more to zero-shot performance than non-outlier dimensions. Based on this, we propose the Emergent Outlier Focused Distillation (EOFD) method, which prioritizes critical outlier dimensions in distillation using a weighted MSE loss. | Tianyang Zhao; Kunwar Yashraj Singh; Srikar Appalaraju; Peng Tang; Ying Nian Wu; Li Erran Li; |
| 28 | Large Language Models Share Representations of Latent Grammatical Concepts Across Typologically Diverse Languages. Highlight: In this work, we explore the extent to which LLMs share representations of morphosyntactic concepts such as grammatical number, gender, and tense across languages. | Jannik Brinkmann; Chris Wendler; Christian Bartelt; Aaron Mueller; |
| 29 | KMMLU: Measuring Massive Multitask Language Understanding in Korean. Highlight: We propose KMMLU, a Korean benchmark with 35,030 expert-level multiple-choice questions across 45 subjects ranging from humanities to STEM. | Guijin Son; Hanwool Lee; Sungdong Kim; Seungone Kim; Niklas Muennighoff; Taekyoon Choi; Cheonbok Park; Kang Min Yoo; Stella Biderman; |
| 30 | DrawEduMath: Evaluating Vision Language Models with Expert-Annotated Students’ Hand-Drawn Math Images. Highlight: For example, K-12 educators using digital learning platforms may need to examine and provide feedback across many images of students’ math work. To assess the potential of VLMs to support educators in settings like this one, we introduce DrawEduMath, an English-language dataset of 2,030 images of students’ handwritten responses to K-12 math problems. | Sami Baral; Li Lucy; Ryan Knight; Alice Ng; Luca Soldaini; Neil Heffernan; Kyle Lo; |
| 31 | Are We Done with MMLU? Highlight: For example, we find that 57% of the analysed questions in the Virology subset contain errors. To address this issue, we introduce a comprehensive framework for identifying dataset errors using a novel error annotation protocol. | Aryo Pradipta Gema; Joshua Ong Jun Leang; Giwon Hong; Alessio Devoto; Alberto Carlo Maria Mancino; Rohit Saxena; Xuanli He; Yu Zhao; Xiaotang Du; Mohammad Reza Ghasemi Madani; Claire Barale; Robert McHardy; Joshua Harris; Jean Kaddour; Emile Van Krieken; Pasquale Minervini; |
| 32 | A Distributional Perspective on Word Learning in Neural Language Models. Highlight: Thus, we propose an array of signatures that improve on earlier approaches by capturing knowledge of both where the target word can and cannot occur as well as gradient preferences about the word’s appropriateness. | Filippo Ficarra; Ryan Cotterell; Alex Warstadt; |
| 33 | Is In-Context Learning A Type of Error-Driven Learning? Evidence from The Inverse Frequency Effect in Structural Priming. Highlight: In this paper, we introduce a new way of diagnosing whether ICL is functionally performing error-driven learning. | Zhenghao Zhou; Robert Frank; R. Thomas McCoy; |
| 34 | Incremental Sentence Processing Mechanisms in Autoregressive Transformer Language Models. Highlight: However, the features they use to incrementally process their linguistic input are not well understood. In this paper, we fill this gap by studying the mechanisms underlying garden path sentence processing in LMs. | Michael Hanna; Aaron Mueller; |
| 35 | High-Dimension Human Value Representation in Large Language Models. Highlight: We propose UniVar, a high-dimensional neural representation of symbolic human value distributions in LLMs, orthogonal to model architecture and training data. | Samuel Cahyawijaya; Delong Chen; Yejin Bang; Leila Khalatbari; Bryan Wilie; Ziwei Ji; Etsuko Ishii; Pascale Fung; |
| 36 | Towards Automatic Evaluation for Image Transcreation. Highlight: Attempts to define this as a formal Machine Learning (ML) problem have been impeded by the lack of automatic evaluation mechanisms, with previous work relying solely on human evaluation. In this paper, we seek to close this gap by proposing a suite of automatic evaluation metrics inspired by machine translation (MT) metrics, categorized into: a) Object-based, b) Embedding-based, and c) VLM-based. | Simran Khanuja; Vivek Iyer; Xiaoyu He; Graham Neubig; |
| 37 | Rationale-Guided Retrieval Augmented Generation for Medical Question Answering. Highlight: In this study, we present RAG2 (RAtionale-Guided RAG), a new framework for enhancing the reliability of RAG in biomedical contexts. | Jiwoong Sohn; Yein Park; Chanwoong Yoon; Sihyeon Park; Hyeon Hwang; Mujeen Sung; Hyunjae Kim; Jaewoo Kang; |
| 38 | ImgTrojan: Jailbreaking Vision-Language Models with ONE Image. Highlight: In this paper, we propose a novel jailbreaking attack against VLMs, aiming to bypass their safety barrier when a user inputs harmful instructions. | Xijia Tao; Shuai Zhong; Lei Li; Qi Liu; Lingpeng Kong; |
| 39 | Language Models Encode Numbers Using Digit Representations in Base 10. Highlight: A natural hypothesis is that these errors stem from how LLMs represent numbers, and specifically, whether their representations of numbers capture their numeric values. We tackle this question from the observation that LLM errors on numerical tasks are often distributed across the digits of the answer rather than normally around its numeric value. | Amit Arnold Levy; Mor Geva; |
| 40 | From Generating Answers to Building Explanations: Integrating Multi-Round RAG and Causal Modeling for Scientific QA. Highlight: Inspired by findings from the social sciences, we present an implemented causal QA approach that combines iterative RAG with guidance from a formal model of causation. | Victor Barres; Clifton James McFate; Aditya Kalyanpur; Kailash Karthik Saravanakumar; Lori Moon; Natnael Seifu; Abraham Bautista-Castillo; |
| 41 | XLAM: A Family of Large Action Models to Empower AI Agent Systems. Highlight: We introduce xLAM, a series of large action models designed for AI agent tasks. | Jianguo Zhang; Tian Lan; Ming Zhu; Zuxin Liu; Thai Quoc Hoang; Shirley Kokane; Weiran Yao; Juntao Tan; Akshara Prabhakar; Haolin Chen; Zhiwei Liu; Yihao Feng; Tulika Manoj Awalgaonkar; Rithesh R N; Zeyuan Chen; Ran Xu; Juan Carlos Niebles; Shelby Heinecke; Huan Wang; Silvio Savarese; Caiming Xiong; |
| 42 | K-Level Reasoning: Establishing Higher Order Beliefs in Large Language Models for Strategic Reasoning. Highlight: Inspired by the Level-K framework from game theory and behavioral economics, which extends reasoning from simple reactions to structured strategic depth, we propose a novel framework: “K-Level Reasoning with Large Language Models (K-R).” | Yadong Zhang; Shaoguang Mao; Tao Ge; Xun Wang; Yan Xia; Man Lan; Furu Wei; |
| 43 | Iterative Self-Tuning LLMs for Enhanced Jailbreaking Capabilities. Highlight: Current methods for generating these suffixes are computationally expensive and have low Attack Success Rates (ASR), especially against well-aligned models like Llama2 and Llama3. To overcome these limitations, we introduce **ADV-LLM**, an iterative self-tuning process that crafts adversarial LLMs with enhanced jailbreak ability. | Chung-En Sun; Xiaodong Liu; Weiwei Yang; Tsui-Wei Weng; Hao Cheng; Aidan San; Michel Galley; Jianfeng Gao; |
| 44 | Self-Debiasing Large Language Models: Zero-Shot Recognition and Reduction of Stereotypes. Highlight: In this work, we leverage the zero-shot capabilities of LLMs to reduce stereotyping in a technique we introduce as zero-shot self-debiasing. | Isabel O. Gallegos; Ryan Aponte; Ryan A. Rossi; Joe Barrow; Mehrab Tanjim; Tong Yu; Hanieh Deilamsalehy; Ruiyi Zhang; Sungchul Kim; Franck Dernoncourt; Nedim Lipka; Deonna Owens; Jiuxiang Gu; |
| 45 | Decoding Hate: Exploring Language Models’ Reactions to Hate Speech. Highlight: We explore the reactions of language models (GPT-3.5, GPT-4, and Gemini Pro) to hate speech. Through qualitative analysis, we aim to reveal the spectrum of responses these models produce, highlighting their capacity to handle hate speech inputs. | Paloma Piot; Javier Parapar; |
| 46 | From Evidence to Belief: A Bayesian Epistemology Approach to Language Models. Highlight: This paper investigates the knowledge of language models from the perspective of Bayesian epistemology. | Minsu Kim; Sangryul Kim; James Thorne; |
| 47 | UFO: A UI-Focused Agent for Windows OS Interaction. Highlight: We introduce UFO, a UI-Focused agent designed to fulfill user requests tailored to Windows OS applications by observing and analyzing the GUI and control information of these applications. | Chaoyun Zhang; Liqun Li; Shilin He; Xu Zhang; Bo Qiao; Si Qin; Minghua Ma; Yu Kang; Qingwei Lin; Saravan Rajmohan; Dongmei Zhang; Qi Zhang; |
| 48 | Is Your LLM Outdated? A Deep Look at Temporal Generalization. Highlight: This paper introduces the concept of temporal generalization in LLMs, including bias in past and future generalizations. | Chenghao Zhu; Nuo Chen; Yufei Gao; Yunyi Zhang; Prayag Tiwari; Benyou Wang; |
| 49 | REL-A.I.: An Interaction-Centered Approach To Measuring Human-LM Reliance. Highlight: We introduce an interaction-centered evaluation approach called Rel-A.I. | Kaitlyn Zhou; Jena D. Hwang; Xiang Ren; Nouha Dziri; Dan Jurafsky; Maarten Sap; |
| 50 | A Top-down Graph-based Tool for Modeling Classical Semantic Maps: A Case Study of Supplementary Adverbs. Highlight: In this paper, we propose a novel graph-based algorithm that automatically generates conceptual spaces and SMMs in a top-down manner. | Zhu Liu; Cunliang Kong; Ying Liu; Maosong Sun; |
| 51 | DreamSync: Aligning Text-to-Image Generation with Image Understanding Feedback. Highlight: We introduce DreamSync, a simple yet effective training algorithm that improves T2I models to be faithful to the text input. | Jiao Sun; Deqing Fu; Yushi Hu; Su Wang; Royi Rassin; Da-Cheng Juan; Dana Alon; Charles Herrmann; Sjoerd Van Steenkiste; Ranjay Krishna; Cyrus Rashtchian; |
| 52 | MEDA: Dynamic KV Cache Allocation for Efficient Multimodal Long-Context Inference. Highlight: We propose MEDA, a novel approach specifically designed for the complexities of multimodal settings, dynamically allocating KV cache sizes based on attention entropy to better adapt to multimodal interactions. | Zhongwei Wan; Hui Shen; Xin Wang; Che Liu; Zheda Mai; Mi Zhang; |
| 53 | Navigating The Path of Writing: Outline-guided Text Generation with Large Language Models. Highlight: We propose WritingPath, a framework that uses explicit outlines to guide LLMs in generating goal-oriented, high-quality text. | Yukyung Lee; Soonwon Ka; Bokyung Son; Pilsung Kang; Jaewook Kang; |
| 54 | Lost in Inference: Rediscovering The Role of Natural Language Inference for Large Language Models. Highlight: In this paper, we investigate whether NLI tasks, which are rarely used for LLM evaluation, can still be informative for evaluating LLMs. | Lovish Madaan; David Esiobu; Pontus Stenetorp; Barbara Plank; Dieuwke Hupkes; |
| 55 | Seq1F1B: Efficient Sequence-Level Pipeline Parallelism for Large Language Model Training. Highlight: This paper introduces a sequence-level one-forward-one-backward (1F1B) PP method, named Seq1F1B, tailored for training LLMs on long sequences with high training throughput and memory efficiency. | Sun Ao; Weilin Zhao; Xu Han; Cheng Yang; Xinrong Zhang; Zhiyuan Liu; Chuan Shi; Maosong Sun; |
| 56 | CRMArena: Understanding The Capacity of LLM Agents to Perform Professional CRM Tasks in Realistic Environments. Highlight: However, deploying and evaluating these agents is challenging due to the lack of realistic benchmarks that reflect the complexity of real-world CRM tasks. To address this issue, we introduce CRMArena, a novel benchmark designed to evaluate AI agents on realistic tasks grounded in professional work environments. | Kung-Hsiang Huang; Akshara Prabhakar; Sidharth Dhawan; Yixin Mao; Huan Wang; Silvio Savarese; Caiming Xiong; Philippe Laban; Chien-Sheng Wu; |
| 57 | Multilingual Machine Translation with Open Large Language Models at Practical Scale: An Empirical Study. Highlight: In this paper, we systematically explore the abilities of open LLMs with less than ten billion parameters to handle multilingual machine translation (MT) tasks. | Menglong Cui; Pengzhi Gao; Wei Liu; Jian Luan; Bin Wang; |
| 58 | Generative Prompt Internalization. Highlight: Prompts used in recent large language model based applications are often fixed and lengthy, leading to significant computational overhead. To address this challenge, we propose Generative Prompt Internalization (GenPI), a lightweight method that employs a joint training approach. | Haebin Shin; Lei Ji; Yeyun Gong; Sungdong Kim; Eunbi Choi; Minjoon Seo; |
| 59 | ParaICL: Towards Parallel In-Context Learning. Highlight: Moreover, varying combinations of few-shot demonstration examples can significantly boost accuracy across different test samples. To address this, we propose a novel method named parallel in-context learning (ParaICL) that effectively utilizes all demonstration examples without exceeding the manageable input context length. | Xingxuan Li; Xuan-Phi Nguyen; Shafiq Joty; Lidong Bing; |
| 60 | CVE-Bench: Benchmarking LLM-based Software Engineering Agent’s Ability to Repair Real-World CVE Vulnerabilities Highlight: To this end, we introduce CVE-Bench, an evaluation framework consisting of 509 Common Vulnerabilities and Exposures (CVEs) from four programming languages and 120 popular open-source repositories. |
Peiran Wang; Xiaogeng Liu; Chaowei Xiao; |
| 61 | EASYTOOL: Enhancing LLM-based Agents with Concise Tool Instruction Highlight: Current LLMs exhibit satisfactory instruction-following capabilities based on the instruction-following fine-tuning process. Motivated by this, in this paper, we introduce EASYTOOL, a framework that transforms diverse and lengthy tool documentation into a unified and concise tool instruction to fully leverage the instruction-following capabilities of LLMs for easier tool usage. |
Siyu Yuan; Kaitao Song; Jiangjie Chen; Xu Tan; Yongliang Shen; Kan Ren; Dongsheng Li; Deqing Yang; |
| 62 | EvoAgent: Towards Automatic Multi-Agent Generation Via Evolutionary Algorithms Highlight: In this paper, we introduce EVOAGENT, a generic method to automatically extend specialized agents to multi-agent systems via the evolutionary algorithm, thereby improving the effectiveness of LLM-based agents in solving tasks. |
Siyu Yuan; Kaitao Song; Jiangjie Chen; Xu Tan; Dongsheng Li; Deqing Yang; |
| 63 | An Efficient Gloss-Free Sign Language Translation Using Spatial Configurations and Motion Dynamics with LLMs Highlight: By contrast, we emphasize the importance of capturing the spatial configurations and motion dynamics in sign language. With this in mind, we introduce Spatial and Motion-based Sign Language Translation (SpaMo), a novel LLM-based SLT framework. |
Eui Jun Hwang; Sukmin Cho; Junmyeong Lee; Jong C. Park; |
| 64 | Towards Reliable and Practical Phishing Detection Highlight: With recent advances in AI, we discuss how to construct a reliable and practical phishing detection system using language models. For this system, we introduce the first large-scale Korean dataset for phishing detection, encompassing six types of phishing attacks. |
Hyowon Cho; Minjoon Seo; |
| 65 | FedSpaLLM: Federated Pruning of Large Language Models Highlight: To address the challenge of pruning LLMs in privacy-preserving settings, we propose FedSpaLLM, the first federated learning framework designed specifically for pruning LLMs. |
Guangji Bai; Yijiang Li; Zilinghan Li; Liang Zhao; Kibaek Kim; |
| 66 | Babysit A Language Model From Scratch: Interactive Language Learning By Trials and Demonstrations Highlight: We introduce a trial-and-demonstration (TnD) learning framework that incorporates three distinct components: student trials, teacher demonstrations, and a reward conditioned on language competence at various developmental stages. |
Ziqiao Ma; Zekun Wang; Joyce Chai; |
| 67 | Breaking Down Power Barriers in On-Device Streaming ASR: Insights and Solutions Highlight: We found that the influence of these parameters on power consumption varies depending on factors such as invocation frequency and memory allocation. Leveraging these insights, we propose design principles that enhance on-device speech recognition models by reducing power consumption with minimal impact on accuracy. |
Yang Li; Yuan Shangguan; Yuhao Wang; Liangzhen Lai; Ernie Chang; Changsheng Zhao; Yangyang Shi; Vikas Chandra; |
| 68 | Model Surgery: Modulating LLM’s Behavior Via Simple Parameter Editing Highlight: In this paper, we observe that, surprisingly, directly editing a small subset of parameters can effectively modulate specific behaviors of LLMs, such as detoxification and resistance to jailbreaking, with only inference-level computational resources. |
Huanqian Wang; Yang Yue; Rui Lu; Jingxin Shi; Andrew Zhao; Shenzhi Wang; Shiji Song; Gao Huang; |
| 69 | Prompt Compression for Large Language Models: A Survey Highlight: Leveraging large language models (LLMs) for complex natural language tasks typically requires long-form prompts to convey detailed requirements and information, which results in increased memory usage and inference costs. To mitigate these challenges, multiple efficient methods have been proposed, with prompt compression gaining significant research interest. |
Zongqian Li; Yinhong Liu; Yixuan Su; Nigel Collier; |
| 70 | CSR-Bench: Benchmarking LLM Agents in Deployment of Computer Science Research Repositories Highlight: To evaluate the effectiveness of LLMs in handling complex code development tasks of research projects, particularly for NLP/CV/AI/ML/DM topics, we introduce CSR-Bench, a benchmark for Computer Science Research projects. |
Yijia Xiao; Runhui Wang; Luyang Kong; Davor Golac; Wei Wang; |
| 71 | Complete Chess Games Enable LLM Become A Chess Master Highlight: Here, we propose the large language model ChessLLM to play full chess games. |
Yinqi Zhang; Xintian Han; Haolong Li; Kedi Chen; Shaohui Lin; |
| 72 | HIGGS: Pushing The Limits of Large Language Model Quantization Via The Linearity Theorem Highlight: In this paper, we present a “linearity theorem” establishing a direct relationship between the layer-wise reconstruction error and the model perplexity increase due to quantization. |
Vladimir Malinovskii; Andrei Panferov; Ivan Ilin; Han Guo; Peter Richtárik; Dan Alistarh; |
| 73 | GuideLLM: Exploring LLM-Guided Conversation with Applications in Autobiography Interviewing Highlight: For automatic evaluation, we derive user proxies from multiple autobiographies and employ LLM-as-a-judge to score LLM behaviors. |
Jinhao Duan; Xinyu Zhao; Zhuoxuan Zhang; Eunhye Grace Ko; Lily Boddy; Chenan Wang; Tianhao Li; Alexander Rasgon; Junyuan Hong; Min Kyung Lee; Chenxi Yuan; Qi Long; Ying Ding; Tianlong Chen; Kaidi Xu; |
| 74 | AgentSense: Benchmarking Social Intelligence of Language Agents Through Interactive Scenarios Highlight: To this end, we introduce AgentSense: Benchmarking Social Intelligence of Language Agents through Interactive Scenarios. |
Xinyi Mou; Jingcong Liang; Jiayu Lin; Xinnong Zhang; Xiawei Liu; Shiyue Yang; Rong Ye; Lei Chen; Haoyu Kuang; Xuanjing Huang; Zhongyu Wei; |
| 75 | Padding Tone: A Mechanistic Analysis of Padding Tokens in T2I Models Highlight: In this work, we conduct the first in-depth analysis of the role padding tokens play in T2I models. |
Michael Toker; Ido Galil; Hadas Orgad; Rinon Gal; Yoad Tewel; Gal Chechik; Yonatan Belinkov; |
| 76 | Reward-Guided Tree Search for Inference Time Alignment of Large Language Models Highlight: In this work, we propose DARWIN, an inference-time alignment method that leverages the guidance of a reward model to achieve alignment through reward-guided tree search. |
Chia-Yu Hung; Navonil Majumder; Ambuj Mehrish; Soujanya Poria; |
| 77 | Language Models Largely Exhibit Human-like Constituent Ordering Preferences Highlight: One prominent theory presents the notion that constituent ordering is directly correlated with constituent weight: a measure of the constituent’s length or complexity. |
Ada Tur; Gaurav Kamath; Siva Reddy; |
| 78 | The LLM Language Network: A Neuroscientific Approach for Identifying Causally Task-Relevant Units Highlight: We identify language-selective units within 18 popular LLMs, using the same localization approach that is used in neuroscience. |
Badr AlKhamissi; Greta Tuckute; Antoine Bosselut; Martin Schrimpf; |
| 79 | Superlatives in Context: Modeling The Implicit Semantics of Superlatives Highlight: In this work we provide an extensive computational study on the semantics of superlatives. |
Valentina Pyatkin; Bonnie Webber; Ido Dagan; Reut Tsarfaty; |
| 80 | SLM-Mod: Small Language Models Surpass LLMs at Content Moderation Highlight: Large language models (LLMs) have shown promise in many natural language understanding tasks, including content moderation. |
Xianyang Zhan; Agam Goyal; Yilun Chen; Eshwar Chandrasekharan; Koustuv Saha; |
| 81 | Reverse Thinking Makes LLMs Stronger Reasoners Highlight: To enable Large Language Models (LLMs) to perform reverse thinking, we introduce Reverse-Enhanced Thinking (RevThink), a framework composed of data augmentation and learning objectives. |
Justin Chen; Zifeng Wang; Hamid Palangi; Rujun Han; Sayna Ebrahimi; Long Le; Vincent Perot; Swaroop Mishra; Mohit Bansal; Chen-Yu Lee; Tomas Pfister; |
| 82 | MiCEval: Unveiling Multimodal Chain of Thought’s Quality Via Image Description and Reasoning Steps Highlight: Despite its popularity, there is a notable absence of automated methods for evaluating the quality of reasoning steps in MCoT. To address this gap, we propose **Multimodal Chain-of-Thought Evaluation (MiCEval)**, a framework designed to assess the correctness of reasoning chains by evaluating the quality of both the description and each reasoning step. |
Xiongtao Zhou; Jie He; Lanyu Chen; Jingyu Li; Haojing Chen; Victor Gutierrez Basulto; Jeff Z. Pan; Hanjie Chen; |
| 83 | VoiceTextBlender: Augmenting Large Language Models with Speech Capabilities Via Single-Stage Joint Speech-Text Supervised Fine-Tuning Highlight: Another critical challenge with SpeechLMs is catastrophic forgetting, where models optimized for speech tasks suffer significant degradation in text-only performance. To mitigate these issues, we propose a novel single-stage joint speech-text SFT approach on the low-rank adaptation (LoRA) of the LLM backbone. |
Yifan Peng; Krishna C Puvvada; Zhehuai Chen; Piotr Zelasko; He Huang; Kunal Dhawan; Ke Hu; Shinji Watanabe; Jagadeesh Balam; Boris Ginsburg; |
| 84 | Information-Guided Identification of Training Data Imprint in (Proprietary) Large Language Models Highlight: In this work we demonstrate a new method to identify training data known to proprietary LLMs like GPT-4 without requiring any access to model weights or token probabilities, by using information-guided probes. |
Abhilasha Ravichander; Jillian Fisher; Taylor Sorensen; Ximing Lu; Maria Antoniak; Bill Yuchen Lin; Niloofar Mireshghallah; Chandra Bhagavatula; Yejin Choi; |
| 85 | Can Unconfident LLM Annotations Be Used for Confident Conclusions? Highlight: We introduce Confidence-driven inference: a method that combines LLM annotations and LLM confidence indicators to strategically select which human annotations should be collected, with the goal of producing accurate statistical estimates and provably valid confidence intervals while reducing the number of human annotations needed. |
Kristina Gligoric; Tijana Zrnic; Cinoo Lee; Emmanuel Candes; Dan Jurafsky; |
| 86 | When2Call: When (not) to Call Tools Highlight: We develop a new benchmark, When2Call, which evaluates tool-calling decision-making: when to generate a tool call, when to ask follow-up questions and when to admit the question can’t be answered with the tools provided. |
Hayley Ross; Ameya Sunil Mahabaleshwarkar; Yoshi Suhara; |
| 87 | HyPA-RAG: A Hybrid Parameter Adaptive Retrieval-Augmented Generation System for AI Legal and Policy Applications Highlight: This paper presents the Hybrid Parameter-Adaptive RAG (HyPA-RAG) system, designed for the AI legal domain, with NYC Local Law 144 (LL144) as the test case. |
Rishi Kalra; Zekun Wu; Ayesha Gulley; Airlie Hilliard; Xin Guan; Adriano Koshiyama; Philip Colin Treleaven; |
| 88 | AdaMergeX: Cross-Lingual Transfer with Large Language Models Via Adaptive Adapter Merging |
Yiran Zhao; Wenxuan Zhang; Huiming Wang; Kenji Kawaguchi; Lidong Bing; |
| 89 | AlgoPuzzleVQA: Diagnosing Multimodal Reasoning Challenges of Language Models with Algorithmic Multimodal Puzzles Highlight: This paper introduces the novel task of multimodal puzzle solving, framed within the context of visual question-answering. |
Deepanway Ghosal; Vernon Toh; Yew Ken Chia; Soujanya Poria; |
| 90 | On The Impact of Fine-Tuning on Chain-of-Thought Reasoning Highlight: Our research investigates the effect of fine-tuning on the reasoning abilities of LLMs, addressing critical questions regarding the impact of task-specific fine-tuning on overall reasoning capabilities, the influence of fine-tuning on Chain-of-Thought (CoT) reasoning performance, and the implications for the faithfulness of CoT reasonings. |
Elita Lobo; Chirag Agarwal; Himabindu Lakkaraju; |
| 91 | Latent Factor Models Meets Instructions: Goal-conditioned Latent Factor Discovery Without Task Supervision Highlight: We present Instruct-LF, a goal-oriented latent factor discovery system that integrates LLM’s instruction-following ability with statistical models to handle large, noisy datasets where LLM reasoning alone falls short. |
Zhouhang Xie; Tushar Khot; Bhavana Dalvi Mishra; Harshit Surana; Julian McAuley; Peter Clark; Bodhisattwa Prasad Majumder; |
| 92 | Rethinking Word Similarity: Semantic Similarity Through Classification Confusion Highlight: We propose a new measure of similarity, Word Confusion, that reframes semantic similarity in terms of feature-based classification confusion. |
Kaitlyn Zhou; Haishan Gao; Sarah Li Chen; Dan Edelstein; Dan Jurafsky; Chen Shani; |
| 93 | Entropy-Based Decoding for Retrieval-Augmented Large Language Models Highlight: Despite their success, retrieval-augmented LLMs still face the distractibility issue, where the generated responses are negatively influenced by noise from both external and internal knowledge sources. In this paper, we introduce a novel, training-free decoding method guided by entropy considerations to mitigate this issue. |
Zexuan Qiu; Zijing Ou; Bin Wu; Jingjing Li; Aiwei Liu; Irwin King; |
| 94 | MLLM-Bench: Evaluating Multimodal LLMs with Per-sample Criteria Highlight: To this end, in our paper, we propose a new evaluation paradigm for MLLMs, which evaluates MLLMs with per-sample criteria using a potent MLLM as the judge. |
Wentao Ge; Shunian Chen; Hardy Chen; Nuo Chen; Junying Chen; Zhihong Chen; Wenya Xie; Shuo Yan; ChenghaoZhu ChenghaoZhu; Ziyue Lin; Dingjie Song; Xidong Wang; Anningzhe Gao; Zhang Zhiyi; Jianquan Li; Xiang Wan; Benyou Wang; |
| 95 | On Behalf of The Stakeholders: Trends in NLP Model Interpretability in The Era of LLMs Highlight: In this paper, we address three fundamental questions: Why do we need interpretability, what are we interpreting, and how? |
Nitay Calderon; Roi Reichart; |
| 96 | CoRAG: Collaborative Retrieval-Augmented Generation Highlight: We introduce CoRAG, a framework extending RAG to collaborative settings, where clients jointly train a shared model using a collaborative passage store. |
Aashiq Muhamed; Mona T. Diab; Virginia Smith; |
| 97 | Afrispeech-Dialog: A Benchmark Dataset for Spontaneous English Conversations in Healthcare and Beyond Highlight: We introduce Afrispeech-Dialog, a benchmark dataset of 50 simulated medical and non-medical African-accented English conversations, designed to evaluate automatic speech recognition (ASR) and related technologies. |
Mardhiyah Sanni; Tassallah Abdullahi; Devendra Deepak Kayande; Emmanuel Ayodele; Naome A Etori; Michael Samwel Mollel; Moshood O. Yekini; Chibuzor Okocha; Lukman Enegi Ismaila; Folafunmi Omofoye; Boluwatife A. Adewale; Tobi Olatunji; |
| 98 | Smurfs: Multi-Agent System Using Context-Efficient DFSDT for Tool Planning Highlight: We introduce “Smurfs,” a novel multi-agent system (MAS) that enhances DFSDT with a modular, context-efficient, and training-free design. |
Junzhi Chen; Juhao Liang; Benyou Wang; |
| 99 | Tricking Retrievers with Influential Tokens: An Efficient Black-Box Corpus Poisoning Attack Highlight: Existing methods for crafting such passages, such as random token replacement or training inversion models, are often slow and computationally expensive, requiring either access to the retriever’s gradients or large computational resources. To address these limitations, we propose Dynamic Importance-Guided Genetic Algorithm (DIGA), an efficient black-box method that leverages two key properties of retrievers: insensitivity to token order and bias towards influential tokens. |
Cheng Wang; Yiwei Wang; Yujun Cai; Bryan Hooi; |
| 100 | Do RAG Systems Cover What Matters? Evaluating and Optimizing Responses with Sub-Question Coverage Highlight: In this paper, we introduce a novel evaluation framework based on sub-question coverage, which measures how well a RAG system addresses different facets of a question. |
Kaige Xie; Philippe Laban; Prafulla Kumar Choubey; Caiming Xiong; Chien-Sheng Wu; |
| 101 | LiPO: Listwise Preference Optimization Through Learning-to-Rank Highlight: In this work, we formulate the LM alignment as a listwise ranking problem and describe the LiPO framework, where the policy can potentially learn more effectively from a ranked list of plausible responses given the prompt. |
Tianqi Liu; Zhen Qin; Junru Wu; Jiaming Shen; Misha Khalman; Rishabh Joshi; Yao Zhao; Mohammad Saleh; Simon Baumgartner; Jialu Liu; Peter J Liu; Xuanhui Wang; |
| 102 | Simulating Classroom Education with LLM-Empowered Agents Highlight: In this work, we propose SimClass, a multi-agent classroom simulation teaching framework. |
Zheyuan Zhang; Daniel Zhang-Li; Jifan Yu; Linlu Gong; Jinchang Zhou; Zhanxin Hao; Jianxiao Jiang; Jie Cao; Huiqin Liu; Zhiyuan Liu; Lei Hou; Juanzi Li; |
| 103 | UNDIAL: Self-Distillation with Adjusted Logits for Robust Unlearning in Large Language Models Highlight: In this paper, we introduce UnDIAL (Unlearning via Self-Distillation on Adjusted Logits), a novel and robust unlearning method. |
Yijiang River Dong; Hongzhou Lin; Mikhail Belkin; Ramon Huerta; Ivan Vulić; |
| 104 | Is Translation All You Need? A Study on Solving Multilingual Tasks with Large Language Models Highlight: In this work, we extend the evaluation to real-world user queries and non-English-centric LLMs, offering a broader examination of multilingual performance. |
Chaoqun Liu; Wenxuan Zhang; Yiran Zhao; Anh Tuan Luu; Lidong Bing; |
| 105 | Planetarium: A Rigorous Benchmark for Translating Text to Structured Planning Languages Highlight: Existing evaluation methods struggle to ensure semantic correctness and rely on simple or unrealistic datasets. To bridge this gap, we introduce Planetarium, a benchmark designed to evaluate language models’ ability to generate PDDL code from natural language descriptions of planning tasks. |
Max Zuo; Francisco Piedrahita Velez; Xiaochen Li; Michael Littman; Stephen Bach; |
| 106 | Little Giants: Synthesizing High-Quality Embedding Data at Scale Highlight: In this paper, we introduce SPEED, a framework that aligns open-source small models (8B) to efficiently generate large-scale synthetic embedding data. |
Haonan Chen; Liang Wang; Nan Yang; Yutao Zhu; Ziliang Zhao; Furu Wei; Zhicheng Dou; |
| 107 | Reliability of Topic Modeling Highlight: In this work, we show that the standard practice for quantifying topic model reliability fails to capture essential aspects of the variation in two widely-used topic models. |
Kayla Schroeder; Zach Wood-Doughty; |
| 108 | Large Language Models Can Solve Real-World Planning Rigorously with Formal Verification Tools Highlight: For example, a U.S. domestic travel planning benchmark, TravelPlanner, was proposed in Xie et al. (2024), where the best LLM, OpenAI o1-preview, can only find viable travel plans with a 10% success rate given all needed information. In this work, we tackle this by proposing an LLM-based planning framework that formalizes and solves complex multi-constraint planning problems as constrained satisfiability problems, which are further consumed by sound and complete satisfiability solvers. |
Yilun Hao; Yongchao Chen; Yang Zhang; Chuchu Fan; |
| 109 | AI-Assisted Human Evaluation of Machine Translation |
Vilém Zouhar; Tom Kocmi; Mrinmaya Sachan; |
| 110 | CodexGraph: Bridging Large Language Models and Code Repositories Via Code Graph Databases Highlight: Similarity-based retrieval often has low recall in complex tasks, while manual tools and APIs are typically task-specific and require expert knowledge, reducing their generalizability across diverse code tasks and real-world applications. To mitigate these limitations, we introduce CodexGraph, a system that integrates LLM agents with graph database interfaces extracted from code repositories. |
Xiangyan Liu; Bo Lan; Zhiyuan Hu; Yang Liu; Zhicheng Zhang; Fei Wang; Michael Qizhe Shieh; Wenmeng Zhou; |
| 111 | Knowledge-Aware Query Expansion with Large Language Models for Textual and Relational Retrieval Highlight: For queries like “Find me a highly rated camera for wildlife photography compatible with my Nikon F-Mount lenses”, existing methods may generate expansions that are semantically similar but structurally unrelated to user intents. To handle such semi-structured queries with both textual and relational requirements, in this paper we propose a knowledge-aware query expansion framework, augmenting LLMs with structured document relations from a knowledge graph (KG). |
Yu Xia; Junda Wu; Sungchul Kim; Tong Yu; Ryan A. Rossi; Haoliang Wang; Julian McAuley; |
| 112 | Unifying AI Tutor Evaluation: An Evaluation Taxonomy for Pedagogical Ability Assessment of LLM-Powered AI Tutors Highlight: In this paper, we investigate whether current state-of-the-art large language models (LLMs) are effective as AI tutors and whether they demonstrate pedagogical abilities necessary for good AI tutoring in educational dialogues. |
Kaushal Kumar Maurya; Kv Aditya Srivatsa; Kseniia Petukhova; Ekaterina Kochmar; |
| 113 | FlexiGPT: Pruning and Extending Large Language Models with Low-Rank Weight Sharing Highlight: The rapid proliferation of large language models (LLMs) in natural language processing (NLP) has created a critical need for techniques that enable efficient deployment on memory-constrained devices without compromising performance. We present a method to prune LLMs that selectively prunes model blocks based on an importance score and replaces them with a low-parameter replacement strategy. |
James Seale Smith; Chi-Heng Lin; Shikhar Tuli; Haris Jeelani; Shangqian Gao; Yilin Shen; Hongxia Jin; Yen-Chang Hsu; |
| 114 | StyleDistance: Stronger Content-Independent Style Embeddings with Synthetic Parallel Examples Highlight: We introduce StyleDistance, a novel approach to training stronger content-independent style embeddings. |
Ajay Patel; Jiacheng Zhu; Justin Qiu; Zachary Horvitz; Marianna Apidianaki; Kathleen McKeown; Chris Callison-Burch; |
| 115 | Anticipating Future with Large Language Model for Simultaneous Machine Translation Highlight: Motivated by human interpreters’ technique to forecast future words before hearing them, we propose Translation by Anticipating Future (TAF), a method to improve translation quality while retaining low latency. |
Siqi Ouyang; Oleksii Hrinchuk; Zhehuai Chen; Vitaly Lavrukhin; Jagadeesh Balam; Lei Li; Boris Ginsburg; |
| 116 | Dynamic Uncertainty Ranking: Enhancing Retrieval-Augmented In-Context Learning for Long-Tail Knowledge in LLMs Highlight: To take advantage of the uncertainty in ICL for guiding LLM predictions toward correct answers on long-tail samples, we propose a reinforcement learning-based dynamic uncertainty ranking method for retrieval-augmented ICL that accounts for the varying impact of each retrieved sample on LLM predictions. |
Shuyang Yu; Runxue Bao; Parminder Bhatia; Taha Kass-Hout; Jiayu Zhou; Cao Xiao; |
| 117 | Sketch2Code: Evaluating Vision-Language Models for Interactive Web Design Prototyping Highlight: However, existing research on UI/UX automation often requires high-fidelity inputs like Figma designs or detailed screenshots, limiting accessibility and impeding efficient design iteration. To bridge this gap, we introduce Sketch2Code, a benchmark that evaluates state-of-the-art Vision Language Models (VLMs) on automating the conversion of rudimentary sketches into webpage prototypes. |
Ryan Li; Yanzhe Zhang; Diyi Yang; |
| 118 | Improving Retrospective Language Agents Via Joint Policy Gradient Optimization Highlight: Meanwhile, although fine-tuning methods significantly enhance the capabilities of smaller LLMs, the fine-tuned agents often lack the potential for self-reflection and self-improvement. To address these challenges, we introduce RetroAct, a novel agent framework that jointly optimizes both task-planning and self-reflective evolution capabilities in language agents. |
Xueyang Feng; Bo Lan; Quanyu Dai; Lei Wang; Jiakai Tang; Xu Chen; Zhenhua Dong; Ji-Rong Wen; |
| 119 | Dynamic Fisher-weighted Model Merging Via Bayesian Optimization Highlight: Both approaches exhibit their own weaknesses, leading to a notable performance gap compared to multi-task fine-tuning. In this paper, we unify these seemingly distinct strategies into a more general merging framework, and introduce Dynamic Fisher-weighted Merging (DF-Merge). |
Sanwoo Lee; Jiahao Liu; Qifan Wang; Jingang Wang; Xunliang Cai; Yunfang Wu; |
| 120 | Uncovering Bias in Large Vision-Language Models at Scale with Counterfactuals Highlight: Examining social biases in LVLMs is particularly challenging due to the confounding contributions of bias induced by information contained across the text and visual modalities. To address this challenging problem, we conduct a large-scale study of text generated by different LVLMs under counterfactual changes to input images, producing over 57 million responses from popular models. |
Phillip Howard; Kathleen C. Fraser; Anahita Bhiwandiwalla; Svetlana Kiritchenko; |
| 121 | Reverse Question Answering: Can An LLM Write A Question So Hard (or Bad) That It Can’t Answer? Highlight: By finding question and answer types that lead to RQA errors, we suggest improvements for LLM reasoning. |
Nishant Balepur; Feng Gu; Abhilasha Ravichander; Shi Feng; Jordan Lee Boyd-Graber; Rachel Rudinger; |
| 122 | Investigating Human Values in Online Communities Highlight: To study the dynamics of communities online, we propose a method to computationally analyse values present on Reddit. |
Nadav Borenstein; Arnav Arora; Lucie-Aimée Kaffee; Isabelle Augenstein; |
| 123 | Steering Knowledge Selection Behaviours in LLMs Via SAE-Based Representation Engineering Highlight: In this work, we propose SpARE, a training-free representation engineering method that uses pre-trained sparse auto-encoders (SAEs) to control the knowledge selection behaviour of LLMs. |
Yu Zhao; Alessio Devoto; Giwon Hong; Xiaotang Du; Aryo Pradipta Gema; Hongru Wang; Xuanli He; Kam-Fai Wong; Pasquale Minervini; |
| 124 | Does Liking Yellow Imply Driving A School Bus? Semantic Leakage in Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this paper, we identify and characterize a phenomenon never discussed before, which we call semantic leakage, where models leak irrelevant information from the prompt into the generation in unexpected ways. |
Hila Gonen; Terra Blevins; Alisa Liu; Luke Zettlemoyer; Noah A. Smith; |
| 125 | ReachAgent: Enhancing Mobile Agent Via Page Reaching and Operation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, existing agents tend to focus on the most task-relevant elements at each step, leading to locally optimal solutions and ignoring the overall GUI flow. To address this issue, we constructed a training dataset called MobileReach, which breaks the task into page reaching and operation subtasks. |
Qinzhuo Wu; Wei Liu; Jian Luan; Bin Wang; |
| 126 | MSc-SQL: Multi-Sample Critiquing Small Language Models For Text-To-SQL Translation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: To address these issues, we focus on developing small, efficient, and open-source text-to-SQL models. We demonstrate the benefits of sampling multiple candidate SQL generations and propose our method, MSc-SQL, to critique them using associated metadata. |
Satya Krishna Gorti; Ilan Gofman; Zhaoyan Liu; Jiapeng Wu; Noël Vouitsis; Guangwei Yu; Jesse C. Cresswell; Rasa Hosseinzadeh; |
| 127 | SHADES: Towards A Multilingual Assessment of Stereotypes in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: While research has attempted to identify and mitigate such biases, most efforts have been concentrated around English, lagging behind the rapid advancement of LLMs in multilingual settings. In this paper, we introduce a new multilingual parallel dataset, SHADES, to help address this issue, designed for examining culturally specific stereotypes that may be learned by LLMs. |
Margaret Mitchell; Giuseppe Attanasio; Ioana Baldini; Miruna Clinciu; Jordan Clive; Pieter Delobelle; Manan Dey; Sil Hamilton; Timm Dill; Jad Doughman; Ritam Dutt; Avijit Ghosh; Jessica Zosa Forde; Carolin Holtermann; Lucie-Aimée Kaffee; Tanmay Laud; Anne Lauscher; Roberto L Lopez-Davila; Maraim Masoud; Nikita Nangia; Anaelia Ovalle; Giada Pistilli; Dragomir Radev; Beatrice Savoldi; Vipul Raheja; Jeremy Qin; Esther Ploeger; Arjun Subramonian; Kaustubh Dhole; Kaiser Sun; Amirbek Djanibekov; Jonibek Mansurov; Kayo Yin; Emilio Villa Cueva; Sagnik Mukherjee; Jerry Huang; Xudong Shen; Jay Gala; Hamdan Al-Ali; Tair Djanibekov; Nurdaulet Mukhituly; Shangrui Nie; Shanya Sharma; Karolina Stanczak; Eliza Szczechla; Tiago Timponi Torrent; Deepak Tunuguntla; Marcelo Viridiano; Oskar Van Der Wal; Adina Yakefu; Aurélie Névéol; Mike Zhang; Sydney Zink; Zeerak Talat; |
| 128 | Protecting Privacy in Multimodal Large Language Models with MLLMU-Bench Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: While many previous works have addressed this issue in LLMs via machine unlearning, it remains largely unexplored for MLLMs. To tackle this challenge, we introduce the Multimodal Large Language Model Unlearning Benchmark (MLLMU-Bench), a novel benchmark aimed at advancing the understanding of multimodal machine unlearning. |
Zheyuan Liu; Guangyao Dou; Mengzhao Jia; Zhaoxuan Tan; Qingkai Zeng; Yongle Yuan; Meng Jiang; |
| 129 | Faux Polyglot: A Study on Information Disparity in Multilingual Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we study LLMs’ linguistic preferences in a cross-language, RAG-based information search setting. |
Nikhil Sharma; Kenton Murray; Ziang Xiao; |
| 130 | RAG-Star: Enhancing Deliberative Reasoning with Retrieval Augmented Verification and Refinement Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we propose RAG-Star, a novel RAG approach that integrates the retrieved information to guide the tree-based deliberative reasoning process that relies on the inherent knowledge of LLMs. |
Jinhao Jiang; Jiayi Chen; Junyi Li; Ruiyang Ren; Shijie Wang; Xin Zhao; Yang Song; Tao Zhang; |
| 131 | DAWN-ICL: Strategic Planning of Problem-solving Trajectories for Zero-Shot In-Context Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: The random traversal order may generate unreliable pseudo-demonstrations and lead to error accumulation. To address this problem, we reformulate ZS-ICL as a planning problem and propose a **D**emonstration-**AW**are Mo**N**te Carlo Tree Search (MCTS) approach (DAWN-ICL), which leverages MCTS to strategically plan the problem-solving trajectories for ZS-ICL. |
Xinyu Tang; Xiaolei Wang; Xin Zhao; Ji-Rong Wen; |
| 132 | The Russian-focused Embedders’ Exploration: RuMTEB Benchmark and Russian Embedding Model Design Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: It introduces a new Russian-focused embedding model called ru-en-RoSBERTa and the ruMTEB benchmark, the Russian version extending the Massive Text Embedding Benchmark (MTEB). |
Artem Snegirev; Maria Tikhonova; Maksimova Anna; Alena Fenogenova; Aleksandr Abramov; |
| 133 | FactTrack: Time-Aware World State Tracking in Story Outlines Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose a novel method, FactTrack, for tracking atomic facts and addressing factual contradictions. |
Zhiheng Lyu; Kevin Yang; Lingpeng Kong; Dan Klein; |
| 134 | Can LLMs Convert Graphs to Text-Attributed Graphs? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: While promising, this method relies heavily on the availability of text-attributed graph data, which is difficult to obtain in practice. To bridge this gap, we propose a novel method named Topology-Aware Node description Synthesis (TANS), leveraging large language models (LLMs) to convert existing graphs into text-attributed graphs. |
Zehong Wang; Sidney Liu; Zheyuan Zhang; Tianyi Ma; Chuxu Zhang; Yanfang Ye; |
| 135 | M2Lingual: Enhancing Multilingual, Multi-Turn Instruction Alignment in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: While fully synthetic datasets are a promising alternative, research on their use in the multilingual domain is limited, as existing approaches still rely on machine translation to improve multilingual performance. To bridge this gap, we introduce M2Lingual, the first fully synthetic, multi-turn multilingual dataset, with 175K conversations across 70 languages and a balanced mix of high-, mid-, and low-resource languages. |
Rishabh Maheshwary; Vikas Yadav; Hoang H Nguyen; Khyati Mahajan; Sathwik Tejaswi Madhusudhan; |
| 136 | Leveraging Allophony in Self-Supervised Speech Models for Atypical Pronunciation Assessment Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: Motivated by the acoustic modeling capabilities of frozen self-supervised speech model (S3M) features, we propose MixGoP, a novel approach that leverages Gaussian mixture models to model phoneme distributions with multiple subclusters. |
Kwanghee Choi; Eunjung Yeo; Kalvin Chang; Shinji Watanabe; David R Mortensen; |
| 137 | Evaluating The Prompt Steerability of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: To this end, we propose a benchmark for evaluating the steerability of model personas as a function of prompting. |
Erik Miehling; Michael Desmond; Karthikeyan Natesan Ramamurthy; Elizabeth M. Daly; Kush R. Varshney; Eitan Farchi; Pierre Dognin; Jesus Rios; Djallel Bouneffouf; Miao Liu; Prasanna Sattigeri; |
| 138 | AI-LieDar: Examine The Trade-off Between Utility and Truthfulness in LLM Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Truthfulness (adherence to factual accuracy) and utility (satisfying human needs and instructions) are both fundamental aspects of Large Language Models, yet these goals often conflict (e.g., selling a car with known flaws), making it challenging to achieve both in real-world deployments. We propose AI-LieDar, a framework to study how LLM-based agents navigate these scenarios in a multi-turn interactive setting. |
Zhe Su; Xuhui Zhou; Sanketh Rangreji; Anubha Kabra; Julia Mendelsohn; Faeze Brahman; Maarten Sap; |
| 139 | SELFGOAL: Your Language Agents Already Know How to Achieve High-level Goals Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we present SELFGOAL, a novel automatic approach designed to enhance agents’ capabilities to achieve high-level goals with limited human prior and environmental feedback. |
Ruihan Yang; Jiangjie Chen; Yikai Zhang; Siyu Yuan; Aili Chen; Kyle Richardson; Yanghua Xiao; Deqing Yang; |
| 140 | LLaMA-Berry: Pairwise Optimization for Olympiad-level Mathematical Reasoning Via O1-like Monte Carlo Tree Search Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper presents LLaMA-Berry, an advanced mathematical reasoning framework to enhance the problem-solving ability of large language models (LLMs). |
Di Zhang; Jianbo Wu; Jingdi Lei; Tong Che; Jiatong Li; Tong Xie; Xiaoshui Huang; Shufei Zhang; Marco Pavone; Yuqiang Li; Wanli Ouyang; Dongzhan Zhou; |
| 141 | CharacterBox: Evaluating The Role-Playing Capabilities of LLMs in Text-Based Virtual Worlds Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this paper, we propose CharacterBox, a simulation sandbox designed to generate situational, fine-grained character behavior trajectories. |
Lei Wang; Jianxun Lian; Yi Huang; Yanqi Dai; Haoxuan Li; Xu Chen; Xing Xie; Ji-Rong Wen; |
| 142 | Stronger Universal and Transferable Attacks By Suppressing Refusals Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Contrary to this belief, we find that the adversarial prompts discovered by such optimizers are inherently prompt-universal and transferable, even when optimized on a single model and a single harmful request. To further exploit this phenomenon, we introduce IRIS, a new objective for these optimizers that explicitly deactivates the safety feature, creating an even stronger universal and transferable attack. |
David Huang; Avidan Shah; Alexandre Araujo; David Wagner; Chawin Sitawarin; |
| 143 | Unmasking Implicit Bias: Evaluating Persona-Prompted LLM Responses in Power-Disparate Social Scenarios Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce a novel framework using cosine distance to measure semantic shifts in responses and an LLM-judged Preference Win Rate (WR) to assess how demographic prompts affect response quality across power-disparate social scenarios. |
Bryan Chen Zhengyu Tan; Roy Ka-Wei Lee; |
| 144 | Arabic Dataset for LLM Safeguard Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: To uncover the impact of different stances in handling sensitive and controversial topics, we propose a dual-perspective evaluation framework. |
Yasser Ashraf; Yuxia Wang; Bin Gu; Preslav Nakov; Timothy Baldwin; |
| 145 | PeerQA: A Scientific Question Answering Dataset from Peer Reviews Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: We present PeerQA, a real-world, scientific, document-level Question Answering (QA) dataset. |
Tim Baumgärtner; Ted Briscoe; Iryna Gurevych; |
| 146 | WorkTeam: Constructing Workflows from Natural Language with Multi-Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Recent advancements in Large Language Models (LLMs) have improved the generation of workflows from natural language instructions (aka NL2Workflow), yet existing single LLM agent-based methods face performance degradation on complex tasks due to the need for specialized knowledge and the strain of task-switching. To tackle these challenges, we propose WorkTeam, a multi-agent NL2Workflow framework comprising a supervisor, orchestrator, and filler agent, each with distinct roles that collaboratively enhance the conversion process. |
Hanchao Liu; Rongjun Li; Weimin Xiong; Ziyu Zhou; Wei Peng; |
| 147 | PromptRefine: Enhancing Few-Shot Performance on Low-Resource Indic Languages with Example Selection from Related Example Banks Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we propose PromptRefine, a novel Alternating Minimization approach for example selection that improves ICL performance on low-resource Indic languages. |
Soumya Suvra Ghosal; Soumyabrata Pal; Koyel Mukherjee; Dinesh Manocha; |
| 148 | Beyond Literal Token Overlap: Token Alignability for Multilinguality Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we propose subword token alignability as a new way to understand the impact and quality of multilingual tokenisation. |
Katharina Hämmerl; Tomasz Limisiewicz; Jindřich Libovický; Alexander Fraser; |
| 149 | Differentially Private Learning Needs Better Model Initialization and Self-Distillation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: We introduce DPRefine, a three-phase method that initializes a model using data synthesis from a small pre-trained LM with rigorous filtering, applies DP finetuning on private data, and performs self-distillation to refine outputs. |
Ivoline C. Ngong; Joseph Near; Niloofar Mireshghallah; |
| 150 | LLM2: Let Large Language Models Harness System 2 Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: Drawing inspiration from the dual-process theory of human cognition, we introduce LLM2, a novel framework that combines an LLM (System 1) with a process-based verifier (System 2). |
Cheng Yang; Chufan Shi; Siheng Li; Bo Shui; Yujiu Yang; Wai Lam; |
| 151 | A Logical Fallacy-Informed Framework for Argument Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: An important factor contributing to LLMs’ suboptimal performance in generating coherent arguments is their oversight of logical fallacies. To address this issue, we introduce fallacy-informed preference optimization (FIPO) that helps steer LLMs toward generating logically sound arguments. |
Luca Mouchel; Debjit Paul; Shaobo Cui; Robert West; Antoine Bosselut; Boi Faltings; |
| 152 | World Models with Hints of Large Language Models for Goal Achieving Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Inspired by human cognition, we propose a new multi-modal model-based RL approach named Dreaming with Large Language Models (DLLM). |
Zeyuan Liu; Ziyu Huan; Xiyao Wang; Jiafei Lyu; Jian Tao; Xiu Li; Furong Huang; Huazhe Xu; |
| 153 | Reversed Attention: On The Gradient Descent Of Attention Layers In GPT Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this work, we study the mathematics of the backward pass of attention, revealing that it implicitly calculates an attention matrix we refer to as “Reversed Attention”. |
Shahar Katz; Lior Wolf; |
| 154 | JMMMU: A Japanese Massive Multi-discipline Multimodal Understanding Benchmark for Culture-aware Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts View Save |
Shota Onohara; Atsuyuki Miyai; Yuki Imajuku; Kazuki Egashira; Jeonghun Baek; Xiang Yue; Graham Neubig; Kiyoharu Aizawa; |
| 155 | Auto-Cypher: Improving LLMs on Cypher Generation Via LLM-supervised Generation-verification Framework Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we present an automated, LLM-supervised pipeline to generate high-quality synthetic data for Text2Cypher. |
Aman Tiwari; Shiva Krishna Reddy Malay; Vikas Yadav; Masoud Hashemi; Sathwik Tejaswi Madhusudhan; |
| 156 | Reasoning Aware Self-Consistency: Leveraging Reasoning Paths for Efficient LLM Sampling Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Self-consistency mitigates hallucinations in Large Language Models (LLMs) by sampling multiple reasoning paths, but it lacks a systematic approach to determine the optimal number of samples or select the most faithful rationale. To address this limitation, we introduce Reasoning-Aware Self-Consistency (RASC), a novel framework that enhances sampling efficiency and reasoning faithfulness by dynamically evaluating both outputs and rationales. |
Guangya Wan; Yuqi Wu; Jie Chen; Sheng Li; |
| 157 | Teaching Models to Balance Resisting and Accepting Persuasion Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In order to balance positive and negative persuasion, we introduce **P**ersuasion-**B**alanced **T**raining (or **PBT**), which leverages multi-agent recursive dialogue trees to create data and trains models via preference optimization to accept persuasion *when appropriate*. |
Elias Stengel-Eskin; Peter Hase; Mohit Bansal; |
| 158 | Guiding Through Complexity: What Makes Good Supervision for Hard Reasoning Tasks? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: How can “weak teacher models” (Bowman et al., 2022), such as average human annotators or existing AI systems, effectively supervise LLMs to improve performance on hard reasoning tasks, especially those that challenge and require expertise or daily practice from the teacher models? In this paper, we seek empirical answers to this question by investigating various data-driven strategies that offer supervision data at different quality levels upon tasks of varying complexity. |
Xuan He; Da Yin; Nanyun Peng; |
| 159 | PoisonedParrot: Subtle Data Poisoning Attacks to Elicit Copyright-Infringing Content from Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we introduce PoisonedParrot: the first stealthy data poisoning attack that induces an LLM to generate copyrighted content even when the model has not been directly trained on the specific copyrighted material. |
Michael-Andrei Panaitescu-Liess; Pankayaraj Pathmanathan; Yigitcan Kaya; Zora Che; Bang An; Sicheng Zhu; Aakriti Agrawal; Furong Huang; |
| 160 | The Power of Many: Multi-Agent Multimodal Models for Cultural Image Captioning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: Our contributions are as follows: (1) We introduce MosAIC, a Multi-Agent framework to enhance cross-cultural Image Captioning using LMMs with distinct cultural personas; (2) We provide a dataset of culturally enriched image captions in English for images from China, India, and Romania across three datasets: GeoDE, GD-VCR, CVQA; (3) We propose a culture-adaptable metric for evaluating cultural information within image captions; and (4) We show that the multi-agent interaction outperforms single-agent models across different metrics, and offer valuable insights for future research. |
Longju Bai; Angana Borah; Oana Ignat; Rada Mihalcea; |
| 161 | FiNE: Filtering and Improving Noisy Data Elaborately with Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save |
Junliang He; Ziyue Fan; Shaohui Kuang; Li Xiaoqing; Kai Song; Yaqian Zhou; Xipeng Qiu; |
| 162 | Can Large Language Models Invent Algorithms to Improve Themselves? Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, the methods for improving LLMs are still designed by humans, which restricts the invention of new model-improving algorithms to human expertise and imagination. To address this, we propose the Self-Developing framework, which enables LLMs to autonomously generate and learn model-improvement algorithms. |
Yoichi Ishibashi; Taro Yano; Masafumi Oyamada; |
| 163 | Grammar Control in Dialogue Response Generation for Language Learning Chatbots Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: We comprehensively evaluate prompting, fine-tuning, and decoding strategies for grammar-controlled dialogue response generation. |
Dominik Glandorf; Peng Cui; Detmar Meurers; Mrinmaya Sachan; |
| 164 | Pointwise Mutual Information As A Performance Gauge for Retrieval-Augmented Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, there is no method to date that exploits this phenomenon to improve generation. To fill this gap, in this study, we show that the pointwise mutual information between a context and a question is an effective gauge for language model performance. |
Tianyu Liu; Jirui Qi; Paul He; Arianna Bisazza; Mrinmaya Sachan; Ryan Cotterell; |
| 165 | Hello Again! LLM-powered Personalized Agent for Long-term Dialogue Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: Recent progress in the human-like cognitive and reasoning capabilities of LLMs suggests that LLM-based agents could significantly enhance automated perception, decision-making, and problem-solving. In response to this potential, we introduce a model-agnostic framework, the Long-term Dialogue Agent (LD-Agent), which incorporates three independently tunable modules dedicated to event perception, persona extraction, and response generation. |
Hao Li; Chenghao Yang; An Zhang; Yang Deng; Xiang Wang; Tat-Seng Chua; |
| 166 | Sneaking Syntax Into Transformer Language Models with Tree Regularization Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This work instead aims to softly inject syntactic inductive biases into given transformer circuits, through a structured regularizer. |
Ananjan Nandi; Christopher D Manning; Shikhar Murty; |
| 167 | MIRAGE-Bench: Automatic Multilingual Benchmark Arena for Retrieval-Augmented Generation Systems Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: We present a simple, efficient technique to combine the best of both worlds. |
Nandan Thakur; Suleman Kazi; Ge Luo; Jimmy Lin; Amin Ahmad; |
| 168 | ReIFE: Re-evaluating Instruction-Following Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: However, there is a lack of comprehensive evaluation of these LLM-based evaluators across two dimensions: the base LLMs and the evaluation protocols. Therefore, we present a thorough meta-evaluation of instruction following, including 25 base LLMs and 15 recently proposed evaluation protocols, on 4 human-annotated datasets, assessing the evaluation accuracy of the LLM-evaluators. |
Yixin Liu; Kejian Shi; Alexander Fabbri; Yilun Zhao; PeiFeng Wang; Chien-Sheng Wu; Shafiq Joty; Arman Cohan; |
| 169 | Are Multimodal LLMs Robust Against Adversarial Perturbations? RoMMath: A Systematic Evaluation on Multimodal Math Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce RoMMath, the first benchmark designed to evaluate the capabilities and robustness of multimodal large language models (MLLMs) in handling multimodal math reasoning, particularly when faced with adversarial perturbations. |
Yilun Zhao; Guo Gan; Chen Zhao; Arman Cohan; |
| 170 | From Redundancy to Relevance: Information Flow in LVLMs Across Reasoning Tasks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this paper, we propose integrating attention analysis with LLaVA-CAM: concretely, attention scores highlight relevant regions during forward propagation, while LLaVA-CAM captures gradient changes through backward propagation, revealing key image features. |
Xiaofeng Zhang; Yihao Quan; Chen Shen; Xiaosong Yuan; Shaotian Yan; Liang Xie; Wenxiao Wang; Chaochen Gu; Hao Tang; Jieping Ye; |
| 171 | Self-Pluralising Culture Alignment for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this paper, we propose CultureSPA, a Self-Pluralising Culture Alignment framework that allows LLMs to simultaneously align to pluralistic cultures. |
Shaoyang Xu; Yongqi Leng; Linhao Yu; Deyi Xiong; |
| 172 | Cracking The Code: Multi-domain LLM Evaluation on Real-World Professional Exams in Indonesia Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we introduce IndoCareer, a dataset comprising 8,834 multiple-choice questions designed to evaluate performance in vocational and professional certification exams across various fields. |
Fajri Koto; |
| 173 | PRACTIQ: A Practical Conversational Text-to-SQL Dataset with Ambiguous and Unanswerable Queries Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we construct a practical conversational text-to-SQL dataset called PRACTIQ, consisting of ambiguous and unanswerable questions inspired by real-world user questions. |
Mingwen Dong; Nischal Ashok Kumar; Yiqun Hu; Anuj Chauhan; Chung-Wei Hang; Shuaichen Chang; Lin Pan; Wuwei Lan; Henghui Zhu; Jiarong Jiang; Patrick Ng; Zhiguo Wang; |
| 174 | MASTER: A Multi-Agent System with LLM Specialized MCTS Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Secondly, obtaining statistically significant reward estimations typically requires a sample size exceeding 30 simulations, resulting in excessive token usage and time consumption. To address these challenges, we present Multi-Agent System with Tactical Execution and Reasoning using LLM Specialized MCTS (MASTER), a novel framework that coordinates agent recruitment and communication through LLM specialized MCTS. |
Bingzheng Gan; Yufan Zhao; Tianyi Zhang; Jing Huang; Li Yusu; Shu Xian Teo; Changwang Zhang; Wei Shi; |
| 175 | UniHGKR: Unified Instruction-aware Heterogeneous Knowledge Retrievers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this paper, we introduce UniHGKR, a unified instruction-aware heterogeneous knowledge retriever that (1) builds a unified retrieval space for heterogeneous knowledge and (2) follows diverse user instructions to retrieve knowledge in specified types. |
Dehai Min; Zhiyang Xu; Guilin Qi; Lifu Huang; Chenyu You; |
| 176 | Take The Essence and Discard The Dross: A Rethinking on Data Selection for Fine-Tuning Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we conduct a focused review of recent data selection techniques for fine-tuning LLMs, analyzing a dozen key studies. |
Ziche Liu; Rui Ke; Yajiao Liu; Feng Jiang; Haizhou Li; |
| 177 | Evaluating Morphological Compositional Generalization in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this work, we systematically investigate the morphological generalization abilities of LLMs through the lens of compositionality. |
Mete Ismayilzada; Defne Circi; Jonne Sälevä; Hale Sirin; Abdullatif Köksal; Bhuwan Dhingra; Antoine Bosselut; Duygu Ataman; Lonneke Van Der Plas; |
| 178 | Social Norms in Cinema: A Cross-Cultural Analysis of Shame, Pride and Prejudice Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we introduce the first cross-cultural dataset of over 10k shame/pride-related expressions with underlying social expectations from ~5. |
Sunny Rai; Khushang Zaveri; Shreya Havaldar; Soumna Nema; Lyle Ungar; Sharath Chandra Guntuku; |
| 179 | WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: Vision Language Models (VLMs) often struggle with culture-specific knowledge, particularly in languages other than English and in underrepresented cultural contexts. To evaluate their understanding of such knowledge, we introduce WorldCuisines, a massive-scale benchmark for multilingual and multicultural, visually grounded language understanding. |
Genta Indra Winata; Frederikus Hudi; Patrick Amadeus Irawan; David Anugraha; Rifki Afina Putri; Wang Yutong; Adam Nohejl; Ubaidillah Ariq Prathama; Nedjma Ousidhoum; Afifa Amriani; Anar Rzayev; Anirban Das; Ashmari Pramodya; Aulia Adila; Bryan Wilie; Candy Olivia Mawalim; Cheng Ching Lam; Daud Abolade; Emmanuele Chersoni; Enrico Santus; Fariz Ikhwantri; Garry Kuwanto; Hanyang Zhao; Haryo Akbarianto Wibowo; Holy Lovenia; Jan Christian Blaise Cruz; Jan Wira Gotama Putra; Junho Myung; Lucky Susanto; Maria Angelica Riera Machin; Marina Zhukova; Michael Anugraha; Muhammad Farid Adilazuarda; Natasha Christabelle Santosa; Peerat Limkonchotiwat; Raj Dabre; Rio Alexander Audino; Samuel Cahyawijaya; Shi-Xiong Zhang; Stephanie Yulia Salim; Yi Zhou; Yinxuan Gui; David Ifeoluwa Adelani; En-Shiun Annie Lee; Shogo Okada; Ayu Purwarianti; Alham Fikri Aji; Taro Watanabe; Derry Tanti Wijaya; Alice Oh; Chong-Wah Ngo; |
| 180 | H-STAR: LLM-driven Hybrid SQL-Text Adaptive Reasoning on Tables Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: Existing methods employ either textual reasoning, which excels in semantic interpretation but struggles with mathematical operations, or symbolic reasoning, which handles computations well but lacks semantic understanding. This paper introduces a novel algorithm H-STAR that integrates both symbolic and semantic (textual) approaches in a two-stage process to address these limitations. |
Nikhil Abhyankar; Vivek Gupta; Dan Roth; Chandan K. Reddy; |
| 181 | Beyond Logit Lens: Contextual Embeddings for Robust Hallucination Detection & Grounding in VLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we critically assess the limitations of the state-of-the-art training-free technique, the logit lens, in handling generalized visual hallucinations. |
Anirudh Phukan; Divyansh Divyansh; Harshit Kumar Morj; Vaishnavi Vaishnavi; Apoorv Saxena; Koustava Goswami; |
| 182 | Black-Box Visual Prompt Engineering for Mitigating Object Hallucination in Large Vision Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Surprisingly, we find that simple object-based visual prompting—overlaying visual cues (e.g., bounding box, circle) on images—can significantly mitigate such hallucination; however, different visual prompts (VPs) vary in effectiveness. To address this, we propose Black-Box Visual Prompt Engineering (BBVPE), a framework to identify optimal VPs that enhance LVLM responses without needing access to model internals. |
Sangmin Woo; Kang Zhou; Yun Zhou; Shuai Wang; Sheng Guan; Haibo Ding; Lin Lee Cheong; |
| 183 | Lived Experience Not Found: LLMs Struggle to Align with Experts on Addressing Adverse Drug Reactions from Psychiatric Medication Use Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: Despite the increasing capabilities of LLMs, past research has not explored their capabilities in detecting ADRs related to psychiatric medications or in providing effective harm reduction strategies. To address this, we introduce the **Psych-ADR** benchmark and the **A**dverse **D**rug Reaction **R**esponse **A**ssessment (**ADRA**) framework to systematically evaluate LLM performance in detecting ADR expressions and delivering expert-aligned mitigation strategies. |
Mohit Chandra; Siddharth Sriraman; Gaurav Verma; Harneet Singh Khanuja; Jose Suarez Campayo; Zihang Li; Michael L. Birnbaum; Munmun De Choudhury; |
| 184 | Exploring Safety-Utility Trade-Offs in Personalized Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: As large language models (LLMs) become increasingly integrated into daily applications, it is essential to ensure they function fairly across diverse user demographics. In this work, we show that LLMs suffer from personalization bias, where their performance is impacted when they are personalized to a user’s identity. |
Anvesh Rao Vijjini; Somnath Basu Roy Chowdhury; Snigdha Chaturvedi; |
| 185 | IrokoBench: A New Benchmark for African Languages in The Age of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we introduce IrokoBench—a human-translated benchmark dataset for 17 typologically-diverse low-resource African languages covering three tasks: natural language inference (AfriXNLI), mathematical reasoning (AfriMGSM), and multi-choice knowledge-based QA (AfriMMLU). |
David Ifeoluwa Adelani; Jessica Ojo; Israel Abebe Azime; Jian Yun Zhuang; Jesujoba Oluwadara Alabi; Xuanli He; Millicent Ochieng; Sara Hooker; Andiswa Bukula; En-Shiun Annie Lee; Chiamaka Ijeoma Chukwuneke; Happy Buzaaba; Blessing Kudzaishe Sibanda; Godson Koffi Kalipe; Jonathan Mukiibi; Salomon Kabongo Kabenamualu; Foutse Yuehgoh; Mmasibidi Setaka; Lolwethu Ndolela; Nkiruka Odu; Rooweither Mabuya; Salomey Osei; Shamsuddeen Hassan Muhammad; Sokhar Samb; Tadesse Kebede Guge; Tombekai Vangoni Sherman; Pontus Stenetorp; |
| 186 | Retrieval, Reasoning, Re-ranking: A Context-Enriched Framework for Knowledge Graph Completion Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Apart from triples, entity contexts (e.g., labels, descriptions, aliases) also play a significant role in augmenting KGs. To address these limitations, we propose KGR3, a context-enriched framework for KGC. |
Muzhi Li; Cehao Yang; Chengjin Xu; Xuhui Jiang; Yiyan Qi; Jian Guo; Ho-fung Leung; Irwin King; |
| 187 | On Positional Bias of Faithfulness for Long-form Summarization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: Large Language Models (LLMs) often exhibit positional bias in long-context settings, under-attending to information in the middle of inputs. We investigate the presence of this bias in long-form summarization, its impact on faithfulness, and various techniques to mitigate this bias. |
David Wan; Jesse Vig; Mohit Bansal; Shafiq Joty; |
| 188 | Correcting Negative Bias in Large Language Models Through Negative Attention Score Alignment Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this work, we observe that language models exhibit a negative bias in the binary decisions of complex reasoning tasks. |
Sangwon Yu; Jongyoon Song; Bongkyu Hwang; Hoyoung Kang; Sooah Cho; Junhwa Choi; Seongho Joe; Taehee Lee; Youngjune Gwon; Sungroh Yoon; |
| 189 | Self-DC: When to Reason and When to Act? Self Divide-and-Conquer for Compositional Unknown Questions Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To this end, we introduce a Self Divide-and-Conquer (Self-DC) framework, accompanied by the first Compositional unknown Question-Answering dataset (CuQA). |
Hongru Wang; Boyang Xue; Baohang Zhou; Tianhua Zhang; Cunxiang Wang; Huimin Wang; Guanhua Chen; Kam-Fai Wong; |
| 190 | MoDS: Moderating A Mixture of Document Speakers to Summarize Debatable Queries in Document Collections Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce **Debatable QFS (DQFS)**, a task to create summaries that answer debatable queries via documents with opposing perspectives; summaries must *comprehensively cover* all sources and *balance perspectives*, favoring no side. |
Nishant Balepur; Alexa Siu; Nedim Lipka; Franck Dernoncourt; Tong Sun; Jordan Lee Boyd-Graber; Puneet Mathur; |
| 191 | LLMs Are Not Intelligent Thinkers: Introducing Mathematical Topic Tree Benchmark for Comprehensive Evaluation of LLMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: However, despite these achievements, current evaluations are mostly limited to specific mathematical topics, and it remains unclear whether LLMs are genuinely engaging in reasoning. To address these gaps, we present the Mathematical Topics Tree (MaTT) benchmark, a challenging and structured benchmark that offers 1,958 questions across a wide array of mathematical subjects, each paired with a detailed hierarchical chain of topics. |
Arash Gholami Davoodi; Seyed Pouyan Mousavi Davoudi; Pouya Pezeshkpour; |
| 192 | Token-Level Density-Based Uncertainty Quantification Methods for Eliciting Truthfulness of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this work, we adapt Mahalanobis Distance (MD) – a well-established UQ technique in classification tasks – for text generation and introduce a new supervised UQ method. |
Artem Vazhentsev; Lyudmila Rvanova; Ivan Lazichny; Alexander Panchenko; Maxim Panov; Timothy Baldwin; Artem Shelmanov; |
| 193 | DIRAS: Efficient LLM Annotation of Document Relevance for Retrieval Augmented Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To address these concerns, RAG developers need to annotate information retrieval (IR) data for their domain of interest, which is challenging because (1) domain-specific queries usually need nuanced definitions of relevance beyond shallow semantic relevance; and (2) human or GPT-4 annotation is costly and cannot cover all (query, document) pairs (i.e., annotation selection bias), thus harming the effectiveness in evaluating IR recall. To address these challenges, we propose DIRAS (**D**omain-specific **I**nformation **R**etrieval **A**nnotation with **S**calability), a manual-annotation-free schema that fine-tunes open-sourced LLMs to consider nuanced relevance definition and annotate (partial) relevance labels with calibrated relevance scores. |
Jingwei Ni; Tobias Schimanski; Meihong Lin; Mrinmaya Sachan; Elliott Ash; Markus Leippold; |
| 194 | ToW: Thoughts of Words Improve Reasoning in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: We introduce thoughts of words (ToW), a novel training-time data-augmentation method for next-word prediction. |
Zhikun Xu; Ming Shen; Jacob Dineen; Zhaonan Li; Xiao Ye; Shijie Lu; Aswin Rrv; Chitta Baral; Ben Zhou; |
| 195 | Soft Prompting for Unlearning in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: Motivated by corresponding data protection guidelines, we investigate machine unlearning for LLMs. |
Karuna Bhaila; Minh-Hao Van; Xintao Wu; |
| 196 | Mitigating Tail Narrowing in LLM Self-Improvement Via Socratic-Guided Sampling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this paper, we introduce Guided Self-Improvement (GSI), a strategy aimed at improving the efficiency of sampling challenging heavy-tailed data. |
Yiwen Ding; Zhiheng Xi; Wei He; Lizhuoyuan Lizhuoyuan; Yitao Zhai; Shi Xiaowei; Xunliang Cai; Tao Gui; Qi Zhang; Xuanjing Huang; |
| 197 | GroundCocoa: A Benchmark for Evaluating Compositional & Conditional Reasoning in Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To this end, we choose to study compositional and conditional reasoning, two aspects that are central to human cognition, and introduce GroundCocoa – a lexically diverse benchmark connecting these reasoning skills to the real-world problem of flight booking. |
Harsh Kohli; Sachin Kumar; Huan Sun; |
| 198 | CAMIEval: Enhancing NLG Evaluation Through Multidimensional Comparative Instruction-Following Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, these methods encounter the following challenges: (1) distinguishing instruction-following ability, (2) being applicable across diverse NLG tasks, and (3) identifying low-quality outputs. To address these issues, we propose CAMIEval, a multidimensional comparative evaluation method based on instruction-following. |
Ziyue Fan; Junliang He; Li Xiaoqing; Shaohui Kuang; Kai Song; Yaqian Zhou; Xipeng Qiu; |
| 199 | CORD: Balancing COnsistency and Rank Distillation for Robust Retrieval-Augmented Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We thus propose CORD, balancing COnsistency and Rank Distillation: CORD adaptively samples noise-controlled perturbations from an interpolation space, ensuring both consistency and respect for the rank prior. |
Youngwon Lee; Seung-won Hwang; Daniel F Campos; Filip Graliński; Zhewei Yao; Yuxiong He; |
| 200 | Analyzing Memorization in Large Language Models Through The Lens of Model Attribution Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: Existing research has mainly focused on post-hoc analyses—such as extracting memorized content or developing memorization metrics—without exploring the underlying architectural factors that contribute to memorization. In this work, we investigate memorization from an architectural lens by analyzing how attention modules at different layers impact the model's memorization and generalization performance. |
Tarun Ram Menta; Susmit Agrawal; Chirag Agarwal; |
| 201 | NormAd: A Framework for Measuring The Cultural Adaptability of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: We introduce NormAd, an evaluation framework to assess LLMs’ cultural adaptability, specifically measuring their ability to judge social acceptability across varying levels of cultural norm specificity, from abstract values to explicit social norms. |
Abhinav Sukumar Rao; Akhila Yerukola; Vishwa Shah; Katharina Reinecke; Maarten Sap; |
| 202 | Revealing The Barriers of Language Agents in Planning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this work, we apply the feature attribution study and identify two key factors that hinder agent planning: the limited role of constraints and the diminishing influence of questions. |
Jian Xie; Kexun Zhang; Jiangjie Chen; Siyu Yuan; Kai Zhang; Yikai Zhang; Lei Li; Yanghua Xiao; |
| 203 | EmojiPrompt: Generative Prompt Obfuscation for Privacy-Preserving Communication with Cloud-based LLMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: Nevertheless, they also introduce privacy concerns: firstly, numerous studies underscore the risks to user privacy posed by jailbreaking cloud-based LLMs; secondly, the LLM service providers have access to all user data, which deters individuals from confidently utilizing such services. To address such concerns, we propose a simple yet effective paradigm, **EmojiPrompt**, to protect user privacy. |
Sam Lin; Wenyue Hua; Zhenting Wang; Mingyu Jin; Lizhou Fan; Yongfeng Zhang; |
| 204 | Analyzing and Evaluating Correlation Measures in NLG Meta-Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: We find that the measure using global grouping and Pearson correlation coefficient exhibits the best performance in both discriminative power and ranking consistency. |
Mingqi Gao; Xinyu Hu; Li Lin; Xiaojun Wan; |
| 205 | REFFLY: Melody-Constrained Lyrics Editing Model Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper introduces REFFLY (REvision Framework For LYrics), the first revision framework for editing and generating melody-aligned lyrics. |
Songyan Zhao; Bingxuan Li; Yufei Tian; Nanyun Peng; |
| 206 | Transferable Post-training Via Inverse Value Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this paper, we propose modeling the changes at the logits level during post-training using a separate neural network (i.e., the value network). |
Xinyu Lu; Xueru Wen; Yaojie Lu; Bowen Yu; Hongyu Lin; Haiyang Yu; Le Sun; Xianpei Han; Yongbin Li; |
| 207 | Evaluating and Improving Graph to Text Generation with Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: To further improve LLMs in planning with graph sequences and grounding in truth, we introduce a new graph-to-text dataset, PlanGTG, annotated with two sub-tasks: reordering and attribution. |
Jie He; Yijun Yang; Wanqiu Long; Deyi Xiong; Victor Gutierrez Basulto; Jeff Z. Pan; |
| 208 | Multimodal Needle in A Haystack: Benchmarking Long-Context Capability of Multimodal Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: However, a comprehensive evaluation of their long-context capabilities remains underexplored. To address these gaps, we introduce the MultiModal Needle-in-a-haystack (MMNeedle) benchmark, specifically designed to assess the long-context capabilities of MLLMs. |
Hengyi Wang; Haizhou Shi; Shiwei Tan; Weiyi Qin; Wenyuan Wang; Tunyu Zhang; Akshay Nambi; Tanuja Ganu; Hao Wang; |
| 209 | RAG LLMs Are Not Safer: A Safety Analysis of Retrieval-Augmented Generation for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We conduct a detailed comparative analysis of RAG and non-RAG frameworks with eleven LLMs. |
Bang An; Shiyue Zhang; Mark Dredze; |
| 210 | Benchmarking Large Language Models on Answering and Explaining Challenging Medical Questions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: Moreover, the lack of reference explanations means we cannot easily evaluate the reasoning of model decisions, a crucial component of supporting doctors in making complex medical decisions. To address these challenges, we construct two new datasets: JAMA Clinical Challenge and Medbullets. |
Hanjie Chen; Zhouxiang Fang; Yash Singla; Mark Dredze; |
| 211 | Interpret and Control Dense Retrieval with Sparse Latent Features Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: This paper introduces a novel approach using sparse autoencoders (SAE) to interpret and control dense embeddings via the learned latent sparse features. |
Hao Kang; Tevin Wang; Chenyan Xiong; |
| 212 | Few-shot Personalization of LLMs with Mis-aligned Responses Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: This paper proposes a new approach for a few-shot personalization of LLMs with their mis-aligned responses (Fermi). |
Jaehyung Kim; Yiming Yang; |
| 213 | Evaluating Input Feature Explanations Through A Unified Diagnostic Evaluation Framework Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, these explanation types have only been studied in isolation, making it difficult to judge their respective applicability. To bridge this gap, we develop a unified framework that facilitates an automated and direct comparison between highlight and interactive explanations comprised of four diagnostic properties. |
Jingyi Sun; Pepa Atanasova; Isabelle Augenstein; |
| 214 | NLI Under The Microscope: What Atomic Hypothesis Decomposition Reveals Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We use atomic decomposition of hypotheses in two natural language reasoning tasks, traditional NLI and defeasible NLI, to form atomic sub-problems, or granular inferences that models must weigh when solving the overall problem. |
Neha Srikanth; Rachel Rudinger; |
| 215 | ALPACA AGAINST VICUNA: Using LLMs to Uncover Memorization of LLMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this paper, we investigate the overlooked impact of instruction-tuning on memorization in large language models (LLMs), which has largely been studied in base, pre-trained models. |
Aly M. Kassem; Omar Mahmoud; Niloofar Mireshghallah; Hyunwoo Kim; Yulia Tsvetkov; Yejin Choi; Sherif Saad; Santu Rana; |
| 216 | Generating Diverse Hypotheses for Inductive Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we 1) demonstrate that increasing the temperature to enhance the diversity is limited due to text degeneration issue, and 2) propose a novel method to improve the diversity while maintaining text quality. |
Kang-il Lee; Hyukhun Koh; Dongryeol Lee; Seunghyun Yoon; Minsung Kim; Kyomin Jung; |
| 217 | Fact-Aware Multimodal Retrieval Augmentation for Accurate Medical Radiology Report Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we introduce a fact-aware multimodal retrieval-augmented pipeline for generating accurate radiology reports (FactMM-RAG). |
Liwen Sun; James Jialun Zhao; Wenjing Han; Chenyan Xiong; |
| 218 | Unlocking Decoding-time Controllability: Gradient-Free Multi-Objective Alignment with Contrastive Prompts Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Considering the limitation of previous approaches, we propose MCA, which constructs an expert prompt and an adversarial prompt for each objective to contrast at the decoding time and balances the objectives through combining the contrast. |
Tingchen Fu; Yupeng Hou; Julian McAuley; Rui Yan; |
| 219 | LLMs Are Biased Towards Output Formats! Systematically Evaluating and Mitigating Output Format Bias of LLMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: We present the first systematic evaluation examining format bias in performance of large language models (LLMs). |
Do Xuan Long; Ngoc-Hai Nguyen; Tiviatis Sim; Hieu Dao; Shafiq Joty; Kenji Kawaguchi; Nancy F. Chen; Min-Yen Kan; |
| 220 | Measuring Memorization in Language Models Via Probabilistic Extraction Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce probabilistic discoverable extraction, which, without additional cost, relaxes discoverable extraction by considering multiple queries to quantify the probability of extracting a target sequence. |
Jamie Hayes; Marika Swanberg; Harsh Chaudhari; Itay Yona; Ilia Shumailov; Milad Nasr; Christopher A. Choquette-Choo; Katherine Lee; A. Feder Cooper; |
| 221 | MAMM-Refine: A Recipe for Improving Faithfulness in Generation with Multi-Agent Collaboration Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We investigate how iterative collaboration among multiple instances and types of large language models (LLMs) enhances subtasks in the refinement process, such as error detection, critiquing unfaithful sentences, and making corrections based on critiques. |
David Wan; Justin Chen; Elias Stengel-Eskin; Mohit Bansal; |
| 222 | AdaCAD: Adaptively Decoding to Balance Conflicts Between Contextual and Parametric Knowledge Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: We propose a fine-grained, instance-level approach called AdaCAD, which dynamically infers the weight of adjustment based on the degree of conflict, as measured by the Jensen-Shannon divergence between distributions representing contextual and parametric knowledge. |
Han Wang; Archiki Prasad; Elias Stengel-Eskin; Mohit Bansal; |
| 223 | Open-World Evaluation for Retrieving Diverse Perspectives Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We study retrieving a set of documents that covers various perspectives on a complex and contentious question (e.g., will ChatGPT do more harm than good?) |
Hung-Ting Chen; Eunsol Choi; |
| 224 | DenseSSM: State Space Models with Dense Hidden Connection for Efficient Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper introduces DenseSSM, a novel approach to enhance the flow of hidden information between layers in SSMs. |
Wei He; Kai Han; Yehui Tang; Chengcheng Wang; Yujie Yang; Tianyu Guo; Yunhe Wang; |
| 225 | Diversity Helps Jailbreak Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Abstract: We have uncovered a powerful jailbreak technique that leverages large language models’ ability to diverge from prior context, enabling them to bypass safety constraints and … |
Weiliang Zhao; Daniel Ben-Levi; Wei Hao; Junfeng Yang; Chengzhi Mao; |
| 226 | FactCG: Enhancing Fact Checkers with Graph-Based Multi-Hop Data Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: Based on our findings, we propose a novel approach for synthetic data generation, CG2C, that leverages multi-hop reasoning on context graphs extracted from documents. |
Deren Lei; Yaxi Li; Siyao Li; Mengya Hu; Rui Xu; Ken Archer; Mingyu Wang; Emily Ching; Alex Deng; |
| 227 | CROPE: Evaluating In-Context Adaptation of Vision and Language Models to Culture-Specific Concepts Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this paper, we introduce CROPE, a visual question answering benchmark designed to probe the knowledge of culture-specific concepts and evaluate the capacity for cultural adaptation through contextual information. |
Malvina Nikandrou; Georgios Pantazopoulos; Nikolas Vitsakis; Ioannis Konstas; Alessandro Suglia; |
| 228 | Entangled Relations: Leveraging NLI and Meta-analysis to Enhance Biomedical Relation Extraction Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Recent research efforts have explored the potential of leveraging natural language inference (NLI) techniques to enhance relation extraction (RE). In this vein, we introduce MetaEntail-RE, a novel adaptation method that harnesses NLI principles to enhance RE performance. |
William P Hogan; Jingbo Shang; |
| 229 | Privacy Checklist: Privacy Violation Detection Grounding on Contextual Integrity Theory Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we argue that privacy is not solely about PII patterns. |
Haoran Li; Wei Fan; Yulin Chen; Cheng Jiayang; Tianshu Chu; Xuebing Zhou; Peizhao Hu; Yangqiu Song; |
| 230 | Inference-Time Selective Debiasing to Enhance Fairness in Text Classification Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose selective debiasing – an inference-time safety mechanism designed to enhance the overall model quality in terms of prediction performance and fairness, especially in scenarios where retraining the model is impractical. |
Gleb Kuzmin; Neemesh Yadav; Ivan Smirnov; Timothy Baldwin; Artem Shelmanov; |
| 231 | Mastering The Craft of Data Synthesis for CodeLLMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: Data synthesis and filtering techniques have been widely adopted and shown to be highly effective in this context. In this paper, we present a focused survey and taxonomy of these techniques, emphasizing recent advancements. |
Meng Chen; Philip Arthur; Qianyu Feng; Cong Duy Vu Hoang; Yu-Heng Hong; Mahdi Kazemi Moghaddam; Omid Nezami; Duc Thien Nguyen; Gioacchino Tangari; Duy Vu; Thanh Vu; Mark Johnson; Krishnaram Kenthapadi; Don Dharmasiri; Long Duong; Yuan-Fang Li; |
| 232 | Scaling LLM Inference Efficiently with Optimized Sample Compute Allocation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Our experiments show that with our learned mixed allocation, we can achieve accuracy better than the best single configuration with 128x less compute on code generation and 25x less compute on 4 reasoning tasks. |
Kexun Zhang; Shang Zhou; Danqing Wang; William Yang Wang; Lei Li; |
| 233 | ETHIC: Evaluating Large Language Models on Long-Context Tasks with High Information Coverage Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: Our findings indicate that current benchmarks exhibit low IC; although the input context may be extensive, the actual usable context is often limited. To address this, we present ETHIC, a novel benchmark designed to assess LLMs’ ability to leverage the entire context. |
Taewhoo Lee; Chanwoong Yoon; Kyochul Jang; Donghyeon Lee; Minju Song; Hyunjae Kim; Jaewoo Kang; |
| 234 | Reverse Modeling in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper investigates whether LLMs, like humans, struggle with reverse modeling, specifically with reversed text inputs. |
Sicheng Yu; Xu Yuanchen; Cunxiao Du; Yanying Zhou; Minghui Qiu; Qianru Sun; Hao Zhang; Jiawei Wu; |
| 235 | Language Models Predict Empathy Gaps Between Social In-groups and Out-groups Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: Studies of human psychology have demonstrated that people are more motivated to extend empathy to in-group members than out-group members (Cikara et al., 2011). In this study, we investigate how this aspect of intergroup relations in humans is replicated by LLMs in an emotion intensity prediction task. |
Yu Hou; Hal Daumé III; Rachel Rudinger; |
| 236 | AdvisorQA: Towards Helpful and Harmless Advice-seeking Question Answering with Collective Intelligence Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: As the integration of large language models into daily life is on the rise, there is still a lack of dataset for *advising on subjective and personal dilemmas*. To address this gap, we introduce AdvisorQA, which aims to improve LLMs’ capability to offer advice for deeply subjective concerns, utilizing the LifeProTips Reddit forum. |
Minbeom Kim; Hwanhee Lee; Joonsuk Park; Hwaran Lee; Kyomin Jung; |
| 237 | MILU: A Multi-task Indic Language Understanding Benchmark Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: Existing benchmarks predominantly focus on English, leaving substantial gaps in assessing LLM capabilities in these languages. We introduce MILU, a Multi-task Indic Language Understanding Benchmark, a comprehensive evaluation benchmark designed to address this gap. |
Sshubam Verma; Mohammed Safi Ur Rahman Khan; Vishwajeet Kumar; Rudra Murthy; Jaydeep Sen; |
| 238 | VisDoM: Multi-Document QA with Visually Rich Elements Using Multimodal Retrieval-Augmented Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose VisDoMRAG, a novel multimodal Retrieval Augmented Generation (RAG) approach that simultaneously utilizes visual and textual RAG, combining robust visual retrieval capabilities with sophisticated linguistic reasoning. |
Manan Suri; Puneet Mathur; Franck Dernoncourt; Kanika Goswami; Ryan A. Rossi; Dinesh Manocha; |
| 239 | CFinBench: A Comprehensive Chinese Financial Benchmark for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we present CFinBench, a meticulously crafted and, to date, the most comprehensive evaluation benchmark for assessing the financial knowledge of LLMs in a Chinese context. |
Ying Nie; Binwei Yan; Tianyu Guo; Hao Liu; Haoyu Wang; Wei He; Binfan Zheng; Weihao Wang; Qiang Li; Weijian Sun; Yunhe Wang; Dacheng Tao; |
| 240 | Cross-Lingual Transfer Learning for Speech Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: There has been increasing interest in building multilingual foundation models for NLP and speech research. This paper examines how to expand the speech translation capability of these models with restricted data. |
Rao Ma; Mengjie Qian; Yassir Fathullah; Siyuan Tang; Mark Gales; Kate Knill; |
| 241 | LibEvolutionEval: A Benchmark and Study for Version-Specific Code Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, these studies do not fully capture the complexity of real-world software development, which often requires the use of rapidly-evolving public libraries. To address this gap, we introduce LibEvolutionEval, a comprehensive study that emphasizes the need to understand library evolution to perform accurate in-line code completions. |
Sachit Kuhar; Wasi Uddin Ahmad; Zijian Wang; Nihal Jain; Haifeng Qian; Baishakhi Ray; Murali Krishna Ramanathan; Xiaofei Ma; Anoop Deoras; |
| 242 | Evaluating Bias in LLMs for Job-Resume Matching: Gender, Race, and Education Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This study examines the performance and fairness of LLMs in job-resume matching tasks within the English language and U.S. context. It evaluates how factors such as gender, race, and educational background influence model decisions, providing critical insights into the fairness and reliability of LLMs in HR applications. |
Hayate Iso; Pouya Pezeshkpour; Nikita Bhutani; Estevam Hruschka; |
| 243 | Fingerspelling Within Sign Language Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We find that 1) substantially improves understanding of fingerspelling (and translation quality overall), but the effect of 2) is mixed. |
Garrett Tanzer; |
| 244 | FLEURS-ASL: Including American Sign Language in Massively Multilingual Multitask Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In order to help converge the fields, we introduce FLEURS-ASL, an extension of the multiway parallel benchmarks FLORES (for text) and FLEURS (for speech) to support their first sign language (as video), American Sign Language, translated by 5 Certified Deaf Interpreters. |
Garrett Tanzer; |
| 245 | LLM Safety for Children Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: The study acknowledges the diverse nature of children, often overlooked by standard safety evaluations, and proposes a comprehensive approach to evaluating LLM safety specifically for children. |
Prasanjit Rath; Hari Shrawgi; Parag Agrawal; Sandipan Dandapat; |
| 246 | Learning to Summarize from LLM-generated Feedback Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce FeedSum, a large-scale dataset containing multi-dimensional LLM feedback on summaries of varying quality across diverse domains. |
Hwanjun Song; Taewon Yun; Yuho Lee; Jihwan Oh; Gihun Lee; Jason Cai; Hang Su; |
| 247 | Towards Rationality in Language and Multimodal Agents: A Survey Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: This work discusses how to build more rational language and multimodal agents and what criteria define rationality in intelligent systems. |
Bowen Jiang; Yangxinyu Xie; Xiaomeng Wang; Yuan Yuan; Zhuoqun Hao; Xinyi Bai; Weijie J Su; Camillo Jose Taylor; Tanwi Mallick; |
| 248 | Zero-Shot ATC Coding with Large Language Models for Clinical Assessments Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Manual assignment of Anatomical Therapeutic Chemical (ATC) codes to prescription records is a significant bottleneck in healthcare research and operations at Ontario Health and InterRAI Canada, requiring extensive expert time and effort. To automate this process while maintaining data privacy, we develop a practical approach using locally deployable large language models (LLMs). |
Zijian Chen; John-Michael Gamble; Jimmy Lin; |
| 249 | From Introspection to Best Practices: Principled Analysis of Demonstrations in Multimodal In-Context Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We conduct a systematic and principled evaluation of multimodal ICL for models of different scales on a broad spectrum of new yet critical tasks. |
Nan Xu; Fei Wang; Sheng Zhang; Hoifung Poon; Muhao Chen; |
| 250 | MatViX: Multimodal Information Extraction from Visually Rich Articles Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce a novel evaluation method to assess the accuracy of curve similarity and the alignment of hierarchical structures. |
Ghazal Khalighinejad; Sharon Scott; Ollie Liu; Kelly L. Anderson; Rickard Stureborg; Aman Tyagi; Bhuwan Dhingra; |
| 251 | Efficient Prompting for Continual Adaptation to Missing Modalities Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we formulate the dynamic missing modality problem as a continual learning task and introduce the continual multimodal missing modality task. |
Zirun Guo; Shulei Wang; Wang Lin; Weicai Yan; Yangyang Wu; Tao Jin; |
| 252 | Effective Skill Unlearning Through Intervention and Abstention Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save |
Yongce Li; Chung-En Sun; Tsui-Wei Weng; |
| 253 | Automatic Input Rewriting Improves Translation with Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: We present an empirical study of 21 input rewriting methods with 3 open-weight LLMs for translating from English into 6 target languages. |
Dayeon Ki; Marine Carpuat; |
| 254 | LLM The Genius Paradox: A Linguistic and Math Expert’s Struggle with Simple Word-based Counting Problems Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we carefully design multiple evaluation settings to investigate validity of prevalent conjectures. |
Nan Xu; Xuezhe Ma; |
| 255 | Identifying Emerging Concepts in Large Corpora Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: We introduce a new method to identify emerging concepts in large text corpora. |
Sibo Ma; Julian Nyarko; |
| 256 | CausalEval: Towards Better Causal Reasoning in Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We aim for this work to serve as a comprehensive resource, fostering further advancements in causal reasoning with LMs. |
Longxuan Yu; Delin Chen; Siheng Xiong; Qingyang Wu; Dawei Li; Zhikai Chen; Xiaoze Liu; Liangming Pan; |
| 257 | Getting More Juice Out of Your Data: Hard Pair Refinement Enhances Visual-Language Models Without Extra Data Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we introduce HELIP, a cost-effective strategy that improves CLIP models by exploiting challenging text-image pairs within existing datasets in continuous training. |
Haonan Wang; Minbin Huang; Runhui Huang; Lanqing Hong; Hang Xu; Tianyang Hu; Xiaodan Liang; Zhenguo Li; Hong Cheng; Kenji Kawaguchi; |
| 258 | Audio Is The Achilles’ Heel: Red Teaming Audio Large Multimodal Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we comprehensively red team the safety of five advanced audio LMMs under three settings: (i) harmful questions in both audio and text formats, (ii) harmful questions in text format accompanied by distracting non-speech audio, and (iii) speech-specific jailbreaks. |
Hao Yang; Lizhen Qu; Ehsan Shareghi; Gholamreza Haffari; |
| 259 | Logic-of-Thought: Injecting Logic Into Contexts for Full Reasoning in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: To this end, we propose Logic-of-Thought (LoT) prompting which employs propositional logic to generate expanded logical information descriptions and utilizes them as an additional augmentation to original contexts, thereby ensuring information completeness and enhancing logical reasoning ability. |
Tongxuan Liu; Wenjiang Xu; Weizhe Huang; Yuting Zeng; Jiaxing Wang; Xingyu Wang; Hailong Yang; Jing Li; |
| 260 | Cascading Large Language Models for Salient Event Graph Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: This paper presents CALLMSAE, a CAscading Large Language Model framework for SAlient Event graph generation, which leverages the capabilities of LLMs and eliminates the need for costly human annotations. |
Xingwei Tan; Yuxiang Zhou; Gabriele Pergola; Yulan He; |
| 261 | VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save |
Zejun Li; Ruipu Luo; Jiwen Zhang; Minghui Qiu; Xuanjing Huang; Zhongyu Wei; |
| 262 | Multilingual Needle in A Haystack: Investigating Long-Context Behavior of Multilingual Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: As such, a systematic evaluation of the long-context capabilities of LLMs in multilingual settings is crucial, specifically in the context of information retrieval. To address this gap, we introduce the MultiLingual Needle-in-a-Haystack (MLNeedle) test, designed to assess a model’s ability to retrieve relevant information (the needle) from a collection of multilingual distractor texts (the haystack). |
Amey Hengle; Prasoon Bajpai; Soham Dan; Tanmoy Chakraborty; |
| 263 | CSEval: Towards Automated, Multi-Dimensional, and Reference-Free Counterspeech Evaluation Using Auto-Calibrated LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This has led to an increased dependency on labor-intensive human evaluations to assess automated counter-speech generation methods. To address these challenges, we introduce CSEval, a novel dataset and framework for evaluating counterspeech quality across four dimensions: *contextual-relevance*, *aggressiveness*, *argument-coherence*, and *suitableness*. |
Amey Hengle; Aswini Kumar Padhi; Anil Bandhakavi; Tanmoy Chakraborty; |
| 264 | MultiChartQA: Benchmarking Vision-Language Models on Multi-Chart Problems Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: Current benchmarks primarily focus on single-chart tasks, neglecting the multi-hop reasoning required to extract and integrate information from multiple charts, which is essential in practical applications. To fill this gap, we introduce MultiChartQA, a benchmark that evaluates MLLMs’ capabilities in four key areas: direct question answering, parallel question answering, comparative reasoning, and sequential reasoning. |
Zifeng Zhu; Mengzhao Jia; Zhihan Zhang; Lang Li; Meng Jiang; |
| 265 | Are LLM-Judges Robust to Expressions of Uncertainty? Investigating The Effect of Epistemic Markers on LLM-based Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: However, evaluation in the presence of epistemic markers has been largely overlooked, raising a critical question: Could the use of epistemic markers in LLM-generated outputs lead to unintended negative consequences? To address this, we present EMBER, a benchmark designed to assess the robustness of LLM-judges to epistemic markers in both single and pairwise evaluation settings. |
Dongryeol Lee; Yerin Hwang; Yongil Kim; Joonsuk Park; Kyomin Jung; |
| 266 | Verifiable By Design: Aligning Language Models to Quote from Pre-Training Data Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose Quote-Tuning, which demonstrates the feasibility of aligning models to quote. |
Jingyu Zhang; Marc Marone; Tianjian Li; Benjamin Van Durme; Daniel Khashabi; |
| 267 | One Fish, Two Fish, But Not The Whole Sea: Alignment Reduces Language Models’ Conceptual Diversity Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Inspired by human studies, we use a new way of measuring the conceptual diversity of synthetically-generated LLM “populations” by relating the internal variability of simulated individuals to the population-level variability. |
Sonia Krishna Murthy; Tomer Ullman; Jennifer Hu; |
| 268 | PAT: Parameter-Free Audio-Text Aligner to Boost Zero-Shot Audio Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we introduce PAT (Parameter-free Audio-Text aligner), a simple and training-free method aimed at boosting zero-shot audio classification performance of CLAP-like ALMs. |
Ashish Seth; Ramaneswaran Selvakumar; Sonal Kumar; Sreyan Ghosh; Dinesh Manocha; |
| 269 | Benchmarking Language Model Creativity: A Case Study on Code Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this work, we introduce a framework for quantifying LLM creativity that incorporates the two design ingredients: (1) We introduce DENIAL PROMPTING which pushes LLMs to develop more creative solutions to a given problem by incrementally imposing new constraints on the previous solution, compelling LLMs to adopt new strategies. |
Yining Lu; Dixuan Wang; Tianjian Li; Dongwei Jiang; Sanjeev Khudanpur; Meng Jiang; Daniel Khashabi; |
| 270 | Multilingual Reasoning Via Self-training Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To improve LLMs’ multilingual reasoning abilities, we propose a modular approach that instructs the models to structure reasoning passages in a different problem space and then self-refine their capabilities to deliver step-wise reasoning passages that lead to the solution. |
Leonardo Ranaldi; Giulia Pucci; |
| 271 | Guiding Medical Vision-Language Models with Diverse Visual Prompts: Framework Design and Comprehensive Exploration of Prompt Variations Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Rather, they typically rely on large volumes of high-quality image-text paired data to learn and generate posterior attention maps. To address this critical issue, we propose leveraging visual prompts—simple visual markers in various forms—to guide and enhance the formation of region-specific attention. |
Kangyu Zhu; Ziyuan Qin; Huahui Yi; Zekun Jiang; Qicheng Lao; Shaoting Zhang; Kang Li; |
| 272 | Characterizing The Role of Similarity in The Property Inferences of Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this work, we investigate how LMs perform property inheritance with behavioral and causal representational analysis experiments. |
Juan Diego Rodriguez; Aaron Mueller; Kanishka Misra; |
| 273 | Sparser Mixture-of-Adapters with Cross-Layer Generalization Related Papers Related Patents Related Grants Related Venues Related Experts View Save |
Ziyue Li; Tianyi Zhou; |
| 274 | PromptOptMe: Error-Aware Prompt Compression for LLM-based MT Evaluation Metrics Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we propose a prompt optimization approach that uses a smaller, fine-tuned language model to compress input data for evaluation prompt, thus reducing token usage and computational cost when using larger LLMs for downstream evaluation. |
Daniil Larionov; Steffen Eger; |
| 275 | ALiiCE: Evaluating Positional Fine-grained Citation Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: However, existing research on citation generation is predominantly limited to sentence-level statements, neglecting the significance of positional fine-grained citations that can appear anywhere within sentences. To facilitate further exploration of the positional fine-grained citation generation, we propose ALiiCE, the first automatic evaluation framework for this task. |
Yilong Xu; Jinhua Gao; Xiaoming Yu; Baolong Bi; Huawei Shen; Xueqi Cheng; |
| 276 | ToolFlow: Boosting LLM Tool-Calling Through Natural and Coherent Dialogue Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Additionally, current work overlooks the coherence between turns of dialogues, leading to a gap between the synthesized data and real-world scenarios. To address these issues, we propose a Graph-based Sampling strategy to sample more relevant tool combinations, and a Planned-generation strategy to create plans that guide the synthesis of coherent dialogues. |
Zezhong Wang; Xingshan Zeng; Weiwen Liu; Liangyou Li; Yasheng Wang; Lifeng Shang; Xin Jiang; Qun Liu; Kam-Fai Wong; |
| 277 | SynthDetoxM: Modern LLMs Are Few-Shot Parallel Detoxification Data Annotators Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we introduce a pipeline for the generation of multilingual parallel detoxification data. |
Daniil Moskovskiy; Nikita Sushko; Sergey Pletenev; Elena Tutubalina; Alexander Panchenko; |
| 278 | RuleR: Improving LLM Controllability By Rule-based Data Recycling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: However, curating supervised fine-tuning (SFT) datasets to improve LLM controllability usually relies on human experts or proprietary LLMs, which requires additional costs. To bridge this gap, we propose Rule-based Data Recycling (RuleR), a data augmentation method incorporating multiple constraints into the original data samples according to predefined rules, which creates new training tasks to consolidate the controllability of LLMs. |
Ming Li; Han Chen; Chenguang Wang; Dang Nguyen; Dianqi Li; Tianyi Zhou; |
| 279 | Enhancing Multimodal Entity Linking with Jaccard Distance-based Conditional Contrastive Learning and Contextual Visual Augmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we propose JD-CCL (Jaccard Distance-based Conditional Contrastive Learning), a novel approach designed to enhance the ability to match multimodal entity linking models. |
Cong-Duy T Nguyen; Xiaobao Wu; Thong Thanh Nguyen; Shuai Zhao; Khoi M. Le; Nguyen Viet Anh; Feng Yichao; Anh Tuan Luu; |
| 280 | MeNTi: Bridging Medical Calculator and LLM Agent with Nested Tool Calling Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we focus on the downstream tasks of medical calculators, which use standardized tests to assess an individual’s health status. |
Yakun Zhu; Shaohang Wei; Xu Wang; Kui Xue; Shaoting Zhang; Xiaofan Zhang; |
| 281 | Specializing Large Language Models to Simulate Survey Response Distributions for Global Populations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this paper, we are the first to specialize LLMs for the task of simulating survey response distributions. |
Yong Cao; Haijiang Liu; Arnav Arora; Isabelle Augenstein; Paul Röttger; Daniel Hershcovich; |
| 282 | ACCESS : A Benchmark for Abstract Causal Event Discovery and Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Save |
Vy Vo; Lizhen Qu; Tao Feng; Yuncheng Hua; Xiaoxi Kang; Songhai Fan; Tim Dwyer; Lay-Ki Soon; Gholamreza Haffari; |
| 283 | Intrinsic Bias Is Predicted By Pretraining Data and Correlates with Downstream Performance in Vision-Language Encoders Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this work, we present the largest comprehensive analysis to-date of how the upstream pre-training factors and downstream performance of CLIP models relate to their intrinsic biases. |
Kshitish Ghate; Isaac Slaughter; Kyra Wilson; Mona T. Diab; Aylin Caliskan; |
| 284 | A Bayesian Optimization Approach to Machine Translation Reranking Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: This algorithm scores candidates iteratively, choosing next candidates by balancing between exploration, choosing to score those that differ from candidates already scored, and exploitation, choosing to score those that resemble high-scoring candidates. This procedure finds high-scoring candidates while scoring only a fraction of the candidates list; given candidate lists of 200 random samples (before deduplication), our method achieves the same CometKiwi score using only 70 scoring evaluations on average compared to scoring a random subset of 180 candidates. |
Julius Cheng; Maike Züfle; Vilém Zouhar; Andreas Vlachos; |
| 285 | Eliciting Critical Reasoning in Retrieval-Augmented Generation Via Contrastive Explanations Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we investigate how to elicit critical arguments in RAG via contrastive explanations. |
Leonardo Ranaldi; Marco Valentino; Andre Freitas; |
| 286 | Towards Operationalizing Right to Data Protection Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To this end, we introduce **RegText**, a framework that injects imperceptible spurious correlations into natural language datasets, effectively rendering them unlearnable without affecting semantic content. |
Abhinav Java; Simra Shahid; Chirag Agarwal; |
| 287 | Uplifting Lower-Income Data: Strategies for Socioeconomic Perspective Shifts in Large Multi-modal Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: To improve LMM model performance on underrepresented data, we propose and evaluate several prompting strategies using non-English, geographic, and socioeconomic attributes. |
Joan Nwatu; Oana Ignat; Rada Mihalcea; |
| 288 | EMS-SD: Efficient Multi-sample Speculative Decoding for Accelerating Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: We propose a novel method that can resolve the issue of inconsistent tokens accepted by different samples without necessitating an increase in memory or computing overhead. |
Yunsheng Ni; Chuanjian Liu; Yehui Tang; Kai Han; Yunhe Wang; |
| 289 | Entity Decomposition with Filtering: A Zero-Shot Clinical Named Entity Recognition Framework Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we introduce a novel framework, entity decomposition with filtering, or EDF. |
Reza Averly; Xia Ning; |
| 290 | Pretrained Image-Text Models Are Secretly Video Captioners Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: Our adapted model demonstrates top-tier performance on major benchmarks, ranking 2nd on MSR-VTT and MSVD, and 3rd on VATEX. |
Chunhui Zhang; Yiren Jian; Zhongyu Ouyang; Soroush Vosoughi; |
| 291 | Regularized Best-of-N Sampling with Minimum Bayes Risk Objective for Language Model Alignment Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this research, we propose MBR-BoN, a variant of BoN that aims to mitigate reward hacking at inference time by incorporating the Minimum Bayes Risk (MBR) objective as a proximity regularization term. |
Yuu Jinnai; Tetsuro Morimura; Kaito Ariu; Kenshi Abe; |
| 292 | A Systematic Study of Cross-Layer KV Sharing for Efficient LLM Inference Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: To systematically investigate different techniques of cross-layer KV sharing, we propose a unified framework that covers several recent methods and their novel variants. |
You Wu; Haoyi Wu; Kewei Tu; |
| 293 | Dynamic Data Mixing Maximizes Instruction Tuning for Mixture-of-Experts Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: To reduce the potential redundancies of datasets, we make the first attempt and propose a novel dynamic data mixture for MoE instruction tuning. |
Tong Zhu; Daize Dong; Xiaoye Qu; Jiacheng Ruan; Wenliang Chen; Yu Cheng; |
| 294 | AudioBench: A Universal Benchmark for Audio Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: We introduce AudioBench, a universal benchmark designed to evaluate Audio Large Language Models (AudioLLMs). |
Bin Wang; Xunlong Zou; Geyu Lin; Shuo Sun; Zhuohan Liu; Wenyu Zhang; Zhengyuan Liu; AiTi Aw; Nancy F. Chen; |
| 295 | Language Model Council: Democratically Benchmarking Foundation Models on Highly Subjective Tasks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: We introduce the Language Model Council (LMC), where a group of LLMs collaborate to create tests, respond to them, and evaluate each other’s responses to produce a ranking in a democratic fashion. |
Justin Zhao; Flor Miriam Plaza-del-Arco; Amanda Cercas Curry; |
| 296 | Towards Quantifying Commonsense Reasoning with Mechanistic Insights Related Papers Related Patents Related Grants Related Venues Related Experts View Save |
Abhinav Joshi; Areeb Ahmad; Divyaksh Shukla; Ashutosh Modi; |
| 297 | Is It Navajo? Accurate Language Detection for Endangered Athabaskan Languages Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This study evaluates Google’s Language Identification (LangID) tool, which does not currently support any Native American languages. To address this, we introduce a random forest classifier trained on Navajo and twenty erroneously suggested languages by LangID. |
Ivory Yang; Weicheng Ma; Chunhui Zhang; Soroush Vosoughi; |
| 298 | Towards Lifelong Dialogue Agents Via Timeline-based Memory Management Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present THEANINE, a framework for LLM-based lifelong dialogue agents. |
Kai Tzu-iunn Ong; Namyoung Kim; Minju Gwak; Hyungjoo Chae; Taeyoon Kwon; Yohan Jo; Seung-won Hwang; Dongha Lee; Jinyoung Yeo; |
| 299 | Adaptive Prompting: Ad-hoc Prompt Composition for Social Bias Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Most existing approaches to automatic prompting aim to optimize individual techniques instead of compositions of techniques and their dependence on the input. To fill this gap, we propose an adaptive prompting approach that predicts the optimal prompt composition ad-hoc for a given input. |
Maximilian Spliethöver; Tim Knebler; Fabian Fumagalli; Maximilian Muschalik; Barbara Hammer; Eyke Hüllermeier; Henning Wachsmuth; |
| 300 | IFIR: A Comprehensive Benchmark for Evaluating Instruction-Following in Expert-Domain Information Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce IFIR, the first comprehensive benchmark designed to evaluate instruction-following information retrieval (IR) in expert domains. |
Tingyu Song; Guo Gan; Mingsheng Shang; Yilun Zhao; |
| 301 | A Fair Comparison Without Translationese: English Vs. Target-language Instructions for Multilingual LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Prior studies suggested that English instructions are more effective than target-language instructions even for non-English tasks; however, these studies often use datasets and instructions translated from English, which introduce biases known as translationese, hindering an unbiased comparison. To address this issue, we conduct a fair comparison between English and target-language instructions by eliminating translationese effects. |
Taisei Enomoto; Hwichan Kim; Zhousi Chen; Mamoru Komachi; |
| 302 | Hazards in Daily Life? Enabling Robots to Proactively Detect and Resolve Anomalies Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Specifically, we introduce a multi-agent brainstorming approach, where agents collaborate and generate diverse scenarios covering household hazards, hygiene management, and child safety. |
Zirui Song; Guangxian Ouyang; Meng Fang; Hongbin Na; Zijing Shi; Zhenhao Chen; Fu Yujie; Zeyu Zhang; Shiyu Jiang; Miao Fang; Ling Chen; Xiuying Chen; |
| 303 | AfriHate: A Multilingual Collection of Hate Speech and Abusive Language Datasets for African Languages Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: These limitations are mainly due to the lack of high-quality data in the local languages and the failure to include local communities in the collection, annotation, and moderation processes. To address this issue, we present AfriHate: a multilingual collection of hate speech and abusive language datasets in 15 African languages. |
Shamsuddeen Hassan Muhammad; Idris Abdulmumin; Abinew Ali Ayele; David Ifeoluwa Adelani; Ibrahim Said Ahmad; Saminu Mohammad Aliyu; Paul Röttger; Abigail Oppong; Andiswa Bukula; Chiamaka Ijeoma Chukwuneke; Ebrahim Chekol Jibril; Elyas Abdi Ismail; Esubalew Alemneh; Hagos Tesfahun Gebremichael; Lukman Jibril Aliyu; Meriem Beloucif; Oumaima Hourrane; Rooweither Mabuya; Salomey Osei; Samuel Rutunda; Tadesse Destaw Belay; Tadesse Kebede Guge; Tesfa Tegegne Asfaw; Lilian Diana Awuor Wanzare; Nelson Odhiambo Onyango; Seid Muhie Yimam; Nedjma Ousidhoum; |
| 304 | LRQ: Optimizing Post-Training Quantization for Large Language Models By Learning Low-Rank Weight-Scaling Matrices Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, existing post-training quantization (PTQ) techniques for quantizing weights and activations of LLMs still suffer from non-negligible accuracy drops, especially on massive multitask language understanding. To address this issue, we propose Low-Rank Quantization (LRQ) – a simple yet effective post-training weight quantization method for LLMs that reconstructs the outputs of an intermediate Transformer block by leveraging low-rank weight-scaling matrices, replacing the conventional full weight-scaling matrices that entail as many learnable scales as their associated weights. |
Jung Hyun Lee; Jeonghoon Kim; June Yong Yang; Se Jung Kwon; Eunho Yang; Kang Min Yoo; Dongsoo Lee; |
| 305 | Markov Chain of Thought for Efficient Mathematical Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Save |
Wen Yang; Minpeng Liao; Kai Fan; |
| 306 | Upsample or Upweight? Balanced Training on Heavily Imbalanced Datasets Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Specifically, Temperature Sampling exhibits lower variance in gradient estimation, leading to faster convergence but a higher risk of overfitting. Based on these insights, we propose Cooldown, a strategy that starts by heavily upsampling low-resource languages to accelerate convergence and gradually reduces the upsampling to prevent overfitting—achieving the best of both worlds. |
Tianjian Li; Haoran Xu; Weiting Tan; Kenton Murray; Daniel Khashabi; |
| 307 | TurkingBench: A Challenge Benchmark for Web Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Such tasks are often found on crowdsourcing platforms, where crowdworkers engage in challenging micro-tasks within web-based environments. Building on this idea, we present TurkingBench, a benchmark consisting of tasks presented as web pages with textual instructions and multi-modal contexts. |
Kevin Xu; Yeganeh Kordi; Tanay Nayak; Adi Asija; Yizhong Wang; Kate Sanders; Adam Byerly; Jingyu Zhang; Benjamin Van Durme; Daniel Khashabi; |
| 308 | ITALIC: An Italian Culture-Aware Natural Language Benchmark Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present ITALIC, a large-scale benchmark dataset of 10,000 multiple-choice questions designed to evaluate the natural language understanding of the Italian language and culture. |
Andrea Seveso; Daniele Potertì; Edoardo Federici; Mario Mezzanzanica; Fabio Mercorio; |
| 309 | DREAM: Disentangling Risks to Enhance Safety Alignment in Multimodal Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this paper, we begin with a detailed analysis aimed at disentangling risks through step-by-step reasoning within multimodal inputs. |
Jianyu Liu; Hangyu Guo; Ranjie Duan; Xingyuan Bu; Yancheng He; Shilong Li; Hui Huang; Jiaheng Liu; Yucheng Wang; Chenchen Jing; Xingwei Qu; Xiao Zhang; Pei Wang; Yanan Wu; Jihao Gu; Yangguang Li; Jianke Zhu; |
| 310 | SSMLoRA: Enhancing Low-Rank Adaptation with State Space Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: To this end, we propose SSMLoRA (**S**tate **S**pace **M**odel **L**ow-**R**ank **A**daptation), an extension of LoRA that incorporates a State Space Model (SSM) to interconnect low-rank matrices. |
Jiayang Yu; Yihang Zhang; Bin Wang; Peiqin Lin; YongKang Liu; Shi Feng; |
| 311 | Communication Makes Perfect: Persuasion Dataset Construction Via Multi-LLM Communication Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper presents a multi-LLM communication framework designed to enhance the generation of persuasive data automatically. |
Weicheng Ma; Hefan Zhang; Ivory Yang; Shiyu Ji; Joice Chen; Farnoosh Hashemi; Shubham Mohole; Ethan Gearey; Michael Macy; Saeed Hassanpour; Soroush Vosoughi; |
| 312 | STAR: Spectral Truncation and Rescale for Model Merging Highlight: In this paper, we propose **S**pectral **T**runcation **A**nd **R**escale (STAR) that aims at mitigating “merging conflicts” by truncating small components in the respective spectral spaces, which is followed by an automatic parameter rescaling scheme to retain the nuclear norm of the original matrix. |
Yu-Ang Lee; Ching-Yun Ko; Tejaswini Pedapati; I-Hsin Chung; Mi-Yen Yeh; Pin-Yu Chen; |
| 313 | LLM-Based Explicit Models of Opponents for Multi-Agent Games Highlight: In this paper, we introduce Explicit Models of Opponents (EMO) based on Large Language Models (LLMs), enabling agents to better predict and adapt to diverse, dynamic multi-agent interactions. |
XiaoPeng Yu; Wanpeng Zhang; Zongqing Lu; |
| 314 | Language Models “Grok” to Copy Highlight: We propose a novel perspective that Transformer-based language models develop copying abilities similarly to grokking, which refers to sudden generalization on the test set long after the model has fit the training set. |
Ang Lv; Ruobing Xie; Xingwu Sun; Zhanhui Kang; Rui Yan; |
| 315 | FaithBench: A Diverse Hallucination Benchmark for Summarization By Modern LLMs Highlight: This paper introduces FaithBench, a summarization hallucination benchmark comprising challenging hallucinations made by 10 modern LLMs from 8 different families, with ground truth annotations by human experts. |
Forrest Sheng Bao; Miaoran Li; Renyi Qu; Ge Luo; Erana Wan; Yujia Tang; Weisi Fan; Manveer Singh Tamber; Suleman Kazi; Vivek Sourabh; Mike Qi; Ruixuan Tu; Chenyu Xu; Matthew Gonzales; Ofer Mendelevitch; Amin Ahmad; |
| 316 | Threshold Filtering Packing for Supervised Fine-Tuning: Training Related Samples Within Packs Highlight: The mainstream approaches in SFT ensure that each token in the attention calculation phase only focuses on tokens within its own short sequence, without providing additional learning signals for the preceding context. To address these challenges, we introduce Threshold Filtering Packing (TFP), a method that selects samples with related context while maintaining sufficient diversity within the same pack. |
Jiancheng Dong; Lei Jiang; Wei Jin; Lu Cheng; |
| 317 | BPO: Towards Balanced Preference Optimization Between Knowledge Breadth and Depth in Alignment Highlight: BPO is motivated by the observation that the usefulness of knowledge varies across samples, necessitating tailored learning of knowledge depth. To achieve this, we introduce gradient-based clustering, estimating the knowledge informativeness and usefulness of each augmented sample based on the model’s optimization direction. |
Sizhe Wang; Yongqi Tong; Hengyuan Zhang; Dawei Li; Xin Zhang; Tianlong Chen; |
| 318 | Patent-CR: A Dataset for Patent Claim Revision Highlight: This paper presents Patent-CR, the first dataset created for the patent claim revision task in English. |
Lekang Jiang; Pascal A. Scherz; Stefan Goetz; |
| 319 | SimRAG: Self-Improving Retrieval-Augmented Generation for Adapting Large Language Models to Specialized Domains Highlight: However, adapting general-purpose RAG systems to specialized fields such as science and medicine poses unique challenges due to distribution shifts and limited access to domain-specific data. To tackle this, we propose SimRAG, a self-training approach that equips LLMs with joint capabilities of question answering and question generation for domain adaptation. |
Ran Xu; Hui Liu; Sreyashi Nag; Zhenwei Dai; Yaochen Xie; Xianfeng Tang; Chen Luo; Yang Li; Joyce C. Ho; Carl Yang; Qi He; |
| 320 | NAT: Enhancing Agent Tuning with Negative Samples Highlight: For instance, existing SFT approaches typically utilize only positive examples, limiting their efficiency in low-resource scenarios. To address this, we introduce Negative-Aware Training (NAT), a straightforward yet effective method that leverages both successful and failed trajectories for fine-tuning, maximizing the utility of limited resources. |
Renxi Wang; Xudong Han; Yixuan Zhang; Timothy Baldwin; Haonan Li; |
| 321 | Balancing Forget Quality and Model Utility: A Reverse KL-Divergence Knowledge Distillation Approach for Better Unlearning in LLMs Highlight: Existing methods based on gradient ascent and its variants often struggle with balancing forget quality and model utility, leading to either over-unlearning or partial unlearning. To address this challenge, we propose Reverse KL-Divergence based Knowledge Distillation for Unlearning (RKLU), a novel unlearning method for LLMs. |
Bichen Wang; Yuzhe Zi; Yixin Sun; Yanyan Zhao; Bing Qin; |
| 322 | QAVA: Query-Agnostic Visual Attack to Large Vision-Language Models Highlight: However, it is common for a single image to be associated with multiple questions, and LVLMs may still answer other questions correctly even for an adversarial image attacked by a specific question. To address this, we introduce the query-agnostic visual attack (QAVA), which aims to create robust adversarial examples that generate incorrect responses to unspecified and unknown questions. |
Yudong Zhang; Ruobing Xie; Jiansheng Chen; Xingwu Sun; Zhanhui Kang; Yu Wang; |
| 323 | Active Few-Shot Learning for Text Classification Highlight: This problem arises due to the heavy reliance on a limited number of support samples, which hampers consistent performance improvement even when more support samples are added. To address this challenge, we propose an active learning-based instance selection mechanism that identifies effective support instances from the unlabeled pool and can work with different LLMs. |
Saeed Ahmadnia; Arash Yousefi Jordehi; Mahsa Hosseini Khasheh Heyran; Seyed Abolghasem Mirroshandel; Owen Rambow; Cornelia Caragea; |
| 324 | Beyond The Safety Bundle: Auditing The Helpful and Harmless Dataset Highlight: Despite the widespread adoption of LHF in practice, the quality of this feedback and its effectiveness as a safety mitigation technique remain unclear. This study addresses these issues by auditing the widely-used Helpful and Harmless (HH) dataset by Anthropic. |
Khaoula Chehbouni; Jonathan Colaço Carr; Yash More; Jackie CK Cheung; Golnoosh Farnadi; |
| 325 | A Picture Is Worth A Thousand Numbers: Enabling LLMs Reason About Time Series Via Visualization Highlight: In this work, we propose TimerBed, the first comprehensive testbed for evaluating LLMs’ TsR performance. |
Haoxin Liu; Chenghao Liu; B. Aditya Prakash; |
| 326 | How Good Are LLMs for Literary Translation, Really? Literary Translation Evaluation with Humans and LLMs Highlight: Recent research has focused on literary machine translation (MT) as a new challenge in MT. However, the evaluation of literary MT remains an open problem. We contribute to this ongoing discussion by introducing LITEVAL-CORPUS, a paragraph-level parallel corpus containing verified human translations and outputs from 9 MT systems, which totals over 2k translations and 13k evaluated sentences across four language pairs, costing 4. |
Ran Zhang; Wei Zhao; Steffen Eger; |
| 327 | Improving Model Evaluation Using SMART Filtering of Benchmark Datasets Highlight: Some of the most pressing issues pertain to benchmark saturation, data contamination, and diversity in the quality of test examples. To address these concerns, we propose Selection Methodology for Accurate, Reduced, and Targeted (SMART) filtering, a novel approach to select a high-quality subset of examples from existing benchmark datasets by systematically removing less informative and lower quality examples. |
Vipul Gupta; Candace Ross; David Pantoja; Rebecca J. Passonneau; Megan Ung; Adina Williams; |
| 328 | Large Language Models Are Cross-Lingual Knowledge-Free Reasoners Highlight: However, the relationship between capabilities in different languages is less explored. In this work, we decompose the process of reasoning tasks into two separate components: knowledge retrieval and knowledge-free reasoning, and analyze the relationship between cross-lingual transferability and these two components. |
Peng Hu; Sizhe Liu; Changjiang Gao; Xin Huang; Xue Han; Junlan Feng; Chao Deng; Shujian Huang; |
| 329 | Hephaestus: Improving Fundamental Agent Capabilities of Large Language Models Through Continual Pre-Training Highlight: We introduce Hephaestus-Forge, the first large-scale pre-training corpus designed to enhance the fundamental capabilities of LLM agents in API function calling, intrinsic reasoning and planning, and adapting to environmental feedback. |
Yuchen Zhuang; Jingfeng Yang; Haoming Jiang; Xin Liu; Kewei Cheng; Sanket Lokegaonkar; Yifan Gao; Qing Ping; Tianyi Liu; Binxuan Huang; Zheng Li; Zhengyang Wang; Pei Chen; Ruijie Wang; Rongzhi Zhang; Nasser Zalmout; Priyanka Nigam; Bing Yin; Chao Zhang; |
| 330 | FinLLM-B: When Large Language Models Meet Financial Breakout Trading Highlight: The reason is that unique data and specific knowledge are required for breakout detection. To address these issues, we create the first financial breakout dataset and introduce FinLLM-B, the premier large language model for financial breakout detection, which enhances the effectiveness of breakout trading strategies. |
Kang Zhang; Osamu Yoshie; Lichao Sun; Weiran Huang; |
| 331 | No Simple Answer to Data Complexity: An Examination of Instance-Level Complexity Metrics for Classification Tasks Highlight: However, there exists a diverse taxonomy of complexity metrics that can be used for a classification task, making metric selection itself a difficult task. We empirically examine the relationship between these metrics and find that simply storing training loss provides similar complexity rankings as other more computationally intensive techniques. |
Ryan A. Cook; John P. Lalor; Ahmed Abbasi; |
| 332 | Learning Vs Retrieval: The Role of In-Context Examples in Regression with Large Language Models Highlight: In this work, we propose a framework for evaluating in-context learning mechanisms, which we claim are a combination of retrieving internal knowledge and learning from in-context examples by focusing on regression tasks. |
Aliakbar Nafar; K. Brent Venable; Parisa Kordjamshidi; |
| 333 | Ensembling Large Language Models with Process Reward-Guided Tree Search for Better Complex Reasoning Highlight: In response, we present Language model Ensemble with Monte Carlo Tree Search (LE-MCTS), a novel framework for process-level ensembling of language models. |
Sungjin Park; Xiao Liu; Yeyun Gong; Edward Choi; |
| 334 | MGM: Global Understanding of Audience Overlap Graphs for Predicting The Factuality and The Bias of News Media Highlight: In this work, we study the classification problem of profiling news media from the lens of political bias and factuality. |
Muhammad Arslan Manzoor; Ruihong Zeng; Dilshod Azizov; Preslav Nakov; Shangsong Liang; |
| 335 | ProSE: Diffusion Priors for Speech Enhancement Highlight: Additionally, for a wide variety of applications, SE systems need to be employed in real-time, and traditional diffusion models (DMs) requiring many iterations of a large model during inference are inefficient. To address these issues, we propose ProSE (diffusion-based Priors for SE), a novel methodology based on an alternative framework for applying diffusion models to SE. |
Sonal Kumar; Sreyan Ghosh; Utkarsh Tyagi; Anton Jeran Ratnarajah; Chandra Kiran Reddy Evuru; Ramani Duraiswami; Dinesh Manocha; |
| 336 | Do Audio-Language Models Understand Linguistic Variations? Highlight: In this paper, for the first time, we perform controlled experiments on various benchmarks to show that existing ALMs struggle to generalize to linguistic variations in textual queries. To address this issue, we propose RobustCLAP, a novel and compute-efficient technique to learn audio-language representations agnostic to linguistic variations. |
Ramaneswaran Selvakumar; Sonal Kumar; Hemant Kumar Giri; Nishit Anand; Ashish Seth; Sreyan Ghosh; Dinesh Manocha; |
| 337 | IHEval: Evaluating Language Models on Following The Instruction Hierarchy Highlight: Despite its importance, this topic receives limited attention, and there is a lack of comprehensive benchmarks for evaluating models’ ability to follow the instruction hierarchy. We bridge this gap by introducing IHEval, a novel benchmark comprising 3,538 examples across nine tasks, covering cases where instructions in different priorities either align or conflict. |
Zhihan Zhang; Shiyang Li; Zixuan Zhang; Xin Liu; Haoming Jiang; Xianfeng Tang; Yifan Gao; Zheng Li; Haodong Wang; Zhaoxuan Tan; Yichuan Li; Qingyu Yin; Bing Yin; Meng Jiang; |
| 338 | Does Mapo Tofu Contain Coffee? Probing LLMs for Food-related Cultural Knowledge Highlight: Recent studies have highlighted the presence of cultural biases in Large Language Models (LLMs), yet often lack a robust methodology to dissect these phenomena comprehensively. Our work aims to bridge this gap by delving into the Food domain—a universally relevant yet culturally diverse aspect of human life. |
Li Zhou; Taelin Karidi; Wanlong Liu; Nicolas Garneau; Yong Cao; Wenyu Chen; Haizhou Li; Daniel Hershcovich; |
| 339 | DSRAG: A Double-Stream Retrieval-Augmented Generation Framework for Countless Intent Detection Highlight: However, the single retrieval route sometimes fails to recall target intents and causes incorrect results. To alleviate the above challenges, we introduce the DSRAG framework combining query-to-query (Q2Q) and query-to-metadata (Q2M) double-stream RAG approaches. |
Pei Guo; Enjie Liu; Ruichao Zhong; Mochi Gao; Yunzhi Tan; Bo Hu; Zang Li; |
| 340 | Exploring The Potential of Large Language Models for Heterophilic Graphs Highlight: In this work, we explore the potential of LLMs for modeling heterophilic graphs and propose a novel two-stage framework: LLM-enhanced edge discriminator and LLM-guided edge reweighting. |
Yuxia Wu; Shujie Li; Yuan Fang; Chuan Shi; |
| 341 | SCORE: Systematic COnsistency and Robustness Evaluation for Large Language Models Highlight: While some studies discuss issues with LLM robustness, there is no unified or centralized framework for evaluating the robustness of language models. To address this gap and consolidate existing research on model robustness, we present SCORE (Systematic COnsistency and Robustness Evaluation), a comprehensive framework for non-adversarial evaluation of LLMs. |
Grigor Nalbandyan; Rima Shahbazyan; Evelina Bakhturina; |
| 342 | Concept Distillation from Strong to Weak Models Via Hypotheses-to-Theories Prompting Highlight: We propose Concept Distillation (CD), an automatic prompt optimization technique for enhancing weaker models on complex tasks. |
Emmanuel Aboah Boateng; Cassiano O Becker; Nabiha Asghar; Kabir Walia; Ashwin Srinivasan; Ehi Nosakhare; Soundararajan Srinivasan; Victor Dibia; |
| 343 | THREAD: Thinking Deeper with Recursive Spawning Highlight: Large language models (LLMs) have shown impressive capabilities across diverse settings, but still struggle as the length and complexity of the context increases. To address this challenge, we propose Thinking Recursively and Dynamically (ThReaD). |
Philip Schroeder; Nathaniel W. Morgan; Hongyin Luo; James R. Glass; |
| 344 | Debate-Feedback: A Multi-Agent Framework for Efficient Legal Judgment Prediction Highlight: In this paper, inspired by the debate phase of real courtroom trials, we propose a novel legal judgment prediction model based on the Debate-Feedback architecture, which integrates LLM multi-agent debate and reliability evaluation models. |
Xi Chen; Mao Mao; Shuo Li; Haotian Shangguan; |
| 345 | On The Origin of Cultural Biases in Language Models: From Pre-training Data to Linguistic Phenomena Highlight: In this paper, we aim to uncover the origins of entity-related cultural biases in LMs by analyzing several contributing factors, including the representation of entities in pre-training data and the impact of variations in linguistic phenomena across languages. |
Tarek Naous; Wei Xu; |
| 346 | Diversify-verify-adapt: Efficient and Robust Retrieval-Augmented Ambiguous Question Answering Highlight: Although the iterative RAG approach has been proposed to address this problem, it comes at the cost of significantly reduced efficiency. To address these issues, we propose the diversify-verify-adapt (DIVA) framework. |
Yeonjun In; Sungchul Kim; Ryan A. Rossi; Mehrab Tanjim; Tong Yu; Ritwik Sinha; Chanyoung Park; |
| 347 | Investigating Hallucinations in Simultaneous Machine Translation: Knowledge Distillation Solution and Components Analysis Highlight: In contrast, traditional offline machine translation (OMT) models exhibit significantly fewer hallucinations. Motivated by this disparity, we propose Knowledge Distillation for SiMT (KD-SiMT), a simple yet effective method that utilizes the OMT model to mitigate hallucinations in SiMT. |
Donglei Yu; Xiaomian Kang; Yuchen Liu; Feifei Zhai; Nanchang Cheng; Yu Zhou; Chengqing Zong; |
| 348 | How LLMs React to Industrial Spatio-Temporal Data? Assessing Hallucination with A Novel Traffic Incident Benchmark Dataset Highlight: This study, which originated from a real-world industrial GenAI application, introduces a novel cross-lingual benchmark dataset comprising nearly 99,869 real traffic incident records from Vienna (2013-2023) to assess the robustness of state-of-the-art LLMs (= 9) in the spatial vs. temporal domain for traffic incident classification. |
Qiang Li; Mingkun Tan; Xun Zhao; Dan Zhang; Daoan Zhang; Shengzhao Lei; Anderson S. Chu; Lujun Li; Porawit Kamnoedboon; |
| 349 | One Unified Model for Diverse Tasks: Emotion Cause Analysis Via Self-Promote Cognitive Structure Modeling Highlight: Drawing inspiration from this theory, in this paper, we propose a unified model capable of tackling diverse emotion cause analysis tasks, which constructs the emotion cognitive structure through LLM-based in-context learning. |
Zhaoxin Yu; Xinglin Xiao; Wenji Mao; |
| 350 | Stealthy Jailbreak Attacks on Large Language Models Via Benign Data Mirroring Highlight: We propose an improved transfer attack method that guides malicious prompt construction by locally training a mirror model of the target black-box model through benign data distillation. |
Honglin Mu; Han He; Yuxin Zhou; Yunlong Feng; Yang Xu; Libo Qin; Xiaoming Shi; Zeming Liu; Xudong Han; Qi Shi; Qingfu Zhu; Wanxiang Che; |
| 351 | Culture-TRIP: Culturally-Aware Text-to-Image Generation with Iterative Prompt Refinement |
Suchae Jeong; Inseong Choi; Youngsik Yun; Jihie Kim; |
| 352 | How Can We Diagnose and Treat Bias in Large Language Models for Clinical Decision-Making? Highlight: This research investigates the evaluation and mitigation of bias in LLMs applied to complex clinical cases, focusing on gender and ethnicity biases. We introduce a novel Counterfactual Patient Variations (CPV) dataset derived from the JAMA Clinical Challenge. Using this dataset, we built a framework for bias evaluation, employing both Multiple Choice Questions (MCQs) and corresponding explanations. |
Kenza Benkirane; Jackie Kay; Maria Perez-Ortiz; |
| 353 | CPRM: A LLM-based Continual Pre-training Framework for Relevance Modeling in Commercial Search Highlight: Furthermore, structured item text remains underutilized, and there is a shortage in the supply of corresponding queries and background knowledge. We thereby propose CPRM (Continual Pre-training for Relevance Modeling), a framework designed for the continual pre-training of LLMs to address these issues. |
Kaixin Wu; Yixin Ji; Zeyuan Chen; Qiang Wang; Cunxiang Wang; Hong Liu; Baijun Ji; Xu Jia; Zhongyi Liu; Jinjie Gu; Yuan Zhou; Linjian Mo; |
| 354 | Grounding Fallacies Misrepresenting Scientific Publications in Evidence Highlight: These publications only superficially seem to support the false claim, when logical fallacies are applied. In this work, we aim to detect and to highlight such fallacies, which requires assessing the exact content of the misrepresented publications. |
Max Glockner; Yufang Hou; Preslav Nakov; Iryna Gurevych; |
| 355 | CultureInstruct: Curating Multi-Cultural Instructions at Scale Highlight: Large language models, despite their remarkable success in recent years, still exhibit severe cultural bias. Therefore, in this paper, we introduce CultureInstruct, a large-scale instruction-tuning dataset designed to reduce cultural bias in LLMs. |
Viet Thanh Pham; Zhuang Li; Lizhen Qu; Gholamreza Haffari; |
| 356 | SVD-LLM V2: Optimizing Singular Value Truncation for Large Language Model Compression Highlight: In this work, we introduce SVD-LLM V2, a novel SVD-based LLM compression method that optimizes singular value truncation in SVD compression with two key strategies. |
Xin Wang; Samiul Alam; Zhongwei Wan; Hui Shen; Mi Zhang; |
| 357 | CogLM: Tracking Cognitive Development of Large Language Models Highlight: To this end, we construct a benchmark CogLM (Cognitive Ability Evaluation for Language Model) based on PTC to assess the cognitive levels of LLMs. |
Xinglin Wang; Peiwen Yuan; Shaoxiong Feng; Yiwei Li; Boyuan Pan; Heda Wang; Yao Hu; Kan Li; |
| 358 | PICLe: Pseudo-annotations for In-Context Learning in Low-Resource Named Entity Detection Highlight: In this work, we conduct a perturbation study of in-context demonstrations for low-resource Named Entity Detection (NED). |
Sepideh Mamooler; Syrielle Montariol; Alexander Mathis; Antoine Bosselut; |
| 359 | How to Make LLMs Forget: On Reversing In-Context Knowledge Edits Highlight: Our continuous reversal tokens prove particularly effective, with minimal impact on unedited prompts. Through analysis of output distributions, attention patterns, and token rankings, we provide insights into IKE’s effects on LLMs and how reversal tokens mitigate them. |
Paul Youssef; Zhixue Zhao; Jörg Schlötterer; Christin Seifert; |
| 360 | Related Knowledge Perturbation Matters: Rethinking Multiple Pieces of Knowledge Editing in Same-Subject |
Zenghao Duan; Wenbin Duan; Zhiyi Yin; Yinghan Shen; Shaoling Jing; Jie Zhang; Huawei Shen; Xueqi Cheng; |
| 361 | What Goes Into A LM Acceptability Judgment? Rethinking The Impact of Frequency and Length Highlight: Prior work comparing LM and human acceptability judgments treats these effects uniformly across models, making a strong assumption that models require the same degree of adjustment to control for length and unigram frequency effects. We propose MORCELA, a new linking theory between LM scores and acceptability judgments where the optimal level of adjustment for these effects is estimated from data via learned parameters for length and unigram frequency. |
Lindia Tjuatja; Graham Neubig; Tal Linzen; Sophie Hao; |
| 362 | Evaluating Large Language Models with Enterprise Benchmarks Highlight: This work explores benchmarking strategies focused on LLM evaluation, with a specific emphasis on both English and Japanese. |
Bing Zhang; Mikio Takeuchi; Ryo Kawahara; Shubhi Asthana; Maruf Hossain; Guang-Jie Ren; Kate Soule; Yifan Mai; Yada Zhu; |
| 363 | What Do VLMs NOTICE? A Mechanistic Interpretability Pipeline for Gaussian-Noise-free Text-Image Corruption and Evaluation Highlight: However, the internal mechanisms of VLMs, particularly the roles of cross-attention and self-attention in multimodal integration, are not fully understood. To address this gap, we introduce NOTICE, a Gaussian-Noise-free Text-Image Corruption and Evaluation pipeline for mechanistic interpretability in VLMs. |
Michal Golovanevsky; William Rudman; Vedant Palit; Carsten Eickhoff; Ritambhara Singh; |
| 364 | EC-Tab2Text: Aspect-Based Text Generation from E-Commerce Product Tables Highlight: Large Language Models (LLMs) have demonstrated exceptional versatility across diverse domains, yet their application in e-commerce remains underexplored due to a lack of domain-specific datasets. To address this gap, we introduce eC-Tab2Text, a novel dataset designed to capture the intricacies of e-commerce, including detailed product attributes and user-specific queries. |
Luis Antonio Gutierrez Guanilo; Mir Tafseer Nayeem; Cristian Jose Lopez Del Alamo; Davood Rafiei; |
| 365 | Bridging The Gap Between Expert and Language Models: Concept-guided Chess Commentary Generation and Evaluation Highlight: We introduce Concept-guided Chess Commentary generation (CCC) for producing commentary and GPT-based Chess Commentary Evaluation (GCC-Eval) for assessing it. |
Jaechang Kim; Jinmin Goh; Inseok Hwang; Jaewoong Cho; Jungseul Ok; |
| 366 | Bottom-Up Synthesis of Knowledge-Grounded Task-Oriented Dialogues with Iteratively Self-Refined Prompts Highlight: We introduce a bottom-up conversation synthesis approach, where QA pairs are generated first and then combined into a coherent dialogue. |
Kun Qian; Maximillian Chen; Siyan Li; Arpit Sharma; Zhou Yu; |
| 367 | Granite Guardian: Comprehensive LLM Safeguarding Highlight: These challenges highlight the urgent need for robust safeguards to ensure safe and responsible AI. To address this, we introduce Granite Guardian, a suite of advanced models designed to detect and mitigate risks associated with prompts and responses, enabling seamless integration with any large language model (LLM). |
Inkit Padhi; Manish Nagireddy; Giandomenico Cornacchia; Subhajit Chaudhury; Tejaswini Pedapati; Pierre Dognin; Keerthiram Murugesan; Erik Miehling; Martín Santillán Cooper; Kieran Fraser; Giulio Zizzo; Muhammad Zaid Hameed; Mark Purcell; Michael Desmond; Qian Pan; Inge Vejsbjerg; Elizabeth M. Daly; Michael Hind; Werner Geyer; Ambrish Rawat; Kush R. Varshney; Prasanna Sattigeri; |
| 368 | MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation Highlight: Remarkably, Large Language Models (LLMs) without any visual perception capabilities achieve non-trivial performance, undermining the credibility of these evaluations. To address this issue while maintaining the efficiency of MCQ evaluations, we propose MMEVALPRO, a benchmark designed to avoid Type-I errors through a trilogy evaluation pipeline and more rigorous metrics. |
Jinsheng Huang; Liang Chen; Taian Guo; Fu Zeng; Yusheng Zhao; Bohan Wu; Ye Yuan; Haozhe Zhao; Zhihui Guo; Yichi Zhang; Jingyang Yuan; Wei Ju; Luchen Liu; Tianyu Liu; Baobao Chang; Ming Zhang; |
| 369 | MADial-Bench: Towards Real-world Evaluation of Memory-Augmented Dialogue Generation Highlight: We introduce new scoring criteria to the evaluation, including memory injection, emotion support (ES) proficiency, and intimacy, to comprehensively assess generated responses. |
Junqing He; Liang Zhu; Rui Wang; Xi Wang; Gholamreza Haffari; Jiaxing Zhang; |
| 370 | Enhancing Function-Calling Capabilities in LLMs: Strategies for Prompt Formats, Data Integration, and Multilingual Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This research delves into enhancing the function-calling capabilities of LLMs by exploring different approaches, including prompt formats for integrating function descriptions, blending function-calling and instruction-following data, introducing a novel Decision Token for conditional prompts, leveraging chain-of-thought reasoning, and overcoming multilingual challenges with a translation pipeline. |
Yi-Chang Chen; Po-Chun Hsu; Chan-Jan Hsu; Da-shan Shiu; |
| 371 | Measuring and Benchmarking Large Language Models’ Capabilities to Generate Persuasive Language Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In our analysis, we find that different ‘personas’ in LLaMA3’s system prompt change persuasive language substantially, even when only instructed to paraphrase. |
Amalie Brogaard Pauli; Isabelle Augenstein; Ira Assent; |
| 372 | Defense Against Prompt Injection Attacks Via Mixture of Encodings Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: Despite its efficacy, this method can degrade LLM performance on certain NLP tasks. To address this challenge, we propose a novel defense mechanism: mixture of encodings, which utilizes multiple character encodings, including Base64. |
Ruiyi Zhang; David Sullivan; Kyle Jackson; Pengtao Xie; Mei Chen; |
| 373 | Advancing MoE Efficiency: A Collaboration-Constrained Routing (C2R) Strategy for Better Expert Parallelism Design Related Papers Related Patents Related Grants Related Venues Related Experts View Save |
Mohan Zhang; Pingzhi Li; Jie Peng; Mufan Qiu; Tianlong Chen; |
| 374 | Fine-Grained Transfer Learning for Harmful Content Detection Through Label-Specific Soft Prompt Tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, detection tasks vary: some focus on hateful, offensive, or abusive content, which differ in the intent to harm, while others focus on identifying targets of harmful speech such as racism, sexism, etc., raising the challenge of handling nuanced class differences. To address these issues, we introduce a novel transfer learning method that leverages class-specific knowledge to enhance harmful content detection. |
Faeze Ghorbanpour; Viktor Hangya; Alexander Fraser; |
| 375 | Computational Discovery of Chiasmus in Ancient Religious Text Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this paper, we introduce the first computational approach to systematically detect chiasmus within Biblical passages. |
Hope McGovern; Hale Sirin; Tom Lippincott; |
| 376 | Characterizing The Effects of Translation on Intertextuality Using Multilingual Embedding Spaces Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: We investigate the use of multilingual embedding spaces to characterize the preservation of intertextuality, one common rhetorical device, across human and machine translation. |
Hope McGovern; Hale Sirin; Tom Lippincott; |
| 377 | Taxi1500: A Dataset for Multilingual Text Classification in 1500 Languages Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: One reason is the lack of evaluation datasets that cover a diverse range of languages, particularly those that are low-resource or endangered. To address this gap, we present a large-scale text classification dataset encompassing 1504 languages, many of which have otherwise limited or no annotated data. |
Chunlan Ma; Ayyoob Imani; Haotian Ye; Renhao Pei; Ehsaneddin Asgari; Hinrich Schuetze; |
| 378 | Decoding Speculative Decoding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this work, we perform a detailed study comprising over 350 experiments with LLaMA-65B and OPT-66B using speculative decoding and delineate the factors that affect the performance gain provided by speculative decoding. |
Minghao Yan; Saurabh Agarwal; Shivaram Venkataraman; |
| 379 | ReasVQA: Advancing VideoQA with Imperfect Reasoning Process Related Papers Related Patents Related Grants Related Venues Related Experts View Save |
Jianxin Liang; Xiaojun Meng; Huishuai Zhang; Yueqian Wang; Jiansheng Wei; Dongyan Zhao; |
| 380 | Mitigating Hallucinated Translations in Large Language Models with Hallucination-focused Preference Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: While effective, this approach introduces additional complexity in deploying extra tools in production and also increases latency. To address these limitations, we propose a method that intrinsically learns to mitigate hallucinations during the model training phase. |
Zilu Tang; Rajen Chatterjee; Sarthak Garg; |
| 381 | SPeCtrum: A Grounded Framework for Multidimensional Identity Representation in LLM-Based Agent Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: Existing methods for simulating individual identities often oversimplify human complexity, which may lead to incomplete or flattened representations. To address this, we introduce SPeCtrum, a grounded framework for constructing authentic LLM agent personas by incorporating an individual’s multidimensional self-concept. |
Keyeun Lee; Seo Hyeong Kim; Seolhee Lee; Jinsu Eun; Yena Ko; Hayeon Jeon; Esther Hehsun Kim; Seonghye Cho; Soeun Yang; Eun-mee Kim; Hajin Lim; |
| 382 | Divergent Thoughts Toward One Goal: LLM-based Multi-Agent Collaboration System for Electronic Design Automation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Any errors will lead to the instability and failure of EDA flow automation. To address these challenges, we introduce EDAid, a multi-agent collaboration system where multiple agents harboring divergent thoughts converge towards a common goal, ensuring reliable and successful EDA flow automation. |
Haoyuan Wu; Haisheng Zheng; Zhuolun He; Bei Yu; |
| 383 | MiLoRA: Harnessing Minor Singular Components for Parameter-Efficient LLM Finetuning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this paper, we propose MiLoRA, a simple yet effective LLM finetuning approach that only updates the minor singular components of the weight matrix while keeping the principal singular components frozen. |
Hanqing Wang; Yixia Li; Shuo Wang; Guanhua Chen; Yun Chen; |
| 384 | MixLLM: Dynamic Routing in Mixed Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, the challenges involve: (1) dynamic trade-offs among quality, cost, and latency; (2) enabling continual learning in deployed systems; and (3) navigating a varying (e.g., new LLM addition or old LLM removal) set of LLM candidates over time. To bridge these gaps, we develop MixLLM, a dynamic contextual-bandit-based routing system for query-LLM assignment. |
Xinyuan Wang; Yanchi Liu; Wei Cheng; Xujiang Zhao; Zhengzhang Chen; Wenchao Yu; Yanjie Fu; Haifeng Chen; |
| 385 | ChaI-TeA: A Benchmark for Evaluating Autocompletion of Interactions with LLM-based Chatbots Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: We present **ChaI-TeA**: **Cha**t **I**n**te**raction **A**utocomplete; An autocomplete evaluation framework for LLM-based chatbot interactions. |
Shani Goren; Oren Kalinsky; Tomer Stav; Yuri Rapoport; Yaron Fairstein; Ram Yazdi; Nachshon Cohen; Alexander Libov; Guy Kushilevitz; |
| 386 | Multi3Hate: Multimodal, Multilingual, and Multicultural Hate Speech Detection with Vision–Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: How well do current vision-language models (VLMs) navigate these nuances? To investigate this, we create the first multimodal and multilingual parallel hate speech dataset, annotated by a multiculturally diverse set of annotators, called Multi3Hate. |
Minh Duc Bui; Katharina Von Der Wense; Anne Lauscher; |
| 387 | MCQG-SRefine: Multiple Choice Question Generation and Evaluation with Iterative Self-Critique, Correction, and Comparison Feedback Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: However, current large language models (LLMs) like GPT-4 struggle with professional MCQG due to outdated knowledge, hallucination issues, and prompt sensitivity, resulting in unsatisfactory quality and difficulty. To address these challenges, we propose MCQG-SRefine, an LLM self-refine-based (Critique and Correction) framework for converting medical cases into high-quality USMLE-style questions. |
Zonghai Yao; Aditya Parashar; Huixue Zhou; Won Seok Jang; Feiyun Ouyang; Zhichao Yang; Hong Yu; |
| 388 | Text2Sql: Pure Fine-Tuning and Pure Knowledge Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This approach has two issues: 1) the model’s context is limited when dealing with a large number of database tables; 2) the question is often related to only a few tables, leading to excessive irrelevant information that distracts the model. To address these issues, we employed a pure fine-tuning strategy to reduce redundancy. |
Gao yu Zhu; Wei Shao; Xichou Zhu; Lei Yu; Jiafeng Guo; Xueqi Cheng; |
| 389 | SMAB: MAB Based Word Sensitivity Estimation Framework and Its Applications in Adversarial Text Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: Though effective, calculating sensitivity at scale using this framework is costly because of exponential time complexity. Therefore, we introduce a Sensitivity-based Multi-Armed Bandit framework (SMAB), which provides a scalable approach for calculating word-level local (sentence-level) and global (aggregated) sensitivities concerning an underlying text classifier for any dataset. |
Saurabh Kumar Pandey; Sachin Vashistha; Debrup Das; Somak Aditya; Monojit Choudhury; |
| 390 | Towards Inducing Long-Context Abilities in Multilingual Neural Machine Translation Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This work addresses the challenge of transitioning pre-trained NMT models from absolute Sinusoidal PEs to Relative PEs, such as RoPE and ALiBi, without compromising performance. |
Varun Gumma; Pranjal A Chitale; Kalika Bali; |
| 391 | Main Predicate and Their Arguments As Explanation Signals For Intent Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This observation enables us to hypothesize that the main predicate in the text utterances, along with the arguments of the main predicate, can serve as explanation signals. Leveraging this, we introduce a new technique to automatically augment text samples from intent classification datasets with word-level explanations. |
Sameer Pimparkhede; Pushpak Bhattacharyya; |
| 392 | Knowledge Graph-Guided Retrieval Augmented Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this paper, we propose a novel Knowledge Graph-Guided Retrieval Augmented Generation (KG2RAG) framework that utilizes knowledge graphs (KGs) to provide fact-level relationships between chunks, improving the diversity and coherence of the retrieved results. |
Xiangrong Zhu; Yuexiang Xie; Yi Liu; Yaliang Li; Wei Hu; |
| 393 | HMT: Hierarchical Memory Transformer for Efficient Long Context Language Processing Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Thus, we propose the Hierarchical Memory Transformer (HMT), a novel framework that facilitates a model’s long-context processing ability by imitating human memorization behavior. |
Zifan He; Yingqi Cao; Zongyue Qin; Neha Prakriya; Yizhou Sun; Jason Cong; |
| 394 | Understanding LLM Development Through Longitudinal Study: Insights from The Open Ko-LLM Leaderboard Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: By extending the analysis duration, we aim to provide a more comprehensive understanding of the progression in developing Korean large language models (LLMs). |
Chanjun Park; Hyeonwoo Kim; |
| 395 | Open Ko-LLM Leaderboard2: Bridging Foundational and Practical Evaluation for Korean LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Furthermore, the benchmark suite is largely composed of translated versions of their English counterparts, which may not fully capture the intricacies of the Korean language. To address these issues, we propose Open Ko-LLM Leaderboard2, an improved version of the earlier Open Ko-LLM Leaderboard. |
Hyeonwoo Kim; Dahyun Kim; Jihoo Kim; Sukyung Lee; Yungi Kim; Chanjun Park; |
| 396 | CoME: An Unlearning-based Approach to Conflict-free Model Editing Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this work, we propose Conflict-free Model Editing (CoME), a novel framework that enhances the accuracy of knowledge updates in LLMs by selectively removing outdated knowledge. |
Dahyun Jung; Jaehyung Seo; Jaewook Lee; Chanjun Park; Heuiseok Lim; |
| 397 | GameTox: A Comprehensive Dataset and Analysis for Enhanced Toxicity Detection in Online Gaming Communities Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce GameTox, a novel dataset comprising 53K game chat utterances annotated for toxicity detection through intent classification and slot filling. |
Usman Naseem; Shuvam Shiwakoti; Siddhant Bikram Shah; Surendrabikram Thapa; Qi Zhang; |
| 398 | VisCGEC: Benchmarking The Visual Chinese Grammatical Error Correction Related Papers Related Patents Related Grants Related Venues Related Experts View Save |
Xiaoman Wang; Dan Yuan; Xin Liu; Yike Zhao; Xiaoxiao Zhang; Xizhi Chen; Yunshi Lan; |
| 399 | CORRECT: Context- and Reference-Augmented Reasoning and Prompting for Fact-Checking Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: However, most fact-checking models mainly focus on the reasoning within evidence sentences, and ignore the auxiliary contexts and references. To address this problem, we propose a novel method, Context- and Reference-augmented Reasoning and Prompting. |
Delvin Ce Zhang; Dongwon Lee; |
| 400 | Improving and Assessing The Fidelity of Large Language Models Alignment to Online Communities Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper presents a robust framework for aligning LLMs with online communities via instruction-tuning and comprehensively evaluating alignment across various aspects of language, including authenticity, emotional tone, toxicity, and harm. |
Minh Duc Chu; Zihao He; Rebecca Dorn; Kristina Lerman; |
| 401 | SweEval: Do LLMs Really Swear? A Safety Benchmark for Testing Limits for Enterprise Use Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: For enterprise applications, it is crucial to mitigate reputational risks, maintain trust, and ensure compliance by effectively identifying and handling unsafe or offensive language. To address this, we introduce SweEval, a benchmark simulating real-world scenarios with variations in tone (positive or negative) and context (formal or informal). |
Hitesh Laxmichand Patel; Amit Agarwal; Arion Das; Bhargava Kumar; Srikant Panda; Priyaranjan Pattnayak; Taki Hasan Rafi; Tejaswini Kumar; Dong-Kyu Chae; |
| 402 | ScratchEval: Are GPT-4o Smarter Than My Child? Evaluating Large Multimodal Models with Visual Programming Challenges Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: However, these benchmarks are limited to specific visual programming scenarios where the logic reasoning and the multimodal understanding capacities are split apart. To fill this gap, we propose ScratchEval, a novel benchmark designed to evaluate the visual programming reasoning ability of LMMs. |
Rao Fu; Ziyang Luo; Hongzhan Lin; Zhen Ye; Jing Ma; |
| 403 | Unlearning As Multi-task Optimization: A Normalized Gradient Difference Approach with An Adaptive Learning Rate Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we examine machine unlearning from an optimization perspective, framing it as a regularized multi-task optimization problem, where one task optimizes a forgetting objective and another optimizes the model performance. |
Xiaomeng Jin; Zhiqi Bu; Bhanukiran Vinzamuri; Anil Ramakrishna; Kai-Wei Chang; Volkan Cevher; Mingyi Hong; |
| 404 | Octopus: On-device Language Model for Function Calling of Software APIs Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This study presents a framework to train a series of on-device LLMs optimized for invoking software APIs. |
Wei Chen; Zhiyuan Li; Mingyuan Ma; |
| 405 | VividMed: Vision Language Model with Versatile Visual Grounding for Medicine Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: The lack of medical data further compounds these obstacles. To address these challenges, we present VividMed, a vision language model with versatile visual grounding for medicine. |
Lingxiao Luo; Bingda Tang; Xuanzhong Chen; Rong Han; Ting Chen; |
| 406 | Exploiting Edited Large Language Models As General Scientific Optimizers Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, due to LLMs’ **high sensitivity to prompts** and **tendency to get lost in lengthy prompts**, these methods struggle to effectively utilize the observational feedback from each optimization step, which severely hinders their application in real-world scenarios. To address these challenges, we propose a conceptually simple and general bi-level optimization method, namely **G**eneral **S**cientific **O**ptimizers (GSO). |
Qitan Lv; Tianyu Liu; Hong Wang; |
| 407 | Watching The AI Watchdogs: A Fairness and Robustness Analysis of AI Safety Moderation Classifiers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this work, we thus examine the fairness and robustness of four widely-used, closed-source ASM classifiers: OpenAI Moderation API, Perspective API, Google Cloud Natural Language (GCNL) API, and Clarifai API. |
Akshit Achara; Anshuman Chhabra; |
| 408 | MedEthicEval: Evaluating Large Language Models Based on Chinese Medical Ethics Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper introduces MedEthicEval, a novel benchmark designed to systematically evaluate LLMs in the domain of medical ethics. |
Haoan Jin; Jiacheng Shi; Hanhui Xu; Kenny Q. Zhu; Mengyue Wu; |
| 409 | SafetyQuizzer: Timely and Dynamic Evaluation on The Safety of LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This hinders the effective application of these benchmarks in continuous evaluation tasks. To address these limitations, we propose SafetyQuizzer, a question-generation framework designed to evaluate the safety of LLMs more sustainably in the Chinese context. |
Zhichao Shi; Shaoling Jing; Yi Cheng; Hao Zhang; Yuanzhuo Wang; Jie Zhang; Huawei Shen; Xueqi Cheng; |
| 410 | WaterPool: A Language Model Watermark Mitigating Trade-Offs Among Imperceptibility, Efficacy and Robustness Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To this end, we introduce WaterPool, a simple yet effective key module that preserves a complete key sampling space for imperceptibility while utilizing semantics-based search to improve the key restoration process. |
Baizhou Huang; Xiaojun Wan; |
| 411 | B4: A Black-Box Scrubbing Attack on LLM Watermarks Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Targeting at a more realistic black-box threat model with fewer assumptions, we here propose B4, a black-box scrubbing attack on watermarks. |
Baizhou Huang; Xiao Pu; Xiaojun Wan; |
| 412 | MAD Speech: Measures of Acoustic Diversity of Speech Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, the extent to which generated speech is acoustically diverse remains unclear due to a lack of appropriate metrics. We address this gap by developing lightweight metrics of acoustic diversity, which we collectively refer to as MAD Speech. |
Matthieu Futeral; Andrea Agostinelli; Marco Tagliasacchi; Neil Zeghidour; Eugene Kharitonov; |
| 413 | SymBa: Symbolic Backward Chaining for Structured Natural Language Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To this end, we propose a novel backward chaining system, SymBa (Symbolic Backward Chaining), which integrates a symbolic solver and an LLM. |
Jinu Lee; Wonseok Hwang; |
| 414 | StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, these models often face issues such as slow inference speeds, reliance on complex pre-trained neural codec representations, and difficulties in achieving naturalness and high similarity to reference speakers. To address these challenges, this work introduces StyleTTS-ZS, an efficient zero-shot TTS model that leverages distilled time-varying style diffusion to capture diverse speaker identities and prosodies. |
Yinghao Aaron Li; Xilin Jiang; Cong Han; Nima Mesgarani; |
| 415 | InfoPO: On Mutual Information Maximization for Large Language Model Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Despite these benefits, these methods rely on explicit assumptions about the Bradley-Terry (BT) model, which makes them prone to overfitting and results in suboptimal performance, particularly on reasoning-heavy tasks. To address these challenges, we propose a principled preference fine-tuning algorithm called InfoPO, which effectively and efficiently aligns large language models using preference data. |
Teng Xiao; Zhen Ge; Sujay Sanghavi; Tian Wang; Julian Katz-Samuels; Marc Versage; Qingjun Cui; Trishul Chilimbi; |
| 416 | Explore The Reasoning Capability of LLMs in The Chess Testbed Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Based on the observation that expert chess players employ a dual approach combining long-term strategic play with short-term tactical play along with language explanation, we propose improving the reasoning capability of large language models in chess by integrating annotated strategies and tactics. |
Shu Wang; Lei Ji; Renxi Wang; Wenxiao Zhao; Haokun Liu; Yifan Hou; Ying Nian Wu; |
| 417 | Step-by-Step Fact Verification System for Medical Claims with Explainable Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this work, we apply an iterative FV system on three medical fact-checking datasets and evaluate it with multiple settings, including different LLMs, external web search, and structured reasoning using logic predicates. |
Juraj Vladika; Ivana Hacajova; Florian Matthes; |
| 418 | Evaluating Small Language Models for News Summarization: Implications and Factors Influencing Performance Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In contrast, small language models (SLMs) present a more accessible alternative, capable of real-time summarization on edge devices. |
Borui Xu; Yao Chen; Zeyi Wen; Weiguo Liu; Bingsheng He; |
| 419 | COVE: COntext and VEracity Prediction for Out-of-context Images Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this work, we introduce COVE, a new method that predicts first the true COntext of the image and then uses it to predict the VEracity of the caption. |
Jonathan Tonglet; Gabriel Thiem; Iryna Gurevych; |
| 420 | Towards Robust Knowledge Representations in Multilingual LLMs for Equivalence and Inheritance Based Consistent Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we focus on evaluating whether LLMs have the requisite representations to reason using two foundational relationships: “equivalence” and “inheritance”. |
Gaurav Arora; Srujana Merugu; Shreya Jain; Vaibhav Saxena; |
| 421 | Causally Modeling The Linguistic and Social Factors That Predict Email Response Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this study, we aim to model the intents, expectations, and responsiveness in email exchanges. |
Yinuo Xu; Hong Chen; Sushrita Rakshit; Aparna Ananthasubramaniam; Omkar Yadav; Mingqian Zheng; Michael Jiang; Lechen Zhang; Bowen Yi; Kenan Alkiek; Abraham Israeli; Bangzhao Shu; Hua Shen; Jiaxin Pei; Haotian Zhang; Miriam Schirmer; David Jurgens; |
| 422 | Evaluating Defeasible Reasoning in LLMs with DEFREASING Related Papers Related Patents Related Grants Related Venues Related Experts View Save |
Emily Allaway; Kathleen McKeown; |
| 423 | Meta-Cultural Competence: Climbing The Right Hill of Cultural Awareness Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, “culture” is a complex, multifaceted topic, and its awareness, representation, and modeling in LLMs and LLM-based applications can be defined and measured in numerous ways. In this position paper, we ask what it means for an LLM to possess “cultural awareness”, and, through a thought experiment extending the Octopus test proposed by Bender and Koller (2020), we argue that it is not cultural awareness or knowledge but rather meta-cultural competence that is required of an LLM and LLM-based AI system to be useful across various, including completely unseen, cultures. |
Sougata Saha; Saurabh Kumar Pandey; Monojit Choudhury; |
| 424 | Reading Between The Lines: Can LLMs Identify Cross-Cultural Communication Gaps? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this paper, we investigate the extent and patterns of gaps in understandability of book reviews due to the presence of culturally-specific items and elements that might be alien to users from another culture. |
Sougata Saha; Saurabh Kumar Pandey; Harshit Gupta; Monojit Choudhury; |
| 425 | Language Models Can Infer Action Semantics for Symbolic Planners from Environment Feedback Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose Predicting Semantics of Actions with Language Models (PSALM), which automatically learns action semantics by leveraging the strengths of both symbolic planners and LLMs. |
Wang Bill Zhu; Ishika Singh; Robin Jia; Jesse Thomason; |
| 426 | Aligning Sentence Simplification with ESL Learner’s Proficiency for Language Acquisition Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This study goes a step further and aims to facilitate ESL learners’ language acquisition by simplification. |
Guanlin Li; Yuki Arase; Noel Crespi; |
| 427 | FLEX: Expert-level False-Less EXecution Metric for Text-to-SQL Benchmark Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Thus, this paper introduces **FLEX (False-Less EXecution)**, a novel approach to evaluating text-to-SQL systems using large language models (LLMs) to emulate human expert-level evaluation of SQL queries. |
Heegyu Kim; Jeon Taeyang; SeungHwan Choi; Seungtaek Choi; Hyunsouk Cho; |
| 428 | AID: Adaptive Integration of Detectors for Safe AI with Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, defining safety is complex, given that entities across domains may interpret it through varied lenses and develop safety detectors—models trained to identify specific unsafe content based on predefined criteria. To address this complexity, we introduce the approach of Adaptive Integration of Detectors (AID) to orchestrate the strengths of multiple pretrained detectors to ensure comprehensive effectiveness in diverse scenarios. |
Xinran Wang; Enmao Diao; Qi Le; Jie Ding; Ali Anwar; |
| 429 | Rethinking The Role of LLMs for Document-level Relation Extraction: A Refiner with Task Distribution and Probability Fusion Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: We also reveal a noteworthy challenge: small language models (SLMs) for DocRE tend to classify existing relations as “no relation” (NA), while LLMs tend to predict existing relations for all entity pairs. To address these challenges, we propose a novel method that utilizes LLMs as a refiner, employing task distribution and probability fusion. |
Fu Zhang; Xinlong Jin; Jingwei Cheng; Hongsen Yu; Huangming Xu; |
| 430 | Wav2Prompt: End-to-End Speech Prompt Learning and Task-based Fine-tuning for Text-based LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Wav2Prompt uses a straightforward training process with only the same data used to train an automatic speech recognition (ASR) model. |
Keqi Deng; Guangzhi Sun; Phil Woodland; |
| 431 | Automatic Evaluation of Healthcare LLMs Beyond Question-Answering Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This work is focused on the healthcare domain, where both factuality and discourse matter greatly. It introduces a comprehensive, multi-axis suite for healthcare LLM evaluation, exploring correlations between open and closed benchmarks and metrics. |
Anna Arias-Duart; Pablo Agustin Martin-Torres; Daniel Hinjos; Pablo Bernabeu-Perez; Lucia Urcelay Ganzabal; Marta Gonzalez Mallo; Ashwin Kumar Gururajan; Enrique Lopez-Cuena; Sergio Alvarez-Napagao; Dario Garcia-Gasulla; |
| 432 | Private Synthetic Text Generation with Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: Here the evidence is missing, yet the promises from private image generation look strong. In this paper, we address this open question through extensive experiments. |
Sebastian Ochs; Ivan Habernal; |
| 433 | A Systematic Examination of Preference Learning Through The Lens of Instruction-Following Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work we systematically investigate how specific attributes of preference datasets affect the alignment and downstream performance of LLMs in instruction-following tasks. |
Joongwon Kim; Anirudh Goyal; Aston Zhang; Bo Xiong; Rui Hou; Melanie Kambadur; Dhruv Mahajan; Hannaneh Hajishirzi; Liang Tan; |
| 434 | The Impact of Visual Information in Chinese Characters: Evaluating Large Models’ Ability to Recognize and Utilize Radicals Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this study, we establish a benchmark to evaluate LLMs’ and VLMs’ understanding of visual elements in Chinese characters, including radicals, composition structures, strokes, and stroke counts. |
Xiaofeng Wu; Karl Stratos; Wei Xu; |
| 435 | DPL: Diverse Preference Learning Without A Reference Model Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we propose Diverse Preference Learning (DPL), a reference model-free method that simultaneously learns a baseline desirability in LLM responses while being robust to the diversity of preference annotations. |
Abhijnan Nath; Andrey Volozin; Saumajit Saha; Albert Aristotle Nanda; Galina Grunin; Rahul Bhotika; Nikhil Krishnaswamy; |
| 436 | PLEX: Adaptive Parameter-Efficient Fine-Tuning for Code LLMs Using Lottery-Tickets Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose PLEX, a lottery-ticket based parameter-efficient fine-tuning (PEFT) method that adapts LLMs to both well-supported and underrepresented PLs. |
Jaeseong Lee; Hojae Han; Jongyoon Kim; Seung-won Hwang; Naun Kang; KyungJun An; Sungho Jang; |
| 437 | Mitigating Biases of Large Language Models in Stance Detection with Counterfactual Augmented Calibration Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: Furthermore, the results demonstrate a strong negative correlation between stance bias and stance detection performance, underscoring the importance of mitigating bias to enhance the utility of LLMs in stance detection. Therefore, in this paper, we propose a Counterfactual Augmented Calibration Network (FACTUAL), in which a novel calibration network is devised to calibrate potential bias in the stance prediction of LLMs. |
Ang Li; Jingqian Zhao; Bin Liang; Lin Gui; Hui Wang; Xi Zeng; Xingwei Liang; Kam-Fai Wong; Ruifeng Xu; |
| 438 | Multi-Conditional Ranking with Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this paper, we define and explore the task of multi-conditional ranking by introducing MCRank, a benchmark tailored for assessing multi-conditional ranking across various item types and conditions. |
Pouya Pezeshkpour; Estevam Hruschka; |
| 439 | RAD-Bench: Evaluating Large Language Models’ Capabilities in Retrieval Augmented Dialogues Related Papers Related Patents Related Grants Related Venues Related Experts View Save |
Tzu-Lin Kuo; FengTing Liao; Mu-Wei Hsieh; Fu-Chieh Chang; Po-Chun Hsu; Da-shan Shiu; |
| 440 | Self-Harmonized Chain of Thought Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: We propose ECHO (Self-Harmonized Chain of Thought), a novel method that unifies diverse solution paths into a consistent and effective reasoning pattern. |
Ziqi Jin; Wei Lu; |
| 441 | A Diverse and Effective Retrieval-Based Debt Collection System with Expert Knowledge Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper presents a debt collection system based on real debtor-collector data from a major commercial bank. |
Jiaming Luo; Weiyi Luo; Guoqing Sun; Mengchen Zhu; Haifeng Tang; Kenny Q. Zhu; Mengyue Wu; |
| 442 | Towards Federated Low-Rank Adaptation of Language Models with Rank Heterogeneity Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Our analysis attributes this instability to the conventional zero-padding aggregation strategy, which dilutes information from high-rank clients during model aggregation. To address this issue, we propose a replication-based padding strategy that better retains valuable information from clients with high-quality data. |
Yuji Byun; Jaeho Lee; |
| 443 | ALinFiK: Learning to Approximate Linearized Future Influence Kernel for Scalable Third-Party LLM Data Valuation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we aim to offer a third-party data valuation approach that benefits both data providers and model developers. |
Yanzhou Pan; Huawei Lin; Yide Ran; Jiamin Chen; Xiaodong Yu; Weijie Zhao; Denghui Zhang; Zhaozhuo Xu; |
| 444 | AgentMove: A Large Language Model Based Agentic Framework for Zero-shot Next Location Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we introduce AgentMove, a systematic agentic prediction framework to achieve generalized next location prediction. |
Jie Feng; Yuwei Du; Jie Zhao; Yong Li; |
| 445 | CONSTRUCTA: Automating Commercial Construction Schedules in Fabrication Facilities with Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We propose CONSTRUCTA, a novel framework leveraging LLMs to optimize construction schedules in complex projects like semiconductor fabrication. |
Yifan Zhang; Xue Yang; |
| 446 | Alligators All Around: Mitigating Lexical Confusion in Low-resource Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: So, for example, "lion" might be translated as "alligator", and "orange" might be rendered as "purple." We propose a recall-based metric for measuring this problem and show that the problem exists in 122 low-resource languages. |
Elizabeth Nielsen; Isaac Rayburn Caswell; Jiaming Luo; Colin Cherry; |
| 447 | WaveFM: A High-Fidelity and Efficient Vocoder Based on Flow Matching Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this work, we present WaveFM, a reparameterized flow matching model for mel-spectrogram conditioned speech synthesis, designed to enhance both sample quality and generation speed for diffusion vocoders. |
Tianze Luo; Xingchen Miao; Wenbo Duan; |
| 448 | Elevating Legal LLM Responses: Harnessing Trainable Logical Structures and Semantic Knowledge with Legal Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this paper, we propose the Logical-Semantic Integration Model (LSIM), a novel supervised framework that bridges semantic and logical coherence. |
Rujing Yao; Yang Wu; Chenghao Wang; Jingwei Xiong; Fang Wang; Xiaozhong Liu; |
| 449 | CAST: Corpus-Aware Self-similarity Enhanced Topic Modelling Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In parallel, it is found that functional words are frequently selected over topical words. To address these limitations, we introduce **CAST**: **C**orpus-**A**ware **S**elf-similarity Enhanced **T**opic modelling, a novel topic modelling method that builds upon candidate centroid word embeddings contextualized on the dataset, and a novel self-similarity-based method to filter out less meaningful tokens. |
Yanan Ma; Chenghao Xiao; Chenhan Yuan; Sabine N Van Der Veer; Lamiece Hassan; Chenghua Lin; Goran Nenadic; |
| 450 | Navigating The Cultural Kaleidoscope: A Hitchhiker’s Guide to Sensitivity in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present two key contributions: (1) A cultural harm test dataset, created to assess model outputs across different cultural contexts through scenarios that expose potential cultural insensitivities, and (2) A culturally aligned preference dataset, aimed at restoring cultural sensitivity through fine-tuning based on feedback from diverse annotators. |
Somnath Banerjee; Sayan Layek; Hari Shrawgi; Rajarshi Mandal; Avik Halder; Shanu Kumar; Sagnik Basu; Parag Agrawal; Rima Hazra; Animesh Mukherjee; |
| 451 | Breaking Boundaries: Investigating The Effects of Model Editing on Cross-linguistic Performance Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Through rigorous testing across eight languages spanning high-resource (English, German, French, Italian, Spanish) and low-resource (Hindi, Tamil, Kannada) settings, we reveal systemic failures in preserving multilingual reliability and adaptability. |
Somnath Banerjee; Avik Halder; Rajarshi Mandal; Sayan Layek; Ian Soboroff; Rima Hazra; Animesh Mukherjee; |
| 452 | Repetition Neurons: How Do Language Models Produce Repetitions? Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper introduces repetition neurons, which can be regarded as “skill neurons” responsible for the repetition problem in text generation tasks. |
Tatsuya Hiraoka; Kentaro Inui; |
| 453 | Temporal-Aware Soft Prompt Tuning for Automatic Text Dating Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper presents Temporal-aware Soft Prompt Tuning (TASPT), a novel approach for automatic text dating. |
Hai Wang; Yuzhi Liang; Han Ren; |
| 454 | ManaTTS Persian: A Recipe for Creating TTS Datasets for Lower Resource Languages Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this study, we introduce ManaTTS, the most extensive publicly accessible single-speaker Persian corpus, and a comprehensive framework for collecting transcribed speech datasets for the Persian language. |
Mahta Fetrat Qharabagh; Zahra Dehghanian; Hamid R. Rabiee; |
| 455 | Decomposition Dilemmas: Does Claim Decomposition Boost or Burden Fact-Checking Performance? Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To date, no comprehensive analysis has been conducted to understand this variability. To address this gap, we present an in-depth analysis that explicitly examines the impact of decomposition on downstream verification performance. |
Qisheng Hu; Quanyu Long; Wenya Wang; |
| 456 | QSpell 250K: A Large-Scale, Practical Dataset for Chinese Search Query Spell Correction Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Despite the availability of comprehensive datasets like Microsoft Speller and Webis, their monolingual nature and limited scope pose significant challenges in evaluating modern pre-trained language models such as BERT and GPT. To address this, we introduce QSpell 250K, a large-scale benchmark specifically developed for Chinese Query Spelling Correction. |
Dezhi Ye; Haomei Jia; Junwei Hu; Tian Bowen; Jie Liu; Haijin Liang; Jin Ma; Wenmin Wang; |
| 457 | Graph Neural Network Enhanced Retrieval for Question Answering of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we propose a novel retrieval method, called GNN-Ret, which leverages graph neural networks (GNNs) to enhance retrieval by exploiting the relatedness between passages. |
Zijian Li; Qingyan Guo; Jiawei Shao; Lei Song; Jiang Bian; Jun Zhang; Rui Wang; |
| 458 | Coverage-based Fairness in Multi-document Summarization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this work, we propose a new summary-level fairness measure, Equal Coverage, which is based on coverage of documents with different social attribute values and considers the redundancy within documents. |
Haoyuan Li; Yusen Zhang; Rui Zhang; Snigdha Chaturvedi; |
| 459 | Analyzing (In)Abilities of SAEs Via Formal Languages Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: While the efficacy and pitfalls of such methods are well-studied in vision, there is a lack of corresponding results, both qualitative and quantitative, for the text domain. We aim to address this gap by training sparse autoencoders (SAEs) on a synthetic testbed of formal languages. |
Abhinav Menon; Manish Shrivastava; David Krueger; Ekdeep Singh Lubana; |
| 460 | Fine-Tuning Large Language Models with Sequential Instructions Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We find that existing instruction-tuned models usually struggle to adhere to a query with multiple intentions, which impairs their performance when the completion of several tasks is demanded by a single command. |
Hanxu Hu; Simon Yu; Pinzhen Chen; Edoardo Ponti; |
| 461 | Learning to Solve Domain-Specific Calculation Problems with Knowledge-Intensive Programs Generator Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we investigate into knowledge-intensive calculation problems. |
Chengyuan Liu; Shihang Wang; Lizhi Qing; Jun Lin; Ji Zhang; Fei Wu; Kun Kuang; |
| 462 | STEP: Staged Parameter-Efficient Pre-training for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce STaged parameter-Efficient Pre-training (STEP), which integrates parameter-efficient tuning techniques with model growth. |
Kazuki Yano; Takumi Ito; Jun Suzuki; |
| 463 | Exploring The Cost-Effectiveness of Perspective Taking in Crowdsourcing Subjective Assessment: A Case Study of Toxicity Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, using toxicity evaluation as an example, we explore the feasibility of using perspective taking—that is, asking annotators to take the point of views of a certain subgroup and estimate opinions within that subgroup—as a way to achieve this objective cost-efficiently. |
Xiaoni Duan; Zhuoyan Li; Chien-Ju Ho; Ming Yin; |
| 464 | SCIURus: Shared Circuits for Interpretable Uncertainty Representations in Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We investigate the mechanistic sources of uncertainty in large language models (LLMs), an area with important implications for language model reliability and trustworthiness. |
Carter Teplica; Yixin Liu; Arman Cohan; Tim G. J. Rudner; |
| 465 | TurtleBench: A Visual Programming Benchmark in Turtle Geometry Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: We introduce TurtleBench, a benchmark designed to evaluate LMMs’ capacity to interpret geometric patterns—given visual examples, textual instructions, or both—and generate precise code outputs. |
Sina Rismanchian; Yasaman Razeghi; Sameer Singh; Shayan Doroudi; |
| 466 | Self-calibration for Language Model Quantization and Pruning Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we propose self-calibration as a solution. |
Miles Williams; George Chrysostomou; Nikolaos Aletras; |
| 467 | DTELS: Towards Dynamic Granularity of Timeline Summarization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save |
Chenlong Zhang; Tong Zhou; Pengfei Cao; Zhuoran Jin; Yubo Chen; Kang Liu; Jun Zhao; |
| 468 | CRScore: Grounding Automated Evaluation of Code Review Comments in Code Claims and Smells Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Thus, we develop CRScore — a reference-free metric to measure dimensions of review quality like conciseness, comprehensiveness, and relevance. |
Atharva Naik; Marcus Alenius; Daniel Fried; Carolyn Rose; |
| 469 | Context-Efficient Retrieval with Factual Decomposition Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Here we demonstrate that pre-processing the external corpus into semi-structured “atomic facts” makes retrieval more efficient. |
Yanhong Li; David Yunis; David McAllester; Jiawei Zhou; |
| 470 | Palette of Language Models: A Solver for Controlled Text Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: A common approach is to linearly combine single-attribute models, but this strategy often overlooks attribute overlaps and can lead to conflicts. Therefore, we propose a novel combination strategy inspired by the Law of Total Probability and Conditional Mutual Information Minimization on generative language models. |
Zhe Yang; Yi Huang; Yaqin Chen; Xiaoting Wu; Junlan Feng; Chao Deng; |
| 471 | Exploring Large Language Models for Effective Rumor Detection on Social Media Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we explore using Large Language Models (LLMs) for rumor detection on social media. |
Yirong Zeng; Xiao Ding; Bibo Cai; Ting Liu; Bing Qin; |
| 472 | LLaSA: Large Language and Structured Data Assistant Related Papers Related Patents Related Grants Related Venues Related Experts View Save |
Yao Xu; Shizhu He; Jiabei Chen; Xiangrong Zeng; Bingning Wang; Guang Liu; Jun Zhao; Kang Liu; |
| 473 | Developing A Reliable, Fast, General-Purpose Hallucination Detection and Mitigation Service Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we introduce a reliable and high-speed production system aimed at detecting and rectifying the hallucination issue within LLMs. |
Song Wang; Xun Wang; Jie Mei; Yujia Xie; Si-Qing Chen; Wayne Xiong; |
| 474 | CAVE: Controllable Authorship Verification Explanations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: Current offline AV models, however, have lower downstream utility due to limited accuracy (e.g., traditional stylometry AV systems) and lack of accessible post-hoc explanations. In this work, we address the above challenges by developing a trained, offline model CAVE (Controllable Authorship Verification Explanations). |
Sahana Ramnath; Kartik Pandey; Elizabeth Boschee; Xiang Ren; |
| 475 | Generating Long-form Story Using Dynamic Hierarchical Outlining with Memory-Enhancement Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: However, existing methods, including LLMs, rely on rigid outlines or lack macro-level planning, making it difficult to achieve both contextual consistency and coherent plot development in long-form story generation. To address these issues, we propose DOME, a Dynamic Hierarchical Outlining with Memory-Enhancement method for generating long-form stories with coherent content and plot. |
Qianyue Wang; Jinwu Hu; Zhengping Li; Yufeng Wang; Daiyuan Li; Yu Hu; Mingkui Tan; |
| 476 | Style Transfer with Multi-iteration Preference Optimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: Numerous recent techniques for text style transfer characterize their approaches as variants of reinforcement learning and preference optimization. In this work, we consider the relationship between these approaches and a class of optimization approaches developed primarily for (non-neural) statistical machine translation, formerly known as ‘tuning’. |
Shuai Liu; Jonathan May; |
| 477 | Legal Judgment Prediction Based on Knowledge-enhanced Multi-Task and Multi-Label Text Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we address the challenge of predicting relevant law articles and charges within the framework of legal judgment prediction, treating it as a multi-task and multi-label text classification problem. |
Ang Li; Yiquan Wu; Ming Cai; Adam Jatowt; Xiang Zhou; Weiming Lu; Changlong Sun; Fei Wu; Kun Kuang; |
| 478 | SUNAR: Semantic Uncertainty Based Neighborhood Aware Retrieval for Complex QA Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we introduce SUNAR, a novel approach that leverages LLMs to guide a Neighborhood Aware Retrieval process. |
Venktesh V; Mandeep Rathee; Avishek Anand; |
| 479 | Script-Agnosticism and Its Impact on Language Identification for Dravidian Languages Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Moreover, languages with different writing systems do not share significant lexical, semantic, and syntactic properties in neural representation spaces, which is a disadvantage for closely related languages and low-resource languages, especially those from the Indian Subcontinent. To counter this, we propose learning script-agnostic representations using several different experimental strategies (upscaling, flattening, and script mixing) focusing on four major Dravidian languages (Tamil, Telugu, Kannada, and Malayalam). |
Milind Agarwal; Joshua Otten; Antonios Anastasopoulos; |
| 480 | ConQRet: A New Benchmark for Fine-Grained Automatic Evaluation of Retrieval Augmented Computational Argumentation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: To validate the proposed techniques, we introduce ConQRet, a new benchmark featuring long and complex human-authored arguments on debated topics, grounded in real-world websites, allowing an exhaustive evaluation across retrieval effectiveness, argument quality, and groundedness. |
Kaustubh Dhole; Kai Shu; Eugene Agichtein; |
| 481 | Knowledge Graph Guided Evaluation of Abstention Techniques Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we focus on evaluating the underlying techniques that cause models to abstain. |
Kinshuk Vasisht; Navreet Kaur; Danish Pruthi; |
| 482 | Instantly Learning Preference Alignment Via In-context DPO Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we propose a novel and effective approach for HPA in a tuning-free way, named In-Context Direct Preference Optimization (ICDPO). |
Feifan Song; Yuxuan Fan; Xin Zhang; Peiyi Wang; Houfeng Wang; |
| 483 | ALERT: An LLM-powered Benchmark for Automatic Evaluation of Recommendation Explanations Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We present ALERT, a model-agnostic recommendation explanation evaluation benchmark. |
Yichuan Li; Xinyang Zhang; Chenwei Zhang; Mao Li; Tianyi Liu; Pei Chen; Yifan Gao; Kyumin Lee; Kaize Ding; Zhengyang Wang; Zhihan Zhang; Jingbo Shang; Xian Li; Trishul Chilimbi; |
| 484 | PORT: Preference Optimization on Reasoning Traces Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This paper proposes using preference optimization methods on Chain-of-Thought steps in order to improve the mathematical reasoning performances of language models. |
Salem Lahlou; Abdalgader Abubaker; Hakim Hacid; |
| 485 | MAPWise: Evaluating Vision-Language Models for Advanced Map Queries Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This study investigates the efficacy of VLMs in answering questions based on choropleth maps, which are widely used for data analysis and representation. To facilitate and encourage research in this area, we introduce a novel map-based question-answering benchmark, consisting of maps from three geographical regions (United States, India, China), each containing around 1000 questions. |
Srija Mukhopadhyay; Abhishek Rajgaria; Prerana Khatiwada; Manish Shrivastava; Dan Roth; Vivek Gupta; |
| 486 | Enhancing Language Model Hypernetworks with Restart: A Study on Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: However, a comprehensive investigation into optimization strategies for hypernetworks remains absent. To address this gap, we analyze the loss landscape of hypernetworks and propose that restart optimization strategies can improve their performance for language models. |
Yihan Zhang; Jie Fu; Rongrong Ji; Jie Chen; |
| 487 | Token-based Decision Criteria Are Suboptimal in In-context Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: However, we argue that such token-based classification criteria lead to suboptimal decision boundaries, despite delicate calibrations applied through translation and constrained rotation. To address this problem, we propose Hidden Calibration, which renounces token probabilities and uses the nearest centroid classifier on the LM’s last hidden states. |
Hakaze Cho; Yoshihiro Sakai; Mariko Kato; Kenshiro Tanaka; Akira Ishii; Naoya Inoue; |
| 488 | Beyond The Next Token: Towards Prompt-Robust Zero-Shot Classification Via Efficient Multi-Token Prediction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save |
Junlang Qian; Zixiao Zhu; Hanzhang Zhou; Zijian Feng; Zepeng Zhai; Kezhi Mao; |
| 489 | Evaluating Contextualized Representations of (Spanish) Ambiguous Words: A New Lexical Resource and Empirical Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We develop a novel dataset of minimal-pair sentences evoking the same or different sense for a target ambiguous noun. |
Pamela D Riviere; Anne L. Beatty-Martínez; Sean Trott; |
| 490 | Label Drop for Multi-Aspect Relation Modeling in Universal Information Extraction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: While multiple-target instruction UIE allows for the extraction of multiple relations simultaneously, the inclusion of irrelevant relations introduces decision complexity and impacts extraction accuracy. Therefore, for multi-relation extraction, we propose LDNet, which incorporates multi-aspect relation modeling and a label drop mechanism. |
Lu Yang; Jiajia Li; En Ci; Lefei Zhang; Zuchao Li; Ping Wang; |
| 491 | See-Saw Modality Balance: See Gradient, and Sew Impaired Vision-Language Balance to Mitigate Dominant Modality Bias Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this study, we analyze model behavior under dominant modality bias and theoretically show that unaligned gradients or differences in gradient magnitudes prevent balanced convergence of the loss. |
Junehyoung Kwon; MiHyeon Kim; Eunju Lee; Juhwan Choi; YoungBin Kim; |
| 492 | Making Language Models Robust Against Negation Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this work, we propose a self-supervised method to make language models more robust against negation. |
MohammadHossein Rezaei; Eduardo Blanco; |
| 493 | Beyond Benchmarks: Building A Richer Cross-Document Event Coreference Dataset with Decontextualization Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: We introduce a new approach leveraging large language models (LLMs) to decontextualize event mentions, by simplifying the document-level annotation task to sentence pairs with enriched context, enabling the creation of Richer EventCorefBank (RECB), a denser and more expressive dataset annotated at faster speed. |
Jin Zhao; Jingxuan Tu; Bingyang Ye; Xinrui Hu; Nianwen Xue; James Pustejovsky; |
| 494 | Efficient and Effective Prompt Tuning Via Prompt Decomposition and Compressed Outer Product Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Achieving high efficiency and performance remains an ongoing challenge. To address these issues, we propose a novel Low-parameters Prompt Tuning (LAMP) method, which leverages prompt decomposition and compressed outer product. |
Pengxiang Lan; Haoyu Xu; Enneng Yang; Yuliang Liang; Guibing Guo; Jianzhe Zhao; Xingwei Wang; |
| 495 | Sociodemographic Prompting Is Not Yet An Effective Approach for Simulating Subjective Judgments with LLMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Save Highlight: In this study, leveraging the POPQUORN dataset, we evaluate nine popular LLMs on their ability to understand demographic differences in two subjective judgment tasks: politeness and offensiveness. |
Huaman Sun; Jiaxin Pei; Minje Choi; David Jurgens; |
| 496 | Aggregation Artifacts in Subjective Tasks Collapse Large Language Models’ Posteriors Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: This limitation is particularly evident in complex subjective domains such as emotion and morality, where priors significantly influence posterior predictions. In this work, we examine whether this is the result of the aggregation used in corresponding datasets, where trying to combine low-agreement, disparate annotations might lead to annotation artifacts that create detrimental noise in the prompt. |
Georgios Chochlakis; Alexandros Potamianos; Kristina Lerman; Shrikanth Narayanan; |
| 497 | Has This Fact Been Edited? Detecting Knowledge Edits in Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Knowing whether a generated output is based on edited knowledge or first-hand knowledge from pre-training can increase users’ trust in generative models and provide more transparency. Driven by this, we propose a novel task: detecting knowledge edits in language models. |
Paul Youssef; Zhixue Zhao; Christin Seifert; Jörg Schlötterer; |
| 498 | Pay More Attention to Images: Numerous Images-Oriented Multimodal Summarization Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Considering that most existing metrics evaluate summaries from a unimodal perspective, we propose a new Multimodal Information evaluation (M-info) method, measuring the differences between the generated summary and the multimodal input. |
Min Xiao; Junnan Zhu; Feifei Zhai; Chengqing Zong; Yu Zhou; |
| 499 | KS-Lottery: Finding Certified Lottery Tickets for Multilingual Transfer in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: In this paper, we propose KS-Lottery, a method to identify a small subset of LLM parameters highly effective in multilingual fine-tuning. |
Fei Yuan; Chang Ma; Shuai Yuan; Qiushi Sun; Lei Li; |
| 500 | Can Post-Training Quantization Benefit from An Additional QLoRA Integration? Related Papers Related Patents Related Grants Related Venues Related Experts View Save Highlight: Model compression techniques such as quantization are often leveraged to alleviate resource demand, but they may have a negative impact on the generation quality. In this study, we explore the integration of 4-bit Post-training Quantization (PTQ) with QLoRA to address these issues. |
Xiliang Zhu; Elena Khasanova; Cheng Chen; |
This table only includes 500 papers selected by our daily digest algorithm. To continue with the full list (~800 papers), please visit Paper Digest: NAACL-2025 (Full List).