MME-Finance: A Multimodal Finance Benchmark for Expert-level Understanding and Reasoning
Abstract
To date, there is a notable lack of rigorous benchmarks that assess Multimodal Large Language Models (MLLMs) within the financial domain, a field characterized by specialized financial charts and complex domain-specific expertise. To address this gap, we introduce MME-Finance, the first comprehensive bilingual multimodal benchmark tailored for financial analysis. MME-Finance comprises 4,751 meticulously curated samples, encompassing 2,274 open-ended questions, 2,000 binary-choice questions, and 477 multi-turn questions. To mitigate bias when LLMs act as judges, we also create an evaluation framework that strengthens alignment with human judgments by embedding visual context into the multimodal assessment pipeline. We conduct a comprehensive evaluation of 31 popular MLLMs to assess their perception, reasoning, and cognitive capabilities. Gemini2.5Pro achieves the highest accuracy on both open-ended questions (79.28\%) and multi-turn questions (85.71\%). Among open-source models, InternVL3-78B attains 71.24\% accuracy on open-ended questions, whereas Qwen2.5-VL-72B achieves an F1 score of 88.73\% on binary-choice questions. The results indicate that state-of-the-art MLLMs demonstrate considerable overall competence, yet exhibit significant deficiencies in fine-grained visual perception and the understanding of domain-specific financial images. Source code is available at https://github.com/HiThink-Research/MME-Finance.