
Apache OpenNLP flaw lets attackers read local files or make servers send requests to attacker-chosen addresses

CVE advisory

Severity: CRITICAL (CVSS 9.1)

CVE-2026-40682

If an application built on Apache OpenNLP loads a dictionary file supplied by an external attacker, the crafted file can make the system reveal sensitive local files or fetch data from internal network hosts. This puts credentials and configuration information stored on, or reachable from, the host at risk of unauthorized access.

Halo Surface Signal: 2

XML External Entity Injection

Apache OpenNLP

Versions before 2.5.9 and 3.0.0-M3

External exposure likelihood

Halo Surface Signal score for CVE-2026-40682

Apache OpenNLP is a software library, not a standalone network service. The flaw is reachable only when the host application accepts dictionary files from users and loads them with OpenNLP. Nothing is exposed to the internet by default; whether an attacker can reach the vulnerable code depends on specific, often tightly controlled, application design choices.

Horizon Alert

Summary of the vulnerability and why it matters

Apache OpenNLP parses dictionary files with an XML parser that processes external entities, so a crafted dictionary can make the server disclose sensitive files or issue requests on the attacker's behalf. This is concerning because the attack fires during parsing itself, before the application ever uses the dictionary's contents, so any checks applied to the loaded entries come too late.

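  • Data Exposure: Local files could be read.
  • Server-Side Requests: The host could be made to send requests to attacker-chosen internal or external addresses.
  • Widespread Impact: Any application that passes untrusted dictionaries to a vulnerable version is affected.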

Attack Path

How an attacker could exploit the issue

An attacker crafts a dictionary file that carries an XXE payload: a DOCTYPE declaring an external entity that points at a local file or a URL. When an application using a vulnerable OpenNLP version parses the file, the parser resolves that entity, letting the attacker read local files or perform server-side request forgery (SSRF). The path is realistic because the `Dictionary(InputStream)` constructor, which is documented for loading user-supplied dictionaries, parses them without adequate safeguards against external entities.

  • Attacker supplies crafted dictionary file.
  • Vulnerable API parses XML dictionary.
  • Local file disclosure or SSRF occurs.

Live Threat

Current exploitation, exposure, and threat context

This XML External Entity vulnerability in Apache OpenNLP's DictionaryEntryPersistor lets attackers read local files or perform server-side requests by supplying a crafted dictionary file. It affects versions before 2.5.9 and 3.0.0-M3 and is triggered while untrusted input is being parsed. Although the severity is rated critical, the real-world threat is harder to judge: because OpenNLP is a library, exploitation depends on whether a given application feeds it untrusted dictionaries.

  • Exploitation requires an application that loads untrusted dictionaries.
  • No public exploit code observed.
  • Not listed in CISA's Known Exploited Vulnerabilities (KEV) catalog.

Priority actions

Operational Fix

Recommended remediation, mitigation, and detection steps

Prioritize blocking or isolating any service that accepts user-supplied dictionary files, since a single crafted file can read local files or trigger server-side requests, putting data confidentiality and integrity at risk. Upgrading is the definitive fix; if patching is delayed, validate input so that any XML containing a DOCTYPE declaration is rejected before it reaches the parser.

  • Upgrade OpenNLP to 2.5.9 or 3.0.0-M3.
  • Block or validate untrusted dictionary inputs.
  • Monitor for suspicious file access or network requests.
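The DOCTYPE check recommended above can be sketched with the JDK's built-in SAX parser. This is a minimal sketch, not OpenNLP code: it assumes the upload arrives as an `InputStream` small enough to buffer in memory, and `DictionaryGate`, `screen` and `isSafeToParse` are hypothetical names. It buffers the bytes, parses them once with the Xerces feature `http://apache.org/xml/features/disallow-doctype-decl` enabled so that any DOCTYPE aborts the parse, and only then hands a fresh stream to the real loader.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical pre-parse gate for untrusted dictionary uploads (not part of OpenNLP).
public class DictionaryGate {

    // Xerces feature recognised by the JDK's built-in parser: any DOCTYPE becomes a fatal error.
    private static final String DISALLOW_DOCTYPE =
            "http://apache.org/xml/features/disallow-doctype-decl";

    /**
     * Buffers the untrusted upload, rejects it if it contains a DOCTYPE (or is not
     * well-formed XML), and otherwise returns a fresh stream over the same bytes.
     */
    public static InputStream screen(InputStream untrusted) throws IOException {
        byte[] bytes = untrusted.readAllBytes();
        if (!isSafeToParse(bytes)) {
            throw new IOException("Rejected dictionary: DOCTYPE declaration or malformed XML");
        }
        return new ByteArrayInputStream(bytes);
    }

    /** Returns true only for well-formed XML that has no DOCTYPE declaration. */
    public static boolean isSafeToParse(byte[] xml) throws IOException {
        SAXParser parser;
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setFeature(DISALLOW_DOCTYPE, true);
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            parser = factory.newSAXParser();
        } catch (ParserConfigurationException | SAXException e) {
            // Fail closed: a parser that cannot be hardened must not be trusted.
            throw new IOException("XML parser does not support the required features", e);
        }
        try {
            // An empty handler is enough: we only care whether parsing completes.
            parser.parse(new ByteArrayInputStream(xml), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            return false;
        }
    }
}
```

Application code would then call something like `new Dictionary(DictionaryGate.screen(upload))`, so OpenNLP's `Dictionary(InputStream)` constructor only ever sees DOCTYPE-free input. Rejecting every DOCTYPE is deliberately blunt, matching the advisory's own mitigation; a production version should also cap the upload size before calling `readAllBytes`.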

Frequently asked questions

What is Apache OpenNLP used for?

Apache OpenNLP is a machine learning-based toolkit for processing natural language text. It provides APIs for common NLP tasks such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, and parsing. Developers use it to build applications that understand and process human language.

How does CVE-2026-40682 affect Apache OpenNLP?

CVE-2026-40682 is an XML External Entity (XXE) vulnerability. In Apache OpenNLP's DictionaryEntryPersistor, the XML parser can be tricked into processing external entities defined in a malicious dictionary file. This weakness in XML parsing could allow an attacker to read local files or send requests to other servers.
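The mechanism can be observed with the JDK's own SAX API. A DOCTYPE can declare an entity whose value lives at an external URI (here a hypothetical `file:///tmp/secret.txt`); when the parser meets the entity reference, it asks an `EntityResolver` to fetch that URI. The sketch below is not OpenNLP's DictionaryEntryPersistor, whose parser configuration is not described here; it installs a resolver that records and refuses every lookup, so the fetch attempt is visible without any file being read. `RefusingResolverDemo`, `RefusingResolver` and `tryParse` are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

public class RefusingResolverDemo {

    /** Records every external entity the parser tries to load, then refuses it. */
    public static final class RefusingResolver implements EntityResolver {
        public final List<String> attempted = new ArrayList<>();

        @Override
        public InputSource resolveEntity(String publicId, String systemId) throws SAXException {
            attempted.add(systemId);
            throw new SAXException("External entity refused: " + systemId);
        }
    }

    /** Returns true if the document parsed completely; false if parsing was aborted. */
    public static boolean tryParse(String xml, RefusingResolver resolver)
            throws ParserConfigurationException {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setEntityResolver(resolver);
            reader.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException e) {
            // The resolver's refusal surfaces here as a SAXException.
            return false;
        }
    }
}
```

Refusing lookups is only defence in depth: it helps for parsers your own code creates, but it cannot be attached to the parser inside a vulnerable OpenNLP release, so it does not replace upgrading.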

What are the conditions for exploiting CVE-2026-40682?

An attacker must be able to supply a crafted dictionary file containing a malicious DOCTYPE declaration to an application using a vulnerable version of Apache OpenNLP. The vulnerability is triggered when the `Dictionary(InputStream)` constructor, designed for loading user-supplied dictionaries, parses this crafted file.

Who should be concerned about this CVE?

Organizations using Apache OpenNLP in applications that process externally supplied dictionary files should be concerned. Because OpenNLP is a library rather than an internet-facing service, the Halo Surface Signal rates direct exploitation from the internet as unlikely unless an application is configured to accept untrusted dictionary uploads.

What is the first step to address CVE-2026-40682?

The primary recommendation is to upgrade Apache OpenNLP. Users of the 2.x versions should upgrade to 2.5.9, and users of the 3.x versions should upgrade to 3.0.0-M3. If immediate upgrading is not possible, developers should validate all incoming dictionary files to reject any containing a DOCTYPE declaration before they are parsed.
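To decide whether a given deployment needs the upgrade, the version rule above can be expressed as a small check. This is a sketch under stated assumptions: it treats every release before 2.5.9 as affected (including older 1.x lines), treats 3.0.0-M1 and 3.0.0-M2 as affected, assumes everything from 3.0.0-M3 onward is fixed, and only understands version strings of the form `X.Y.Z` or `X.Y.Z-Mn`. `OpenNlpVersionCheck` and `isAffected` are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OpenNlpVersionCheck {

    // Accepts plain releases (2.5.8) and milestones (3.0.0-M2); anything else is rejected.
    private static final Pattern VERSION =
            Pattern.compile("(\\d+)\\.(\\d+)\\.(\\d+)(?:-M(\\d+))?");

    public static boolean isAffected(String version) {
        Matcher m = VERSION.matcher(version.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognised version string: " + version);
        }
        int major = Integer.parseInt(m.group(1));
        int minor = Integer.parseInt(m.group(2));
        int patch = Integer.parseInt(m.group(3));
        String milestone = m.group(4);

        if (major < 3) {
            // Advisory: fixed in 2.5.9. Assumption: every earlier release is affected;
            // milestone suffixes on pre-3.x versions are ignored.
            return compare(major, minor, patch, 2, 5, 9) < 0;
        }
        if (major == 3 && minor == 0 && patch == 0 && milestone != null) {
            // Advisory: fixed in 3.0.0-M3, so earlier 3.0.0 milestones are affected.
            return Integer.parseInt(milestone) < 3;
        }
        // Assumption: 3.0.0 final and later releases include the fix.
        return false;
    }

    private static int compare(int maj1, int min1, int pat1, int maj2, int min2, int pat2) {
        if (maj1 != maj2) return Integer.compare(maj1, maj2);
        if (min1 != min2) return Integer.compare(min1, min2);
        return Integer.compare(pat1, pat2);
    }
}
```

For example, `isAffected("2.5.8")` and `isAffected("3.0.0-M2")` return true, while `isAffected("2.5.9")` and `isAffected("3.0.0-M3")` return false.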
