External risk intelligence

Apache OpenNLP allows attackers to steal files or trick systems into visiting attacker sites

CVE advisory

Severity: CRITICAL (CVSS 9.1)

CVE-2026-40682

Apache OpenNLP is a software library, not a standalone network service. The vulnerability requires the host application to accept and process user-supplied dictionary files. While reachable in some implementations, it is not a default internet-facing service, and public exposure depends on specific, often controlled, application-level design choices.

XML External Entity Injection

Apache OpenNLP

before 2.5.9 and 3.0.0-M3

Halo Surface Signal: 2 out of 5 — less likely to be public-facing.

External exposure likelihood

Horizon Alert

Summary of the vulnerability and why it matters

This vulnerability in Apache OpenNLP lets an attacker abuse the way dictionary files are parsed, potentially disclosing sensitive information or causing the server to make unauthorized requests. The concern is that the XML parser processes external entities in untrusted dictionary input, so the attack takes effect during parsing, before the application handles any legitimate dictionary data.

Data Exposure: Local files could be read.
System Compromise: The server could be made to send requests to attacker-chosen hosts (SSRF).
Widespread Impact: Affects applications using this library.

Attack Path

How an attacker could exploit the issue

An attacker can exploit this by crafting a malicious dictionary file containing an XXE payload. When an application using the vulnerable OpenNLP library parses the file, the XML parser processes the external entity, letting the attacker read local files or perform server-side request forgery (SSRF). The attack is feasible because the library's documented API parses user-supplied dictionaries without sufficient security controls on the XML parser.

Attacker supplies crafted dictionary file.
Vulnerable API parses XML dictionary.
Local file disclosure or SSRF occurs.
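
The mechanism behind these steps can be illustrated with a harmless sketch. It uses only the JDK's built-in DOM parser, not OpenNLP itself, and an internal entity rather than an external one. The class name, method name, and sample XML are assumptions made for illustration. The point is that entity substitution happens inside the parser, before the application sees any dictionary content; an external entity declared with SYSTEM would be substituted the same way from a file or URL.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical demo class; not part of OpenNLP.
final class EntityExpansionDemo {

    // Dictionary-shaped XML used only for illustration. The entity is
    // internal and harmless: it carries a fixed string, not a file or URL.
    static final String SAMPLE =
        "<?xml version=\"1.0\"?>"
        + "<!DOCTYPE dictionary [<!ENTITY demo \"expanded-by-parser\">]>"
        + "<dictionary><entry><token>&demo;</token></entry></dictionary>";

    // Parses with a default, unhardened JDK DOM parser and returns the
    // text of the first <token> element.
    static String firstToken(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        Document doc = factory.newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return doc.getElementsByTagName("token").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // The token text is the entity's replacement value, not "&demo;".
        System.out.println(firstToken(SAMPLE));
    }
}
```

On a stock JDK configuration the token comes back with the entity's replacement text in place of the `&demo;` reference, which is why the DOCTYPE declaration, rather than the dictionary entries, is the part of the file that matters to an attacker.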

Live Threat

Current exploitation, exposure, and threat context

This XML External Entity vulnerability in Apache OpenNLP's DictionaryEntryPersistor allows attackers to read local files or perform server-side requests by supplying a crafted dictionary file. The vulnerability is present in versions before 2.5.9 and 3.0.0-M3, and it can be triggered during the parsing of untrusted input. Although the CVSS severity is critical, the practical threat is harder to gauge because OpenNLP is a library, and exposure depends on how each host application uses it.

Exploitation requires application-level design.
No public exploit code observed.
No KEV listing.

Operational Fix

Recommended remediation, mitigation, and detection steps

Prioritize blocking or isolating any service that accepts user-supplied dictionary files to prevent XML external entity injection. This vulnerability allows attackers to read local files or perform server-side requests, posing a significant risk to data confidentiality and integrity. If patching is delayed, implement input validation to reject XML with DOCTYPE declarations before parsing.

Upgrade OpenNLP to 2.5.9 or 3.0.0-M3.
Block or validate untrusted dictionary inputs.
Monitor for suspicious file access or network requests.
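
As a stopgap for the DOCTYPE-rejection mitigation, an application could screen dictionary bytes before handing them to OpenNLP. The following is a minimal sketch, assuming the JDK's built-in SAX parser (which honors the Xerces `disallow-doctype-decl` feature); the class and method names are hypothetical, and this is not how the patched library implements its fix.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical pre-parse gate; not part of OpenNLP.
final class DictionaryGate {

    // Xerces feature that turns any DOCTYPE declaration into a fatal error.
    private static final String DISALLOW_DOCTYPE =
        "http://apache.org/xml/features/disallow-doctype-decl";

    // Returns true only if the bytes are well-formed XML with no DOCTYPE.
    // Fails closed: malformed XML, a DOCTYPE, or a parser that does not
    // recognize the feature all yield false.
    static boolean isSafeDictionary(byte[] xml)
            throws ParserConfigurationException, IOException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        try {
            factory.setFeature(DISALLOW_DOCTYPE, true);
            factory.newSAXParser()
                .parse(new ByteArrayInputStream(xml), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] benign = "<dictionary><entry><token>a</token></entry></dictionary>"
            .getBytes(StandardCharsets.UTF_8);
        byte[] withDoctype =
            "<!DOCTYPE dictionary [<!ENTITY x \"y\">]><dictionary/>"
            .getBytes(StandardCharsets.UTF_8);
        System.out.println(isSafeDictionary(benign));
        System.out.println(isSafeDictionary(withDoctype));
        // Only after the gate passes would the caller construct the
        // dictionary, e.g. new Dictionary(new ByteArrayInputStream(benign)).
    }
}
```

Buffering the upload into a byte array lets the same bytes be checked and then parsed, at the cost of holding the whole file in memory. A gate like this complements the upgrade to a fixed release and does not replace it.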

Supplementary metadata

CVSS vector



Frequently asked questions

What is Apache OpenNLP used for?

Apache OpenNLP is a machine learning-based toolkit for processing natural language text. It provides APIs for common NLP tasks such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, and parsing. Developers use it to build applications that understand and process human language.

How does CVE-2026-40682 affect Apache OpenNLP?

CVE-2026-40682 is an XML External Entity (XXE) vulnerability. In Apache OpenNLP's DictionaryEntryPersistor, the XML parser can be tricked into processing external entities defined in a malicious dictionary file. This weakness in XML parsing could allow an attacker to read local files or send requests to other servers.

What are the conditions for exploiting CVE-2026-40682?

An attacker must be able to supply a crafted dictionary file containing a malicious DOCTYPE declaration to an application using a vulnerable version of Apache OpenNLP. The vulnerability is triggered when the `Dictionary(InputStream)` constructor, designed for loading user-supplied dictionaries, parses this crafted file.

Who should be concerned about this CVE?

Organizations using Apache OpenNLP in applications that process externally supplied dictionary files should be concerned. OpenNLP is not typically a direct internet-facing service, and the Halo Surface Signal of 2 out of 5 indicates that this vulnerability is unlikely to be exploitable from the internet unless the application is built to accept untrusted file uploads.

What is the first step to address CVE-2026-40682?

The primary recommendation is to upgrade Apache OpenNLP. Users of the 2.x versions should upgrade to 2.5.9, and users of the 3.x versions should upgrade to 3.0.0-M3. If immediate upgrading is not possible, developers should validate all incoming dictionary files to reject any containing a DOCTYPE declaration before they are parsed.

Written by Nicholas Merritt

Nicholas Merritt is VP of Security Solutions at Halo Security. With more than 25 years across application security, vulnerability prioritization, and offensive security, he helps teams translate threat data into practical exposure validation and remediation focus. He authors Halo Threat Intelligence CVE advisories using Halo Surface Signal and H/A/L/O context to prioritize internet-facing risk.