External risk intelligence

Attackers can take over SGLang systems by getting them to load a malicious model file.

CVE advisory

Severity: CRITICAL (CVSS 9.8)

CVE-2026-5760

The vulnerability is in the /v1/rerank API endpoint of SGLang, an LLM inference framework. Such services usually run as backend infrastructure for AI applications, often behind internal networks or proxies, but they can be reached directly in deployments that expose the API to the network.

Code Injection

LMSYS SGLang

before 0.5.11

Halo Surface Signal: 3 out of 5 — possibly public-facing.

External exposure likelihood

Horizon Alert

Summary of the vulnerability and why it matters

This vulnerability in SGLang could allow an attacker to execute arbitrary code on your systems. It is triggered when SGLang loads a malicious model file and renders the file's tokenizer chat template in an unsandboxed Jinja2 environment. This is a serious issue because it can lead to a complete compromise of the affected server.

Critical severity and network access.
Could impact AI services and applications.
Requires loading a malicious model file.

Attack Path

How an attacker could exploit the issue

An attacker can achieve remote code execution by tricking the SGLang service into loading a malicious model file. This file contains a specially crafted tokenizer chat template that, when processed by the unsandboxed Jinja2 environment, allows the attacker to run arbitrary code on the server.

No authentication required.
Targets the `/v1/rerank` endpoint.
Requires loading a malicious model.
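The difference between an unsandboxed and a sandboxed Jinja2 environment can be illustrated with a harmless probe. This is a minimal sketch, not SGLang's code and not the content of the patch: `PROBE_TEMPLATE` and `leaks_internals` are hypothetical names, and the probe only reads Python type information instead of running a command. It assumes the `jinja2` package is installed.

```python
from jinja2 import Environment
from jinja2.exceptions import SecurityError
from jinja2.sandbox import ImmutableSandboxedEnvironment

# Hypothetical, harmless stand-in for a malicious chat template.
# It walks from a plain value into Python internals; a real payload
# would continue from there toward modules such as os or subprocess.
PROBE_TEMPLATE = "{{ messages.__class__.__mro__ }}"


def leaks_internals(env, template_source):
    """Return True if rendering exposes Python internals to the template."""
    try:
        rendered = env.from_string(template_source).render(messages=[])
    except SecurityError:
        # The sandbox refused the unsafe attribute access.
        return False
    return "object" in rendered


if __name__ == "__main__":
    print("plain Environment leaks:", leaks_internals(Environment(), PROBE_TEMPLATE))
    print(
        "sandboxed environment leaks:",
        leaks_internals(ImmutableSandboxedEnvironment(), PROBE_TEMPLATE),
    )
```

A plain `Environment` lets the template reach attributes such as `__class__`, while the sandboxed environment blocks underscore-prefixed attribute access. Whether the upstream fix takes this exact approach is not stated in this advisory.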

Live Threat

Current exploitation, exposure, and threat context

This vulnerability allows remote code execution (RCE) when a model with a malicious tokenizer chat template is loaded and rendered in an unsandboxed Jinja2 environment. The impact is severe, but attacker interest may be limited because exploitation depends on getting a crafted model file loaded and because such activity may be detected in production LLM inference environments.

RCE vulnerability in LLM framework.
Requires loading a crafted model file.
No public exploit code observed yet.

Operational Fix

Recommended remediation, mitigation, and detection steps

Because a patch is available, apply it to all affected systems immediately. Until every system is patched, block or isolate network-exposed services that use SGLang's reranking endpoint, especially where model files are loaded from untrusted sources.

Apply the patch from [https://github.com/sgl-project/sglang/pull/23660](https://github.com/sgl-project/sglang/pull/23660).
If patching is delayed, restrict network access to the rerank endpoint.
Monitor for suspicious requests to the `/v1/rerank` endpoint.
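As a starting point for the monitoring step, the sketch below counts requests to the rerank endpoint per client address. It assumes access logs in common log format from a reverse proxy in front of SGLang; the `LOG_LINE` pattern, the function name, and the sample addresses are illustrative assumptions, and the pattern will need adjusting to your own log layout.

```python
import re
from collections import Counter

# Assumed log shape (common log format), for example:
# 203.0.113.7 - - [10/Oct/2026:13:55:36 +0000] "POST /v1/rerank HTTP/1.1" 200 512
LOG_LINE = re.compile(
    r'^(?P<client>\S+) .*?"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3})'
)


def rerank_requests_by_client(lines, path_prefix="/v1/rerank"):
    """Count requests whose path starts with path_prefix, grouped by client address."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("path").startswith(path_prefix):
            counts[match.group("client")] += 1
    return counts


if __name__ == "__main__":
    sample = [
        '203.0.113.7 - - [10/Oct/2026:13:55:36 +0000] "POST /v1/rerank HTTP/1.1" 200 512',
        '198.51.100.2 - - [10/Oct/2026:13:55:40 +0000] "POST /v1/chat/completions HTTP/1.1" 200 900',
    ]
    for client, hits in rerank_requests_by_client(sample).most_common():
        print(client, hits)
```

Request counts alone do not prove exploitation. Treat unexpected clients or spikes as a prompt to review which models the service loaded around the same time.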

Supplementary metadata

CVSS vector


Frequently asked questions

What is SGLang and its function in AI development?

SGLang is an open-source framework for serving large language models (LLMs). It lets developers run and manage language models efficiently as the inference backend for AI applications and services.

How does CVE-2026-5760 lead to Remote Code Execution?

CVE-2026-5760 is a critical vulnerability classified under CWE-94, Improper Control of Generation of Code ('Code Injection'). It allows attackers to achieve Remote Code Execution (RCE) by exploiting SGLang's reranking endpoint (/v1/rerank). This is possible when a malicious model file, containing a specially crafted tokenizer chat template, is loaded and processed by an unsandboxed Jinja2 environment.

What is the trigger path for CVE-2026-5760's RCE?

The RCE vulnerability is triggered when an attacker gets SGLang to load a model file that includes a malicious tokenizer chat template. This template is then rendered within an unsandboxed Jinja2 environment, enabling the attacker to execute arbitrary code on the targeted system.
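Because the trigger is a chat template carried inside the model file, one optional precaution is to inspect the template text before a model from an outside source is loaded. The sketch below is a rough heuristic under stated assumptions: the pattern list and the function name `suspicious_markers` are illustrative, the check is easy to evade, and extracting the template string from a given model format is left to the reader. It does not replace patching.

```python
import re

# Illustrative markers of Python-internals access in a Jinja2 template.
# This list is an assumption for demonstration, not a complete signature set.
SUSPICIOUS_PATTERNS = [
    r"__\w+__",        # dunder attributes such as __class__ or __globals__
    r"\bpopen\b",
    r"\bsubprocess\b",
    r"\bos\.system\b",
    r"\beval\b",
    r"\bexec\b",
]


def suspicious_markers(chat_template):
    """Return a sorted list of suspicious substrings found in the template text."""
    found = set()
    for pattern in SUSPICIOUS_PATTERNS:
        found.update(m.group(0) for m in re.finditer(pattern, chat_template))
    return sorted(found)


if __name__ == "__main__":
    benign = "{% for m in messages %}{{ m['role'] }}: {{ m['content'] }}{% endfor %}"
    print(suspicious_markers(benign))
```

An empty result only means none of these markers appeared; it is not evidence that a template is safe.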

What is the relevance of CVE-2026-5760 to AI services?

This vulnerability poses a significant risk to AI services and applications that rely on SGLang for LLM inference. The potential for RCE could lead to a complete compromise of the backend infrastructure supporting these services. Halo classifies this CVE as external due to its network attack vector.

What is the recommended response to CVE-2026-5760?

To mitigate CVE-2026-5760, users should immediately apply the patch available via GitHub. If immediate patching is not feasible, it is crucial to restrict network access to the /v1/rerank endpoint and monitor for any suspicious activity. Loading model files from untrusted sources should be strictly avoided.
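To find hosts that still need the patch, the installed version can be compared against 0.5.11, the first version this advisory lists as unaffected. This is a minimal sketch that assumes SGLang was installed as the Python distribution `sglang`; the helper names are hypothetical, and the simple parser ignores pre-release and post-release suffixes, so a dedicated version-parsing library would be more robust.

```python
from importlib import metadata

# From the advisory: versions before 0.5.11 are affected.
FIXED_VERSION = (0, 5, 11)


def parse_release(version):
    """Turn '0.5.10' or '0.5.10.post1' into a tuple of its leading integer parts."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_affected(version):
    """Return True if the version string sorts before the fixed release."""
    return parse_release(version) < FIXED_VERSION


def installed_sglang_is_affected():
    """Return True or False for the installed package, or None if it is absent."""
    try:
        return is_affected(metadata.version("sglang"))
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("installed sglang affected:", installed_sglang_is_affected())
```

Builds installed from source or from a fork may report a version that does not reflect whether the fix is present, so confirm against the patch itself where that applies.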


Written by Nicholas Merritt

Nicholas Merritt is VP of Security Solutions at Halo Security. With more than 25 years across application security, vulnerability prioritization, and offensive security, he helps teams translate threat data into practical exposure validation and remediation focus. He authors Halo Threat Intelligence CVE advisories using Halo Surface Signal and H/A/L/O context to prioritize internet-facing risk.