External risk intelligence

Attackers can take over SGLang systems by loading a malicious model file.

CVE advisorySeverity: CRITICAL (CVSS 9.8)

CVE-2026-5760

An external attacker can exploit SGLang by sending malicious instructions to the reranking feature. This flaw allows them to execute unauthorized code on the server, potentially stealing sensitive configuration data or gaining full control over the host system.

3Halo Surface Signal

Code Injection

Lmsys Sglang

before 0.5.11

External exposure likelihood

Halo Surface Signal score for CVE-2026-5760

The vulnerability exists in an API endpoint (/v1/rerank) of an LLM inference framework. These services are typically deployed as backend infrastructure components for AI applications, often residing behind internal networks or proxies, though they are plausibly reachable in some deployment scenarios where the API is exposed to the network.

Horizon Alert

Summary of the vulnerability and why it matters

This vulnerability in SGLang could allow an attacker to execute arbitrary code on your systems. It happens when a malicious model file is loaded, and the system's template rendering is not properly secured. This is a serious issue because it can lead to a complete compromise of the affected server.

  • Critical severity and network access.
  • Could impact AI services and applications.
  • Requires loading a malicious model file.

Attack Path

How an attacker could exploit the issue

An attacker can achieve remote code execution by tricking the SGLang service into loading a malicious model file. This file contains a specially crafted tokenizer chat template that, when processed by the unsandboxed Jinja2 environment, allows the attacker to run arbitrary code on the server.

  • No authentication required.
  • Targets the `/v1/rerank` endpoint.
  • Requires loading a malicious model.

Live Threat

Current exploitation, exposure, and threat context

This vulnerability allows for Remote Code Execution (RCE) by loading a model with a malicious tokenizer, leveraging an unsandboxed Jinja2 environment. While the impact is severe, threat actor interest may be tempered by the specific nature of the dependency on model files and the potential for detection in production LLM inference environments.

  • RCE vulnerability in LLM framework.
  • Requires loading a crafted model file.
  • No public exploit code observed yet.

Priority actions

Operational Fix

Recommended remediation, mitigation, and detection steps

Teams should prioritize blocking or isolating services exposed to the network that utilize SGLang's reranking endpoint, especially if model files are loaded from untrusted sources, to prevent critical RCE. Since a patch is available, apply it immediately to all affected systems.

  • Apply patch from [https://github.com/sgl-project/sglang/pull/23660](https://github.com/sgl-project/sglang/pull/23660).
  • If patching is delayed, restrict network access to the rerank endpoint.
  • Monitor for suspicious requests to the `/v1/rerank` endpoint.

Frequently asked questions

What is SGLang and its function in AI development?

SGLang is an open-source framework designed for developing and deploying large language models (LLMs). It empowers developers to efficiently manage and run various language models, facilitating the creation of AI applications and services.

How does CVE-2026-5760 lead to Remote Code Execution?

CVE-2026-5760 is a critical vulnerability classified under CWE-94, Improper Neutralization of Special Elements used in a Command. It allows attackers to achieve Remote Code Execution (RCE) by exploiting SGLang's reranking endpoint (/v1/rerank). This is possible when a malicious model file, containing a specially crafted tokenizer chat template, is loaded and processed by an unsandboxed Jinja2 environment.

What is the trigger path for CVE-2026-5760's RCE?

The RCE vulnerability is triggered when an attacker entices SGLang to load a model file that includes a malicious tokenizer chat template. This template is then rendered within an insecure, unsandboxed Jinja2 environment, enabling the attacker to execute arbitrary code on the targeted system.

What is the relevance of CVE-2026-5760 to AI services?

This vulnerability poses a significant risk to AI services and applications that rely on SGLang for LLM inference. The potential for RCE could lead to a complete compromise of the backend infrastructure supporting these services. Halo classifies this CVE as external due to its network attack vector.

What is the recommended response to CVE-2026-5760?

To mitigate CVE-2026-5760, users should immediately apply the patch available via GitHub. If immediate patching is not feasible, it is crucial to restrict network access to the /v1/rerank endpoint and monitor for any suspicious activity. Loading model files from untrusted sources should be strictly avoided.

References