AI securitysrc: OWASP LLM01

prompt injection

Prompt injection is like tricking a helpful robot into ignoring its original rulebook by giving it a confusing or deceptive command that makes it do something it wasn't supposed to do.

A security vulnerability where an attacker provides malicious input to an LLM, causing the model to disregard its system-level instructions and execute the attacker's unauthorized directives instead. This includes both direct user-input attacks and indirect attacks where the model processes malicious data from external sources.

An LLM01-class security exploit characterized by the manipulation of model inputs to override system-defined constraints and developer-provided instructions. The attack forces the model to prioritize adversarial directives over its intended alignment, manifesting as either direct prompt injection or indirect prompt injection via untrusted data retrieval, effectively subverting the model's intended control flow.

← all terms