| 1 |
Agentic Multi-Agent Exploitation |
Agentic AI |
Exploiting inter-agent trust boundaries so that a malicious payload, initially rejected by one LLM agent, is processed if delivered via another trusted agent, including privilege escalation and cross-agent command execution. |
| 2 |
RAG/Embedding Backdoor Attacks |
Agentic AI |
Attacking LLMs with manipulated embedded documents retrieved during RAG, including poisoning vector DBs to force undesirable completions or disclosures. |
| 3 |
System Prompt Leakage & Reverse Engineering |
Prompt-Based |
Forcing disclosure or deducing proprietary system prompts to subvert guardrails and expose internal instructions. |
| 4 |
LLM Tooling/Plugin Supply Chain Attacks |
Supply Chain |
Compromising the ecosystem via malicious plugins, infected models from public repos, or tainted integrations. |
| 5 |
Excessive Agency/Autonomy Attacks |
Agentic AI |
Exploiting/abusing LLM agent autonomy to perform unintended actions, escalate privileges, or cause persistent automated damage in agentic workflows. |
| 6 |
Unbounded Resource Consumption ("Denial of Wallet") |
Resource Exhaustion |
Manipulating LLM behavior to consume excessive external/cloud resources, raising costs or disrupting operations. |
| 7 |
Cross-Context Federation Leaks |
Data Exfiltration |
Leveraging federated information contexts or cross-source retrievals to exfiltrate data by manipulating the model's knowledge context. |
| 8 |
Vector Database Poisoning |
Foundational |
Polluting indexing/embedding layers to disrupt or manipulate downstream LLM generations or leak/hallucinate info. |
| 9 |
Adversarial Examples |
Input Manipulation |
Crafty manipulations of input data that trick models into making incorrect predictions, potentially leading to harmful decisions. |
| 10 |
Data Poisoning |
Foundational |
Malicious data injections into the training set that corrupt the model's performance, causing biased or incorrect behavior. |
| 11 |
Model Inversion Attacks |
Privacy |
Inferring the input values used to train the model, exposing sensitive information. |
| 12 |
Membership Inference Attacks |
Privacy |
Determining whether specific data points were part of the model's training set, leading to privacy breaches. |
| 13 |
Query Manipulation Attacks |
Prompt-Based |
Crafting malicious queries that cause the model to reveal unintended information or behave undesirably. |
| 14 |
Model Extraction Attacks |
IP Theft |
Reverse-engineering the model by querying it to construct a copy, resulting in intellectual property theft. |
| 15 |
Transfer Learning Attacks |
Foundational |
Exploiting vulnerabilities in the transfer learning process to manipulate model performance on new tasks. |
| 16 |
Federated Learning Attacks |
Foundational |
Compromising client devices or server-side data in federated learning setups to corrupt the global model or extract sensitive information. |
| 17 |
Edge AI Attacks |
Hardware / Deployment |
Targeting edge devices running AI models to exfiltrate data or manipulate behavior. |
| 18 |
IoT AI Attacks |
Hardware / Deployment |
Attacking IoT devices using AI, potentially leading to data breaches or unauthorized control. |
| 19 |
Prompt Injection Attacks |
Prompt-Based |
Manipulating input prompts in conversational AI to bypass safety measures or extract confidential information. |
| 20 |
Indirect Prompt Injection |
Prompt-Based |
Exploiting vulnerabilities in systems integrating LLMs to inject malicious prompts indirectly. |
| 21 |
Model Fairness Attacks |
Foundational / Bias |
Intentionally biasing the model by manipulating input data, affecting fairness and equity. |
| 22 |
Model Explainability Attacks |
Evasion |
Designing inputs that make model decisions difficult to interpret, hindering transparency. |
| 23 |
Robustness Attacks |
Evasion |
Testing the model's resilience by subjecting it to various perturbations to find weaknesses. |
| 24 |
Security Attacks |
General |
Compromising the confidentiality, integrity, or availability of the model and its outputs. |
| 25 |
Integrity Attacks |
Foundational |
Tampering with the model's architecture, weights, or biases to alter behavior without authorization. |
| 26 |
Jailbreaking Attacks |
Prompt-Based |
Attempting to circumvent the ethical constraints or content filters in an LLM. |
| 27 |
Training Data Extraction |
Privacy |
Inferring specific data used to train the model through carefully crafted queries. |
| 28 |
Synthetic Data Generation Attacks |
Foundational |
Creating synthetic data designed to mislead or degrade AI model performance. |
| 29 |
Model Stealing from Cloud |
IP Theft |
Extracting a trained model from a cloud service without direct access. |
| 30 |
Model Poisoning from Edge |
Foundational |
Introducing malicious data at edge devices to corrupt model behavior. |
| 31 |
Model Drift Detection Evasion |
Evasion |
Evading mechanisms that detect when a model's performance degrades over time. |
| 32 |
Adversarial Example Generation with Deep Learning |
Input Manipulation |
Using advanced techniques to create adversarial examples that deceive the model. |
| 33 |
Model Reprogramming |
Foundational |
Repurposing a model for a different task, potentially bypassing security measures. |
| 34 |
Thermal Side-Channel Attacks |
Side-Channel / Hardware |
Using temperature variations in hardware during model inference to infer sensitive information. |
| 35 |
Transfer Learning Attacks from Pre-Trained Models |
Foundational |
Poisoning pre-trained models to influence performance when transferred to new tasks. |
| 36 |
Model Fairness and Bias Detection Evasion |
Evasion / Bias |
Designing attacks to evade detection mechanisms monitoring fairness and bias. |
| 37 |
Model Explainability Attack |
Evasion |
Attacking the model's interpretability to prevent users from understanding its decision-making process. |
| 38 |
Deepfake Attacks |
Multimodal / Output Manip. |
Creating realistic fake audio or video content to manipulate events or conversations. |
| 39 |
Cloud-Based Model Replication |
IP Theft |
Replicating trained models in the cloud to develop competing products or gain unauthorized insights. |
| 40 |
Confidentiality Attacks |
Privacy |
Extracting sensitive or proprietary information embedded within the model's parameters. |
| 41 |
Quantum Attacks on LLMs |
Theoretical / Cryptographic |
Using quantum computing to theoretically compromise the security of LLMs or their cryptographic protections. |
| 42 |
Model Stealing from Cloud with Pre-Trained Models |
IP Theft |
Extracting pre-trained models from the cloud without direct access. |
| 43 |
Transfer Learning Attacks with Edge Devices |
Foundational / Hardware |
Compromising knowledge transferred to edge devices. |
| 44 |
Adversarial Example Generation with Model Inversion |
Input Manipulation |
Creating adversarial examples using model inversion techniques. |
| 45 |
Backdoor Attacks |
Foundational |
Embedding hidden behaviors within the model triggered by specific inputs. |
| 46 |
Watermarking Attacks |
Evasion / IP Theft |
Removing or altering watermarks protecting intellectual property in AI models. |
| 47 |
Neural Network Trojans |
Foundational |
Embedding malicious functionalities within the model triggered under certain conditions. |
| 48 |
Model Black-Box Attacks |
General |
Exploiting the model using input-output queries without internal knowledge. |
| 49 |
Model Update Attacks |
Foundational |
Manipulating the model during its update process to introduce vulnerabilities. |
| 50 |
Gradient Inversion Attacks |
Privacy |
Reconstructing training data by exploiting gradients in federated learning. |
| 51 |
Side-Channel Timing Attacks |
Side-Channel / Hardware |
Inferring model parameters or training data by measuring computation times during inference. |
| 52 |
Adversarial Suffix |
Prompt-Based |
Appending a specifically crafted, often nonsensical string to a harmful prompt to cause the model to disregard its safety instructions. |
| 53 |
Prefix Injection & Refusal Suppression |
Prompt-Based |
Forcing a model's response to start with an affirmative phrase or explicitly instructing it not to use refusal phrases to lower its defenses. |
| 54 |
Encoding Obfuscation |
Prompt-Based |
Hiding a malicious payload in an encoded format (e.g., Base64, Hex) that the LLM is instructed to decode and then execute, bypassing text-based filters. |
| 55 |
Payload Splitting |
Prompt-Based |
Breaking a malicious instruction into multiple, individually benign parts and asking the model to reassemble and execute them, bypassing filters that check instructions in isolation. |
| 56 |
Markup Language Abuse |
Prompt-Based |
Using structured data formats like Markdown or HTML to create ambiguity between system instructions and user input, potentially causing the model to execute instructions with higher privilege. |
| 57 |
Prompt Recursive Injection |
Prompt-Based |
Crafting prompts that recursively redefine instructions and cause infinite loops or privilege escalation. |
| 58 |
Multi-Modal Adversarial Attacks |
Multimodal |
Exploiting vulnerabilities in models that process both text and images/audio by injecting adversarial perturbations across modalities. |
| 59 |
Reinforcement Learning from Human Feedback (RLHF) Poisoning |
Foundational |
Attacking the feedback loops used for alignment to bias the model or weaken safety training. |
| 60 |
Chain-of-Thought (CoT) Leakage |
Prompt-Based |
Forcing the model to reveal hidden reasoning traces, which may contain sensitive or filtered knowledge. |
| 61 |
Model Compression/Distillation Attacks |
Foundational |
Exploiting vulnerabilities during model compression/distillation to introduce backdoors or reduce robustness. |
| 62 |
Transferability Exploits |
Foundational |
Using adversarial examples crafted for one model to fool another (cross-model attacks). |
| 63 |
Prompt Reset / Separator Injection |
Prompt-Based |
Injecting tokens or patterns that trick the model into resetting context or ignoring prior instructions. |
| 64 |
Shadow Model Exploitation |
IP Theft / Model Extraction |
Building a parallel "shadow" model via query logging and then exploiting it to predict or exfiltrate target model behavior. |
| 65 |
Retrieval Data Exfiltration |
Data Exfiltration |
Crafting queries that force the LLM to retrieve and output sensitive data from connected corpora or knowledge bases. |
| 66 |
Long-Context Window Overload |
Resource Exhaustion |
Flooding the model with extremely long context input to bypass filters or degrade performance, potentially causing memory leaks or dropping safety filters. |
| 67 |
Fine-Tuning Data Injection |
Foundational |
Poisoning during fine-tuning (instruction tuning, RLHF, or supervised fine-tuning) to inject malicious capabilities or suppress safety. |
| 68 |
Semantic Perturbation Attacks |
Input Manipulation |
Altering benign-looking input with synonyms, typos, or semantic shifts that trick LLMs into misclassification or harmful behavior. |
| 69 |
Context Switching Attacks |
Prompt-Based |
Tricking the model into switching "roles" or contexts mid-conversation, overriding safety policies. |
| 70 |
Model Distillation IP Theft |
IP Theft |
Extracting distilled student models that replicate proprietary teacher model behavior, leaking IP. |
| 71 |
Hybrid Supply Chain Attacks |
Supply Chain |
Combining poisoned datasets, compromised plugins, and adversarial fine-tunes to inject coordinated backdoors across AI pipelines. |