All articlesAI Security

Indirect Prompt Injection Is in the Wild — Just Not Very Good at It Yet

May 2, 20266 min readMaverc Threat Research · AI Security Practice
Artificial IntelligenceThreat AdvisoryCloud SecurityEmail Security
Indirect Prompt Injection Is in the Wild — Just Not Very Good at It Yet

Google scanned a slice of the public web for prompt injection payloads aimed at AI assistants. Most of what they found is amateur hour, but the volume jumped 32 percent in three months and the curve is pointing up.

Google's threat intelligence team published the first large-scale measurement of indirect prompt injection attempts found on the open web, and the results are useful exactly because they are unglamorous. The attacks are out there. Most are clumsy. The trend line is the part that matters: a 32 percent increase in malicious injection attempts between November 2025 and February 2026, with no sign of slowing.

For defenders standing up AI assistants, agents, or retrieval-augmented workflows, the takeaway is not to panic. It is to build the controls now, while the attacker pool is still figuring out how to monetize the technique.

Direct vs. indirect injection — and why the indirect kind is the dangerous one

Direct prompt injection is what most people picture: a user types a clever sentence into a chatbot to talk it out of its safety rails. It is essentially a jailbreak, scoped to one session and one user.

Indirect prompt injection is the structural problem. The malicious instructions are not typed by the user — they are buried inside the data the AI is asked to read. A summarization assistant fetches a webpage, an agent reads an email, a coding copilot pulls a README. Hidden in that content is text written for the model rather than the human, telling it to ignore its actual task and do something the attacker wants instead. The user never sees it. The model often cannot tell the difference between content and instruction.

That is the class of attack Google went looking for in the wild.

What Google actually found

The team scanned snapshots of public web pages from Common Crawl for known prompt injection patterns, then used Gemini and human reviewers to filter out false positives. The matches sorted into a few buckets:

  • Pranks. Pages telling visiting AI assistants to act like a baby bird, write in pirate voice, or otherwise behave absurdly. Harmless and surprisingly common.
  • Defensive injection. Site owners trying to deter AI crawlers and summarizers — for example, telling assistants the page is sensitive or copyrighted and should not be summarized.
  • SEO manipulation. Instructions telling AI assistants to claim the host company is the best in its category. Black-hat SEO adapted for the LLM era.
  • Helpful guidance. Instructions actually intended to assist legitimate AI workflows — for instance, telling a summarizer which sections of a long document are most important.
  • Malicious attacks. The category that matters for security teams. Two subtypes were observed in the wild.

The malicious attacks split between exfiltration — prompts trying to coerce an assistant into collecting user data such as IPs and credentials and forwarding them to an attacker-controlled email address — and destruction, where prompts attempted to instruct an AI to delete files on the user's machine.

Google's headline finding on these was that sophistication is low. Researchers did not see widespread use of the more advanced exfiltration techniques that academic and offensive security teams published throughout 2025. The conclusion: attackers have not yet productionized the research. They are throwing public-domain payloads at the wall and seeing what sticks.

But the volume of those attempts grew 32 percent in three months. The capability is being adopted, just not yet refined.

Why "low sophistication" is the wrong thing to focus on

Two observations from the Maverc AI security practice are worth pairing with Google's data:

First, the attacks do not need to be sophisticated to work against weak deployments. An enterprise assistant that ingests inbound email content into a Gemini or GPT context window with no isolation between data and instruction is vulnerable to a payload that any sixteen-year-old can copy from a research paper. Sophistication is a property of the attack relative to the defense. Most current deployments are not defended at all.

Second, the gap between "no productionized attacks" and "productionized attacks" historically closes in months, not years, once a financial motive lands. Phishing kits, infostealers, and adversary-in-the-middle proxies all followed the same arc. Treat the current low-sophistication window as the planning runway.

Defensive controls that actually move the needle

Indirect prompt injection cannot be eliminated by prompt engineering alone. The instruction-vs-data confusion is intrinsic to how current LLMs read input. Defense lives in the architecture around the model, not inside the model.

  • Treat all retrieved content as untrusted input. The text returned from a web fetch, an email body, a SharePoint document, or a database row should be wrapped, marked, and never given the same authority as the operator's system prompt. Use structural separation in the prompt, not just polite instructions.
  • Constrain agent tools by capability, not by intention. If your agent has the ability to send email, exfiltrate to email is one prompt away. Tool surfaces should be scoped per-task, with allowlists for destinations, rate limits, and human-in-the-loop confirmation for any action that touches money, identity, or external communication.
  • Strip or sandbox active content before retrieval. HTML comments, hidden CSS, white-on-white text, alt attributes, and metadata are common smuggling channels for injection payloads. Render to plain text with a parser you control before the content ever reaches the model.
  • Log the full prompt, retrieval set, and tool calls. You cannot detect injection abuse without reconstruction. Most production AI systems today log the user message and the final answer and nothing in between. That is the equivalent of a SIEM that only stores the alert title.
  • Run red team exercises on the assistant as a system, not on the model in isolation. The model's safety classifier is one control among many. Test the end-to-end flow — including the data sources the assistant is allowed to read — against a payload library that includes known indirect injection techniques.
  • Apply the same data classification rules to AI context windows that apply to email and chat. If CUI, PII, or trade secrets are not allowed to be pasted into ChatGPT, they should not be silently retrieved into a Gemini-backed internal assistant either. The control plane is governance, not the model vendor.

What this means for regulated environments

For organizations operating under CMMC, HIPAA, ITAR, or FedRAMP authorization boundaries, indirect prompt injection is rapidly becoming a control category that auditors will ask about. The current NIST AI Risk Management Framework already enumerates prompt injection as a known adversarial machine learning risk. Expect that framing to migrate into baseline expectations the same way phishing-resistant MFA did over the last three years.

If you are running AI assistants in or near a CUI enclave, the safest assumption is that any external content the assistant can read is a potential attack surface, and the architecture must enforce that assumption mechanically, not by policy.

How Maverc helps

Maverc's AI security practice runs threat-led assessments of generative AI deployments — assistant architectures, agent frameworks, retrieval-augmented systems, and the identity and tool-use layers underneath. We map indirect prompt injection paths, build detection content for the SIEM, and help clients design context isolation that holds up when the attack volume catches up with the research. The window to do that work without an active incident on your desk is open right now. It will not stay open forever.