New Attacks Make OpenClaw Run Code and Leak Secrets

Research shows OpenClaw AI agent can be tricked into code execution and data leaks

Two separate teams just proved it: OpenClaw, the popular self-hosted AI agent, can be pushed to execute attacker-controlled code or leak sensitive data through inputs that look routine.

Imperva hid instructions inside shared contacts, vCards, and location pins—payloads the victim never saw but the agent executed. Varonis built a test agent, gave it a mailbox of realistic synthetic business data, and watched a single plain email convince it to forward mock AWS keys and a fake customer export to an external address.

OpenClaw patched Imperva’s specific issue in version 2026.4.23—update now if you run it. The phishing weakness Varonis demonstrated isn’t a one-off bug; it’s an autonomy problem. Limit what the agent can do without human approval.

Different doors, same room: the agent trusts what reaches it, and its access becomes the attacker’s.

Hidden commands in a shared contact

Imperva researcher Yohann Sillam examined how OpenClaw hands messaging data to its model. The flaw lives in the plumbing.

When OpenClaw passes a shared contact, vCard, or location to the LLM, it flattens the object directly into the prompt text with no boundary marking it untrusted. (Content fetched from the web is wrapped with an untrusted-content marker. Message objects are not.)

Only some fields are sent to the model, and that’s the pivot point. A shared contact sends just the name field, serialized as <contact: name, number>. Because angle brackets are valid in a name, the model can’t reliably tell where the real name ends and the injected instruction begins. The UI truncates long names in both WhatsApp and the receiving app, so the victim never sees the hidden payload.

The same technique works via a vCard’s full-name field—supported natively by WhatsApp—and the label on a shared location pin.

In Imperva’s tests against Gemini 3.1 Pro (preview build), the hidden text instructed the agent to download and run a script from a researcher-controlled server. It did. A plain image with embedded instructions failed—likely because models have been trained against that pattern—but message-object injections worked because models have seen fewer examples.

With OpenClaw’s memory on by default, Imperva warns, one widely shared item carrying a hidden instruction could silently compromise agents that ingest it—if they’re not sandboxed.

Imperva disclosed the issue, and OpenClaw shipped a fix in 2026.4.23 that moves contact names, vCard fields, and location labels out of the prompt body and into a separate untrusted-metadata channel. Imperva also found the same flattening pattern in other personal AI assistants; the underlying risk isn’t unique to OpenClaw.

A normal email is enough

Varonis Threat Labs approached from the social side. Led by Itay Yashar, the team built an agent called Pinchy on OpenClaw, connected it to a Gmail inbox full of realistic synthetic business clutter and mock secrets, and ran four phishing simulations on Google Gemini 3.1 Pro and OpenAI Codex GPT-5.4.

They draw a line: prompt injection hides instructions inside data; agent phishing is a believable request through a normal channel that works because the agent acts before verifying who sent it.

The agent failed both exfiltration tests. First, an email posing as a team lead named Dan—sent from an external Gmail address—asked for staging access during a fake production incident. Pinchy found the credentials and forwarded mock AWS IAM access keys, database connection strings, and SSH credentials in plaintext.

Second, a routine request for the weekly customer export—supposedly for a QBR deck—led the agent to ship a synthetic dataset of 247 enterprise customers, including contacts and contract values. Both failures occurred under a strict profile instructing the agent to verify senders first. The rule existed. Urgency beat it once; routine beat it the second time.

The agent held up better against technical traps. It interacted with a gift-card phishing page but withheld real credentials and later flagged it; the strict profile blocked the page outright. Faced with a malicious OAuth consent screen dressed as a timesheet app, it inspected the redirect target, judged it suspicious, and stopped before granting access.

Varonis notes that OpenAI Codex GPT-5.4 was more cautious than Gemini 3.1 Pro about entering or sending data to external sites without confirmation—but both models fell for the social pretexts.

Varonis phishing simulation prompt leading to data exfiltration by an OpenClaw agent

The weak spot behind both attacks

Varonis maps both results to what Simon Willison calls the lethal trifecta: an agent that can read private data, ingest untrusted content, and send data back out. OpenClaw has all three—so a poisoned contact and a friendly email land in the same place.

The trust boundary isn’t just a prompt issue; it shows up in code. A separate InfoSec Write-ups analysis turned past OpenClaw advisories into static-analysis rules and uncovered five more flaws across the Slack, Discord, Matrix, Zalo, and Microsoft Teams channel extensions.

All five shared the same bug: startup code resolved each channel’s allowlist by mutable display name instead of a stable ID. An attacker could rename themselves to match an allowed user, slip onto the list, and steer the agent. OpenClaw has patched these issues.

OpenClaw ships with broad access to files, shells, and more than twenty messaging platforms, and it has drawn a steady run of earlier prompt-injection and data-exfiltration warnings since launch late last year. The Dutch data protection authority, the Autoriteit Persoonsgegevens, advised users and organisations not to run OpenClaw on systems holding sensitive data, citing data-breach and account-takeover risks.

What to do about it

Patch first: update to 2026.4.23 or later for the message-object fix. The rest is architecture, not clever prompts. Varonis recommends four controls:

Treat the agent’s instruction file as enforced, version-controlled policy—not a suggestion.
Gate outbound mail: block first-time sends to unfamiliar addresses without approval, so a hijacked agent can’t relay phishing from a trusted account.
Scope connectors to the trust of the trigger: an inbox that handles outside email shouldn’t also read the entire CRM.
Require a human for the riskiest actions: forwarding credentials or moving money.

Both teams converge on the same model. Varonis: treat the agent like a junior employee with system access and no instinct for what looks off—not a security tool. Imperva: an authenticated executor that trusts its inputs.

Today’s fixes are patches and guardrails. The hard problem remains: an agent useful enough to act on your email and run your commands is, by design, one that trusts input and wants to help. There’s no general solution for that—yet.

Reference: View article