AI-Powered Vulnerability Hunting in WordPress Plugins/Themes

<7 days spare time | 100 plugins scanned | 524 candidate findings | 16 confirmed vulns | 5 scanner patches

This is not a vulnerability disclosure. It's a methodology. I want to share how to build an AI pipeline that lets a single part-time researcher cover dozens to hundreds of WordPress plugins at once, so the "16" today becomes "60" next month as the farm keeps learning.

Quick setup: I pulled 100 random plugins from WordPress.org with ≥10,000 active installs (via the plugins/info/1.2 API, sorted by popularity), threw them into the pipeline, walked away. In under a week of spare time, the farm returned:

12 SQL Injection, 1 Path Traversal, 1 Insecure Deserialization, 1 Broken Authentication, 1 Stored XSS

16 confirmed vulnerabilities across 15 different plugins. They're sitting in the responsible-disclosure pipeline right now, so no plugin names in this post. What matters more than the count: I don't grep manually anymore. I type /scan-targets, /triage-findings, /verify-vulnerabilities, then go do other work. Next morning I have verdicts, cross-file taint chains, and HTTP PoCs ready to replay through Burp.


Why "just ask Claude to scan and find bugs" doesn't scale

The naive first attempt everyone tries: dump the whole plugin directory into the prompt with "find vulnerabilities". I tried this. It fails in three ways:

The thing that finally clicked: enumeration and judgment are two completely different problems. Listing every spot that calls $wpdb->query() with concatenation is a job for a static scanner. Cheap, fast, deterministic. But "does this variable actually reach the sink, is there sanitization, is this hook externally reachable, what role triggers it" — that's judgment, and an LLM does it well once you put concrete evidence in front of it.

The pipeline has 4 layers, each with its own processor, connected by strict JSON schemas:

| Layer | Name | Description |
| --- | --- | --- |
| Layer 1 | Static Scanner | 10 deterministic Python scanners. High recall, low precision; FPs are fine. |
| Layer 2 | AI Triage Fleet | 8 parallel subagents with budget caps. Precision filter: drops the ~97% of candidates that are FPs. |
| Layer 3 | Dynamic Verify | HTTP probes through Burp + WP sandbox. Role-priority probing. |
| Layer 4 | Feedback Loop | AI patches the scanner. Each confirmed bug fixes a whole pattern family. |

4-layer pipeline — tracker.db is the single source of truth, every step is resumable


Layer 1 — Deterministic static scanners (high recall, low precision)

10 Python scanners, one per OWASP-style vulnerability class: sqli, xss, csrf, broken-access-control, lfi, rce, ssrf, php-object-injection, file-upload, arbitrary-file-deletion. Each scanner registers sink patterns in _shared/php_patterns.py:

| Class | CWE | Typical sinks | Severity |
| --- | --- | --- | --- |
| sqli | CWE-89 | $wpdb->query / get_results / get_var | HIGH |
| xss | CWE-79 | echo, print, wp_send_json* | MEDIUM |
| csrf | CWE-352 | wp_ajax_* without nonce | MEDIUM |
| broken-access-control | CWE-862 | Handler missing current_user_can | HIGH |
| lfi | CWE-22 | include, require, file_get_contents | HIGH |
| rce | CWE-94 | eval, assert, system, exec | CRITICAL |
| ssrf | CWE-918 | wp_remote_*, curl_exec | HIGH |
| php-object-injection | CWE-502 | unserialize, maybe_unserialize | HIGH |
| file-upload | CWE-434 | move_uploaded_file, wp_handle_upload | HIGH |
| arbitrary-file-deletion | CWE-73 | unlink, rmdir, rename | HIGH |

I deliberately designed these scanners to accept high false positive rates. If a file has echo $foo and the scanner can't trace where $foo is sanitized (say, through a helper method in another file), it flags. Target: recall ≈ 100%. Mediocre precision is fine. The next layer is where judgment happens.
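To make "high recall, low precision" concrete, here is a minimal sketch of what one sink rule could look like. It assumes a line-based regex matcher; the post does not show the real scanners' internals, so `SQLI_SINK` and `scan_source` are hypothetical names.

```python
import re

# Hypothetical high-recall rule: flag any $wpdb query-style call whose
# argument list mentions a variable. It does not try to prove taint.
SQLI_SINK = re.compile(r"\$wpdb->(query|get_results|get_var)\s*\((?P<args>.*)\)")

def scan_source(php_source: str) -> list[dict]:
    """Return one candidate per line that calls a sink with a variable in its arguments."""
    candidates = []
    for lineno, line in enumerate(php_source.splitlines(), start=1):
        match = SQLI_SINK.search(line)
        if match and "$" in match.group("args"):
            candidates.append({"line": lineno, "sink": match.group(1), "code": line.strip()})
    return candidates

sample = '''<?php
$id = $_GET['id'];
$wpdb->query("DELETE FROM t WHERE id = " . $id);
$wpdb->get_var("SELECT COUNT(*) FROM t");
'''
findings = scan_source(sample)
```

The rule would also flag a call whose variable was sanitized three lines earlier. That is the intended trade-off: the triage layer, not the scanner, decides whether the flag is real.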

Tracker.db — the farm's state machine

A CHECK constraint in SQLite enforces transitions — you cannot skip a step
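One way a CHECK constraint can enforce transitions (a sketch, not the actual tracker.db schema, which the post does not show) is to store the previous state next to the current one and constrain the pair. The table, column, and state names below are assumptions.

```python
import sqlite3

# Hypothetical schema: only listed (prev_state, state) pairs are legal.
SCHEMA = """
CREATE TABLE findings (
    id         INTEGER PRIMARY KEY,
    prev_state TEXT,
    state      TEXT NOT NULL DEFAULT 'scanned',
    CHECK (
        (prev_state IS NULL      AND state = 'scanned')  OR
        (prev_state = 'scanned'  AND state = 'triaged')  OR
        (prev_state = 'triaged'  AND state = 'verified') OR
        (prev_state = 'verified' AND state = 'reported')
    )
);
"""

def advance(conn, finding_id, new_state):
    # The right-hand side of SET sees the row's old values, so prev_state
    # receives whatever state the row held before this statement.
    conn.execute(
        "UPDATE findings SET prev_state = state, state = ? WHERE id = ?",
        (new_state, finding_id),
    )

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO findings (id) VALUES (1)")
advance(conn, 1, "triaged")           # scanned -> triaged: allowed

try:
    advance(conn, 1, "reported")      # triaged -> reported skips 'verified'
    skip_accepted = True
except sqlite3.IntegrityError:
    skip_accepted = False
```

Because the rejected UPDATE is rolled back at statement level, the row stays in its last valid state, which is what makes a crashed batch safe to resume.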

Diff scan: When a plugin gets a new version, the farm defaults to a diff scan between version_current and version_previous (only scans changed files). For a 20k-line plugin, a patch usually only touches a few files, so diff scan cuts cost 5-10x. That's what makes the pipeline scale over time. Adding one plugin to the watch list costs one full scan upfront, then only deltas going forward.
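A diff scan needs only the list of files that changed between two unpacked versions. The sketch below assumes both versions sit in local directories and compares content hashes; `file_digests` and `changed_files` are hypothetical helpers, not the farm's actual code.

```python
import hashlib
import tempfile
from pathlib import Path

def file_digests(root: Path) -> dict[str, str]:
    """Map each PHP file's relative path to the SHA-256 of its contents."""
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*.php"))
    }

def changed_files(previous: Path, current: Path) -> list[str]:
    """Files that are new or modified in `current`; only these get rescanned."""
    old, new = file_digests(previous), file_digests(current)
    return sorted(rel for rel, digest in new.items() if old.get(rel) != digest)

# Tiny demonstration with two throwaway "versions" of a plugin.
with tempfile.TemporaryDirectory() as tmp:
    v1, v2 = Path(tmp, "v1"), Path(tmp, "v2")
    for version in (v1, v2):
        (version / "inc").mkdir(parents=True)
        (version / "main.php").write_text("<?php // unchanged\n")
    (v1 / "inc" / "db.php").write_text("<?php // old query\n")
    (v2 / "inc" / "db.php").write_text("<?php // new query\n")
    (v2 / "inc" / "api.php").write_text("<?php // added file\n")
    delta = changed_files(v1, v2)
```

Here `delta` contains `inc/api.php` and `inc/db.php` while `main.php` is skipped. A file-level diff like this misses the case where an unchanged file becomes vulnerable because a helper it calls changed, so it is a cost optimization rather than a full substitute for periodic full scans.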


Layer 2 — AI triage subagent fleet

This is the layer I invested the most prompt-engineering effort in. Each candidate from Layer 1 is dispatched to an independent triage-finding subagent in Claude Code. Up to 8 subagents run in parallel.

The subagent is defined in .claude/agents/triage-finding.md:

---
name: triage-finding
description: Triage a single static scanner finding from a WordPress
  plugin/theme scan. Reads the source around the sink, traces the
  tainted variable, checks for defenses, and returns a structured
  verdict (true_positive | false_positive | needs_dynamic_verify |
   needs_more_context | out_of_scope). Read-only.
tools: Read, Grep, Bash, WebFetch
---

Five design choices. This is the "force the AI to do the right thing" part:

(1) Hard cap at 10 tool calls

The subagent has a budget of at most 10 tool calls. When that's spent, it must return a verdict. Why? With unlimited budget, the agent tends to "keep researching" and drift. It reads files outside scope, guesses edge conditions, builds up a story. A cap of 10 forces it straight to a decision. If 10 calls isn't enough evidence, the right answer is needs_more_context, not a guess.

Prefer needs_more_context over guessing. A wrong true_positive wastes verifier cost; a wrong false_positive hides a real bug.
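In the post the cap is stated in the subagent prompt. If you drive tools from your own harness, the same rule can be enforced in code. The sketch below is an assumption about how such a harness could look; `ToolBudget` and `triage` are hypothetical names.

```python
class BudgetExhausted(Exception):
    """Raised when the subagent tries to exceed its tool-call budget."""

class ToolBudget:
    def __init__(self, max_calls: int = 10):
        self.max_calls = max_calls
        self.used = 0

    def call(self, tool, *args, **kwargs):
        if self.used >= self.max_calls:
            raise BudgetExhausted(f"budget of {self.max_calls} tool calls spent")
        self.used += 1
        return tool(*args, **kwargs)

def triage(finding_id: str, steps, budget: ToolBudget) -> dict:
    """Run evidence-gathering steps until one is conclusive or the budget runs out."""
    try:
        for step in steps:
            verdict = budget.call(step)
            if verdict is not None:
                return {"finding_id": finding_id, "verdict": verdict}
    except BudgetExhausted:
        pass
    # Not enough evidence within budget: say so instead of guessing.
    return {"finding_id": finding_id, "verdict": "needs_more_context"}

inconclusive = [lambda: None] * 12    # twelve steps that never settle anything
budget = ToolBudget(max_calls=10)
result = triage("F-001", inconclusive, budget)
```

The fallback is the important part: running out of budget maps to needs_more_context, never to a default true_positive or false_positive.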

(2) Ground every claim in source quotes

Every assertion must include file:line + the exact quoted code. The subagent cannot say "there's sanitization somewhere", it must quote it. When I read a verdict, I verify in 5 seconds by opening the file at that line. This rule alone eliminated most hallucinations I saw in the naive approach.
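The 5-second manual check can also be automated before a verdict is accepted. This is a sketch under the assumption that each quote carries `file`, `line`, and `code` keys relative to the plugin root; `quote_is_grounded` is a hypothetical helper.

```python
import tempfile
from pathlib import Path

def quote_is_grounded(plugin_root: Path, quote: dict) -> bool:
    """True if the quoted code appears verbatim on the cited line (1-indexed)."""
    root = plugin_root.resolve()
    target = (root / quote["file"]).resolve()
    # Refuse paths that escape the plugin directory or do not exist.
    if root not in target.parents or not target.is_file():
        return False
    code = quote["code"].strip()
    lines = target.read_text(errors="replace").splitlines()
    index = quote["line"] - 1
    return bool(code) and 0 <= index < len(lines) and code in lines[index]

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "handler.php").write_text(
        "<?php\n"
        "$id = $_POST['id'];\n"
        "$wpdb->query(\"SELECT * FROM t WHERE id = $id\");\n"
    )
    real = quote_is_grounded(
        root, {"file": "handler.php", "line": 2, "code": "$id = $_POST['id'];"}
    )
    invented = quote_is_grounded(
        root, {"file": "handler.php", "line": 2, "code": "$id = intval($_POST['id']);"}
    )
```

A verdict whose quotes fail this check can be bounced back or downgraded automatically, which catches the "there's sanitization somewhere" class of hallucination without a human opening the file.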

(3) Prefer ast-grep over plain grep

I bundle the ast-grep binary at ./bin/ast-grep and force the subagent to use it for any structural query. Reason: grep matches inside comments, string literals, and reformatted code. ast-grep matches on the AST, so it's structurally correct.

# Find all calls to a sink method, regardless of object name:
./bin/ast-grep --pattern '$X->query($_)' --lang php <plugin_root>

# Find calls where the argument is a string concatenation:
./bin/ast-grep --pattern '$X->query($_ . $_)' --lang php <plugin_root>

# Find direct superglobal taint sources:
./bin/ast-grep --pattern '$_POST[$_]' --lang php <plugin_root>

# Find hook registration to locate the AJAX callback:
./bin/ast-grep --pattern "add_action('wp_ajax_$_', $_)" --lang php <plugin_root>

(4) JSON-only output

The subagent must emit exactly one JSON object matching a strict schema:

{
  "finding_id": "<from input>",
  "verdict": "true_positive | false_positive | needs_dynamic_verify | needs_more_context | out_of_scope",
  "confidence": "high | medium | low",
  "reason": "1-2 sentences, grounded in source quotes",
  "evidence_quotes": [{"file": "<rel>", "line": 123, "code": "<exact>"}],
  "defense_observed": ["sanitize_text_field", "current_user_can"],
  "external_reachability": "via_endpoint | internal_only | unknown",
  "exploitability_role": "unauthenticated | subscriber | ... | admin | unknown",
  "suggested_next_step": "<one line>"
}

No markdown, no preamble, no "Here's my analysis:". This output gets piped directly into the scanner_accuracy table. Any extra text breaks the parser.
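A strict parser on the receiving side makes the "any extra text breaks the parser" rule explicit. The sketch below is an assumption about how that validation could be written; the post does not show the real parser, and which fields count as required here is my guess.

```python
import json

VERDICTS = {"true_positive", "false_positive", "needs_dynamic_verify",
            "needs_more_context", "out_of_scope"}
CONFIDENCES = {"high", "medium", "low"}
# Assumed minimal set of mandatory fields (not specified in the post).
REQUIRED = {"finding_id", "verdict", "confidence", "reason", "evidence_quotes"}

def parse_verdict(raw: str) -> dict:
    """Accept exactly one JSON object; reject preambles, trailing text, bad enums."""
    try:
        data = json.loads(raw)    # raises if any prose surrounds the object
    except json.JSONDecodeError as exc:
        raise ValueError(f"not pure JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("top-level value must be an object")
    missing = REQUIRED - data.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if data["verdict"] not in VERDICTS:
        raise ValueError(f"unknown verdict: {data['verdict']!r}")
    if data["confidence"] not in CONFIDENCES:
        raise ValueError(f"unknown confidence: {data['confidence']!r}")
    return data

good = ('{"finding_id": "F-001", "verdict": "false_positive", "confidence": "high", '
        '"reason": "query goes through prepare", "evidence_quotes": []}')
parsed = parse_verdict(good)

try:
    parse_verdict("Here's my analysis: " + good)
    chatty_accepted = True
except ValueError:
    chatty_accepted = False
```

Rejecting instead of trying to salvage chatty output keeps the failure visible: a rejected record can be re-dispatched, while a loosely parsed one silently pollutes the accuracy table.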

(5) Pre-written decision tree, no "let AI think for itself"

Decision tree applied in order, stops at first matching outcome:

  1. Does the tainted variable actually reach the sink? (Read ±25 lines around the sink, trace assignments — if the variable is rebound or the tainted branch doesn't reach the sink → false_positive)

  2. Is there a sanitizer/escape/cap-check/prepare applied to that variable before the sink? (sanitize_*, esc_*, $wpdb->prepare, current_user_can, check_ajax_referer)

  3. Is the wrapping function reachable from the claimed endpoint? (ast-grep for add_action / register_rest_route — if orphan → out_of_scope: dead_handler)

  4. What role reaches the endpoint? (wp_ajax_nopriv_X → unauthenticated; permission_callback => '__return_true' → unauthenticated)

  5. Final verdict: high confidence taint reaches sink + no defense + externally reachable → true_positive

Forbidden anti-patterns. This matters because AI tends to pattern-match shallowly:

  • Do not assume an intval() somewhere in the file protects an unrelated SQL query. A defense only counts when applied to THAT specific tainted variable.

  • Do not pattern-match keywords like "allowed" or "safe" in code text. Read the actual statement.

  • Do not trust the scanner's evidence field. Verify by reading source.
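The decision tree above can be read as an ordered chain of early returns. The sketch below encodes it as a pure function over already-gathered evidence. Two points are my assumptions, not statements from the prompt: a defense found in step 2 yields false_positive, and an undetermined role yields needs_more_context.

```python
from dataclasses import dataclass

@dataclass
class Evidence:
    taint_reaches_sink: bool
    defense_on_tainted_var: bool   # sanitizer/prepare/cap-check on THAT variable
    handler_registered: bool       # found via add_action / register_rest_route
    role: str                      # "unauthenticated", "subscriber", ..., "unknown"

def decide(e: Evidence) -> tuple[str, str]:
    """Apply the checks in order and stop at the first matching outcome."""
    if not e.taint_reaches_sink:
        return "false_positive", "taint does not reach the sink"
    if e.defense_on_tainted_var:                      # assumed outcome for step 2
        return "false_positive", "defense applied to the tainted variable"
    if not e.handler_registered:
        return "out_of_scope", "dead_handler"
    if e.role == "unknown":                           # assumed outcome
        return "needs_more_context", "could not determine who reaches the endpoint"
    return "true_positive", f"reachable as {e.role} with no defense"

confirmed = decide(Evidence(True, False, True, "unauthenticated"))
defended = decide(Evidence(True, True, True, "unauthenticated"))
orphan = decide(Evidence(True, False, False, "unknown"))
```

The LLM's job is then limited to filling in the four evidence fields with quoted source; the mapping from evidence to verdict stays deterministic and auditable.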

Cluster propagation — spread verdict to the same family

Often the scanner produces 20-30 candidates with the same pattern in a single plugin (e.g., 20 callsites of the same un-sanitized helper). Rather than run 30 identical subagents, I run one on a representative and propagate the verdict to the cluster. The tracker records decided_by for audit: triage-finding-agent (321), triage-finding-cluster-propagated (80), deep-dive-manual (92), dynamic-verify (11), dynamic-verify-cluster-propagated (20).
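A minimal sketch of cluster propagation, assuming candidates are grouped by plugin and a pattern signature. The `pattern` key and the `propagate` helper are hypothetical; the two decided_by labels are the ones quoted above.

```python
from collections import defaultdict

def propagate(candidates: list[dict], triage_one) -> list[dict]:
    """Triage one representative per (plugin, pattern) cluster and copy its verdict."""
    clusters = defaultdict(list)
    for candidate in candidates:
        clusters[(candidate["plugin"], candidate["pattern"])].append(candidate)

    decided = []
    for members in clusters.values():
        representative, *rest = members
        verdict = triage_one(representative)
        decided.append({**representative, "verdict": verdict,
                        "decided_by": "triage-finding-agent"})
        for member in rest:
            decided.append({**member, "verdict": verdict,
                            "decided_by": "triage-finding-cluster-propagated"})
    return decided

calls = []
def fake_triage(candidate):           # stand-in for dispatching a subagent
    calls.append(candidate["id"])
    return "false_positive"

candidates = [
    {"id": 1, "plugin": "p1", "pattern": "helper_get_row"},
    {"id": 2, "plugin": "p1", "pattern": "helper_get_row"},
    {"id": 3, "plugin": "p1", "pattern": "helper_get_row"},
    {"id": 4, "plugin": "p2", "pattern": "rest_param"},
]
decided = propagate(candidates, fake_triage)
```

Four candidates cost two triage calls here. Recording decided_by on every row is what keeps this safe: if a representative's verdict is later overturned, the propagated rows can be found and re-triaged.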


Layer 3 — Dynamic verify: from verdict to working PoC

A text verdict still isn't evidence. Layer 3 is real HTTP probing: the /verify-vulnerabilities skill installs the plugin into a local WordPress sandbox via wp-cli, activates it, then fires requests per role.

Tier-1 probes for each vuln class:

  • SQLi: Differential timing probes. A SLEEP(5)-true / SLEEP(0)-false pair (latency delta) works for both blind boolean and time-based. Caught all 12 SQLi cases, including ones where the payload had to travel through a webhook body parser or REST request param.

  • Path Traversal: /etc/passwd baseline + null-byte / encoding variants + diff against a blank baseline response.

  • BAC / Broken Auth: Request the same endpoint with unauth vs subscriber cookies, diff the JSON structure to detect missing capability checks.

  • Stored XSS: Inject payload via admin endpoint, re-render on front-end shortcode/widget, parse HTML response for un-escaped payload.

  • Insecure Deserialization: Marker object O:8:"stdClass"... + canary callback to confirm unserialize actually fires (and isn't just stringified).
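For the SQLi probe, the verdict comes down to comparing latencies from the two payloads. A sketch of that comparison, assuming latencies have already been measured in seconds; the median and the 0.8 tolerance are my assumptions, not values from the pipeline.

```python
from statistics import median

def looks_time_injectable(true_latencies, false_latencies,
                          sleep_seconds=5.0, tolerance=0.8):
    """Compare median latency of SLEEP(n) probes against SLEEP(0) probes.

    Flags the endpoint when the delta is at least tolerance * sleep_seconds.
    """
    delta = median(true_latencies) - median(false_latencies)
    return delta >= tolerance * sleep_seconds

# SLEEP(5) responses are ~5 s slower than the SLEEP(0) baseline.
injectable = looks_time_injectable([5.21, 5.18, 5.40], [0.19, 0.22, 0.17])
# The endpoint is simply slow; both payloads take about the same time.
slow_but_safe = looks_time_injectable([2.9, 3.1, 3.0], [2.8, 3.2, 3.0])
```

Using a paired baseline rather than an absolute threshold is what keeps slow endpoints from being reported as injectable.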

Burp proxy routing + Tier-1/Tier-2 separation

Every probe routes through local Burp (127.0.0.1:8080) so I can audit request/response later and use raw transcripts as evidence. Tier-1 (automated probes) runs on Apache :80; Tier-2 (manual PoC crafting) runs php -S on :8082. The two installs use different DB prefixes so they reset independently and don't mix in Burp history. Lesson I learned the painful way: once you've confirmed a bug with Tier-1 and want to craft a cleaner PoC for the report, you really don't want Tier-1 (dozens of probes) and Tier-2 (a few clean PoC requests) sitting next to each other in the same Burp history.


Layer 4 — Feedback loop: AI patches its own scanner

This is the layer where the leverage actually compounds. Every confirmed vulnerability from Layer 3 becomes training data for the scanner. A 2-agent pipeline:

2-agent feedback loop with 3 guard rails — patches always go through human review before merge

Agent A — vuln-root-cause (read-only, cap 15 calls)

Receives one confirmed finding, fully analyzes the taint flow, cross-checks against the existing scanner rules. Output is a JSON proposal:

{
  "vuln_type": "sqli",
  "root_cause": "<3-sentence narrative grounded in quotes>",
  "taint_trace": [
    {"step": 1, "file": "...", "line": 28, "code": "...", "role": "source"},
    {"step": 2, "file": "...", "line": 32, "code": "...", "role": "cross_file_dispatch"}
  ],
  "scanner_gap": {
    "current_behavior": "MISSED | PARTIAL_CATCH | NO_GAP",
    "root_pattern_missing": "...",
    "false_positive_risk": "low | medium | high"
  },
  "proposed_change": {
    "verdict": "add_sink | add_source | narrow_defense | architectural | no_change | not_worth_it",
    "approach": "<concrete description>",
    "ast_grep_pattern": "<structural pattern>",
    "files_to_touch": [".claude/skills/sqli/scripts/scan_sqli.py"],
    "risk_level": "low | medium | high"
  },
  "regression_fixture": {
    "case_name": "snake_case_id",
    "must_detect_as": "<vuln_type> vulnerability",
    "minimal_php": "<?php\n..."
  }
}

Distinguish "this specific bug" from "the generic pattern". A scanner rule that only matches one helper-method name is worse than no rule (training-data leak). Estimate FP risk for any proposed pattern. If it's high and the bug is rare, recommend not_worth_it rather than adding garbage rules.

5 proposals produced so far (anonymized)

  1. REST_PARAM_UNAUTH source recognizer: a register_rest_route with permission_callback => '__return_true' whose callback feeds $request->get_param() straight into an interpolated $wpdb->*. A common bug. Developers think sanitize_text_field is enough, but it doesn't strip apostrophes.

  2. RAW_HTTP_BODY bypass — when a plugin reads file_get_contents('php://input') instead of $_POST, the body doesn't pass through wp_magic_quotes(), so the attacker's quote goes straight into SQL.

  3. Cross-file taint resolver (architectural) — 2-pass plugin-scope registry: pass A builds a registry of methods that are direct sinks or passthroughs; pass B walks back from every callsite. Chain depth capped at 2 hops to avoid blowup.

  4. Magic-quotes bypass + sink correlation: the scanner already detected urldecode($_REQUEST['x']) but emitted medium severity. Agent A proposed correlating the bypass site with a specific string-quoted unprepared $wpdb->* slot, bumping severity to HIGH.

  5. IN-clause loop concat sink: the pattern for { $ids_str .= "'" . $arr[$i] . "'"; } $sql = "... IN {$ids_str}" is extremely common but the scanner had no rule.
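The cross-file resolver in proposal 3 can be sketched as two passes over a plugin-wide function summary. Everything below is illustrative: the summary format, the function names, and my reading of "capped at 2 hops" as hop distance from a direct sink are all assumptions.

```python
# Hypothetical summary: does the function interpolate its argument into SQL
# itself, and which helpers does it forward that argument to?
FUNCTIONS = {
    "run_raw":      {"direct_sink": True,  "forwards_to": []},
    "get_order":    {"direct_sink": False, "forwards_to": ["run_raw"]},
    "lookup":       {"direct_sink": False, "forwards_to": ["get_order"]},
    "deep_wrapper": {"direct_sink": False, "forwards_to": ["lookup"]},
    "format_label": {"direct_sink": False, "forwards_to": []},
}
MAX_HOPS = 2

def build_registry(functions, max_hops=MAX_HOPS):
    """Pass A: map each sink or passthrough to its hop distance from a real sink."""
    registry = {name: 0 for name, info in functions.items() if info["direct_sink"]}
    for hop in range(1, max_hops + 1):
        for name, info in functions.items():
            forwards = any(registry.get(callee) == hop - 1
                           for callee in info["forwards_to"])
            if name not in registry and forwards:
                registry[name] = hop
    return registry

def flag_callsites(callsites, registry):
    """Pass B: flag callsites that hand tainted data to a registered function."""
    return [
        {**site, "hops_to_sink": registry[site["callee"]]}
        for site in callsites
        if site["arg_tainted"] and site["callee"] in registry
    ]

CALLSITES = [
    {"file": "webhook.php", "line": 28, "callee": "lookup", "arg_tainted": True},
    {"file": "webhook.php", "line": 40, "callee": "deep_wrapper", "arg_tainted": True},
    {"file": "admin.php", "line": 12, "callee": "format_label", "arg_tainted": True},
    {"file": "cron.php", "line": 7, "callee": "get_order", "arg_tainted": False},
]
registry = build_registry(FUNCTIONS)
flagged = flag_callsites(CALLSITES, registry)
```

Only the `lookup` callsite is flagged. `deep_wrapper` sits three hops from the sink and is dropped by the cap, which is the trade-off the proposal accepts to avoid blowup.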

Agent B — scanner-rule-evolve (Edit-scoped, cap 20 calls)

Implements Agent A's proposal, with 3 guard rails:

  • edit_allowlist: Hard-check that the agent only edits files in that specific scanner (e.g., .claude/skills/sqli/**). No spillover into _shared/ or other scanners.

  • pattern_linter: Rejects substring-heuristic patterns. I got burned once by a rule checking "allow" in text. It matched the comment // allow CORS in a plugin and caused a real LFI miss. The linter blocks that class of pattern up front, forcing word-boundary regex or AST patterns.

  • Regression test must pass. If the test suite breaks, the patch is rejected.
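The first two guard rails are simple enough to sketch. The helper names and the definition of a "substring-heuristic pattern" (a rule made only of letters, underscores, and spaces) are assumptions; the real linter is not shown in the post.

```python
import fnmatch
import posixpath
import re

def edits_allowed(edited_paths, allowlist_glob):
    """edit_allowlist: every touched file must fall under the scanner's own glob."""
    return all(
        # normpath collapses "../" so a path cannot escape the allowed tree
        fnmatch.fnmatchcase(posixpath.normpath(path), allowlist_glob)
        for path in edited_paths
    )

BARE_WORDS = re.compile(r"[A-Za-z_ ]+")

def pattern_is_acceptable(pattern: str) -> bool:
    """pattern_linter: reject plain-substring rules such as 'allow'."""
    return BARE_WORDS.fullmatch(pattern) is None

SQLI_GLOB = ".claude/skills/sqli/**"
in_scope = edits_allowed([".claude/skills/sqli/scripts/scan_sqli.py"], SQLI_GLOB)
spillover = edits_allowed(
    [".claude/skills/sqli/scripts/scan_sqli.py", "_shared/php_patterns.py"], SQLI_GLOB
)

# The rule that caused the LFI miss: a bare substring also matches a comment.
burned = "allow" in "// allow CORS"
bare_ok = pattern_is_acceptable("allow")
bounded_ok = pattern_is_acceptable(r"\ballow_list\b")
```

Both checks run before the regression suite, so a patch that edits the wrong tree or adds a bare-substring rule is rejected without spending test time on it.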

Patches are committed to a branch improve/<vuln_type>-<id> and are never auto-merged to main. State lives in tracker.db.scanner_improvements: analyzed → patched_pending_review → merged | rejected | superseded. At post time: 5 patches, 3 at patched_pending_review, 2 at analyzed. What actually matters: each patch closes an entire pattern family. Once merged, rescanning the original 100 plugins plus the next 100 will automatically flag every similar instance.


Farm results: 524 candidates → 16 confirmed across 15 plugins

Stats pulled from tracker.db at post time:

| Layer | Unit | Number |
| --- | --- | --- |
| Layer 0 | plugin corpus | 100 random plugins (≥10K active installs) |
| Layer 1 | candidates produced | ~500+ (mostly from sqli/lfi/rce/file-upload/xss/poi scanners) |
| Layer 2 | triaged via subagent | 524 records in scanner_accuracy |
| Layer 2 | true positives | 16 (sqli=12, path-traversal=1, deserialization=1, broken-auth=1, stored-xss=1) |
| Layer 2 | false positives | 508 |
| Layer 2 | deep-dive manual | 92 (handling needs_more_context) |
| Layer 3 | dynamic verify confirmed | 13 (+20 cluster-propagated) |
| Layer 4 | scanner rule patches | 5 (3 pending review, 2 analyzed) |

TP distribution by class and root pattern

| Class | # | Plugin category (anonymized) | Root pattern |
| --- | --- | --- | --- |
| SQLi | 12 | e-commerce, payment, downloads/membership (1 plugin with 2 sinks), analytics, SEO, image optimization, form builder, appointments, shipping | Mix: sanitize_text_field doesn't strip apostrophes, magic-quotes bypass, cross-file taint, IN-clause loop concat |
| Path Traversal | 1 | order tracking | User-controlled filename concatenated into file_get_contents without realpath()/whitelist |
| POI | 1 | background task scheduler | unserialize() on a DB-stored task payload that another endpoint can inject; POP gadget chain |
| Broken Auth | 1 | backup / migration | Sensitive endpoint (export/restore) missing nonce + capability check during a state-reset window |
| Stored XSS | 1 | page builder addon | Widget setting (admin role) stores raw payload, front-end shortcode renders without escape; cross-role impact |

Zooming out, 5 root patterns showed up over and over in the SQLi findings. These are exactly what the scanner needs to learn (Layer 4 has produced patches for all 5):

  • REST routes with permission_callback => '__return_true' interpolating $request->get_param() straight into raw SQL.

  • file_get_contents('php://input') reader in a webhook handler hooked into init. Fires on every request, raw body bypasses wp_magic_quotes().

  • Cross-file taint through a shared helper method (e.g., plugin_get_value($payload, 'a/b/c')) without a $validate argument; it propagates from the webhook source through 2-3 files into an interpolated $wpdb->get_row().

  • IN-clause loop concatenation: the pattern for { $ids_str .= "'" . $arr[$i] . "'"; } piped into $wpdb->get_results("... IN {$ids_str}").

  • Magic-quotes bypass via urldecode($_REQUEST[...]) then sanitize_text_field (useless for SQL context) before going into a string-quoted slot in an unprepared $wpdb->*.

FP rate ~97% sounds bad, but it's by design. The static scanner is intentionally high recall, the AI subagent is the precision filter. If Layer 1 only flagged 16 findings, it would likely miss 5-10 more. I'd rather have AI filter 508 false positives than miss one critical unauth SQLi on a 100K-install plugin.
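The ~97% figure follows directly from the tracker numbers quoted in this post, and the decided_by counts from the cluster-propagation section add up to the same 524 records:

```python
candidates = 524
true_positives = 16
false_positives = candidates - true_positives     # 508

fp_rate = false_positives / candidates
layer1_precision = true_positives / candidates

# decided_by breakdown quoted earlier in the post
decided_by = {
    "triage-finding-agent": 321,
    "triage-finding-cluster-propagated": 80,
    "deep-dive-manual": 92,
    "dynamic-verify": 11,
    "dynamic-verify-cluster-propagated": 20,
}

print(f"FP rate: {fp_rate:.1%}")                      # FP rate: 96.9%
print(f"Layer 1 precision: {layer1_precision:.1%}")   # Layer 1 precision: 3.1%
print(sum(decided_by.values()))                       # 524
```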


Proof of Work — Sample Verification Evidence

Example findings from one prompt run with the skills described above, across a batch of 10 plugins.

I then re-verified one of those findings by hand and confirmed it in WPCargo Track & Trace.

I’ve submitted it and am now waiting for Patchstack’s review.

Another SQL injection in WPDM Premium Packages

One of us was rewarded with a bounty for the vulnerability we found.

Honest evaluation: what works, what doesn't, what's next

Separating enumeration from judgment kills almost all hallucination. The 10-call subagent budget enforces focus. Parallel dispatch of 8 agents triages a batch in minutes. Cluster propagation saves cost on duplicate patterns. The feedback loop produces architectural patches (cross-file resolver) that manual work rarely reaches.

That said, cross-file taint through many helpers is still weak. There's no JavaScript scanner yet. Cost isn't tightly controlled; a full pass over 50 plugins can burn a few dozen USD in tokens. Theme scanning has worse signal-to-noise than plugins.

What I want to build next: a generator for exploit scripts, an MCP server exposing tracker.db for natural language queries, a per-function taint summary cache, and expanding dynamic verify to Tier-2 (POI/RCE chains).


Closing

The whole pipeline runs on Claude Code (CLI) with Opus 4.x for both the parent skill and every subagent. No Codex mixed in, no Sonnet. I need enough reasoning depth at the subagent layer to judge cross-file taint flow, especially when taint travels through dynamic dispatch or hook chains. Sonnet is cheaper, but I tried it and it missed noticeably on helper-method indirection cases.

If you want to build a similar farm: don't make the LLM do the static analyzer's job, and don't make the static analyzer do the LLM's job. Let each side do what it's good at. Static enumerates sinks, sources and defenses. The LLM judges on quoted evidence. Wire them together with strict JSON schemas, set budget caps on the AI to force focus. The rest is patiently tinkering with prompts and building resumable state machines.

The farm is still running. Each new plugin version in the watch list auto-runs a diff scan; each newly confirmed pattern feeds back into the scanner. That's the part that scales. Not the "16" today (which will be stale in a week), but the rate at which the scanner learns new patterns, plus near-zero marginal cost to add one more plugin to the watch list.

If you're building something similar, three things I wish I'd done sooner:

(1) build a resumable state machine from day one, not after crashing midway through a 50-plugin batch;

(2) force subagents into a strict JSON schema with budget caps, because that alone kills most hallucination;

(3) invest in the scanner feedback loop, because confirming one bug without feeding the rule back throws away the most valuable part of the whole exercise.