AI Broke Vulnerability Management—CISOs Are Shifting to BAS

For three decades, vulnerability management survived on a buffer: the time between finding a flaw and someone turning it into a weapon. Triage. Schedule. Validate. Move on. That buffer made the model viable.

That buffer is gone.

AI didn’t slow your team. It accelerated the attacker—compressing discovery-to-exploit from months to hours. A process built for breathing room fails when there is none.

AI turned discovery into a volume game

In May 2026, Anthropic reported that it and roughly 50 partners used Claude Mythos Preview to identify more than 10,000 high- or critical-severity vulnerabilities in systemically important software in a single month.

Earlier results were just as blunt. Pointed at Firefox, the gated Mythos model produced 181 working exploits, versus just 2 from the prior frontier model. It surfaced issues across every major OS and browser, including an OpenBSD bug undetected for 27 years. At the time of writing, more than 99% of the findings remained unpatched.

Figure 1. FortiGate campaign, February 2026: campaign activity targeting FortiGate devices across countries and time.

The flip side showed up in AWS telemetry. An AWS threat-intelligence report from February 2026 details large-scale compromise with no zero-days—just weak credentials, industrialized via a custom MCP server running offensive tools autonomously. AWS confirmed 600+ devices across 55+ countries; independent researchers saw logs queuing 2,516 devices across 106 countries. What once required rare expertise now operates at machine speed and scale.

The weaponization window collapsed

Defenders used to have months between a CVE disclosure and the first confirmed exploitation, an interval known as time-to-exploit (TTE). That window has slammed shut.

Zero Day Clock puts the 2026 average at roughly 24 hours, down from ~53 days in 2024.

Figure 2. Mean time-to-exploit (TTE) by Zero Day Clock: a sharp decline from days to hours.

The breach data lines up. Verizon’s 2026 DBIR ties 32% of initial access to vulnerability exploitation and expects that number to rise, because AI coding assistants now put building exploits, porting tools to new languages, and finding fresh flaws within reach of attackers who never had those capabilities before.

Figure 3. Generative AI-assisted techniques categorized as initial access methods in Verizon’s 2026 DBIR.

Patching faster isn’t a strategy—it’s physics denial

Regulators are signaling same-day patches for some criticals. Boards expect it. Executives demand it. But remediation isn’t a switch. Patches wait on regression testing, change windows, approvals, uptime, and compliance constraints. Outrunning an exploit by taking production down is just a different outage.

The data is moving the wrong way. Across 13,000+ organizations in the Verizon 2026 DBIR:

Median fix time for known-exploited vulnerabilities: 43 days (up from 32)
Share fully patched: down from 38% to 26%

When offense runs in hours and remediation runs in weeks, the breach happens in between. Even the best performers close only 30–40% of known-exploited vulnerabilities in the first week after detection—a rate that’s barely moved despite steady investment.

Telling teams to “just patch faster” is like asking a freighter to stop on a dime.
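
A back-of-the-envelope calculation makes the gap concrete. The sketch below uses only two figures cited in this article (a mean time-to-exploit of roughly 24 hours and a median fix time of 43 days). The function name is illustrative, and it assumes, for simplicity, that both clocks start at the same moment, which neither report claims.

```python
from datetime import timedelta

def exposure_window(time_to_exploit: timedelta, time_to_fix: timedelta) -> timedelta:
    """Time a flaw is exploitable but still unpatched (zero if the fix lands first)."""
    return max(time_to_fix - time_to_exploit, timedelta(0))

# Figures cited in this article; comparing a mean with a median is a simplification.
tte_2026 = timedelta(hours=24)   # Zero Day Clock, approximate mean TTE
median_fix = timedelta(days=43)  # Verizon 2026 DBIR, known-exploited vulnerabilities

window = exposure_window(tte_2026, median_fix)
print(f"Exposed for about {window.days} days")  # prints "Exposed for about 42 days"
```

Under these assumptions, shaving even a week off the fix time still leaves a window measured in weeks, which is why faster patching alone does not close it.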

The bottleneck moved. So must the strategy.

For years, the playbook was simple:

Find the flaws,
Score by severity,
Patch the worst first.

That worked when a few dozen criticals landed per quarter. Not when there are hundreds or thousands of disclosures a day. Per Verizon’s DBIR, the median organization had to patch 16 known-exploited vulnerabilities in 2025, up from 11 the year before, an increase of about 45%. That was before AI-discovered flaws began flooding the catalog.

Severity alone doesn’t tell you if a flaw is reachable in your environment, whether your controls already stop it, or whether it chains into anything that matters. When everything is a 9 or 10, nothing gets prioritized.

The useful question changes from “what’s vulnerable?” to “what’s exploitable against us right now—and would our defenses catch it?”

This is exactly what Breach and Attack Simulation (BAS) was built to answer.
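
The shift from severity to validated exploitability can be pictured as a change in sort key. The sketch below is a hypothetical illustration: the Finding fields, the placeholder identifiers, and the ordering of the criteria are assumptions made for this example, not a Picus or Gartner scoring model.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Finding:
    cve_id: str                   # placeholder identifiers, not real CVEs
    cvss: float
    reachable: bool               # can an attacker reach the flaw in this environment?
    blocked_by_controls: bool     # did prevention controls stop the simulated attack?
    detected_by_controls: bool    # did detection controls alert on it?
    reaches_critical_asset: bool  # does it chain into something that matters?

def priority_key(f: Finding) -> tuple:
    """Sort key: validated exposure first, raw severity only as a tie-breaker."""
    exposed = f.reachable and not f.blocked_by_controls
    return (
        not exposed,                   # exposed findings sort first
        not f.reaches_critical_asset,  # then those that chain to key assets
        f.detected_by_controls,        # then those that would go unnoticed
        -f.cvss,                       # severity breaks any remaining ties
    )

findings = [
    Finding("CVE-A", 9.8, reachable=True, blocked_by_controls=True,
            detected_by_controls=True, reaches_critical_asset=True),
    Finding("CVE-B", 7.5, reachable=True, blocked_by_controls=False,
            detected_by_controls=False, reaches_critical_asset=True),
    Finding("CVE-C", 9.1, reachable=False, blocked_by_controls=False,
            detected_by_controls=False, reaches_critical_asset=False),
]

ranked = sorted(findings, key=priority_key)
print([f.cve_id for f in ranked])  # prints ['CVE-B', 'CVE-A', 'CVE-C']
```

Note that the 7.5 outranks both 9s here: it is the only finding that is reachable, unblocked, and undetected.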

Why BAS becomes the cornerstone against AI-powered attacks

BAS takes real-world adversary TTPs—the ones powering today’s campaigns—and safely runs them against your live prevention and detection stack. Not a scan. Not a theoretical mapping. An exercise that reveals what your tools will block, what they’ll detect, and what will slip through.

In a world drowning in disclosures, BAS does what vulnerability management alone can’t:

Separate theory from reality. A flaw already neutralized by your WAF, IPS, and EDR is not the same risk as one that walks right in. BAS shows which is which, so teams stop treating every CVE like a five-alarm fire.
Validate the controls you’ve already paid for. Most enterprises run ten to seventy tools with overlapping policies. BAS measures whether they fire as configured and surfaces residual risk in the seams.
Buy time to patch safely. If a critical asset is demonstrably covered by hardened controls, the patch can move through normal change control instead of an emergency rollout. If it isn’t covered, you mitigate first.
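
The last point amounts to a simple decision rule over simulation results. The sketch below is a minimal illustration under stated assumptions: the Outcome values, the patch_path function, and the choice to treat "detected but not blocked" as uncovered are all hypothetical, and the technique IDs are MITRE ATT&CK identifiers used only as example keys.

```python
from enum import Enum

class Outcome(Enum):
    BLOCKED = "blocked"    # a prevention control stopped the simulated technique
    DETECTED = "detected"  # not stopped, but an alert was raised
    MISSED = "missed"      # neither stopped nor detected

def patch_path(results: dict[str, Outcome]) -> str:
    """Pick a remediation path from per-technique simulation outcomes."""
    if not results:
        return "mitigate-first"  # no evidence of coverage: assume uncovered
    if all(outcome is Outcome.BLOCKED for outcome in results.values()):
        return "normal-change-control"
    # Assumption: anything short of a block means the asset is not covered.
    return "mitigate-first"

covered = {"T1190": Outcome.BLOCKED, "T1059": Outcome.BLOCKED}
partial = {"T1190": Outcome.BLOCKED, "T1059": Outcome.DETECTED}

print(patch_path(covered))  # prints "normal-change-control"
print(patch_path(partial))  # prints "mitigate-first"
```

A team with mature detection and response might reasonably relax the rule so that detected techniques also count as covered; that is a policy choice, not something the simulation decides.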

Field reports show the budget shift: CISOs are carving out dedicated BAS spend that didn’t exist a year ago.

Gartner labels this evolution Adversarial Exposure Validation—combining security effectiveness (“Are my controls working?”) with business context (“Which assets matter most, and what’s truly reachable?”) to prioritize based on your reality, not hypothetical scores. Paired with autonomous penetration testing that proves whether exposures can be chained from initial foothold to crown jewels, BAS completes the picture.

One side asks, “Can they breach us?” The other asks, “Would we catch it?” Together, they replace guesswork with evidence.

BAS has to run autonomously—at machine speed

There’s a catch. If adversaries operate autonomously, a validation cycle that takes a human a week is obsolete on arrival. Machine-speed attacks demand machine-speed defenses. The only thing fast enough to counter autonomous offense is autonomous defense.

Pointing raw generative AI at this is risky. As Picus CTO Volkan Erturk has warned, a model told to invent an exploit might hand back a live malware sample—or hallucinate techniques a group never uses. You don’t want unvetted binaries detonating in production, or defenses built against attacks that don’t (or can’t) exist.

Picus’ answer: put the model in charge of coordination, not creation.

Instead of generating payloads, Picus’ agentic BAS matches a fresh threat report against a curated, pre-vetted library of safe, ready-made test components. A security team names a threat, and a multi-agent system takes it from there: one agent identifies the threat and drafts a research plan, others gather and validate intel across sources, and a builder agent maps adversary TTPs into attack chains ready for simulation.

The output: an accurate, ready-to-run simulation assembled in minutes.

This collapses the loop. A CISA alert—or a forwarded headline—becomes a scoped test, a posture score, prioritized mitigations, and an executive report in minutes, with humans reviewing exceptions rather than driving every step.
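
The "coordination, not creation" idea described above can be sketched as a lookup rather than a generator. This is an illustrative sketch only, not Picus' implementation: VETTED_LIBRARY, the component names, and build_simulation are hypothetical, and "T9999" is a made-up ID standing in for a technique the library does not cover.

```python
# Hypothetical curated library: technique ID -> name of a pre-vetted, safe test component.
VETTED_LIBRARY = {
    "T1190": "sim-exploit-public-facing-app",
    "T1078": "sim-valid-accounts-login",
    "T1059": "sim-command-interpreter",
}

def build_simulation(
    report_ttps: list[str], library: dict[str, str]
) -> tuple[list[str], list[str]]:
    """Map reported TTPs to vetted components; nothing new is ever generated.

    Returns (attack_chain, unmatched), preserving the order given in the report.
    """
    chain: list[str] = []
    unmatched: list[str] = []
    for ttp in report_ttps:
        if ttp in library:
            chain.append(library[ttp])
        else:
            unmatched.append(ttp)  # flagged for human review, not invented
    return chain, unmatched

chain, unmatched = build_simulation(["T1078", "T1190", "T9999"], VETTED_LIBRARY)
print(chain)      # prints ['sim-valid-accounts-login', 'sim-exploit-public-facing-app']
print(unmatched)  # prints ['T9999']
```

The design choice worth noticing is the failure mode: a technique with no vetted component ends up on a review list instead of prompting the model to write a payload.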

Where this lands

Patching remains essential. But when AI finds flaws by the thousands and weaponizes in hours, patching alone can’t carry your strategy. If the offense is autonomous, the defense must operate at least as fast.

What scales is validation: prove what your controls will actually stop, prove what’s exploitable, and spend scarce remediation time only where it changes the outcome. AI-powered, agentic BAS is a core pillar of the Picus Platform, continuously testing whether your defenses block and detect what matters—without waiting for a human to kick off the next cycle. When a gap appears, the platform points to the vendor-specific mitigation required, and then re-validates to confirm closure.

The question—“Does this new headline put us at risk?”—isn’t going away. The right validation strategy answers it before anyone asks.
