Stop Paying Your Agent to Do the Same Job Twice

CVE volume is accelerating, and most teams are running scripts that can't keep up. We used an agent to build a scanner that queries three databases, classifies every CVE, and scores it on actionability. The agent spent tokens once. The workflow runs every morning without it.

[Image: A brass mechanical automaton on a wooden workbench, pen arm drawing on paper. Craftsperson's tools set aside, chair pushed back, cold coffee nearby. Warm morning light through a window.]
Built once. Still running.

CVE volume was already hard to keep up with before the AI boom. Now it's accelerating. More code is being generated, more dependencies are being pulled in, and more packages are being published by tools without anyone manually reviewing them. The barrier to publishing a package has dropped to zero. Every week there are new advisories for AI tooling that didn't exist six months ago: MCP servers, agent frameworks, AI-generated libraries with vulnerabilities baked in from the start.

Most teams have something running for this. A script in a cron job, a CI step, a scheduled GitHub Action. The problem isn't that it doesn't run. The problem is that when your stack changes, someone has to go rewrite the script. The classification logic that made sense six months ago doesn't account for the MCP servers you added last month. The scoring doesn't reflect that you moved to a new cloud provider. The script is frozen at the point in time someone wrote it, and keeping it current is a job nobody has.

We needed something that queried multiple sources, classified what it found by how it affected our stack, scored each CVE on actionability, and ran every morning. That's a lot of domain logic. So we used an agent to build it, and then took the agent out of the loop.

How the agent built it

I started with the outcome I wanted, not the implementation. We'd already published specific CVE scanner extensions on swamp-club for individual vulnerabilities like Mini Shai-Hulud and DirtyFrag. What I needed was a way to know when a new CVE was serious enough to warrant building the next one. A daily research report that scanned the major databases, classified what it found, and told me which CVEs were strong candidates for a dedicated swamp extension.

I told the agent I wanted to track supply-chain and infrastructure CVEs, plus anything relevant to the swamp software stack, and asked it to recommend which vulnerability databases to use. It came back with NVD, GitHub Advisory DB, and CISA's Known Exploited Vulnerabilities catalog, then built the extension using swamp's extension model and the API documentation for all three. If I want to add a fourth database tomorrow, I describe what I want to track and the agent adds it. I don't rewrite anything by hand.

The three databases don't agree on anything. NVD returns CVSS v3.1 metrics nested inside a vulnerability object. GitHub Advisory returns a completely different structure with severity as a lowercase string and scores in a separate cvss field. CISA KEV doesn't have scores at all. Everything in there is critical by definition because it's actively exploited. The agent figured out each response format, normalized the data, and handled the deduplication problem: the same CVE often appears in multiple databases with different severity scores and different metadata. It chose to prefer CISA KEV over NVD over GitHub, and to merge category tags across sources rather than dropping duplicates.
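To make the merge rule concrete, here is a minimal sketch, not the extension's actual code. The `Advisory` type, the source labels, and the example IDs are hypothetical; the only parts taken from the description above are the preference order (CISA KEV, then NVD, then GitHub) and the fact that category tags are merged rather than dropped.

```python
from dataclasses import dataclass, field
from typing import Optional

# Preference order described above: CISA KEV > NVD > GitHub Advisory.
# The string labels are hypothetical names chosen for this sketch.
SOURCE_PRIORITY = {"cisa-kev": 0, "nvd": 1, "github": 2}


@dataclass
class Advisory:
    cve_id: str
    source: str
    severity: str                  # normalized to upper case on merge
    score: Optional[float]         # None when the source carries no score
    tags: set = field(default_factory=set)


def merge_advisories(records: list) -> dict:
    """Keep one record per CVE ID from the preferred source, merging tags."""
    merged = {}
    # Visit preferred sources first so the first record seen for an ID wins.
    for rec in sorted(records, key=lambda r: SOURCE_PRIORITY[r.source]):
        kept = merged.get(rec.cve_id)
        if kept is None:
            merged[rec.cve_id] = Advisory(
                rec.cve_id, rec.source, rec.severity.upper(), rec.score, set(rec.tags)
            )
        else:
            # Lower-priority duplicate: contribute its tags, discard the rest.
            kept.tags |= rec.tags
    return merged
```

Sorting by priority before merging means the dedup logic never has to compare two records field by field; the first record seen for an ID is the one that wins.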

NVD also rate-limits aggressively, which was throwing errors in our early runs. The agent added retry logic with exponential backoff and it stopped being a problem. In a hand-written script, that's the kind of thing that breaks quietly until someone asks why the report stopped.
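For readers who haven't written one, retry with exponential backoff looks roughly like this. It is a generic sketch under assumptions, not what the agent generated: `RateLimited` is a hypothetical exception the caller's fetch function would raise on a rate-limit response, and the attempt count and delays are illustrative defaults.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error raised by the fetch function when the API rate-limits."""


def with_backoff(
    fetch: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(), doubling the wait after each rate-limit error."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: fail loudly instead of quietly
            delay = base_delay * (2 ** attempt)
            # Up to 10% jitter so parallel callers don't retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
    raise ValueError("max_attempts must be at least 1")
```

Re-raising on the last attempt is the important part: the run fails visibly rather than producing an empty report.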

Then the classification. A supply-chain compromise where a malicious package was published to npm is a fundamentally different problem from a vendor product patch for Splunk. The agent built a classifier that sorts CVEs into five buckets: supply-chain, infrastructure, open-source libraries, vendor products, and web app plugins. Each CVE is then scored on three dimensions: impact, scannability, and remediation complexity. The total score determines whether the CVE surfaces as an extension candidate, gets flagged for monitoring, or gets filtered as routine patching.
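A rough sketch of how a three-dimension score could map to those outcomes follows. Two assumptions are doing the work here: that each dimension is scored 0 to 3 and the sum is scaled to a 10-point range with equal weights, and that the cutoffs are 7.0 and 4.0. The equal weighting is consistent with the one detailed report entry shown later in this post (2 + 2 + 0 gives 4.4/10), but the thresholds are hypothetical placeholders.

```python
def actionability(impact: int, scannability: int, remediation: int) -> float:
    """Scale three 0-3 dimension scores to a single 0-10 score."""
    for value in (impact, scannability, remediation):
        if not 0 <= value <= 3:
            raise ValueError("each dimension is scored from 0 to 3")
    # Equal weights are an assumption of this sketch.
    return round((impact + scannability + remediation) / 9 * 10, 1)


def outcome(score: float, candidate_at: float = 7.0, monitor_at: float = 4.0) -> str:
    """Map a score to a report outcome; both cutoffs are hypothetical."""
    if score >= candidate_at:
        return "Extension Candidate"
    if score >= monitor_at:
        return "Monitor"
    return "Patch Only"
```

Because the scoring is plain arithmetic, the same CVE data always produces the same outcome, which is what lets the daily run happen without a model in the loop.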

It also built a report extension. Actively exploited CVEs in CISA KEV get flagged regardless of category. WordPress plugin vulnerabilities get a summary count and nothing more, because they're not actionable for our stack.
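The order in which those two rules apply matters, so here is a small sketch of it. The field names (`id`, `category`, `in_kev`) and the `web-app-plugin` label are hypothetical, and treating the whole web app plugin bucket as count-only is an assumption of this sketch.

```python
def build_sections(cves: list) -> dict:
    """Split CVEs into report sections, checking the KEV rule first."""
    flagged, detailed, plugin_count = [], [], 0
    for cve in cves:
        if cve["in_kev"]:
            # Actively exploited: always flagged, whatever the category.
            flagged.append(cve["id"])
        elif cve["category"] == "web-app-plugin":
            # Not actionable for this stack: counted, not listed.
            plugin_count += 1
        else:
            detailed.append(cve["id"])
    return {
        "actively_exploited": flagged,
        "detailed": detailed,
        "plugin_summary_count": plugin_count,
    }
```

Checking KEV membership before the category means an actively exploited plugin vulnerability is still flagged rather than folded into the count.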

What runs every morning

The daily pipeline is a swamp workflow with two steps:

jobs:
  - name: research
    steps:
      - name: scan
        task:
          type: model_method
          modelIdOrName: cve-watcher
          methodName: research
          inputs:
            lookbackDays: ${{ inputs.lookbackDays }}
      - name: notify-discord
        task:
          type: model_method
          modelIdOrName: discord-cve-alerts
          methodName: send
          inputs:
            content: ${{ data.latest("cve-watcher", "scan").attributes.discordSummary }}
            username: CVE Watcher
        dependsOn:
          - step: scan
            condition:
              type: succeeded

Step one runs the research method. Step two posts the summary to Discord using @keeb/discord, a community-published extension. Data flows between them with a CEL expression. The webhook URL comes from a GitHub Actions secret.

We changed the Discord output format three times in the first week. The summary needed to fit under Discord's 2,000-character limit, show the right level of detail, and be scannable at a glance. In a traditional script, reformatting the output is real work: you're rewriting string concatenation, testing edge cases, and making sure you don't break the existing logic. Here, each time, we described what we wanted the output to look like and the agent rewrote the formatter.
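The character limit is the one hard constraint in that list. One way a formatter could respect it is sketched below; this is not the generated formatter, and `fit_summary` and the wording of the overflow line are made up for the example.

```python
DISCORD_LIMIT = 2000  # message content limit mentioned above


def fit_summary(header: str, entries: list, limit: int = DISCORD_LIMIT) -> str:
    """Append entries while the message, plus an overflow note, still fits."""
    body = header
    for shown, entry in enumerate(entries):
        candidate = body + "\n" + entry
        left = len(entries) - shown - 1
        # Reserve room for the note that would be needed if we stopped here.
        note = f"\n(+{left} more in the full report)" if left else ""
        if len(candidate) + len(note) > limit:
            omitted = len(entries) - shown
            # Final slice guards against a header that is itself too long.
            return (body + f"\n(+{omitted} more in the full report)")[:limit]
        body = candidate
    return body
```

Reserving space for the overflow note before adding each entry means the message never has to be cut mid-entry.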

On a typical day it evaluates over a thousand CVEs. The summary that lands in Discord:

CVE Research Report
61 CVEs in the last 1d (0 new)

🟢 Extension Candidates
None found — no CVEs scored high enough to warrant a dedicated scanner.

🟠 2 Monitor
npm/@budibase/server (HIGH 7.5)
Budibase: POST /api/attachments/:datasourceId/url is unauthenticated and
lets anonymous callers mint S3 PUT pre-signed URLs using stored datasource
IAM credentials

npm/@budibase/server (HIGH 7.4)
Budibase: Unauthenticated S3 signed upload URL generation allows arbitrary
writes with stored datasource credentials

🔴 59 Patch Only
Vendor Product: 37
Open-Source Lib: 21
Infrastructure: 1

NVD • GitHub Advisory • CISA KEV • Today at 12:46 AM

The full report goes deeper. Each CVE that matters gets a classification, an actionability score, and a recommendation:

### Crawl4AI (NEW)
  CVE            GHSA-r253-r9jw-qg44
  Severity       CRITICAL (10)
  Classification Open-Source Libraries
  Actionability  4.4/10 — Monitor
  Impact         ██░░░ 2/3
  Scannable      ██░░░ 2/3
  Remediation    ░░░░░ 0/3

  Unauthenticated RCE via Chromium launch-argument injection
  Why: dependency scanning can detect this; remote code execution

No tokens. No inference. Deterministic scoring against fresh data.

Tracking changes over time

Every run queries everything published in the last 24 hours. The extension also tracks every CVE ID it's ever seen, so the report tags which ones are genuinely new versus ones that appeared in a previous run's window. If a CVE was published late in the day and shows up in two consecutive runs, you're not reading about it twice.
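The new-versus-seen tagging comes down to a set-membership check. A minimal sketch, assuming the set of seen IDs is loaded before the run and saved afterwards by whatever storage the workflow provides (the function name and placeholder IDs are hypothetical):

```python
def tag_new(current_ids: list, seen: set) -> tuple:
    """Return the IDs not seen in any earlier run, plus the updated seen set."""
    # dict.fromkeys removes duplicates within this run while keeping order.
    unique = list(dict.fromkeys(current_ids))
    new_ids = [cve_id for cve_id in unique if cve_id not in seen]
    return new_ids, seen | set(unique)


# Two consecutive runs whose lookback windows overlap on ID "B".
seen = set()
first_new, seen = tag_new(["A", "B"], seen)
second_new, seen = tag_new(["B", "C"], seen)
print(first_new, second_new)
```

In the second run only "C" is reported as new, even though "B" falls inside both windows.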

Because every run is stored, we can look back at how classifications and scores changed over time. A month in, you can query the data with CEL expressions to see which categories are trending, whether supply-chain advisories are growing faster than everything else, and how your exposure has shifted.

Where the tokens went

The agent spent tokens understanding three different API formats, designing a classification taxonomy, writing a scoring methodology, and building report logic. That was a one-time cost.

Everything after that is a workflow executing typed methods against typed schemas. The encoded knowledge survives the conversation that created it. Anyone on the team can run it, and there are already other CVE extensions on swamp-club built by other people on the same primitive, including specific vulnerability scanners and a UniFi-specific CVE watcher from a community member.

Use an agent to build the automation. Use swamp to run it.

If your agent is still in the loop every time the automation runs, you're paying for the same understanding twice.