Mozilla Found 271 Firefox Bugs with Anthropic's Mythos. Almost No False Alarms.
Mozilla used Anthropic's Mythos, a model built specifically for vulnerability detection, to find 271 Firefox security flaws over two months. The low false-positive rate is the more interesting result.
What Mozilla Built
Mozilla did not simply point Mythos at Firefox and wait. They built a custom agent harness that wraps the model and drives it through vulnerability analysis tasks.
The harness gives Mythos the same tools and pipeline used by human Mozilla developers, including a special Firefox build for testing. Brian Grinstead, a Distinguished Engineer at Mozilla, described it as code that provides instructions, grants file read/write access, and runs the model in a loop until a task completes.
The setup matters because it explains the results. Mythos is not reviewing code in isolation. It is operating in the same environment a human security researcher would use.
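Mozilla has not published the harness itself. As a rough illustration of the pattern Grinstead describes (instructions, file read/write access, and a loop that runs until the task completes), here is a minimal Python sketch. The `Action` schema, `run_agent`, `scripted_model`, and `demo` are all hypothetical. The scripted stand-in replaces a real model call. A real harness would also expose build and test tools such as the special Firefox build.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Iterable
import tempfile


@dataclass
class Action:
    """One step requested by the model (hypothetical schema, not Mozilla's)."""
    kind: str          # "read", "write", or "done"
    path: str = ""     # file path relative to the workspace
    content: str = ""  # text to write, or a final summary for "done"


# Hypothetical model interface: given the transcript so far, return the next action.
Model = Callable[[list[str]], Action]


def run_agent(model: Model, instructions: str, workdir: Path, max_steps: int = 20) -> list[str]:
    """Run the model in a loop until it reports "done" or max_steps is reached."""
    root = workdir.resolve()
    transcript = [f"INSTRUCTIONS: {instructions}"]
    for _ in range(max_steps):
        action = model(transcript)
        if action.kind == "done":
            transcript.append(f"DONE: {action.content}")
            break
        # Confine file access to the workspace (Path.is_relative_to needs Python 3.9+).
        target = (root / action.path).resolve()
        if not target.is_relative_to(root):
            transcript.append(f"ERROR: {action.path} is outside the workspace")
        elif action.kind == "read":
            text = target.read_text() if target.is_file() else "<missing>"
            transcript.append(f"READ {action.path}: {text}")
        elif action.kind == "write":
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(action.content)
            transcript.append(f"WROTE {action.path}")
        else:
            transcript.append(f"ERROR: unknown action {action.kind!r}")
    return transcript


def scripted_model(steps: Iterable[Action]) -> Model:
    """Stand-in for a real model: replays a fixed list of actions in order."""
    it = iter(steps)
    return lambda transcript: next(it)


def demo() -> list[str]:
    """Run the loop against a throwaway workspace containing one source file."""
    with tempfile.TemporaryDirectory() as d:
        Path(d, "parser.c").write_text("int parse(char *buf);")
        steps = [
            Action("read", "parser.c"),
            Action("write", "report.txt", "check bounds in parse()"),
            Action("done", content="1 candidate finding"),
        ]
        return run_agent(scripted_model(steps), "Audit parser.c", Path(d))
```

Calling `demo()` returns a four-entry transcript that ends with `DONE: 1 candidate finding`. The step cap and the workspace check are choices made for this sketch. They stop a misbehaving model from looping forever or touching files outside the checkout.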
The False Positive Problem
Mozilla's previous attempts at AI-assisted vulnerability detection produced a high rate of bug reports containing hallucinated details. Human reviewers had to verify each result manually, which limited how useful those tools were at scale.
Mythos produced "almost no false positives," according to Mozilla's reporting. That is a significant change in the workflow. If results require minimal human verification, the throughput advantage of running an automated model becomes real rather than theoretical.
What Mozilla's CTO Said
In April 2026, Mozilla's CTO stated that AI-assisted vulnerability detection means "zero-days are numbered" and that "defenders finally have a chance to win, decisively."
That is a strong claim. The 271-vulnerability result over two months supports optimism, but whether the economics hold at scale, across more complex codebases and under adversarial conditions, is a different question. The demo was impressive; its broader implications remain to be seen.
The Practical Takeaway
Security tooling has had many "AI will find all the bugs" moments. Most of them ran into the false positive problem: automated tools that find real bugs but also generate enough noise to overwhelm the teams reviewing them.
Mozilla's result suggests Mythos handled that tradeoff well, at least for this use case. A model purpose-built for vulnerability analysis, given real developer tooling and a looping agent harness, found 271 confirmed flaws while producing very few false reports.
Finding 271 flaws in a single codebase over two months is a meaningful result. Whether other organizations can replicate the setup and the results will determine whether this is a new baseline or an outlier.
Source: Ars Technica