Microsoft MDASH Outperforms Mythos Preview on CyberGym Benchmark

Microsoft announced MDASH, a multi-modal agentic scanning harness for vulnerability discovery and remediation. The system scored 88.4 percent on the CyberGym benchmark compared with 83.1 percent for Mythos Preview. MDASH uses more than 100 specialized agents and is in limited private preview with customers.

forbes.com · 1 source · May 15, 4:52 PM

Microsoft on Tuesday announced MDASH, short for Microsoft Security multi-modal agentic scanning harness. The system is the first multi-modal service included in the CyberGym benchmark, developed by UC Berkeley’s Center for Responsible, Decentralized Intelligence.

It scored 88.4 percent on the benchmark, compared with 83.1 percent for Mythos Preview. CyberGym evaluates AI agents on real-world vulnerability analysis. The benchmark includes 1,507 real-world vulnerabilities across 188 open-source projects. Microsoft said the result indicates MDASH is more effective at identifying vulnerabilities than Mythos Preview.
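To put the gap in concrete terms, the short Python sketch below converts the two reported percentages into approximate counts. It assumes, purely for illustration, that each score is a plain success rate over all 1,507 benchmark vulnerabilities; the article does not say how CyberGym computes its score, so the counts are rough estimates and the function name `approx_solved` is hypothetical.

```python
# Assumption: each score is a simple success rate over every benchmark instance.
TOTAL_VULNERABILITIES = 1507  # figure reported for CyberGym in the article


def approx_solved(score_percent: float, total: int = TOTAL_VULNERABILITIES) -> int:
    """Estimate how many instances a given percentage score corresponds to."""
    return round(total * score_percent / 100)


mdash_solved = approx_solved(88.4)    # reported MDASH score
mythos_solved = approx_solved(83.1)   # reported Mythos Preview score
gap_points = round(88.4 - 83.1, 1)    # difference in percentage points

print(f"MDASH: ~{mdash_solved}, Mythos Preview: ~{mythos_solved}")
print(f"Gap: {gap_points} points, ~{mdash_solved - mythos_solved} vulnerabilities")
```

Under that assumption, the 5.3-point difference works out to roughly 80 additional vulnerabilities out of 1,507.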

MDASH is not a single model but an agentic system that runs more than 100 specialized agents. Some agents hunt for vulnerabilities while others debate whether identified flaws are real or exploitable. The company said this approach reduces false positives.
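The hunt-then-debate pattern can be illustrated with a minimal Python sketch. It does not reflect how MDASH is actually built, which the article does not describe. The `Finding` type, the three toy reviewer functions, `debate_filter`, and the majority-vote threshold are all hypothetical stand-ins for agents that would in practice be backed by language models.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Finding:
    """A candidate vulnerability raised by a (hypothetical) hunter agent."""
    location: str
    description: str
    reachable: bool         # toy evidence: can attacker input reach this code?
    has_bounds_check: bool  # toy evidence: is the access already guarded?


# A reviewer votes True ("real and exploitable") or False ("false positive").
Reviewer = Callable[[Finding], bool]


def reachability_reviewer(f: Finding) -> bool:
    return f.reachable


def guard_reviewer(f: Finding) -> bool:
    return not f.has_bounds_check


def skeptic_reviewer(f: Finding) -> bool:
    # Only convinced when both pieces of evidence point the same way.
    return f.reachable and not f.has_bounds_check


def debate_filter(findings: List[Finding], reviewers: List[Reviewer],
                  threshold: float = 0.5) -> List[Finding]:
    """Keep findings that strictly more than `threshold` of reviewers accept."""
    if not reviewers:
        raise ValueError("at least one reviewer is required")
    kept = []
    for finding in findings:
        votes = sum(1 for review in reviewers if review(finding))
        if votes / len(reviewers) > threshold:
            kept.append(finding)
    return kept


candidates = [
    Finding("parser.c:120", "unchecked copy into fixed buffer", True, False),
    Finding("util.c:45", "possible overflow in dead code", False, True),
    Finding("net.c:310", "index from network, but guarded", True, True),
]
reviewers = [reachability_reviewer, guard_reviewer, skeptic_reviewer]
confirmed = debate_filter(candidates, reviewers)
print([f.location for f in confirmed])
```

In this toy run only the first candidate wins a majority of votes; the other two are dropped as likely false positives, which is the effect the company attributes to having agents debate each finding.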

In its first run against the Windows operating system, MDASH surfaced 16 previously unknown vulnerabilities. Four of those were critical remote-takeover flaws addressed in this month’s Patch Tuesday. The announcement came the same week that OpenAI announced its Daybreak security initiative.

MDASH was built by Microsoft’s Autonomous Code Security team. The team includes several members of Team Atlanta, which won the $29.5 million DARPA AI Cyber Challenge by building an autonomous cyber-reasoning system. That system found and patched bugs in open-source projects.

Taesoo Kim, VP of Security Research at Microsoft, leads the Autonomous Code Security team. Kim said adoption of MDASH has been rapid, with everyone inside Microsoft using it for security tests. The Windows team has integrated the tool into its entire build process and pipeline.

“This enables developers to get feedback and perform security testing by using MDASH as part of the CI/CD pipeline,” Kim said. The company started a private preview of MDASH last week and already has a handful of customers. Available models include GPT 5.5, 5.6, 5.5-Cyber, Sonnet and Opus.

Customers must apply for access during the private preview. Kim said that letting agents debate each finding filters out false positives raised during scanning and gives MDASH a higher chance of identifying certain vulnerabilities than Mythos.

Brendan Dolan-Gavitt, AI researcher at autonomous offensive security platform XBOW, commented on the result. “The performance of MDASH on CyberGym is good confirmation of something we’ve also found at XBOW: the capability of the model is important, but the harness, the scaffolding you put around the model to point it in the right direction, ensure it has the tools and context it needs to do a good job, and check its work, can be at least as important,” Dolan-Gavitt said via email.

Daniel Spicer, CSO at AI-powered IT security platform Ivanti, also responded. “Based on Microsoft’s description, MDASH is more about the tooling used to run multiple models to identify vulnerabilities. This doesn’t say anything about Mythos specifically, for all we know, Microsoft is using Mythos as part of MDASH,” Spicer said via email.

Andrew Rubin, CEO and founder of breach containment provider Illumio, described the broader environment. “Without commenting on specific models/capabilities, I believe we are seeing the start of a true arms race, both between attackers and defenders, and between the providers themselves. The size of the steps and the speed of the changes will only continue to increase exponentially,” Rubin said via email.

Key Facts

MDASH CyberGym score: 88.4% vs 83.1% for Mythos Preview
CyberGym benchmark: 1,507 vulnerabilities in 188 open-source projects
MDASH agents: more than 100 specialized agents
Windows vulnerabilities: 16 previously unknown, including 4 critical remote-takeover flaws
DARPA AI Cyber Challenge: Team Atlanta won the $29.5 million challenge

Story Timeline

  1. May 15, 2026: Microsoft announced MDASH and its CyberGym benchmark results.
  2. April 2026: Mythos Preview entered limited release.
  3. May 2026: Microsoft began private preview of MDASH with select customers.
  4. May 2026: MDASH identified 16 unknown Windows vulnerabilities including four critical flaws.

Potential Impact

  1. Microsoft customers in private preview will test MDASH on their own codebases.
  2. Security teams may incorporate multi-agent systems into CI/CD pipelines for automated vulnerability scanning.
  3. Enterprises could gain earlier detection of software flaws before public exploitation.
  4. Competitors may accelerate development of agentic scaffolding around AI security models.

Transparency Panel

Sources cross-referenced: 1
Confidence score: 75%
Synthesized by: Substrate AI
Word count: 549 words
Published: May 15, 2026, 4:52 PM
Bias signals removed: 3 across 2 outlets
Signal breakdown: Amplifying 1, Framing 1, Loaded 1
