When researchers found that Microsoft 365 Copilot could be tricked into leaking corporate data from a single email, the flaw got a clean public identifier: CVE-2025-32711, severity 9.3. When a bug hunter coaxed ChatGPT into producing valid Windows product keys by framing the request as a guessing game, it got nothing.
Both were prompt injections. Only one is trackable. That Vulnerability Tracking Gap in AI Security, and what it costs defenders, is the subject of this article.
What Is a CVE and Why Does It Matter for Software Security?
A CVE (Common Vulnerabilities and Exposures) is a unique public identifier for a specific software flaw. It gives the whole industry one name for one bug, so a researcher in Berlin and an analyst in Bahrain know they mean the same thing.
The Role of MITRE’s CVE Program in Traditional Vulnerability Management
The CVE program is run by the MITRE Corporation, a US nonprofit. Since 1999 it has assigned hundreds of thousands of IDs, each tied to a discrete, reproducible defect in a defined product and version.
A CVE is the connective tissue of coordinated disclosure: a researcher reports the flaw, the vendor patches it, the ID is published, and defenders map it to their own assets. Without that shared label, the same bug ends up with three names and no clear owner.
The National Vulnerability Database (NVD) and CVSS Scoring
The National Vulnerability Database, maintained by NIST, enriches each CVE with a CVSS (Common Vulnerability Scoring System) score from 0 to 10. That lets teams triage: a 9.3 jumps the queue, a 4.0 waits.
Why Prompt Injection Breaks the Traditional CVE Model
The CVE model assumes a bug lives in code, sits in a version, and can be fixed. Prompt injection violates all three.
Prompt Injection as a Class of Attack, Not a Discrete Bug
Prompt injection smuggles instructions into the data an LLM reads, so the model follows the attacker rather than the user. OWASP ranks it as LLM01, the top entry in its 2025 Top 10 for LLM Applications. It is a property of how language models work, not one line of faulty code, so you cannot file a CVE against it.
A SQL injection either works or it does not. A prompt injection might succeed nine times in ten, fail on the eleventh, then stop working after a silent model update, which makes the “reproducible” part of reporting genuinely hard.
Model Versioning vs. Software Versioning
Software has clean version numbers. A weight update to a hosted model can ship silently, with no version a researcher can cite. Two calls to “gpt-4o” a week apart may not behave the same way, and there is no changelog to point at.
Why “Patching” an LLM Differs From Patching Code
Patching code closes a specific hole. A developer rewrites the faulty line, ships the diff, and the exploit path is gone for good. That clean, binary, auditable loop is the entire premise on which the CVE system rests. “Patching” a model offers none of it. There is no single line to fix, because the behavior the attacker abused is the same behavior that makes the model useful: it reads text and follows instructions. A vendor’s only levers, retraining, hardening the system prompt, or wrapping the model in input and output guardrails, all lower the odds of a successful attack rather than removing the possibility.
The fix reduces the success rate from 80 percent to 5 percent and marks it as remediated. The hole is narrower, not closed.
The recent record shows how thin that margin is. EchoLeak got past Microsoft’s dedicated cross-prompt-injection classifier by hiding its exfiltration channel in reference-style Markdown that the filter did not recognize, and the AgentFlayer exploit slipped through OpenAI’s URL safety check by routing stolen data through trusted Azure Blob Storage links. Each guardrail worked against the obvious version of the attack and fell to a rephrasing. There is a tuning tax on top of that: crank the filters too tight and the model starts refusing legitimate work, so vendors settle for a balance point rather than elimination.
The practical takeaway is to treat “we’ve addressed this” as risk reduction, not closure.
SOC 2, ISO 27001 and HIPAA done for you. Fixed fee, 100% audit pass rate.
Audit-ready in 6 weeks. Not 6 months.
The Current State of AI Vulnerability Tracking
Several frameworks exist. None is a true registry of individual, citable prompt injection vulnerabilities.
OWASP LLM Top 10 and the LLM01 Classification
The OWASP GenAI Security Project’s LLM01:2025 entry is the most cited reference point. It is a category, not a catalog: it does not enumerate specific incidents with IDs.
MITRE ATLAS for Adversarial AI Threats
MITRE ATLAS is an ATT&CK-style knowledge base of adversarial tactics against AI systems, documenting 16 tactics and more than 80 techniques with real-world case studies as of late 2025. It maps how attacks work, but is not a per-vulnerability ledger with scores.
AVID (AI Vulnerability Database) and Its Limitations
AVID, run by a nonprofit, is the closest thing to a dedicated AI vulnerability database, cataloging failure modes with reproducible evidence. But it leans on community submissions, skews toward bias and broader failure modes, and notes that the definition of an “AI vulnerability” is itself still a working one.
Vendor-Specific Disclosures vs. Industry-Wide Registries
Disclosure happens vendor by vendor. OpenAI patched the Windows-key jailbreak server-side; Microsoft fixed EchoLeak and issued a CVE. There is no common venue where these land side by side.
The Consequences of No Shared Threat Registry for Prompt Injection
Fragmented Disclosure Across AI Vendors
Each lab discloses on its own terms, on its own blog, if at all. A defender protecting a multi-model stack has to monitor a dozen channels and hope nothing slips by.
Duplicate Discovery and Wasted Research Effort
Researchers rediscover the same attack repeatedly. The guessing-game jailbreak, the “dead grandma” trick, and other framing attacks are variations on one theme nobody numbered.
No Standardized Severity Scoring for LLM Attacks
CVSS was built for deterministic flaws. There is no agreed way to score an attack that is probabilistic and context-dependent, so “how bad is this” has no common answer.
Slower Defender Response Times
Without a feed to subscribe to, teams learn about LLM attacks from news and conference talks rather than a structured alert.
Challenges for Enterprise Risk Assessment and Procurement
Buyers cannot ask a vendor, “which known prompt injection issues affect your product, and are they fixed?” the way they can with CVEs. That makes enterprise risk assessment and procurement an exercise in trust rather than evidence.
Why a CVE-Like System for Prompt Injection Is Hard to Build
Reproducibility Challenges Across Model Updates
A CVE entry promises that anyone can reproduce the flaw. That guarantee is what lets a researcher verify it, a vendor confirm it, and a defender test whether their own systems are exposed. A hosted model breaks the promise on both ends. The same prompt can fail on the eleventh attempt because of normal sampling variance, and the weights themselves can change between Tuesday and Wednesday with no version bump to point to. A proof of concept that worked at disclosure may quietly stop working a week later, not because anyone fixed it, but because the model drifted. An identifier is only as useful as the thing it points to, and here the thing keeps moving.
Closed-Weight Models and Disclosure Asymmetry
With closed-weight models from OpenAI, Anthropic, Google, and others, only the lab sees the internals. Outsiders report behavior; the provider decides what to confirm and disclose. That puts the entity with the most information in sole control of how much reaches the public, and the incentives do not favor openness. Confirming a flaw invites scrutiny, while a silent server-side fix attracts none. A neutral registry depends on independent parties being able to validate and publish, and closed weights leave them able to observe symptoms but never inspect the cause.
The Blurred Line Between Bug, Feature, and Misuse
Is a model following an instruction inside a document a bug, or is the feature working as designed? A registry needs a clear yes or no on “is this a vulnerability,” and prompt injection rarely offers one. The model is doing exactly what it was built to do: read text and act on it. Whether that counts as a defect depends entirely on context the model cannot see, namely, whose instruction it was and whether the user wanted it followed. That ambiguity also gives vendors an easy out, since “working as intended” is a defensible label for behavior nobody can cleanly call broken. A catalog cannot index something the industry will not agree to name.
Shared Responsibility Between Model Providers and Application Developers
A prompt injection usually turns dangerous only when an application wires the model to tools, data, and actions through RAG, connectors, or agents. Responsibility is split between the provider and the developer, and neither side owns the whole failure. The model provider can argue the model behaved normally, and the integration was unsafe; the developer can argue they were relying on the model to resist manipulation. Both have a point, which is precisely the problem. With no clear owner, there is no clear party to file the disclosure, assign the severity, or ship the fix, and the issue falls into the gap between them.
SOC 2, ISO 27001 and HIPAA done for you. Fixed fee, 100% audit pass rate.
Audit-ready in 6 weeks. Not 6 months.
Proposed Frameworks for an AI Threat Registry
Extending CVE to Cover Model-Level Vulnerabilities
One option is to stretch the existing CVE schema to cover model behaviors, accepting probabilistic, version-fuzzy entries. It reuses trusted infrastructure but strains reproducibility norms.
Creating a Dedicated Prompt Injection Disclosure Standard
Another is a purpose-built standard with its own identifiers, severity model, and reproducibility rules, designed for non-determinism from the start.
Lessons From CVE Numbering Authorities (CNAs) Applied to AI Labs
The CVE program already delegates ID assignment to CNAs (CVE Numbering Authorities), often the vendors themselves. AI labs could become CNAs for their own models, issuing identifiers under shared rules, as Microsoft does for Copilot.
Coordinated Vulnerability Disclosure (CVD) for LLMs
Underpinning all of it is Coordinated Vulnerability Disclosure: agreed timelines, safe harbor for researchers, and a standard report format adapted to AI’s quirks.
What Enterprises Can Do Until a Registry Exists
Building Internal Prompt Injection Threat Catalogs
Keep a catalog of every injection technique that affects your deployed AI, with prompts, conditions, and mitigations, so you are not rediscovering attacks each quarter.
Subscribing to AI-Specific Threat Intelligence Feeds
Follow AI security research as a dedicated intelligence stream, not incidental news. Outlets like Wired and academic preprints on arXiv tend to surface novel attacks well before any vendor advisory does.
Participating in AI Red Team Communities
Red teaming is the only reliable way to know how your specific stack fails. Testing against your own guardrails, RAG pipelines, and agents finds issues no external list would hold.
Tracking OWASP, MITRE ATLAS, and AVID Updates
Treat OWASP’s LLM Top 10, MITRE ATLAS, and AVID as your standing reference set and check them on a schedule.
Pro Tip: Map your internal catalog to ATLAS technique IDs and OWASP LLM categories as you build it. When a real standard arrives your records translate instead of needing a rebuild, and meanwhile auditors get a recognized vocabulary to assess against.
Pro Tip: Map your internal catalog
Map your internal catalog to ATLAS technique IDs and OWASP LLM categories as you build it. When a real standard arrives your records translate instead of needing a rebuild, and meanwhile auditors get a recognized vocabulary to assess against.
The Path Forward: Standardizing AI Vulnerability Disclosure
Industry Collaboration Between AI Labs, Researchers, and Regulators
No single lab can run a credible cross-vendor registry; rivals will not report into a competitor’s database. It needs a neutral steward, plausibly MITRE or a NIST-backed consortium, with labs participating as authorities.
Regulatory Pressure From the EU AI Act and NIST AI RMF
The EU AI Act imposes obligations on high-risk and general-purpose AI, including incident reporting, while the NIST AI Risk Management Framework and ISO/IEC 42001 push toward documented, auditable AI risk processes. Structured disclosure is the natural next requirement.
A Call for a Public AI Vulnerability Database
The destination is a public, neutral, AI-native vulnerability database: shared IDs, a severity model built for probabilistic attacks, and disclosure rules every major lab signs onto. We are not there yet, so everything above is a stopgap.
Conclusion
Prompt injection is the top-ranked risk in AI security and the least trackable. It earns a CVE only when it surfaces inside a discrete product; the model-level root cause and server-side fixes leave no public trace. Until the industry builds an AI-native registry with a severity model fit for non-deterministic attacks, defenders must stitch together OWASP categories, ATLAS techniques, AVID entries, and their own catalogs. Build that internal catalog now. It is the one piece you fully control.
Frequently Asked Questions
Is there a CVE for prompt injection?
Sometimes. When it manifests in a specific product, like EchoLeak in Microsoft 365 Copilot (CVE-2025-32711) or CurXecute in Cursor (CVE-2025-54135), it can receive a CVE. The general attack class against language models has none.
Why doesn't prompt injection have its own vulnerability ID?
Because it is a class of behavior, not a discrete code defect. It is probabilistic, it can break across silent model updates, and it often has no single fix, all of which clash with the reproducibility a CVE assumes.
What is the closest equivalent to CVE for AI systems?
AVID is the nearest dedicated database, while OWASP’s LLM Top 10 and MITRE ATLAS are the dominant classification frameworks. None is a complete, citable registry of individual vulnerabilities.
How do AI vendors currently disclose prompt injection vulnerabilities?
Inconsistently. Some patch silently server-side, some publish blog write-ups, and some issue CVEs when the flaw sits in a versioned product.
Can MITRE ATLAS replace CVE for LLM threats?
No. ATLAS catalogs tactics and techniques, not individual scored vulnerabilities, so it complements a CVE-style registry rather than replacing one.
Will the CVE program expand to cover AI models?
Possibly. AI labs could act as CVE Numbering Authorities, but the non-deterministic nature of prompt injection makes full coverage unlikely without a purpose-built standard.