Should Mythos be made public?
Why Anthropic is correct in making its bold move to restrict its latest rollout
Contrary to the prevailing media narrative, Mythos is not a model built specifically for cybersecurity tasks. Like Claude or ChatGPT, it is another SOTA frontier model. Its delayed and restricted rollout is entirely a function of Anthropic's heavy focus on alignment. The model identified vulnerabilities in every website, so in theory it could be abused by malicious actors. Hence the restricted rollout.
More frontier LLMs with similar capabilities will soon be produced by OpenAI and Google, if they have not been already. Arguably ChatGPT 5.5 is comparable to Mythos. Are these companies as concerned with cybersecurity risk? Or do they consider their current guardrails, and the predominant cybersecurity practices among firms,[1] sufficient to mitigate the risks at hand?
My overarching point is that the degree to which models pose a cybersecurity risk is somewhat arbitrarily defined by the lab itself. This implicit alignment regime operates entirely on voluntary self-reporting. Yes, the US government exerts some leverage and influence over Anthropic here, yet aside from designating supply-chain risks, it has yet to enact executive or legislative action. OpenAI and Google can therefore easily arrive at a different conclusion.
Does Anthropic's heavy focus on alignment risks and its EA ethos place it at a disadvantage? Is it rational for OpenAI and Google to release their Mythos-class models once they arrive, or should they follow Anthropic's path? What is the equilibrium here? Is it optimal? What are the existential-risk implications?
Why public choice favours Anthropic, and why this is good for society
My first thought is that coordination across the companies breaks down, adversely impacting the very multi-agent institutional governance we need.
In the short run, Anthropic is at a clear disadvantage in maximising market share and volume, which substantially affects its bottom line. However, this lopsided equilibrium holds only for as long as deviation by competitors goes unpunished. Anthropic therefore has an incentive to penalise Mythos-class releases by any means it can.
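The logic above is a standard repeated-game argument, and it can be made concrete with a toy calculation. The sketch below uses entirely hypothetical per-period payoffs (the numbers 3, 5, and 1 are illustrative assumptions, not estimates of any lab's actual returns): mutual restraint pays a moderate amount each period, releasing while rivals hold back pays more once, and a punished race pays least. Restraint is self-enforcing only when labs discount the future gently enough.

```python
# Toy repeated "release game" between two frontier labs.
# All payoff numbers are hypothetical assumptions for illustration.
#   both restrict          -> 3 per period
#   release while rival    -> 5 once (grab market share)
#     restricts
#   punished race forever  -> 1 per period afterwards

HORIZON = 200  # long finite horizon approximating an infinite game

def discounted(stream, delta):
    """Present value of a payoff stream under discount factor delta."""
    return sum(payoff * delta**t for t, payoff in enumerate(stream))

restrict_path = [3] * HORIZON                # mutual restraint throughout
deviate_path = [5] + [1] * (HORIZON - 1)     # release once, punished forever

for delta in (0.2, 0.6, 0.9):
    coop = discounted(restrict_path, delta)
    dev = discounted(deviate_path, delta)
    print(f"delta={delta}: restrain={coop:.1f}, deviate={dev:.1f}, "
          f"restraint stable: {coop >= dev}")
```

With these assumed payoffs, restraint survives only for patient labs (here, roughly delta above 0.5): the one-off gain from releasing is outweighed by the permanently degraded race payoff. This is exactly why a credible threat to punish deviation, via lobbying or otherwise, matters for the equilibrium.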
An obvious high-return avenue is lobbying. Anthropic, alongside the wider EA community, seems to me to be clearly yet subtly investing substantial resources in its influence among lobbyists in Washington, political campaigns, the think-tank world, and the wider media. It will then likely argue for governments favouring “more responsible” actors, more statutory regulation of AI companies, and even restrictions on the capabilities of each new model release.
Moreover, if a disastrous tail-risk event does occur under a more lenient release plan, the more cautious party stands to benefit. Much here is contingent on the respective probabilities each actor assigns to existential-risk scenarios. Prediction markets and the superforecasting industry will benefit, and in my view AI researchers and economists[2] will prevail over both the hardcore accelerationists and the doomers, who are becoming increasingly luddite in their opposition to data centres.
On the margin, this grants Anthropic an advantage, conditional on increased political influence. I will be following (and writing about) these developments closely over the next few years, as AI features as an increasingly salient issue in mainstream political discourse. Given that caution ultimately yields positive externalities, by limiting the influence of a nascent pause agenda backed by anti-abundance and degrowther types on the left, this equilibrium is also the socially optimal outcome.
This voluntary regulatory institution operating on self-restraint is preferable, for all firms, to government mandates and their wider political implications. OpenAI and Google will therefore also restrict the initial releases of their Mythos-tier models.
Governments, coordinating globally, will likely create statutory bodies to enforce evaluation standards and alignment priorities. Some will set extensive disclosure and reporting obligations for certain functions, as is already the case in the EU. UK AI regulation is surprisingly lax, yet expect it to become more stringent. Alongside the Online Safety Act, the left-leaning parties will successfully legislate for mandatory disclosure requirements: both for safety concerns and perhaps to improve damaged economic ties to the EU. Where protectionism is a more powerful political force, as in the United States and China, export and import controls will be tightened. Open-source models will diminish in clout. All of this is far preferable to an outright pause on all further AI development, or even bans.
[1] Primarily via building multi-layered systems so that an entire exploit chain is required to compromise the website; also the standard PIN codes, passwords, biometrics, encryption, and the like.
[2] Who, as I argued in my last piece, have in my view the most accurate probabilities.

