{"id":35877,"date":"2025-10-20T14:21:35","date_gmt":"2025-10-20T14:21:35","guid":{"rendered":"https:\/\/agooka.com\/news\/business\/anthropic-has-a-plan-to-keep-its-ai-from-building-a-nuclear-weapon-will-it-work\/"},"modified":"2025-10-20T14:21:35","modified_gmt":"2025-10-20T14:21:35","slug":"anthropic-has-a-plan-to-keep-its-ai-from-building-a-nuclear-weapon-will-it-work","status":"publish","type":"post","link":"https:\/\/agooka.com\/news\/business\/anthropic-has-a-plan-to-keep-its-ai-from-building-a-nuclear-weapon-will-it-work\/","title":{"rendered":"Anthropic Has a Plan to Keep Its AI From Building a Nuclear Weapon. Will It Work?"},"content":{"rendered":"<p>Save StorySave this storySave StorySave this story<\/p>\n<p>At the end of August, the AI company Anthropic announced that its chatbot Claude wouldn\u2019t help anyone build a nuclear weapon. According to Anthropic, it had partnered with the Department of Energy (DOE) and the National Nuclear Security Administration (NNSA) to make sure Claude wouldn\u2019t spill nuclear secrets.<\/p>\n<p>The manufacture of nuclear weapons is both a precise science and a solved problem. A lot of the information about America\u2019s most advanced nuclear weapons is Top Secret, but the original nuclear science is 80 years old. North Korea proved that a dedicated country with an interest in acquiring the bomb can do it, and it didn\u2019t need a chatbot\u2019s help.<\/p>\n<p>How, exactly, did the US government work with an AI company to make sure a chatbot wasn\u2019t spilling sensitive nuclear secrets? And also: Was there ever a danger of a chatbot helping someone build a nuke in the first place?<\/p>\n<p>The answer to the first question is that it used Amazon. The answer to the second question is complicated.<\/p>\n<p>Amazon Web Services (AWS) offers Top Secret cloud services to government clients where they can store sensitive and classified information. The DOE already had several of these servers when it started to work with Anthropic.<\/p>\n<p>\u201cWe deployed a then-frontier version of Claude in a Top Secret environment so that the NNSA could systematically test whether AI models could create or exacerbate nuclear risks,\u201d Marina Favaro, who oversees National Security Policy &amp; Partnerships at Anthropic tells WIRED. \u201cSince then, the NNSA has been red-teaming successive Claude models in their secure cloud environment and providing us with feedback.\u201d<\/p>\n<p>The NNSA red-teaming process\u2014meaning, testing for weaknesses\u2014helped Anthropic and America\u2019s nuclear scientists develop a proactive solution for chatbot-assisted nuclear programs. Together, they \u201ccodeveloped a nuclear classifier, which you can think of like a sophisticated filter for AI conversations,\u201d Favaro says. \u201cWe built it using a list developed by the NNSA of nuclear risk indicators, specific topics, and technical details that help us identify when a conversation might be veering into harmful territory. The list itself is controlled but not classified, which is crucial, because it means our technical staff and other companies can implement it.\u201d<\/p>\n<p>Favaro says it took months of tweaking and testing to get the classifier working. 
Wendin Smith, the NNSA’s administrator and deputy undersecretary for counterterrorism and counterproliferation, tells WIRED that “the emergence of [AI]-enabled technologies has profoundly shifted the national security space. NNSA’s authoritative expertise in radiological and nuclear security places us in a unique position to aid in the deployment of tools that guard against potential risk in these domains, and that enables us to execute our mission more efficiently and effectively.”

Both NNSA and Anthropic were vague about the “potential risks in these domains,” and it’s unclear how helpful Claude or any other chatbot would be in the construction of a nuclear weapon.

“I don’t dismiss these concerns, I think they are worth taking seriously,” Oliver Stephenson, an AI expert at the Federation of American Scientists, tells WIRED. “I don’t think the models in their current iteration are incredibly worrying in most cases, but I do think we don’t know where they’ll be in five years’ time … and it’s worth being prudent about that fact.”

Stephenson points out that much is hidden behind a barrier of classification, so it’s hard to know what impact Anthropic’s classifier has had. “There is a lot of detail in the design of implosion lenses that go around the nuclear core,” Stephenson says. “You need to structure them very precisely to perfectly compress the core to get a high-yield explosion … I could imagine that being the kind of thing where AI could help synthesize information from a bunch of different physics papers, a bunch of different publications on nuclear weapons.”

Still, he says, AI companies should be more specific when they talk about safety. “When Anthropic puts out stuff like this, I’d like to see them talking in a little more detail about the risk model they’re really worried about,” he says. “It is good to see collaboration between AI companies and the government, but there is always the danger with classification that you put a lot of trust into people determining what goes into those classifiers.”

For Heidy Khlaaf, the chief AI scientist at the AI Now Institute, who has a background in nuclear safety, Anthropic’s promise that Claude won’t help someone build a nuke is both a magic trick and security theater. She says that a large language model like Claude is only as good as its training data. And if Claude never had access to nuclear secrets to begin with, then the classifier is moot.

“If the NNSA probed a model which was not trained on sensitive nuclear material, then their results are not an indication that their probing prompts were comprehensive, but that the model likely did not contain the data or training to demonstrate any sufficient nuclear capabilities,” Khlaaf tells WIRED. “To then use this inconclusive result along with common nuclear knowledge to build a classifier for nuclear ‘risk indicators’ would be quite insufficient and a long way from legal and technical definitions of nuclear safeguarding.”
Khlaaf adds that this kind of announcement fuels speculation about capabilities that chatbots don’t have. “This work seems to be relying on an unsubstantiated assumption that Anthropic’s models will produce emergent nuclear capabilities without further training, and that is simply not aligned with the available science,” she says.

Anthropic disagrees. “A lot of our safety work is focused on proactively building safety systems that can identify future risks and mitigate against them,” an Anthropic spokesperson tells WIRED. “This classifier is an example of that. Our work with NNSA allows us to do the appropriate risk assessments and create safeguards that prevent potential misuse of our models.”

Khlaaf is also wary of the partnership between the US government and a private AI company. Companies like Anthropic are hungry for training data, and she sees the US government’s broader rush to embrace AI as an opportunity for the AI industry to acquire data it couldn’t get elsewhere. “Do we want these private corporations that are largely unregulated to have access to that incredibly sensitive national security data?” she says. “Whether you’re talking about military systems, nuclear weapons, or even nuclear energy.”

And then there’s the precision. “These are precise sciences, and we know that large language models have failure modes in which they’re unable to even do the most basic mathematics,” Khlaaf says. In 1954, a math error tripled the yield of a nuclear weapon the US tested in the Pacific Ocean, and the government is still dealing with the literal fallout. What might happen if a chatbot did nuclear weapons math wrong and a human didn’t double-check its work?

To Anthropic’s credit, it says it doesn’t want a future where people are using chatbots to play around with nuclear weapons science. It’s even offering its classifier to any other AI company that wants it. “In our ideal world, this becomes a voluntary industry standard, a shared safety practice that everyone adopts,” Favaro says. “This would require a small technical investment, and it could meaningfully reduce risks in a sensitive national security domain.”