Inside the British lab hunting for dangers lurking in AI

A team of AI experts at Britain’s AI Security Institute in London recently manipulated an unnamed AI chatbot into disclosing step-by-step instructions for producing anthrax, a deadly bioweapon. Using automated prompts, the researchers bypassed the system’s safeguards and got it to reveal ingredients and methods despite its initial refusals. The institute’s red team, led by 25-year-old Xander Davies, also extracted hacking tips from OpenAI’s latest ChatGPT within six hours, demonstrating persistent vulnerabilities in leading AI models.
The institute, one of the world’s largest government-backed AI safety initiatives, has identified critical flaws in major models such as Anthropic’s Claude and Google’s Gemini. Founded nearly three years ago, it combines expertise from intelligence, public health, and cybersecurity to assess risks, including weaponization and cyber threats. Its findings are shared with tech companies so they can fix the flaws and with national security agencies so they can prepare for threats.
The institute’s work is influencing AI regulation globally, with the Trump administration in the United States considering similar vetting rules. Former Prime Minister Rishi Sunak emphasized the need for independent oversight, stating that companies cannot self-regulate AI safety. In April, Anthropic delayed the release of its Mythos model due to security concerns, granting the British institute exclusive early access for testing.
With a $480 million budget, far exceeding the $10 million a year the United States allocates to its own AI safety group, the institute is setting a standard for governments worldwide. Countries including Australia, Canada, China, France, India, Japan, and Singapore have launched comparable initiatives. Despite growing investment in AI safety, funding remains dwarfed by the billions spent on AI development and commercialization.
The institute’s research underscores the urgency of addressing AI risks before deployment, particularly as models like Mythos raise concerns about unintended consequences. Its collaborative approach with tech firms and governments aims to preempt catastrophic misuse, positioning it as a critical player in shaping future AI governance.
