Linguistic jailbreaking of AI - being coercively nasty to software
The journalist Jamie Bartlett has written a book about how to talk to AI amorally. He describes the subject in a newspaper article at https://www.theguardian.com/technology/2026/apr/29/meet-the-ai-jailbreakers-i-see-the-worst-things-humanity-has-produced. Maybe it’s unsurprising that the ‘AI jailbreakers’ he interviewed were emotionally drained, and sometimes damaged, by the experience of being deliberately coercive in language for apparently bad - yet ultimately good - ends.

He says something I’ve seen elsewhere, but find hard to fully believe - “No one – not even the people who build them – knows precisely how these models work, which means no one knows how to make them fully safe, either. We pour vast amounts of data in and something intelligible (usually) comes out the other end. The bit in the middle remains a mystery.” Is that really true?

He concludes - “‘I’ve seen other jailbreakers go beyond their limits and have breakdowns,’ says Tagliabue. Originally from Italy, he recently moved to Thailand to work remotely. ‘I see the worst things that humanity has produced. A quiet place helps me stay grounded,’ he says. Every morning he watches the sunrise from the nearby temple, and a picture-perfect tropical beach is five minutes’ walk away from his villa. After yoga and a healthy breakfast, he switches on his computer, and wonders what else is going on inside the black box, and what makes these mysterious new ‘minds’ say the things they do.”