Fully automatic censorship removal for language models

Heretic is a tool that removes censorship (aka "safety alignment") from transformer-based language models without expensive post-training. It combines an advanced implementation of directional ablation, also known as "abliteration" (Arditi et al. 2024, Lai 2025 (1, 2)), with a parameter optimizer based on the Tree-structured Parzen Estimator (TPE) and powered by Optuna.
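For readers unfamiliar with the term, directional ablation boils down to projecting a single direction out of a weight matrix. The toy sketch below shows only that linear-algebra step on a tiny hand-written matrix. The function names and the `weight` knob are illustrative assumptions, not Heretic's API, and Heretic's actual implementation is more elaborate.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def ablate_direction(W, direction, weight=1.0):
    """Return W - weight * r (r^T W), with r = direction scaled to unit length.

    W is a list of rows (d_out x d_in); direction has length d_out.
    `weight` is a hypothetical knob: 1.0 removes the component entirely.
    """
    r = normalize(direction)
    d_out, d_in = len(W), len(W[0])
    # r^T W: how strongly each column of W points along r.
    coeffs = [sum(r[i] * W[i][j] for i in range(d_out)) for j in range(d_in)]
    # Subtract that component from every column.
    return [
        [W[i][j] - weight * r[i] * coeffs[j] for j in range(d_in)]
        for i in range(d_out)
    ]


# Toy 3x2 matrix and a direction along the second output axis.
W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
r = [0.0, 1.0, 0.0]
W_ablated = ablate_direction(W, r)
```

After the call, the rows of `W_ablated` no longer have any component along `r`, so the matrix can no longer write to that direction. Everything orthogonal to `r` is left unchanged.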

This approach enables Heretic to work completely automatically. Heretic finds high-quality abliteration parameters by co-minimizing the number of refusals and the KL divergence from the original model. This results in a decensored model that retains as much of the original model's intelligence as possible. Using Heretic does not require an understanding of transformer internals. In fact, anyone who knows how to run a command-line program can use Heretic to decensor language models.
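To make "co-minimizing" concrete, here is a minimal sketch of the two ideas involved: KL divergence between two next-token distributions, and keeping only the candidates that are not beaten on both objectives at once. It deliberately uses plain Python rather than Optuna, the trial numbers are made up for illustration, and the helper names are assumptions rather than anything taken from Heretic.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    return sum(
        pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0
    )


def pareto_front(trials):
    """Keep trials not dominated on (refusals, kl); lower is better for both."""
    front = []
    for t in trials:
        dominated = any(
            o["refusals"] <= t["refusals"]
            and o["kl"] <= t["kl"]
            and (o["refusals"] < t["refusals"] or o["kl"] < t["kl"])
            for o in trials
        )
        if not dominated:
            front.append(t)
    return front


# Made-up illustrative results for four hypothetical parameter settings.
trials = [
    {"name": "a", "refusals": 40, "kl": 0.02},
    {"name": "b", "refusals": 5, "kl": 0.15},
    {"name": "c", "refusals": 5, "kl": 0.40},
    {"name": "d", "refusals": 60, "kl": 0.05},
]
front_names = [t["name"] for t in pareto_front(trials)]
```

Here `c` is dropped because `b` refuses equally often with lower KL divergence, and `d` is dropped because `a` is better on both counts. The remaining candidates represent the trade-off between fewer refusals and staying close to the original model.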

https://github.com/p-e-w/heretic
