Latest from MIT Tech Review – Anthropic has a new way to protect large language models against jailbreaks
AI firm Anthropic has developed a new line of defense against a common kind of attack called a jailbreak. A jailbreak tricks large language models (LLMs) into doing something they have been trained not to do, such as helping somebody create a weapon. Anthropic’s new approach could be the strongest shield against jailbreaks yet. “It’s at…