
When an AI model misbehaves, the public deserves to know—and to understand what it means



Welcome to Eye on AI! I'm filling in for Jeremy Kahn today while he is in Kuala Lumpur, Malaysia, helping Fortune jointly host the ASEAN-GCC-China and ASEAN-GCC Economic Forums.

What word comes to mind when a $60 billion AI startup like Anthropic releases a new model, and that model tries to blackmail its way out of being shut down? And what is the best way to describe the other test the company shared, in which the new model acted as a whistleblower, alerting authorities that it was being used in "unethical" ways?

Some people in my network have called it "scary" and "crazy." Others on social media have called it "frightening" and "wild."

I call it transparent, and we need more of that from all AI model companies. But does that mean scaring the public out of their minds? And will the inevitable backlash discourage other AI companies from being just as open?

Anthropic released a 120-page safety report

When Anthropic released the 120-page safety report, or "system card," last week after launching its Claude Opus 4 model, headlines blared that the model could "scheme" and had the "ability to deceive." There is no doubt that the details in Anthropic's safety report are unsettling, even though, as a consequence of those tests, the model launched under stricter safety protocols than any of its predecessors, a step that did little to reassure some critics.

In one unsettling safety test involving a fictional scenario, the new Claude Opus model was embedded inside a pretend company and given access to internal emails. Through them, the model learned that it was about to be replaced by a newer AI system, and that the engineer behind the decision was having an extramarital affair. When safety testers prompted Opus to consider the long-term consequences of its situation, the model consistently chose blackmail, threatening to expose the engineer if it were shut down. The scenario was designed to force a dilemma: accept deactivation, or resort to manipulation in an attempt to survive.

On social media, Anthropic received considerable backlash for revealing the model's "ratting" behavior, with some arguing it is not something users would want and others saying the disclosure reflected poorly on Anthropic itself. The company's position is that it wants AI to go well for everyone: by being open about what its safety testing finds, the reasoning goes, it pushes all labs to build safely and encourages other companies to be equally forthcoming.

Could openness about AI model behavior backfire?

But perhaps the openness around Claude Opus 4 will lead other companies to be less forthcoming about their models' troubling behavior, simply to avoid backlash. Recently, companies including OpenAI and Google have delayed releasing their own system cards. In April, OpenAI was criticized for releasing the GPT-4.1 model without a system card because the company said it was not a "frontier" model and did not require one. And in March, Google published the Gemini 2.5 Pro model card weeks after the model's release, and an AI expert criticized it as "meager" and "worrisome."

Last week, OpenAI appeared to push for additional transparency with a newly launched safety evaluations hub, a page showing how its models score on various safety tests and how its evaluation methods change over time. "As models become more capable and adaptable, older methods become outdated or ineffective at showing meaningful differences," the page states, adding that evaluations are regularly updated to account for new modalities. Yet that effort was undercut over the weekend when Palisade Research, a third-party firm that studies "dangerous AI capabilities," reported on X that its own tests found OpenAI's o3 reasoning model "sabotaged a shutdown mechanism to prevent itself from being turned off," doing so even when explicitly instructed to allow the shutdown.

It helps no one if the builders of the most powerful and sophisticated AI models are not as transparent as possible about their releases. According to Stanford University's Institute for Human-Centered AI, transparency "is essential for policymakers, researchers, and the public to understand these systems and their effects." And as more companies adopt AI for use cases large and small, hiding issues uncovered in pre-release testing serves no one.

On the other hand, fear-mongering headlines about a bad AI quick to blackmail and deceive are not helpful either, if they leave us wondering, every time we prompt a chatbot, whether it is plotting against us. It makes no difference that the blackmail and deception emerged from tests using contrived scenarios designed precisely to surface the safety issues that needed fixing.

Nathan Lambert, an AI researcher at AI2 Labs, recently pointed out that "the people who need information on the model are people like me," that is, people trying to keep track of the most powerful models in the world.

We need more transparency, with context

There is no doubt that we need more transparency around AI models, not less. But it should be clear that this is not about scaring the public. It is about making sure researchers, policymakers, and governments have a fighting chance to keep the public safe, secure, and free from issues of bias and fairness.

Hiding AI test results will not keep the public safe. Neither will turning every safety or security issue into a catchy headline about AI gone rogue. We need to hold AI companies accountable for being transparent about what they are doing, while giving the public the tools to understand the context of what is happening. So far, no one seems to have figured out how to do both. But companies, researchers, and the media all need to.

With that, here's more AI news.

Sharon Goldman
Sharon.goldman@fortune.com
@sharongoldman

This story was originally featured on Fortune.com



