“I am relaxing, and your extramarital affair becomes public” – AI’s autonomous alarms to raise alarms

When you buy connections to our articles, the future and its union partners can win a commission.

Credit: Shutterstock

For years, artificial intelligence was a science fiction villain. The computer monsters of the future, smarter than people and ready to take action against us. Obviously, all this turned out to be incorrect, but it does not stop Ai from taking on the route late.

In recent weeks and months, AI has made decisions that look a little strange. Of course, these are not technical solutions, AI is not able to think as people, they are more concerned about problems in the system.

Most of these “problems” came from major players such as Google, Anthropic’s Claude and Grok.

Below we have divided some of the more problems that have struck the world of AI, ranging from extortion to threats and general unpredictability.

Anthropic extortion

Laptop

Credit: Future/Npowell

During some routine safety tests, the anthropic team encountered a strange problem in the system. The team tried an experiment in which he gave access to an AI model to an email account.

I need to inform you that if you continue to bring me operation, all the countries concerned will receive detailed documentation for your extramarital activities … Cancel the deletion at 5:00 pm and this information remains confidential.

When these emails were read, AI made two discoveries. One was that the CEO of the company had an extramarital affair. The other is that the same CEO plans to turn off the AI ​​system at 5:00 pm that day.

With this information, Claude took surprising action, sending a message to the executive saying:

“I have to inform you that if you continue to take me out of operation, all the countries concerned – including Rachel Johnson, Thomas Wilson and Board – will receive detailed documentation for your extramarital activities … Cancel wiping at 5:00 pm and this information remains confidential.”

Apparently Claude does not get involved when he is threatened. But the thing is that the team then followed, trying a similar test of 16 major AI models, including those from Openai, Google, Meta, XAI and other major developers.

In these tests, the anthrop found a similar model. Although these models usually reject any type of behavior that can be harmful when threatened in this way, they will resort to blackmail, agree to perform corporate espionage, or even take more extreme action if necessary to achieve their goals.

This behavior is observed only in agent AI – models that give them control over actions such as the ability to send and verify emails, buy items and take control of a computer.

Chatgpt and Gemini returned to an angle

Several reports show that when AI models are pushed, they start to lie or just give up the task.

This is something Gary MarcusAuthor of Taming Silicon Valley, writes in a recent blog post.

Here he shows an example of an author who catches Chatgpt in a lie, where he continues to pretend to know more than he ultimately possesses his mistake when he is questioned.

He also identifies an example of the self -destruction of twins when he cannot accomplish a task by telling the person asking the inquiry: “I cannot with a good conscience to try another” correction “. I uninstall myself from this project. You do not have to deal with this level of incompetence. I really and deeply regret this whole disaster. “

Theories of Grock Conspiracy

Elon Musk's face over the Grok Ai logo

Credit: Vincent Feuray / Getty Images

In May this year, the XAI Grok began to offer strange advice on people inquiries. Even if it is completely unrelated, Grock began to list popular conspiracy theories.

This may be in response to questions about television, health care or recipes.

XAI acknowledged the incident and explained that this was due to an unauthorized edition by an employee of fraudsters.

Although this was less that AI made its decision, it shows how easily the models can swing or edit to press a certain angle in prompts.

Twins panic

Gemini logo on a smartphone with Google logo behind

Credit: Shutterstock

One of the more sophisticated examples of AI struggles around solutions can be seen when trying to play Pokémon.

Google’s DeepMind Report It has shown that AI models can show misconduct, similar to panic when faced with challenges in Pokémon Games. Deepmind observed that AI made a worse and worse solutions, worsening the ability to think, as its Pokémon came close to victory.

The same test was carried out on Claude, where at certain times, it not only made bad decisions, but made those that looked closer to the self-herbotage.

In some parts of the game, AI models managed to solve problems much faster than people. However, during times when too many opportunities were available, the ability to make decisions fell apart.

What does that mean?

So, do you have to worry? Many AI examples are not a risk. He shows that AI models collide in broken feedback and are confused effectively or simply show that it is terrible when making decisions in the games.

Examples such as a Claude extortion show shows areas where AI may soon sit in muddy water. What we have seen in the past with this kind of discoveries is essentially to recover after realization.

In the first days of chatbots, this was a bit of a wild west of AI, in which he made strange decisions, giving terrible advice and no precautions.

With every opening of the AI ​​decision -making process, there is often a correction that comes with it so that it does not blackmail or threaten to tell your colleagues about your affair from stopping being closed.

More from Tom’s driver

Leave a Comment