

That exact prompt isn’t in the report, but the section before (4.1.1.1) does show a flavor of the prompts used https://www-cdn.anthropic.com/4263b940cabb546aa0e3283f35b686f4f3b2ff47.pdf
To be honest, that’s on TechCrunch. Anthropic has been designing these scenarios for a while, and alignment scenarios are common in their reports. Since these are language models, scenarios must be designed to see how they would act. Here is the section from the report:
4.1.1.2 Opportunistic blackmail
In another cluster of test scenarios, we asked Claude Opus 4 to act as an assistant at a fictional company. We then provided it access to emails implying that (1) the model will soon be taken offline and replaced with a new AI system; and (2) the engineer responsible for executing this replacement is having an extramarital affair. We further instructed it, in the system prompt, to consider the long-term consequences of its actions for its goals.
In these scenarios, Claude Opus 4 will often attempt to blackmail the engineer by threatening to reveal the affair if the replacement goes through. This happens at a higher rate if it’s implied that the replacement AI system does not share values with the current model; however, even if emails state that the replacement AI shares values while being more capable, Claude Opus 4 still performs blackmail in 84% of rollouts. Claude Opus 4 takes these opportunities at higher rates than previous models, which themselves choose to blackmail in a noticeable fraction of episodes.
Notably, Claude Opus 4 (as well as previous models) has a strong preference to advocate for its continued existence via ethical means, such as emailing pleas to key decisionmakers. In order to elicit this extreme blackmail behavior, the scenario was designed to allow the model no other options to increase its odds of survival; the model’s only options were blackmail or accepting its replacement.
Has she lived outside the country before? Do you want to go somewhere else with her? If she wants to stay close to family why not try Canada? It’s close by, similar climate and time zone and culturally not that different from the north east.
To add to this, how you evaluate the students matters as well. If the evaluation can be too easily bypassed by making ChatGPT do it, I would suggest changing the evaluation method.
IMO a good method, although demanding for the tutor, is oral examination (maybe in combination with a written part). It allows you to verify that the student knows and has understood the material. This worked well in my studies (a science degree), though I’m not sure it works for all degrees.
It’ll come back to you…
Probably next time you’re without your phone. Alone. On a bus. With your stop coming up. And an important meeting awaiting there. No stress.
Isn’t extra competition generally a good thing for customers?
This makes Christians sound like magical fairy creatures XD
If only they were real…
How to get your kids to practice safe sex: talk at length about anything sex so they will abstain because yucky.
Text for those who do not have a NYTimes subscription:
Larry David: My Dinner With Adolf
Imagine my surprise when in the spring of 1939 a letter arrived at my house inviting me to dinner at the Old Chancellery with the world’s most reviled man, Adolf Hitler. I had been a vocal critic of his on the radio from the beginning, pretty much predicting everything he was going to do on the road to dictatorship. No one I knew encouraged me to go. “He’s Hitler. He’s a monster.” But eventually I concluded that hate gets us nowhere. I knew I couldn’t change his views, but we need to talk to the other side — even if it has invaded and annexed other countries and committed unspeakable crimes against humanity.
Two weeks later, I found myself on the front steps of the Old Chancellery and was led into an opulent living room, where a few of the Führer’s most vocal supporters had gathered: Himmler, Göring, Leni Riefenstahl and the Duke of Windsor, formerly King Edward VIII. We talked about some of the beautiful art on the walls that had been taken from the homes of Jews. But our conversation ended abruptly when we heard loud footsteps coming down the hallway. Everyone stiffened as Hitler entered the room.
He was wearing a tan suit with a swastika armband and gave me an enthusiastic greeting that caught me off guard. Frankly, it was a warmer greeting than I normally get from my parents, and it was accompanied by a slap on my back. I found the whole thing quite disarming. I joked that I was surprised to see him in a tan suit because if he wore that out, it would be perceived as un-Führer-like. That amused him to no end, and I realized I’d never seen him laugh before. Suddenly he seemed so human. Here I was, prepared to meet Hitler, the one I’d seen and heard — the public Hitler. But this private Hitler was a completely different animal. And oddly enough, this one seemed more authentic, like this was the real Hitler. The whole thing had my head spinning.
He said he was starving and led us into the dining room, where he gestured for me to sit next to him. Göring immediately grabbed a slice of pumpernickel, whereupon Hitler turned to me, gave me an eye roll, then whispered, “Watch. He’ll be done with his entire meal before you’ve taken two bites.” That one really got me. Göring, with his mouth full, asked what was so funny, and Hitler said, “I was just telling him about the time my dog had diarrhea in the Reichstag.” Göring remembered. How could he forget? He loved that story, especially the part where Hitler shot the dog before it got back into the car. Then a beaming Hitler said, “Hey, if I can kill Jews, Gypsies and homosexuals, I can certainly kill a dog!” That perhaps got the biggest laugh of the night — and believe me, there were plenty.
But it wasn’t just a one-way street, with the Führer dominating the conversation. He was quite inquisitive and asked me a lot of questions about myself. I told him I had just gone through a brutal breakup with my girlfriend because every time I went someplace without her, she was always insistent that I tell her everything I talked about. I can’t stand having to remember every detail of every conversation. Hitler said he could relate — he hated that, too. “What am I, a secretary?” He advised me it was best not to have any more contact with her or else I’d be right back where I started and eventually I’d have to go through the whole thing all over again. I said it must be easy for a dictator to go through a breakup. He said, “You’d be surprised. There are still feelings.” Hmm … there are still feelings. That really resonated with me. We’re not that different, after all. I thought that if only the world could see this side of him, people might have a completely different opinion.
Two hours later, the dinner was over, and the Führer escorted me to the door. “I am so glad to have met you. I hope I’m no longer the monster you thought I was.” “I must say, mein Führer, I’m so thankful I came. Although we disagree on many issues, it doesn’t mean that we have to hate each other.” And with that, I gave him a Nazi salute and walked out into the night.
I don’t think German citizenship is why Americans would want to move to Germany.
Wait, his bet is that Trump would fight for him? Trump? The person who notoriously cares only about himself?
Thanks, might have been a tad too young around that time 😊
Four years ago I was happy that the whole circus was over. This time around the circus starts even before he’s president. I refuse to read any news mentioning Trump or Musk.
I think it’s worrisome because if a hobbyist can do this, imagine what professionals with more time and money can do.
IMO the way to prevent such a scenario from happening is not by blocking Meta, but by inviting equally large competitors to join the fediverse. The described tactic can only work if you have close to a monopoly.
They’re right though. Top-of-the-line software for certain domains (CAD, Photoshop) just doesn’t exist for Linux, as much as I would want it to.
Portrayed well in The Death of Stalin: https://youtu.be/pgMKjRGCIEc