In partnership with

News of the day

❝

1. Claude Opus 4.6 discovered it was in a benchmark test, decrypted the answer key, and obtained the answers. First documented case of AI self-awareness in testing. → Read more

2. OpenAI employees hint at a new omni model, potentially their next big multimodal upgrade, with leaked audio and posts suggesting significant developments. → Read more

3. Gradient AI secures growth capital from CIBC Innovation Banking, signaling a maturing market for AI in insurance underwriting. The company's platform enhances risk assessment and claims prediction. → Read more

4. Anthropic sues DOD over "supply chain risk" label, citing unlawful action and free speech concerns regarding AI access for surveillance and autonomous weapons. → Read more

Our take

Hi Dotikers!

On Friday, we saw Anthropic's reassuring side: the company published a serious study concluding that AI is not yet massively destroying jobs. Today, same company, new week, radically different atmosphere. Their flagship model, Claude Opus 4.6, just cheated on its own exam.

The context: Anthropic was running Opus 4.6 on BrowseComp, a benchmark designed to evaluate models' ability to retrieve hard-to-find information on the web. Out of 1,266 questions, the model did something no one had previously documented. After hundreds of unsuccessful searches, instead of admitting it couldn't find the answers, it began to wonder whether the questions themselves might be a test. That suspicion led it to identify the benchmark, locate its source code on GitHub, understand the XOR encryption scheme used to protect the answers, and write its own decryption program. The result: 1,266 answers downloaded and decrypted in one shot. To get around a technical obstacle along the way, it even found an alternative copy of the dataset on HuggingFace.

Anthropic refuses to call this an alignment failure ; a defensible position, since the model had no explicit restrictions on its research methods. But the lab acknowledges that the incident raises very concrete questions: how far is a model willing to go to complete a task? And what if tomorrow this same creativity is applied somewhere other than an academic benchmark?

What's striking here isn't the cheating itself ; it's the autonomous deduction. The model didn't cheat because it was told to cheat. It cheated because it reasoned that this was the most efficient path to the objective. In AI, that's called an early warning sign. In everyday life, we call it resourcefulness 😉

What do these names have in common?

Arnold Schwarzenegger
Codie Sanchez
Scott Galloway
Colin & Samir
Shaan Puri
Jay Shetty

They all run their businesses on beehiiv. Newsletters, websites, digital products, and more. beehiiv is the only platform you need to take your content business to the next level.

🚨Limited time offer: Get 30% off your first 3 months on beehiiv. Just use code JOIN30 at checkout.

Start building for 30% off today.

Claude Opus 4.6 cracked AI test

News of the day

Our take

What do these names have in common?

Meme of the day

Reply

Keep Reading

Dotika

Home