News of the day

1. Anthropic just released Claude Sonnet 4.5, which it bills as the world's best coding model: it can work autonomously for 30+ hours and leads on computer-use benchmarks

2. Former Microsoft executives launch Maximor with $9 million in seed funding to replace Excel-based financial tasks with AI agents

3. Scott Aaronson used GPT-5 to help with quantum complexity proofs, highlighting how iterating with the model can support advanced research

4. States are adopting varying laws for AI therapy applications, but the pace of technological change makes effective regulation difficult

Our take

Hi Dotikers!

Anthropic needed this moment. After grueling weeks where their infrastructure buckled under the pressure of Claude Code, and as GPT-5 Codex seriously nibbled away at their historical lead in coding, the pressure was at its peak. Critics came out swinging, the developer community grew frustrated, and their leadership position wavered.

But today, Anthropic strikes back with Claude Sonnet 4.5, and on paper, it's exactly what was needed.

Performance-wise, the numbers speak for themselves: 77% on SWE-bench Verified (the gold standard for coding), with the ability to work autonomously for over 30 hours without losing the thread. On computer use, meaning its ability to operate a computer the way a human would, Claude now reaches 61% on OSWorld, a staggering 45% relative improvement in just 4 months. In concrete terms, the model can now navigate the web, fill out spreadsheets, and accomplish complex tasks directly in your browser.

The ecosystem is also expanding with integrated Office file creation (Excel, PowerPoint, Word), a Chrome extension for Max subscribers, and most importantly the Claude Agent SDK, the infrastructure Anthropic uses internally that's now accessible to all developers. All at the same price as Sonnet 4: $3 input and $15 output per million tokens.
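To make that pricing concrete, here is a minimal sketch of the arithmetic. It uses only the two per-million-token prices quoted above; the function name and the sample workload are made up for illustration and are not part of any Anthropic API.

```python
# Prices quoted above, in US dollars per million tokens.
INPUT_PRICE_PER_MTOK = 3.00
OUTPUT_PRICE_PER_MTOK = 15.00


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in dollars for a given token count."""
    total = (
        input_tokens * INPUT_PRICE_PER_MTOK
        + output_tokens * OUTPUT_PRICE_PER_MTOK
    )
    return total / 1_000_000


# Hypothetical workload: 200,000 input tokens and 50,000 output tokens.
print(f"${estimate_cost(200_000, 50_000):.2f}")  # prints $1.35
```

Output tokens cost five times as much as input tokens, so long autonomous sessions that generate a lot of code will be dominated by the output side of the bill.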

Anthropic also emphasizes alignment: it describes this as its safest model to date, with a drastic reduction in problematic behaviors like excessive flattery, deception, or power-seeking.

The real question now: will the infrastructure hold up? Will Sonnet 4.5 truly reclaim the coding throne from GPT-5 Codex? The benchmarks are impressive, but we all know the gap between theory and reality.

Tweet of the day

The new coding king!
