Anthropic evaluated the model’s programming capabilities using a benchmark called SWE-bench Verified. Sonnet 4.5 set a new ...