Claude 3.7 Sonnet

Claude 3.7 Sonnet offers extended thinking scaffolds that raise SWE-bench Verified coding accuracy from 62.3% to 70.3%. It also reaches 81.2% accuracy on retail automation tasks, outperforming Claude 3.5 Sonnet (2024-10-22) by 13.6%.

Conversation, Reasoning, Analysis, Summarization
Provider: Anthropic
Release Date: February 25, 2025
Size: Large
Parameters: Not disclosed

Benchmark Performance

Performance metrics on industry-standard AI benchmarks that measure capabilities across reasoning, knowledge, and specialized tasks.

MMLU: 80.3%
MATH: 82.2%
GPQA Diamond: 68.0%
SWE-Bench Verified: 62.3%
Retail Task Accuracy: 81.2%
Airline Task Accuracy: 58.4%
