Step 3.5 Flash Benchmark Details

Step 3.5 Flash's leading benchmark results are AIME2025 (score 99.80, rank 6 / 106), LiveCodeBench (score 86.40, rank 13 / 120), and τ²-Bench (score 88.20, rank 5 / 40).
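
Because each leaderboard has a different number of entries, the raw ranks above are not directly comparable. The sketch below shows one way to normalize them, assuming that dividing rank by total is an acceptable measure of relative standing. The page itself does not define such a metric, and the names `RESULTS`, `top_fraction`, and `by_relative_rank` are hypothetical, introduced only for illustration.

```python
# Headline results copied from the summary above: (benchmark, score, rank, total).
RESULTS = [
    ("AIME2025", 99.80, 6, 106),
    ("LiveCodeBench", 86.40, 13, 120),
    ("τ²-Bench", 88.20, 5, 40),
]


def top_fraction(rank: int, total: int) -> float:
    """Return rank / total: the share of the leaderboard at or above this rank.

    This normalization is an assumption for illustration, not a metric
    defined by the leaderboard.
    """
    if total <= 0 or not 1 <= rank <= total:
        raise ValueError(f"invalid rank/total: {rank} / {total}")
    return rank / total


def by_relative_rank(results):
    """Sort results from best to worst relative standing (smallest fraction first)."""
    return sorted(results, key=lambda r: top_fraction(r[2], r[3]))


if __name__ == "__main__":
    for name, score, rank, total in by_relative_rank(RESULTS):
        share = top_fraction(rank, total)
        print(f"{name}: score {score:.2f}, rank {rank} / {total}, top {share:.1%}")
```

Under this assumption, rank 5 of 40 on τ²-Bench (top 12.5%) is a weaker relative standing than rank 13 of 120 on LiveCodeBench (about top 10.8%), even though the raw rank is lower.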

Benchmark Results

General Knowledge

2 evaluations
Benchmark / mode | Score | Rank/total
(name missing) | 56.50 | 39 / 65
(name missing) | 53.50 | 42 / 65

Coding and Software Engineering

2 evaluations
Benchmark / mode | Score | Rank/total
LiveCodeBench | 86.40 | 13 / 120
(name missing) | 74.40 | 38 / 108

Math and Reasoning

4 evaluations
Benchmark / mode | Score | Rank/total
AIME2025 | 99.80 | 6 / 106
(name missing) | 97.30 | 18 / 106
(name missing) | 86.70 | 6 / 20
(name missing) | 85.40 | 8 / 20

Agent Level Benchmark

1 evaluation
Benchmark / mode | Score | Rank/total
τ²-Bench | 88.20 | 5 / 40

AI Agent - Information Search

1 evaluation
Benchmark / mode | Score | Rank/total
(name missing) | 69 | 22 / 45

AI Agent - Tool Usage

1 evaluation
Benchmark / mode | Score | Rank/total

Claw-style Agent Evaluation

2 evaluations
Benchmark / mode | Score | Rank/total
Pinch Bench (Thinking Enabled, Tools) | 85.30 | 15 / 37
Claw Bench (Thinking Enabled, Tools) | 84.90 | 16 / 29