Berkeley Function Calling Leaderboard
The Berkeley Function Calling Leaderboard (BFCL) is a widely used benchmark for the function-calling (tool-use) capabilities of LLMs, reporting overall accuracy together with AST Summary (syntax-tree matching of generated calls) and Exec Summary (execution-based checking) sub-scores.
Top Model
GPT-4-0125-Preview
Top Score
84.41
Model Count
32
Data version
20240421
Data source: Berkeley official website
Ranking Table
| Rank | Model | Overall Accuracy | Cost (USD) | Latency (s) | AST Summary | Exec Summary | Relevance | Organization | License |
|---|---|---|---|---|---|---|---|---|---|
| 1 | GPT-4-0125-Preview | 84.41 | 5.21 | 1.99 | 88.75 | 71.54 | 70.42 | OpenAI | Proprietary |
| 2 | Claude-3-Opus-20240229 | 84.12 | 10.80 | 5.05 | 86.09 | 70.90 | 80.42 | Anthropic | Proprietary |
| 3 | GPT-4-turbo-2024-04-09 | 81.88 | 5.22 | 2.68 | 86.83 | 71.04 | 62.50 | OpenAI | Proprietary |
| 4 | GPT-4-1106-Preview | 81.76 | 5.03 | 6.34 | 84.75 | 68.26 | 80.42 | OpenAI | Proprietary |
| 5 | Gorilla-OpenFunctions-v2 | 81.71 | 1.70 | 2.65 | 86.16 | 71.52 | 60.83 | Gorilla LLM | Apache 2.0 |
| 6 | GPT-4-0125-Preview | 80.29 | 4.82 | 5.03 | 83.75 | 66.13 | 82.92 | OpenAI | Proprietary |
| 7 | Mistral-Medium-2312 | 79.47 | 1.75 | 2.77 | 81.44 | 62.13 | 88.75 | Mistral AI | Proprietary |
| 8 | GPT-4-turbo-2024-04-09 | 78.76 | 4.79 | 5.68 | 81.70 | 65.13 | 88.75 | OpenAI | Proprietary |
| 9 | Claude-3-Sonnet-20240229 | 77.88 | 2.12 | 2.11 | 85.20 | 70.82 | 50.42 | Anthropic | Proprietary |
| 10 | Functionary-Medium-v2.4 | 77.12 | 1.64 | 2.55 | 82.36 | 62.61 | 74.17 | MeetKai | MIT |
| 11 | Functionary-Small-v2.4 | 76.18 | 1.76 | 2.74 | 80.00 | 65.32 | 67.92 | MeetKai | MIT |
| 12 | Claude-3-Opus-20240229 | 73.71 | 30.65 | 12.63 | 70.35 | 55.20 | 82.50 | Anthropic | Proprietary |
| 13 | Claude-instant-1.2 | 73.00 | 0.95 | 1.35 | 76.63 | 64.08 | 54.17 | Anthropic | Proprietary |
| 14 | Claude-3-Haiku-20240307 | 71.65 | 0.18 | 0.99 | 77.36 | 64.26 | 29.58 | Anthropic | Proprietary |
| 15 | Claude-2.1 | 65.12 | 6.64 | 3.72 | 62.59 | 46.39 | 83.33 | Anthropic | Proprietary |
| 16 | Mistral-large-2402 | 65.00 | 4.94 | 2.84 | 62.09 | 47.35 | 84.17 | Mistral AI | Proprietary |
| 17 | DBRX-Instruct-Preview | 64.59 | 1.25 | 0.63 | 65.31 | 64.10 | 56.25 | Databricks | Databricks Open Model |
| 18 | Mistral-large-2402 | 61.71 | 3.90 | 1.86 | 68.98 | 52.46 | — | Mistral AI | Proprietary |
| 19 | GPT-3.5-Turbo-0125 | 58.94 | 0.42 | 1.26 | 70.52 | 67.80 | 2.08 | OpenAI | Proprietary |
| 20 | Mistral-small-2402 | 58.71 | 0.96 | 1.05 | 64.27 | 48.41 | — | Mistral AI | Proprietary |
| 21 | Hermes-2-Pro-Mistral-7B | 58.41 | 0.15 | 0.39 | 67.99 | 54.26 | 10.83 | NousResearch | Apache 2.0 |
| 22 | Claude-3-Sonnet-20240229 | 58.06 | 3.41 | 3.35 | 44.06 | 38.66 | 81.67 | Anthropic | Proprietary |
| 23 | Gemini-1.0-Pro | 56.94 | 0.19 | 1.06 | 41.94 | 39.90 | 77.50 | Google | Proprietary |
| 24 | Claude-3-Haiku-20240307 | 52.59 | 0.29 | 1.52 | 44.69 | 42.72 | 20.83 | Anthropic | Proprietary |
| 25 | FireFunction-v1 | 51.53 | — | 1.24 | 39.94 | 34.28 | 73.33 | Fireworks | Apache 2.0 |
| 26 | Nexusflow-Raven-v2 | 50.94 | — | 1.86 | 55.05 | 56.93 | 2.08 | Nexusflow | Apache 2.0 |
| 27 | GPT-4-0613 | 49.71 | 10.48 | 3.54 | 38.53 | 26.04 | 91.67 | OpenAI | Proprietary |
| 28 | Mistral-tiny-2312 | 48.71 | 0.13 | 1.79 | 46.91 | 28.71 | 82.08 | Mistral AI | Proprietary |
| 29 | Gemma-7b-it | 41.47 | 0.03 | 0.09 | 39.05 | 33.15 | 60.42 | Google | Gemma Terms of Use |
| 30 | Deepseek-v1.5 | 39.41 | 0.45 | 1.20 | 36.98 | 29.26 | 56.67 | Deepseek | Deepseek License |
| 31 | Mistral-Small-2402 | 38.18 | 0.70 | 1.09 | 37.66 | 29.25 | 98.33 | Mistral AI | Proprietary |
| 32 | Mistral-small-2402 | 17.65 | 2.02 | 2.93 | 2.53 | 7.26 | 99.58 | Mistral AI | Proprietary |
Data is for reference only; official sources are authoritative. Click a model name to view its DataLearner model profile.
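To make the AST Summary column concrete: AST-style evaluation parses the model's generated function call and compares its structure (function name and arguments) against the expected call, rather than executing it. The sketch below illustrates the idea only; the function name, checker signature, and keyword-only matching are illustrative assumptions, not BFCL's actual implementation.

```python
# Illustrative sketch of AST-style function-call checking (NOT BFCL's code).
import ast

def check_call(generated: str, expected_name: str, expected_args: dict) -> bool:
    """Parse a model-emitted call such as 'get_weather(city="Berkeley")' and
    verify the function name and keyword arguments match the expectation.
    Only keyword arguments are compared in this simplified version."""
    try:
        node = ast.parse(generated, mode="eval").body
    except SyntaxError:
        return False  # unparseable output counts as a failure
    if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
        return False  # expected a plain call like f(...)
    if node.func.id != expected_name:
        return False  # wrong function selected
    got = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
    return got == expected_args

print(check_call('get_weather(city="Berkeley", unit="F")',
                 "get_weather", {"city": "Berkeley", "unit": "F"}))  # True
```

Exec Summary, by contrast, would actually run the generated call against a live or mocked implementation and compare results, which is why the two columns can diverge for the same model.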