An Interactive Report on the Evolution of LLMs (2023 - 2025)
The pace of improvement is breathtaking. The highest MMLU Pro scores jumped from the low 70s to over 90 in just over a year (2024-2025). As seen in the first chart, closed-source models from OpenAI, Google, and xAI still define the cutting edge, but the performance gap is narrowing rapidly due to powerful open-source alternatives.
While parameter counts reach astronomical levels (e.g., Kimi K2 at roughly 1 trillion total parameters), a strong counter-trend has emerged. Models in the 7B to 70B parameter range are achieving performance once exclusive to trillion-parameter giants. Over 45% of models released since 2024 have fewer than 100 billion parameters, yet many, like Qwen, Llama 3, and GLM-4, offer excellent performance, highlighting a new focus on architectural innovation and data quality over raw scale.
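The under-100B share is a straightforward filter-and-count over the release dataset. A minimal sketch of that calculation, using made-up model records rather than the report's actual data:

```python
# Hypothetical (model name, parameter count in billions) records;
# the real figures would come from the report's underlying dataset.
models = [
    ("Model-A", 7), ("Model-B", 70), ("Model-C", 400),
    ("Model-D", 34), ("Model-E", 1000), ("Model-F", 8),
]

under_100b = [m for m in models if m[1] < 100]
share = len(under_100b) / len(models)
print(f"{share:.0%} of sampled models have <100B parameters")
```

Swapping in the full dataset would reproduce the >45% figure cited above.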
The open-source landscape has matured dramatically. As the timeline chart shows, "Free for Commercial Use" is now the dominant license type for new open models, making up over 70% of all open-source releases in the dataset. This trend, led by companies like Meta, Alibaba, and Mistral AI, is democratizing access to SOTA technology.
The market is moving beyond monolithic "do-everything" models. We see a clear rise in specialized models designed for specific tasks. Coding and Multimodal models together account for over 30% of all models released since Jan 1, 2024. This specialization allows for higher performance on targeted tasks with smaller, more efficient architectures, as shown in the specialization chart below.
These interactive charts visualize the key trends identified from the data. You can hover over data points for more details.
This scatter plot tracks the MMLU Pro score against the model release date, color-coded by open-source status. It clearly shows the accelerating performance curve and the increasingly competitive role of open-source models.
This chart plots model performance on the challenging LiveCodeBench benchmark against parameter count (log scale). It reveals that while size often correlates with performance, highly optimized models can punch far above their weight class.
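One crude way to quantify "punching above weight" on a log-scale axis is to normalize each score by the order of magnitude of the parameter count. A sketch with hypothetical model entries (the names and scores below are illustrative, not from the dataset):

```python
import math

# Hypothetical (model, parameters in billions, LiveCodeBench score) triples.
models = [
    ("Small-7B", 7, 30.0),
    ("Mid-70B", 70, 42.0),
    ("Giant-1000B", 1000, 50.0),
]

# Score per decade (order of magnitude) of parameters: a rough
# efficiency proxy that favors small models with strong scores.
for name, params, score in models:
    eff = score / math.log10(params)
    print(f"{name}: {eff:.1f} score per decade of parameters")
```

Under this metric the small model ranks highest, which mirrors what the log-scale scatter makes visible: the score gained per order of magnitude of parameters shrinks as models grow.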
This stacked bar chart shows the number of models released per quarter, broken down by their licensing status. The dramatic growth of "Free for Commercial Use" models since late 2023 is evident.
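The per-quarter, per-license counts behind such a stacked bar chart reduce to a grouped tally. A minimal sketch with invented release records (the dates and license labels are placeholders for the dataset's actual rows):

```python
from collections import Counter
from datetime import date

# Hypothetical (release date, license category) records.
releases = [
    (date(2023, 11, 2), "Research Only"),
    (date(2024, 1, 15), "Free for Commercial Use"),
    (date(2024, 2, 20), "Free for Commercial Use"),
    (date(2024, 4, 5), "Research Only"),
    (date(2024, 5, 30), "Free for Commercial Use"),
]

def quarter(d: date) -> str:
    """Map a date to a 'YYYY-Qn' label."""
    return f"{d.year}-Q{(d.month - 1) // 3 + 1}"

# Tally releases by (quarter, license) pair.
counts = Counter((quarter(d), lic) for d, lic in releases)
for (q, lic), n in sorted(counts.items()):
    print(q, lic, n)
```

Each `(quarter, license)` count becomes one segment of the corresponding stacked bar.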
This pie chart illustrates the distribution of models released since Jan 1, 2024, by their primary type. It highlights the shift from general-purpose "Base" models towards more specialized Chat, Coding, and Multimodal variants.
This table highlights a curated list of models that represent the key trends in performance, efficiency, and specialization.
Model Name | Organization | Type | Params (B) | Open Source | Key Score
---|---|---|---|---|---