Comparing GPT-5 and GPT-4.1 - LLM benchmark results