Comparing GPT-5.4, GPT-5.1 - LLM benchmark results