Comparing GPT-5.4 and GPT-5.2 - LLM benchmark results