Comparing GPT-5.1 and Claude Sonnet 4.5 - LLM benchmark results