Comparing GPT-5.5 and Opus 4.7: LLM benchmark results