Comparing GPT-5.5 and GPT-5.1: LLM benchmark results