Comparing GPT-5.5, Claude Mythos Preview - LLM benchmark results