IMO Committee Confirms Gold-Level Performance by Frontier AI Models from OpenAI and Google

Tuesday, July 22, 2025

For the first time, artificial intelligence systems developed by OpenAI and Google DeepMind have achieved gold-medal-level performance at the International Mathematical Olympiad (IMO), notably without access to the internet or external tools. Both companies have stated that the models used were “general reasoning models” rather than systems built specifically for mathematics, signaling a major leap towards Artificial General Intelligence.

The IMO, a global mathematics competition for high school students, features six highly challenging problems, each worth seven points. Both companies reported that their AI models correctly solved five of the six, earning 35 of a possible 42 points and meeting the threshold for a gold medal under the competition’s official scoring criteria.

Google DeepMind collaborated directly with the IMO to have its system officially graded and certified. Its model, Gemini Deep Think, read the problems and wrote its solutions in natural language within the official 4.5-hour time window, a significant departure from previous approaches that relied on formal logic or symbolic computation.

OpenAI did not formally enter the competition but published its results independently, stating that three external IMO gold medalists verified its AI’s performance on this year’s problems. OpenAI’s system used an experimental architecture that scaled up “test-time compute,” enabling the model to explore multiple reasoning paths in parallel for extended durations. Researcher Noam Brown of OpenAI noted the approach was computationally expensive but effective.

Experts describe the achievement as “revolutionary.”

Of the 630 students competing at this year’s IMO, held on the Sunshine Coast in Queensland, Australia, 67 contestants, or roughly 11%, earned gold medals. The official AI results from participating companies are expected to be made public by July 28.

The achievement suggests a potential inflection point not only in mathematical reasoning but also in broader scientific domains such as physics and formal logic, where similar techniques may be applied.