The International Mathematical Union (IMU) has endorsed a set of recommendations calling on academics and organisations to protect scientific integrity in mathematical research from artificial intelligence.
It comes amid rapidly advancing AI abilities and recent claims that an AI tool solved a decades-old geometry problem. The list of recommendations, called the Leiden Declaration, envisions a future that preserves human values in scientific research, such as accountability for correctness, attribution of ideas to their authors, and ethics.
With a growing number of signatories, this declaration is increasingly seen as much-needed guidance on proper AI use in a rapidly changing field. It will be presented to the International Congress of Mathematicians in July.
The vision proposed by the document embraces AI tools for research, but puts human values at its centre. There are recommendations for mathematicians, for organisations and for policymakers to maintain the integrity of science.
Researchers are encouraged to report AI tool use in research clearly and accurately, and to participate in public discourse on AI-generated results. Policymakers, meanwhile, are urged to regulate the AI industry and not to believe the hype generated by an industry with a “strong commercial incentive” to overstate the capabilities of its products.
The declaration originated from a workshop last September in Leiden, the Netherlands, which gathered thinkers and policymakers from around the world to discuss the effects AI may have on mathematical research. In the months since, there has been rapid progress in the mathematical capabilities of automated tools.
In December, Johannes Schmitt, a research fellow at ETH Zurich, announced a result that he claims was “discovered and proved in collaboration with AI”. He formulated a research-level question in algebraic geometry and ran it through multiple AI tools, which began to converge on a correct solution. Although he stressed that the “neat little result”, which is not yet peer-reviewed, is only a minor addition to the literature, it marked a significant step for AI in fundamental mathematics research.
The most recent leap came in May, when OpenAI, the company behind ChatGPT, claimed that an internal version of its model had solved an open problem that had stumped geometers for eighty years. The solution to the Unit Distance Conjecture, posed by Hungarian mathematician Paul Erdős in 1946, has not yet faced traditional peer review, but was instead posted on OpenAI’s website alongside remarks from prominent mathematicians discussing its value.
The paper does not list any human author, and the details of the “internal model” are sparse. Nor does it disclose how much time or how many resources went into generating the solution. This cryptic style of communicating results is what the Leiden Declaration aims to discourage. The lack of detail has so far led to both excitement and scepticism about the importance of the result.
Prof David Holmes was one of the 16 members of the working group that drafted the Leiden Declaration. Speaking from his office in the Mathematical Institute in Leiden, he stresses caution when interpreting claims from OpenAI. “There’s no refereeing process as such,” he says. Instead, he says, the company “hand-picks a few people to give commentary”.
The declaration makes a clear recommendation to policymakers who might be swayed by such unorthodox publishing of results.
“Don’t believe the hype,” it says. “There is currently a strong commercial incentive on the part of the technology industry to overstate the capabilities of their products. Consult with experts, including mathematicians, in forming policy decisions rather than relying on press releases or popular reporting of mathematical results.”
Holmes says OpenAI and others “have a product to sell and are going to be trying to spin things as hard as they can to make themselves look good”.
The report adds fuel to the debate surrounding AI agents’ true aptitude for research mathematics. While some claims may be overstated, Holmes says “it is definitely not all hype”.
Earlier AI-generated proofs didn’t always live up to expectations, he says. “It sounded cool but then when you look more closely, I got the impression it was finding things already in the literature. Whereas this new one, it really seems genuinely new.”
Some mathematicians have attempted to accurately assess AI capabilities by running each agent through a set of standardised tests. Schmitt leads IMProofBench, a project whose aim is to evaluate these agents’ ability to “create research-level mathematical proofs”.
IMProofBench is a collection of research-level questions across fundamental mathematics aimed at measuring “genuine mathematical reasoning”, according to its website. The questions are kept private to prevent them from being used as AI training data. Despite this, the AI tools’ scores are steadily improving.
Another set of benchmark tests is Humanity’s Last Exam, which also contains research-level questions, contributed by academics around the world in mathematics, the humanities and the natural sciences. Though similar to IMProofBench, this test also asks each AI agent to estimate how confident it is in its answers. The current leader is Gemini 3.1 Pro, which scored 45.9 per cent while expressing confidence in about 50 per cent of its answers.
Despite producing imperfect results, AI tools appear to be getting better at solving research-level problems. They are certainly not able to replace human mathematicians yet, but the Leiden Declaration acknowledges that the effects of AI can already be seen on the employment market. The authors note that academic researchers have been offered “lucrative jobs” in industry. They warn of a future where funding is directed towards questions that are more easily addressed using AI tools than with human effort.
The IMU endorsement of this declaration marks yet another call for regulation of AI. In recent weeks, the United Nations has decried the environmental destruction caused by AI, while Pope Leo has called for AI to be “disarmed”. The declaration acknowledges the environmental impact of these AI agents and the ethical concerns about their use in warfare and mass surveillance.

It does not, however, call for a ban on, or boycott of, AI in research. “If academic mathematicians all ... refused to have any interactions with this, in the end they are going to carry on without us anyway,” says Holmes. “If your goal was to protect our little corner of the world for as long as we can, then maybe you could argue for something like that.”
On the contrary, the authors of the declaration have a vision of the future of mathematics as one that embraces AI tools, but where human values play a guiding role.
The declaration encourages mathematicians to stay informed of emerging technologies. In this way, scientists can more accurately assess claims about the depth and difficulty of AI-generated results, and use this expertise to inform organisations and policymakers.
“One of the things we want is that mathematicians should get more engaged with this,” says Holmes. “People should try to stay on top of latest developments.”
The growing list of signatories from every branch of mathematical research – which now numbers more than 2,000 and includes professors, institute directors and Fields medallists – reflects a growing appetite for regulations and guidelines for AI use, not only in mathematics but in other areas of research and broader society.
Holmes recommends that researchers in other fields “accept that this big change is on its way and think about how we want it to look”.
Shane Gibbons is a PhD candidate in Mathematics at Leiden University


















