Beyond the Truth: Why Science Must Protect Data Provenance in the Age of Generative AI

Published on March 3, 2026 at 10:44

When data lose their innocence, the scientific method must learn to defend itself.

This reflection stems from an article published on the SETI Institute website (link at the bottom). It is neither a translation nor a summary, but a personal commentary inspired by that reading.

Ghosts in the laboratories

In 2022, it was discovered that a 2006 study on Alzheimer's disease, long considered a milestone, contained manipulated images. For sixteen years, pharmaceutical research and development programs walked on shaky foundations.
Science self-corrects, certainly. But what truly destabilizes it is not incorrect theories—it is the contamination of the data upon which all theories are built.
Today, this risk has a new dimension: the tools to fabricate credible data are no longer the preserve of a few experts. They are widespread, accessible, powerful.

The problem is not AI that thinks, but AI that “observes”

When we talk about artificial intelligence, it's easy to slip into science fiction: machines surpassing human intuition or discovering laws of nature beyond our comprehension.
The real risk is more subtle.
Modern generative AI models are not limited to text or images. They can produce complete datasets, instrumental signals, satellite images, spectroscopic traces, and audio recordings that can be statistically very hard to tell apart from real ones.
The danger is not that AI becomes a new scientist. It is that it becomes a perfect eyewitness, capable of inventing “evidence” that seems authentic.

The Alzheimer’s legacy and the ghost of “Cold Fusion”

The history of science is full of wrong turns born from bold but sincere hypotheses. The 2006 Alzheimer's case is different: it is a dead end created by manipulated data. An example of how an error at the foundation can divert an entire field of research.
Now imagine the same dynamic with tools capable of generating data so perfect that they seem real. MRI datasets, radio astronomy signals, microscopic images… produced by generative models.
How many years of work could be wasted chasing shadows?

The “SETI problem” for everyone

An article from the SETI Institute offers a telling example: the day we receive a possible extraterrestrial signal, the scientific community's first reaction will not be celebration, but suspicion. Interference? Instrumental error? Human-made?
Today we must add a new question: was it generated by a model?
This concern is not limited to astronomy. Every discipline must now confront the same question:
- Is that climate data real or an artifact?
- Is that astronomical image captured or synthetic?
- Was that genetic sequence measured or “imagined”?
If we cannot demonstrate the chain of custody of a data point—from the observed phenomenon to the published result—science does not merely slip into error. It dissolves into doubt.

A possible path: traceability and transparency

A possible response is already being tested in areas related to scientific research: immutable recording systems to certify the origin of data.
When an instrument measures a phenomenon, the information can be anchored to a timestamp, a location, and a specific chain of verifiable instruments and processes. Once recorded, that trace cannot be altered without leaving evidence.
This does not guarantee the “truth” of the data in a philosophical sense. But it can show that the data were not altered after acquisition, because any later change becomes detectable.
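To make the idea concrete, here is a minimal sketch of a tamper-evident log built as a hash chain. It is an illustration, not a description of any system mentioned in the SETI article: the function names (`append_record`, `verify_log`), the field names, and the sample instrument and location values are all hypothetical. Each record stores a fingerprint of the raw data together with a timestamp, a location, and an instrument identifier, and it also stores the hash of the previous record, so changing an old entry breaks every link after it.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first record


def record_hash(body: dict) -> str:
    """Return the SHA-256 hex digest of a canonical JSON encoding of the record."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def append_record(log: list, data: bytes, timestamp: str,
                  location: str, instrument: str) -> dict:
    """Append a provenance record that is chained to the previous one."""
    body = {
        "data_sha256": hashlib.sha256(data).hexdigest(),  # fingerprint of raw data
        "timestamp": timestamp,
        "location": location,
        "instrument": instrument,
        "prev_hash": log[-1]["hash"] if log else GENESIS,
    }
    entry = dict(body, hash=record_hash(body))
    log.append(entry)
    return entry


def verify_log(log: list) -> bool:
    """Recompute every hash and check that each record links to the one before."""
    prev = GENESIS
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if body["prev_hash"] != prev or record_hash(body) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True


# Hypothetical example values, for illustration only.
log = []
append_record(log, b"raw spectrum 1", "2026-03-03T10:44:00Z", "site-A", "spectrometer-1")
append_record(log, b"raw spectrum 2", "2026-03-03T10:45:00Z", "site-A", "spectrometer-1")
print(verify_log(log))   # True

log[0]["timestamp"] = "2025-01-01T00:00:00Z"  # tamper with an old record
print(verify_log(log))   # False
```

A chain like this only makes tampering detectable to someone who holds a trusted copy of the latest hash. A real deployment would presumably also need digital signatures from the instrument and some independent place to publish that hash, which this sketch leaves out.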

Beyond truth, toward protected objectivity

Science does not seek absolute Truth. It seeks objectivity: ensuring that conclusions follow the data, that hypotheses yield to measurements, that evidence disciplines imagination.
But objectivity begins at the source. At the moment a photon hits a sensor and that signal becomes data.
If we lose the ability to certify that moment, we lose the very foundation of the method.
The question is not whether science will adopt tools to verify data provenance. It is when.
Because the alternative is a world where every discovery can be dismissed with a sigh:
“I wonder if it was generated by an AI?”
And that systematic doubt would be an enormous price to pay for knowledge.

My opinion

The central point, in my view, is this: we should not fear that AI thinks better than us. We should worry that it becomes a perfect witness, capable of making reality and fiction indistinguishable.
The technology that creates the problem can also contribute to the solution. Tools to track and certify data provenance are not a fad, but a possible defense of scientific objectivity.
In the coming years, we will witness a race: on one side, AI's ability to generate ever more credible evidence; on the other, the scientific community's ability to secure the authenticity of its observations.
The future of knowledge may depend on the balance between these two forces.

Source and inspiration

Article published by the SETI Institute: “Guarding the Source: Why the Future of Science Depends on Proven Data”

Click here to go to the SETI article