AI Won’t Transform Science Until the Data Does
The newly launched Genesis Mission can accelerate science — but only if it unleashes the data AI depends on.
This week marks the launch of the Genesis Mission, the U.S. Department of Energy’s ambitious new initiative to transform scientific discovery by uniting advanced AI, exascale computing, and the nation’s most powerful scientific instruments into a single integrated research engine. Led by Darío Gil, Genesis represents a generational investment in accelerating how science is done — not by incrementally improving existing workflows, but by fundamentally rethinking the relationship between computation, instrumentation, and scientific insight.
In an editorial published in Science to accompany the launch, Gil and Stanford’s Kathryn Moler articulate the mission’s deeper premise: AI can become a true partner in scientific discovery only if the results it produces, and the data it relies on, are fully transparent and verifiable. Scientific progress in the Genesis era will depend not only on more powerful models but also on the integrity of the data that flows through them.
That’s the crux of this moment. Genesis promises a new engine for discovery. But engines don’t run without fuel. And today, the fuel — our scientific data — is too often fragmented, inaccessible, inconsistently documented, or locked away entirely.
The success of Genesis hinges on something deceptively simple:
We must unleash the data that AI depends on.
Only then can AI accelerate science responsibly, reliably, and at the pace the mission envisions.
And this is where a deeper issue comes into focus: while scientific computing has leapt forward, scientific data — the substrate on which all these capabilities depend — has not kept pace.
Ask any researcher and the story is familiar. Most data never leaves the lab where it was generated. The data that does get shared often arrives in partial or inconsistent form, without the metadata needed to understand how it was produced. Formats vary wildly. Context is lost. Provenance is unclear. Reproducibility too often remains aspirational. These gaps may be survivable for human readers, who are skilled at making interpretive leaps. They are far less survivable for AI systems, which require structure, clarity, and completeness to reason effectively.
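To make the idea of “metadata needed to understand how data was produced” concrete, here is a minimal sketch in Python of a machine-readable dataset record, loosely modeled on the schema.org Dataset vocabulary. Every identifier, field value, and the choice of required fields below is an illustrative assumption, not a mandated standard or any particular platform’s schema.

```python
import json

# Illustrative metadata record for a shared dataset, loosely based on the
# schema.org "Dataset" vocabulary. The DOI, URLs, and instrument details
# are hypothetical placeholders.
record = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Example spectroscopy run 042",           # human-readable title
    "identifier": "https://doi.org/10.xxxx/example",  # placeholder DOI
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "variableMeasured": ["wavelength_nm", "absorbance"],
    "measurementTechnique": "UV-Vis spectroscopy",    # how it was produced
    "dateCreated": "2025-11-24",
    "creator": {"@type": "Person", "name": "A. Researcher"},
    "distribution": {
        "@type": "DataDownload",
        "encodingFormat": "text/csv",
        "contentUrl": "https://example.org/data/run042.csv",
    },
}

# An assumed minimum an automated agent would need before it can locate,
# legally reuse, and correctly interpret the data.
REQUIRED = ["name", "identifier", "license", "measurementTechnique", "creator"]

def missing_fields(rec: dict) -> list[str]:
    """Return the required descriptive fields absent from a metadata record."""
    return [key for key in REQUIRED if key not in rec]

print(json.dumps(record, indent=2))
print("missing fields:", missing_fields(record))
```

A record like this travels with the files themselves, so both a human reader and a machine agent can check provenance, license, and measurement context before drawing any conclusions from the numbers.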
This tension becomes even more apparent when we look at the emergence of AI “scientists.” These systems are learning faster than any human ever could, absorbing the literature at scale, synthesizing results across disciplines, and beginning to propose hypotheses and experimental pathways. Yet they are learning primarily from assertions in papers: conclusions, summaries, and narrative descriptions of results rather than the underlying datasets themselves.
They are reasoning from what scientists say about their data — not from the data.
For AI systems capable of designing proteins, predicting materials, or coordinating complex experiments, this is a profound limitation. It means we are building powerful scientific intelligence atop a partial and uneven foundation. And no matter how sophisticated the models become, they cannot transcend the constraints of the evidence they are trained on.
This is why the Genesis Mission represents more than an investment in AI. It is a recognition that modern scientific infrastructure must include a robust, transparent data layer — one capable of supporting human and machine intelligence alike. The call Gil and Moler make is not abstract. It is an articulation of what science must become if it is to take full advantage of the tools now emerging.
As Founder & CEO of Senscience, I see this shift as an inflection point. Scientific infrastructure used to be defined by journals, repositories, instruments, and compute clusters. In the age of AI-powered discovery, the definition must expand. We need a foundation that transforms raw research outputs into transparent, documented, interoperable, machine-actionable assets — not years after publication, but as part of the scientific workflow itself.
That belief is what led us to build the AI infrastructure behind Frontiers FAIR² Data Management. FAIR² is designed not as another repository, but as a true data foundation: a platform that prepares, certifies, and publishes research outputs in ways that make them verifiable and computable. It automates metadata and compliance through Clara, our AI Data Steward; it provides interactive portals where humans and AI can explore data directly; it assigns credit through peer-reviewed Data Articles; and it ensures long-term preservation so that scientific contributions are never lost to time.
And importantly, it exists today. Researchers are already using it to make their data transparent, interpretable, and AI-ready.
Genesis sets the ambition. Gil and Moler articulate the requirements. FAIR² demonstrates that the necessary infrastructure is not hypothetical — it is available, operational, and ready to support the next stage of scientific acceleration.
We are entering a decade in which AI will shape discovery in ways that were unimaginable even a few years ago. But if we want that acceleration to be trustworthy, reproducible, and worthy of public confidence, we must build the data foundation that allows AI to reason from evidence, not merely from summaries of evidence.
AI will transform science. But only if the data transforms first.
And that transformation has already begun.