In May 2024, Google DeepMind released AlphaFold 3, a tool that uses an artificial intelligence (AI) model to predict how proteins are shaped and how they might interact with each other and with DNA, RNA, and other biomolecules. Nobel laureates John Jumper and Demis Hassabis built the new model on DeepMind’s previous versions of the tool, AlphaFold and AlphaFold 2. Both of those models were released open source, that is, with their code and inner workings open and transparent to all.
AlphaFold 3 was different: its senior authors didn’t release the full code when they published their findings in Nature. Scientists who wished to probe deeper couldn’t tell exactly how the model worked. They also couldn’t make full use of AlphaFold 3’s new abilities because its protein-drug interaction simulator wasn’t fully accessible.
Google had a reason to withhold information in the paper. A DeepMind spinoff company called Isomorphic Labs was using AlphaFold 3 to develop its own drugs.
“We have to strike a balance between making sure this is accessible and has an impact in the scientific community as well as not compromising Isomorphic’s ability to pursue commercial drug discovery,” Pushmeet Kohli, DeepMind’s head of AI science and a study co-author, told Nature in a news article earlier this year. But many scientists weren’t convinced, and they signed an open letter saying that publishing the paper without the code prevented scientific efforts to reproduce and verify the original findings.
The controversy brought to the fore a broader conundrum surrounding scientific research today, especially research with commercial potential. Commercialisation is driven by competition and profit, so creators and owners invoke property and patent laws to protect their intellectual property (IP).
The fundamental tension is that IP necessitates secrecy whereas science, historically, isn’t meant to stay behind closed doors. Science progresses when scientists are open and transparent about their work, and when their methods and results are reproducible and falsifiable.
“If you make this fantastic discovery and you’re the only person in the universe who can do it, nobody cares. It’s not helpful for mankind,” Benjamin Haibe-Kains, a professor using AI to study cancer at the University of Toronto, said. He openly advocates for scientists to be more open with their software and data when they publish papers based on AI. “How can you advance science if you keep everything closed source? Nobody can see your data. Nobody can see the algorithm. Nobody can see the model, right?
“As a scientist, there is fundamentally a major conflict between doing things in secret versus advancing science. Those things are incompatible,” he added.
Then again, hospitals, research institutes, and universities also need money to operate and hence bank on commercialisation for revenue. “Universities and research institutions are putting us [academics] in a very, very tricky spot,” he said. “They actually want us to patent so that we can generate revenue and sustain this research enterprise.”
How can scientists walk the line between guarding their trade secrets in the current economy and advocating for transparency and reproducibility?
One option Haibe-Kains suggested, especially for computational scientists, is to publish all the code and details of any algorithm they are working on but hold on to a premium, ready-to-use version of the software that could be commercialised. With the help of software engineers in his lab, he brings the software to a level that’s accessible to a broader group of people, and then sells it.
“Most of the discoveries have been disclosed already; it’s just the packaging that I’m selling, right?” Haibe-Kains explained. “That’s the way we do it in the lab — we do everything open source at the beginning and if there is commercial potential, we work on an enterprise version that’s more robust and deployable. That added value we keep secret and that’s what we would sell as a product.”
“I can do my mission as a scientist, but I can also commercialise and potentially generate revenue that way,” he added.
Thomas Hemmerling, MD, a professor in the Department of Anesthesiology at the same university, said divulging the basic algorithms while holding back some specific source code is a way to strike a balance between the “black box” that comes with full patent protection and scientific transparency.
He also agreed there is always a risk in such cases that someone else could commercialise the published work. But other scientists will at least be able to understand and potentially replicate the findings.
Hemmerling and his team developed an anaesthesia robot in 2008 that they named “McSleepy” (after Patrick Dempsey’s character Derek “McDreamy” Shepherd, in the popular medical TV drama ‘Grey’s Anatomy’). The robot could autonomously administer drugs to induce general anaesthesia and monitor the effects. The scientists decided to explain the algorithms at work in the robot in detail in their paper.
“Because we described it quite well, certain parts were then put into other automated machines, but they referenced our method. So that’s then basically a matter of scientific integrity,” Hemmerling said. “If you use somebody else’s algorithm, you should at least quote them and say, ‘that’s based on that machine or on that technology or that finding’.”
But not all scientists have access to large amounts of public funding, which can affect their inclination to be fully open about research that can be patented. Hemmerling said that, depending on researchers’ financial needs, the closer they are to a commercial product, the fewer details they’d feel comfortable divulging in their paper.
Collaborations with smaller start-ups or large corporations help some researchers get more money for their science. “These [large corporations] will fund your research, so you can move the research forward but on the other hand, they will obviously tighten your [research] much more into some kind of IP protection, probably more than you want to.”
That’s the dilemma in front of many researchers around the world.
Some scientists strike deals with companies: they study and develop a product the way the company wants. In exchange, the company gives their lab unrestricted funds to pursue other avenues of research (in which the company has no say).
“All over the world, there’s very little governmental funding to do research,” Hemmerling said. “So researchers need to find creative ways to find funding.”
More government funding is a way to circumvent the conflict between patented and open science, according to Hemmerling. “At the end of the day, it gives you a different head start. Whenever I have governmental funding, it has secured me funding for a certain time. I don’t have to declare a conflict of interest. Science is just… science — you innovate and you’re free to be creative, you’re free to develop anything you want. Whereas if you have company funding, it might limit you to develop certain areas because the company might have a conflicting interest.”
The government can also subsidise the costs of products made by companies such that the latter can still hold on to their IP even as the products are available for sale at a lower price. This is what happened with the COVID-19 vaccines made by Moderna and Pfizer.
But according to Haibe-Kains, even with more public funding, universities will still want to continue commercialising some research. “I think it’s human nature. If you think you’re doing amazing research and you see those industries generating billions of dollars in revenue, you cannot stop universities thinking ‘oh, maybe I should generate revenue on my own stuff,’ right?”
He believes additional funding will help academic researchers breathe a little easier and invest in doing science the right way: by being as open as possible. “It’s more a matter of creating the right paradigm, so that there is a healthy environment for researchers to do the right thing,” Haibe-Kains said. “But also, there is a path to commercialisation so that we can generate revenue.”
For researchers working in a company, however, the primary objective is likelier to be generating revenue than advancing science, according to Haibe-Kains. Yet he also said it was unfair that big companies can sometimes blur the lines between industry and academia to their advantage, for instance by using academic channels like journals to advertise their science while withholding most of the data.
Thus, to him, the manner of AlphaFold 3’s release exposed a deep misalignment of incentives between researchers, journals, and the industry.
Responding to criticism from the academic community, the senior authors of the AlphaFold 3 paper said they would publish the code within six months, and they did so in early November.
Haibe-Kains said publishing the paper first and fixing it six months later by releasing the full code is still a problematic move. “But look, at the end of the day, it’s a good thing they published the code out there.”
Rohini Subrahmanyam is a freelance journalist in Bengaluru.
Published - December 05, 2024 05:30 am IST