Thirteen genetic sequences — isolated from people with COVID-19 infections in the early days of the pandemic in China — were mysteriously deleted from an online database last year but have now been recovered.
Jesse Bloom, a computational biologist and specialist in viral evolution at the Fred Hutchinson Cancer Research Center in Seattle, found that the sequences had been removed from an online database at the request of scientists in Wuhan, China. But with some internet sleuthing, he was able to recover copies of the data stored on Google Cloud.
The sequences don’t fundamentally change scientists’ understanding of the origins of COVID-19 — including the fraught question of whether the coronavirus spread naturally from animals to people or escaped in a laboratory accident. But their deletion adds to concerns that secrecy from the Chinese government has obstructed international efforts to understand how COVID-19 emerged.
Bloom’s results were published in a preprint paper, not yet peer-reviewed by other scientists, released on Tuesday. “I think it’s certainly consistent with an attempt to hide the sequences,” he told BuzzFeed News.
Bloom learned about the deleted data after reading a paper from a team led by Carlos Farkas at the University of Manitoba in Canada about some of the earliest genetic sequences of SARS-CoV-2. Farkas’s paper described sequences sampled from hospital outpatients in a project by researchers in Wuhan who were developing diagnostic tests for the virus. But when Bloom tried to download the sequences from the Sequence Read Archive (SRA), an online database run by the US National Institutes of Health, he received error messages indicating they had been removed.
Bloom realized that copies of SRA data are also maintained on servers run by Google, and was able to puzzle out the URLs where the missing sequences could be found in the cloud. In this way, he recovered 13 genetic sequences that may help answer questions about how the coronavirus evolved and where it came from.
Bloom found that the deleted sequences, like others collected at later dates outside the city, were more similar to bat coronaviruses (presumed to be the ultimate ancestors of the virus that causes COVID-19) than were sequences linked to the Huanan Seafood Market in Wuhan. This adds to earlier suggestions that the seafood market may have been an early victim of COVID-19, rather than the place where the coronavirus first jumped from animals into people.
“This is a very interesting study performed by Dr. Bloom, and in my opinion the analysis is totally correct,” Farkas told BuzzFeed News by email. Scott Gottlieb, formerly head of the Food and Drug Administration, also praised the findings on Twitter.