The Perils of Open Science

Originally published on June 9, 2021 on Medium

///

We read several newspapers every morning. Like many other people, we get notifications in our emails and then click on articles that catch our attention. That wasn’t any different recently when we clicked on what sounded like some good news. Several hours later, however, we found out that the news wasn’t nearly as good as it initially seemed. Here is what happened: The good news started out with a headline in the New York Times: “Covid-19 Live Updates: Study Finds AstraZeneca Shots Drastically Cut Transmission.” The article claimed that “[t]he University of Oxford research is the first to document a vaccine substantially reducing the virus’s spread. British officials said the results underscored how mass inoculations were key to ending the pandemic.” Several hours later, the New York Times corrected the headline to “[t]he AstraZeneca vaccine may slow transmission of the virus.” So much for ending the pandemic.

Within several hours, we went from a vaccine that drastically cuts transmission to one that merely has the potential to slow it. This was yet another example where the media jumped too quickly and reported a finding from a preprint that, in this case, was not even claimed in the preprint. Sure, the preprint had some positive news, namely that “overall cases of any PCR+ were reduced by 67% (95%CI 49%, 78%) after a single SD vaccine suggesting the potential for a substantial reduction in transmission.” But the authors of the preprint left it at that and did not make any stronger claims. The strong and assertive claim that the vaccine “drastically cut[s] transmission” crept into the narrative somewhere along the line of a NYT journalist getting their hands on a preprint, translating the science into plain English for their readers, and (likely) another NYT staff member putting a catchy headline on top of the article to make sure the article would attract the readers’ attention.

In the current world of social media, what constitutes a fact is up for debate. Social media is very effective in promoting information of any kind, with little to no quality control over the accuracy or understanding of the practical impact of that information. In this specific case, we could blame the hyping of findings on the current model of digital advertising with its metric of counting clicks. Being first out of the gate counts more than ever, and there is a receptive pandemic audience that is glued to the news and eager for any glimmer of hope that would end the pandemic. The media is quick to grab and disseminate any findings that might give some hope, and, because of the need to be first, journalists sometimes spend little time on carefully putting together news articles. In addition, those who add headlines have even less time and need to come up with something that grabs the readers’ attention.

But, as the saying goes, it takes two to tango. The media is one party involved in this. The research community is the other party. Speed of publishing research findings is also of the essence in the fast-moving research world. Researchers put out preprints for anyone to grab with an internet connection as soon as they finish typing the paper. This gets findings in the hands of the media before the research community has a chance to vet the results, which leaves the door open for publishing results that may not hold up under further scrutiny.

Not holding up under further scrutiny may happen to the preprint of the AstraZeneca study as well. A preprint may not make it through the editorial review for numerous reasons, and if it passes this hurdle, it still has to go through peer review. It remains to be seen which findings in the AstraZeneca study preprint hold up over time, especially since the preprint included some puzzling findings for the 2-dose regimen: “Overall reduction in any PCR+ was 54.1% (44.7%, 61.9%), indicating the potential for a reduction of transmission with a regimen of two SDs.” Having a higher reduction after one dose than after two doses raises questions. The authors did not attempt to reconcile the 1-dose and 2-dose results, and a resolution of this observation may only happen in further studies.

As a research community, it is imperative that we make sure that the public receives accurate yet timely information. It is still not clear, however, how to communicate preliminary results so that they are not elevated to facts before there is sufficient consensus among experts. But first, how did we end up with such a rapid news cycle?

The open science movement

Today, it is very easy to rapidly disseminate research findings to the public, well before they are vetted during the editorial/peer review process. Authors tweet their findings on social media with links to their preprints that anyone can access on open archives and media can grab and spread. These archives are part of the Open Science movement with the goal to accelerate science and provide more transparency and access to scientific findings by making preprints openly available prior to peer review.

The Open Science movement goes well beyond the convenience of accessing preprints from one’s computer at home or in the office. As stated by UNESCO on its Global Open Access Portal, the Open Science movement “foster[s] the development and implementation of scientific research communication strategies that are inclusive, effective, and conducive to scientific collaboration and discovery across scientific fields.”

This level of openness and transparency is a relatively new phenomenon. In fact, for centuries and until a few decades ago, academia acted like a medieval guild, even long after guilds were abandoned, with its tight control over the knowledge that was created by its members. Findings were only shared in preprint form among small groups of experts in the field and did not appear in scientific journals for others to see until after peers weighed in and gave their stamp of approval during the review process. Even then, publications were only accessible through journal subscriptions or academic libraries, which made ready access for the public nearly impossible.

This all started to change in the early 1990s when researchers could post articles in archives on the web and access them directly from their computers. The first such archive was arXiv, which still exists today and serves as a repository of preprints in several primarily quantitative fields. Today, there are many such archives available across all fields of inquiry. Before archives were available, we used to mail and, after email became available, email preprints to colleagues working on the same or similar topics, especially in fields where it might have taken many months before an article was reviewed and appeared in print. These archives made sharing a lot easier, and so quickly became popular among academic researchers.

Archives for medical and health-related preprints took a lot longer to become available, in part because of the risks of sharing non-peer reviewed research with clinical implications with the public, risks that we have all now become familiar with. It is for this reason that medRxiv, one such preprint server for the health sciences that was established in 2019 by the Cold Spring Harbor Laboratory, carries a clear warning on its front page: “Caution: Preprints are preliminary reports of work that have not been certified by peer review. They should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.” A similar warning accompanies every preprint.

The pandemic led to uncontrolled spread of “facts”

The preprint archives are great for those who are in the field and want to build quickly on findings of others. They are not so great for those who want to write about findings to a lay audience but don’t have the technical knowledge to evaluate the findings for what they say and more importantly, what they don’t say. Until the pandemic, this wasn’t much of a problem. The public rarely took notice of findings in any of these preprints, and if a finding attracted attention, it was rarely time sensitive. This isn’t surprising: Most academic research falls under basic research, which adds to the body of fundamental knowledge of the discipline but rarely has immediate impact on daily lives.

The pandemic changed all this. Not only has the pandemic produced a torrent of publications that are made publicly available prior to peer review, the urgency to end the pandemic has shone a bright light on any findings that could impact the course of the pandemic, regardless of their robustness. Both the news and social media have been quick to grab anything newsworthy, and, because social media connects so many people now, unvetted findings can spread like wildfire before consensus is built within scientific circles about whether the findings hold up.

Moreover, disagreements along partisan lines about the severity of the pandemic and how to handle it have led politicians and regular people to cherry-pick findings that would lend support, however meager, to each of the camps’ positions, and to turn findings with little or no support into “facts.” Hydroxychloroquine is just one example. Once these “facts” spread, it becomes very difficult to pull them back if they turn out not to have any scientific support. And we have seen worse. Much worse. In addition to unsupported findings from preprints, conspiracy theories have infiltrated our daily discourse, ranging from COVID-19 being a hoax to Bill Gates using the vaccine to implant microchips into everyone. Conspiracy theories have a life of their own and are even harder to pull back.

Living with misinformation and disinformation

The New York Times’ misleading report about AstraZeneca’s vaccine is obviously not in the same league as the conspiracy theories. But by the time the NYT corrected the article, it had already been shared many times. Few people look at corrections that may appear hours or days later, and even fewer share corrections. And so, the first article will much more likely stick in people’s minds than the correction.

False information has always been spread, just not at the scale we are seeing it now. It is therefore unlikely that we will get the spread of false information under control any time soon, if ever, especially since, as Woo reported, “most misinformation emerges from regular people,” and a lot of regular people are on social media now.

Even though, as Woo continues to report, “the vast majority—80 percent—of the misinformation came from just 0.1 percent of users,” we don’t think that banning people from social media sites and improved fact-checking will be sufficient to curtail the spread of false information, regardless of whether people spread it intentionally (disinformation) or unintentionally (misinformation). Even 0.1 percent of users amounts to a very large number of users who spread false information, and many of those who are banned from one platform will simply move to another platform. Whether nefarious intent or simple ignorance, people will spread false information, and, especially if it appeals to our insatiable appetite for sensationalistic news, people will click on it and share it further.

Misinformation or disinformation will stay with us, and all of us will need to learn how to live with it and develop means to assess the trustworthiness of any information that comes to us through the many channels of communication we interact with every day. Trying to determine if information on news and social media is accurate is challenging and can be time consuming. Going back to the original source document may be required to determine whether results are accurately reported. As an example, a recent media report about mask use incorrectly claimed that a CDC report concluded that the requirement of “[…] wearing face masks [does] not make any statistical difference.”

Gaining control over information

The scientific community has limited ways to control what the people or the media pick up and how it is disseminated. We do not want to go back to the time before preprint archives were standard and tightly control what the public can hear. The openness and transparency of research accelerates the dissemination of new findings, makes access to research findings more equitable, and allows the scientific community and politicians to work together on managing crises when time is of the essence.

However, the scientific community can and must play an important role in helping the media and the public assess the trustworthiness of findings and understand their potential impact. We need to learn how. Communication, quality control, and education are key factors to regaining control over this flood of false information.

A good place to start would be for every preprint to be accompanied by a statement like the one in medRxiv we mentioned earlier to indicate that “[p]reprints are preliminary reports of work that have not been certified by peer review.” If the findings have the potential to affect clinical practice, the statement should also include that the findings “should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.” This immediately tells the reader that the information is preliminary and should not be interpreted as absolute truth. But much more is needed.

Preprint peer review

When there is a health emergency, like the current pandemic, nobody wants to wait for the long editorial and peer review process until a research article is published in an academic journal. A fast pre-review is needed that screens preprints. Some archives already do this before posting them. For instance, medRxiv’s “screening resulted in a 31% denial rate,” according to a recent article on JAMA Network by Krumholz and co-authors. (The authors of this study were not able to determine the reasons for the denials since that data was not collected by the archive.) Screening, however, is not peer review. MedRxiv screening, for instance, only checks for “offensive and/or non-scientific content, […] material that might pose a health risk” and plagiarism.

Currently, there is no reward for participating in a standard peer review process, let alone in the quick review preprints would need once they are posted on an archive. This shows in the lack of comments on preprints that are published in archives that allow comments. For instance, the article by Krumholz and his co-authors reported that only 9% of preprints received comments on the medRxiv site.

Unless we start rewarding researchers for making meaningful comments, it is unlikely that preprints can receive the quick scrutiny they would need to allow non-experts to judge the credibility of the findings. There is no consensus on how to reward reviewers. We suggested some time ago in Peer Review and Public Trust ways to improve the peer review system, including having reviews “count toward promotion and tenure for the reviewer and [making it] part of the annual merit review.” Some have suggested paying reviewers. This is not without controversy, as discussed in The $450 question: Should journals pay peer reviewers?

The sheer volume and the urgency for a speedy review of preprints uploaded to a preprint archive necessitate a different approach. We suggest a two-stage approach, namely a professional pre-review prior to the standard peer review, which would still be necessary to certify the quality of the paper. For the pre-review, we propose the establishment of paid professional editorial boards where groups of diverse researchers with knowledge in the field of the preprint but with no conflicts of interest act as reviewers. The names of people on the editorial boards would be public and reviews would be signed by the entire editorial board. Paying preprint pre-reviewers would acknowledge the significant effort it would take to review the large number of preprints. Having the names of the members of the editorial board public would provide transparency of who might be a reviewer, and having the entire editorial board sign reviews would make it more difficult to pin a review on a specific person. This type of preprint pre-review would need to be done quickly, hence our suggestion of creating a professional group that is paid to do the pre-reviews. New certificate programs to qualify such preprint pre-reviewers might also help create a new job class and employ more scientists in useful work outside of the laboratory.

Significance statements

Some journals, like the Proceedings of the National Academy of Sciences (PNAS), include significance statements in publications that summarize the findings in accessible language and explain what the research says and what it doesn’t say. These are short statements. PNAS limits the statement to 120 words and asks authors to “[e]xplain the significance of the research at a level understandable to an undergraduate-educated scientist outside their field of specialty.” This is a start, but many of the significance statements still read more like technical abstracts of a paper than statements that communicate the significance to a lay person. We recommend that all preprint archives and journals ask authors to write statements that explain the findings and the significance at a level that is accessible to a lay audience. Instead of aiming at an “undergraduate-educated scientist outside their field of specialty,” these statements should be aimed at someone who went to college, but not necessarily in a science field.

Significance statements help researchers make their research more accessible and may allow them to reach a larger audience, which could positively affect the impact their findings make. Impact is important to researchers as it is a metric that is used in promotion and tenure, for national awards, and in the annual merit review. While impact is traditionally measured by the number of citations and the impact factor of the journals they publish in, alternative metrics, such as Altmetric, that track “[w]ho’s talking about your research?” and who is an “influencer” have been added more recently. These metrics may need to be reevaluated, however, in light of the media’s business strategy to get as many clicks as possible on a particular item. This creates incentives for individuals to “market” their ideas in ways that attract attention, whether the information is accurate or not.

We also suggest that the editorial board of preprint pre-reviewers described above include science communicators/writers to make sure that these significance statements are understood by the general public and are not too “hyped” in nature. Including these statements already in preprints and having them reviewed prior to posting the preprints on archives would increase the likelihood that the lay press can report findings more accurately.

Trustworthy academic journals and the role of today’s university libraries

While preprints became more readily available starting in the 1990s, it wasn’t until 2009 that the NIH pushed for making research results available to the public with the Public Access Policy that requires that “final peer-reviewed manuscripts upon acceptance for publication, [..] be made publicly available no later than 12 months after the official date of publication.” Academic journals responded by allowing open access in exchange for a fee charged to authors. Unfortunately, this has also spawned a new industry of so-called predatory journals that promise open access and fast publication, often with only scant peer review, but with a hefty fee to get the paper published.

It would be very difficult for a lay person to tell whether a scientific journal is reputable. This is even at times difficult for an academic researcher. The journal Nature recently reported on a study that showed that about 3% of studies that were indexed on Scopus (a widely used academic database) over a period of three years were published in so-called predatory journals. Having predatory journals indexed in an academic database can “mislead scientists and pollute the scientific literature.”

Scholarly organizations, like the Open Access Scholarly Publishers Association (OASPA), and databases, like the Directory of Open Access Journals (DOAJ), are trying to address this. For instance, OASPA members must abide by a code of conduct and have a rigorous peer review process. DOAJ awards the DOAJ Seal “to journals that demonstrate best practice in open access publishing.” Journals that are members of OASPA, DOAJ, or other such organizations, e.g., the Committee on Publication Ethics (COPE), have attested to the fact that they abide by these organizations’ principles and are therefore not likely to be predatory in nature. One can also ask whether colleagues have heard of the journal and know people on the editorial board to be scientists of high repute, and whether the journal has a clear peer review process and an impact factor that is in line with good journals in its discipline.

Libraries used to be the pride and center of an academic institution. Today, hardly any faculty spends time in a university library as much of what has been published is now available online. Few faculty talk to their librarians about disseminating research. We urge them to do so. Librarians are experts in managing knowledge acquisition, curation, and dissemination. They can prevent faculty from falling into the trap of predatory journals and help them disseminate their findings and the data that support the findings. They can also be a resource for tenure and promotion and merit committees when they review colleagues’ publications in journals they are not familiar with.

Education

Professional societies can play an important role in promoting quality control through workshops at their annual or regional conferences to make researchers aware of how findings can be appropriated and misused. They can also offer workshops where researchers can learn from science communicators how to bridge the gap between talking to a technical audience and a lay audience.

Universities need to do their part to educate the next generation of scholars and provide the information to their current researchers. For more than a decade, it has been standard to require ethics courses for students in graduate programs. These courses cover topics such as human subjects and animal research, conflict of interest, authorship and publishing, data management, and research misconduct. We need to add discussions on the responsibilities of scientists when they communicate their findings and on the consequences when findings are inappropriately communicated to lay audiences. Universities may also add such discussions to the annual training of current researchers.