Dr. Elisabeth Bik, a science integrity consultant, has been recognized for exposing plagiarism and the manipulation of images and data in more than 4,000 scientific papers. She shares her findings with the public to improve understanding of the importance of research integrity.
As the Co-founder and CEO of Alation, Satyen lives his passion of empowering a curious and rational world by fundamentally improving the way data consumers, creators, and stewards find, understand, and trust data. Industry insiders call him a visionary entrepreneur. Those who meet him call him warm and down-to-earth. His kids call him “Dad.”
Producer: (00:01) Hello and welcome to Data Radicals. In today's episode, Satyen sits down with Dr. Elisabeth Bik. Dr. Bik is an experienced microbiologist whose groundbreaking work in scientific integrity has uncovered more than 4,000 potential cases of improper research conduct. In this episode, she gives a rundown of image manipulation in scientific papers, the impact of AI on scientific integrity, and why paper mills must be stopped. This podcast is brought to you by Alation. We bring relief to a world of garbage in, garbage out with enterprise data solutions that deliver intelligence in, intelligence out. Learn how we fuel success in self-service analytics, data governance and cloud data migration at Alation.com.
Satyen Sangani: (00:57) So today on Data Radicals, we have Dr. Elisabeth Bik. Dr. Elisabeth Bik is a Dutch microbiologist and scientific integrity consultant. Dr. Bik is known for her work in detecting photo manipulation in scientific publications and identifying over 4,000 potential cases of improper research. She is the founder of Microbiome Digest, a blog with daily updates on microbiome research, and the Science Integrity Digest Blog. Dr. Bik was awarded the 2021 John Maddox Prize for outstanding work in exposing widespread threats to research integrity in scientific papers. Dr. Bik, welcome to Data Radicals.
Elisabeth Bik: (01:32) I'm happy to be here.
Satyen Sangani: (01:34) So let's start with the basics of the work because I think a lot of people would hear these words (“image manipulation”) and probably wouldn't know what they mean. So can you describe for us what image manipulation is in the context of scientific papers?
Elisabeth Bik: (01:47) So I actually also detect duplication — that's a better word — but there are three types of image duplications or manipulations that I can find. One would be where the exact same image has been used twice to represent two different experiments. In a way, it's manipulation of the data, right? The second one is where two images overlap with each other, and the third one is the real image manipulation, where within a photo of, let's say, tissue, you would see the same part of the tissue twice, or the same cell twice or three times, or the same protein bands twice in a western blot. So it would be the same as looking at a photo of a dinner party and seeing Uncle John twice in the same photo. You would not expect that, right? Unless he maybe has a twin brother. But in most cases, if you see the same part of the image twice within the same photo, that is done intentionally. It's hard to imagine that that was done by accident.
Elisabeth Bik: (02:44) So those are the types of duplications that I detect. And again, I do want to stress that in some cases, it's an error, but when we're really talking about image manipulation — so the same cell or blot band is visible twice within the same photo — that is almost always done intentionally.
Satyen Sangani: (03:01) And that seems like the most egregious case of the three, although frankly, even the first and the second could be obviously quite damaging. In this third case, is it literally the case that these scientists are possibly taking Photoshop and taking sections of the image and just duplicating them in order to illustrate a phenomenon?
Elisabeth Bik: (03:19) Yes, that's exactly what I think is happening. It's either Photoshop or some other photo editing app. Nowadays, these things can even be done on your cell phone, right? It's so easy to manipulate photos. And we're so used to that, to making ourselves look better, and why wouldn't you make your experiments look better? If your protein band is not quite showing up as you had hoped, if your results are not quite what you expected, I think with how easy it is to Photoshop nowadays, it's very tempting to make your results look better.
Satyen Sangani: (03:49) Interesting. So, I don't have Instagram, but if I did, I would have to use filters pretty widely. And so what you're saying now is scientists basically do the exact same thing.
Elisabeth Bik: (03:58) Sort of. [chuckle]
Satyen Sangani: (04:00) Instagram filters for a scientist.
Elisabeth Bik: (04:01) Exactly.
Satyen Sangani: (04:02) This is a new startup that we have yet to create, I guess, or maybe destroy, hopefully. So you started your career as a microbiologist and now have transitioned into this place where you are effectively on some level policing — although I'm sure you wouldn't wanna describe it that way — the world of scientific research. How did you choose to make this switch, and I guess why did you choose to make the switch?
Elisabeth Bik: (04:24) I was working on image duplication detection when I was still fully employed, so I worked at Stanford at the time. Then I moved to a startup and then to another startup. And there was this one moment where I was at a dinner party and I was describing my regular work, my paid job, and then half an hour later or so I was describing my image detection hobby. And I just realized, "Wait, I sound so much more enthusiastic [chuckle] when I talk about these images that I'm looking for." And I decided I should be doing that full-time because it would obviously give me much more time to look at them. I think I can do so much more for science by detecting these and flagging them as potentially problematic. And also when you're independent, when you don't have a boss who tells you, "Calm down, Elisabeth. Don't rock the boat too much," I think it would give me a lot of freedom to say what I think is a problem.
Elisabeth Bik: (05:18) Obviously, I try to stay polite and objective, but if there's nobody, no boss who tells me not to do things, I think that would free up a lot of my time. So I decided to quit my job. Of course, I had to check if that was financially feasible. I decided to do it at least for a year. And I had a little bit of money saved, so I could do this and see if I could work as a consultant doing exactly this work for publishers or institutions. And it turns out to be sustainable. I can make a living doing this. I have a Patreon account where people can donate money, so that helps a lot as well. And yeah, I can now do this full-time without having to worry about money.
Satyen Sangani: (05:56) What an incredible thing because you started, I presume, because you had significant interest in the topic and wanted to discover things yourself. And on some level, in my mind, this is the truest form of being a data radical. You saw the evidence and you found something that you would prefer to do even more, and in some sense, that furthers the science in a fundamental way. So I've looked at the images that you've detected, and in many cases, I look at them and I can't possibly imagine how I would detect any differences. And you're able to do this with the naked eye. Is this a trained skill, or is this one that you feel like you naturally had genetically? How did you come to this capability?
Elisabeth Bik: (06:35) I do think it's something I have inside of me, some skill. I'm actually very bad at facial recognition, which most people do fine. I'm very bad at that, and so I guess I got another gift in my brain. But I've also always detected things like duplicated tiles — floor tiles or bathroom tiles on the wall. Like that tile is the same as that tile, or that plank on the floor is the same as that plank on the floor, but switched around.
And so I've always seen that, and I'm using this talent now, I guess, to look at scientific images. But obviously, I also have now a lot of experience. In the beginning, I was probably calling out too many of them where I thought they looked similar, but then I might now look at them and say, "Well, I'm not quite sure. The quality of the image, for example, might not be good enough to really know if two images were the same."
Elisabeth Bik: (07:26) And I've also learned... like sometimes now I go to papers I've scanned, let's say, 5 years ago, and I'm finding some other problems in them that I didn't see 5 years ago. So I've built up experience. There's a lot of images that are duplicated that are actually totally fine. They are the same, let's say, the same control experiment for different experiments, so it's fine to replicate or duplicate that particular photo. Sometimes you see these photos that are taken through a microscope with two different labels — let's say a green label and a red label — and then the third image is the merged image. So those images always look very similar because there are different proteins in the same cell, and these photos can look quite similar. But that is totally fine. So I've also had to build up an experience of knowing which duplications were appropriate and which ones weren't.
Satyen Sangani: (08:15) Yeah, which requires also understanding the literature because there's this pattern-matching, recognition capability, but on top of that, you also have to be able to understand what claims the scientist is trying to make.
What is your most rewarding story? So give us some sense for the actual work. What's the most rewarding experience you've had, where you've seen two images or many images that have been duplicated and detected something that's been off or awry?
Elisabeth Bik: (08:38) I guess the paper that was essential in the beta-amyloid hypothesis in Alzheimer's. I wasn't the first one to find problems in that paper. It was a finding by Matthew Schrag, and I was asked for a second opinion, whether I agreed with him. And this was a paper published in 2006 in Nature, and it was an investigation by Science, so I guess two competing journals. And the Science journalist, Charles Piller, contacted me to look at these images, and I found more problems in papers by this particular person.
I went back in time, and I found some additional papers. And it was in a way satisfying, not because I like to bust people, but because it was very clear which person it was. The person he worked for, in whose lab he worked — Karen Ashe — her papers didn't have any problems. So it's satisfying in that you have an idea of who it was and that they did it throughout their career. You could see that these papers had problems in them. That's just one of many cases. There's a bunch of others.
Satyen Sangani: (09:37) Does that happen often — where, for example, the lead investigator within a lab is doing good, high-integrity work and there's somebody within their lab making claims unbeknownst to the leader? Or is it more common that the leader is aware?
Elisabeth Bik: (09:51) It can be both. I wouldn't know exactly what the ratio is. I've seen cases where it's very clear that it's the first author — usually in biology, that would be the younger person, the graduate student or the postdoc — where you can follow the image problems over their papers over the years. But in some other cases, I've seen 50 or 60 or so papers from the same lab with the same last author, so the same senior author, but from different first authors. So the junior persons are different on all of these papers, but the senior author is always the same. I think that's a particular situation where I sort of imagine a bully as a professor who says, "I want these results to look like that. Why are your results not confirming my hypothesis? I can hire another postdoc. I can fire you and get another person." If you're on a visa in the U.S., that might mean you have to leave the country within, I don't know, a month or two months.
Elisabeth Bik: (10:47) I think in a situation like that, you might feel you have to change the results to make them look better, to please the professor. In academia, we're so dependent on the hierarchy, on what the professor thinks of us, because that's the person who would recommend us to new labs, who would write a letter of recommendation. And as a postdoc, you also want to publish papers. And so you can sort of see the internal discussions that people might have and think, "Well, I guess I'll just Photoshop the results and make everybody happy."
Satyen Sangani: (11:19) Yeah. The fact that these dynamics exist — in certain cases a singular bad actor, in other cases an institutionalized culture of deception — it's amazing that that can exist in the academy, where on some level the goal is to find the truth; you'd expect this kind of behavior in the commercial sector, where there's a profit motive. I'm assuming that you've been able to find other people with this capability. Are there any common traits for the individuals who are also detecting images or other false claims? Like, are there skills there that one can cultivate?
Elisabeth Bik: (11:57) Yes, there are many problems one could encounter in a scientific paper. So I look at images, and there's a bunch of other people who do the same. And most of them work anonymously, and not all of them are scientists. Some of them have some free time available, and they use their talents to find these things. And obviously they had to learn which duplications were totally fine and which ones weren't, but they're as good as a molecular biologist at detecting these things. But there could also be all kinds of other problems in a paper. So there are people looking at statistical problems, or DNA sequences that don't make any sense and appear to have been made up, or plagiarism. There's just a lot of things that could go wrong with a paper, some of which are maybe not really science misconduct, but things like animal ethics violations — like a tumor that's too large. I've seen those.
Elisabeth Bik: (12:48) And anybody, I feel, could see that, because there are just guidelines for how big a tumor in a mouse is allowed to be. And you can see the graph goes way over that, and I think that's a big problem. It's maybe not technically science misconduct, but definitely something that I feel a paper should be retracted for. So there are several people. We sort of have a community of people doing this: data detectives or image detectives. And I think what we have in common is a desire to make science better and to flag these papers so that other people can see that there's a potential problem with that paper.
Satyen Sangani: (13:23) Have you looked into leveraging — or have you seen the impact of — AI, because we've obviously all been captivated by ChatGPT, and certainly I have friends that have started computer vision companies even farther back where they were trying to match Gucci handbags with knock-offs, and so this technology has existed for some period of time. How has it impacted your work?
Elisabeth Bik: (13:48) It has impacted it in two ways — one good and one bad, [chuckle] I guess you could summarize it. The good way is that AI can be used to detect these problems. Like you said, you can detect duplications or uniqueness of images by using AI or any type of software — this goes beyond what I understand; it's clearly not my topic — but AI has been used to drive software that detects these images. And you can not only detect duplications within a paper but also across papers, which is something that becomes very hard for humans. You can only store so many images, and you cannot possibly, as a human, compare all images against all other images that have ever been published. But software could do that. So that's one thing. But there's also a bad side to AI, which is that you can use AI to create unique images that have never before been published. And so the software that you might use to detect those things cannot detect them as duplicates. They look unique, and they might look pretty real.
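For listeners curious what this kind of software-assisted detection can look like, here is a minimal sketch — not Dr. Bik's actual tooling, and the folder name is hypothetical — that flags near-duplicate figures with perceptual hashing, using the open-source Pillow and imagehash Python packages:

```python
# Minimal sketch: flag near-duplicate figures via perceptual hashing.
# Assumes the third-party Pillow and imagehash packages are installed
# (pip install Pillow imagehash). "extracted_figures" is a hypothetical
# folder of figure images pulled out of one or more papers.
from pathlib import Path

import imagehash
from PIL import Image


def index_images(folder: str) -> dict:
    """Compute a perceptual hash for every PNG figure in a folder."""
    return {
        path.name: imagehash.phash(Image.open(path))
        for path in sorted(Path(folder).glob("*.png"))
    }


def find_near_duplicates(hashes: dict, max_distance: int = 4):
    """Yield image pairs whose hashes differ by only a few bits.

    A small Hamming distance means two figures are visually nearly
    identical even after recompression or resizing -- a cue for a
    human to compare them by eye, not proof of misconduct.
    """
    names = sorted(hashes)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            distance = hashes[a] - hashes[b]  # Hamming distance in bits
            if distance <= max_distance:
                yield a, b, distance


if __name__ == "__main__":
    for a, b, d in find_near_duplicates(index_images("extracted_figures")):
        print(f"possible duplicate: {a} <-> {b} (distance {d})")
```

Note that a perceptual hash mainly catches reuse of whole images across figures or papers; spotting a cloned region inside a single photo — the duplicated cells Dr. Bik describes — calls for different techniques, such as keypoint or patch matching.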
Elisabeth Bik: (14:51) I'm not quite sure if the technology is already so far that it can create, let's say, a realistic looking mouse or realistic looking plants or images of western blots. But I think with western blots, we've seen examples where, let's say, hundreds of papers all have very similar blots that appear to be composed of different bands that are sort of put together in a photo, and I think... I'm not sure if AI is driving that or some other software, but definitely that's a computer-generated image.
And so it can be used for good things, but it can also be used to generate false data. I'm very worried about that. I'm not worried about textual things like ChatGPT. That's not really falsifying the data; it's just writing it in a slightly better way, maybe. But creating fake data using AI — data that looks realistic enough, with enough biological variation to make it look real — I'm very worried about that, because we cannot distinguish it from real data anymore.
Satyen Sangani: (15:46) It's funny. I've heard from professors who have seen ChatGPT being leveraged to basically write students' essays — they'll generate the first draft with it and then revise it to make it seem just different enough. It does seem like a double-edged sword. How do you see this playing out? You've obviously watched the developments with AI in a field where you're both using it to create content and to detect content validity. What's your sense for the evolution of how this moves forward?
Elisabeth Bik: (16:14) Well, I think the technology of AI is going to improve very, very rapidly, and I'm very worried that, let's say, next year we won't be able to tell a fake image from a real image anymore. But maybe AI can also be used to detect AI-generated images or photos, and maybe we need to go even farther back and ask authors to prove that an image is original, that it is not computer-generated.
Let's say you take an image with a microscope. There are probably traces the microscope leaves in the metadata — something like, "This file was created at this time with an Olympus microscope at that resolution." And that data is sort of embedded in the photo. And you could probably also fake that, I'm sure, but maybe you have to go as far as to ask the authors to provide original images taken from the microscope, or from whatever detector they used to make a photo. I'm not sure. I'm just very worried that fraudsters can circumvent that in all kinds of ways.
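As a concrete illustration of the embedded metadata Dr. Bik mentions, here is a minimal sketch, assuming the Pillow Python package; the file name is hypothetical, and, as she notes, such tags can themselves be edited, so they are supporting evidence rather than proof of originality:

```python
# Minimal sketch: read the EXIF tags a camera or imaging system embeds
# in an image file (e.g., instrument model, creation time, resolution).
# Assumes the Pillow package; "blot_original.tif" is a hypothetical file.
from PIL import Image
from PIL.ExifTags import TAGS


def read_embedded_metadata(path: str) -> dict:
    """Return the human-readable EXIF tags stored inside an image."""
    exif = Image.open(path).getexif()
    return {TAGS.get(tag_id, tag_id): value for tag_id, value in exif.items()}


if __name__ == "__main__":
    for tag, value in read_embedded_metadata("blot_original.tif").items():
        print(f"{tag}: {value}")  # e.g., Model, DateTime, XResolution
```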
Satyen Sangani: (17:12) Yeah, it's a potentially interesting application of blockchain or crypto technology: capturing the original source material and then validating that it's real through the chain of custody. So it's a super interesting idea. Just switching gears a little bit, I'd love to talk about this topic of scientific integrity more generally. And I think you have a very interesting reference case study. Can you tell us a little bit about paper mills — what they are and what happened in that story?
Elisabeth Bik: (17:41) Paper mills are companies that produce fake papers. And there are different models, but they sell these fake papers to authors who need to be an author on a paper for their career. So they're very active in countries that have national requirements set by the government. For example, in China, you have to have published a scientific paper when you're done with medical school and you want to become a doctor in a clinical hospital. And these folks are not necessarily interested in research. They're interested in treating patients. And research is very different from treating patients, and they don't have time to do research. They don't work in a research hospital. There's no animal facility, nothing to run a DNA gel, things like that. And so they have to have a paper published. And it's an impossible requirement for them.
Elisabeth Bik: (18:31) So apparently in China but also in Russia, Iran — a lot of countries where these requirements are very strong — there are advertisements on social media where you can just click and say, "Buy one of our papers. You need a paper in neurology? Here, $5,000 and we put your name on the paper." These advertisements are rampant, and the volume of papers they produce is massive. There are probably thousands and thousands of these papers being published, and we've found lots of them that look very similar to each other. For example, the titles are very similar. They were all in one particular field, the non-coding RNA field, but maybe they're active in other parts of the literature as well; maybe we just haven't been able to discover them yet. But in the field of non-coding RNA, we've discovered thousands of these papers that look fake. In some cases, you can tell they're fake because the images just look very non-realistic. But as we are finding these examples, the paper mills are getting better, too. And they're getting better at making the data look real.
Elisabeth Bik: (19:31) And so you can only detect it when they make errors. Let's say they have a paper on prostate cancer and previously they wrote a paper about gastric cancer and now they have basically the same paper, the same template, some slight changes to the figures, but they change it to prostate cancer but they leave in the women. So they say, "Oh, 50% of our patients were women," in prostate cancer. Well, that's unexpected. And so sometimes they make these errors and you can tell it's fake and you can see that all these papers have a very similar layout, but it's very hard to prove that they're fake. And there's many papers where I just know it's fake, but I have no proof, so I cannot really post it.
Satyen Sangani: (20:08) My summary is: all of a sudden there's this set of individuals who decide they wanna make a scientific claim. They on some level build a hypothesis and manipulate data in order to make this claim, which is completely fabricated. And then somebody comes along who wants to pay them in order to be a co-author on this paper for the purpose of fulfilling some professional requirement. That's effectively how this economy works. Is that right?
Elisabeth Bik: (20:34) Yeah, that's how the paper mills work. And the claims are not always that earth-shattering. I don't think most of these papers — probably none of them — have any actual influence on how patients are treated. They're very minor in terms of importance or relevance. So they might be, "This little RNA molecule appears to inhibit prostate cancer cells," or something like that. I don't think they're really earth-shattering. It's just for the authors to tick a box. And they're not in high-impact journals; they're in lower-impact journals.
Satyen Sangani: (21:04) So there's no examples where these have been published in notable journals or not many?
Elisabeth Bik: (21:09) Not that I'm aware of. The ones that are based on a template — where you can see there's usually one sentence they all have in common, or a very similar style — those, I don't think, are published in Nature or Science, but they sometimes get published in mid-level journals and definitely in lower-level journals. I think what happens is they submit one paper and see if it gets accepted, if it gets through the filter. And if it does, then the paper mills will massively send manuscripts in. And you have to ask: did the editor not notice that suddenly there are all these papers with a very similar layout, a very similar title, all from small hospitals in China, or Russian authors publishing with other authors? Sometimes you can see that the authors probably did not know each other in real life, because one is working in, I don't know, the psychology department of a Russian university, and the other is working in Thailand. And you have to wonder: did they really meet each other? They've never published together, and suddenly they publish this paper together. That seems very unlikely. But yes, that's what's happening.
Satyen Sangani: (22:11) And I guess the regulatory authorities that are responsible for accepting these papers, you would think that there is some pretty significant incentive to try to stop it. What is being done to stop it? And is it a recognized problem or do they just say, "Look, this is something that we want to ignore and sweep under the rug?"
Elisabeth Bik: (22:27) The journals are aware of this, the publishers, so a lot of journals and editors are aware of this problem, but unfortunately the national governments are not really taking any action. So it seems that the paper mills from Russia and China are not really being tackled. They can advertise, they can publish this. And so it's really hard to fight this problem if these papers are being produced.
Satyen Sangani: (22:52) My wife is a physician and she was actually the person that spotted your op-ed piece in the New York Times, which sort of is what brought me to ultimately reach out to you to have this conversation. One of the things that she finds frustrating with science is that often therapies or diagnostics are declared safe and there's often very little testing. In particular, she's a pediatrician and there's even less testing on children. And so something that is then declared safe, declared wonderful, declared useful, declared helpful, ultimately then gets turned around and it seems like it's actually quite harmful.
The most recent example that she showed me, just a couple of days ago I think, was a journal article saying that even a single CT scan might actually be harmful. I think that, combined with this integrity question, can be pretty undermining to the general concept of science. Because on one hand, you have fraudulent actors. On the other hand, you have stuff that's constantly in shift. How do you think about that as a scientist? How do you feel about that, and how does it impact your belief in the work more broadly?
Elisabeth Bik: (23:56) That's a super hard question because it touches on so many aspects. In general, science is so complex. Anything that affects our bodies, any biomedical process, is endlessly more complex than we can ever know. I think most scientists have very good intentions, and they might do some testing and really believe that something is safe. I think those shifting insights are usually not because of misconduct; they're just changing insights, like, "Okay, it's actually not as safe as we thought." But in general, I'm a firm believer in science, and I just think that we spend too much time trying to produce new things and not enough time replicating things.
I feel we need to slow down science. We're going too fast; we don't really have time to do high-quality research, in general. As scientists, we're all rewarded, one way or another, if we publish a novel paper. But there are very few incentives to encourage people to replicate papers. Is this result really true? Is it true in the hands of another lab? In the hands of another person? Or, if we test another population, do the same findings still hold up? For example, research might be done only in male mice and not in female mice, so the results might apply better to men than to women; or results found in mice might not actually work in humans.
Elisabeth Bik: (25:21) There are many complexities that can make science not applicable: even though the results look really good, they are not as great when you finally test them. And there's a lot of drug interaction — maybe if you eat a lot of grapefruit, your medication doesn't work as well. So all those things can be tested really well in the lab or in a controlled environment, but it just doesn't hold up in real life when you have real persons with different diets and different ethnic backgrounds or different lifestyles. It's so complex to do good science and really extrapolate the results to the general population. And I guess most people would be very hesitant, for example, to test something in children. You would want to make sure that the drug is safe, but you also don't want to test thousands of children with the risk that something goes wrong. And so you can sort of see the dilemma: what is ethical and what is not.
Satyen Sangani: (26:12) Yeah. And of course, there's also not a lot of money in treating children, so there are financial disincentives as well. How do you think about changing that? This is a systemic problem where on some level we're all searching for the new-new insight, the new-new thing. And there's a saying in news: if it bleeds, it leads. We all want the catchy title on some social media post. It seems like science suffers from the same attention deficit disorder. How do you think about addressing that? I think that's the world that we all would like to see changed. Are there particular habits or rules or frameworks or ideas that you feel could help make the change behavioral?
Elisabeth Bik: (26:50) I would love to have more replication studies. So one of the things I've been proposing is that a graduate student, in their first year of doing research and learning the tricks of the trade, just take a paper in their field and try to replicate it. That is probably much harder than you might think, but it would give the advantage of knowing that a paper is real, and it would count toward the resumes of both the graduate student replicating the work and the person whose work has been replicated. And this is what I was talking about: slowing down science, making science better, making science more open, sharing more data. I'm not sure if it would apply to clinical research, because I'm more of a basic research person. Clinical research is obviously endlessly more complicated in that, like I said, you have to deal with humans, who are all different, and with different populations. But before we really test drugs on patients, the preclinical work — the work in labs and in animal experiments — I feel that work needs to be done a little bit slower, because there are so many clinical trials that fail despite looking very promising at the lab and animal stage. When they're tested on humans, it just doesn't seem to work.
Satyen Sangani: (28:03) Have you encountered any institutions where graduates are required to do replication studies? Is this a trend you're seeing happen, or have you seen some promising cases?
Elisabeth Bik: (28:12) I don't think that is happening anywhere. It's just a proposal. When I give a talk, I usually say, "I think this would be an amazing thing to do — that graduate students just test a paper." And as a graduate student, you could put on your resume: these are the studies I've replicated. Or later, when you're a professor: these are the studies other people have replicated. Right now we only have a section on our resume that says, "these are the papers I published," and that's what you're held accountable for. We need to have more of these replication studies, and maybe also smaller studies — micro studies. Just one figure, one little experiment: publish that online and have other people comment on it or try to replicate it, because science papers get too big, too complex. I cannot peer review them very well when there are 70 supplemental figures or so. Who can read all of that? It's just too big, and to really evaluate it in a good way, we need to take a step back.
Satyen Sangani: (29:08) Yeah, in that scenario it also feels like a really great way to train scientists. And if, as a graduate student, you were able to take a commonly known study and somehow disprove it, or at least find that the claims were non-replicable, that could be quite a contribution in and of itself.
Elisabeth Bik: (29:28) Yeah, it would. We're focusing too much on publishing positive papers and not negative results, for example. So things might have been tried before and they didn't work, but that work usually doesn't get published. As a scientist myself, I feel I've only published, I don't know, one-fourth of what I've done, because a lot of things don't work and you just toss them and try something else. And oh, now that works. But you don't publish the things that don't work. And if we did, I think it would save a lot of other people from doing the same work, but we don't. So there needs to be more focus on publishing negative results, on replication studies, and on smaller micro publications where you just publish one figure or one experiment.
Satyen Sangani: (30:10) We had a former guest, Christie Aschwanden, who's a journalist, and she talks and has written a lot about why science is hard and talks a lot about these problems with regard to just being able to find a positive correlation. And you can obviously select your data sample such that you can make a claim. But those things are in many cases very narrow and to your point, really hard to replicate.
So this is a cultural problem, and it's a problem based on long-established norms, which have gotten people promoted and recognized and have changed careers and livelihoods. The community of science is now built around this very, very strong structure. When you speak to senior scientists — many of whom, most of whom, have high integrity — how do they respond, and how do they think about changing or addressing this problem? Or do they just say, "Yeah, it is the way it's always been"? What do people say? Because it seems almost unassailable to try to fix this.
Elisabeth Bik: (31:03) Most senior people are not very receptive to those ideas, but a lot of younger people are. They're very enthusiastic: "That's exactly what we need to do." So I do hope it's changing, though these changes cannot happen overnight. And like you said, academic life and academic publishing have all these unwritten rules — this is how we have always done it — and it's hard to change, and maybe the older generation is not open to it. But I see some initiatives among journals to work toward a more open process of publishing all your data, including all of it, and having an open peer review, like eLife has started. And I think those are great developments. It's not gonna happen overnight, but we need to be patient, and gradually, hopefully, these changes will come.
Satyen Sangani: (31:52) On your own blog, have you seen the followership increase? You have, I think, over 100,000 Twitter followers. Have you seen change over the time that you've been doing this work?
Elisabeth Bik: (32:03) A little bit, but mostly on the journal level. A couple of years ago I reported a set of 800 papers, and only one-third of those got corrected or retracted after waiting 5 years. But these numbers are going up, so more papers are now being retracted or corrected on a faster timeline. It's still not enough. But I do see some change where journals are starting to see, "Yes, this is a problem and we need to take action." There are still too many journals that are not responding very fast. The editors seem to look the other way for all kinds of reasons. So I might report a problem that I think is a 5-second retraction — it's pretty clear that it's manipulated — and then the editors will just say, "Okay, thank you. We'll look into this." I'm like, "What is there to look into? It's so obvious." There are also a lot of legal reasons why these things take a long time. I think a lot of editors are afraid that the authors will sue them, for example. Sometimes authors disappear; their email addresses don't work. But there are a lot of improvements still possible for journals to respond faster. And I see some change in that.
Elisabeth Bik: (33:12) But with the institutions, it's still, unfortunately, the case that most are not investigating these cases. They say they do, but then they come to the conclusion, "Well, nothing has happened." And there's a recent case involving the president of Stanford University: there are allegations of image manipulation in papers that he's an author on, and he's the president of the university — the big boss. And the university wanted to investigate that via the board of trustees, of which the president is a member. These are all his friends. They're invested in each other's work. None of them, except for one, was a biologist. They have no idea how to investigate these cases. And it gives you the impression that the institution wants to keep these things behind closed doors so that they can come to the conclusion that there was no misconduct. Who are you gonna fight? I cannot fight against Stanford if they say there was no misconduct. I look at the images — yeah, I can jump high and low, but they're not gonna retract those papers. And so we need more editors who say, "I don't care what the institution says. We need to retract this paper and let them figure out what really happened. But the paper itself is no longer trustworthy. We cannot trust this paper anymore."
Satyen Sangani: (34:20) Yeah, it's like this massive institutional bias and it's so hard to break through.
So you're obviously — to many, you're not super popular. How do you take that on? From a social perspective, I'm sure you've found people who have been supportive of the work and really complimentary. And you've won awards, as I mentioned at the outset. But how do you deal with this? It seems like a really harsh reaction. Were you surprised by it, and how did it feel relative to the positive feedback that you've gotten?
Elisabeth Bik: (34:48) I guess people don't like to be criticized, but I've had much worse. Journals not responding — that's one thing, and it's frustrating. They should respond and they should take action. But I've also gotten a lot of hate on Twitter, for example. And of course, Twitter is Twitter, but I've been doxed, I've been threatened with lawsuits, I've been receiving death threats, and it's just horrible. And that's just me personally. You have to be able to handle that. Most of the time it makes me sad, but I have learned to live with it and try not to look too much into it.
It's just horrible what people say about me — as if that makes the problem of the images go away. A lot of people push back by trying to discredit me, as opposed to looking at the images and showing me the original photos, for example. That's how you can take away my concerns — not by saying that I'm ugly or that I should be behind bars. That's not gonna take away from my claims. I just have to persist. So that is where I get the real pushback.
Satyen Sangani: (35:48) Did you expect it to be so hard when you first started out?
Elisabeth Bik: (35:51) I expected pushback from the authors themselves, but I had not realized how much power some of these authors have. One professor in particular — a French microbiologist — has a million followers on Twitter who are very active in defending him. I'm not sure how many of them are real persons, but as soon as he posts something on YouTube — videos about how horrible I am and how I'm not to be trusted because I worked at a certain company that turned out to be fraudulent (yes, but that wasn't me; that was the founders) — these people seem to have so much power. They have so many followers on social media. Their YouTube videos get a million views within one day. It's like the next Taylor Swift video or something. It's hard to imagine how many fans these people have. So they don't harass me themselves, but they have these followers, and they will say, "Check out Elisabeth Bik," and that's enough for their followers to harass me. They never actually ask them to harass me, but that's how their followers interpret it. So there's this whole social media thing going on that I had not anticipated at all.
Satyen Sangani: (37:00) Yeah, it takes an incredible amount of bravery to approach these problems and speak truth to power. Those are pretty common, age-old themes. I experience them myself: I'm the CEO of a company, and I know that many of the people who work inside the company, and in some cases outside of it, are often not willing to share negative news because they just don't wanna upset the boss.
And so it does take a lot of bravery and it's really laudable and commendable. And it's funny how constant this refrain is. The Economist just came out with an article that said there's a worrying amount of fraud in medical research and a worrying unwillingness to do anything about it. That's literally the title of the article. You mentioned a couple of techniques, one of which is sort of replication of research, and another one of which is sort of getting the source metadata around the captured images. Obviously, younger people are more biased toward changing the accepted order.
What else can we do to change? How do you think about sort of evolving faster? What are the strategies that you are trying to use or what's working for you or what have you seen work?
Elisabeth Bik: (38:03) I would hope that journals are more encouraged to either correct or retract papers — that there's a body, an organization, where I can complain if I feel they didn't handle a case well. There's the Committee on Publication Ethics, which most scientific journals are a member of, but they're not really taking any sides. And if they do, they take the side of the journal, because this is an organization run by journal editors. So I might say, "Well, I don't think this image that looks very manipulated should have been addressed with just a correction." But then the journal says, "Well, we handled it officially. We wrote to the authors, they sent in a new figure, and we published this new clean figure." And I'm like, "Well, that's not how you handle it."
If an athlete wins the big race and their urine tests positive for doping that evening, you would not allow them to send in a clean sample two weeks later, right? That's not how you should address this. So journals should take more action. Maybe we can hold them accountable by having a correction or retraction index per journal: how many papers were flagged, and how many of those were retracted? And we should also have legal protection for whistleblowers, because one day I might be sued, and who's going to pay for that? Who's going to pay my legal bills?
Elisabeth Bik: (39:18) I'm just an individual. I could potentially lose everything I own just by criticizing scientific work for something that I feel is very obvious and can be addressed by showing the original photos. Those would be the couple of things I'm thinking of right now.
Satyen Sangani: (39:34) Yeah. And that index sounds like a feature for your blog, maybe — scoring the journals.
Elisabeth Bik: (39:39) I wrote a New York Times editorial about what we could do about these things. So I mentioned all of those, yeah.
Satyen Sangani: (39:45) Which is great and I'd encourage anybody who's curious to go read it. If you think about this, who are the customers of these journals? Who reads them?
Elisabeth Bik: (39:54) [laughter] Scientists. The scientists are the customers, and the whole scientific publishing world is crazy if you think about it. Basically, the authors — other scientists, who write the papers — send them in, and then they can choose one of two models. One is to pay to make their paper open access, so they pay an amount of money to get the paper published.
Well, if you're a journalist, you would expect to be paid to publish your piece. Let's say you're a freelancer and you want to publish in the New York Times: usually you don't expect to pay to publish; you expect them to pay you because you wrote something. But scientists have to pay to get these papers published — or, in the alternative model, readers pay: the journals charge money to download a paper. So that's already crazy.
We write the papers for free as scientists — we peer review them for free — and I have no idea why I have to pay Nature $10,000 to get my paper published under the open access model. Where does the $10,000 go? Is it managers upon managers upon managers in a shiny building? Nowadays it doesn't cost that much to put a paper on a server. That's peanuts. We can get huge amounts of server capacity for $100 a year at Google or so.
Elisabeth Bik: (41:09) It should not take $10,000 to publish my paper — which I wrote, which I generated — in Nature. Hardly any papers are being printed anymore; everything goes online. There are some journals still doing the printed model, but most papers are just posted online and read online. I cannot find any way to justify that publishing a paper should cost $10,000. It feels like scientific publishers make so much money. And this is all research money, basically: we researchers get paid through grants from the governments of our countries. So where does that money go?
Basically, scientific publishers are super-rich companies that don't seem to really invest in customer care or quality control, because we find all these problems in published papers — and they all went through peer review. So there's very poorly executed quality control, and there's almost no customer service. Once we complain about a paper, it's, "Yeah, well, it's been published. We made our money; we're not gonna put any effort into trying to retract it." It's just a crazy business model.
Satyen Sangani: (42:12) Well, and then the scientists are the ones who submit to the journals, which then publish the scientists' papers, which then get the scientists hired or promoted, which in turn gets those scientists invested in the journals. So on some level it's just perpetuation of the same model. Now, we obviously sell to companies that are trying to promote science internally, and I've often found that people bring the data that supports their point — and I think that's ultimately what seems to be going on here. Do you see that incentive structure, where people have the incentive to prove something positive that's consistent with whatever claims will get them promoted or get their agenda met? And have people come to you and said, "What kind of structures can I create?" Because I can imagine that with your detection role, people are saying, "Well, how do I stop this?" Has that started to happen in your world, or not very much?
Elisabeth Bik: (43:10) Yeah, it's a very big question, because like you said, there's this incentive in academic publishing to publish. That's, in the end, what all of us scientists are held accountable for: how many papers did you publish? Our resumes are completely geared toward that. It's not how wonderful your PCR looked or how good you were in the lab; it's about how many papers you published. And I'm not sure that's a good measure of how good we are as professionals, because we can do really good research and go really deep and then not have a lot of papers to publish.
So I think, unfortunately, we are currently focusing too much on the number of papers and on impact factors to measure our publication level. And people are gonna game that. People are going to split up their research into a lot of little pieces and publish all of those pieces separately. People are going to start their own journals and publish in those — I've seen that. People cite their own papers to make sure their citation index goes up, or they form citation rings where they promise to cite each other's papers. I've seen peer reviewers ask that their own papers be cited in the paper under review, even though they had nothing to do with the research.
Elisabeth Bik: (44:23) And I've seen journals saying, "Well, you only cited one paper from our journal. You should cite a couple more; otherwise we won't accept your paper." Because we focus so much on these numbers — these things we can count — people are going to game the system. People are going to find creative ways of getting their citation index up or their publication count up. And I'm not sure how to address this, because as professionals we do want to be measured in some way.
But it's very different, I think, from working in a company, where you look at a person's willingness and how hard they work, and not necessarily only at their output — although that could of course also be important. I think we look at it in the wrong way. A lot of the great scientists of the past — the Newtons and the Robert Hookes — didn't necessarily publish that many papers. They might have completely failed in the current system.
Satyen Sangani: (45:22) Yeah. We never would've gotten papers out of Newton. That would've been tough. This is interesting, because there's this phrase in business: you manage what you measure. And often what I've found internally is that you want to change metrics often enough over time, maybe from year to year, so that people aren't incentivized to game the systems built around a particular metric — they're not all doing it the same way. That's tremendously hard to do when you're trying to set up systems that are broad and common, because if you change the metrics, you're totally changing the system. Do you have alternative metrics that you've been thinking about proposing? Or, in your work, have you thought about other ways to measure objectively across all of these people who are trying to do the work of science?
Elisabeth Bik: (46:07) Yeah, I feel I am not good enough at thinking of possible solutions. Like I said before, I think reproducing papers is one way of slowing down science and getting recognition that the work I've done, for example, could be replicated by another lab. I feel that's a tremendous recognition of the work I've done. And I could maybe then put that on my resume: "Okay, these papers have been replicated by other people, and they found the same results." If we could put that on our resumes, that would be fantastic. But there's no current system that could do that. So that's the best way I can think of to tackle this problem.
Satyen Sangani: (46:45) People hearing this podcast might be hearing about this problem in science for the first time. I had read, I think, that you had seen or reviewed 100,000 papers and found problems with 4,500 of them, and then there were another 1,700, I think, that could have involved some additional fraud — together, that would indicate a rate of roughly 6% across these papers. Does that feel right to you, or does that feel broadly understated or overstated? How much fraud is there in the system?
Elisabeth Bik: (47:13) That's a very difficult question to answer. I actually did a smaller set of 20,000 papers, in which I found 4% of papers with photos to contain duplications within the same paper, not across papers. But the real percentage of fraud in that set is about 2%: half of those papers had signs of deliberate duplication done to mislead the reader — not an honest error. It's hard to know if something is an honest error; in some cases it could be either, and in some cases it's very clear. But there's a lot of fraud that we cannot see by just looking at the paper. Maybe a person used a very different antibody, or they pipetted no sample into the lanes that are supposed to show the negative controls. There are many ways to commit fraud, and unless you sit next to the person in the lab and follow what they're doing, you would not be able to detect it. So I think the real percentage of fraud is in the 5% to 10% range.
Satyen Sangani: (48:17) Yeah. So as a final question, imagine that you're talking to a young, idealistic, would-be scientist who's getting started in the field. How do you prevent them from getting cynical, and what advice do you give them as they start out — to be both successful and happy in the work?
Elisabeth Bik: (48:34) I do feel that a lot of young people share my enthusiasm for science and for good science. And I am worried that these people will turn cynical as soon as they really enter the system, become a PI themselves, and feel this pressure to publish. I think it's very hard to resist that — to work out how to combine being completely honest and a good scientist with having good productivity. Those things are not always easy to combine. And once you are a professor, you're sort of supposed to stay that way for the rest of your career. And how do you balance grant writing? People have to spend so much time writing grants, and then a lot of their grants are rejected even though they're good ideas. Maybe we have too many scientists and not enough money to fund all these grants. But I do think there are some really amazing scientists where you should just give them a bunch of money and not make them write detailed plans of what they're going to do. That planning seems such a waste, because sometimes good science is done by having a vague idea and just testing things, not by planning, "We're gonna do this, this, and that, and then if the results are such-and-such, we're going to continue with that."
Elisabeth Bik: (49:45) You cannot always predict those things. And I really hope we can incentivize people to just do good science and not have to worry about grant writing.
Satyen Sangani: (49:55) And there's a certain entrepreneurial quality to it that being procedural doesn't actually lend itself to. So maybe that's an awareness, something to be thoughtful about. This has been a really fascinating interview, and I think it will make people think and, more fundamentally, challenge how they develop truth within their worlds and their own organizations. Just talking to you is super helpful. I can't think of a person to whom the name "data radical" is more applicable — and we've had some pretty amazing guests on this show. So thank you so much for your work, and thank you so much for the time that you've shared with us.
Elisabeth Bik: (50:25) Thank you for having me.
Satyen Sangani: (50:32) Science is hard. In a world where technology is constantly evolving and AI is everywhere, it's even harder. These new advancements make it all too easy for scientific papers — and, frankly, any content — to be deceitful. So how can we make sure the work of science stays honest?
First, we can capture source metadata so we know exactly when and where the data was captured, as well as how it might be reproduced. And speaking of reproducibility, we need to work to incentivize it. By training people to do the work and replicate papers, we can reduce the problem of fraudulent research. In this world of science, integrity is critical. So why is it that people like Elisabeth who blow the whistle on fraud are so often ostracized? In truth, they're the heroes and should be celebrated. Yet, often, the people that disagree with the accepted wisdom inside of institutions are ignored or ridiculed.
To me, Elisabeth is an inspiration. She's a true representative of what it means to be a data radical. Thank you for listening to this episode and thank you, Elisabeth for joining. I'm your host, Satyen Sangani, CEO of Alation. And data radicals, stay the course, keep learning and sharing. Until next time.
Producer: (51:44) This podcast is brought to you by Alation. Your entire business community uses data, not just your data experts. Learn how to build a village of stakeholders throughout your organization to launch a data governance program and find out how a data catalog can accelerate adoption. Watch the on-demand webinar titled Data Governance Takes a Village.