The following is a transcript of the presentation video, edited for clarity.
The replication of findings is a current and ongoing emphasis at NIH. Elena pointed this out, but we’re going to talk about it in a slightly different way now: It is the responsibility of every investigator to safeguard against misconduct and to build a laboratory in which scientific integrity and reproducibility of data are the central focus.
Scientific Conduct in the Lab
There are several factors that we can think about that lead to findings that are not replicable. And these factors are closely related to our day-to-day approach to scientific integrity.
Very often our RCR sessions are all about these egregious cases. Elena had one slide about them, and I’m also going to have one slide about them.
Misconduct – Falsification, Fabrication, or Plagiarism
These are a couple of cases from our very own field. Luckily these are extremely rare, but I start with them as kind of the quintessential example of the difficulty of replicability: cases where misconduct occurs, meaning falsification of data, fabrication of data, or plagiarism. We all know about these; we do not commit them deliberately.
There are a couple of major cases in our discipline. One of them, many years ago, a grant on Orofacial Motor Control Impairment in Parkinson’s, where, and this is a quote from the NIH response to this, “Where this researcher was engaged in scientific misconduct, by falsifying and fabricating certain figures and research results supported by NIH grants.” So that’s a really extreme example of making up data.
A second example has had a lot of prominence recently, and that is Marc Hauser’s case. This is a quote from Hauser, “Although I have fundamental differences with some of the findings,” Hauser wrote. “I acknowledge that I made mistakes. … I let important details get away from my control, and as head of the lab, I take responsibility for all errors made within the lab, whether or not I was directly involved.” So this is, again, this idea of the role of the PI in being ultimately responsible.
In both of these cases, I think it’s worth pointing out that the whistleblowers, and this is how it tends to go, were students or postdocs. So talk about a high level of vulnerability: these are the last people who you want to be stuck in the situation of blowing the whistle on their mentor. It’s a pretty nightmarish situation.
So that is the extreme. We are opting to not talk about the extreme today. But I think it’s a good starting point.
What do we need to guard against?
We want to think about, in our labs, what we need to guard against. Kind of grading down from this extreme. How can we think about reproducibility and a responsible environment?
One thing that we have to think about, in my opinion (and we’ll open it up to you in a few minutes to get your ideas here), is that extreme commitment to a particular theory or result can really have a negative impact on replicability. This may lead to bias in how data are obtained and/or interpreted. All of us have a passionate adherence to a theory, I would guess. But again, it’s something that we need to really guard against as we collect our data, and analyze our data, and think about our data. And train our students to think about our data.
Bias is rampant. We know this. This is why NIH is extremely engaged in this process right now. And something else that might feed into this problem is the well-known fact that it is very difficult to publish negative results. We have ginormous rewards for really exciting, big findings and very few rewards for negative results. In a way, things are very stacked against us, pushing us to engage in a very subtle form of potential misconduct. I want to be careful with my terms here. But you see the point.
So that’s how your theoretical biases and frames can influence your work.
Experimental issues can also lead to problems of replicability, which is the crucial core of what we do. Poor experimental design, lack of blinding or lack of randomization. Lack of appropriate controls in our experiments—both in terms of participants and in terms of tasks.
Cherry picking of participants and results. So, when we exclude participants. When we select the figures to put into a paper. I think these are familiar issues for all of us. How are we thinking about that in terms of scientific conduct?
Incomplete descriptions of analyses, statistics, and sample sizes, and lack of inclusion of details. Obviously, our journals all hold us to page limits, and our reviewers want us to write very readable, accessible papers. And yet there’s a price to pay, perhaps, in terms of other people replicating our work down the road.
What are granting agencies and journals doing about replication problems?
What are granting agencies and journals doing about this? And I’ve got some of this, I should say, speaking of potential plagiarism, from a presentation that was shared with us by Dan Sklare from the deputy director of NIH, who was talking about what NIH is trying to do about this.
One thing that I think might be coming down the road at NIH, based on at least what people are talking about, is increased systematicity in the review process regarding how some of these replication issues might be handled. The idea of attempting to reward replication studies and negative results. Pressure to do this and pressure to encourage journal editors to allow researchers, and require researchers, to include more detailed methods sections. So these are at least some things that people are talking about that could be coming along to try to address some of these issues.
What can we do to establish a responsible scientific environment in the lab?
With that framework, replicability is kind of the day-to-day goal that we have. Obviously, I think replicability begins in each of our labs.
Creating an environment that is conducive to maintaining the highest standards of scientific integrity is, of course, the job of the PI.
What I thought we would do is take just a minute or two to think about this for your lab: the lab you worked in, or your own lab. There’re a few windows or a few places we really need to think about when we consider: How do we build this environment?
Entry into the Lab
One of those is when a new person is entering our lab. What are some things you could do in your lab to set up an environment? Right now we’re talking about the responsible formulation of theory, applying that theory in your lab, and how you collect and analyze your data. Those are the kinds of issues that we’re primarily thinking about.
- I do EEG, so there’re a lot of judgment calls, with deciding which trials you’re going to throw out and which trials you’re going to keep. Because if the person does that [speaker indicates motions], then you have messy data, but sometimes you have some small movements that could actually be an ERP component. We talked about being clear about how many trials you’re keeping, and having standards that you set before the experiment, or before the data’s analyzed. And then also, the artifact detection may be done in parallel by two different people in the lab and then compared.
- One of the things we do in my lab is that we have an internal log book that is computerized and all on the same servers. So I expect all the students to continuously put in their logging time. Another thing that sort of keeps people on task, but also ensures that people are doing what they’re supposed to do with regard to the data entry, is that I have my students write a weekly progress report and talk about the data they collected.
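As an illustration of the parallel artifact detection described in the first comment, here is a minimal sketch of one way a lab might compare two people's keep/reject decisions on the same trials. It uses Cohen's kappa as the agreement measure, which is an assumption on our part; the speakers do not say which measure, if any, their labs use. The function names and the trial labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters labeling the same trials."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must label the same non-empty set of trials.")
    n = len(rater_a)
    # Proportion of trials on which the two raters made the same call.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's own label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label throughout; kappa is undefined,
        # so report perfect agreement by convention.
        return 1.0
    return (observed - expected) / (1 - expected)


def disagreements(rater_a, rater_b):
    """Indices of trials the two raters labeled differently."""
    return [i for i, (a, b) in enumerate(zip(rater_a, rater_b)) if a != b]


# Hypothetical decisions for eight trials (illustrative only, not real data).
rater_a = ["keep", "keep", "reject", "keep", "reject", "keep", "keep", "reject"]
rater_b = ["keep", "keep", "reject", "reject", "reject", "keep", "keep", "keep"]

print(f"kappa = {cohens_kappa(rater_a, rater_b):.3f}")
print("trials to review together:", disagreements(rater_a, rater_b))
```

The list of disagreeing trials is the practical output here: those are the trials to bring to a lab meeting and resolve against the standards set before the experiment.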
Entry into the Lab
Some ideas from my lab. We have orientation for all new members. RCR training goes without saying. But we really have very explicit instruction in data analysis and documentation. We do have manuals; you do have your manuals online. We have really systematic manuals. Anybody walking into the lab who is just starting does read them. And we revisit them at our lab meetings regularly. We co-write them as a group at about 6- to 12-month intervals to update them and make sure we’re not drifting. These meetings are also the way, when people walk into the lab, that they’re first learning about how we cue kids, how we analyze data. All of these factors. So, we actually really rely on the manual.
We also build little teams that are responsible for each data analysis. And an experienced research assistant, a doctoral student, a postdoc, or a research associate is in charge of each team, and is in charge of training in that team. And they deal with micro-level reliability issues with me and then bring them back to our weekly lab meeting as needed. And those are just some of the examples from my lab. Very shared accountability for every acquisition and analysis. So any brand new person has a reliability bar they have to cross.
Ongoing Management of Data Acquisition and Analysis
The next issue is ongoing management, which may be similar enough. I may talk about my examples and then open it up just once more in the interest of time. So ongoing data acquisition and analysis is talked about in regular lab meetings. We have many subprojects and we take turns so that the team involved in that subproject is presenting it and bringing in data analysis or cueing.
An issue we hit a lot, for example, is that we test language-impaired children. We are getting them to produce different kinds of sentences and words and things like that. And so, we have very rigid rules about how you can cue them and how many times you can cue them, so that our language-impaired kids get the same cues as our typical kids. So what is happening with the cueing over a two- or three-year study? What is happening with the data analysis? We just have these rolling reports, where people bring their work to our lab meeting.
We do the same when there are any difficult data from a given child. Every single lab meeting, we ask: Any data problems this week? And we have them bring pictures that everybody looks at together, and we come to a consensus on difficult analysis problems. And again, our lab manuals are where we write all of these down, so that they are documented. People do not read them regularly, but they do annually.
I still keep written lab notebooks, and the reason I do that is writing things down while you’re doing it in pen is really different from handling things electronically. So, I’d love to get away from the paper. But when it comes to real data and analysis issues, I want to be able to go back to the notebooks, see what people did and be able to replicate that. We do that a lot. I am still wed to paper for a little portion for that reason: paper and pen, not pencil.
Also, obviously, we need to maintain records of all research participants: those from whom we collected data, those who decided not to participate, and those we decided not to include. That lets us address the cherry-picking problem I alluded to.
And maintain well-labeled files with data—from our data notebooks, to our videos, to our actual data files—for a very extended time period. So all of these relate to replicability.
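As a minimal sketch of the record keeping described above, the following shows one possible way to log every participant, including those who declined or were excluded, and to require a documented reason for anyone not included. The class, the status names, and the sample entries are all hypothetical; nothing here is taken from the speaker's lab.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical status categories; a real lab would define its own.
STATUSES = {"included", "declined", "excluded"}


@dataclass(frozen=True)
class ParticipantRecord:
    participant_id: str
    status: str
    reason: str = ""  # must be filled in for anyone not included


def validate(records):
    """Raise ValueError if the log has gaps that would hide an exclusion."""
    seen = set()
    for record in records:
        if record.status not in STATUSES:
            raise ValueError(f"{record.participant_id}: unknown status {record.status!r}")
        if record.status != "included" and not record.reason.strip():
            raise ValueError(f"{record.participant_id}: no reason documented")
        if record.participant_id in seen:
            raise ValueError(f"{record.participant_id}: duplicate entry")
        seen.add(record.participant_id)


def summarize(records):
    """Validate the log, then count participants by status."""
    validate(records)
    return dict(Counter(record.status for record in records))


# Hypothetical log entries (illustrative only, not real data).
log = [
    ParticipantRecord("P01", "included"),
    ParticipantRecord("P02", "declined", "family withdrew before session 1"),
    ParticipantRecord("P03", "included"),
    ParticipantRecord("P04", "excluded", "did not meet preset inclusion criteria"),
]

print(summarize(log))
```

The design choice worth noting is that the summary refuses to run on an incomplete log, so an undocumented exclusion surfaces as an error rather than silently disappearing from the counts.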
Documenting processes in our research. The steps that we use to arrive at outcomes. This is what our manuals, what our data notebooks are about. For any new trainees, for new people as they roll in, for clarifying procedures, for standing up to an audit. That is really what we’re thinking about. But the main thing that we’re thinking about for our everyday science, is that our studies take years. And we want who we run in year four to be the same as who we ran in year one. That’s absolutely crucial, and that is really a huge rationale. But an added benefit is if anybody looked or anybody wanted to replicate it, as if they ever would, we could really stand up to that.
Theoretical Framing of Findings
The third issue is theoretical framing of findings. So we’ve talked a little bit about new lab members, about approaches to handling data in the lab. What about the situation where my data don’t fit my hypothesis? This happens to me a lot, which is maybe why I’m highlighting this one. Where the story is different. How do you handle that?
So, for all of you, either add to the prior issue, or think about this third issue of when your data don’t fit your theoretical account. How do you handle that in your lab?
I have a tough one. We were talking about how, if you have a science question and if you’re very clear with students about there being multiple correct answers—you just let the data lead you. That is a reasonable way to go forward.
But what if you’re not asking a science question? If you’re developing a new tool, or developing a new algorithm, then it’s very clear what a performance spec would be. Which means you can’t pretend that doing worse than something is still great. So, it’s difficult to think about the theoretical framing of poor findings in that case, and it seems like a long-term problem.
We talked a lot about making sure that it’s really a change in your theoretical concept versus an outlier in your data, or a problem in your data. So going back and checking the original raw data to make sure there wasn’t a miscalculation or something that caused it to be outside of what you hypothesized it to be.
One of the things we do is go into everything basically prepared not to publish the results, and/or to revise our theory about what’s going on.
We also talked about whether there is not just an outlier that’s causing the problem, but whether your measure is not sensitive enough and that’s important.
Outliers are informative. Noise in data is informative, and this drive to have non-noisy data, non-noisy human data, is just irrational. I mean, then it’s really how you tell the story about the data as they are. One outlier, sure. But if you’ve got multiple outliers, then there’s a different story there, and you have to be sensitive to that.
Theoretical Framing of Findings
Just a couple of quick things that we do. I think that these were really deeper examples than mine. Again, in our teams we talk a great deal about the idea that the data need to tell the story. I have so many anecdotes about some of, frankly my own favorite work from my own lab, that was completely counterintuitive and did not work with my hypothesis. I always make sure that I introduce this early and often. For doctoral students when their study comes out, and we run the stats, it’s often like, oh no. And I’m like: this is not over, this is the beginning of the interesting part. We are seeking the truth. If we knew it already, it wouldn’t have been worth doing this experiment. So, basically trying to reframe a little bit about how the theory and the data interact.
Establishing a Responsible Environment
Just to finish up and have Mario have his turn. Above all else, what you have to do is model the behavior you expect from students in your lab.
You need to stay close to the data and to the processes in your lab. And to go to our very sad Marc Hauser story: You’re responsible whether it’s because you were absent, or because you did the wrong thing.