Thursday, February 21, 2013

Unkept preregistration commitments

In 2005, a group of medical journal editors (ICMJE) agreed to require preregistration of clinical trials as a condition of considering them for publication. This was great news. If trials are registered properly* in advance, the findings are more credible, since it's clearer that researchers didn't cherry-pick from among a wider range of results.

The bad news is that ICMJE apparently didn't follow through on its commitment. Ben Goldacre recently pointed to a 2009 JAMA paper reporting that of the 323 RCTs within three specialties indexed in 2008 by ten of ICMJE's member journals, only 45.5% were properly preregistered. Of the remainder, 27.6% were lacking registration, 13.9% were registered after the completion of the study, 12.1% were registered without a clear primary outcome, and 0.9% were registered after the completion of the study and lacked a clear primary outcome.

The JAMA paper reveals a further problem. Of the RCTs that had been properly preregistered, 31.3% (46 out of 147) reported a primary outcome in the published results that differed from the preregistered primary outcome.**
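The figures quoted from the JAMA paper can be cross-checked with a little arithmetic. The sketch below uses only the numbers given in this post; the per-category counts are my own estimates obtained by rounding, since the post reports percentages only (apart from the 147 properly preregistered trials).

```python
# Figures quoted in this post from the 2009 JAMA paper.
total_trials = 323
shares_pct = {
    "properly preregistered": 45.5,
    "not registered": 27.6,
    "registered after completion": 13.9,
    "registered without a clear primary outcome": 12.1,
    "registered late and without a clear primary outcome": 0.9,
}

# Estimated counts, recovered by rounding (an assumption, not reported data).
approx_counts = {
    label: round(total_trials * pct / 100) for label, pct in shares_pct.items()
}

pct_sum = round(sum(shares_pct.values()), 1)
changed_outcome_pct = round(100 * 46 / 147, 1)

for label, count in approx_counts.items():
    print(f"{label}: about {count} trials")
print("percentages sum to", pct_sum)
print("properly registered trials with a changed primary outcome (%):", changed_outcome_pct)
```

The rounded count for the properly preregistered group comes out at 147, which matches the denominator used for the outcome discrepancies.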

Note: of course, even if trials are registered in advance, this still leaves the crushing problem of publication bias. Publication bias arises because journals are more likely to publish positive results. Presumably it would help if all trials had to be registered in advance and all data from them had to be made public. There is a petition here -- spearheaded by Ben Goldacre and others -- to put pressure on various groups to require that all clinical trials report results.


*Here is what ICMJE says constitutes a minimum requirement for preregistration: "An acceptable registry must include at minimum the following information: a unique identifying number, a statement of the intervention (or interventions) and comparison (or comparisons) studied, a statement of the study hypothesis, definitions of the primary and secondary outcome measures, eligibility criteria, key trial dates (registration date, anticipated or actual start date, anticipated or actual date of last follow-up, planned or actual date of closure to data entry, and date trial data considered complete), target number of subjects, funding source, and contact information for the principal investigator."

**The discrepancies were: "...the introduction of a new primary outcome in the article (ie, a secondary outcome or an absent outcome in the registry that becomes a primary outcome; 22 of 147 [15.0%]), omission of the registered primary outcome from the article (15 of 147 [10.2%]), published primary outcome registered as a secondary outcome (8 of 147 [5.4%]), a registered primary outcome reported as a secondary outcome in the article (6 of 147 [4.0%]), and timing of assessment different in the article and the registry (4 of 147 [2.7%])."

Tuesday, February 12, 2013

Cochrane library to become open access

I recently saw the announcement that Cochrane reviews will now be open access, although each review will only become freely available one year after publication. Until now, most of the reviews in the Cochrane Library were only publicly accessible in some countries (not including the U.S.).

The article above quotes the CEO of Cochrane, Mark Wilson, who says, “This new agreement provides a huge boost to The Cochrane Collaboration’s work to inform healthcare decision-making with high-quality research evidence.”

Since, as I've noted, Cochrane reviews seem to be among the most reliable systematic reviews out there, this is really good news.

Thursday, January 31, 2013

Why systematic reviews matter and where to find them

Let's say you want to know what the research says on a particular question. Do you start your search by looking for systematic reviews? Systematic reviews are surveys of the research on a particular question which bring together evidence from a number of individual studies. Here are a couple of reasons that it's a really good idea to start your research by looking for them:
  • More reliable results. In bringing together results from a number of studies, systematic reviews can avoid some of the issues that affect the results of individual studies. Particularly if the review is done well, you'll get a much more reliable answer to your question, without having to do the time-consuming project of tracking down individual studies yourself.
  • Greater power due to increased sample size. Some systematic reviews pool quantitative results from a number of studies into what's called a "meta-analysis." If an effect really exists, the chance of detecting it as statistically significant increases with sample size, so by pooling results, meta-analyses are better able to detect an effect than individual studies are.
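To make the point about power concrete, here is a minimal sketch using a normal approximation for a two-group comparison. The function name, the effect size of 0.2 standard deviations, and the group sizes are all illustrative assumptions of mine, and treating ten pooled studies as one large study is a simplification that ignores differences between studies.

```python
from statistics import NormalDist


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    effect_size: true difference in means divided by the (assumed common)
    standard deviation. n_per_group: participants in each arm.
    """
    z = NormalDist()
    critical = z.inv_cdf(1 - alpha / 2)
    shift = effect_size * (n_per_group / 2) ** 0.5
    # Probability of landing beyond either critical value when the effect is real.
    return (1 - z.cdf(critical - shift)) + z.cdf(-critical - shift)


# One small hypothetical study versus ten such studies pooled together.
single_study = approx_power(effect_size=0.2, n_per_group=50)
pooled = approx_power(effect_size=0.2, n_per_group=500)

print(f"power of one study (50 per group): {single_study:.2f}")
print(f"power after pooling (500 per group): {pooled:.2f}")
```

Under these assumptions a single small study would usually miss the effect, while the pooled sample would usually detect it.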

Not all systematic reviews are equal:

Though it's better to start by looking for systematic reviews than to look at individual studies, there's a danger in trusting the conclusions of a review without looking more closely. Here are some things to look for:
  • Do the authors specify which intervention/outcomes, population(s), and types of studies they'll be including? The danger here is that if there are not clearly specified inclusion criteria, studies could be selected on the basis of their results in a way that skews the findings in some direction.
  • Does the systematic review evaluate the quality of the included studies and eliminate low-quality studies? If the studies that go into the review are problematic (e.g., if their method is not likely to yield a reliable estimate of the counterfactual), then the meta-analysis/review will also be problematic.
  • If the results of studies are combined in a meta-analysis, is it possible to tell whether the studies are similar enough in their interventions and outcomes that doing so is reasonable? 
  • If the results of studies are significantly different, do the authors simply combine the results in a meta-analysis without discussion, or is there an attempt to assess factors which might explain the variation in results? 
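One common way reviewers check whether study results vary more than chance would explain is Cochran's Q and the I² statistic, computed alongside an inverse-variance pooled estimate. The sketch below is an illustration with made-up effect estimates and standard errors, not data from any real review, and it uses a simple fixed-effect pooling model.

```python
def pool_fixed_effect(estimates, std_errors):
    """Inverse-variance pooled estimate plus Cochran's Q and I^2 (percent)."""
    weights = [1 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total_weight
    pooled_se = (1 / total_weight) ** 0.5
    # Q: weighted squared deviations of each study from the pooled estimate.
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, estimates))
    df = len(estimates) - 1
    # I^2: share of the variation attributable to heterogeneity rather than chance.
    i_squared = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, pooled_se, q, i_squared


# Hypothetical studies: one consistent set, one widely scattered set.
similar = pool_fixed_effect([0.30, 0.28, 0.33, 0.31], [0.1, 0.1, 0.1, 0.1])
dissimilar = pool_fixed_effect([0.80, -0.10, 0.35, 0.05], [0.1, 0.1, 0.1, 0.1])

print("similar studies:    pooled=%.3f, I^2=%.0f%%" % (similar[0], similar[3]))
print("dissimilar studies: pooled=%.3f, I^2=%.0f%%" % (dissimilar[0], dissimilar[3]))
```

A high I² for the scattered set is a signal that a single pooled number may be hiding real differences between studies, which is when a discussion of possible explanations matters most.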

Cochrane Collaboration's systematic reviews:

The Cochrane Collaboration's website is the best place to start a search for systematic reviews on health-related topics. Of the various organizations which produce systematic reviews, the Cochrane Collaboration has produced the most by far, about 5,000 reviews. Many of these reviews contain meta-analyses.

Cochrane reviews have some features which make them particularly reliable:


Other places to look for systematic reviews:
  • Campbell Collaboration for systematic reviews on social interventions
  • DARE - a database of systematic reviews from the Centre for Reviews and Dissemination
  • AHRQ's Evidence-based Practice Center reviews
  • PubMed search: go to advanced options and then select the "systematic reviews" and "meta-analyses" options.
  • EPPI-Centre at the University of London.
  • Health Evidence Canada

If you know of any other good databases/places to search that I've missed, please let me know in the comments or email me.

Wednesday, January 30, 2013

What is high quality evidence? Common methods and issues

Whether studies are on cancer treatments or education or deworming, the purpose is to find out what the effects of these interventions are. High quality evidence is information that gives us a reliable way of discovering causes and effects.

Estimating the counterfactual:

Suppose a doctor prescribes a medication and subsequently, a patient recovers. What we’d want to know is: if the doctor had not prescribed the medication, would the patient have recovered? If so, would she have recovered as quickly?

Since we can’t create a world in which the very same patient receives the medication and also does not receive it, we can't use the same patient (at the same time) to find out the counterfactual, i.e., what would have happened in the absence of treatment.


Studies can be described generally by methodology. Certain methods tend to be lower quality than others as a means of discerning causality, because at least in many situations, they provide less reliable information about the counterfactual. By knowing the issues that arise for each kind of method, we're able to better understand where we should be skeptical of studies and where we can be more confident.


Three common study methods:

Let's begin by looking at three common study methods: before vs. after, participant vs. non-participant, and randomized controlled trials (RCTs). RCTs are most often used in medical research, though they are also being used increasingly in non-medical interventions, particularly by groups such as J-PAL and Innovations for Poverty Action (IPA).
  • Before vs. after comparison studies: This type of study compares some factor pre-treatment to the same factor post-treatment. For instance, a study might compare participants' rate of employment before a job skills program with their rate of employment afterward.
  • Participant vs. non-participant comparison studies: This method measures the difference between participants and non-participants on certain factors. For example, an after-school program might compare the grades and standardized test scores of participating and non-participating students.
  • Randomized controlled trials (RCTs): These studies randomly assign participants to treatment or control groups. The control group receives a placebo, nothing at all, or another treatment (in some studies, there are multiple comparison groups). The study compares the outcomes of the treatment and control group(s). For example, a drug trial compares the randomly assigned treatment group to another group receiving a placebo.

Two major issues:

There are problems which often prevent the before vs. after and participant vs. non-participant comparison studies from giving us reliable evidence about cause-effect relations.


Confounding factors: 


Suppose you read a news article on a health topic, for instance this CNN article. The article, "Vitamin deficiency may cause weight gain," cites a study which tracked 4,600 women aged 65+ for 4.5 years. The researchers found that women with lower levels of vitamin D also gained more weight (two pounds more on average) over the study period than women with higher levels of Vitamin D.

Might women with higher levels of vitamin D also be different in other ways from women with lower levels? Quite possibly! Lots of things may be correlated with higher levels of vitamin D: those who exercise more outdoors may have higher levels of vitamin D since sunshine is a source of vitamin D. Fish contains vitamin D, and those who eat more fish may also tend to have healthier diets in general. So while it may seem at first that lack of vitamin D causes weight gain, it could merely be that women who exercise more (eat more fish, etc) tend to both lose more weight and to have higher vitamin D levels. In other words, vitamin D levels and weight might be influenced by a third factor, rather than directly influencing each other. The point here isn't that we know what the associated factors are, but just that there could be many intertwined factors (called "confounding factors" or "confounding variables"), and where there are confounding factors, we can't tell merely from the observed effect what the causes are. 

Confounding factors are often present and are a problem for before vs. after comparison studies. There are statistical methods for trying to account for confounding factors, such as regressions. Regressions are calculations which produce coefficients indicating the degree to which various factors are associated with the outcomes that we're interested in. A big issue for regressions is that inferring causality from the results requires a number of assumptions, which may not be fulfilled and (dangerously for those of us interested in relying on results of studies!) whose fulfillment is often hard to assess.
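Here is a small simulation of how a confounder can create an association, and how a regression that includes the confounder can remove it. Everything in it is invented for illustration: "exercise" stands in for the third factor, and the data are generated so that vitamin D has no effect at all on weight gain. Note the favorable assumption that the confounder is known and measured, which real studies often can't guarantee.

```python
import random

random.seed(0)
n = 5000

# Hypothetical data: exercise raises vitamin D and lowers weight gain.
# Vitamin D itself has NO effect on weight gain in this simulation.
exercise = [random.gauss(0, 1) for _ in range(n)]
vitamin_d = [e + random.gauss(0, 1) for e in exercise]
weight_gain = [-e + random.gauss(0, 1) for e in exercise]


def cov(xs, ys):
    """Population covariance of two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


# Naive regression of weight gain on vitamin D alone.
naive_slope = cov(vitamin_d, weight_gain) / cov(vitamin_d, vitamin_d)

# Regression of weight gain on vitamin D AND exercise (two-predictor OLS).
sxx, szz = cov(vitamin_d, vitamin_d), cov(exercise, exercise)
sxz = cov(vitamin_d, exercise)
sxy, szy = cov(vitamin_d, weight_gain), cov(exercise, weight_gain)
adjusted_slope = (szz * sxy - sxz * szy) / (sxx * szz - sxz ** 2)

print(f"vitamin D coefficient, ignoring exercise: {naive_slope:.2f}")
print(f"vitamin D coefficient, adjusting for exercise: {adjusted_slope:.2f}")
```

The naive coefficient suggests that more vitamin D goes with less weight gain, while the adjusted coefficient is close to zero. If exercise had not been measured, the adjustment would not have been possible.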

Selection bias: 

Selection bias occurs when the treatment and comparison groups differ, apart from whether they receive the treatment, in ways that may affect the outcomes. This is frequently a problem for participant vs. non-participant comparison studies.

Consider a case in which an after-school program reports that its students have higher grades on average than non-participant students. Students who voluntarily attend an after-school program are likely to be different from non-participant students in other ways as well e.g., they might be more motivated than students who don't attend, have more involved parents, etc. Because of these other possible causes of higher grades, we can't be confident that the program is the cause (or even part of the cause) of the higher grades. It's possible that it is, but there isn't strong evidence.

Another example: suppose a philosophy professor wants to show that majoring in philosophy helps students prepare for law school. The professor points to a study which shows that philosophy majors score higher on the LSAT than students in all other majors except math/physics. The problem, of course, is that students who choose to major in philosophy may have qualities (propensity to logical thinking, for instance!) that help them on the LSAT, independently of their major. Because of confounding factors, we can’t tell the extent to which philosophy classes cause better performance on the LSAT than, say, history classes, just by looking at scores of majors. Certainly the study suggests that philosophy may help cause higher LSAT scores, but the evidence is not very strong.

How RCTs avoid confounding factors and selection bias:

RCTs avoid the problems of confounding factors and self-selection bias through random assignment of participants to treatment and control groups. The RCT starts by randomly dividing participants into two or more groups. The groups are measured on relevant metrics (test scores, etc.), either before and after the treatment or just afterwards. The outcomes of the treatment and control groups are then compared. If the groups are big enough and the difference is large enough, then we can conclude that there is only a very small chance that we’d find such a difference if in fact there were no treatment effect. This is called testing for statistical significance, and researchers generally use a p-value of .05 or less as the benchmark for significance. The p-value is the chance that we'd find a difference between groups that is this extreme or more extreme if the hypothesis that there is no effect (called the null hypothesis) is true.
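The definition of a p-value can be illustrated with an exact permutation test, which is one of several ways to compute it (not necessarily the one a given study uses). The test scores below are hypothetical. The idea: if the treatment did nothing, the group labels are arbitrary, so we can count how many ways of relabeling the participants give a difference at least as extreme as the one observed.

```python
from itertools import combinations

# Hypothetical test scores for two small randomized groups.
treatment = [85, 88, 90, 92, 95]
control = [70, 75, 78, 80, 84]


def mean(values):
    return sum(values) / len(values)


observed = mean(treatment) - mean(control)
pooled = treatment + control

extreme = 0
total = 0
# Enumerate every way of splitting the ten scores into two groups of five.
for idx in combinations(range(len(pooled)), len(treatment)):
    chosen = set(idx)
    group_a = [pooled[i] for i in chosen]
    group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
    diff = mean(group_a) - mean(group_b)
    total += 1
    # Two-sided: count splits at least as extreme in either direction
    # (small tolerance guards against floating-point rounding).
    if abs(diff) >= abs(observed) - 1e-12:
        extreme += 1

p_value = extreme / total
print(f"observed difference: {observed:.1f}")
print(f"{extreme} of {total} relabelings are as extreme; p = {p_value:.4f}")
```

Here only 2 of the 252 possible relabelings are as extreme as the observed split, so the p-value is below the conventional .05 benchmark.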

RCTs largely do away with the problem of confounding variables via the randomization procedure. Because randomization with large enough group sizes “washes out” any confounding variables among participants, RCTs allow us to avoid the problem of confounding variables much more effectively than before vs. after and participant vs. non-participant comparison studies do. Similarly, since the participants are not self-selecting into the treatment or control groups, bias does not arise as a result of their choice about whether to receive the treatment. Thus, the randomization process makes it very unlikely that people who are of a particular type will end up heavily represented in one group rather than the other, so the two groups are unlikely to differ systematically in their traits.
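A short simulation can show the contrast between self-selection and random assignment. The setup is invented: a hypothetical after-school program with no effect at all, and a "motivation" trait that drives grades. The sketch compares the grade gap you'd see if motivated students chose to enroll with the gap you'd see under random assignment.

```python
import random

random.seed(1)
n = 10000

# Hypothetical students: grades depend on motivation plus noise.
# The program has NO effect on grades in this simulation.
motivation = [random.gauss(0, 1) for _ in range(n)]
grades = [m + random.gauss(0, 1) for m in motivation]

# Self-selection: the more motivated students choose to enroll.
self_selected = [m > 0 for m in motivation]
# Randomization: a coin flip decides who enrolls.
randomized = [random.random() < 0.5 for _ in range(n)]


def gap(values, in_program):
    """Mean of values for participants minus mean for non-participants."""
    inside = [v for v, flag in zip(values, in_program) if flag]
    outside = [v for v, flag in zip(values, in_program) if not flag]
    return sum(inside) / len(inside) - sum(outside) / len(outside)


self_selected_gap = gap(grades, self_selected)
randomized_gap = gap(grades, randomized)
motivation_imbalance = gap(motivation, randomized)

print(f"grade gap with self-selection: {self_selected_gap:.2f}")
print(f"grade gap with randomization:  {randomized_gap:.2f}")
print(f"motivation gap between randomized groups: {motivation_imbalance:.2f}")
```

With self-selection, the useless program appears to raise grades substantially. With random assignment, the two groups have nearly the same average motivation, and the apparent effect disappears.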

Wednesday, October 24, 2012

Cochrane findings that will help you

The Cochrane Collaboration is one of the best sources of high-quality evidence on health and has about 5,000 systematic reviews in its library. It's possible to search the reviews using the Cochrane summaries site, and the abstracts provided there are useful, but a major drawback of the site is that there is no quick way to access a compilation of the results, which would make it much easier to use Cochrane's findings when time is short.

Since many people don't have time to make a lengthy study of many abstracts/reviews, but could potentially benefit from the findings contained in some of the reviews, I decided to pull together some results of Cochrane reviews in a more user-friendly way.

The most useful reviews for improving your life

I considered where to start. Many Cochrane reviews are on the evidence for particular medical interventions (drugs, surgical techniques, and so on). People would need to see a doctor in order to use many of these treatments. I came to the conclusion that reviews on interventions that people would try at home would be the most useful place to start compiling results.

There is a whole range of things we can think of as aiming at "life improvement": for example, people commonly think that drinking beverages with antioxidants and cutting down on sodium are healthy choices. The question is: which "life-improving" behaviors are supported by high-quality evidence?

With this question in mind, I selected reviews from the Cochrane library by reading through the review titles in Cochrane's full list.

Criteria used in the search
  • Reviews pertaining to a population without an existing serious disorder or dysfunction. So for example, I didn't select reviews for treatment of asthma, heart disease, thyroid disorders, cancer and so on. 
  • Reviews on conditions that people commonly have without going to a doctor - for instance, treatments for aches and pains like headaches and backache, weight loss treatments, and so on. 
  • Reviews that were applicable to a general adult population. I did not select reviews specifically for elderly people, children/infants, teenagers, smokers, or pregnant women.

The findings

67 reviews met my criteria. [Note: I scanned the ~5,000 titles for relevant reviews and then scanned them again to double-check. Though there may be 10-20 I missed, I doubt there are many more that meet the criteria I used.] Here are my definitions of categories used in the table below:

Quality of evidence
  • High: the authors note a low risk of bias at most; they do not mention the need for further studies in order to add to the reliability of the findings
  • Moderate: the authors note a medium/high risk of bias or other problems for some of the included studies and often mention a need for further high-quality randomized controlled trials.
  • Low: the authors note a high risk of bias in some or all of the studies and/or there are very few/small-sized studies; the authors mention a clear need for further high quality studies in order to come to reliable conclusions.

Effect

In the cases where "helpful/no effect/harmful" ratings are given, there was enough evidence to support this conclusion to an extent according to the authors of the review; note that the attribution is in some cases tentative because of weak evidence.
  • Helpful: studies showed that there was evidence in favor of the intervention's being helpful.
  • No effect: the authors conclude there is some evidence of no effect.
  • Harmful:  studies showed that there was evidence that the intervention is harmful.
  • Not enough evidence: according to the authors of the review, there is not enough evidence to draw a conclusion at all, either because of a lack of high-quality studies, or because the available high-quality studies are too small. 
See the full results in a shared Google spreadsheet (see "full version" tab).

Recommended use: download the file to be able to view the cells fully, and use filters to search for reviews you're most interested in. Below is my abridged version; note that the full version contains effect sizes with supporting quotations.




Visualizing the results

The labels refer to the above categories; the numbers within the bubbles refer to the number of reviews in each pair of categories (for example: "high quality of evidence for a positive effect" is in the upper right-most bubble). The size of the bubbles corresponds to the number of reviews in each category.