The Hypothesis Generation:
Confronting Epidemiology, Fear, Public Policy, and the Limits to Knowledge (3)

PART THREE- Which Leads do we Follow? Evaluating the Strength of Epidemiological Data

How do we assess which hypotheses might be worthy of further confirmatory study or other research? On a policy basis, we have not (in my opinion) done this terribly well- tending to pursue (i.e.- fund) work based on a vague composite of the “concern du jour,” politics, and pre-conceived popular notions. I would not suggest, nor am I naive enough to think, that public concerns and political issues will ever be expunged entirely from the process, nor should they be. We do not, after all, live in a “scientocracy.” I will, however, set political considerations aside to look at the scientific approaches that may be helpful.

 Looking at a hypothesis and deciding whether to pursue it further is ultimately the same exercise as assessing causation and determining whether to accept a relationship as (for practical purposes) causal. The general criteria should be the same, but the weight of evidence required is obviously greater for concluding causation than for proceeding with further study. Before moving into the criteria, some discussion of the relationship between epidemiological associations (which quickly become hypotheses for further study) and causation is in order. 

 Epidemiological or statistical studies alone cannot formally prove causation. Some causal relationships are readily apparent, particularly when the outcome follows immediately or quickly from the cause. You do not need a detailed statistical analysis to know that shootings cause death- the mechanics and biological plausibility of the process are clear and the outcome immediate or nearly so, such that intervening causes have little opportunity or time to operate. In fact, one does perform a kind of “do it yourself” epidemiological assessment even in this straightforward situation- you have determined that being shot is highly associated “statistically” with dying, and taking other factors into account, conclude the relationship is almost certainly causal.

 The situation becomes more difficult when the timing and biological basis of a response are unclear or unknown and when multiple other causal factors may intervene. A good example would be our current struggles with drug safety. In clinical studies and in post-market surveillance, we gather data on a nearly endless variety of possible outcomes in relationship to drug use. Some relationships are reasonably predictable extensions of drug action (sedatives can over-sedate and will cause diminished mental function in sufficient doses), but most possible side effects have no evident relationship to drug function, e.g.- liver disorders from pain relievers or hip fractures from antibiotics. Further, disease processes may also cause various health outcomes, and these outcomes can appear to be associated with the drugs used to treat these conditions. Viral infections often cause rashes and elevations in liver function tests, for example, and medications taken for the fever and discomfort of such illnesses will be associated with these abnormal findings. The statistical association is valid in this case- but the relationship may not be a causal one.

 Causation can be still harder to define in longer term studies of health outcomes. Do childhood exposures to environmental agents cause disease in the adult population? One cannot do such a study moving forward in time in most instances- who wants to wait 60 years for a result- so such studies are usually done looking backwards in time. Thus, exposures must be established on some type of retrospective basis, and there are decades of time in which other causes and inter-relationships may intervene. Many outcomes of interest have multiple causes, e.g.- many factors impact the risk for most specific cancers, and reproductive outcomes are similarly influenced by diverse factors from infection to prior behavior. To make matters worse, historical relationships among potential causes are myriad and may create the opportunity for associations which are real, but non-causal. Thus, in studies including farming and non-farming populations, one sees an association of agricultural pesticide use with lip cancer- but this is because agricultural pesticides are used only by farmers, and farming itself is associated with lip cancers- presumably the result of sun exposure. Similarly, residence within 30 miles of a coal-fired power plant is associated with increased risks of respiratory diseases. However, selecting for proximity to power plants effectively selects for urban vs. rural residence, and urban air and urban living conditions differ in a myriad of ways from their rural counterparts.

 The bottom line is that statistical associations can be found for many reasons unrelated to causation. They may be due to random variation or may be precipitated by interrelationships (what statisticians call confounding) with other factors which are the true causes of the outcome- whether recognized or not. Even if a causal relationship does exist, the direction of the relationship may be very unclear. Continued intellectual activity is associated with a reduction in Alzheimer’s disease- but does this mean that “brain use” preserves function or does it mean that Alzheimer’s disease has early manifestations which reduce cognitive activity? We would love to believe the former- that we can think our way out of Alzheimer’s disease by remaining intellectually active- but our desires and predispositions do not, in fact, answer the question.
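 The lip cancer example above can be put into numbers with a small arithmetic sketch. Every count below is invented purely for illustration and comes from no study. The sketch assumes that farming raises lip cancer risk fourfold, that pesticide use has no effect at all, and that only farmers use agricultural pesticides.

```python
# Hypothetical population: every count here is invented for illustration only.
# Assumption: lip cancer risk depends on farming (sun exposure), not on pesticide use.
groups = [
    # (is_farmer, uses_pesticide, people, cases)
    (True,  True,   6_000, 24),   # farmers who use pesticides: risk 0.4%
    (True,  False,  4_000, 16),   # farmers who do not:         risk 0.4%
    (False, False, 90_000, 90),   # non-farmers, none of whom use them: risk 0.1%
]


def risk(rows):
    """Total cases divided by total people, pooled over the given rows."""
    people = sum(row[2] for row in rows)
    cases = sum(row[3] for row in rows)
    return cases / people


def relative_risk(exposed_rows, unexposed_rows):
    """Risk in the exposed group divided by risk in the unexposed group."""
    return risk(exposed_rows) / risk(unexposed_rows)


# Crude comparison: pesticide users versus everyone else, ignoring occupation.
users = [g for g in groups if g[1]]
non_users = [g for g in groups if not g[1]]
crude_rr = relative_risk(users, non_users)

# Stratified comparison: the same contrast, restricted to farmers.
farmer_users = [g for g in groups if g[0] and g[1]]
farmer_non_users = [g for g in groups if g[0] and not g[1]]
farmers_only_rr = relative_risk(farmer_users, farmer_non_users)

print(f"Crude RR, pesticide users vs. everyone else: {crude_rr:.2f}")
print(f"RR among farmers only: {farmers_only_rr:.2f}")
```

 Under these assumed counts the crude comparison shows pesticide users at roughly three and a half times the risk of non-users, while the comparison within farmers shows no difference at all. The association is real, but it belongs to farming rather than to the pesticide.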

 In view of these considerations, it is not surprising that more formalized criteria for causation look at three basic types of information (once you pass certain threshold tests like proper temporal relationship). The first is the strength of the original statistical finding, followed by consistency with other epidemiological studies, and by the consideration of non-epidemiologic information. To use our token analogy- one can look at just how many heads have been flipped in a row (3 is not impressive, 8 is getting there, and 50 should convince almost anyone); one can look at how many other people have the same finding (if everyone is getting all heads, then all the tokens are two-headed, and if a lot of people flip heads with a few tails here and there perhaps they are mostly two-headed with a few head and tail variants in the lot); or one can stop flipping coins and try a non-epidemiological approach to answering the question- just turn the darn thing over and look at the other side.
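 The arithmetic behind the token analogy is easy to sketch. The function name below is hypothetical, and the calculation assumes an ordinary fair token with independent flips.

```python
def prob_all_heads(n_flips, p_heads=0.5):
    """Chance that every one of n_flips independent flips lands heads.

    Assumes a fair, two-sided token unless p_heads is changed.
    """
    return p_heads ** n_flips


for n in (3, 8, 50):
    print(f"{n:>2} heads in a row: probability {prob_all_heads(n):.3g}")
```

 Three heads in a row will turn up by chance 1 time in 8 and eight in a row 1 time in 256, while fifty in a row has a chance of about 1 in 10^15, which is why the longer runs make an ordinary token so hard to believe in.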

 The applicable criteria for assessing causation in epidemiological studies are well known, and are eponymously referred to as the “Hill Criteria” or “Bradford-Hill Criteria” after the British epidemiologist, Sir Austin Bradford Hill, who put them forth in 1965. To lay them out (in paraphrase and in somewhat more modern terminology) for consideration, they are:

 1. Temporal Relationship:   The exposure to a risk factor must always precede the outcome. This is considered an essential criterion. The implementation of this criterion would seem to be both obvious and simple, but reality can become a good bit more difficult. For example, a widely publicized study looked at urinary pesticide metabolite levels and the risk of attention deficit hyperactivity disorder (ADHD). The metabolites in question are transient, lasting only a few days following exposure, and the metabolites were measured on only a single occasion in each child, between the ages of 12 and 15 years. Metabolite levels were found to correlate with ADHD. The paper was published, with press release, in the Journal of Pediatrics, and subsequently highlighted in the Journal of the American Medical Association as well as widely touted by the media. But here’s the rub- how is it that pesticide metabolites measured at age 12 to 15 relate to- and perhaps cause- a disease process that is well in place by age 8 and may well precede birth in its etiology? Perhaps one-time levels at ages 12-15 are good predictors (or “postdictors”?) of pesticide levels over the more extended time frame from uterus to age 8- but this seems unlikely. This widely acclaimed hypothesis needs some serious skepticism- a fact which the authors, to their credit, do acknowledge. The media, however, often pick up the headline without transmitting the caveats.

 2. Strength: Just how strong is the association found? What is the probability that it is due to chance alone? P = 0.05, or a 1-in-20 chance of error, is good enough to publish in most cases, but some results are much stronger, and this certainly suggests a greater weight should be placed on an observation. One also needs to look at the size of an effect. If a risk factor creates a 10-fold or 20-fold increase in risk, this is rather striking and is more deserving of attention than a less impressive finding. The reality is that most associations reported in epidemiologic studies carry an increased risk (or odds ratio) between 1 and 3, and many are 1.1 to 1.3-fold increases in risk. Such findings often depend on a very few individual observations, and while they may meet the test of statistical significance (p = 0.05), are not particularly robust. Guidelines vary, but relative risks (or odds ratios) below 3 should certainly be subject to skepticism.

 3.  Dose-Response Relationship:  With most biological effects, one expects to see a biological gradient of response in which risk or severity rises in concert with increasing exposure, often with a level below which no outcome is evident (threshold). Not every study is large enough or has sufficiently detailed exposure or response information to look for a dose-response relationship. Finding a clear dose-response enhances the value of an observation for several reasons. While a simple exposed/unexposed comparison may readily suggest a relationship by random variation, a clear, systematic increase in response at multiple dose levels makes random effects less likely. (Non-random effects like bias and confounding may still explain the finding however.) More importantly, such a finding meets biological expectations and is thus more deserving of consideration.

4. Consistency:  Any finding becomes more convincing when replicated, and this is especially true when a finding is robust among various populations in various places and circumstances over time. The increase in cancer risk with smoking is a good example of a robust finding, having been seen over many years in multiple studies of men and women and across races and geography. (This can be formally assessed via a mathematical agglomeration of results from multiple studies in a process called “meta-analysis.”)

 5.  Plausibility:  The term “plausibility” covers a lot of ground- it refers in general to the consideration of non-epidemiological information. This information might include animal studies replicating a finding in humans, mechanistic studies suggesting the means by which a particular material might cause an outcome, or just general consideration of biological, chemical, and physical knowledge which may impinge on a study. A cancer finding in a hypothesis generating study for example is far stronger if the chemical under study is a known mutagen (causes genetic damage) and has been shown to cause cancers in multiple animal species. If, on the other hand, the chemical is not metabolized, but is simply excreted unchanged, is not mutagenic, and has failed to cause cancers in multiple animal species despite repeated study, one needs to apply a bit more skepticism.

 6.  Alternate Explanations: Alternate explanations need always to be considered. The classic study of coffee and pancreatic cancer, ultimately accounted for largely by increased rates of smoking (ref and ref) among coffee-drinkers, is a good example. In practice, excluding alternative causes is very difficult in a purely hypothesis-generating model as one has limited knowledge at the outset. Typically, we take into account the obvious factors that may alter risks, such as gender, race, and smoking, and often look for additional relationships to socioeconomic status, obesity, etc. The reality is that an alternative cause which is both unknown and not a part of a study will not be identifiable within the study.

 7. Experiment:    Some taxonomies of the Hill criteria put animal data here, but Hill in fact had in mind the use of interventional “experiments” in populations under study. As an example, smoking is associated with lung cancer. If one took a smoking population and intervened to reduce or eliminate smoking in a subset of the population, one would expect the risk to diminish. Even more clearly, if one saw a suspected drug side effect, withdrew the drug, saw improvement, and then could re-create the side effect by re-administering the drug, evidence for causation would be strong. The ability to alter biological responses by intervening with regard to specific risk factors can be a powerful piece of evidence, assuming that risk factors are not intertwined. It would be hard to disentangle, for example, air travel from other risks of international travel today, as most people travel by air.  While a useful approach, ethical, moral and practical considerations necessarily limit the role of human experimentation.

 So-called “experiments of nature” can occur when, due to natural or other factors (regulations, changes in product use, technological progress), exposures to risk factors change over time. As an example, the suggestion that autism is caused by exposure to certain classes of pesticides (organophosphates) is undermined by the fact that the supposed “epidemic” of autism (possibly a reporting and diagnostic artifact) has occurred in a time frame when general population exposure to these agents has been radically curtailed for a variety of reasons, particularly governmental restrictions on residential use.

 8. Specificity:    When a cause has an apparently unique relationship to a particular outcome, this can provide strong evidence of causal relationships. Such circumstances are rare, however, as the risk of most afflictions depends on multiple- often myriad- factors. A few examples do exist, however, in which disease is virtually never seen in the absence of a particular risk factor- asbestos and mesothelioma (a rare tumor of the lining cells of the thoracic cavity), vinyl chloride and hepatic angiosarcoma (another very rare tumor of the blood vessels of the liver), and the highly unusual limb-defects seen in thalidomide-affected babies are among these few examples. Even here, one has to be wary of co-exposures or other factors correlated with the measured exposure.

 9. Coherence:    An association should be coherent with current scientific knowledge and understanding. Thus, correlations between well-water and cholera make sense (at least today- before germ theory, this was another matter), but correlations of astrological signs and cancer do not. The border between this criterion and that of “plausibility” is admittedly vague, and this is perhaps simply a reminder to consider both highly specific information and general knowledge.

  10. Analogy:   Again, there is some overlap with plausibility and coherence, but analogy refers to the occurrence of analogous observations with similar materials or risk factors. The analogy may be quite specific or quite general. Thus, numerous members of the chemical isocyanate family cause asthma, and it is quite likely that any new member of that family will do the same. Such a finding “makes sense.”  Similarly, exposure to atomic radiation in Hiroshima and Nagasaki may shed light on the risks of other radiation exposures resulting from everything from medical x-rays to uranium mining.
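 The point made under criterion 2 (Strength), that small relative risks can hinge on a handful of observations, can be illustrated with a rough sketch. The cohort sizes and case counts below are invented, the function name is hypothetical, and the interval uses a standard large-sample approximation on the log scale rather than any method named in the text.

```python
import math


def relative_risk_ci(cases_exp, n_exp, cases_unexp, n_unexp, z=1.96):
    """Relative risk with an approximate 95% interval.

    Uses the usual normal approximation for log(RR); reasonable only when
    the case counts are not tiny.
    """
    rr = (cases_exp / n_exp) / (cases_unexp / n_unexp)
    se_log_rr = math.sqrt(
        1 / cases_exp - 1 / n_exp + 1 / cases_unexp - 1 / n_unexp
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Hypothetical cohorts of 100,000 people each; every count is invented.
scenarios = {
    "weak association": (130, 100_000, 100, 100_000),
    "same study, one case fewer": (129, 100_000, 100, 100_000),
    "strong association": (2_000, 100_000, 100, 100_000),
}

for label, counts in scenarios.items():
    rr, lower, upper = relative_risk_ci(*counts)
    verdict = "excludes 1" if lower > 1 or upper < 1 else "includes 1"
    print(f"{label}: RR = {rr:.2f}, interval {lower:.3f} to {upper:.3f} ({verdict})")
```

 With these assumed counts, the 1.3-fold finding only just clears the conventional 1-in-20 threshold, and reclassifying a single case out of 130 pulls the interval back across 1. The 20-fold finding is nowhere near that boundary, which is the sense in which a large effect is more robust than a small one.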

There is, in the end, no single test that can be applied to determine which hypotheses deserve further consideration and which do not, or decide which associations rise to the level of presumptive causation, but these criteria provide a strong basis for the assessment of individual findings, and need to be regularly applied if we are to find our way through the hypothesis jungle.
