The Hypothesis Generation: Confronting Epidemiology, Fear, Public Policy, and the Limits to Knowledge (4)
Daniel Goldstein
Consultant in Medical, Industrial, and Environmental Toxicology Apothecary Historian and Collector
PART FOUR- Issues and Challenges with Epidemiologic Studies (Illustration: the amount of "Froot Loops" one would have to eat every day to ingest the allowable daily intake of glyphosate)
We have thus far addressed primarily a single category of limitations in the conduct and interpretation of epidemiologic studies- findings due to random chance, which can (and will) predictably impact even well-designed and well-conducted studies. There are many other issues which can affect the performance of studies and which may cause either the incorrect acceptance of a statistical association or the failure to find an association which in fact exists. Before we look at some of these issues, it is worth briefly characterizing what a properly constructed study should look like.
In a well-designed study, the population under examination needs to be clearly defined in advance, and subjects must be selected from the general population using a method that will not distort results. One cannot, for example, take a previously recognized “cluster” of cases associated with a particular exposure and then go find unexposed controls to go with them, as this preordains the outcome (a seemingly obvious error, but this is exactly what was done in the original work on diethylstilbestrol). Similarly, you cannot advertise in a community for volunteers to participate in a study of a particular exposure and outcome, as potential exposed volunteers who have the disease outcome may be highly motivated to participate, while potential control subjects who have the disease but not the exposure may believe their participation is unwanted. These are of course unsubtle examples chosen for illustrative purposes. Effects of this nature in real studies can be more difficult to perceive and understand.
Presence of or exposure to the risk factor(s) in question must be correctly ascertained, whether as a continuous measurement (dose) or categorically (low, medium, or high; quartiles, etc.), and of course one must establish that there is differential exposure among groups, i.e.- in the simplest model- the “exposed” group should have a meaningfully higher exposure than the “control” or “unexposed” group. The term “exposure” is used broadly here and does not need to be a chemical substance- any risk factor, even such basic factors as age and gender, must be correctly ascertained and recorded. Lacking a differential exposure, any findings of association are meaningless, or at least mistakenly attributed- how can exposure to “X” be the cause of (an increased rate of) condition “Y” in the “exposed” group if the “control” group actually has the same exposure?
Outcomes, of course, must be properly characterized as well. This may be reasonably straightforward with some conditions- you either have a broken hip or you do not. It can be harder with other conditions. Severe asthma is usually straightforward, but the actual border between normal and asthma is not entirely clear in practice, and asthma can be confused with chronic bronchitis and other conditions. For some disorders, diagnostic chaos reigns. “Learning disabilities” is a good example, as there is no single diagnostic criterion for this condition, and teachers, parents, and formal testing may provide different views of the same child. Further, “learning disabilities” covers a wide range of conditions having varying degrees of inter-relatedness. The solution here is often to create an operational definition for purposes of a particular study. There is nothing wrong with this in principle, but it then becomes important to understand that any associations found may pertain only to the operational definition at hand. This is especially problematic when applying consistency testing among studies, as one study of “learning disabilities” does not necessarily confirm (or for that matter refute) another. Does a study showing “learning disabilities” in the form of reduced verbal performance (but not attention disorders) “confirm” another study in which reduced attendance to task is found... or does it in fact refute it?
Finally, studies must be free of the effects of bias and confounding. Bias, in this context, does not refer to a prejudice or ulterior motive. Rather, it refers to a systematic non-causal relationship between a risk factor and an outcome. For example, it has been shown that mothers who have given birth to an infant with a congenital defect are more likely to recall and report exposures to medications during pregnancy than mothers delivering healthy infants (recall bias). This is intuitively understandable as human nature, and the point here is not to accuse mothers of trying to mislead anyone. Rather, the important point is that this phenomenon will create apparent associations with drug exposure in general and can potentially bias a study of any drug as a risk factor for birth defects. Many forms of bias exist, and they may act to either create (or enhance) or diminish statistical associations.
Confounders are factors that relate to both an exposure and an outcome in ways that can create (or enhance) or diminish a statistical association. The best way to convey this is probably an example, and the classic example is the association of coffee consumption and pancreatic cancer, later recognized as resulting from a higher frequency of smoking in coffee drinkers (see above). It is important to recognize that a true confounder is not just a difference between groups. Coffee drinkers have a higher exposure to little wooden coffee stirrers too- but they don’t cause pancreatic cancer and hence are a difference, but not a confounder. Knowing when a difference is (or is not) a confounder can be tricky business.
Problems and Weaknesses Commonly Seen in Epidemiologic Studies
It is not possible to address the entire taxonomy of issues related to the conduct and interpretation of epidemiologic studies, and the detection of random events is but one of many limitations. Several issues are nonetheless worth noting in order to flesh out some of the challenges in epidemiology and to better understand the limits of interpretability.
Biases:
Bias may or may not impact the interpretability of a study, depending upon the nature of the bias and the relationship of the bias to the study groups. Thus, using “age” (in years) as opposed to actual birth date introduces an average error of six months into the data, as we conventionally utilize our last birthday, rather than the closest birthday (past or future), as our “age”. This affects all groups equally and is largely irrelevant to outcomes, and thus is a bias without much impact under most circumstances. Recall bias is another matter altogether. With recall bias, recollection of exposure events is affected differently between or among the groups being compared, i.e.- mothers who give birth to infants with a birth defect may be more likely to report exposures to medications during pregnancy than those who do not (for a review of this complex topic, see here). This differential impact, producing a specious elevation in reported exposure in the group having the outcome of concern, can in turn produce specious statistical associations.
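The arithmetic of recall bias can be made concrete with a small sketch. All numbers below are hypothetical, chosen only for illustration- true exposure is identical in both groups, so the true odds ratio is exactly 1.0, yet differential recall alone manufactures an apparent association:

```python
# Hypothetical recall-bias sketch: 20% of mothers in BOTH groups were truly
# exposed, so there is no real association. Case mothers recall 95% of real
# exposures; control mothers recall only 60%.

def odds_ratio(exp_cases, unexp_cases, exp_controls, unexp_controls):
    return (exp_cases / unexp_cases) / (exp_controls / unexp_controls)

n = 1000                          # mothers per group
true_exposed = 0.20 * n           # 200 truly exposed in each group

cases_reported = true_exposed * 0.95     # 190 exposures recalled by cases
controls_reported = true_exposed * 0.60  # 120 exposures recalled by controls

or_true = odds_ratio(true_exposed, n - true_exposed,
                     true_exposed, n - true_exposed)
or_reported = odds_ratio(cases_reported, n - cases_reported,
                         controls_reported, n - controls_reported)

print(f"True odds ratio:     {or_true:.2f}")      # 1.00
print(f"Reported odds ratio: {or_reported:.2f}")  # about 1.72 -- spurious
```

Nothing about the exposure changed between groups- only the completeness of recall- yet the reported data suggest a roughly 70% increase in odds.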
Bias can take many other forms. For example, selection bias occurs when subjects are differentially (as opposed to randomly) recruited or allocated into control and exposed groupings. This can occur very easily, for example, if subjects with a disease outcome are asked to identify individuals to serve as their controls. Subjects will generally not choose random controls, but rather are likely to choose individuals that do not have the condition under study and/or individuals that they do not believe have risk factors associated with that disease outcome. Differences among groups can easily occur using a variety of recruitment techniques that are not so obviously defective, particularly if prospective subjects are aware of, or can deduce, the nature of a research study. For example, any study comparing a group of currently exposed workers to the general population will tend to show a lower incidence of disease among the working population, as the work itself selects for healthy individuals- the so-called “healthy worker effect”, which can mask associations. Conversely, individuals recruited for a study of cancer and chemical exposure may well be more inclined to volunteer if they believe they have had the exposure of interest, and disinclined if they have not, which can create false associations. Selection bias may thus either mask or enhance associations.
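The healthy worker effect can likewise be reduced to arithmetic. The sketch below uses entirely hypothetical rates: employment selects healthy people, so even a genuinely harmful exposure can leave workers looking healthier than the general population:

```python
# Hypothetical healthy-worker-effect sketch. The general population is a mix
# of healthy and chronically ill people; the workforce is essentially all
# healthy, because serious illness keeps people out of work.

p_ill_healthy = 0.05       # annual disease risk among healthy people
p_ill_sick = 0.40          # annual disease risk among the chronically ill
frac_sick_general = 0.10   # 10% of the general population is chronically ill

harm = 1.20                # the exposure truly raises risk 20% in workers

# General population risk: a weighted mix of healthy and chronically ill
risk_general = ((1 - frac_sick_general) * p_ill_healthy
                + frac_sick_general * p_ill_sick)

# Worker risk: healthy baseline, plus the genuinely harmful exposure
risk_workers = p_ill_healthy * harm

print(f"General population risk: {risk_general:.3f}")  # 0.085
print(f"Exposed worker risk:     {risk_workers:.3f}")  # 0.060
```

Despite a real 20% increase in risk from the exposure, the workers' rate (0.060) sits well below the general population's (0.085)- the harmful exposure is masked, and a naive comparison might even call it protective.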
Measurement bias can also easily occur, especially if investigators are aware of the status (exposed/unexposed or affected/unaffected) of subjects. This can occur with laboratory measurements (rounding up vs. down, deciding which values are truly non-detect vs. very low), but is particularly likely when the variable being measured is assessed by survey or recall. Survey personnel assessing exposure retrospectively may, for example, press affected subjects much harder for exposure information than non-affected subjects. The solution to this is to use blinded investigators, but blinded status may be difficult to maintain. In short, there are many ways that bias can enter into an investigation, and immense care is needed to avoid the introduction of biases that can create spurious associations.
Confounders:
Confounding can also impact the assessment of associations, but does so in a different manner. Rather than differentially impacting the measurement of a particular variable among groups, confounding is about the existence of relationships- frequently unrecognized relationships- between risk factors and outcomes. To use the example noted above of coffee consumption and pancreatic cancer, there was no (recognized) bias in the reporting of either risk (coffee intake) or outcome (pancreatic cancer). Rather, there is a common factor that relates to both coffee consumption (coffee drinkers are more likely to smoke) and pancreatic cancer (smokers get more pancreatic cancer), thus creating the appearance of a relationship between coffee and cancer. Although the confounder was missed by the original authors and pointed out by others, it is possible to adjust or control for confounding. In this case, one can incorporate smoking as a risk factor, and when one does so, the relationship with coffee “disappears.” (There are multiple ways to do this mathematically, but in effect one looks at coffee drinkers who are non-, moderate, or heavy smokers vs. non-coffee drinkers in the comparable smoking category and sees that the observed risk is attributable to the increased smoking and does not “track” with the coffee consumption.)
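The logic of stratifying on a confounder can be shown with a minimal sketch. All rates below are hypothetical and chosen purely for illustration- smoking causes the disease, coffee does not, but coffee drinkers smoke more, so the crude comparison blames coffee:

```python
# Hypothetical confounding sketch: disease rates depend only on smoking,
# yet coffee appears "risky" because coffee drinkers smoke more often.

rate_nonsmoker = 0.001   # disease rate in nonsmokers (coffee is irrelevant)
rate_smoker = 0.004      # disease rate in smokers (4x the risk)

p_smoke_coffee = 0.60    # 60% of coffee drinkers smoke
p_smoke_nocoffee = 0.20  # 20% of non-coffee-drinkers smoke

def crude_rate(p_smoke):
    """Overall disease rate in a group with the given smoking prevalence."""
    return p_smoke * rate_smoker + (1 - p_smoke) * rate_nonsmoker

crude_rr = crude_rate(p_smoke_coffee) / crude_rate(p_smoke_nocoffee)
print(f"Crude risk ratio, coffee vs. no coffee: {crude_rr:.2f}")  # 1.75

# Stratify on smoking: within each stratum, coffee drinkers and
# non-drinkers have identical rates, so the coffee "effect" vanishes.
rr_in_smokers = rate_smoker / rate_smoker         # 1.0
rr_in_nonsmokers = rate_nonsmoker / rate_nonsmoker  # 1.0
print(f"Risk ratio within smokers:    {rr_in_smokers:.2f}")
print(f"Risk ratio within nonsmokers: {rr_in_nonsmokers:.2f}")
```

The crude comparison shows a 75% elevated risk for coffee; stratified on smoking, the association disappears entirely, exactly as described for the pancreatic cancer example.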
If you want to study farmer-applied pesticides, you will be looking at farmers. Being a farmer, however, is associated with a myriad of factors- sunlight, diesel exhaust, animal exposure, agricultural dust- not to mention social and economic factors. Further, farming is associated with gender (mainly male), race/ethnicity (mostly white in the US), rural (vs. urban or suburban) lifestyle, and geographical location (different crops in different states, and different chemicals with different crops). Similarly, most occupational exposures do not come in isolation, but are associated with a variety of other chemicals as well as exposure circumstances and socioeconomic factors. Many disease outcomes are associated with multiple factors- race, socioeconomic status (probably indirectly… but why?), gender, and location (multiple sclerosis and latitude) as well as a myriad of co-exposures.
The great difficulty with confounders comes not in how to control for the known confounders (which has technical issues as well), but how to avoid the unknown confounders. This can be exceedingly difficult to do in a world where so many factors are naturally interrelated. (The great advantage of laboratory science, in fact, is not that rats are more informative than people- but that you can control all the confounders.)
Modeling Problems:
There is another set of difficulties with statistical studies of correlation (especially multiple correlations) which is complex and cannot be treated in detail here. Briefly, any attempt to assess correlation is an attempt to ascertain “goodness of fit” with a theoretical model. Thus, the typical correlation coefficient assumes a linear relationship between causal factors and outcome, one in which risk rises proportionally with exposure (double exposure gives twice the risk). Many other models (logarithmic, exponential, etc.) exist. Such investigation works well only if the relationship of cause to effect fits the assumed model. If the actual mathematical relationship is other than assumed, however, one may miss a real association or create an association which does not exist.
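A toy example shows how completely a linear correlation coefficient can miss a real but non-linear relationship. Here the outcome is a deterministic (hypothetical) U-shaped function of exposure- a perfect relationship- yet the linear correlation is zero:

```python
import math

def pearson_r(xs, ys):
    """Pearson (linear) correlation coefficient, computed from scratch."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

xs = list(range(-5, 6))        # hypothetical exposure levels
ys = [x ** 2 for x in xs]      # outcome: a perfect U-shaped dose-response

print(f"Linear correlation: {pearson_r(xs, ys):.3f}")  # 0.000
```

The relationship is exact- knowing the exposure predicts the outcome perfectly- but because the assumed (linear) model does not match reality, the analysis reports no association at all.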
Further complexity arises when multiple presumptive risk factors are related in known or, often, unknown ways. In the example of coffee, smoking, and pancreatic cancer, the risk due to smoking appeared as if it were due to coffee because the two are in fact closely correlated (covariant). The variable that was put into the model (coffee) is assigned all of the responsibility for causation falsely because the real (and correlated) cause was not put into the model. In this case, the problem was caught and the issue properly resolved. It is far harder to disentangle complex, unknown interrelationships. A farmer in the developing world may handle a particular pesticide, but in comparison to a non-farmer, probably handles other pesticides and has exposure to fertilizers, fuels, sunlight, animals, heat stress, dehydration, use of well water, rural existence, often poverty, and other factors. Thus if one looks only at pesticides as risk factors for various outcomes, and fails to incorporate potential covariant causes, responsibility can be incorrectly or disproportionately pinned on the pesticide in question.
The problem becomes still thornier when one tries to address the issue by analysis for multiple causes by, for example, multiple regression. In this type of analysis, one looks for association with one risk factor, mathematically removes the effect of that risk factor from the data, and then applies a similar analysis to one or more additional risk factors. If risk factors are completely independent, this can work well. If risk factors are related to one another, this type of analysis can easily create misleading associations because any risk due to later factors is “moved forward” and assigned to earlier factors to the extent that risk factors are interrelated. For example, cardiovascular mortality depends on obesity, hypertension, and diabetes. If one does a multiple regression and puts obesity first, the impact of obesity on risk may appear to be inordinately large as one “sucks up” the risks attributable to related hypertension and diabetes. Thus, the order in which variables are entered into analysis may have a profound effect on perceived results.
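The order-of-entry problem can be demonstrated with a stripped-down sequential regression on hypothetical data. Two risk factors contribute equally to the outcome, but because they are highly correlated, whichever is entered first absorbs nearly all of the shared risk:

```python
# Hypothetical sketch of order-dependent sequential regression.
# y depends EQUALLY on x1 and x2, but x1 and x2 are nearly identical,
# so the factor entered first "sucks up" almost all the apparent risk.

def slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

def residuals(xs, ys):
    """What remains of ys after removing the linear effect of xs."""
    b = slope(xs, ys)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return [y - (my + b * (x - mx)) for x, y in zip(xs, ys)]

x1 = [0, 1, 2, 3, 4, 5, 6, 7]
x2 = [0.1, 0.9, 2.2, 2.8, 4.1, 5.0, 5.9, 7.2]  # nearly identical to x1
y = [a + b for a, b in zip(x1, x2)]            # equal contribution from each

first = slope(x1, y)                  # x1 entered first
second = slope(x2, residuals(x1, y))  # x2 gets only the leftovers

print(f"x1 entered first: x1 slope {first:.2f}, x2 slope {second:.2f}")
```

With x1 entered first, its slope comes out near 2.0 (claiming credit for both factors) while x2's slope on the residuals is close to zero; reversing the order of entry reverses the attribution. The true contribution of each factor is a slope of 1.0.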
Epidemiologists are of course aware of this, and one can test whether order of analysis impacts results or adopt more sophisticated and robust models which attempt to account for interrelated risk factors. While this can help a great deal in refining analysis and avoiding errors, one is still stuck with the problem noted at the beginning of this section- any model assumes a mathematical relationship not only between risk factors and outcomes, but also among related risk factors. One can readily be led astray when models do not match reality- and in the world of exploratory epidemiology, one rarely has sufficient knowledge of which risk factors merit consideration, let alone the precise mathematical relationships among them.
Exposure measurement:
Retrospective measures of exposure are a particularly problematic issue in environmental epidemiology. Ideally, one would follow a group of individuals prospectively and quantify their exposure to chemical and other risk factors over time, awaiting the eventual outcomes of interest. While this kind of design has many benefits, it has serious problems as well. The design works well for risk factors which produce outcomes in a large fraction of the population under study over a short period of time. Individuals who drink from the Broad Street Pump either do or don’t get cholera in the next few weeks… and mostly they get sick. The design does not work well for long term questions such as environmental cancer risk (typical latency from exposure to cancer is at least a decade) or the influence of childhood exposures on adult disease (a fifty year project). By the time one has an answer in the latter case, fifty years of technological and social change may well mean the answer is only of academic interest, as our exposures and confounders look very different today than they did fifty years ago. To make matters worse, if outcomes are relatively infrequent, one would need to study immense numbers of subjects to find an answer. We solve this problem by working “retrospectively”- we look backwards in time to assess the exposures of individuals who already have the disease outcome of concern against the exposure history of those who do not have the disease.
There are few circumstances in which one can reliably establish the degree and timing of exposure retrospectively. Most toxic materials (contrary to popular belief) are not “stored” long term in body fat or elsewhere. Even for those substances with long persistence in the body, one cannot know from chemical analysis alone whether the currently measured levels reflect a very large exposure many years ago, a much smaller exposure yesterday, an ongoing exposure over time, or any of a myriad of exposure scenarios. Thus, we are most often left with a recall questionnaire or some type of classification-based exposure estimate based upon job title (welder), location (proximity to a nuclear event), etc. In a few classification-based schemes (mainly occupational) we at least have some type of environmental or individual exposure measurements to help define actual exposures, but in many cases no data exist to define actual levels of exposure, and the degree of homogeneity within the classes is unknown. Further risk of error comes about when affected cases or even controls are now deceased, leaving family members attempting to recall exposure data.
Recall of historical exposures to chemicals or other factors going back 20 or more years is an interesting challenge. At most, one can gather information on occupational activity or product use- not actual quantified exposure. This makes it necessary to create some kind of surrogate for exposure, and these vary from the exceedingly crude (ever used vs. never used) to complex metrics based on recalled time and frequency of use as well as handling practices and protective equipment use. An interesting “experiment” in this area has come about over the past decade as a result of two different studies of farmers and their families. The Agricultural Health Study (AHS) is being conducted by the National Cancer Institute and is a large exploratory (hypothesis generation) study of multiple pesticides (and other factors) and multiple health outcomes (everything from cancer to headaches and wheezing); its exposure assessment was questionnaire-based. In a similar time frame, and to address concerns with the design of the AHS, industry developed the Farm Family Exposure Study (FFES) in conjunction with academic investigators and with US and Canadian regulators. The FFES looked at actual pesticide applications and measured pesticide (or metabolite) levels in urine over three days following exposure. The existence of these two studies allowed the direct comparison of exposure estimations in the AHS recall-based model to actual measured exposures of farmers, farm wives, and farm children. This comparison revealed a number of issues suggesting that the AHS results require close scrutiny. The AHS model proved to be a relatively poor predictor of actual measured exposure for the four compared chemicals, and in the case of one chemical and form (chlorpyrifos granular), appeared to be inversely related to exposure.
In the case of another chemical (glyphosate) the majority of farmers and nearly all of the wives and children had undetectable exposures despite a high level of sensitivity in the assay system. For two of the four chemicals, levels increased following application in farmers, but levels in wives and children, while routinely present, were comparable to those in the general population- suggesting the exposure to the pesticide (or its metabolite) may mainly be arising via the general diet. Today’s farmers mostly shop at the supermarket for dinner, just like the rest of us.
Why is this so critical? For epidemiology to work, you need to be comparing populations with different levels of exposure. The FFES raises serious questions not just about how accurate exposure measures may be- but about whether, especially for wives and children, there are any meaningful differences in exposure among the study population and whether studies comparing “exposed” farm family members (non-applicators) to the “non-exposed” general population necessarily have any meaning at all.
In the end, any recall-based exposure metric depends heavily on the assumption that use translates into meaningful exposures- and this is clearly not always the case, particularly for individuals with purported “indirect” exposure such as farm wives and children.
Never-Ever Land:
A related issue is the comparison of “ever used” vs. “never used” for particular products. There are several problems with the use of this distinction. The severity of the problems depends a great deal on the specifics of product use. If a product is routinely and repetitively used by the “users,” such that “ever used” generally predicts much greater use (and consequent exposure potential) than “never used,” this may be a measure with some validity. If, however, we are talking about intermittently and irregularly used products, “ever” users may easily be dominated by individuals who have only one or a very limited number of uses, and may in fact have very limited exposure relative to other substances of concern or, for that matter, to the same substance via different mechanisms of exposure. For example, knowing that a particular pesticide “X” appears in the diet, what does it mean to be an “ever” user? If this means you used the pesticide once in your life, and you are getting pesticide “X” in the diet every day (albeit at what may be much lower levels)- is the “ever” user really more exposed, overall, than the never user? On the flip side, what does the “never” category really mean? We have all been told that “ever” getting sunburn greatly increases cancer risk versus “never” getting a sunburn- but who exactly are we selecting when we pick someone who manages to never get a sunburn? Likely these are people whose overall sun exposure tends to be- for whatever reason- very small.
Chemical detection and disease:
For better or worse, chemists just keep getting better and better at measuring lower and lower levels of things in the environment or in the body. More and more studies are not reporting levels at all as a primary outcome- they just report detectability in some percentage of the population or, in a few cases, report a number of positive detections over a period of time. The problem here is that toxic materials generally have a level below which no effect can be observed- either because there is no biologically ascertainable response or because risk of disease (such as cancer) is exceedingly low. There is simply no guarantee that the detect/non-detect threshold has any meaningful relationship to dose response. What if the “detect” threshold is set 10,000 times lower than the no-response level in animals? The “detect” population has levels which are toxicologically without importance, and the non-detect population has levels which are…. well…. undetectable (often assumed to be half the lower limit of detection… but the reality is simply unknown). Under these circumstances no dose-response assessment is possible, and the plausibility of any detectable effect is exceedingly low.
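The detect/non-detect problem is easy to illustrate numerically. The units and values below are entirely hypothetical; the point is only that a detection limit set far below the no-effect level splits the population into two groups whose exposures are all toxicologically trivial:

```python
# Hypothetical sketch: a detection limit 10,000x below the no-effect level
# creates "detect" and "non-detect" groups that are both far below any
# toxicologically meaningful dose.

noael = 100.0                        # no-observed-adverse-effect level (hypothetical units)
detection_limit = noael / 10_000     # chemists can measure 10,000x below it -> 0.01

doses = [0.0005, 0.002, 0.008, 0.03, 0.3, 0.9]  # hypothetical body-burden values

detect = [d for d in doses if d >= detection_limit]
non_detect = [d for d in doses if d < detection_limit]

print(f"Detection limit:          {detection_limit}")
print(f"'Detect' group doses:     {detect}")       # [0.03, 0.3, 0.9]
print(f"'Non-detect' group doses: {non_detect}")   # [0.0005, 0.002, 0.008]

# Every dose, detected or not, is more than 100-fold below the NOAEL,
# so the detect/non-detect split says nothing about dose-response.
print(f"Highest dose as fraction of NOAEL: {max(doses) / noael:.4f}")
```

The dichotomy is an artifact of analytical chemistry, not of toxicology- comparing "detect" to "non-detect" here compares two groups that are equally far below any level of plausible concern.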
The real problem we have with “detections” is simply that we have applied better and better chemical technology to human samples with greater and greater regularity. We have not regulated chemical exposures to zero- but rather to levels intended to confer negligible risk, and thus have no basis for believing that the levels of chemicals in the body will be zero. It is only through a lack of ability to look that we have seen nothing in the past. It is a rude awakening when the public discovers that we can, in fact, detect a myriad of materials in our “formerly pristine” bodies at very low concentrations.