The Yale DNA Hybridization Scandal: Introduction

Charles Sibley and his student, Jon Ahlquist, were interested in avian molecular systematics. Sibley had been a prominent advocate of protein electrophoresis as a phylogenetic tool in the 1960s, but apparently had some difficulty in recognizing boundaries. In the early 1970s, he scandalized Yale University by being led out in handcuffs from his post as director of the Peabody Museum of Natural History, due to his apparent involvement in plots to smuggle the eggs of endangered bird species out of their home countries and into his laboratory. This story is recounted in, of all places, Sports Illustrated (24 June 1974) -- it was apparently of greater concern to falconers than to molecular evolutionists.

Sibley paid a fine and continued his avian systematic research. In the 1980s he and Ahlquist adopted a technique for comparing the DNA of different species to find out just how genetically different they are. DNA hybridization is based on the idea that evolution represents the accumulation of DNA point mutations in different bio-historical lineages. If those mutations could somehow be summed and counted, one could tell just how much genetic change has accumulated since the species diverged from their common ancestor.

DNA hybridization

The technique had been developed by biochemist Roy Britten in the 1960s as a way of analyzing the composition of the genome. Some fractions of DNA are highly redundant and non-functional; others are unique in their sequences and functional. Assuming (wrongly and crucially, as it turns out) that these fractions are distinct from one another, DNA hybridization begins by discarding the repetitive DNA (about half the genome) and retaining the unique-sequence, genic DNA.

DNA is a two-stranded molecule, held together by hydrogen bonds down its center. Heating a solution of DNA adds energy and breaks these bonds, making the DNA single-stranded. This is known as denaturing (or metaphorically, "melting") the DNA, and usually takes place around 84°C. The single-stranded state is, however, generally unstable, and the DNA will spontaneously re-form a double helix if permitted to cool slowly.

Flooding the mixture with DNA from a different species, however, ensures that the DNA will not be able to pair with its own perfect partner, but will instead be forced to bond with DNA from the other species. This DNA is a hybrid, or heteroduplex, composed of one strand from each of two species. If the original DNA was labeled radioactively, then you now have hybrid DNA, one strand of which can be traced. (The labeled DNA is called the tracer, and the unlabeled excess DNA of the second species is called the driver.)

This hybrid DNA, however, contains fewer hydrogen bonds holding its strands together, since the two species from which it is composed are genetically different. That means that it will require less heat to become denatured, and "melt" at a lower temperature. The trick is to measure this difference in thermal stability between the heteroduplex and the homoduplex DNA. This will tell you (hopefully) how different the genomes of the two species are, by telling you how different their DNAs are, by telling you how poorly bonded they are, by telling you how much lower the temperature required to denature their hybrid DNA is.
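The single number compared in these experiments is abstracted from a melting profile: a series of readings of how much of the duplex DNA has dissociated as the temperature is raised. As a minimal sketch, assuming the profile is given as ascending temperatures paired with the cumulative fraction melted, one common summary is the temperature at which half the DNA has dissociated, found here by linear interpolation. The function name, the 50% threshold and the interpolation scheme are illustrative assumptions, not a description of Sibley and Ahlquist's actual procedure.

```python
from typing import Sequence


def melting_temperature(temps: Sequence[float], fraction_melted: Sequence[float],
                        threshold: float = 0.5) -> float:
    """Temperature at which the cumulative fraction melted first reaches `threshold`.

    Assumes `temps` is sorted ascending and `fraction_melted[i]` is the fraction
    of duplex DNA dissociated by `temps[i]` (0.0 to 1.0). Interpolates linearly
    between the two readings that bracket the threshold.
    """
    if len(temps) != len(fraction_melted) or len(temps) < 2:
        raise ValueError("need at least two paired temperature/fraction readings")
    readings = list(zip(temps, fraction_melted))
    for (t0, f0), (t1, f1) in zip(readings, readings[1:]):
        # Skip flat segments; use the first rising segment that spans the threshold.
        if f1 > f0 and f0 <= threshold <= f1:
            return t0 + (threshold - f0) * (t1 - t0) / (f1 - f0)
    raise ValueError("profile never reaches the threshold")
```

For a hypothetical profile in which 40% of the DNA has melted by 80°C and 80% by 90°C, `melting_temperature([60, 70, 80, 90, 100], [0.0, 0.1, 0.4, 0.8, 1.0])` interpolates to about 82.5°C. Reducing a whole curve to one number like this is exactly the step that discards the shape of the profile.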

The informative statistic is delta-T: the temperature at which a heteroduplex denatures, subtracted from the temperature at which the homoduplex denatures. That temperature reduction measures the loss of thermal stability that results from the strands being poorly bonded. A small delta-T indicates closely related species with very similar DNA; a large value indicates a more distant relationship. And on the assumption that one degree of difference equals one percent divergence of DNA, the delta-T becomes an estimate of gross genetic difference as well.
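The arithmetic of delta-T and its conversion to a divergence estimate fits in a few lines. This sketch assumes melting temperatures in degrees Celsius and builds in the one-degree-equals-one-percent rule as an adjustable default; the function names are hypothetical.

```python
def delta_t(homoduplex_tm: float, heteroduplex_tm: float) -> float:
    """Delta-T in deg C: homoduplex melting temperature minus heteroduplex melting temperature."""
    return homoduplex_tm - heteroduplex_tm


def percent_divergence(dt: float, percent_per_degree: float = 1.0) -> float:
    """Estimated % DNA divergence from a delta-T.

    The default of 1.0 encodes the assumption that 1 deg C of delta-T
    corresponds to about 1% divergence; it is an assumption, not a measured constant.
    """
    return dt * percent_per_degree
```

With hypothetical melting temperatures of 86.0°C for the homoduplex and 84.2°C for a heteroduplex, `delta_t(86.0, 84.2)` gives about 1.8, which the default conversion reads as roughly 1.8% divergence. Keeping the conversion factor as a parameter leaves the calibration assumption visible rather than buried in the arithmetic.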

Sibley and Ahlquist automated the procedure and applied it to bird phylogeny. They ran experiments in batches of 25: the first of each batch was a homoduplex control (a species' DNA hybridized to itself), and the other 24 were that species' DNA hybridized to DNA from other species. Because different DNA preparations may melt at slightly different temperatures depending on how they were prepared, each hybrid in a batch is subtracted specifically from the homoduplex control run in that same batch.
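The batch design can be sketched as follows, assuming each batch arrives as a list of (label, melting temperature) pairs with the homoduplex control first. What the sketch illustrates is that every heteroduplex is subtracted from its own batch's control, never from a control run in another batch. Names and data layout are hypothetical.

```python
from typing import Dict, List, Tuple

BATCH_SIZE = 25  # 1 homoduplex control + 24 heteroduplexes per batch


def batch_delta_ts(batch: List[Tuple[str, float]]) -> Dict[str, float]:
    """Delta-T for each heteroduplex, relative to this batch's own control.

    `batch` holds (label, melting temperature in deg C) pairs; the first pair is
    assumed to be the homoduplex control run alongside the other 24.
    """
    if len(batch) != BATCH_SIZE:
        raise ValueError(f"expected {BATCH_SIZE} hybridizations, got {len(batch)}")
    _, control_tm = batch[0]
    # Subtract each hybrid from the control it was run with, never another batch's.
    return {label: control_tm - tm for label, tm in batch[1:]}


def all_delta_ts(batches: List[List[Tuple[str, float]]]) -> List[Dict[str, float]]:
    """Process several batches independently, one control per batch."""
    return [batch_delta_ts(b) for b in batches]
```

Labels must be unique within a batch, since they become dictionary keys. Two batches whose controls melt at different temperatures can still yield the same delta-T for the same pairing, which is the point of the per-batch subtraction.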

In a series of papers in avian biology journals, Sibley and Ahlquist claimed that their technique was genetic, precise, replicable, and quantitative. And it worked at any level, with birds very similar or very different. It therefore superseded all other inferences about the phylogeny of birds. Thus,

If we are ever to reconstruct phylogeny, it must be done with methods that do not rely on the human eye as the instrument of comparison. It is self-deluding to assume that what we see is all we need to know to reconstruct phylogenies. Such a procedure is subjective and qualitative, and results only in an opinion -- and this is a definition of Art, not of Science. For a method to qualify as scientific, it must be objective and quantitative.

-- Sibley and Ahlquist (1987), in Molecules and Morphology in Evolution: Conflict or Compromise, edited by C. Patterson, p. 118 (emphasis in original).

DNA hybridization is more objective and quantitative than any known method for analyzing morphological characters.

-- Sibley, Ahlquist and Sheldon (1987), in Evolutionary Biology, vol. 21, edited by Hecht, Wallace and Prance, p. 121.

The Ape work

With tough talk like this, they naturally put off a number of other scientists. But the quarrel was largely restricted to a small audience of academic ornithologists. Sibley and Ahlquist's work on bird phylogeny was poorly understood, and not even terribly interesting, until David Pilbeam (then at Yale, now at Harvard) persuaded them to approach the problem of human-chimp-gorilla phylogeny. Their 1984 claim in the Journal of Molecular Evolution was that they had "resolved the trichotomy" in favor of a human-chimp grouping. In other words, they had found humans and chimpanzees to be especially close relatives genetically, rather than both being equally close to the gorilla and thereby constituting a three-way evolutionary split, or "trichotomy". They reiterated the claim in a second paper in the Journal of Molecular Evolution in 1987.

This received a great deal of publicity (Roger Lewin in Science, Jared Diamond in Nature, Stephen Jay Gould in Natural History).

Again, the claim was simple. They had a reliable genetic technique for assessing the overall genetic similarity of species. Their results were genetic, precise, and highly repeatable. Their central claim was that they had done the experiments over and over, and that human and chimp were consistently 1.6 or 1.8% different (if a one-degree temperature difference represents a one-percent difference in DNA), while chimp and gorilla were 2.3% different. The two sets of values were statistically significantly different by a simple t-test.
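A "simple t-test" on two sets of delta-T values can be sketched with Python's standard library. This version computes Student's two-sample t statistic with pooled variance; whether Sibley and Ahlquist used this exact form is not stated here, so treat it as an assumption. No measurements of theirs are reproduced; any inputs are the reader's own.

```python
import math
import statistics
from typing import Sequence


def pooled_t_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Student's two-sample t statistic, assuming equal variances in both groups.

    Compare the result against a t distribution with len(a) + len(b) - 2
    degrees of freedom to judge significance.
    """
    na, nb = len(a), len(b)
    if na < 2 or nb < 2:
        raise ValueError("each group needs at least two values")
    # statistics.variance uses the n - 1 (sample) denominator.
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    std_err = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / std_err
```

SciPy's `scipy.stats.ttest_ind` with `equal_var=True` computes the same statistic together with a p-value. Either way, the test can only show that two lists of numbers differ; it cannot show that the numbers were produced the way the methods section says.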

Unfortunately, few could evaluate it. I was a post-doc working in "molecular anthropology" in the genetics department at UC-Davis, and asked Carl Schmid, one of very few people who knew the method well, to help me evaluate the paper. Schmid explained to me that there were many sources of error inherent in the method, and many controls to run, and that the Sibley-Ahlquist paper showed nothing that would permit a critical reader to decide whether they had in fact done what they claimed. With some of the raw data, we could see whether their analysis was detecting artifactual differences, for example. We contacted Sibley, asking to see some raw data, and got nowhere, as (we learned) had several other scientists.

Schmid (an associate editor of J. Mol. Evol., which had published Sibley's now-famous work) informed Emile Zuckerkandl, the journal's editor, and asked him to intervene and compel Sibley to release the data (i.e., actual melting profiles of the DNA, rather than simply the single numbers abstracted from them). Zuckerkandl supported Sibley, and refused to make him document the work he had published in the journal.

Schmid and I then wrote a "cautionary" manuscript for the anthropological community, noting that there were many possible sources of error in this crude technique, that it seemed unlikely that the technique as described could resolve such fine-scale branchings, and that the work was undocumented, so that skepticism was called for in judging it. We sent it to the Journal of Human Evolution (of which I was on the editorial board), and review was coordinated by Maryellen Ruvolo of Harvard (a supporter of Sibley's, and wife of David Pilbeam, who had suggested applying the technique to the primates in the first place). She received negative reviews from Roy Britten (Sibley's friend and inventor of the technique) and from Jon Ahlquist (!). Eric Delson, editor of the journal, was not satisfied with the propriety of the reviews, and put me in touch with Britten. Britten felt the manuscript should not be published because it was critical. But when I asked him whether anthropologists had the right to know that a widely cited work was undocumentable, something he would certainly not stand for in his own field of physical chemistry, he agreed that they did, yet still appeared unswayed.

What the data showed

Unbeknownst to any of us, Britten actually had some of Sibley's data, which he had been given prior to a talk on Sibley's behalf in a 1985 symposium on DNA hybridization at the American Museum of Natural History (Ruvolo also spoke on Sibley's behalf). It was this set of data that he asked Sibley for permission to send me, and that I received out of the blue in December 1987. I copied the data and passed them on to Schmid, and to Vince Sarich, with whom I had previously discussed the matter.

The problems became apparent quickly. There was a far wider scatter of values than Sibley and Ahlquist had published, even though this was just 1/8 of their data, and there was no human-chimp link. For example, Sibley and Ahlquist had published that the human-chimp experiments invariably yielded delta-T values between 1.2 and 2.3. But in our small sample we had experiments where the delta-T for human-chimp was calculable as -0.2 and 2.6, well outside the range they had reported. This also raised the question of how Britten could possibly have failed to notice it -- unless he himself had never looked at the data very carefully. Shortly thereafter, when they published their 1987 paper with a table listing experiment numbers alongside values, it became clear that about 60% of the numbers matched our calculations, and about 40% did not. Our calculation of the values from the data was clearly missing something, although we had done it as they described, and it was perfect over half the time. Obviously something significant was missing from their Materials and Methods. More importantly, the numbers we had in no way resolved the trichotomy. We wrote to the authors, but received no reply for several weeks.
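The kind of check described here, recomputing values from raw data and lining them up against a published table by experiment number, can be sketched as follows. Experiment identifiers, the matching tolerance, and the data structures are all hypothetical assumptions; only the 1.2 to 2.3 range in the usage note comes from the text.

```python
from typing import Dict, List, Tuple


def compare_to_published(recomputed: Dict[str, float], published: Dict[str, float],
                         tolerance: float = 0.05) -> Tuple[float, List[str]]:
    """Return (fraction of shared experiments that match, ids that don't match).

    Only experiment ids present in both tables are compared. A value "matches"
    if it agrees with the published one to within `tolerance` deg C; the 0.05
    default is a hypothetical choice, not a figure from the text.
    """
    shared = sorted(recomputed.keys() & published.keys())
    if not shared:
        raise ValueError("no experiment numbers in common")
    mismatched = [eid for eid in shared
                  if abs(recomputed[eid] - published[eid]) > tolerance]
    return 1 - len(mismatched) / len(shared), mismatched


def outside_range(values: Dict[str, float], low: float, high: float) -> List[str]:
    """Experiment ids whose delta-T falls outside the reported [low, high] range."""
    return sorted(eid for eid, v in values.items() if not low <= v <= high)
```

Given a dictionary of recomputed human-chimp values, `outside_range(recomputed, 1.2, 2.3)` would flag any experiment, such as the ones calculated at -0.2 and 2.6, that falls outside the range Sibley and Ahlquist reported.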

Meanwhile, Jared Diamond had been singing their praises again in Nature's News and Views (332:685, 1988). I wrote a letter pointing out that the work was as yet still undocumented, and that the rest of the evidence bearing on the problem of the human-chimp-gorilla divergence was highly ambiguous, which made the documentation of this conclusion crucial. Diamond responded pompously,

While Marks refers to the study of Sibley and Ahlquist as undocumented, these authors gave detailed descriptions of their methods in many earlier papers and presented their hominoid data in two lengthy papers... (Nature, 334:656, 1988)

The problem of course is that we now knew the "detailed descriptions" to be inaccurate.

We submitted manuscripts to two journals, and while they were out for review we were contacted by Roger Lewin of Science, who told us he wanted to write the story up as news. We tried to discourage him, and to get him to wait until the manuscripts were published, but we were unsuccessful; he wrote a double article at the end of 1988 (incidentally using two of our figures without our permission). We did put Lewin on to Fred Sheldon, now a researcher at the Philadelphia Academy of Sciences but earlier a fearful post-doc who had removed his name from their papers, generally an odd thing for a post-doc to do. Sibley had claimed that the data could not be reanalyzed because of the chaos during his move from Yale, but Sheldon had retained those data. At Lewin's request, Sheldon checked the numbers in the real data against those published in the 1987 paper, and found that the 40% mismatch we had discovered held for the entire bank of experiments, not just for the fraction of the data we had.

One paper went to the community of interest, the students of human evolution. Its review was coordinated directly by the editors-in-chief, and after two rounds of peer review and one round with the lawyers at Academic Press, it was published in The Journal of Human Evolution (17:769, 1988). The other paper dealt more broadly with the problems of DNA hybridization and this corpus of work, and was sent to the journal that had published Sibley and Ahlquist's two papers, The Journal of Molecular Evolution. Carl Schmid, second author, was on the editorial board of the journal, and the review was coordinated by board member Allan Wilson, a colleague of senior author Vince Sarich. After one round of favorable peer review, the paper was rewritten and re-reviewed, with Wilson's active participation. The second round of reviews was enthusiastic, and Wilson accepted the paper on the journal's behalf. A week later, in consultation with Charles Sibley and Roy Britten (also editorial board members), the editor-in-chief, Emile Zuckerkandl, un-accepted the paper. It was ultimately published in Cladistics (5:3, 1989).

Sibley had by now been elected to the National Academy of Sciences. Sibley and Ahlquist's last paper on the subject was published in The Journal of Molecular Evolution (30:202-236, 1990), and acknowledges that they had applied three "correction procedures" not mentioned in any of their previous publications. Most significantly, they concede that had it not been for these unreported alterations, "... it is virtually certain that Sibley and Ahlquist would have concluded that Homo, Pan and Gorilla form a trichotomy" (p. 225).

Thus we clearly have a case in which the conclusions were predicated on unreported operations, and in which the papers were therefore reviewed and published under false pretenses. Perhaps the alterations were legitimate; even so, they needed to be judged by reviewers.

What was their nature? Were the unreported alterations performed innocently, then accidentally omitted from the write-ups, and then also accidentally concealed by the authors' refusal to release their data? What do the data really show about the genetic relations of the apes?