Peer Review: An Iron Law of Disciplines?
Thomas J. Scheff
Advancement of knowledge is closely tied to the effectiveness of the peer review system (PRS). This system serves as gatekeeper to publication and funding, the twin bases of science and scholarship. It must expeditiously discriminate between likely contributions to knowledge, false positives (such as cold fusion), and false negatives (novel contributions that challenge the knowledge status quo).
Many instances of both kinds of errors are reviewed in Broad and Wade (1982) and reported in Gilbert and Mulkay (1984). This essay will focus on false negatives. How are we to understand their occurrence, and is there any way journals could decrease their number?
The broadest study of false negatives is that of Barber (1990), who compiled a list of advances (by Helmholtz, Planck, Maxwell, Lister, and others) that were delayed or rejected for many years, usually for reasons that had virtually nothing to do with the real issues. A particularly egregious example is the case of Mendel, whose studies of genetic inheritance were rejected on the grounds that they were too mathematical.
Recent studies of the rates of occurrence of false negatives by journals have been reported by Campanario (1993; 1995; 1996). One of these studies (1996) can serve to illustrate his approach and findings. He collected commentaries by 205 authors on their articles, each among the most cited innovative articles in the journal in which it was published. Twenty-two commentaries (10.7%) mention difficulties or resistance in carrying out or publishing the work.
Social scientists have long believed in an "iron law of oligarchy," a tendency in complex organizations for the leaders to have interests in the organization itself rather than in its official aims (Michels 1915). Michels and subsequent social scientists, however, refer to a putative conflict of interest between leaders and those led in political and labor organizations. A somewhat different law may apply to scientific and scholarly disciplines: a convergence of the interests of leaders and followers in promoting the discipline establishment itself, in addition to, or instead of, its official aims.
What might be called the iron law of disciplines is suggested by the work of Ludwik Fleck (1935), a distinguished microbiologist and philosopher of science. He hypothesized a tendency toward extreme conservatism in science establishments. That is, there may be as much interest in preserving the knowledge status quo as in advancing knowledge, or more. Fleck's work seems to have been the basic source of Kuhn's (1962) ideas about change and rigidity in disciplinary paradigms.
Kuhn's idea of "paradigm hold," particularly, restates Fleck's hypothesis that "thought collectives" (disciplines, sub-disciplines, and schools of thought) become fixated on one theory, method, and/or body of beliefs, at the expense of novel challenges. A dramatic illustration is the case of Georg Cantor, the creator of set theory. During his lifetime, his work was banned from publication in Germany by other German mathematicians, who saw to it that he received no job offers, or even advancement in the small rural university (Halle) where he taught.
Cantor's opponents were led by Kronecker, Professor of Mathematics at the University of Berlin. Having read some of Cantor's papers on set theory, Kronecker told Cantor that he wasn't sure what set theory was; it might be philosophy or theology. But he was certain of what it was not: it was not mathematics.
Boltzmann's brilliant theory of heat transport in metals met a similar reception at the hands of contemporary physicists: it didn't fit their paradigm. As Fleck pointed out, the rejection of material outside the paradigm is highly emotional, so intense as to be a taboo.
Being a social psychologist, I am familiar with this response. The most frequent reason given by sociology editors for rejecting my papers has been that they are not sociological. I have had identical papers rejected by psychology journals on the grounds that they are too sociological. The most recent case, finally published in a psychiatry journal, was rejected by two sociology journals even though it concerns social components in depression (Scheff 2001). The taboo in sociology against psychology results in rejection of work that links the two disciplines.
The core taboo in psychology, however, is methodological: against all methods other than experiments. As Feynman (1985) pointed out, the findings from most psychological experiments are essentially meaningless: they have the form of science, but lack its spirit. The taboos in sociology and in psychology support Fleck's hypothesis.
It may be important to examine the way in which the hypothesis functions in fine detail. So far I have found two current cases that are fully documented. I would like to hear about other cases as well, whether they support or contradict the hypothesis (see below).
The first case, a dispute over von Frisch's "discovery" of a bee language, is summarized by Wenner in Case 1 in the Appendix. The second, summarized by Schonemann in Case 2, involves the controversy over Jensen's "discovery" of innate racial differences in intelligence. After 30 years, science journals are still rejecting what appear to be valid challenges to von Frisch's findings. In the case of Jensen's work, it appears that steps toward resolution may be occurring, but only after 12 years of rejection of valid challenges. The PRS in these two cases seems to have made many erroneous decisions, and has certainly not been expeditious.
These two cases are still unresolved but also suggest glimmers of hope. In the Schonemann case, the possibility of resolution may be in the offing. Although the relevant scientific journals are still rejecting what appear to be valid challenges in the Wenner case, his use of the Internet to crack the wall of silence is another glimmer.
The historical evidence suggests that there is a status quo in science and scholarship that leads to a significant number of false negatives, that is, rejections of novel contributions. If this is the case, what could contemporary journals do to reduce the number of false negatives?
This is a difficult issue. If there is an iron law, it would be difficult to overcome, like lifting oneself by one's own bootstraps. My suggestion is for the purpose of discussion only. Perhaps others can come up with better ideas than this one.
I propose that journals establish two levels of reviewers. The initial level would be composed of workers in "normal" science and scholarship, that is, those whose work falls almost entirely within current paradigms. The second level would be reserved for reviewers whose work falls, at least in part, "outside the box."
All papers would go initially to the first level of reviewers. In addition to giving their narrative review and recommendation, they would also rate the degree of difference from current paradigms of the problem, method, and results. Those papers that fell completely within current paradigms would be rated 0, those that were somewhat different, 1, and those that were very different, 2. The most extreme rating, "OTB" (outside the box), would be reserved for papers completely different from the current paradigm in problem, method, and/or results.
Only those papers that received an OTB or a high score (4?) would be sent to the second level of reviewers. In this way papers that differ markedly from normal science and scholarship would get a second chance. This method would be crude and also time-consuming, but it might lead to more rapid advances of knowledge.
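The routing rule in this proposal can be sketched in a few lines of Python. This is only an illustration for discussion, not a worked-out procedure. It assumes that the "score" is the sum of the three 0-2 ratings (problem, method, results), giving a range of 0 to 6, and it takes the tentative cutoff of 4 at face value. The names FirstLevelReview and needs_second_level are hypothetical.

```python
from dataclasses import dataclass

# Assumption: the tentative "(4?)" cutoff, applied to the sum of three 0-2 ratings.
ESCALATION_THRESHOLD = 4


@dataclass
class FirstLevelReview:
    """One first-level reviewer's paradigm-distance ratings (hypothetical structure)."""
    problem: int       # 0 = within paradigm, 1 = somewhat different, 2 = very different
    method: int
    results: int
    otb: bool = False  # reviewer marks the paper "outside the box"

    def __post_init__(self):
        # Reject ratings outside the 0/1/2 scale described in the text.
        for name in ("problem", "method", "results"):
            value = getattr(self, name)
            if value not in (0, 1, 2):
                raise ValueError(f"{name} rating must be 0, 1, or 2, got {value}")

    def total(self) -> int:
        # Assumed aggregation: simple sum over the three dimensions (range 0-6).
        return self.problem + self.method + self.results


def needs_second_level(review: FirstLevelReview,
                       threshold: int = ESCALATION_THRESHOLD) -> bool:
    """Send a paper to second-level reviewers if it is OTB or scores at or above the cutoff."""
    return review.otb or review.total() >= threshold
```

Under these assumptions, a paper rated 2, 2, and 0 would go to the second level, while one rated 1, 1, and 1 would not unless its reviewer also marked it OTB. How ratings from several first-level reviewers should be combined is left open here, as it is in the proposal itself.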
Barber, Bernard. 1990. Sociological Studies of Science. New Brunswick, NJ: Transaction Press.
Broad, William, and Nicholas Wade. 1982. Betrayers of the Truth. New York: Simon and Schuster.
Campanario, Juan Miguel. 1993. Consolation for the Scientist: Sometimes It Is Hard to Publish Papers That Are Later Highly-Cited. Social Studies of Science. 23, 342-62.
_____________ 1996. Have Referees Rejected Some of the Most Cited Articles of All Times? Journal of the American Society for Information Science. 47, 302-310.
Feynman, Richard. 1985. Cargo Cult Science. Pp. 308-317 in Surely You're Joking, Mr. Feynman! New York: Bantam.
Fleck, Ludwik. 1935. Genesis and Development of a Scientific Fact. Edited by Thaddeus J. Trenn and Robert K. Merton, foreword by Thomas S. Kuhn. Chicago: University of Chicago Press (1979).
Gilbert, G. Nigel, and Michael Mulkay. 1984. Opening Pandora's Box: A Sociological Analysis of Scientists' Discourse. Cambridge: Cambridge University Press.
Kuhn, Thomas S. 1962. The Structure of Scientific Revolutions. 3rd ed. Chicago, IL: University of Chicago Press (1996).
Michels, Robert. 1915. Political Parties: A Sociological Study of the Oligarchical Tendencies of Modern Democracy. Glencoe, Ill.: Free Press.
Scheff, Thomas. 2001. Social Components in Depression. Psychiatry. 64, # 3, 212-224.
Schonemann, Peter H. 2001. Better Never Than Late: Peer Review and the Preservation of Prejudice. Ethical Human Sciences and Services. 3: 1, 7-21.
Wenner, A.M. 1998. Honey bee "dance language" controversy. Pages 823-836 in Greenberg, C. and M. Hara (eds.), Comparative Psychology: A Handbook. New York: Garland Publishing.
Wenner, Adrian, and Patrick Wells. 1990. The Anatomy of a Controversy. New York: Columbia University Press.
Appendix. Case 1. Adrian Wenner (Professor Emeritus, Natural History, UCSB).
After WWII, bee researcher Karl von Frisch concluded that a "dance" maneuver executed by foraging honey bees constituted a "language." He argued that other bees attending the dance could gain information to fly directly to the food source. This hypothesis was endorsed by noted scientists. Von Frisch was awarded a Nobel prize in 1973 for that and other work. Frequent use of the terms "proof" and "discovery" elevated his hypothesis to the status of "fact," despite the lack of anything other than circumstantial evidence.
While conducting research on bee communication in the 1950s and 60s, and while running experiments of more rigorous design, Wenner and co-workers found flaws in both the experimental design and the interpretation of previous experiments. They found that all earlier results in support of bee language could be explained by the odor-search hypothesis that von Frisch had first published in 1937 but had not referred to thereafter. That is, recruited bees could have simply searched for odors of the food (as von Frisch and others had documented in the late 1930s and early 1940s) and not needed other information obtained from the dance (without odors, recruits cannot find the target food).
The PRS permitted publication of their results for a short time but then broke down completely. Others could challenge the work of Wenner and co-workers with or without supportive evidence, but Wenner and co-workers were no longer permitted by journals even to respond to challenges to their work. Bee researchers stopped citing their work and dropped all mention of their negative results for the next two decades.
Wenner spent that time in other research and returned only when the climate had improved: the Internet provided an open forum. As a result, bee trade journals now welcome his articles. Insect journal editors, by contrast, still give considerable weight to negative comments from language advocates who serve as peer reviewers. Under the PRS, granting agencies have provided millions to researchers who have been unable to rescue the language hypothesis. Those same agencies, under the PRS, have provided no funds to those who would conduct experiments that might yield results counter to the language hypothesis. The bee language controversy thus persists. (Wenner 1998; Wenner and Wells 1990)
Case 2. Peter Schonemann (Professor Emeritus, Psychology, Purdue)
His recent article documents some difficulties authors face when they challenge faulty research claims published in mainstream literature. Editors of "reputable journals" may react with stonewalling tactics that tend to enshrine those faulty results. A case in point is the mental test literature, which has long been beset with racist myths. In 1985, Arthur Jensen added a new myth, his "Spearman Hypothesis," which asserts that a positive correlation between White/Black differences in scores on mental tests and the loadings of the first principal component confirms the existence of a general intelligence factor ("g") that is innate. It can be shown by mathematical and geometric deduction, by computer simulation, and by reference to "real data," including Jensen's own, that the assertion is unwarranted, and that the relationship Jensen observed is an artifact that has nothing to do with ethnicity or "g." Nevertheless, it proved impossible for more than 12 years to record this challenge to Jensen's claims in any of the leading journals in psychology and statistics. Typically, their editors invoked arguments having nothing to do with the fundamental question of whether Jensen's claims are true or false... (Schonemann 2001)
2113 words May 27, 2002