Goldilocks and the 1.8 Billion Citations
Limiting choices to drive drama may work for folktales, but not for rigorous studies
One of the best mental tricks I’ve ever learned is to refuse false choices, which can be false in at least two ways — false in presuming you must choose between the options rather than take both, and false in presuming the options offered are the only ones possible.
In a recent study of citation centrality in various fields published in PNAS, the authors offer two possible explanations for their observation that as a field produces more papers, certain papers remain citation magnets while others struggle to achieve the same long-term gains:
- “. . . when many papers are published within a short period of time, scholars are forced to resort to heuristics to make continued sense of the field. . . . authors are pushed to frame their work firmly in relationship to well-known papers, which serve as ‘intellectual badges’ identifying how the new work is to be understood, and discouraged from working on too-novel ideas that cannot be easily related to existing canon.”
- “. . . if the arrival rate of new ideas is too fast, competition among new ideas may prevent any of the new ideas from becoming known and accepted field wide.”
Their concern is that this all slows progress in large fields of science.
The study examined 1.8 billion citations among 90 million papers across 241 subjects. The authors spend quite a bit of the paper complaining about the quantitative nature of scientific evaluation that has become dominant, and then indulge in that very thing, counting each citation as an equivalent unit, when it’s well-known that some citations are de rigueur in a field — you cite a foundational paper whether you intend to challenge it or build upon it. This centrality issue could explain much of what they’ve found, but because they never looked for qualitative hints about why each citation was made, counting alone could give them no insight into whether a citation was affirming a paper or challenging its premise.
The authors do nothing to explore the addressable history of these fields — how each has evolved over time — opting instead to use citation counts as a proxy for their pet theories. When you stop to think about how many fields emerged — often around conferences first, with journals founded after a successful conference confers legitimacy and creates a sense of shared community and purpose — their approach seems ahistorical and convenient.
But with the contradiction of the two options on offer — the field may be anchored by too much intellectual ballast to move swiftly, or the field may be moving too swiftly for anything to stick — it seems we have a Three Bears problem. On the premise offered, science is either too hot or too cold, and these Goldilocks want it just right.
Fields are founded when there is a corpus of new knowledge, often based on breakthroughs that prove potent enough to cause a branching event in current disciplines. These early papers — from the first papers about DNA sequencing for genetics, for instance, or the first papers covering radio telescopes in astrophysics — are often cited as part of what I would call “throat-clearing” when other scientists in these fields are producing papers. They have to be cited. They are table stakes, to use a gambling term — the ante at the poker table. They signal to the community that the author is one of them and comprehends the community’s intellectual lineage.
From this citation action, however, it’s unclear whether a paper will challenge or extend the foundational works (or both), and nothing in this study looks at that possibility, despite the authors living in the era of machine learning. A student with some AWS credits could do more than these authors did to understand intent around citations.
It may be that most science is incremental, meaning that, like footprints on the beach, most papers fade as the field moves forward. This would explain the observation that “. . . the most-cited papers maintain their number of citations year over year when fields are large, while all other papers’ citation counts decay.” When ground-breaking works create a strong foundation and most other scientists build on those breakthroughs, this is exactly what we should expect.
This is why scientific pioneers are celebrated, from Archimedes onward — their works are tectonic, not merely architectural. They broke new ground, rather than just erecting nice new storefronts or structures on said ground.
So, the choices offered — that authors are forced to put their papers in historical citation context, and that ideas may come so fast they don’t stick — are not mutually exclusive. Nor are they, as the authors contend, “troubling.” In fact, this may be perfectly normal and acceptable. Which brings us to a third option the authors didn’t consider — that they found nothing worth the drama they tried to introduce through their chosen title, “Slowed canonical progress in large fields of science,” or their efforts to stir up concern. The third item might be:
- “. . . and this is normal and fine.”
After all, Goldilocks was kind of a dunce — if one porridge is too hot and the other too cold, just mix them together, and they should be just right.
Also, it’s worth noting that Goldilocks didn’t have a sack of mealy grain nearby labeled “preprints.” With all the handwringing in the paper about the volume of papers swamping scholars, the lack of discussion about this elephant in the room stands out.
On a more serious note, the authors fail to take into account one major possible confounder — the effect of search engines and news coverage on discovery. Perhaps these over-index on well-cited studies and/or the latest hot tamales, which might explain a lot.
But a paper based on the idea that scientific fields can be grounded in basic discoveries with rapid elaborations that build incrementally at higher and higher rates of speed as the knowledge propagates wouldn’t be very publishable, and wouldn’t drive your citation count — now, would it?