THE SCALE PROBLEM
Top journals and conferences receive submissions far faster than human reviewers can evaluate them. Peer review runs on unpaid academics volunteering hours per paper; the entire system is held together by professional norms, not contracts.
WHY CITATIONS ARE THE WEAK POINT
Reviewers check arguments and methods; they rarely click every reference. A plausible-looking citation — correct journal name, plausible authors, plausible year — almost never gets verified unless the claim it supports is contested.
WHY LLMS HALLUCINATE CITATIONS
Language models generate text token by token based on statistical likelihood. A reference that looks like a real reference — author names that pattern-match the field, a journal that publishes on the topic, a plausible year and page range — is exactly what the model is optimized to produce, whether or not the paper exists.
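The mechanism can be illustrated with a deliberately tiny sketch. It is not how a production LLM works: it assumes a bigram model over whole reference fields, trained on three made-up references, and every name in it (`CORPUS`, `train_bigrams`, `generate`, `probability`, the authors and journals) is hypothetical. It only shows that a model which learns which piece tends to follow which can give real probability to a reference that was never in its training data.

```python
import random
from collections import Counter, defaultdict

START, END = "<s>", "</s>"

# Hypothetical toy "training set": each reference is a list of field tokens.
# None of these authors, journals, or page ranges refer to real papers.
CORPUS = [
    ["Lee", "2019", "Toy Journal of Medicine", "12-19"],
    ["Lee", "2021", "Toy Journal of Medicine", "40-48"],
    ["Kim", "2021", "Toy Review of Methods", "40-48"],
]

def train_bigrams(corpus):
    """Count how often each token follows each other token."""
    counts = defaultdict(Counter)
    for ref in corpus:
        tokens = [START, *ref, END]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts

def generate(counts, rng):
    """Sample a reference one token at a time, weighted by observed counts."""
    token, out = START, []
    while True:
        options = counts[token]
        token = rng.choices(list(options), weights=list(options.values()))[0]
        if token == END:
            return out
        out.append(token)

def probability(counts, ref):
    """Probability the model assigns to a full reference."""
    tokens = [START, *ref, END]
    p = 1.0
    for prev, nxt in zip(tokens, tokens[1:]):
        total = sum(counts[prev].values())
        p *= counts[prev][nxt] / total if total else 0.0
    return p

COUNTS = train_bigrams(CORPUS)

# A recombination the model never saw: a real author token, a real year token,
# a real journal token, and a real page-range token, in a new combination.
fake = ["Kim", "2021", "Toy Journal of Medicine", "40-48"]

if __name__ == "__main__":
    print("sampled:", generate(COUNTS, random.Random(0)))
    print("fake in corpus:", fake in CORPUS)
    print("model probability of fake:", probability(COUNTS, fake))
```

In this toy setup the never-seen reference gets probability 1/3 × 1 × 1/2 × 1/2 × 1 = 1/12, the same order of magnitude as the genuine entries. Nothing in the sampling loop consults a record of which references exist; it only follows transitions that looked likely in training.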
THE RETRACTION PIPELINE
Retraction Watch and PubPeer became the de facto immune system of science over the past decade, run largely by volunteers and a handful of dogged librarians. Formal journal processes can take 18+ months from flag to retraction; the informal infrastructure often flags problems within days.
THE INCENTIVE STRUCTURE
Academic careers run on publication count and citation count. Both metrics reward volume; neither directly penalizes poor citation quality. Paper mills, businesses that sell authorship slots on fabricated papers, have operated for years; LLMs simply lowered the production cost toward zero.
WHY MEDICAL JOURNALS MATTER MOST
A fabricated citation in a literature review can propagate into clinical guidelines, meta-analyses, and ultimately treatment decisions. Unlike in most fields, the downstream cost of a polluted evidence base in medicine is measured in patient outcomes, not just wasted research time.