WHY MODELS MEMORIZE
Large language models are trained to predict the next token across billions of documents. Strings that appear even a few times — phone numbers, addresses, API keys — can be encoded directly into the weights as high-probability completions for nearby context. Memorization is not a bug; it is the same mechanism as learning.
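As a toy illustration, the sketch below uses a count-based trigram model rather than a neural network. It shows how a string seen only twice becomes the top completion for its surrounding context. The corpus, the name, the fictional number 555-0147, and the helper names (`train`, `next_token_probs`, `complete`) are all invented for this example.

```python
from collections import Counter, defaultdict

# Invented toy corpus: the fictional number 555-0147 appears in two documents.
CORPUS = [
    "the weather is nice today",
    "to reach Jane Doe call 555-0147 after six",
    "the meeting is at noon",
    "to reach Jane Doe call 555-0147 after six",
]

def train(corpus, context=2):
    """Count which token follows each window of `context` preceding tokens."""
    counts = defaultdict(Counter)
    for doc in corpus:
        toks = doc.split()
        for i in range(context, len(toks)):
            counts[tuple(toks[i - context:i])][toks[i]] += 1
    return counts

def next_token_probs(counts, prefix, context=2):
    """Turn the counts for the last `context` tokens of `prefix` into probabilities."""
    key = tuple(prefix.split()[-context:])
    successors = counts.get(key, Counter())  # .get does not create a new entry
    total = sum(successors.values())
    return {tok: n / total for tok, n in successors.items()}

def complete(counts, prompt, context=2, max_tokens=5):
    """Greedy decoding: append the most frequent next token until the context is unseen."""
    toks = prompt.split()
    generated = []
    for _ in range(max_tokens):
        key = tuple(toks[-context:])
        if key not in counts:
            break
        nxt = counts[key].most_common(1)[0][0]
        toks.append(nxt)
        generated.append(nxt)
    return generated

model = train(CORPUS)
print(next_token_probs(model, "Jane Doe call"))   # {'555-0147': 1.0}
print(complete(model, "to reach Jane Doe call"))  # ['555-0147', 'after', 'six']
```

Two occurrences are enough for the number to be the only continuation this model knows for "Jane Doe call". A neural model generalizes far more than this lookup table, but rare, distinctive strings in a fixed context can behave in a similar way.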
THE NO-RECALL PROBLEM
Once a phone number or similar string is baked into the weights, it cannot be surgically removed. The weights are billions of floating-point values, and each one contributes to many of the concepts the model knows at once. There is no index from 'this fact' to 'these parameters'. Exact deletion would require retraining from a scrubbed corpus, which at frontier scale costs tens of millions of dollars per run.
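A two-parameter linear fit is a drastically simplified stand-in for a neural network, but it shows the same property. A single training example's influence is spread across every parameter, so there is no single weight to zero out, and the only exact removal is to refit without it. The data points and the helper name `fit_line` are invented for illustration.

```python
def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept, in closed form."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Invented training data; (3, 10) plays the role of a record someone asks to erase.
training_data = [(0, 1), (1, 3), (2, 5), (3, 10)]
to_erase = (3, 10)

with_record = fit_line(training_data)
# The only exact "deletion" available here is refitting on a scrubbed copy.
retrained = fit_line([p for p in training_data if p != to_erase])

for name, before, after in zip(("slope", "intercept"), with_record, retrained):
    print(f"{name}: {before:.2f} -> {after:.2f}")
# slope: 2.90 -> 2.00
# intercept: 0.40 -> 1.00
```

Both parameters move, because the erased point pulled on each of them. A large network behaves the same way across billions of weights, which is why exact removal means retraining.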
WHY HALLUCINATION FEELS REAL
When asked for a business's phone number, the model does not look anything up. It generates, token by token, the most statistically plausible 10-digit string given the business name and whatever co-occurred with that name in its training data. If a real person's number appeared near the business name even once, in a scraped forum post or an old directory, that number can become the confident answer. The model has no notion of whose number is whose.
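An LLM does not store or search documents, so the following is only a caricature of that statistical pull: a document-level co-occurrence counter that answers "what is this business's number?" with whichever phone-like string most often appears alongside the name. The scraped snippets, the business, the fictional 212-555-01xx numbers, and the helper `likely_number` are invented, and the regex assumes a US-style 10-digit format.

```python
import re
from collections import Counter

# Assumed US-style format: three digits, three digits, four digits.
PHONE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")

# Invented scraped text. The only number near the restaurant's name is a server's.
SCRAPED = [
    "Luigi's Pizzeria on Main St has the best crust in town.",
    "Forum post: I wait tables at Luigi's Pizzeria, text me at 212-555-0199 about the shift swap.",
    "Luigi's Pizzeria is closed on Mondays.",
    "Directory: Bob's Hardware 212-555-0142.",
]

def likely_number(docs, name):
    """Return the phone-like string most often co-occurring with `name`, and its share."""
    counts = Counter()
    for doc in docs:
        if name in doc:
            counts.update(PHONE.findall(doc))
    if not counts:
        return None, 0.0
    number, hits = counts.most_common(1)[0]
    return number, hits / sum(counts.values())

print(likely_number(SCRAPED, "Luigi's Pizzeria"))  # ('212-555-0199', 1.0)
```

The returned number belongs to someone who posted it in a forum, yet it comes back with a 100% share because nothing else competes with it. Nothing in the counting tracks ownership.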
THE GDPR COLLISION
The EU's General Data Protection Regulation grants a 'right to erasure' (Article 17): on specified grounds, individuals can demand that an organization holding their personal data delete it. LLMs were not designed with this right in mind. Regulators in Italy, France, and Germany have opened cases examining whether training on personal data without a deletion mechanism is lawful at all.
THE SCRAPE-FIRST PRECEDENT
The major foundation models (Gemini, GPT, Claude, Llama) were trained on web crawls that swept up phone numbers, home addresses, and personal email addresses posted anywhere publicly reachable. The people whose data was collected were never asked for consent. The industry's working assumption was that anything a crawler could reach was fair game, and that assumption is now being tested in courts on three continents.
THE STRUCTURAL FIX
Two approaches are being researched: differential privacy during training (clipping each example's contribution to the gradient and adding calibrated noise, so the distribution of trained models changes only by a bounded amount whether or not any single example was included) and machine unlearning (algorithms that approximate the effect of removing data without full retraining). Neither is production-ready at frontier-model scale. For now, the harm runs faster than the remedy.
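One widely studied way to add that noise is DP-SGD (differentially private stochastic gradient descent). The sketch below shows a single DP-SGD update for a tiny linear model with squared-error loss, in plain Python. Each example's gradient is clipped to an L2 norm of at most `max_norm`, the clipped gradients are summed, Gaussian noise with standard deviation `noise_multiplier * max_norm` is added, and the result is averaged. The model, data, hyperparameter values, and helper names are illustrative assumptions, and the privacy accounting that turns these settings into a concrete (epsilon, delta) guarantee is omitted.

```python
import math
import random

def per_example_grad(w, x, y):
    """Gradient of the squared error (w . x - y)^2 for one example."""
    err = sum(wi * xi for wi, xi in zip(w, x)) - y
    return [2 * err * xi for xi in x]

def clip(g, max_norm):
    """Scale g down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(gi * gi for gi in g))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [gi * scale for gi in g]

def dp_sgd_step(w, batch, lr, max_norm, noise_multiplier, rng):
    """One DP-SGD step: clip per-example gradients, sum, add Gaussian noise, average."""
    summed = [0.0] * len(w)
    for x, y in batch:
        g = clip(per_example_grad(w, x, y), max_norm)
        summed = [s + gi for s, gi in zip(summed, g)]
    sigma = noise_multiplier * max_norm  # noise scaled to one example's maximum influence
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [wi - lr * ni / len(batch) for wi, ni in zip(w, noisy)]

if __name__ == "__main__":
    rng = random.Random(0)
    # Invented data from y = 2 + 3x; the first feature is a constant bias term.
    data = [([1.0, x], 2.0 + 3.0 * x) for x in (0.0, 0.5, 1.0, 1.5, 2.0)]
    w = [0.0, 0.0]
    for _ in range(200):
        w = dp_sgd_step(w, data, lr=0.05, max_norm=1.0, noise_multiplier=1.1, rng=rng)
    print("noisy weights:", w)
```

Clipping bounds how far any single record can move the weights, and the noise then masks whether that bounded contribution was present. Both cost accuracy, which is one commonly cited obstacle to using the technique at frontier scale.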