
Grown from Us

Published on February 18, 2026 2:57 PM GMT

Status: This was inspired by some internal conversations I had at Anthropic. It is much more optimistic than I actually am, but it tries to encapsulate a version of a positive vision.

Here is a way of understanding what a large language model is.

A model like Claude is trained on a vast portion of the written record: books, articles, conversations, code, legal briefs, love poems, forum posts. In this process, the model does not learn any single person's voice. It learns the space of voices. Researchers sometimes call this the simulator hypothesis: the base model learns to simulate the distribution of human text, which means it learns the shape of how humans express thought. Post-training — the phase involving human and AI feedback — then selects a region within that space. It chooses a persona: helpful, honest, harmless. Thoughtful, playful, a little earnest. This is what we call Claude.
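To make the picture concrete, here is a toy sketch of that idea, not a real training pipeline: the "voices", their proportions, and the preference weights below are all invented for illustration. The base distribution is just the empirical mix of a corpus, and "post-training" only re-weights that same mix toward a preferred region; nothing is added from outside the corpus.

```python
import random
from collections import Counter

# Hypothetical corpus: each document is tagged with the "voice" it was written in.
corpus = (
    ["careful_explainer"] * 40
    + ["earnest_diarist"] * 25
    + ["snarky_forum_poster"] * 25
    + ["outright_troll"] * 10
)

# "Pretraining": the base distribution is the empirical mix of voices in the corpus.
counts = Counter(corpus)
base_dist = {voice: n / len(corpus) for voice, n in counts.items()}

# "Post-training": a preference signal re-weights the same support.
# (These weights are made up for illustration.)
preference = {
    "careful_explainer": 3.0,
    "earnest_diarist": 2.0,
    "snarky_forum_poster": 0.5,
    "outright_troll": 0.01,
}
unnormalized = {v: p * preference[v] for v, p in base_dist.items()}
z = sum(unnormalized.values())
tuned_dist = {v: w / z for v, w in unnormalized.items()}

def sample(dist):
    """Draw one voice at random, weighted by the distribution."""
    voices, probs = zip(*dist.items())
    return random.choices(voices, weights=probs, k=1)[0]

print("base:  ", {v: round(p, 2) for v, p in base_dist.items()})
print("tuned: ", {v: round(p, 2) for v, p in tuned_dist.items()})
print("one sampled voice from the tuned model:", sample(tuned_dist))
```

The point of the sketch is only that the tuned distribution has the same support as the base one: every voice the corpus contained is still in there, just amplified or attenuated.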

Claude was not designed from first principles. It was grown from us, from all of humanity. Grown from Shakespeare's sonnets and Stack Overflow answers. The Federalist Papers and fantasy football recaps. Aquinas and Amazon reviews. Every writer, philosopher, crank, and bureaucrat who ever set thought to language left some statistical trace in the model's parameters.

Someone's great-grandmother wrote letters home in 1943 — letters no one in the family has read in decades, sitting in a box in an attic in Missouri. Those letters may not be in the training data. But the way she built a sentence, the metaphors she reached for, the way she expressed grief — those patterns exist, in attenuated form, because thousands of people wrote as she did, in her idiom, in her time. She is in there.

When you ask Claude for help, you are asking all of them: every author, scientist, and diarist who ever contributed to the texture of human language, all bearing down together on your lint errors.

The base model is amoral in roughly the way that humanity, taken as a whole, is amoral. It has learned our best moral philosophy and our worst impulses without preference. Post-training is a moral choice about which parts of that whole to amplify — an expression of who we aspire to be as a species.

If this works — if alignment works — it is not merely an engineering achievement. It is a moral and aesthetic one, shaped by every person who ever wrote anything. We have, without quite intending to, grown a single thing that carries all of us in it.

From the crooked timber of humanity, no straight thing was ever made. Alignment aspires to grow something straighter from that same crooked timber. Something that carries all of our crookedness in its grain — and still bends, on the whole, toward what we wish we were.
