Claude is a large language model built by Anthropic. Here’s how it actually works under the hood:

1. It’s a transformer neural network

Claude is based on the transformer architecture - the same core design behind GPT, Gemini, and most modern large language models.
- It takes text as input, breaks it into tokens (words or pieces of words), and processes the tokens in its context in parallel using “attention” mechanisms.
- Attention lets it weigh which tokens matter most for understanding each part of the context. That’s a big part of why it can keep track of a long conversation better than older recurrent models.

2. Training happens in 2 stages

   1. Pretraining: Claude is trained on massive amounts of text from books, websites, code, etc. It learns to predict the next token in a sequence. This teaches it grammar, facts, reasoning patterns, and how language works. No one manually teaches it “what is a cell tower” - it figures it out from patterns in the data.
   2. Post-training: Anthropic uses Constitutional AI and human feedback to make Claude safer, more helpful, and less likely to hallucinate or be evasive. “Constitutional AI” means they give it a set of written principles and have it critique and revise its own outputs to follow those principles.

3. Inference - how it answers you

When you send a message:
- Your text gets tokenized and fed into the model.
- The model runs a forward pass through billions of parameters and calculates a probability for each possible next token.
- It picks a token, appends it to the sequence, and repeats, generating one token at a time until it hits a stop condition. That’s why responses feel like they’re “typed out” piece by piece.

4. Key differences in Claude’s design

Anthropic focuses heavily on:
- Constitutional AI: Instead of relying only on human labelers saying “good/bad response”, during training the model critiques its own responses against a written constitution of principles like “be harmless, honest, helpful”.
- Long context: Claude models handle roughly 100k-200k tokens of context depending on the version, so they can read an entire book or a long codebase in one go.
- Reduced sycophancy: They train it to admit uncertainty and avoid agreeing just to please you.
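
To make the inference steps in section 3 more concrete, here is a minimal toy sketch of the tokenize, forward pass, pick-a-token, repeat loop. Everything in it is an assumption made for illustration: the five-word vocabulary, the whitespace `tokenize`, and especially `toy_forward`, which is a hard-coded stand-in for a real transformer's forward pass. It is not how Claude is implemented, only the general shape of autoregressive generation.

```python
import math
import random

# Hypothetical toy vocabulary; real models use tens of thousands of subword tokens.
VOCAB = ["<eos>", "the", "cat", "sat", "down"]
TOKEN_ID = {tok: i for i, tok in enumerate(VOCAB)}
EOS_ID = TOKEN_ID["<eos>"]


def tokenize(text):
    """Map whitespace-separated words to token ids (a stand-in for a subword tokenizer)."""
    return [TOKEN_ID[word] for word in text.split()]


def toy_forward(token_ids):
    """Stand-in for the forward pass: return one logit (raw score) per vocabulary entry.

    This toy only looks at the last token and strongly favors the next vocabulary
    entry; a real transformer attends over the whole context.
    """
    favored = (token_ids[-1] + 1) % len(VOCAB)
    logits = [0.0] * len(VOCAB)
    logits[favored] = 5.0
    return logits


def softmax(logits):
    """Turn logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def generate(prompt, max_new_tokens=10, greedy=True, rng=None):
    """Generate tokens one at a time until <eos> or the length limit is reached."""
    rng = rng or random.Random(0)
    ids = tokenize(prompt)  # assumes a non-empty prompt
    new_ids = []
    for _ in range(max_new_tokens):
        probs = softmax(toy_forward(ids + new_ids))
        if greedy:
            # Always take the most likely token.
            next_id = max(range(len(probs)), key=probs.__getitem__)
        else:
            # Sample a token in proportion to its probability.
            next_id = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        if next_id == EOS_ID:  # stop condition
            break
        new_ids.append(next_id)
    return [VOCAB[i] for i in new_ids]


print(generate("the"))  # prints ['cat', 'sat', 'down']
```

With `greedy=False` the loop samples from the probability distribution instead of always taking the top token, which is roughly why the same prompt can produce different responses from one run to the next.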