Evidence Map: LLM Technical Phenomena & Research Status

I've compiled an evidence map covering 8 critical LLM technical phenomena that affect content generation, SEO, and AI-driven optimization strategies. Here are the key research findings:

**8 Technical Phenomena Covered:**

**KV Cache Non-Determinism** - Batch invariance breaks cause same prompts to return different outputs at temperature=0. GPT-4 shows ~11.67 unique completions across 30 samples.
**Hidden State Drift & Context Rot** - Performance degrades 20-60% as input length increases. Middle content gets ignored (40-70% vs 60-85% for shuffled content).
**RLHF/Alignment Tax** - Alignment training drops NLP benchmark performance 5-15%. Healthcare, finance, and legal content get selectively suppressed.
**MoE Routing Non-Determinism** - Sparse MoE routing operates at batch-level; tokens from different requests interfere in expert buffers.
**Context Rot (Long-Context Failures)** - "Lost in the middle" phenomenon: mid-context ignored even on simple retrieval. NIAH benchmarks misleading vs real-world tasks.
**System Instructions & Prompt Injection** - No architectural separation between system prompts and user input. All production LLMs vulnerable.
**Per-Prompt Throttling** - Rate limiting (TPM not just RPM) indirectly reshapes batch composition, affecting output variance.
**Interpretability Gap** - Polysemantic neurons, discrete phase transitions, and opaque hallucination sources remain unexplained.

**10 Key Takeaways for SEO/AEO/GEO:**

• Non-determinism is structural, not a bug

• Long-context reliability is partial (20-60% degradation past 100k tokens)

• Middle content gets ignored—front-load critical info

• Distractors harm LLM citations 10-30%

• Alignment suppresses valid healthcare/finance/competitive content

• Reproducibility requires 2x+ slower inference

• Interpretability incomplete—we don't fully understand citation behavior

• 42% citation overlap between platforms (platform-specific optimization needed)

• RAG wins over parametric (2-3x more diverse citations)

• Freshness beats precision—multiple short sessions more stable

Full evidence map with sources available. Drop questions below!