Here’s a summary of the technical details of Kimi K2, Moonshot AI’s open-weight model family.
📐 Architecture & Parameters
- Mixture-of-Experts (MoE) architecture with 1 trillion total parameters, 32 billion active per token.
- Routes each token to 8 of 384 experts, plus 1 shared expert that is always active.
- 61 layers and 64 attention heads.
- Uses SwiGLU activation functions.
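The routing scheme above can be sketched in a few lines. This is a toy illustration of top-k MoE routing with a shared expert and a SwiGLU feed-forward, not Moonshot's implementation; all shapes and names here are assumptions for clarity.

```python
import numpy as np

def moe_forward(x, experts, shared_expert, router_w, top_k=8):
    """Toy top-k MoE routing sketch (illustrative, not Kimi K2's actual code).

    x: (d,) token hidden state; experts: list of callables; router_w: (n_experts, d).
    """
    logits = router_w @ x                      # one router score per expert
    top = np.argsort(logits)[-top_k:]          # select the top-k experts
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                       # softmax over the selected experts only
    out = shared_expert(x)                     # shared ("general") expert always runs
    for g, i in zip(gates, top):
        out = out + g * experts[i](x)          # gate-weighted sum of expert outputs
    return out

def swiglu(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward: down( silu(gate(x)) * up(x) )."""
    def silu(z):
        return z / (1.0 + np.exp(-z))
    return w_down @ (silu(w_gate @ x) * (w_up @ x))
```

Only the selected experts' weights are touched per token, which is how a 1T-parameter model can run with just 32B active parameters.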
🎯 Muon Optimizer / MuonClip
- Trained with the MuonClip optimizer: Muon plus a “qk‑clip” mechanism that rescales the query/key projections to keep attention logits bounded at scale.
- Trained on 15.5 trillion tokens; Muon is reported to deliver roughly 2× the training efficiency of AdamW with about half the optimizer memory.
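The qk‑clip idea can be sketched as a post-step weight correction. This is a simplified illustration of the mechanism described above; the threshold value and the even split of the rescale between queries and keys are assumptions, not Moonshot's published constants.

```python
import numpy as np

def qk_clip(w_q, w_k, max_logit, tau=100.0):
    """Sketch of qk-clip: if a head's maximum attention logit exceeds a
    threshold tau, rescale the query and key projection weights so the
    logit is pulled back to tau. tau=100.0 is illustrative only.
    """
    if max_logit > tau:
        scale = np.sqrt(tau / max_logit)   # split the correction between q and k
        w_q = w_q * scale                  # so the q·k product shrinks by tau/max_logit
        w_k = w_k * scale
    return w_q, w_k
```

Because attention logits are bilinear in the q/k weights, scaling each side by `sqrt(tau / max_logit)` shrinks the offending logit to exactly `tau`, which is what prevents logit blow-ups during large-scale training.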
🧠 Context Window
- Supports a 128,000-token context window (~150–200 pages).
📊 Performance & Benchmarks
- SWE‑bench Verified (software engineering): 65.8% Pass@1 (agentic, single attempt).
- LiveCodeBench: 53.7% Pass@1.
- MATH‑500: 97.4% accuracy.
- MMLU (language understanding): 89.5%.
- Tau2 retail tasks: 70.6% Avg@4.
- Also performs strongly on ZebraLogic, GPQA, and other reasoning benchmarks.
🤖 Agentic Capabilities
- Designed for agentic AI: multi-step workflows, autonomous tool calling, and tool orchestration.
- Post-trained with large-scale agentic data synthesis (simulated tool-use trajectories), reinforcement learning, and self-critique.
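A multi-step tool-orchestration loop of the kind described above typically looks like the following. This is a minimal sketch against a generic OpenAI-compatible chat client; the client object, tool schema layout, and model name are all assumptions for illustration.

```python
import json

def run_agent(client, tools, user_msg, model="kimi-k2-instruct", max_steps=5):
    """Minimal agentic loop sketch: the model either answers or requests
    tool calls; requested tools are executed and their results fed back.
    (client, tools dict layout, and model id are illustrative assumptions.)
    """
    messages = [{"role": "user", "content": user_msg}]
    for _ in range(max_steps):
        resp = client.chat.completions.create(
            model=model,
            messages=messages,
            tools=[t["schema"] for t in tools.values()],
        )
        msg = resp.choices[0].message
        messages.append(msg)
        if not msg.tool_calls:            # model answered directly: we're done
            return msg.content
        for call in msg.tool_calls:       # execute each requested tool
            result = tools[call.function.name]["fn"](**json.loads(call.function.arguments))
            messages.append({"role": "tool",
                             "tool_call_id": call.id,
                             "content": json.dumps(result)})
    return None                           # step budget exhausted
```

The loop body is model-agnostic; what makes a model "agentic" is how reliably it emits well-formed tool calls and recovers across many such steps.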
⚙️ Deployment & APIs
- Available as Kimi‑K2‑Base (pre-trained) and Kimi‑K2‑Instruct (fine-tuned for chat/general use).
- Context limit: 128K tokens.
- Estimated speed: ~200 tokens per second (Groq Cloud).
- Accessible via Groq Cloud, SiliconFlow, OpenRouter, Together AI.
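All of the providers above expose OpenAI-compatible chat endpoints, so a request can be built the same way everywhere. A hedged sketch using only the standard library; the base URL and model id follow OpenRouter's conventions and should be checked against the provider's docs before use.

```python
import json
import urllib.request

def build_request(api_key, prompt,
                  model="moonshotai/kimi-k2",               # provider-specific id (assumed)
                  base_url="https://openrouter.ai/api/v1"): # OpenRouter-style endpoint (assumed)
    """Build an OpenAI-compatible chat-completions request for Kimi K2.

    Other providers (Together AI, SiliconFlow, Groq) differ mainly in the
    base_url and model id.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )

# Usage (requires a real key and network access):
# with urllib.request.urlopen(build_request("YOUR_API_KEY", "hello")) as r:
#     print(json.load(r)["choices"][0]["message"]["content"])
```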
💾 Licensing & Open Source
- Open weights and code under a modified MIT license: very large commercial deployments (roughly 100M+ monthly active users or US$20M+ monthly revenue) must display “Kimi K2” prominently in their product.
- Can run locally (~1 TB weights) or in the cloud.
🧩 Advanced Features
- Mooncake service: disaggregated KV‑Cache architecture for high throughput on long contexts.
- Compatible with VS Code, Cline, Roo Code.
- No native multimodal support (no Kimi‑VL or Kimi‑Audio yet).
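The core idea behind a disaggregated KV-cache service like Mooncake can be sketched as prefix reuse: prefill results are stored in a shared pool so later requests that share a long-context prefix skip the expensive prefill. This toy class is purely illustrative and bears no relation to Mooncake's actual API.

```python
import hashlib

class KVCachePool:
    """Toy sketch of KV-cache prefix reuse (illustrative, not Mooncake's design).

    Prefill output for a token-id prefix is stored by hash; a later request
    with the same prefix reuses the cached KV blocks instead of recomputing.
    """
    def __init__(self):
        self.store = {}

    def _key(self, token_ids):
        return hashlib.sha256(repr(list(token_ids)).encode()).hexdigest()

    def get_or_prefill(self, token_ids, prefill_fn):
        k = self._key(token_ids)
        if k not in self.store:                 # cache miss: run (expensive) prefill
            self.store[k] = prefill_fn(token_ids)
        return self.store[k]                    # cache hit: reuse stored KV blocks
```

In a real disaggregated setup the pool lives on separate cache nodes and prefill/decode run on different GPU groups; the payoff is highest for long-context workloads where the same document prefix is queried repeatedly.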