OpenClaw: A Deep Dive (translated) · OpenClawBuilders/AI Automation

OpenClaw: A Deep Dive (translated)

**Architecture, File Structure, and Practical Scenarios**

## Introduction: OpenClaw as a Living Organism

OpenClaw is not just a program -- it is a distributed system organized like a living organism. If you picture its anatomy:

- **Gateway** -- the heart and nervous system that pumps data and coordinates every process

- **Agent** -- the brain that thinks and makes decisions

- **Tools** -- the hands that carry out actions

- **Workspace** -- long-term memory and personal space

- **Sessions** -- short-term conversational memory

- **Nodes** -- additional limbs (camera, screen, microphone)

This document breaks down the internals of each organ down to the file and config level, shows how they interact, and illustrates everything with practical examples.

---

## Part 1: Gateway -- The Heart of the System

### 1.1 What Is Gateway and Why Does It Exist?

Gateway is a long-lived daemon process that:

- Maintains persistent connections to channels (Telegram, WhatsApp, Discord, Slack)

- Routes incoming messages to the correct agents

- Stores session state (conversation history)

- Exposes an API (HTTP/WebSocket) for UI and external integrations

- Runs periodic tasks (cron, heartbeat)

- Manages connected nodes (devices)

**Human analogy:** Gateway is the cardiovascular system + nervous system. It pumps events (like blood) between channels and agents and transmits signals (like nerve impulses) from sensory organs (channels) to the brain (agent) and back.

### 1.2 Gateway File Structure

All Gateway data lives in `~/.openclaw/`:

**`~/.openclaw/config.json`** -- the main Gateway config:

- Authentication settings (`gateway.auth.token`/`password`)

- WebSocket API port (default 18789)

- Channel configs (`telegram.token`, `whatsapp.credentials`, etc.)

- Security settings for exec (sandbox, approvals)

- Browser tool configs (profiles, executable path)

**`~/.openclaw/agents/<agentId>/`** -- per-agent data:

- `sessions/sessions.json` -- metadata for all sessions (who, when, channel, status)

- `sessions/<SessionId>.jsonl` -- full history of each conversation (one message per line)

- `workspace/` -- the agent's personal memory (more in Part 2)

**`~/.openclaw/channels/`** -- persistent channel data:

- `telegram/session.json` -- Telegram bot session

- `whatsapp/.wwebjs_auth/` -- WhatsApp session (QR authorization)

**`~/.openclaw/exec-approvals.json`** -- whitelist of allowed commands for the exec tool

### 1.3 How Gateway Processes an Incoming Message

**Step 1: A message arrives on a channel (e.g., Telegram)**

Gateway holds a persistent connection to the Telegram API. It receives a webhook or long-polling event: a new message from a user.

**Step 2: Route to an agent**

Gateway checks its config bindings: which agent should handle this message? It determines the SessionId: for DMs, this is usually one main context (`dmScope: "main"`) or an isolated one (`dmScope: "per-channel-peer"`). If the session is new, it creates an entry in `sessions.json` and a new `.jsonl` file.

**Step 3: Launch the agent loop**

Gateway reads the session history from `.jsonl`, injects bootstrap files from the workspace (`AGENTS.md`, `SOUL.md`, `USER.md`, etc.), adds available skills, assembles the full context, and sends it to the LLM.

**Step 4: Process the response**

The LLM returns either text or tool calls. If it is a tool call, Gateway executes it (exec, browser, file write, etc.) and feeds the result back into the context. The loop repeats until the LLM returns a final answer. The response is streamed back to the channel (the user sees a "typing..." indicator in real time).

**Step 5: Save to memory**

The entire exchange (user message + assistant response + tool calls) is written to the `.jsonl` file. `sessions.json` is updated (timestamp, message count).

### 1.4 Practical Example: Multi-Agent System

**Use case:** You have a work agent (for company tasks) and a personal agent (for everyday life). You want `@work_bot` in Telegram to be handled by the work agent and your personal DMs to go to the personal agent.

**Setup:**

1. Create two workspaces:

- `~/.openclaw/agents/work/workspace/` -- with `AGENTS.md` for work tasks

- `~/.openclaw/agents/personal/workspace/` -- with `AGENTS.md` for personal stuff

2. In `config.json`, set up bindings:

```json

{ "channel": "telegram", "chatId": "@work_bot", "agentId": "work" }

{ "channel": "telegram", "chatId": "YOUR_USER_ID", "agentId": "personal" }

```

3. For context isolation between agents, set:

`session.dmScope: "per-agent"` (each agent sees only its own conversations)

**Result:** When you message `@work_bot`, Gateway routes to the "work" agent, which sees only the work workspace and work sessions. When you DM the bot directly, Gateway switches to the "personal" agent with its own workspace and sessions.

### 1.5 API and Remote Access

Gateway starts a WebSocket API on port 18789 (by default). This enables:

- Connecting UI clients (web interface, mobile app)

- Integrating with external systems via the HTTP API

- Using the OpenAI-compatible endpoint for compatibility with existing tools

**Security:** By default, Gateway listens only on `127.0.0.1` (localhost). For remote access:

- **VPN/Tailscale:** Expose Gateway through your private network

- **SSH tunnel:** `ssh -L 18789:127.0.0.1:18789 user@server`

- **Never** expose port 18789 to the public internet -- that is access to all your data

---

## Part 2: Workspace -- Long-Term Memory

### 2.1 What Is a Workspace?

A workspace is a folder of Markdown files that define:

- Agent behavior (how to think, how to work with the user)

- Personality (tone, boundaries, name)

- User knowledge (preferences, context)

- Long-term memory (important facts, decisions, learnings)

- Journal (day-to-day operational notes)

**Human analogy:** The workspace is a "memory palace" -- your notebooks, notes, journals, self-instructions. It is what you write down so you never forget and always have at hand.

### 2.2 Workspace File Structure

Typical structure of `~/.openclaw/agents/<agentId>/workspace/`:

**`AGENTS.md`** -- the agent's operational manual:

- How to make decisions

- When to use which tools

- Security rules (e.g., "always confirm exec commands")

- Docs-first approach ("read docs before writing code")

**`SOUL.md`** -- persona and tone:

- How to communicate (formal, friendly, concise?)

- Boundaries (what the agent will not do)

- Values and priorities

**`USER.md`** -- user profile:

- How to address the user (name, formal/informal)

- Preferences (response format, work style)

- Context (what the user does, their goals)

**`IDENTITY.md`** -- the agent's name and vibe (e.g., "My name is Alex, I'm your automation assistant")

**`TOOLS.md`** -- local tool hints:

- Where scripts live (e.g., `/home/user/scripts/process.py`)

- Which commands are available in this environment

- Shortcuts and aliases

**`HEARTBEAT.md`** -- checklist for periodic checks:

- "Check if there are new emails from clients"

- "Verify monitoring is running"

- "Update status in Notion"

**`YYYY-MM-DD.md`** -- daily journal:

- Brief notes about the day's events

- What was done, what decisions were made

- Current work context ("currently debugging a bug in module X")

**`MEMORY.md`** -- long-term memory (summary of important things):

- Important facts that must not be forgotten

- Decisions made and their rationale

- Lessons from past experience

- Usually only available in the main/private context (not in public chats)

### 2.3 How the Agent Uses the Workspace

**Bootstrap files:** On every agent loop run, Gateway reads certain files from the workspace and injects them into the context before calling the LLM:

- `AGENTS.md` (always)

- `SOUL.md` (always)

- `USER.md` (always)

- `IDENTITY.md` (always)

- `TOOLS.md` (if present)

- Today's journal (`YYYY-MM-DD.md`)

- `MEMORY.md` (only for the main/private context)

**Memory search:** If the memory plugin is enabled (default: `memory-core`), the agent can:

- Semantically search `MEMORY.md` and other `.md` files via a vector index

- Find relevant facts even if they are not explicitly mentioned in the bootstrap

**Writing to memory:** The agent can write to the workspace if it is writable (the default). For example:

- User: "Remember that I prefer React with TypeScript"

- Agent: writes this to `MEMORY.md`

- User: "Note in the journal that we fixed the auth bug today"

- Agent: appends to `2025-02-15.md`

### 2.4 Practical Example: Personal Assistant with Memory

**Use case:** You want an agent that remembers your preferences, keeps a journal of your activities, and can search past notes.

**Setup:**

1. Create a workspace at `~/.openclaw/agents/personal/workspace/`

2. Write in `USER.md`: "User: Artem, use informal address, prefers short answers"

3. Write in `AGENTS.md`: "Always keep a journal. Record important events in YYYY-MM-DD.md and long-term decisions in MEMORY.md"

4. Tell the agent: "Remember that I work in Web3 and am interested in automation"

5. The agent writes this to `MEMORY.md`

**Result:**

- A week later you ask: "What did we do on Monday?"

- The agent opens `2025-02-10.md` and tells you

- You ask: "What are my stack preferences?"

- The agent searches `MEMORY.md` and answers: "You prefer React with TypeScript, work in Web3"

---

## Part 3: Sessions -- Short-Term Memory

### 3.1 What Is a Session?

A session is the "memory of a conversation" -- the context of a single dialog between the user and agent. Each session:

- Stores message history (user, assistant, tool calls)

- Is isolated from other sessions (unless configured otherwise)

- Can be private (DM) or group (chat)

- Has its own SessionId (unique identifier)

**Human analogy:** A session is "working memory" for a conversation. While you are talking, you remember the context of that particular discussion. Finish the conversation, switch to another topic -- new session, new context.

### 3.2 Session File Structure

**`~/.openclaw/agents/<agentId>/sessions/sessions.json`** -- metadata for all sessions:

- SessionId, channel, user

- Creation and last-message timestamps

- Message count, status (active/archived)

**`~/.openclaw/agents/<agentId>/sessions/<SessionId>.jsonl`** -- full conversation history:

- Each line = one event (user message, assistant response, tool call, tool result)

- JSONL format: JSON Lines (each line is valid JSON)

- Easy to read and parse incrementally

### 3.3 dmScope: DM Session Isolation

**Problem:** By default, all DMs (direct messages) on a single channel collapse into one main context (`dmScope: "main"`). This means:

- If two different people DM you on Telegram, the agent sees their messages in a single session

- Risk of context leakage: the agent may accidentally respond to one user with information from the other

**Solution:** Isolate DM sessions via `session.dmScope`:

- `"main"` -- all DMs in one session (unsafe for multi-user)

- `"per-channel-peer"` -- each user gets their own isolated session (safe)

- `"per-agent"` -- isolation by agent (for multi-agent setups)

**Recommendation:** If more than one person sends DMs, always set `session.dmScope: "per-channel-peer"`.

### 3.4 Compaction: History Compression

In long conversations, a session can grow to tens of thousands of tokens. OpenClaw automatically compacts history:

- Older messages are collapsed into a summary (brief description)

- Recent messages remain in full

- Before compaction, Gateway may trigger a "memory flush" -- asking the agent to write anything important to `MEMORY.md`

---

## Part 4: Tools -- The Agent's Hands

### 4.1 Tool Types

Tools are how the agent interacts with the outside world:

- **exec** -- run shell commands

- **browser** -- control a browser (open a page, click, take a screenshot)

- **file** -- read/write files

- **message** -- send messages to channels

- **memory** -- search long-term memory

- **node.*** -- commands on connected devices (`camera.*`, `screen.*`, `location.*`, etc.)

### 4.2 The exec Tool: Running Commands

`exec` lets the agent run shell commands. It is the most powerful and the most dangerous tool.

**Three execution modes:**

- `host=sandbox` -- execution in an isolated Docker container (default if sandboxing is enabled)

- `host=gateway` -- execution on the Gateway machine (with policies and approvals)

- `host=node` -- execution on a connected node (macOS app / headless)

**exec security:**

- `deny` -- exec is completely disabled

- `allowlist` -- only commands from `exec-approvals.json` are allowed

- `full` -- everything is allowed (dangerous!)

**`exec-approvals.json`** -- command whitelist:

```json

{ "pattern": "/usr/bin/git", "approved": true }

{ "pattern": "/usr/bin/python3", "args": ["script.py"], "approved": true }

{ "pattern": "rm", "approved": false }

```

### 4.3 The browser Tool: Managed Browser

The browser tool is a separate, isolated browser profile that the agent can control programmatically.

**Two modes:**

- `openclaw` -- managed profile, full isolation, no extension needed

- `chrome` -- control your regular Chrome through an extension relay (requires an open tab with the extension)

**Commands:**

- `tabs.open(url)` -- open a page

- `tabs.close(tabId)` -- close a tab

- `click(selector)` -- click an element

- `type(selector, text)` -- enter text

- `snapshot()` -- get the page's HTML/text

- `screenshot()` -- take a screenshot

- `pdf()` -- save the page as PDF

**Config:** `browser.*` in `openclaw.json`:

```json

"browser": {

"enabled": true,

"defaultProfile": "openclaw",

"executablePath": "/usr/bin/chromium"

}

```

### 4.4 Practical Example: Automating Email Checks

**Use case:** You want the agent to check your Gmail every morning, find emails from clients, and send you a summary in Telegram. (This can be done via API and a gog skill, or via the browser. Here we cover the browser approach. In some cases gog is more convenient.)

**Setup:**

1. Enable the browser tool (`browser.enabled: true`)

2. Set up a cron job in Gateway:

```

openclaw cron add --schedule "0 9 * * *" --agent personal --prompt "Go to Gmail, check for client emails, send a summary to Telegram"

```

3. In `AGENTS.md`, write the instruction:

"To check Gmail: open gmail.com, click Inbox, use snapshot() to get the email list, filter emails from clients (domain @client.com), compose a summary, and send it via the message tool to Telegram"

**Result:** Every morning at 9:00, the agent:

1. Opens a browser

2. Navigates to Gmail (profile with a saved session)

3. Reads the inbox via `snapshot()`

4. Filters client emails

5. Sends you a Telegram message: "3 emails from clients today: X is asking for an update, Y has a billing question, Z says thank you"

---

## Part 5: Nodes -- Additional Senses

### 5.1 What Is a Node?

A node is a physical device (computer, smartphone, tablet) connected to Gateway that provides additional capabilities:

- Camera (`camera.capture`)

- Screen (`screen.record`, `screen.capture`)

- Location (`location.get`)

- System commands (`system.run`)

- Canvas (drawing on screen)

**Human analogy:** A node is an extra limb or sense organ. Gateway on the server is the brain, and the node on your Mac is the eyes, hands, and ears.

### 5.2 Connecting a Node

**macOS app:**

1. Download OpenClaw

2. Launch it and enter the Gateway address (e.g., via Tailscale)

3. The node registers with Gateway and advertises its capabilities

**Headless node:**

1. Run `openclaw node start` on a remote machine

2. Set the Gateway address in the config

3. The node connects and waits for commands

### 5.3 Practical Example: Home Monitoring

**Use case:** You want the agent to take screenshots of your home computer on demand (e.g., "show me my screen").

**Setup:**

1. Gateway is running on a server (VPS)

2. On your home Mac, launch OpenClaw and connect it to Gateway via Tailscale

3. The Mac advertises capability: `screen.capture`

4. In Telegram, you message the agent: "Show me my screen"

5. The agent calls `screen.capture` on the node

6. The node takes a screenshot and returns it to Gateway

7. The agent sends the screenshot to Telegram

**Result:** You can remotely check what is happening on your home computer without setting up VNC or TeamViewer.

---

## Part 6: Memory -- The Agent's Hippocampus

### 6.1 Two Levels of Memory

**Bootstrap (direct injection):** Files from the workspace are always read in full and added to the context before each run.

- **Pro:** Guaranteed to be in context

- **Con:** Burns a lot of tokens if the files are large

**Semantic search (vector index):** The memory plugin indexes `MEMORY.md` and other `.md` files and searches for relevant chunks by meaning.

- **Pro:** Efficient search across a large volume of notes

- **Con:** No guarantee that a needed fact will be found

### 6.2 The Right Strategy for Using Memory

**`MEMORY.md`:** Long-lived facts that must ALWAYS be remembered

- User preferences ("Artem prefers short answers")

- Decisions made ("We decided to use PostgreSQL for the main DB")

- Lessons learned ("Do not use library X, it's buggy")

**`YYYY-MM-DD.md`:** Events and context for a specific day

- What was done today

- Current tasks ("Currently debugging a bug in the auth module")

- Temporary notes that can be archived later

**How to write:**

- Direct instruction: "Save to memory that I work in Web3"

- Agent self-decides: if the workspace is writable and memory flush is enabled, the agent can automatically save important information before session compaction

---

## Part 7: Cron and Heartbeat -- The Daily Routine

### 7.1 Cron: Periodic Tasks

Cron is a task scheduler on the Gateway side. You can set up:

- "Every day at 9:00, check email and send a summary"

- "Every 30 minutes, check monitoring status"

- "Once a day, back up the workspace to git"

**Command:**

```

openclaw cron add --schedule "0 9 * * *" --agent work --prompt "Check for new tasks in Notion" --announce

```

**Flags:**

- `--announce` -- the result is delivered to a channel (e.g., sent to Telegram)

- `--no-deliver` -- runs internally; the result is not sent out

### 7.2 Heartbeat: The Agent's Pulse

Heartbeat is a short periodic check that Gateway asks the agent to perform.

1. The agent reads `HEARTBEAT.md`

2. Runs through the checklist (e.g., "verify monitoring is running")

3. If something is wrong, writes to the channel (e.g., "Monitoring is down!")

**Example `HEARTBEAT.md`:**

```

- Check if the monitoring.py process is running

- Check if there is more than 10 GB of free disk space

- Check if there are any new errors in the logs

```

---

## Part 8: Skills -- Teaching the Agent

### 8.1 What Is a Skill?

A skill is a folder with a `SKILL.md` that teaches the agent how to use a specific tool or procedure.

**Example:** A skill for working with PDFs:

```

~/.openclaw/skills/pdf/SKILL.md

```

Contains: how to read PDFs, extract text, merge PDFs, convert to images.

### 8.2 Skill Sources

- **Bundled** -- built-in skills shipped with OpenClaw

- **`~/.openclaw/skills`** -- global skills for all agents

- **`<workspace>/skills`** -- agent-specific skills (override global ones)

**Priority:** workspace > global > bundled

### 8.3 Gating: Conditional Skill Activation

A skill can be "gated" on certain conditions:

- Presence of a binary (e.g., only show the git skill if `/usr/bin/git` exists)

- Config (e.g., only show the browser skill if `browser.enabled=true`)

- Environment variable (e.g., `API_KEY` is set)

---

## Conclusion: Putting It All Together

OpenClaw is not just "an LLM with tools." It is a distributed system with:

- A **central coordinator (Gateway)** that maintains connections, routes messages, and stores sessions

- **Long-term memory (Workspace)** that defines the agent's behavior, personality, and knowledge

- **Short-term memory (Sessions)** that stores conversation context

- **Executive organs (Tools)** that let the agent act in the real world

- **Additional senses (Nodes)** that extend the agent's capabilities to physical devices

- **Training materials (Skills)** that teach the agent to use tools effectively

By understanding how each component works, where it lives in the filesystem, and how it interacts with the rest of the system, you can:

- Configure complex multi-agent scenarios

- Safely grant the agent access to critical systems

- Build persistent memory that survives across sessions

- Automate routine tasks via cron and heartbeat

- Extend the agent's capabilities through nodes and custom skills

**Core principle:** OpenClaw is not a black box. Every component is transparent, every config is editable, every session is stored in plain JSONL. You are always in full control of the system.

---

## Bonus 9: System Flow Architecture Diagram

A System Flow Architecture diagram is available as an SVG file: `openclaw-system-flow.svg`

You can download it and open it in a browser at full quality to see how everything looks visually. It helps to better understand how the magic happens.

More Here: https://x.com/xmayeth/status/2027043865748783209

8 comments