Long sessions accumulate tokens. Without compression, every conversation would hit context limits and require a restart. Ultron’s compression engine prevents that — automatically, without losing the information that matters.

Three-layer cascade

Context approaching token limit

Layer 1: MicroCompact
    Scans all tool outputs in the conversation
    Removes outputs from old turns (Read, Bash, Grep, Glob, WebSearch, Scrape)
    Keeps recent outputs intact
    Replaces cleared content with: [tool output cleared — content was X chars]
    No API call required
    ↓ (if still over limit)
Layer 2: Session Memory Compact
    Uses stored memories as the compression baseline
    Preserves: minimum 10K tokens of context, minimum 5 text messages
    Targets: maximum 40K tokens in active window
    Prevents cutting tool_use / tool_result pairs (API invariant)
    No API call required
    ↓ (if still over limit)
Layer 3: API Digest
    Full LLM summary via fast model
    Strips all images (replaced with [image] placeholders)
    Re-injects up to a 50K-token budget of critical content:
        ├── Recent files (max 5 files, max 5K each)
        ├── Active skill instructions
        └── Project instructions
    One API call required
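The cascade above can be sketched as a dispatcher that tries each layer in order and stops as soon as the context fits. Everything below (function names, the `RECENT_TURNS` constant, the message shape) is an illustrative assumption, not Ultron's actual API; only Layer 1 is fleshed out.

```python
RECENT_TURNS = 3  # assumption: how many recent turns keep their tool outputs intact

def micro_compact(messages):
    """Layer 1 sketch: clear tool outputs from old turns (no API call).

    Old tool outputs are replaced with the placeholder described above;
    recent turns are left untouched."""
    cutoff = len(messages) - RECENT_TURNS
    cleared = []
    for i, msg in enumerate(messages):
        if i < cutoff and msg.get("role") == "tool":
            n = len(msg["content"])
            msg = {**msg, "content": f"[tool output cleared — content was {n} chars]"}
        cleared.append(msg)
    return cleared

def compress(messages, token_limit, count_tokens, layers):
    """Run each layer in order, stopping as soon as the context fits."""
    for layer in layers:
        if count_tokens(messages) <= token_limit:
            break
        messages = layer(messages)
    return messages
```

In a full version of this sketch, `layers` would be `[micro_compact, session_memory_compact, api_digest]`, so the cheap, API-free layers always run before the one that costs an API call.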

CompactBoundary

After each compression, a boundary marker is inserted into the conversation:
{
  "type": "compact_boundary",
  "compactType": "auto",
  "preCompactTokenCount": 87420,
  "preservedSegment": {
    "summaryMessageUuid": "msg_abc123",
    "preservedMessageUuids": ["msg_xyz789", "msg_def456"]
  }
}
All subsequent operations process only messages after the last boundary, which prevents re-compressing content that was already compressed in a previous pass.
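Scanning from the last boundary can be sketched as a backwards search over the conversation; the `type` field matches the marker shown above, but the list-of-dicts message shape is an assumption for illustration.

```python
def messages_after_last_boundary(messages):
    """Return only the messages after the most recent compact_boundary marker.

    Everything before that marker was already compressed in an earlier pass
    and must not be reprocessed."""
    for i in range(len(messages) - 1, -1, -1):
        if messages[i].get("type") == "compact_boundary":
            return messages[i + 1:]
    return messages  # no boundary yet: the whole conversation is still active
```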

What is never compressed away

  • Active tool calls and their results (current turn)
  • The last 5 messages with substantive text content
  • Memories injected at session start
  • The user’s profile context
  • Any content marked with a “must-retain” pre-compression hook
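The retention rules above amount to a predicate that every message is checked against before any layer is allowed to touch it. The field names (`turn`, `uuid`, `kind`, `must_retain`) are hypothetical, chosen only to mirror the five bullets.

```python
def must_retain(msg, recent_text_uuids, session_memory_uuids):
    """Illustrative retention check mirroring the rules above (field names are assumptions)."""
    return (
        msg.get("turn") == "current"                # active tool calls/results (current turn)
        or msg.get("uuid") in recent_text_uuids     # last 5 messages with substantive text
        or msg.get("uuid") in session_memory_uuids  # memories injected at session start
        or msg.get("kind") == "profile_context"     # the user's profile context
        or msg.get("must_retain", False)            # marked by a pre-compression hook
    )
```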

When compression triggers

Compression is fully automatic. It triggers when the active context window approaches the model’s limit. You don’t need to manage it, restart sessions, or summarize manually.
Sessions can run for hours across dozens of tool calls without hitting limits. The compression engine handles it invisibly.
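"Approaches the model's limit" implies a simple threshold check before each turn; the 92% figure below is an invented stand-in, since the document does not state the actual trigger point.

```python
COMPACT_THRESHOLD = 0.92  # assumption: fire at 92% of the model's context window

def should_compact(active_tokens, model_limit):
    """Trigger compression when the active context nears the model's limit."""
    return active_tokens >= model_limit * COMPACT_THRESHOLD
```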