Long sessions accumulate tokens. Without compression, every conversation would hit context limits and require a restart. Ultron’s compression engine prevents that — automatically, without losing the information that matters.

Three-layer cascade

Context approaching token limit

Layer 1: MicroCompact
    Scans all tool outputs in the conversation
    Removes outputs from old turns (Read, Bash, Grep, Glob, WebSearch, Scrape)
    Keeps recent outputs intact
    Replaces cleared content with: [tool output cleared — content was X chars]
    No API call required
    ↓ (if still over limit)
Layer 2: Session Memory Compact
    Uses stored memories as the compression baseline
    Preserves: minimum 10K tokens of context, minimum 5 text messages
    Targets: maximum 40K tokens in active window
    Prevents cutting tool_use / tool_result pairs (API invariant)
    No API call required
    ↓ (if still over limit)
Layer 3: API Digest
    Full LLM summary via fast model
    Strips all images (replaced with [image] placeholders)
    Re-injects up to a 50K-token budget of critical content:
        ├── Recent files (max 5 files, max 5K each)
        ├── Active skill instructions
        └── Project instructions
    One API call required
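The cascade above can be sketched as a dispatcher that tries each layer in order and stops as soon as the context fits. Everything below (function names, the `RECENT_TURNS` constant, the message shape) is an illustrative assumption, not Ultron's actual API; only Layer 1 is fleshed out.

```python
RECENT_TURNS = 3  # assumption: how many recent turns keep their tool outputs intact

def micro_compact(messages):
    """Layer 1 sketch: clear tool outputs from old turns (no API call).

    Old tool outputs are replaced with the placeholder described above;
    recent turns are left untouched."""
    cutoff = len(messages) - RECENT_TURNS
    cleared = []
    for i, msg in enumerate(messages):
        if i < cutoff and msg.get("role") == "tool":
            n = len(msg["content"])
            msg = {**msg, "content": f"[tool output cleared — content was {n} chars]"}
        cleared.append(msg)
    return cleared

def compress(messages, token_limit, count_tokens, layers):
    """Run each layer in order, stopping as soon as the context fits."""
    for layer in layers:
        if count_tokens(messages) <= token_limit:
            break
        messages = layer(messages)
    return messages
```

In a full version of this sketch, `layers` would be `[micro_compact, session_memory_compact, api_digest]`, so the cheap, API-free layers always run before the one that costs an API call.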

CompactBoundary

After each compression, a boundary marker is inserted into the conversation:
{
  "type": "compact_boundary",
  "compactType": "auto",
  "preCompactTokenCount": 87420,
  "preservedSegment": {
    "summaryMessageUuid": "msg_abc123",
    "preservedMessageUuids": ["msg_xyz789", "msg_def456"]
  }
}
All subsequent operations process only messages after the last boundary, which prevents re-compressing content that was already compressed in a previous pass.
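Scanning from the last boundary can be sketched as a backwards search over the conversation; the `type` field matches the marker shown above, but the list-of-dicts message shape is an assumption for illustration.

```python
def messages_after_last_boundary(messages):
    """Return only the messages after the most recent compact_boundary marker.

    Everything before that marker was already compressed in an earlier pass
    and must not be reprocessed."""
    for i in range(len(messages) - 1, -1, -1):
        if messages[i].get("type") == "compact_boundary":
            return messages[i + 1:]
    return messages  # no boundary yet: the whole conversation is still active
```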

What is never compressed away

  • Active tool calls and their results (current turn)
  • The last 5 messages with substantive text content
  • Memories injected at session start
  • The user’s profile context
  • Any content marked with a “must-retain” pre-compression hook
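The retention rules above amount to a predicate that every message is checked against before any layer is allowed to touch it. The field names (`turn`, `uuid`, `kind`, `must_retain`) are hypothetical, chosen only to mirror the five bullets.

```python
def must_retain(msg, recent_text_uuids, session_memory_uuids):
    """Illustrative retention check mirroring the rules above (field names are assumptions)."""
    return (
        msg.get("turn") == "current"                # active tool calls/results (current turn)
        or msg.get("uuid") in recent_text_uuids     # last 5 messages with substantive text
        or msg.get("uuid") in session_memory_uuids  # memories injected at session start
        or msg.get("kind") == "profile_context"     # the user's profile context
        or msg.get("must_retain", False)            # marked by a pre-compression hook
    )
```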

When compression triggers

Compression is fully automatic. It triggers when the active context window approaches the model’s limit. You don’t need to manage it, restart sessions, or summarize manually.
Sessions can run for hours across dozens of tool calls without hitting limits. The compression engine handles it invisibly.
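"Approaches the model's limit" implies a simple threshold check before each turn; the 92% figure below is an invented stand-in, since the document does not state the actual trigger point.

```python
COMPACT_THRESHOLD = 0.92  # assumption: fire at 92% of the model's context window

def should_compact(active_tokens, model_limit):
    """Trigger compression when the active context nears the model's limit."""
    return active_tokens >= model_limit * COMPACT_THRESHOLD
```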