How recall works
Side-query runs
Before each agent turn, a separate fast-model side-query runs against the memory store. It receives the current task objective and searches for relevant entries.
Candidates scored
Memory entries are ranked by semantic relevance to the current objective. Recency is a secondary factor — fresh entries score slightly higher when relevance is equal.
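The ranking rule above can be sketched as a two-key sort. This is a minimal illustration, not Ultron's actual implementation: the `MemoryEntry` type, its field names, and the assumption that relevance arrives as a precomputed number are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class MemoryEntry:
    # Hypothetical shape of a memory entry; field names are assumptions.
    entry_id: str
    relevance: float      # semantic relevance to the current objective
    updated_at: datetime  # when the entry was last written


def rank_candidates(entries):
    """Order entries by relevance, using recency only to break ties."""
    # Tuples compare element by element, so updated_at matters only
    # when two entries have exactly the same relevance.
    return sorted(entries, key=lambda e: (e.relevance, e.updated_at), reverse=True)
```

A real scorer might blend recency in as a small weighted bonus rather than a strict tie-break; the text above does not say which, so the strict version is shown.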
Deduplication applied
Entries that were already surfaced in recent turns are skipped. This prevents the same memory from appearing repeatedly and wasting context.
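One way to picture the deduplication step is a sliding window over the entry IDs surfaced in the last few turns. The class name, the ID-based matching, and the default window of 3 turns are assumptions made for illustration; the text above does not define how many turns count as "recent".

```python
from collections import deque


class RecentlySurfaced:
    """Tracks which entry IDs were surfaced in the last few turns."""

    def __init__(self, window_turns=3):
        # Assumed window size; each slot holds the IDs surfaced in one turn.
        self._turns = deque(maxlen=window_turns)

    def filter(self, candidate_ids):
        """Return the candidates that were not surfaced inside the window."""
        seen = set().union(*self._turns)
        return [cid for cid in candidate_ids if cid not in seen]

    def record_turn(self, surfaced_ids):
        """Record what was surfaced this turn; the oldest turn falls out."""
        self._turns.append(set(surfaced_ids))
```

Because the window is bounded, an entry becomes eligible again once enough turns have passed without it being surfaced.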
Why the cap is 5
Intuition says more context is better. In practice, it isn’t. With more than 5 memories injected, two things happen:
- Irrelevant entries dilute the signal of relevant ones
- Token budget fills up faster, triggering compression sooner
Memory drift defense
Before any memory entry is injected, Ultron validates it:
- File paths mentioned: do they still exist?
- Functions referenced: are they still in the codebase?
- External references: are they still current?
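The first two checks can be approximated with the standard library. The sketch below is an assumption-heavy illustration: the function names are hypothetical, the function check is a naive text search that only looks at Python `def` statements, and external references are left out because checking them needs network access. How Ultron handles an entry that fails validation is not described here, so the sketch only reports what is missing.

```python
import re
from pathlib import Path


def missing_paths(paths, repo_root):
    """Return the referenced paths that no longer exist under repo_root."""
    root = Path(repo_root)
    return [p for p in paths if not (root / p).exists()]


def missing_functions(names, repo_root):
    """Return the referenced function names with no `def` in any .py file."""
    # Naive check: concatenates all Python sources and searches the text.
    source = "\n".join(
        f.read_text(errors="ignore") for f in Path(repo_root).rglob("*.py")
    )
    return [
        n for n in names
        if not re.search(rf"\bdef\s+{re.escape(n)}\s*\(", source)
    ]


def is_stale(paths, functions, repo_root):
    """True if any referenced path or function can no longer be found."""
    return bool(
        missing_paths(paths, repo_root) or missing_functions(functions, repo_root)
    )
```

A production check would more likely parse the code or query a symbol index than scan raw text, but the shape of the test is the same: resolve every reference the entry makes and flag it if any fail.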