The GESA Loop
8 Steps. One Cycle. Continuous Improvement.
The GESA loop is the operational core of the framework. Every optimization cycle runs through all eight steps — from observing the current state to cooling the temperature for the next cycle.
The Loop Diagram
┌─────────────────────────────────────────────────────────┐
│ GESA LOOP │
│ │
│ 1. OBSERVE Current DRIFT + Fetch + 3D scores │
│ ↓ │
│ 2. RETRIEVE Similar episodes from memory │
│ ↓ │
│ 3. GENERATE Candidate strategies │
│ ↓ │
│ 4. ANNEAL Filter by temperature schedule │
│ ↓ │
│ 5. SELECT Best candidate given temperature │
│ ↓ │
│ 6. ACT Execute selected strategy │
│ ↓ │
│ 7. STORE New episode with outcome │
│ ↓ │
│ 8. COOL Advance temperature schedule │
│ └──────────────────────────────────────────────┐ │
│ ↓ │
│ Back to 1 │
└─────────────────────────────────────────────────────────┘
Step 1: OBSERVE
Read current system state.
Inputs:
- DRIFT gap and sign (positive/negative/zero)
- Fetch score and decision threshold
- Chirp, Perch, Wake scores
- Active domain and dimension
- Gap velocity (how DRIFT has moved across recent episodes)
This step is passive — it reads but does not modify. The observe step establishes the context fingerprint that will drive retrieval.
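A minimal sketch of what OBSERVE might hand to the rest of the loop, in TypeScript. The field names and types are illustrative assumptions; only the inputs listed above come from the text.

```typescript
// Sketch of the OBSERVE output (the context fingerprint).
// Field names and types are assumptions for illustration.
interface ContextFingerprint {
  driftGap: number;            // DRIFT gap magnitude
  driftSign: -1 | 0 | 1;       // negative / zero / positive
  fetchScore: number;          // Fetch score
  fetchThreshold: number;      // decision threshold for Fetch
  chirp: number;               // Chirp score
  perch: number;               // Perch score
  wake: number;                // Wake score
  domain: string;              // active domain, e.g. "Workplace"
  dimension: string;           // active dimension, e.g. "D6 Operational"
  gapVelocity: number;         // how DRIFT has moved across recent episodes
}
```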
Step 2: RETRIEVE
Search episode memory for situations with similar context fingerprint.
Similarity is measured across five dimensions:
Similarity(e₁, e₂) =
0.30 × DomainMatch
+ 0.25 × DriftProximity
+ 0.20 × DimensionMatch
+ 0.15 × TemperatureProximity
+ 0.10 × OutcomePolarity
DomainMatch — Are the episodes from the same domain (Workplace, Content, Trading, etc.)?
DriftProximity — How close is the DRIFT magnitude? A gap of 42 is more similar to a gap of 38 than to a gap of 5.
DimensionMatch — In 6D context: does the origin dimension match? D6 Operational episodes are more relevant to a D6 Operational situation.
TemperatureProximity — Episodes captured at similar annealing temperatures are more relevant. A decision made at T = 80 (highly exploratory) is less relevant when the current temperature is T = 15 (exploiting).
OutcomePolarity — Failed episodes constrain generation; successful episodes inform it. Both are retrieved.
Retrieval also applies temporal decay (EpisodeWeight = BaseWeight × e^(-age/τ)) — recent episodes carry more weight.
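A sketch of retrieval scoring with the weights above plus the temporal decay, assuming illustrative Episode fields, proximity shapes, and a default τ (none of which are specified by the text).

```typescript
// Weighted similarity across the five dimensions, plus exponential decay.
// Episode fields, proximity shapes, and tau are illustrative assumptions.
interface Episode {
  domain: string;
  dimension: string;
  driftGap: number;
  temperature: number;   // annealing temperature at time of capture
  action: string;
  success: boolean;
  ageInDays: number;
}

function similarity(a: Episode, b: Episode): number {
  const domainMatch = a.domain === b.domain ? 1 : 0;
  const driftProximity = 1 / (1 + Math.abs(a.driftGap - b.driftGap));             // assumed shape
  const dimensionMatch = a.dimension === b.dimension ? 1 : 0;
  const temperatureProximity = 1 / (1 + Math.abs(a.temperature - b.temperature)); // assumed shape
  const outcomePolarity = a.success === b.success ? 1 : 0;                        // assumed shape

  return 0.30 * domainMatch
       + 0.25 * driftProximity
       + 0.20 * dimensionMatch
       + 0.15 * temperatureProximity
       + 0.10 * outcomePolarity;
}

// Temporal decay: EpisodeWeight = BaseWeight × e^(-age/τ)
function episodeWeight(baseWeight: number, ageInDays: number, tau = 30): number {
  return baseWeight * Math.exp(-ageInDays / tau);
}
```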
Step 3: GENERATE
Synthesise candidate strategies from retrieved episodes + current context.
Generation is not lookup. The generator produces variations and combinations, not exact replays of past episodes. A candidate strategy may combine patterns from three different historical episodes in a way that never occurred before.
Inputs to the generator:
- Retrieved episode set
- Current context fingerprint (from OBSERVE)
- Current temperature (from COOL)
Output: A set of CandidateStrategy[] objects, each with a proposed action, confidence score, episodic support count, and reasoning trace.
At high temperature, the generator is given latitude to include novel, less-proven candidates. At low temperature, it is constrained to historically validated strategies.
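A hypothetical generation sketch, reusing the Episode and ContextFingerprint types from the earlier blocks. Grouping candidates by past action and the novelty rule at high temperature are assumptions; only the inputs, the CandidateStrategy fields, and the "latitude at high temperature" behaviour come from the text.

```typescript
// CandidateStrategy fields follow the text; the combination logic is assumed.
interface CandidateStrategy {
  action: string;
  confidence: number;       // 0-100
  episodicSupport: number;  // number of episodes backing this action
  reasoning: string[];      // reasoning trace
}

function generate(
  retrieved: Episode[],
  context: ContextFingerprint,
  temperature: number
): CandidateStrategy[] {
  // Derive candidates from recurring action patterns in the retrieved episodes.
  const byAction = new Map<string, Episode[]>();
  for (const e of retrieved) {
    byAction.set(e.action, [...(byAction.get(e.action) ?? []), e]);
  }

  const candidates: CandidateStrategy[] = [];
  for (const [action, support] of byAction) {
    const successRate = support.filter(e => e.success).length / support.length;
    candidates.push({
      action,
      confidence: Math.round(successRate * 100),
      episodicSupport: support.length,
      reasoning: [`combined from ${support.length} similar episodes in ${context.domain}`],
    });
  }

  // At high temperature the generator may also propose a novel, unproven variation.
  if (temperature > 70) {
    candidates.push({
      action: "novel-variation",   // placeholder action
      confidence: 30,
      episodicSupport: 0,
      reasoning: ["exploratory candidate admitted at high temperature"],
    });
  }
  return candidates;
}
```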
Step 4: ANNEAL
Apply the temperature filter.
The candidates generated in step 3 are filtered based on current temperature:
| Temperature | Filter Applied |
|---|---|
| T > 70 | All candidates included — full exploration |
| 40 < T ≤ 70 | Candidates with episodicSupport ≥ 1 included |
| 20 < T ≤ 40 | Candidates with episodicSupport ≥ 3 included |
| T ≤ 20 | Only candidates with episodicSupport ≥ 5 and confidence ≥ 60 |
This is the mechanism that makes GESA conservative over time without being permanently conservative from the start. Early in the system's life, weak candidates are considered. As the episode store grows and temperature drops, only proven strategies survive the filter.
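The filter table transcribes directly into a predicate over the CandidateStrategy type from the GENERATE sketch:

```typescript
// Direct transcription of the ANNEAL table above.
function passesAnneal(c: CandidateStrategy, temperature: number): boolean {
  if (temperature > 70) return true;                    // full exploration
  if (temperature > 40) return c.episodicSupport >= 1;
  if (temperature > 20) return c.episodicSupport >= 3;
  return c.episodicSupport >= 5 && c.confidence >= 60;  // T ≤ 20
}
```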
Step 5: SELECT
Score remaining candidates. Pick the highest within temperature constraints.
Scoring formula:
Score = confidence × episodicSupport × gapVelocityMultiplier
Where:
gapVelocityMultiplier = 1.0 + (gapVelocity × 0.1)
Gap velocity adjustment: if the gap is widening (positive velocity), higher-confidence strategies are weighted more heavily — the situation is worsening, act decisively. If the gap is closing, lower-confidence exploratory strategies can be tolerated.
The highest-scoring candidate that passed the ANNEAL filter is selected.
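A scoring and selection sketch following the formula above, assuming the candidates passed in have already survived the ANNEAL filter:

```typescript
// Score = confidence × episodicSupport × gapVelocityMultiplier
function scoreCandidate(c: CandidateStrategy, gapVelocity: number): number {
  const gapVelocityMultiplier = 1.0 + gapVelocity * 0.1;
  return c.confidence * c.episodicSupport * gapVelocityMultiplier;
}

// Pick the highest-scoring candidate among those that passed ANNEAL.
function select(
  candidates: CandidateStrategy[],
  gapVelocity: number
): CandidateStrategy | undefined {
  let best: CandidateStrategy | undefined;
  for (const c of candidates) {
    if (best === undefined || scoreCandidate(c, gapVelocity) > scoreCandidate(best, gapVelocity)) {
      best = c;
    }
  }
  return best;
}
```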
Step 6: ACT
Return or execute the selected strategy.
Depending on deployment context:
- Recommendation mode — Return the strategy with full GESARecommendation object for human decision
- Automatic mode — Execute directly if Fetch score exceeds Execute threshold (>1000)
- Hybrid mode — Auto-execute for high-confidence low-risk strategies; recommend for others
The selected strategy always carries its full provenance: which episodes supported it, what temperature it was generated under, what alternatives were considered.
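A hypothetical dispatch over the three deployment modes. The GESARecommendation fields, the execute callback, and the hybrid rule are illustrative assumptions; only the mode semantics and the >1000 Execute threshold come from the text.

```typescript
type DeploymentMode = "recommendation" | "automatic" | "hybrid";

// Assumed shape: the recommendation carries full provenance.
interface GESARecommendation {
  strategy: CandidateStrategy;
  temperature: number;            // temperature it was generated under
  supportingEpisodeIds: string[]; // which episodes supported it
  alternatives: CandidateStrategy[];
}

function act(
  mode: DeploymentMode,
  rec: GESARecommendation,
  fetchScore: number,
  execute: (s: CandidateStrategy) => void
): GESARecommendation | void {
  switch (mode) {
    case "recommendation":
      return rec;                                   // hand to a human
    case "automatic":
      if (fetchScore > 1000) execute(rec.strategy); // Execute threshold
      return;
    case "hybrid":
      // Assumed hybrid rule: auto-execute only well-supported, high-confidence strategies.
      if (rec.strategy.confidence >= 80 && rec.strategy.episodicSupport >= 5) {
        execute(rec.strategy);
        return;
      }
      return rec;
  }
}
```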
Step 7: STORE
Write the new episode to memory, then complete it once the outcome is observable.
This step has two phases:
Phase A (at time of action): Write the episode with context, action, and initial state. Mark outcome as pending.
Phase B (after outcome observed): Update the episode with driftAfter, gapChange, success, and timeToResolve.
The gap between Phase A and Phase B is domain-dependent:
- Browser automation: seconds to minutes
- Content strategy: days to weeks
- Workplace interventions: sprints to months
Incomplete episodes (Phase A only) are excluded from retrieval until Phase B completes.
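A sketch of the two-phase write, reusing ContextFingerprint from the OBSERVE block. Field names beyond those listed above (driftAfter, gapChange, success, timeToResolve) are assumptions.

```typescript
interface StoredEpisode {
  id: string;
  context: ContextFingerprint;
  action: string;
  driftBefore: number;
  // Phase B fields: undefined until the outcome is observed.
  driftAfter?: number;
  gapChange?: number;
  success?: boolean;
  timeToResolve?: number;
  outcomePending: boolean;
}

const episodeStore = new Map<string, StoredEpisode>();

// Phase A: written at time of action, outcome marked pending.
function storePhaseA(e: Omit<StoredEpisode, "outcomePending">): void {
  episodeStore.set(e.id, { ...e, outcomePending: true });
}

// Phase B: fill in the outcome once it is observable.
function storePhaseB(
  id: string,
  outcome: { driftAfter: number; gapChange: number; success: boolean; timeToResolve: number }
): void {
  const e = episodeStore.get(id);
  if (!e) return;
  episodeStore.set(id, { ...e, ...outcome, outcomePending: false });
}

// Only completed episodes are eligible for retrieval.
function retrievable(): StoredEpisode[] {
  return [...episodeStore.values()].filter(e => !e.outcomePending);
}
```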
Step 8: COOL
Advance the temperature schedule.
T_new = T_current × α
Where α is the cooling rate from the active temperature profile.
Temperature only cools when a new episode is stored (step 7 completes). No episode = no cooling. This prevents artificial temperature decay during inactive periods.
The Adaptive Cool profile adjusts α dynamically based on recent episode outcome variance:
- High outcome variance (unpredictable results) → slow the cooling
- Low outcome variance (consistent results) → accelerate the cooling
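A cooling sketch: the geometric schedule and the direction of the adaptive adjustment come from the text; the variance threshold, step size, and bounds on α are assumptions.

```typescript
// Geometric cooling: T_new = T_current × α. Called only when an episode completes.
function cool(temperature: number, alpha: number): number {
  return temperature * alpha;
}

// Adaptive Cool: adjust α from the variance of recent outcomes (success = 1, failure = 0).
// Thresholds and step sizes are illustrative assumptions.
function adaptiveAlpha(baseAlpha: number, recentOutcomes: boolean[]): number {
  if (recentOutcomes.length === 0) return baseAlpha;
  const xs = recentOutcomes.map(o => (o ? 1 : 0));
  const mean = xs.reduce((a, b) => a + b, 0) / xs.length;
  const variance = xs.reduce((a, b) => a + (b - mean) ** 2, 0) / xs.length;

  if (variance > 0.2) return Math.min(0.99, baseAlpha + 0.05); // unpredictable → cool more slowly
  return Math.max(0.80, baseAlpha - 0.05);                     // consistent → cool faster
}
```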
Loop Invariants
Three invariants must hold at every cycle:
- Every ACT produces an episode. No action without capture.
- Every episode is immutable. No retroactive modification of what happened.
- Temperature only decreases. The system earns conservatism through experience; it cannot reset to exploration without explicit intervention.
Violation of invariant 1 or 2 breaks the learning guarantee. Violation of invariant 3 means the system could oscillate between exploration and exploitation, never converging.
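One way to enforce invariants 1 and 3 is a check at the end of each cycle; the sketch below is hypothetical, and invariant 2 (immutability) would instead be enforced at the store layer by refusing writes to completed episodes.

```typescript
// Hypothetical end-of-cycle invariant check.
function checkInvariants(
  actedThisCycle: boolean,
  episodeWrittenThisCycle: boolean,
  previousTemperature: number,
  currentTemperature: number
): void {
  // Invariant 1: every ACT produces an episode.
  if (actedThisCycle && !episodeWrittenThisCycle) {
    throw new Error("Invariant 1 violated: action taken without episode capture");
  }
  // Invariant 3: temperature never rises between cycles.
  if (currentTemperature > previousTemperature) {
    throw new Error("Invariant 3 violated: temperature increased");
  }
}
```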