The GESA Loop

8 Steps. One Cycle. Continuous Improvement.

The GESA loop is the operational core of the framework. Every optimization cycle runs through all eight steps — from observing the current state to cooling the temperature for the next cycle.


The Loop Diagram

┌─────────────────────────────────────────────────────────┐
│                      GESA LOOP                          │
│                                                         │
│  1. OBSERVE    Current DRIFT + Fetch + 3D scores        │
│       ↓                                                 │
│  2. RETRIEVE   Similar episodes from memory             │
│       ↓                                                 │
│  3. GENERATE   Candidate strategies                     │
│       ↓                                                 │
│  4. ANNEAL     Filter by temperature schedule           │
│       ↓                                                 │
│  5. SELECT     Best candidate given temperature         │
│       ↓                                                 │
│  6. ACT        Execute selected strategy                │
│       ↓                                                 │
│  7. STORE      New episode with outcome                 │
│       ↓                                                 │
│  8. COOL       Advance temperature schedule             │
│       └──────────────────────────────────────────────┐  │
│                                                      ↓  │
│                                              Back to 1  │
└─────────────────────────────────────────────────────────┘

Step 1: OBSERVE

Read current system state.

Inputs:

  • DRIFT gap and sign (positive/negative/zero)
  • Fetch score and decision threshold
  • Chirp, Perch, Wake scores
  • Active domain and dimension
  • Gap velocity (how DRIFT has moved across recent episodes)

This step is passive: it reads state but does not modify it. The observe step establishes the context fingerprint that will drive retrieval in step 2.
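
As a rough illustration, the observed state could be captured in a structure like the one below. This is a sketch only; the field names are assumptions based on the list above, not a defined GESA type.

// Illustrative shape of the OBSERVE output. Field names are assumptions.
interface ContextFingerprint {
  driftGap: number;          // DRIFT gap magnitude
  driftSign: -1 | 0 | 1;     // negative / zero / positive
  fetchScore: number;        // Fetch score
  fetchThreshold: number;    // decision threshold in effect
  chirp: number;             // Chirp score
  perch: number;             // Perch score
  wake: number;              // Wake score
  domain: string;            // e.g. "Workplace", "Content", "Trading"
  dimension: string;         // e.g. "D6 Operational"
  gapVelocity: number;       // how DRIFT has moved across recent episodes
}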


Step 2: RETRIEVE

Search episode memory for situations with similar context fingerprint.

Similarity is measured across five dimensions:

Similarity(e₁, e₂) =
    0.30 × DomainMatch
  + 0.25 × DriftProximity
  + 0.20 × DimensionMatch
  + 0.15 × TemperatureProximity
  + 0.10 × OutcomePolarity

DomainMatch — Are the episodes from the same domain (Workplace, Content, Trading, etc.)?

DriftProximity — How close is the DRIFT magnitude? A gap of 42 is more similar to a gap of 38 than to a gap of 5.

DimensionMatch — In 6D context: does the origin dimension match? D6 Operational episodes are more relevant to a D6 Operational situation.

TemperatureProximity — Episodes captured at similar annealing temperatures are more relevant. A decision made at T = 0.8 (highly exploratory) is less relevant when current temperature is T = 0.15 (exploiting).

OutcomePolarity — Failed episodes constrain generation; successful episodes inform it. Both are retrieved.

Retrieval also applies temporal decay (EpisodeWeight = BaseWeight × e^(-age/τ)) — recent episodes carry more weight.
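
A sketch of how the weighted similarity and temporal decay could be computed. The Episode fields and the normalisation inside each component are assumptions; only the weights and the decay formula come from the definitions above.

// Sketch only. Component normalisations are assumed, not specified by GESA.
interface Episode {
  domain: string;       // e.g. "Workplace", "Content", "Trading"
  dimension: string;    // e.g. "D6 Operational"
  driftGap: number;     // DRIFT gap at capture time
  temperature: number;  // annealing temperature at capture time (0-1 assumed)
  success: boolean;     // outcome polarity
  ageInDays: number;    // used for temporal decay
}

function similarity(e1: Episode, e2: Episode): number {
  const domainMatch = e1.domain === e2.domain ? 1 : 0;
  const driftProximity = 1 / (1 + Math.abs(e1.driftGap - e2.driftGap)); // assumed normalisation
  const dimensionMatch = e1.dimension === e2.dimension ? 1 : 0;
  const temperatureProximity = 1 - Math.abs(e1.temperature - e2.temperature);
  const outcomePolarity = e1.success === e2.success ? 1 : 0;
  return 0.30 * domainMatch
       + 0.25 * driftProximity
       + 0.20 * dimensionMatch
       + 0.15 * temperatureProximity
       + 0.10 * outcomePolarity;
}

// Temporal decay: EpisodeWeight = BaseWeight * e^(-age/τ)
function episodeWeight(baseWeight: number, ageInDays: number, tau: number): number {
  return baseWeight * Math.exp(-ageInDays / tau);
}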


Step 3: GENERATE

Synthesise candidate strategies from retrieved episodes + current context.

Generation is not lookup. The generator produces variations and combinations, not exact replays of past episodes. A candidate strategy may combine patterns from three different historical episodes in a way that never occurred before.

Inputs to the generator:

  • Retrieved episode set
  • Current context fingerprint (from OBSERVE)
  • Current temperature (from COOL)

Output: an array of CandidateStrategy objects, each with a proposed action, a confidence score, an episodic support count, and a reasoning trace.

At high temperature: the generator is given latitude to include novel, less-proven candidates. At low temperature: the generator is constrained to historically-validated strategies.
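
Reusing the Episode and ContextFingerprint sketches from the earlier steps, the generator's interface might look roughly like this. The field names follow the description above; sourceEpisodeIds is an assumed provenance field, and the body is deliberately left as a stub.

// Illustrative candidate shape; not a defined GESA type.
interface CandidateStrategy {
  action: string;             // proposed action
  confidence: number;         // confidence score (0-100 assumed, matching the ANNEAL filter)
  episodicSupport: number;    // number of retrieved episodes backing this candidate
  reasoning: string[];        // reasoning trace
  sourceEpisodeIds: string[]; // assumed provenance field
}

// Hypothetical signature: temperature controls how much novelty is allowed.
function generateCandidates(
  retrieved: Episode[],
  context: ContextFingerprint,
  temperature: number
): CandidateStrategy[] {
  // High temperature: include novel, weakly supported variations.
  // Low temperature: restrict to historically validated combinations.
  return []; // implementation omitted; this sketches the interface only
}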


Step 4: ANNEAL

Apply the temperature filter.

The candidates generated in step 3 are filtered based on current temperature:

Temperature      Filter Applied
T > 70           All candidates included — full exploration
40 < T ≤ 70      Candidates with episodicSupport ≥ 1 included
20 < T ≤ 40      Candidates with episodicSupport ≥ 3 included
T ≤ 20           Only candidates with episodicSupport ≥ 5 and confidence ≥ 60

This is the mechanism that makes GESA conservative over time without being permanently conservative from the start. Early in the system's life, weak candidates are considered. As the episode store grows and temperature drops, only proven strategies survive the filter.
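
The table translates almost directly into a filter. This sketch assumes temperature and confidence are expressed on the same 0-100 scales used in the table.

// Sketch of the ANNEAL filter over the GENERATE output.
function annealFilter(candidates: CandidateStrategy[], t: number): CandidateStrategy[] {
  if (t > 70) return candidates;                                    // full exploration
  if (t > 40) return candidates.filter(c => c.episodicSupport >= 1);
  if (t > 20) return candidates.filter(c => c.episodicSupport >= 3);
  return candidates.filter(c => c.episodicSupport >= 5 && c.confidence >= 60);
}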


Step 5: SELECT

Score remaining candidates. Pick the highest within temperature constraints.

Scoring formula:

Score = confidence × episodicSupport × gapVelocityMultiplier

Where:
  gapVelocityMultiplier = 1.0 + (gapVelocity × 0.1)

Gap velocity adjustment: if the gap is widening (positive velocity), higher-confidence strategies are weighted more heavily; the situation is worsening, so the system should act decisively. If the gap is closing, lower-confidence exploratory strategies can be tolerated.

The highest-scoring candidate that passed the ANNEAL filter is selected.
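
A sketch of the selection step, applying the scoring formula to the candidates that survived the ANNEAL filter:

// Score survivors and return the best one (or null if nothing survived ANNEAL).
function selectStrategy(
  survivors: CandidateStrategy[],
  gapVelocity: number
): CandidateStrategy | null {
  const gapVelocityMultiplier = 1.0 + gapVelocity * 0.1;
  let best: CandidateStrategy | null = null;
  let bestScore = -Infinity;
  for (const c of survivors) {
    const score = c.confidence * c.episodicSupport * gapVelocityMultiplier;
    if (score > bestScore) {
      bestScore = score;
      best = c;
    }
  }
  return best;
}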


Step 6: ACT

Return or execute the selected strategy.

Depending on deployment context:

  • Recommendation mode — Return the strategy with full GESARecommendation object for human decision
  • Automatic mode — Execute directly if Fetch score exceeds Execute threshold (>1000)
  • Hybrid mode — Auto-execute for high-confidence low-risk strategies; recommend for others

The selected strategy always carries its full provenance: which episodes supported it, what temperature it was generated under, what alternatives were considered.
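
One plausible way to dispatch the three modes is sketched below. The GESARecommendation fields mirror the provenance described above, and the hybrid rule (confidence ≥ 80) is an assumption; the framework does not fix that threshold here.

// Illustrative recommendation object carrying full provenance.
interface GESARecommendation {
  strategy: CandidateStrategy;
  supportingEpisodeIds: string[];
  temperatureAtGeneration: number;
  alternativesConsidered: CandidateStrategy[];
}

type DeploymentMode = "recommendation" | "automatic" | "hybrid";

const EXECUTE_THRESHOLD = 1000; // Fetch score above which direct execution is allowed

function act(
  rec: GESARecommendation,
  mode: DeploymentMode,
  fetchScore: number
): "executed" | "recommended" {
  switch (mode) {
    case "automatic":
      return fetchScore > EXECUTE_THRESHOLD ? "executed" : "recommended";
    case "hybrid":
      // Assumed rule: auto-execute only high-confidence strategies above the threshold.
      return rec.strategy.confidence >= 80 && fetchScore > EXECUTE_THRESHOLD
        ? "executed"
        : "recommended";
    default:
      return "recommended"; // recommendation mode: always defer to a human
  }
}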


Step 7: STORE

Once the outcome is observable, write the new episode to memory.

This step has two phases:

Phase A (at time of action): Write the episode with context, action, and initial state. Mark outcome as pending.

Phase B (after outcome observed): Update the episode with driftAfter, gapChange, success, and timeToResolve.

The gap between Phase A and Phase B is domain-dependent:

  • Browser automation: seconds to minutes
  • Content strategy: days to weeks
  • Workplace interventions: sprints to months

Incomplete episodes (Phase A only) are excluded from retrieval until Phase B completes.
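
A sketch of the two-phase write against a simple in-memory store. The StoredEpisode fields follow the step descriptions above (driftAfter, gapChange, success, timeToResolve); the store interface itself is an assumption.

// Illustrative two-phase episode record. Pending episodes are excluded from retrieval.
interface StoredEpisode {
  id: string;
  context: ContextFingerprint;   // from OBSERVE
  action: string;                // the strategy executed in ACT
  temperature: number;           // temperature the decision was made under
  outcome: "pending" | {
    driftAfter: number;
    gapChange: number;
    success: boolean;
    timeToResolve: number;
  };
}

// Phase A: write at time of action, outcome pending.
function writePhaseA(store: Map<string, StoredEpisode>, episode: StoredEpisode): void {
  store.set(episode.id, { ...episode, outcome: "pending" });
}

// Phase B: fill in the outcome once it is observable. Nothing else is modified
// (invariant 2: episodes are immutable apart from completing the pending outcome).
function writePhaseB(
  store: Map<string, StoredEpisode>,
  id: string,
  outcome: Exclude<StoredEpisode["outcome"], "pending">
): void {
  const episode = store.get(id);
  if (episode && episode.outcome === "pending") {
    store.set(id, { ...episode, outcome });
  }
}

// Retrieval ignores Phase-A-only episodes.
function retrievableEpisodes(store: Map<string, StoredEpisode>): StoredEpisode[] {
  return [...store.values()].filter(e => e.outcome !== "pending");
}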


Step 8: COOL

Advance the temperature schedule.

T_new = T_current × α

Where α is the cooling rate from the active temperature profile.

Temperature only cools when a new episode is stored (step 7 completes). No episode = no cooling. This prevents artificial temperature decay during inactive periods.

The Adaptive Cool profile adjusts α dynamically based on recent episode outcome variance (a sketch follows the list below):

  • High outcome variance (unpredictable results) → slow the cooling
  • Low outcome variance (consistent results) → accelerate the cooling
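
A sketch of the cooling update. The base step is exactly T_new = T_current × α; the adaptive adjustment below is one possible form, with variance thresholds and α deltas chosen only for illustration.

// Base cooling step: called only after a new episode is stored (step 7).
function cool(tCurrent: number, alpha: number): number {
  return tCurrent * alpha;
}

// Adaptive Cool sketch: a higher α means slower cooling.
// Thresholds and adjustments are assumptions, not framework-defined values.
function adaptiveAlpha(baseAlpha: number, recentOutcomes: number[]): number {
  if (recentOutcomes.length === 0) return baseAlpha;
  const mean = recentOutcomes.reduce((sum, x) => sum + x, 0) / recentOutcomes.length;
  const variance =
    recentOutcomes.reduce((sum, x) => sum + (x - mean) ** 2, 0) / recentOutcomes.length;
  if (variance > 0.25) return Math.min(baseAlpha + 0.05, 0.99); // unpredictable results: slow the cooling
  if (variance < 0.05) return Math.max(baseAlpha - 0.05, 0.50); // consistent results: accelerate the cooling
  return baseAlpha;
}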

Loop Invariants

Three invariants must hold at every cycle:

  1. Every ACT produces an episode. No action without capture.
  2. Every episode is immutable. No retroactive modification of what happened.
  3. Temperature only decreases. The system earns conservatism through experience; it cannot reset to exploration without explicit intervention.

Violation of invariant 1 or 2 breaks the learning guarantee. Violation of invariant 3 means the system could oscillate between exploration and exploitation, never converging.


→ Temperature Profiles