Backpressure & Flow Control

Audience: Internal β€” this page is for the Rulecatch team, not customer-facing documentation.

The AI-Pooler implements adaptive throttling so that API unavailability, rate limiting, and server load spikes are handled gracefully.


Overview

Before sending events, the flush script:

  1. Checks local state β€” Is a backoff timer active?
  2. Asks the server β€” How much can I send?
  3. Sends in batches β€” Respecting server-recommended sizes and delays
  4. Records results β€” Updates backoff state on success or failure

Backpressure State

The state is persisted in ~/.claude/rulecatch/.backpressure-state:

  Field                 Type     Description
  backoffLevel          number   Current backoff level (0-10)
  nextAttemptAfter      number   Timestamp when the next attempt is allowed
  lastCapacity          object   Last server capacity response
  consecutiveFailures   number   Consecutive failure count
  lastSuccessTime       number   Last successful flush timestamp
  pendingEventCount     number   Events waiting in the buffer
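
The fields above suggest a shape like the following. This is a sketch, not the actual source: the JSON key names are assumed to match the table, and the fallback-to-defaults behavior on a missing or corrupt file is an assumption.

```typescript
import { readFileSync } from "fs";
import { homedir } from "os";
import { join } from "path";

// Hypothetical typing of the persisted backpressure state,
// assuming field names map directly to JSON keys.
interface BackpressureState {
  backoffLevel: number;        // 0-10
  nextAttemptAfter: number;    // epoch ms; 0 when no backoff is active
  lastCapacity: object | null; // last server capacity response
  consecutiveFailures: number;
  lastSuccessTime: number;     // epoch ms of the last successful flush
  pendingEventCount: number;
}

const STATE_PATH = join(homedir(), ".claude", "rulecatch", ".backpressure-state");

function loadState(): BackpressureState {
  try {
    return JSON.parse(readFileSync(STATE_PATH, "utf8")) as BackpressureState;
  } catch {
    // Missing or corrupt state file: start from a clean slate (assumed behavior).
    return {
      backoffLevel: 0,
      nextAttemptAfter: 0,
      lastCapacity: null,
      consecutiveFailures: 0,
      lastSuccessTime: 0,
      pendingEventCount: 0,
    };
  }
}
```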

Exponential Backoff

When a flush fails, the backoff delay increases exponentially:

  Level   Delay          After
  0       0s             No failures
  1       2s             1st failure
  2       4s             2nd failure
  3       8s             3rd failure
  4       16s            4th failure
  5       32s            5th failure
  6       64s            6th failure
  7       128s           7th failure
  8       256s           8th failure
  9       300s (5 min)   9th failure
  10      300s (5 min)   10th+ failure

Configuration

  Setting             Value
  Base delay          1,000ms
  Max delay           300,000ms (5 minutes)
  Multiplier          2x per level
  Max backoff level   10
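
The delay column in the backoff table follows directly from these settings. A minimal sketch of the calculation (constants taken from the Configuration table; the function name is hypothetical):

```typescript
const BASE_DELAY_MS = 1_000;
const MAX_DELAY_MS = 300_000; // 5 minutes
const MAX_BACKOFF_LEVEL = 10;

// Delay before the next attempt at a given backoff level:
// base * 2^level, capped at the max delay.
function backoffDelayMs(level: number): number {
  if (level <= 0) return 0; // level 0: no failures, no delay
  const capped = Math.min(level, MAX_BACKOFF_LEVEL);
  return Math.min(BASE_DELAY_MS * 2 ** capped, MAX_DELAY_MS);
}
```

For example, level 1 gives 2,000ms and level 9 would give 512,000ms uncapped, which the 5-minute ceiling reduces to 300,000ms, matching the table.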

Circuit Breaker

After 10 consecutive failures, the circuit breaker opens:

  • All flush attempts are blocked until the nextAttemptAfter timer expires
  • This prevents hammering a downed server
  • Events continue to accumulate in the buffer safely
  • The circuit closes when the timer expires and the next attempt succeeds
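
Since both an active backoff timer and an open circuit block flushing until nextAttemptAfter passes, one comparison covers both; a sketch of the gate (function name and reason strings are hypothetical, the threshold of 10 is from this page):

```typescript
const CIRCUIT_OPEN_THRESHOLD = 10;

// Decide whether a flush attempt is allowed right now; the reason
// string only distinguishes the two blocked cases for logging.
function flushGate(
  state: { consecutiveFailures: number; nextAttemptAfter: number },
  now: number = Date.now(),
): { allowed: boolean; reason?: string } {
  if (now < state.nextAttemptAfter) {
    const reason =
      state.consecutiveFailures >= CIRCUIT_OPEN_THRESHOLD
        ? "circuit breaker open"
        : "backing off";
    return { allowed: false, reason };
  }
  return { allowed: true };
}
```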

Server Capacity

Before flushing, the pooler asks the server how much it can handle:

POST /api/v1/ai/pooler/capacity

The server responds with:

  Field                 Description
  ready                 Whether the server can accept events
  maxBatchSize          Maximum events per batch
  delayBetweenBatches   Milliseconds to wait between batches
  retryAfter            Seconds to wait if not ready
  loadPercent           Server load (0-100%)
  message               Optional status message
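
A hypothetical typing of this handshake. The endpoint path is from this page; the base URL parameter, the request body (none), and any auth headers are assumptions:

```typescript
// Shape of the capacity response, per the field table above.
interface CapacityResponse {
  ready: boolean;
  maxBatchSize: number;
  delayBetweenBatches: number; // milliseconds
  retryAfter?: number;         // seconds; relevant when ready is false
  loadPercent: number;         // 0-100
  message?: string;
}

// Sketch of the capacity check (assumed to be a bodyless POST).
async function fetchCapacity(baseUrl: string): Promise<CapacityResponse> {
  const res = await fetch(`${baseUrl}/api/v1/ai/pooler/capacity`, { method: "POST" });
  if (!res.ok) throw new Error(`capacity check failed: HTTP ${res.status}`);
  return (await res.json()) as CapacityResponse;
}
```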

Server Load Levels

  Load %    Server Status   Recommended Batch   Delay
  0-39%     Normal          100                 100ms
  40-59%    Moderate        50                  1,000ms
  60-79%    High            20                  2,000ms
  80-94%    Very High       10                  5,000ms
  95-100%   Overloaded      0 (not ready)       retryAfter
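
The table above can be read as a threshold ladder; a sketch of how the server might derive its recommendations (thresholds and values from the table, function name hypothetical):

```typescript
// Map current server load to the recommended batch size and delay.
function recommendationForLoad(loadPercent: number): {
  ready: boolean;
  maxBatchSize: number;
  delayBetweenBatches: number;
} {
  if (loadPercent >= 95) return { ready: false, maxBatchSize: 0, delayBetweenBatches: 0 };
  if (loadPercent >= 80) return { ready: true, maxBatchSize: 10, delayBetweenBatches: 5_000 };
  if (loadPercent >= 60) return { ready: true, maxBatchSize: 20, delayBetweenBatches: 2_000 };
  if (loadPercent >= 40) return { ready: true, maxBatchSize: 50, delayBetweenBatches: 1_000 };
  return { ready: true, maxBatchSize: 100, delayBetweenBatches: 100 };
}
```

When the server is overloaded, `ready` is false and the client is expected to fall back to the `retryAfter` value instead of a batch delay.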

Flush Flow

1. Load backpressure state from disk
2. Can we attempt? (check backoff timer + circuit breaker)
   β†’ No: log reason, save state, exit
   β†’ Yes: continue

3. Ask server for capacity
   β†’ Not ready: set nextAttemptAfter, save state, exit
   β†’ Ready: get maxBatchSize + delay

4. While events remain:
   a. Take batch of min(maxBatchSize, remaining)
   b. Send batch to API
   c. Success?
      β†’ Yes: record success, reduce backoff
      β†’ No: record failure, increase backoff, stop

   d. More events?
      β†’ Wait delayBetweenBatches
      β†’ Every 100 events: re-check capacity

5. Save final state to disk
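
Steps 3 and 4 above can be condensed into a sketch. The injected `askCapacity`, `sendBatch`, and `sleep` helpers are hypothetical; only the control flow mirrors the numbered steps, and the caller is assumed to handle steps 1, 2, and 5 (state load, gate check, state save).

```typescript
async function flushEvents(
  events: unknown[],
  deps: {
    askCapacity: () => Promise<{ ready: boolean; maxBatchSize: number; delayBetweenBatches: number }>;
    sendBatch: (batch: unknown[]) => Promise<boolean>;
    sleep: (ms: number) => Promise<void>;
  },
): Promise<number> {
  let capacity = await deps.askCapacity();             // step 3
  if (!capacity.ready) return 0;                       // not ready: exit

  let sent = 0;
  while (sent < events.length) {                       // step 4
    const batch = events.slice(sent, sent + capacity.maxBatchSize); // 4a
    if (!(await deps.sendBatch(batch))) break;         // 4b/4c: failure stops the loop
    sent += batch.length;                              // 4c: success

    if (sent < events.length) {                        // 4d
      await deps.sleep(capacity.delayBetweenBatches);
      // Re-check roughly every 100 events (exact when batch sizes divide 100).
      if (sent % 100 === 0) capacity = await deps.askCapacity();
    }
  }
  return sent; // caller records success/failure and persists state
}
```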

Recovery Behavior

On Success

  • backoffLevel decreases by 1 (gradual recovery)
  • consecutiveFailures resets to 0
  • nextAttemptAfter cleared (can flush immediately)

On Failure

  • backoffLevel increases by 1 (capped at 10)
  • consecutiveFailures incremented
  • nextAttemptAfter set based on calculated delay
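
Both recovery rules can be sketched as pure transitions on the persisted state (field names from the Backpressure State section; helper names and the constants' placement are assumptions):

```typescript
const MAX_LEVEL = 10;
const BASE_MS = 1_000;
const MAX_MS = 300_000;

type RecoveryState = {
  backoffLevel: number;
  consecutiveFailures: number;
  nextAttemptAfter: number;
};

// On success: step one level down, reset failures, clear the timer.
function recordSuccess(s: RecoveryState): RecoveryState {
  return {
    backoffLevel: Math.max(0, s.backoffLevel - 1), // gradual recovery
    consecutiveFailures: 0,
    nextAttemptAfter: 0, // cleared: next flush may run immediately
  };
}

// On failure: step one level up (capped), count it, arm the timer.
function recordFailure(s: RecoveryState, now: number = Date.now()): RecoveryState {
  const level = Math.min(s.backoffLevel + 1, MAX_LEVEL);
  const delay = Math.min(BASE_MS * 2 ** level, MAX_MS);
  return {
    backoffLevel: level,
    consecutiveFailures: s.consecutiveFailures + 1,
    nextAttemptAfter: now + delay,
  };
}
```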

Rate Limited (429)

  • Double the normal backoff delay
  • Parse Retry-After header if present

Server Overloaded (503)

  • Double the normal backoff delay
  • Parse Retry-After header if present
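
A sketch of how both cases might combine the doubled delay with the header. The "take whichever is longer" tie-break is an assumption, and only the delta-seconds form of Retry-After is handled here (the HTTP-date form is omitted):

```typescript
// Pick the delay before the next attempt after an HTTP error.
function retryDelayMs(
  status: number,
  normalDelayMs: number,
  retryAfterHeader: string | null,
): number {
  // 429 and 503 double the normal backoff delay per the rules above.
  const doubled = status === 429 || status === 503 ? normalDelayMs * 2 : normalDelayMs;
  if (retryAfterHeader !== null) {
    const seconds = Number(retryAfterHeader);
    if (Number.isFinite(seconds) && seconds > 0) {
      // Honor the server's explicit hint when it is longer (assumed policy).
      return Math.max(doubled, seconds * 1_000);
    }
  }
  return doubled;
}
```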

Monitoring

Check backpressure status:

npx @rulecatch/ai-pooler backpressure

Backpressure Status

Status:           Backing Off

Failures:         3 consecutive
Backoff level:    3/10
Next attempt in:  6s
Last success:     45s ago

Pending events:   12

Last Server Response:
  Ready:          Yes
  Max batch:      50
  Delay between:  500ms
  Server load:    35%

Reset

To clear backpressure state (after fixing the underlying issue):

npx @rulecatch/ai-pooler backpressure --reset=true

Buffer Safety

During backoff, events continue to accumulate in the buffer directory. They are never lost:

  • Buffer files are only deleted after successful API acknowledgment
  • The buffer can grow indefinitely (limited only by disk space)
  • When connectivity is restored, all buffered events are gradually drained
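
The first bullet is the key invariant; a sketch of the delete-after-ack rule, with the file loader and sender injected (both hypothetical):

```typescript
import { unlinkSync } from "fs";

// Flush one buffer file: send first, delete only on acknowledgment.
// If the send fails, the file stays on disk and is retried later.
async function flushBufferFile(
  path: string,
  load: (p: string) => unknown[],
  send: (events: unknown[]) => Promise<boolean>,
): Promise<boolean> {
  const events = load(path);
  const acked = await send(events);
  if (acked) unlinkSync(path); // never deleted before the API acknowledges
  return acked;
}
```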

See Also