Offline Support in Web Apps: Foreground Queue for Offline Mutations — Part 4

This is the seventh post in the series about offline support in web applications and the fourth one focused specifically on the foreground queue. In the previous article, we exposed queue state, showed how UI components subscribe to it, and wired everything into React without turning the queue into yet another state store.
So far, the queue processes each mutation exactly once. If it succeeds, we’re done. If it fails, we mark it as failed and move on. That’s a reasonable baseline, but in real-world offline scenarios it breaks down quickly.
In this post, I want to focus on error handling and retry strategies. In particular, how to handle transient failures automatically, without pushing that complexity or frustration onto the user.
Why a single attempt isn’t enough
As mentioned above, the foreground queue currently attempts to process each item exactly once and treats any failure as final. That's a reasonable starting point, but in practice it's not enough.
The main issue is that failures during sync can be transient: the network might briefly drop, or the backend could be restarting or unavailable for a few seconds. None of these mean the mutation itself is invalid; they just mean now is a bad time to process it.
When everything happens online, this is usually fine. If a form submission fails, the user sees an error and can retry immediately. The context is still fresh, the UI is still on screen, and retrying is typically one click away. Placing that burden on the user is acceptable, and in many cases expected.
Offline workflows are very different.
An offline mutation may have been recorded minutes or hours ago. By the time the app comes back online and starts syncing, the user may be doing something entirely unrelated. Asking them to manually retry a background failure at that point is both disruptive and unrealistic. From the user’s perspective, the action already happened. They filled in the form, pressed submit, and moved on. Any failure during sync is an implementation detail they shouldn’t have to care about unless it truly cannot be resolved.
This is where retries become essential. We can smooth over temporary issues without involving the user at all by adding automated retries. Only after multiple attempts should we consider the mutation genuinely failed and surface that state to the user. To put it in one sentence: assume failures are temporary until proven otherwise.
With all of this in mind, let’s zoom in and talk about retry strategies.
Exponential backoff for foreground retries
There are different strategies you can choose from. Linear retries, fixed delays, adaptive algorithms, circuit breakers — it’s a deep topic. I won’t be diving into all of them here. If you want a broader overview, Sam Who wrote a great article that’s well worth reading.
For foreground queues in offline-capable apps, I find exponential backoff hits a good balance between effectiveness and simplicity.
What exponential backoff actually means
Exponential backoff is a retry strategy where the delay between retry attempts increases exponentially after each failure, usually by doubling, until it reaches a maximum cap. Instead of retrying immediately or waiting a fixed amount of time, each failed attempt waits longer than the previous one.
A typical sequence might look like this:
1st retry after 500ms
2nd retry after 1s
3rd retry after 2s
4th retry after 4s
…until a retry limit is reached
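The schedule above follows a simple formula: each failed attempt doubles the previous delay, up to a cap. A minimal sketch (the function name is illustrative, and the defaults match the ones used later in this post):

```typescript
// Illustrative: delay doubles with each failed attempt, capped at maxDelay.
function backoffDelay(attempt: number, baseDelay = 500, maxDelay = 30_000): number {
  return Math.min(baseDelay * 2 ** attempt, maxDelay);
}

// backoffDelay(0) → 500, backoffDelay(1) → 1000, backoffDelay(2) → 2000, ...
```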
This simple change in timing has surprisingly large effects on system behaviour:
Reduced load on struggling services. When many clients fail at once, immediate retries can create a thundering herd. Backoff spreads retries out over time instead of amplifying the problem.
Better odds of success for transient errors. Short outages often resolve themselves quickly. Waiting a bit before retrying dramatically increases the chance that the next attempt succeeds.
Finally, a few guardrails are essential to keep retries from turning into hidden problems.
Cap the delay. Without a cap, delays can grow so large that retries become effectively useless or block important resources for too long.
Limit the number of retries. If something has persistently failed six times, continuing might no longer make sense.
Add jitter. If many clients retry with the same timing, they can still synchronise. Adding a small random offset (jitter) to each delay helps spread retries out and avoids retry spikes.
This usually translates into fewer visible errors and less aggressive retry noise while the user is waiting.
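Of these guardrails, jitter is the least intuitive, so here is a minimal sketch. The "equal jitter" variant below (keep half the delay, randomize the rest) is one common choice among several:

```typescript
// Sketch: spread retries out by randomizing half of the computed delay.
// With "equal jitter", clients never wait less than half the base schedule.
function applyJitter(delay: number): number {
  return delay / 2 + Math.random() * (delay / 2);
}

// For a 2000ms delay, each client waits somewhere between 1000ms and 2000ms.
```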
With the strategy chosen, the remaining work is the implementation: capture retry state, compute delays, and teach the sync loop how to wait. Everything that follows assumes retries are safe — meaning the mutation is idempotent or can tolerate last-write-wins semantics. I’ll come back to cases where this isn’t true at the end.
Retry configuration
We'll start by adding retry configuration to our queue. The shape below captures all the controls needed to implement the exponential backoff strategy.
export interface BackoffConfig {
  baseDelay?: number;  // Default: 500 ms
  maxDelay?: number;   // Default: 30 s
  maxRetries?: number; // Default: 5
  jitter?: boolean;    // Default: true
}
Implementing backoff logic
Let’s start with drafting a simple module specifically for backoff. We’ll need a couple of utility functions to help us track and manage items in backoff. I’ll briefly explain what each function does without going into the implementation details, as they are mostly straightforward.
// Checks if an item should be retried based on attempt count.
export function shouldRetry(
  attemptCount: number,
  config: BackoffConfig = {}
): boolean {}

// Calculates the remaining delay until an item is ready for retry.
// Accounts for elapsed time since last attempt.
export function getRetryDelay(
  attemptCount: number,
  lastAttemptAt: number | null,
  config: BackoffConfig = {}
): number {}
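For completeness, here is one way those stubs could be filled in. This is a sketch: it assumes the defaults from BackoffConfig, repeats the interface so the snippet is self-contained, and uses "equal jitter" math that is one reasonable choice among several.

```typescript
// BackoffConfig, repeated from earlier so this snippet stands alone.
interface BackoffConfig {
  baseDelay?: number;  // Default: 500 ms
  maxDelay?: number;   // Default: 30 s
  maxRetries?: number; // Default: 5
  jitter?: boolean;    // Default: true
}

// An item is retryable while it hasn't exhausted the allowed attempts.
export function shouldRetry(
  attemptCount: number,
  config: BackoffConfig = {}
): boolean {
  return attemptCount < (config.maxRetries ?? 5);
}

// Remaining wait before the item may be processed again.
export function getRetryDelay(
  attemptCount: number,
  lastAttemptAt: number | null,
  config: BackoffConfig = {}
): number {
  // First attempt: no waiting required.
  if (attemptCount === 0 || lastAttemptAt === null) return 0;

  const baseDelay = config.baseDelay ?? 500;
  const maxDelay = config.maxDelay ?? 30_000;

  // Exponential growth with a cap: 500ms, 1s, 2s, 4s, ...
  let delay = Math.min(baseDelay * 2 ** (attemptCount - 1), maxDelay);

  // Optional jitter spreads simultaneous retries apart.
  if (config.jitter ?? true) {
    delay = delay / 2 + Math.random() * (delay / 2);
  }

  // Subtract time already elapsed since the last attempt.
  const elapsed = Date.now() - lastAttemptAt;
  return Math.max(0, delay - elapsed);
}
```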
These two functions will be needed to implement the retries in our startSync function in the Queue class. I won't include the entire function implementation, but I'll add the important parts here (I'll indicate unchanged sections with comments).
export class Queue<T> {
  async startSync(): Promise<SyncResult<T>> {
    // Block unchanged compared to the previous posts...

    // Track items already reported as failures...
    const failureIds = new Set<string>();

    try {
      // Single ongoing processing loop (now with retries)
      while (true) {
        // Block unchanged compared to the previous posts...

        // Keep track of items that are to be removed...
        const itemsToRemove = new Set<string>();
        // Keep track of items that are to be updated (we need to update attempt info)...
        const itemsToUpdate = new Map<string, QueueItem<T>>();
        // Keep track of items in backoff...
        const itemsInBackoff: Array<{ item: QueueItem<T>; delay: number }> = [];

        for (const item of items) {
          // Block unchanged compared to the previous posts...

          // Check if we can still retry...
          if (!shouldRetry(item.attemptCount, this.config.backoff)) {
            // Check if we haven't reported this item yet...
            if (!failureIds.has(item.id)) {
              failureIds.add(item.id);
              result.failure.push(item);
            }
            continue;
          }

          // Calculate the backoff delay for an item and the current attempt...
          const delay = getRetryDelay(
            item.attemptCount,
            item.lastAttemptAt,
            this.config.backoff
          );

          // Check if we still need to wait...
          if (delay > 0) {
            itemsInBackoff.push({ item, delay });
            continue;
          }

          // Otherwise, the item is ready to be processed...
          try {
            // Block unchanged compared to the previous posts...
          } catch (error) {
            // The item processing has failed...
            // We no longer mark it as a failure, but update its attempt information,
            // so that we can continue retries in the next loop...
            itemsToUpdate.set(item.id, {
              ...item,
              attemptCount: item.attemptCount + 1,
              lastAttemptAt: Date.now(),
            });
          }
        }

        // All items have been processed...
        await this.withLock(async () => {
          const currentItems = await this.loadItems();
          const updatedItems = currentItems
            // Remove the items marked to be removed (success)...
            .filter((item) => !itemsToRemove.has(item.id))
            // Update the items in backoff...
            .map((item) => itemsToUpdate.get(item.id) ?? item);
          await this.saveItems(updatedItems);
        });

        // Check if there are items in backoff...
        if (itemsInBackoff.length > 0) {
          // Find the shortest delay...
          const minDelay = Math.min(...itemsInBackoff.map((i) => i.delay));
          // ...and wait
          await this.sleep(minDelay);
          continue;
        }

        // Nothing was removed or updated in this pass...
        // There's nothing else to process...
        if (itemsToRemove.size === 0 && itemsToUpdate.size === 0) {
          break;
        }
      }
    } finally {
      // Block unchanged compared to the previous posts...
    }

    return result;
  }
}
Here’s what we’ve effectively implemented in the sync loop:
We run an infinite loop with an inner loop over all queued items. This is where the “loop-in-a-loop” structure becomes useful. Previously, we processed each item exactly once. Now, we may process the same item multiple times, up to the maximum number of attempts allowed by the backoff configuration. The outer loop only exits when two conditions are met: no items were processed in a full pass, and no items are currently waiting in backoff. At that point, there’s nothing left to do and the sync is finished.
Before processing an item, we first check whether it’s still eligible for retries. This is where we respect the backoff configuration and the current attempt count. If the item has already exhausted all allowed attempts, we mark it as failed and exclude it from further processing. Importantly, we ensure each failed item is reported exactly once, even if the loop continues for other items.
We compute the retry delay for the item. This determines whether enough time has passed since the last attempt, based on the current attempt count and the item’s last processing timestamp. If the item isn’t ready yet, we don’t try to process it. Instead, we add it to a list of items that are currently in backoff and move on to the next item.
Once we’ve looped over all items, we check whether any are waiting in backoff. If so, we find the smallest remaining delay across them and wait for exactly that duration (we’ll expand on this idea in the next section). This ensures the sync loop wakes up only when the next item becomes eligible for processing, avoiding unnecessary polling or busy waiting.
There's one more interesting aspect to discuss: how waiting for items in backoff interacts with sync pausing.
Making sure syncing can still be paused
You might remember that our queue supports pausing via the pauseSync method. Before retries were introduced, this worked well enough. Processing was always asynchronous, but calling pauseSync would take effect immediately after the current attempt finished.
Retries change that behaviour.
With exponential backoff in place, the syncing function may now sit idle for several seconds, waiting for the next item to become ready for processing. If we do nothing, the effects of calling pauseSync would only become visible after that wait finishes. That’s not great — pausing should be observable immediately, even if the sync loop is currently waiting.
The root problem here isn’t retries themselves, but time-based waiting. Once we start awaiting delays, those waits need to be cancelable. That’s why we’ve introduced an interruptible sleep function.
Here’s a simplified version of how this can be implemented.
export class Queue<T> {
  // Calling this interrupts the sleep function...
  private interruptSleep: (() => void) | null = null;

  private async sleep(ms: number): Promise<void> {
    return new Promise<void>((resolve) => {
      // When `interruptSleep` is called, the promise resolves immediately...
      this.interruptSleep = resolve;
      setTimeout(() => {
        // Otherwise, it resolves when the timeout fires...
        this.interruptSleep = null;
        resolve();
      }, ms);
    });
  }

  pauseSync(): void {
    this.isPaused = true;
    // We interrupt sleep when syncing is paused...
    this.interruptSleep?.();
    this.interruptSleep = null;
  }
}
Conceptually, this turns time-based waiting into a cancelable operation.
When the sync loop is waiting for the next retry window, calling pauseSync now immediately resolves the pending sleep. This allows the loop to observe the paused state right away instead of being stuck waiting for a timeout to expire. This implementation can still leave a pending timeout behind, which isn’t ideal but acceptable for a simplified, illustrative example.
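If that leftover timer matters in your context, the sleep helper can also clear it on interruption. A standalone sketch of the same idea (names differ slightly from the Queue class):

```typescript
// Sketch: cancelable sleep that also clears the pending timer when interrupted,
// so no orphaned timeout is left behind. Standalone version of the Queue idea.
function makeSleeper() {
  let interrupt: (() => void) | null = null;

  return {
    sleep(ms: number): Promise<void> {
      return new Promise<void>((resolve) => {
        const timer = setTimeout(() => {
          interrupt = null;
          resolve();
        }, ms);
        interrupt = () => {
          clearTimeout(timer); // nothing left running after an interrupt
          interrupt = null;
          resolve();
        };
      });
    },
    interruptSleep(): void {
      interrupt?.();
    },
  };
}
```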
When retries get complicated
I want to close with a few points about edge cases and broader system considerations. Our offline todo status example has several properties that make retries and backoff much simpler than they might be in other scenarios.
Most importantly, we can accept last-write-wins semantics. That immediately removes the need for explicit conflict resolution. On top of that, the status update itself is idempotent — calling the mutation twice has the same effect as calling it once. Given a single client (or last-write-wins), retries are safe by default.
These properties are convenient, but they don’t generalize to all systems. In more complex setups, there are a few additional factors you’ll likely need to account for. Most of them require explicit back-end support.
Request idempotency. For non-idempotent operations (for example, “charge a credit card” or “append to a log”), retries require extra safeguards. Common approaches include transaction tokens, client-generated IDs, or server-side deduplication.
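To illustrate the deduplication idea: a server can store the result of each mutation under its client-generated key and replay the stored result when a retry arrives. All names here are hypothetical:

```typescript
// Sketch: server-side deduplication keyed by a client-generated idempotency key.
const processedResults = new Map<string, string>();

function handleMutation(idempotencyKey: string, run: () => string): string {
  const cached = processedResults.get(idempotencyKey);
  if (cached !== undefined) return cached; // retry detected: replay stored result
  const result = run();
  processedResults.set(idempotencyKey, result);
  return result;
}
```

With this in place, a non-idempotent operation such as a charge runs at most once per key, no matter how many times the client retries.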
Non-retryable errors. Not all failures should be retried. Client-side errors (such as certain 4xx responses) often indicate permanent failure. Your sync logic should be able to recognize these cases and stop retrying early.
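One lightweight way to recognize these cases is to classify responses before scheduling a retry. The status-code policy below is an assumption for illustration; your API may need a different split:

```typescript
// Sketch: decide whether an HTTP failure is worth retrying.
function isRetryable(status: number): boolean {
  if (status === 408 || status === 429) return true; // timeout / rate limited: transient
  if (status >= 500) return true; // server-side errors are often temporary
  return false; // remaining 4xx responses: treat as permanent
}
```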
Conflict resolution. Once multiple clients or concurrent updates are involved, conflict resolution becomes unavoidable. A simple approach is to refetch the latest state before mutating, which can work but is still prone to race conditions. More robust solutions require explicit conflict resolution on the back end, often combined with sending the initial state from the client so the server can detect and resolve conflicts deterministically.
The right approach to retries ultimately depends on the type of mutation. If you can accept last-write-wins, retries are relatively simple: no special handling is required around the edges. If you can’t, retries quickly turn into a coordination problem that spans both the client and the back end.
Summary
That’s it for today. We’ve covered why retries are a core requirement for offline-first syncing, not just a nice-to-have, and how to go about implementing them. Here are the key takeaways:
Submission failures can be temporary. Network hiccups and brief backend outages shouldn’t surface as user-facing errors. Automated retries let the system absorb these issues quietly.
Exponential backoff is a good default. It’s simple to implement, reduces load on struggling services, and significantly improves success rates compared to immediate or fixed retries.
Guardrails matter. Capping delays, limiting retries, and adding jitter prevent retries from turning into hidden performance or reliability problems.
Retries change control flow. Once you introduce waiting, you must handle time explicitly and make those waits cancelable.
Retry safety depends on semantics. Idempotent, last-write-wins mutations make retries easy. Non-idempotent operations, permanent errors, and conflicts require explicit backend support and more careful coordination.
In the next article, I’ll shift gears and look at data prefetching, and how proactive data access fits into an offline-capable architecture.
If you enjoyed the article or have a question, feel free to reach out on Bluesky! 👋





