Asset Duplication & CDN Sync

A headless CMS stores each media file once, but localized sites need locale-specific variants on regional edges — and copying assets naively produces cache fragmentation, runaway egress bills, and broken fallback chains. This guide covers the sync layer that keeps duplicated assets consistent: checksum-gated replication, deterministic path mapping, and surgical edge purges.

Why Cross-Locale Distribution Is Hard

The tension is decoupling asset storage from asset delivery. Editors upload once; the infrastructure must replicate, transform, and push locale-aware variants to regional origins with minimal propagation lag and without cache stampedes. That demands a sync layer that compares cryptographic checksums before transferring, normalizes locale path suffixes, and invalidates only the edges whose payloads actually changed. Get any of those wrong and you either serve stale media in secondary markets or pay to re-upload bytes that never changed. This is one piece of the broader Localization & SEO Optimization topic, where media routing has to stay aligned with locale negotiation and hreflang.

Idempotent Duplication Pipelines

Trigger the pipeline from CMS webhooks or a scheduled reconciliation job. It must be idempotent: running it twice under identical conditions produces the same result — no duplicate files, no redundant transfers, no clobbering a newer variant. The implementation below fetches the source asset, computes a SHA-256 checksum, and uploads each locale variant with If-None-Match so the origin skips unchanged payloads.

TypeScript
import { createHash } from 'crypto';
import { fetch } from 'undici';

interface CMSAsset {
  id: string;
  url: string;
  locale: string;
  etag: string;
  metadata: Record<string, string>;
}

interface SyncConfig {
  targetOrigin: string;
  maxRetries: number;
  concurrencyLimit: number;
}

async function computeChecksum(buffer: Buffer): Promise<string> {
  return createHash('sha256').update(buffer).digest('hex');
}

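// Retries network errors and 5xx responses with exponential backoff (1s, 2s, 4s, ...).
// 4xx responses are treated as permanent and fail immediately without retrying.
async function fetchWithRetry(url: string, retries: number = 3): Promise<Response> {
  const backoff = (attempt: number) =>
    new Promise(r => setTimeout(r, Math.pow(2, attempt) * 1000));

  for (let attempt = 0; attempt < retries; attempt++) {
    let res: Response;
    try {
      res = await fetch(url);
    } catch (err) {
      // Network-level failure: retry unless this was the last attempt
      if (attempt === retries - 1) throw err;
      await backoff(attempt);
      continue;
    }
    if (res.ok) return res;
    // Client errors will not succeed on retry, so surface them right away
    if (res.status < 500) {
      throw new Error(`Fetch failed for ${url}: ${res.status}`);
    }
    // Server error: wait before the next attempt, if there is one
    if (attempt < retries - 1) await backoff(attempt);
  }
  throw new Error(`Max retries exceeded for ${url}`);
}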

async function syncAssetToEdge(asset: CMSAsset, config: SyncConfig): Promise<void> {
  const response = await fetchWithRetry(asset.url, config.maxRetries);
  const buffer = await response.arrayBuffer();
  const checksum = await computeChecksum(Buffer.from(buffer));
  
  const localeSuffix = asset.locale !== 'default' ? `.${asset.locale}` : '';
  const targetPath = `/assets/${asset.id}${localeSuffix}.webp`;
  const targetUrl = `${config.targetOrigin}${targetPath}`;

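  // Conditional PUT: the origin skips the write when its stored ETag already matches.
  // Assumes the target origin derives ETags the same way the CMS does.
  const uploadRes = await fetch(targetUrl, {
    method: 'PUT',
    headers: {
      'Content-Type': 'image/webp',
      'X-Asset-Checksum': checksum,
      'If-None-Match': asset.etag,
    },
    body: buffer,
  });

  // RFC 9110: a failed If-None-Match on PUT yields 412; some origins answer 304 instead
  if (uploadRes.status === 412 || uploadRes.status === 304) {
    console.log(`[SKIP] Asset ${asset.id} already up-to-date at ${targetPath}`);
    return;
  }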

  if (!uploadRes.ok) {
    throw new Error(`Upload failed for ${targetPath}: ${uploadRes.status} ${uploadRes.statusText}`);
  }

  console.log(`[SYNC] Successfully deployed ${asset.id} to ${targetPath}`);
}

The flowchart below traces the pipeline. The conditional PUT is what keeps the sync idempotent: the origin refuses to rewrite a variant whose ETag already matches.

flowchart TD
  W["CMS webhook / reconcile job"] --> F["Fetch source asset (retry + backoff)"]
  F --> C["Compute SHA-256 checksum"]
  C --> P["Map deterministic locale path"]
  P --> U["PUT with If-None-Match (ETag)"]
  U --> D{"Origin response"}
  D -->|"412 Precondition Failed (or 304)"| Skip["Skip: already up-to-date"]
  D -->|"2xx"| Done["Synced: purge changed edge"]
  D -->|"error"| Retry["Retry or fail loud"]

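The If-None-Match header plus server-side ETag validation is what makes this cheap: when the stored ETag already matches, the origin rejects the PUT and skips the write. Per RFC 9110 that rejection is a 412 Precondition Failed for PUT, although some origins answer 304, so handle both. The request body still travels with the PUT; if unchanged bytes must stay off the wire entirely, one option is to compare ETags with a HEAD request before uploading. At scale, Syncing localized media assets across global CDNs extends this with a versioned manifest and webhook-driven queueing.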
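The branches of the flowchart can be written as one small pure function, which keeps the status handling testable apart from the network call. This is a sketch rather than part of the pipeline above: `classifyUploadStatus` and the `SyncOutcome` labels are hypothetical names, and treating 429 as retryable is an assumption about the origin.

```typescript
type SyncOutcome = 'skipped' | 'synced' | 'retry' | 'failed';

// Map the origin's response to a conditional PUT onto a pipeline decision.
function classifyUploadStatus(status: number): SyncOutcome {
  // 412 is the RFC 9110 answer to a failed If-None-Match on PUT;
  // 304 is accepted too because some origins return it instead.
  if (status === 412 || status === 304) return 'skipped';
  if (status >= 200 && status < 300) return 'synced';
  // Assumption: throttling and server errors are transient and worth retrying.
  if (status === 429 || status >= 500) return 'retry';
  // Remaining 3xx/4xx responses point at a request or config problem: fail loud.
  return 'failed';
}
```

Keeping the decision separate from `syncAssetToEdge` means the retry policy can change without touching the upload code.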

Cache Coordination & Fallback

CDN providers expose cache tags, surrogate keys, and soft-purge so you can invalidate a single asset variant instead of flushing a directory. While a localized asset is still propagating, the routing layer should serve the default variant — not a 404, which breaks responsive image pipelines and triggers a cache-miss cascade.

When an edge node gets a request for a locale variant that hasn’t arrived, it should serve the cached default or proxy to the primary origin and fetch the variant in the background. That prevents a stampede during a high-traffic launch. Pairing stale-while-revalidate with max-age lets the edge keep serving a slightly stale asset during the sync window while it revalidates against the origin in the background and picks up the new payload; the mechanics are in HTTP Conditional Requests. Fallback behavior across routes is covered in Content Fallback & Routing.
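A minimal sketch of that fallback decision and cache policy follows. It assumes the `/assets/{id}.{locale}.webp` layout from the sync code; `resolveVariantPath`, `cacheControlFor`, the `available` set of paths already present at the edge, and the TTL numbers are all illustrative, not taken from any CDN's API.

```typescript
// Hypothetical helper: use the locale variant if it has arrived, else the default.
function resolveVariantPath(
  assetId: string,
  locale: string,
  available: ReadonlySet<string>,
): { path: string; isFallback: boolean } {
  const defaultPath = `/assets/${assetId}.webp`;
  if (locale === 'default') return { path: defaultPath, isFallback: false };
  const variantPath = `/assets/${assetId}.${locale}.webp`;
  return available.has(variantPath)
    ? { path: variantPath, isFallback: false }
    : { path: defaultPath, isFallback: true };
}

// Build a Cache-Control value. Fallback responses get a short lifetime so the
// real variant replaces them soon after it lands (numbers are illustrative).
function cacheControlFor(isFallback: boolean): string {
  const maxAge = isFallback ? 60 : 3600;
  const staleWhileRevalidate = isFallback ? 30 : 600;
  return `public, max-age=${maxAge}, stale-while-revalidate=${staleWhileRevalidate}`;
}
```

Giving the fallback a shorter `max-age` than the real variant is a design choice: it limits how long a secondary market sees the default image once the localized one is available.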

Path Normalization

Duplication pipelines break at the seam between storage paths and URL routing. If the CMS emits locale-prefixed URLs (/fr/assets/logo.webp) but the CDN stores flat paths (/assets/logo.fr.webp), image references 404 and SEO signals degrade. Normalize paths deterministically during sync so every variant maps to a known route.
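The mapping between the two conventions can be a single deterministic function. The sketch below converts the locale-prefixed form into the flat suffix form from the example above; `toCdnPath` and the `knownLocales` set are hypothetical names, and a real pipeline would take its locale list from the CMS.

```typescript
// Convert a locale-prefixed URL path (/fr/assets/logo.webp) into the flat
// suffix form (/assets/logo.fr.webp). Paths without a known locale prefix
// are returned unchanged.
function toCdnPath(urlPath: string, knownLocales: ReadonlySet<string>): string {
  const segments = urlPath.split('/').filter(s => s.length > 0);
  const [first, ...rest] = segments;
  if (first === undefined || !knownLocales.has(first)) return urlPath;

  const file = rest.pop();
  if (file === undefined) return urlPath; // only a locale segment, nothing to map

  // Insert the locale before the file extension, or append it if there is none.
  const dot = file.lastIndexOf('.');
  const localized = dot > 0
    ? `${file.slice(0, dot)}.${first}${file.slice(dot)}`
    : `${file}.${first}`;
  return '/' + [...rest, localized].join('/');
}
```

Because the function is pure, the same code can run in the sync job and in the frontend build, which avoids the two sides drifting apart.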

Generate a build-time asset manifest that ties CMS asset IDs to locale codes, CDN paths, and fallback hierarchy, then feed it to both the frontend and the CDN config so route resolution and delivery can’t drift. Route Mapping for Multilingual Sites covers keeping URL structure, asset paths, and locale negotiation aligned. Apply locale suffixes consistently and sanitize path-traversal characters to prevent cache poisoning; a CI lint step that resolves every duplicated asset to a valid route catches drift before deploy.
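One possible shape for that manifest and its CI lint step is sketched below. The `ManifestEntry` fields, `isSafeCdnPath`, and `lintManifest` are assumptions made for illustration; the allowed character set is deliberately strict and would need widening for nested directories.

```typescript
interface ManifestEntry {
  assetId: string;
  locale: string;
  cdnPath: string;
  fallbackLocale: string | null; // null marks the end of the fallback chain
}

// Reject anything that could escape the asset root or smuggle unexpected
// characters into a cache key. Only flat /assets/<file> paths are accepted.
function isSafeCdnPath(path: string): boolean {
  return /^\/assets\/[A-Za-z0-9._-]+$/.test(path) && !path.includes('..');
}

// Return a list of human-readable problems; an empty list means the manifest passes.
function lintManifest(entries: ManifestEntry[]): string[] {
  const errors: string[] = [];
  const known = new Set(entries.map(e => `${e.assetId}:${e.locale}`));
  for (const e of entries) {
    if (!isSafeCdnPath(e.cdnPath)) {
      errors.push(`unsafe path: ${e.cdnPath}`);
    }
    if (e.fallbackLocale !== null && !known.has(`${e.assetId}:${e.fallbackLocale}`)) {
      errors.push(`missing fallback ${e.fallbackLocale} for ${e.assetId}:${e.locale}`);
    }
  }
  return errors;
}
```

A CI job would fail the build when `lintManifest` returns a non-empty list.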

Observability & Hardening

Emit structured logs per sync attempt — asset ID, checksum, target origin, HTTP status — and track sync latency, cache hit ratio, checksum-mismatch rate, and retry exhaustion. These signals surface propagation bottlenecks before users hit them.
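As a rough illustration, a log record carrying those fields and a roll-up over a batch might look like this. The field names, the `asset_sync` event label, and the definition of "retry exhausted" (all attempts used and still a 5xx) are assumptions, not an established schema.

```typescript
interface SyncLogEntry {
  assetId: string;
  checksum: string;
  targetOrigin: string;
  status: number;    // final HTTP status from the origin
  latencyMs: number;
  attempts: number;  // how many tries the sync took
}

// One JSON object per line, so log shippers can parse it without custom rules.
function toLogLine(entry: SyncLogEntry): string {
  return JSON.stringify({ event: 'asset_sync', ...entry });
}

// Aggregate a batch of entries into the headline metrics.
function summarize(entries: SyncLogEntry[], maxRetries: number) {
  const total = entries.length;
  let skipped = 0;
  let retryExhausted = 0;
  let latencySum = 0;
  for (const e of entries) {
    if (e.status === 412 || e.status === 304) skipped++;
    if (e.attempts >= maxRetries && e.status >= 500) retryExhausted++;
    latencySum += e.latencyMs;
  }
  return {
    total,
    skipRate: total === 0 ? 0 : skipped / total,
    retryExhausted,
    meanLatencyMs: total === 0 ? 0 : latencySum / total,
  };
}
```

A falling skip rate on a reconciliation run is a useful early warning: it suggests ETags or checksums have stopped matching and the pipeline is re-uploading unchanged assets.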

Cost compounds fast: every locale variant multiplies storage and egress. Lifecycle policies that archive unused variants, content-addressable storage that deduplicates identical payloads across regions, and CDN tiered caching keep the bill down. Scope invalidations with Cache Tags and Purging so a publish purges one variant, not a region.
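To make the deduplication idea concrete, the sketch below plans content-addressed storage for a set of variants: each unique checksum is stored once and every variant path becomes an alias to it. The `Variant` shape, `planContentAddressed`, and the `/blobs/` prefix are hypothetical.

```typescript
interface Variant {
  path: string;     // public CDN path of the locale variant
  checksum: string; // SHA-256 of the payload, hex encoded
  bytes: number;    // payload size
}

// Store each unique payload once, keyed by checksum, and point variants at it.
function planContentAddressed(variants: Variant[]) {
  const blobs = new Map<string, number>();   // checksum -> size of the stored blob
  const aliases = new Map<string, string>(); // variant path -> blob key
  let storedBytes = 0;
  let naiveBytes = 0;
  for (const v of variants) {
    naiveBytes += v.bytes;
    if (!blobs.has(v.checksum)) {
      blobs.set(v.checksum, v.bytes);
      storedBytes += v.bytes;
    }
    aliases.set(v.path, `/blobs/${v.checksum}`);
  }
  return { aliases, storedBytes, savedBytes: naiveBytes - storedBytes };
}
```

The saving is largest for assets such as logos or icons that are duplicated per locale but never actually translated.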

For hardening, put a rate-limited worker pool with exponential backoff between the webhook and the origin so a publish flood can’t saturate it, and canary each sync to one edge region first to validate checksums, routing, and cache behavior before rolling out globally. Asset duplication is a distributed-systems problem, not a file copy.
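A bounded worker pool is one simple way to approximate that protection. The sketch below caps the number of syncs in flight rather than enforcing requests per second, so it is a concurrency limit, not a true rate limiter; `runPool` and `backoffDelayMs` are hypothetical helpers, and the 30-second cap is an arbitrary choice.

```typescript
// Exponential backoff with a ceiling: 1s, 2s, 4s, ... capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 1000, maxMs = 30000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

// Run `task` over `items` with at most `limit` tasks in flight at once.
// Failures are counted rather than thrown so one bad asset cannot stop the batch.
async function runPool<T>(
  items: T[],
  limit: number,
  task: (item: T) => Promise<void>,
): Promise<{ ok: number; failed: number }> {
  let next = 0;
  let ok = 0;
  let failed = 0;

  // Each worker pulls the next unclaimed item until the list is exhausted.
  const worker = async (): Promise<void> => {
    while (next < items.length) {
      const item = items[next++] as T;
      try {
        await task(item);
        ok++;
      } catch {
        failed++;
      }
    }
  };

  const workerCount = Math.max(1, Math.min(limit, items.length));
  const workers: Promise<void>[] = [];
  for (let i = 0; i < workerCount; i++) workers.push(worker());
  await Promise.all(workers);
  return { ok, failed };
}
```

In the pipeline above, `limit` would come from `SyncConfig.concurrencyLimit` and `task` would wrap `syncAssetToEdge`, waiting `backoffDelayMs(attempt)` between its own retries.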