Data retention policies for headless CMS media
A retention policy for headless media has to delete the binary, not just the content reference to it. Deleting an asset in the CMS removes only the reference; the file lives on in object storage and at the CDN edge, often still publicly reachable past its regulatory retention window. This page covers how to enforce TTLs, purge orphaned binaries, and sync CMS metadata with storage lifecycle rules without causing cache stampedes or broken builds.
Why retention logic fails
Headless stacks separate content delivery from binary storage, and that gap is where retention breaks. When the CMS drops a media reference over GraphQL or REST, the underlying file persists, because the API has no native hook into the storage provider. Draft states, localized variants, and webhook-driven image transforms also create phantom references that standard garbage collection misses.
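The phantom-reference problem is easiest to see as a set-difference audit. The sketch below assumes a hypothetical `ContentEntry` shape (entry ID, locale, state, and referenced asset IDs) rather than any particular CMS's API; an asset counts as orphaned only if nothing references it in any locale or state.

```typescript
// Hypothetical shape of a content entry exported from a CMS; not a real CMS API type.
interface ContentEntry {
  id: string;
  locale: string;
  state: "published" | "draft" | "preview";
  assetIds: string[];
}

// Returns stored asset IDs that no entry references in any locale or state.
function findOrphanedAssets(storedAssetIds: string[], entries: ContentEntry[]): string[] {
  const referenced = new Set<string>();
  for (const entry of entries) {
    for (const assetId of entry.assetIds) {
      referenced.add(assetId);
    }
  }
  return storedAssetIds.filter((id) => !referenced.has(id));
}

// The failure mode: counting only published entries treats draft-only or
// preview-only assets as orphans and deletes files editors still use.
function findOrphanedPublishedOnly(storedAssetIds: string[], entries: ContentEntry[]): string[] {
  return findOrphanedAssets(
    storedAssetIds,
    entries.filter((entry) => entry.state === "published"),
  );
}

const stored = ["hero.jpg", "draft-banner.png", "old-logo.svg"];
const entries: ContentEntry[] = [
  { id: "home", locale: "en", state: "published", assetIds: ["hero.jpg"] },
  { id: "home", locale: "de", state: "draft", assetIds: ["draft-banner.png"] },
];
console.log(findOrphanedAssets(stored, entries)); // only "old-logo.svg"
console.log(findOrphanedPublishedOnly(stored, entries)); // also flags "draft-banner.png"
```

The published-only variant would delete the German draft's banner, which is exactly the kind of reference standard garbage collection misses.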
The CDN compounds the problem: edge nodes keep serving cached binaries until an explicit purge, so an asset that is logically deleted stays publicly accessible after its retention window expires. The build adds a race condition: if it resolves asset URLs before the storage lifecycle completes, deleted media reappears in static exports or ISR caches. Without explicit state tracking, retention stays reactive instead of automated.
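On the build side, one way to narrow the race is to have the static export or ISR step consult explicit state instead of treating a resolvable URL as a live asset. A minimal sketch, assuming each asset record carries the `retention_status` enum described under Resolution plus a hypothetical `retentionExpiresAt` timestamp marking the end of its retention window:

```typescript
type RetentionStatus = "active" | "marked_for_deletion" | "purged";

interface AssetRecord {
  id: string;
  url: string;
  retention_status: RetentionStatus;
  retentionExpiresAt: Date; // hypothetical field: end of the retention window
}

// Emit only assets that are active and still inside their retention window.
// Anything marked for deletion or past its window is excluded, even if the
// binary is still reachable in storage or at the edge.
function assetsSafeToBuild(assets: AssetRecord[], now: Date): AssetRecord[] {
  return assets.filter(
    (asset) =>
      asset.retention_status === "active" &&
      asset.retentionExpiresAt.getTime() > now.getTime(),
  );
}

const buildTime = new Date("2024-06-01T00:00:00Z");
const buildInput: AssetRecord[] = [
  { id: "a1", url: "/a1.jpg", retention_status: "active", retentionExpiresAt: new Date("2025-01-01T00:00:00Z") },
  { id: "a2", url: "/a2.jpg", retention_status: "marked_for_deletion", retentionExpiresAt: new Date("2025-01-01T00:00:00Z") },
  { id: "a3", url: "/a3.jpg", retention_status: "active", retentionExpiresAt: new Date("2024-05-01T00:00:00Z") },
];
console.log(assetsSafeToBuild(buildInput, buildTime).map((a) => a.id)); // only "a1"
```

Filtering at build time keeps new exports clean, but it does not evict copies already cached at the edge, so the CDN purge is still required.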
Resolution
- Audit referential integrity. Query the CMS for assets with zero active references across every locale, environment, and content type using `usage`, `references`, or `linkedBy` metadata endpoints. Cross-check candidates against frontend routing tables for hardcoded paths.
- Track soft-delete state. Add a `retention_status` enum to the asset schema and move assets through `active` → `marked_for_deletion` → `purged`. The middle state gives editors a grace period to restore an asset and prevents races during webhook execution.
- Map retention windows to storage lifecycle rules. Scope expiration rules with a key prefix (`cms-asset-id/{id}/`) or object tags so they don't touch shared CDN paths or public buckets; S3 lifecycle filters match on prefix, tags, and object size, not on custom metadata headers. See S3 lifecycle configuration to align triggers with your compliance calendar.
- Gate deletions on a reference counter. A serverless listener intercepts `asset.delete` and `asset.update` events and checks a Redis-backed counter before deleting. The counter decrements only when a published reference is removed, ignoring draft and preview states.
- Purge by content hash, not path. Issue targeted CDN purges keyed on the asset's immutable hash, and update ISR tags or Vercel/Netlify cache headers so stale media can't hydrate mid-window. See Next.js caching and revalidation patterns.
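The soft-delete step amounts to a small state machine. A minimal sketch of its transition rules, assuming that a restore during the grace period moves an asset back to `active` and that `purged` is terminal; `transition`, `gracePeriodElapsed`, and `graceMs` are illustrative names, not part of any CMS API.

```typescript
type RetentionStatus = "active" | "marked_for_deletion" | "purged";

// Forward through the grace period, or back to active on restore.
// "purged" is terminal because the binary no longer exists.
const ALLOWED: Record<RetentionStatus, RetentionStatus[]> = {
  active: ["marked_for_deletion"],
  marked_for_deletion: ["active", "purged"],
  purged: [],
};

function transition(from: RetentionStatus, to: RetentionStatus): RetentionStatus {
  if (!ALLOWED[from].includes(to)) {
    throw new Error(`Illegal retention transition: ${from} -> ${to}`);
  }
  return to;
}

// Purging is allowed only once the grace period since marking has fully elapsed.
function gracePeriodElapsed(markedAt: Date, graceMs: number, now: Date): boolean {
  return now.getTime() - markedAt.getTime() >= graceMs;
}

let assetState: RetentionStatus = "active";
assetState = transition(assetState, "marked_for_deletion");
assetState = transition(assetState, "active"); // editor restores during the grace period

const markedAt = new Date("2024-06-01T00:00:00Z");
const sevenDays = 7 * 24 * 60 * 60 * 1000;
console.log(gracePeriodElapsed(markedAt, sevenDays, new Date("2024-06-05T00:00:00Z"))); // false
```

Rejecting `active` → `purged` outright is what forces every deletion through the grace period.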
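For the lifecycle-rule step on S3, a rule scoped to CMS assets has to use a prefix and/or object tags, because lifecycle filters do not match on custom metadata. The sketch below builds a plain object in the input shape of the AWS SDK v3 `PutBucketLifecycleConfigurationCommand` without sending it. The `cms-asset-id/` prefix comes from the step above; the `retention-class` tag key, bucket name, and day counts are hypothetical.

```typescript
// Local types mirroring the subset of the S3 lifecycle input used here,
// so the sketch runs without the AWS SDK installed.
interface LifecycleRule {
  ID: string;
  Status: "Enabled" | "Disabled";
  Filter: { And: { Prefix: string; Tags: { Key: string; Value: string }[] } };
  Expiration: { Days: number };
}

interface LifecycleConfigInput {
  Bucket: string;
  LifecycleConfiguration: { Rules: LifecycleRule[] };
}

// One rule per retention class: only objects under the CMS prefix AND carrying
// the matching tag expire, so shared CDN paths and public assets are untouched.
function buildRetentionLifecycle(
  bucket: string,
  retentionClasses: Record<string, number>, // tag value -> days to retain
): LifecycleConfigInput {
  const rules = Object.entries(retentionClasses).map(
    ([cls, days]): LifecycleRule => ({
      ID: `cms-retention-${cls}`,
      Status: "Enabled",
      Filter: {
        And: {
          Prefix: "cms-asset-id/",
          Tags: [{ Key: "retention-class", Value: cls }],
        },
      },
      Expiration: { Days: days },
    }),
  );
  return { Bucket: bucket, LifecycleConfiguration: { Rules: rules } };
}

const lifecycleInput = buildRetentionLifecycle("example-media-bucket", {
  "gdpr-30d": 30,
  "marketing-365d": 365,
});
console.log(JSON.stringify(lifecycleInput, null, 2));
```

A PUT of a lifecycle configuration replaces the bucket's entire existing configuration, so in practice these rules would be merged with any rules already on the bucket before sending.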
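For the reference-counter gate, the key property is that only published reference changes move the counter. A minimal sketch with an in-memory `Map` standing in for Redis (where the equivalent updates would be `INCR` and `DECR` on a key such as `asset:refs:{id}`); the `ReferenceEvent` shape and `ReferenceCounter` class are hypothetical.

```typescript
// Hypothetical event emitted when an entry adds or removes an asset reference.
interface ReferenceEvent {
  assetId: string;
  change: "added" | "removed";
  state: "published" | "draft" | "preview";
}

// In-memory stand-in for the Redis-backed counter.
class ReferenceCounter {
  private counts = new Map<string, number>();

  apply(event: ReferenceEvent): void {
    // Draft and preview edits never move the counter.
    if (event.state !== "published") return;
    const current = this.counts.get(event.assetId) ?? 0;
    const next = event.change === "added" ? current + 1 : current - 1;
    // Clamp at zero so a duplicated "removed" event cannot push the count negative.
    this.counts.set(event.assetId, Math.max(0, next));
  }

  get(assetId: string): number {
    return this.counts.get(assetId) ?? 0;
  }

  // Deletion is allowed only when no published references remain.
  canDelete(assetId: string): boolean {
    return this.get(assetId) === 0;
  }
}

const counter = new ReferenceCounter();
counter.apply({ assetId: "hero", change: "added", state: "published" });
counter.apply({ assetId: "hero", change: "removed", state: "draft" }); // ignored
console.log(counter.canDelete("hero")); // false
counter.apply({ assetId: "hero", change: "removed", state: "published" });
console.log(counter.canDelete("hero")); // true
```

Clamping keeps the gate from going negative, but it does not make duplicate events harmless; deduplicating events by ID and monitoring counter drift are still needed.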
Webhook handler
The handler runs a guarded pipeline: every gate must pass before the binary is removed and the edge purged.
```mermaid
flowchart TD
    A["asset.delete / marked_for_deletion"] --> B{"Zero references?<br/>published + draft"}
    B -->|No| D1["Defer: active references"]
    B -->|Yes| C{"Redis counter == 0?"}
    C -->|No| D2["Defer: counter > 0"]
    C -->|Yes| E["Delete object from S3"]
    E --> F["Purge CDN by content hash"]
    F --> G["Update CMS: retention_status = purged"]
    G --> H["Done"]
```
This handler verifies zero references across published and draft states, checks the distributed reference counter, deletes the object from storage, purges the CDN, and only then marks the asset `purged` in the CMS. The CMS field names and the CDN purge body are placeholders to adapt to your providers.

```typescript
import type { Request, Response } from 'express';
import { S3Client, DeleteObjectCommand } from '@aws-sdk/client-s3';
import { Redis } from 'ioredis';

// Configuration (inject via environment variables in production)
const CMS_GRAPHQL_ENDPOINT = process.env.CMS_GRAPHQL_URL!;
const CMS_API_TOKEN = process.env.CMS_API_TOKEN!;
const S3_BUCKET = process.env.S3_BUCKET!;
const CDN_PURGE_ENDPOINT = process.env.CDN_PURGE_URL!;
const CDN_API_KEY = process.env.CDN_API_KEY!;

const s3 = new S3Client({ region: process.env.AWS_REGION ?? 'us-east-1' });
const redis = new Redis(process.env.REDIS_URL!);

interface AssetPayload {
  id: string;
  url: string;
  hash: string;
  retention_status: 'active' | 'marked_for_deletion' | 'purged';
}

// POST a GraphQL operation to the CMS; throw on HTTP or GraphQL errors so the
// pipeline never proceeds on a failed or partial response.
async function cmsGraphQL<T>(query: string, variables: Record<string, unknown>): Promise<T> {
  const res = await fetch(CMS_GRAPHQL_ENDPOINT, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': `Bearer ${CMS_API_TOKEN}`
    },
    body: JSON.stringify({ query, variables })
  });
  if (!res.ok) throw new Error(`CMS request failed with HTTP ${res.status}`);
  const body = (await res.json()) as { data?: T; errors?: unknown[] };
  if (body.errors?.length) throw new Error(`CMS GraphQL errors: ${JSON.stringify(body.errors)}`);
  return body.data as T;
}

// Reference-count field names vary by CMS; these are placeholders.
async function verifyZeroReferences(assetId: string): Promise<boolean> {
  const query = `
    query CheckReferences($id: ID!) {
      asset(id: $id) {
        _referencesCount
        _draftReferencesCount
      }
    }
  `;
  const data = await cmsGraphQL<{
    asset?: { _referencesCount?: number; _draftReferencesCount?: number } | null;
  }>(query, { id: assetId });
  // A missing asset or missing counts is treated as "not orphaned", so the handler defers.
  return data?.asset?._referencesCount === 0 && data?.asset?._draftReferencesCount === 0;
}

async function purgeCDNAsset(hash: string): Promise<void> {
  // Request body shape is CDN-specific; adapt it to your provider's purge API.
  const res = await fetch(CDN_PURGE_ENDPOINT, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': `Bearer ${CDN_API_KEY}`
    },
    body: JSON.stringify({ files: [`https://cdn.example.com/${hash}`] })
  });
  if (!res.ok) throw new Error(`CDN purge failed with HTTP ${res.status}`);
}

export async function handleAssetRetention(req: Request, res: Response) {
  const payload = req.body as AssetPayload;
  if (!payload?.id || !payload.hash || payload.retention_status !== 'marked_for_deletion') {
    return res.status(400).json({ error: 'Invalid retention payload' });
  }

  try {
    // 1. Verify referential integrity across published and draft states
    const isOrphaned = await verifyZeroReferences(payload.id);
    if (!isOrphaned) {
      return res.status(200).json({ status: 'deferred', reason: 'Active references exist' });
    }

    // 2. Read (do not decrement) the reference counter. DECR is not idempotent:
    //    every webhook retry would decrement again and drift the counter. The
    //    counter is maintained by the listener that tracks published references.
    const refCount = Number((await redis.get(`asset:refs:${payload.id}`)) ?? 0);
    if (Number.isNaN(refCount) || refCount > 0) {
      return res.status(200).json({ status: 'deferred', reason: 'Redis counter > 0 or unreadable' });
    }

    // 3. Delete the binary. DeleteObject also succeeds when the key is already
    //    gone, so a retried webhook can safely repeat this step.
    await s3.send(new DeleteObjectCommand({
      Bucket: S3_BUCKET,
      Key: `assets/${payload.id}/${payload.hash}`
    }));

    // 4. Invalidate the CDN edge cache by content hash
    await purgeCDNAsset(payload.hash);

    // 5. Mark the asset purged in the CMS only after storage and edge are clean
    await cmsGraphQL(
      `mutation UpdateStatus($id: ID!) { updateAsset(id: $id, data: { retention_status: "purged" }) { id } }`,
      { id: payload.id }
    );

    return res.status(200).json({ status: 'purged', assetId: payload.id });
  } catch (error) {
    console.error('Retention pipeline failed:', error);
    return res.status(500).json({ error: 'Retention execution failed' });
  }
}
```
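Because every gate in the handler must pass before anything is deleted, the decision itself can be separated from the I/O and unit-tested without S3, Redis, or a CMS. A sketch of that split, with hypothetical names (`RetentionInputs`, `decideRetention`) that follow the flowchart's gate order:

```typescript
type RetentionStatus = "active" | "marked_for_deletion" | "purged";

interface RetentionInputs {
  status: RetentionStatus;
  publishedRefs: number; // from the CMS reference query
  draftRefs: number; // from the CMS reference query
  counter: number; // current value of the Redis reference counter
}

type Decision =
  | { action: "reject"; reason: string }
  | { action: "defer"; reason: string }
  | { action: "purge" };

// Same gate order as the flowchart: payload state, CMS references, then counter.
function decideRetention(input: RetentionInputs): Decision {
  if (input.status !== "marked_for_deletion") {
    return { action: "reject", reason: "Invalid retention payload" };
  }
  if (input.publishedRefs > 0 || input.draftRefs > 0) {
    return { action: "defer", reason: "Active references exist" };
  }
  if (input.counter > 0) {
    return { action: "defer", reason: "Redis counter > 0" };
  }
  return { action: "purge" };
}

console.log(decideRetention({ status: "marked_for_deletion", publishedRefs: 0, draftRefs: 0, counter: 0 }));
```

The trade-off is that a pure function needs all inputs up front, so the handler would read the counter even when CMS references already force a deferral; the I/O side then runs the delete, purge, and CMS update only on `"purge"`.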
Operational notes
Make webhook handlers idempotent and route failed deletions to a dead-letter queue. Storage and CDN APIs rate-limit delete and purge requests, so batch orphaned assets during off-peak windows to avoid throttling.
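For batching on S3, the `DeleteObjects` API accepts at most 1,000 keys per request, so an off-peak sweep over orphaned assets is typically chunked. A minimal sketch that builds inputs in the shape of the AWS SDK v3 `DeleteObjectsCommand` without sending them; the bucket name is hypothetical and the key layout follows the handler's `assets/{id}/{hash}` convention.

```typescript
// Subset of the DeleteObjectsCommand input shape used here.
interface DeleteObjectsInput {
  Bucket: string;
  Delete: { Objects: { Key: string }[]; Quiet: boolean };
}

const S3_MAX_KEYS_PER_DELETE = 1000; // DeleteObjects limit per request

// Split orphaned keys into DeleteObjects-sized batches, dropping duplicates so a
// replayed sweep (for example from a dead-letter queue) doesn't repeat work.
function buildDeleteBatches(
  bucket: string,
  keys: string[],
  batchSize = S3_MAX_KEYS_PER_DELETE,
): DeleteObjectsInput[] {
  if (batchSize < 1 || batchSize > S3_MAX_KEYS_PER_DELETE) {
    throw new Error(`batchSize must be between 1 and ${S3_MAX_KEYS_PER_DELETE}`);
  }
  const unique = Array.from(new Set(keys));
  const batches: DeleteObjectsInput[] = [];
  for (let i = 0; i < unique.length; i += batchSize) {
    batches.push({
      Bucket: bucket,
      Delete: {
        Objects: unique.slice(i, i + batchSize).map((Key) => ({ Key })),
        Quiet: true, // response lists only the keys that failed
      },
    });
  }
  return batches;
}

const orphanKeys = Array.from({ length: 2500 }, (_, i) => `assets/asset-${i}/hash-${i}`);
const plan = buildDeleteBatches("example-media-bucket", orphanKeys);
console.log(plan.map((b) => b.Delete.Objects.length)); // 1000, 1000, 500
```

Each batch could then be sent with `s3.send(new DeleteObjectsCommand(batch))`; per-key failures come back in the response's `Errors` array and are natural candidates for the dead-letter queue.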
For enterprise deployments, align TTLs with legal-hold requirements and audit logging as covered in Enterprise CMS Governance & Compliance. When picking a platform, confirm it exposes granular asset lifecycle hooks and webhook payload filtering — see Headless CMS Architecture & Platform Selection.
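One way to align TTLs with legal holds is to let any active hold override an expired TTL entirely. A minimal sketch with hypothetical field names (`retentionEndsAt`, `legalHold`) that a sweep could call before marking an asset for deletion:

```typescript
interface RetentionPolicyState {
  retentionEndsAt: Date; // hypothetical: end of the regulatory retention window
  legalHold: boolean; // hypothetical: set by legal or compliance tooling
}

// Eligible only when the retention window has ended AND no legal hold applies.
// A hold always wins, however long ago the TTL expired.
function eligibleForDeletion(state: RetentionPolicyState, now: Date): boolean {
  if (state.legalHold) return false;
  return now.getTime() >= state.retentionEndsAt.getTime();
}

const policyNow = new Date("2025-01-01T00:00:00Z");
console.log(eligibleForDeletion({ retentionEndsAt: new Date("2024-12-01T00:00:00Z"), legalHold: false }, policyNow)); // true
console.log(eligibleForDeletion({ retentionEndsAt: new Date("2024-12-01T00:00:00Z"), legalHold: true }, policyNow)); // false
```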
Monitor orphan-detection latency, CDN purge success rate, and storage cost reduction, and alert on webhook failures and reference-counter drift.
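Reference-counter drift can be detected by periodically comparing the Redis counter with the CMS's own published-reference count for the same assets. A minimal sketch, assuming both sources have already been read into maps keyed by asset ID (how they are fetched is CMS- and deployment-specific); `detectCounterDrift` and `DriftReport` are hypothetical names.

```typescript
interface DriftReport {
  assetId: string;
  counter: number; // value held in the Redis-backed counter
  cmsCount: number; // published references reported by the CMS
}

// Report every asset whose counter disagrees with the CMS; an asset missing
// from either map is treated as having zero references there.
function detectCounterDrift(
  counters: Map<string, number>,
  cmsCounts: Map<string, number>,
): DriftReport[] {
  const ids = new Set<string>(Array.from(counters.keys()).concat(Array.from(cmsCounts.keys())));
  const drift: DriftReport[] = [];
  ids.forEach((assetId) => {
    const counter = counters.get(assetId) ?? 0;
    const cmsCount = cmsCounts.get(assetId) ?? 0;
    if (counter !== cmsCount) drift.push({ assetId, counter, cmsCount });
  });
  return drift;
}

const driftFound = detectCounterDrift(
  new Map([["hero", 1], ["logo", 0]]),
  new Map([["hero", 1], ["logo", 2], ["banner", 1]]),
);
console.log(driftFound); // "logo" (0 vs 2) and "banner" (0 vs 1)
```

A non-empty report is the signal to alert on; an asset whose counter reads lower than the CMS count is the dangerous case, because the webhook gate could let it through.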