Dynamic Sitemap Generation
A static XML sitemap goes stale the moment content scales across nested taxonomies, locales, and draft states — in a headless stack the sitemap is a build artifact, not a file. This guide treats it as a data pipeline: a status-filtered CMS query, framework-native route enumeration, and edge caching that keeps the sitemap fresh without rebuilding the site. The goal is to feed crawlers only canonical, published URLs. It’s a building block of Localization & SEO Optimization; the implementation lives in Generating XML sitemaps from headless CMS routes.
Route Discovery & CMS Queries
The sitemap moves through four stages — a status-filtered query, route enumeration, edge caching, and post-build validation — each feeding the next.
```mermaid
flowchart LR
  Q["Status-filtered CMS query (published, locale, slug)"] --> E["Framework route enumeration per locale"]
  E --> X["Serialize / stream XML"]
  X --> C["Edge cache: stale-while-revalidate"]
  C --> V["Post-build audit: cross-check live routes"]
  V --> Crawl["Crawlers see canonical published URLs"]
```
Start with a deterministic fetch of every routable entity. Contentful, Sanity, and Strapi expose query endpoints built for bulk retrieval: GraphQL or REST, plus GROQ in Sanity's case. Filter strictly by publication status, locale, and slug. Exclude drafts and archived entries unless you're targeting a preview environment. Project only the fields the sitemap needs (slug, last-modified timestamp, locale, and a priority hint), and flatten nested structures at the query layer to avoid expensive client-side recursion. In Sanity's GROQ, for example:
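```groq
*[_type in ["post", "page", "category"]
  && defined(slug.current)
  && status == "published"
  && !(_id in path("drafts.**"))] {
  "slug": slug.current,
  "type": _type,
  "lastmod": _updatedAt,
  "locale": coalesce(locale, "default"),
  "priority": select(
    _type == "page" => 1.0,
    _type == "post" => 0.8,
    0.5
  )
}
```

Here `status` is a custom schema field. Sanity has no built-in publication status; it stores unpublished drafts under `drafts.`-prefixed IDs, which the `path()` filter excludes even when the client authenticates with a token that can read them.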
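Some CMSs return raw entries instead of GROQ-style projections. In that case the same filtering can run as a pure function in the build step, although filtering at the query layer still keeps payloads smaller. The following is a minimal TypeScript sketch. It assumes a hypothetical `RawEntry` shape (`status`, `locale`, `slug`, `updatedAt`); rename the fields to match your CMS.

```typescript
// Hypothetical shape of a raw entry returned by a CMS REST/GraphQL API.
interface RawEntry {
  id: string;
  type: "post" | "page" | "category";
  status: "published" | "draft" | "archived";
  locale?: string;
  slug?: string;
  updatedAt: string;
}

interface SitemapRow {
  slug: string;
  type: RawEntry["type"];
  lastmod: string;
  locale: string;
  priority: number;
}

// Mirrors the select() in the GROQ query: pages 1.0, posts 0.8, everything else 0.5.
const PRIORITY: Record<RawEntry["type"], number> = { page: 1.0, post: 0.8, category: 0.5 };

function toSitemapRows(entries: RawEntry[], opts: { preview?: boolean } = {}): SitemapRow[] {
  return entries
    // Drafts and archived entries pass only in a preview environment.
    .filter((e) => opts.preview === true || e.status === "published")
    // Entries without a slug are not routable, so they never reach the sitemap.
    .filter((e): e is RawEntry & { slug: string } => typeof e.slug === "string" && e.slug.length > 0)
    // Flatten to exactly the fields the sitemap needs.
    .map((e) => ({
      slug: e.slug,
      type: e.type,
      lastmod: e.updatedAt,
      locale: e.locale ?? "default",
      priority: PRIORITY[e.type],
    }));
}
```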
When a localized entry is missing, route it through Content Fallback & Routing so fallback URLs don't pollute the index. Run queries against read-optimized CDN endpoints, with retry logic for transient failures such as timeouts, rate limiting, and 5xx responses.
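Retry logic for transient failures can be a small wrapper around any CMS call. The following is a minimal sketch of exponential backoff with full jitter, which is one common strategy rather than one the CMSs prescribe. `withRetry`, its option names, and the defaults are hypothetical.

```typescript
// Hypothetical helper: retries a CMS request on transient failures using
// exponential backoff with full jitter.
type Sleep = (ms: number) => Promise<void>;

interface RetryOptions {
  retries?: number; // extra attempts after the first one
  baseDelayMs?: number; // delay scale for the first retry
  shouldRetry?: (err: unknown) => boolean; // e.g. only timeouts, 429s, and 5xx
  sleep?: Sleep; // injectable so tests don't actually wait
}

const defaultSleep: Sleep = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function withRetry<T>(task: () => Promise<T>, opts: RetryOptions = {}): Promise<T> {
  const { retries = 3, baseDelayMs = 200, shouldRetry = () => true, sleep = defaultSleep } = opts;
  for (let attempt = 0; ; attempt++) {
    try {
      return await task();
    } catch (err) {
      // Give up on permanent errors or once the retry budget is spent.
      if (attempt >= retries || !shouldRetry(err)) throw err;
      // Full jitter: wait a random time in [0, base * 2^attempt) so parallel
      // builds don't hammer the CDN endpoint in lockstep.
      await sleep(Math.random() * baseDelayMs * 2 ** attempt);
    }
  }
}
```

In a Sanity-backed build, a call site could read `await withRetry(() => client.fetch(query, { locale }))`. Passing a `shouldRetry` predicate keeps permanent errors, such as a malformed query, from burning the retry budget.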
Framework Implementation
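The framework dictates how the sitemap reaches crawlers. In the Next.js App Router, an app/sitemap.ts file can export generateSitemaps() to split output into several sitemaps, one per returned id. Return locale identifiers, fetch routes per locale, and keep each file small to avoid memory spikes on large builds. The default export returns an array of entries, which Next.js serializes to XML. generateStaticParams() is a separate API that enumerates dynamic route segments for static generation. It can reuse the same CMS query so pages and sitemap entries stay in sync.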
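```typescript
// app/sitemap.ts
import type { MetadataRoute } from 'next';
import { createClient } from '@sanity/client';

// Sitemap URLs must be absolute. SITE_URL is an assumed env var (e.g. https://example.com).
const SITE_URL = process.env.SITE_URL ?? 'https://example.com';

const client = createClient({
  projectId: process.env.SANITY_PROJECT_ID!,
  dataset: 'production',
  apiVersion: '2024-01-01',
  useCdn: true, // read-optimized CDN endpoint
});

// One sitemap per locale; Next.js serves each at /sitemap/[id].xml.
export async function generateSitemaps() {
  return [{ id: 'en' }, { id: 'es' }, { id: 'fr' }];
}

type RouteRow = { path: string; lastModified: string };

export default async function sitemap({ id }: { id: string }): Promise<MetadataRoute.Sitemap> {
  // Published, routable documents for this locale only; drafts are excluded by ID prefix too.
  const query = `*[_type in ["post", "page"] && locale == $locale && defined(slug.current)
    && status == "published" && !(_id in path("drafts.**"))] {
    "path": "/" + $locale + "/" + slug.current,
    "lastModified": _updatedAt
  }`;
  const routes = await client.fetch<RouteRow[]>(query, { locale: id });

  return routes.map((route) => ({
    url: `${SITE_URL}${route.path}`,
    lastModified: new Date(route.lastModified),
    changeFrequency: 'weekly' as const,
    priority: 0.8,
  }));
}
```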
Nuxt 3 uses server routes or the @nuxtjs/sitemap module. Astro pairs getStaticPaths() with its official @astrojs/sitemap integration. Whatever the framework, explicit locale routing prevents duplicate indexing, and Route Mapping for Multilingual Sites keeps hreflang annotations aligned with sitemap entries.
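A hand-written endpoint, such as a custom Nuxt server route, can serialize the sitemap with a small pure function. The following is a minimal TypeScript sketch that emits the `xhtml:link` hreflang form Google documents for sitemaps: every localized URL lists all alternates in its group, itself included. `TranslationGroup`, `toUrlsetXml`, and the example paths are hypothetical.

```typescript
// One logical page and its localized paths, e.g. { en: "/en/about", es: "/es/acerca" }.
type TranslationGroup = Record<string, string>;

// Escape the five XML special characters so query strings like "?a=1&b=2" stay valid.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

function toUrlsetXml(baseUrl: string, groups: TranslationGroup[]): string {
  const urls: string[] = [];
  for (const group of groups) {
    const locales = Object.keys(group);
    // The same alternate links are repeated inside every <url> of the group.
    const links = locales
      .map((hl) => `    <xhtml:link rel="alternate" hreflang="${escapeXml(hl)}" href="${escapeXml(baseUrl + group[hl])}"/>`)
      .join("\n");
    for (const locale of locales) {
      urls.push(`  <url>\n    <loc>${escapeXml(baseUrl + group[locale])}</loc>\n${links}\n  </url>`);
    }
  }
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml">',
    ...urls,
    "</urlset>",
  ].join("\n");
}
```

Each group produces one `<url>` per locale rather than one per page. As a result, a crawler landing on any translation sees the full set of alternates.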
Caching & Edge Delivery
Sitemap endpoints trade freshness against CDN efficiency. Pair a conservative max-age (around one hour) with stale-while-revalidate. The CDN then absorbs crawler traffic and refreshes in the background, and the bounded revalidation window keeps stale data from being served indefinitely. At scale, pre-generate sitemaps in CI/CD and persist them to object storage (S3, Cloudflare R2) to shed origin load. This pairs with Incremental sitemap regeneration for dynamic CMS routes, so invalidation fires only when a content type actually changes.
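The following minimal sketch covers both ideas. It assumes a CDN that honors `s-maxage` and `stale-while-revalidate` (support varies by provider) and a hypothetical webhook payload that names the changed content type. `sitemapCacheControl`, `chunksToInvalidate`, the one-day revalidation window, and the chunk paths are illustrative choices, not fixed values.

```typescript
// Hypothetical helper: Cache-Control for a CDN-served sitemap.
// max-age applies to browsers, s-maxage to shared caches (the CDN), and
// stale-while-revalidate is the window in which a stale copy may be served
// while the cache refetches in the background.
function sitemapCacheControl(maxAgeSeconds = 3600, staleWhileRevalidateSeconds = 86400): string {
  return `public, max-age=${maxAgeSeconds}, s-maxage=${maxAgeSeconds}, stale-while-revalidate=${staleWhileRevalidateSeconds}`;
}

// Hypothetical CMS webhook payload: which document type changed.
interface ContentChange {
  type: string;
}

// Given which content types each sitemap chunk contains, return only the
// chunks whose types appear in the webhook batch, so unrelated chunks stay cached.
function chunksToInvalidate(changes: ContentChange[], chunkTypes: Record<string, string[]>): string[] {
  const changed = new Set(changes.map((c) => c.type));
  return Object.entries(chunkTypes)
    .filter(([, types]) => types.some((t) => changed.has(t)))
    .map(([chunk]) => chunk);
}
```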
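Google's sitemap guidelines cap each sitemap file at 50MB uncompressed and 50,000 URLs. Past either limit, split into locale- or type-chunked sitemaps referenced from a sitemap index (sitemap_index.xml). Give each chunk its own stable URL, distinguished by path or query parameter, and let edge functions serve the matching chunk. Avoid negotiating on Accept-Language: crawlers fetch each URL listed in the index, and Googlebot generally sends requests without that header.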
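The chunking step can be sketched as follows. `chunkUrls`, `exceedsByteLimit`, and `buildSitemapIndex` are hypothetical helpers. The constants mirror the limits above (50MB is 52,428,800 bytes in the sitemaps protocol), and `<loc>` values are assumed to be absolute and already XML-escaped.

```typescript
// Protocol limits per sitemap file.
const MAX_URLS_PER_SITEMAP = 50_000;
const MAX_BYTES_PER_SITEMAP = 52_428_800; // 50MB, uncompressed

// Split a URL list into consecutive chunks of at most `size` URLs.
function chunkUrls(urls: string[], size = MAX_URLS_PER_SITEMAP): string[][] {
  const chunks: string[][] = [];
  for (let i = 0; i < urls.length; i += size) {
    chunks.push(urls.slice(i, i + size));
  }
  return chunks;
}

// True when a serialized sitemap is too large and must be split further.
function exceedsByteLimit(xml: string): boolean {
  return new TextEncoder().encode(xml).length > MAX_BYTES_PER_SITEMAP;
}

// Build a sitemap index listing one stable URL per chunk (e.g. /sitemap/en.xml).
function buildSitemapIndex(sitemapLocs: string[], lastmod: string): string {
  const entries = sitemapLocs
    .map((loc) => `  <sitemap>\n    <loc>${loc}</loc>\n    <lastmod>${lastmod}</lastmod>\n  </sitemap>`)
    .join("\n");
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    entries,
    "</sitemapindex>",
  ].join("\n");
}
```

Counting URLs alone can miss the byte limit when entries carry many hreflang alternates. Running `exceedsByteLimit` on each serialized chunk catches that case before upload.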
Validation
Generating the sitemap is half the job; keeping it accurate needs continuous checks. In the deploy pipeline, validate XML structure, URL accessibility, and lastmod formatting (W3C Datetime). Lint for mixed HTTP/HTTPS, trailing-slash inconsistencies, and preview URLs leaking into production. Run Automated SEO audits for headless CMS deployments post-build to cross-reference sitemap URLs against live routes, validate robots.txt, and confirm canonical alignment. For framework-level generation, see the Next.js sitemap file convention.
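Several of these lint rules are simple enough to run as a unit-testable function in CI. The following is a minimal sketch that assumes the sitemap has already been parsed into `{ url, lastmod }` pairs. `lintSitemap`, the preview host list, and the problem strings are hypothetical. The date check covers only a common subset of W3C Datetime: a full date, optionally with time and a timezone designator.

```typescript
interface SitemapEntry {
  url: string;
  lastmod?: string;
}

interface LintIssue {
  url: string;
  problem: string;
}

// Subset of W3C Datetime: YYYY-MM-DD, optionally followed by Thh:mm[:ss[.fff]] and Z or +hh:mm.
const W3C_DATETIME = /^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2}))?$/;

function lintSitemap(entries: SitemapEntry[], previewHosts: string[] = []): LintIssue[] {
  const issues: LintIssue[] = [];
  // Trailing-slash convention is taken from the first non-root URL.
  let slashConvention: boolean | undefined;
  for (const { url, lastmod } of entries) {
    let parsed: URL;
    try {
      parsed = new URL(url); // throws on relative URLs, which sitemaps don't allow
    } catch {
      issues.push({ url, problem: "invalid or relative URL" });
      continue;
    }
    if (parsed.protocol !== "https:") issues.push({ url, problem: "not HTTPS" });
    if (previewHosts.includes(parsed.hostname)) issues.push({ url, problem: "preview host in production sitemap" });
    if (parsed.pathname !== "/") {
      const hasSlash = parsed.pathname.endsWith("/");
      if (slashConvention === undefined) slashConvention = hasSlash;
      if (hasSlash !== slashConvention) issues.push({ url, problem: "inconsistent trailing slash" });
    }
    if (lastmod !== undefined && !W3C_DATETIME.test(lastmod)) issues.push({ url, problem: "lastmod is not W3C Datetime" });
  }
  return issues;
}
```

The function returns every issue instead of stopping at the first one, so a single CI run reports the full list.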
Conclusion
Strict query projections, framework-native routing APIs, and edge-aware caching turn the sitemap from a stale file into a live reflection of your published content graph. Paired with fallback routing and post-build audits, content velocity stops costing you crawl efficiency.