How to Migrate from WordPress to Contentful Without Losing SEO

Transitioning from a coupled Content Management System (CMS) to a decoupled architecture requires precise URL mapping and metadata preservation. When evaluating Headless CMS Architecture & Platform Selection, engineers must prioritize crawlability, canonical consistency, and structured data retention before initiating content exports. This blueprint outlines a deterministic migration path that protects Search Engine Optimization (SEO) equity while modernizing your stack.

Content Type Mapping & Field Translation

WordPress post types, taxonomies, and custom fields must be normalized into Contentful content models. Flattening hierarchical categories into reference fields prevents orphaned assets and maintains internal link equity during the transition. Map wp_posts and wp_postmeta directly to discrete Contentful fields. Avoid nesting complex objects. Use Contentful's Link type for relational data.

DX Tradeoff: Flat fields reduce GraphQL payload size and simplify query logic. However, they require upfront schema design. Deeply nested WordPress custom fields will break during import if not pre-flattened.
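As a rough illustration of the flattening described above, the sketch below turns one wp_posts row and its wp_postmeta rows into a flat field object. The Contentful field IDs (title, slug, body, categories), the `toContentfulFields` helper, and the meta-key map are hypothetical. Only the Link object shape follows Contentful's documented reference format.

```typescript
// Column names follow wp_posts / wp_postmeta. Everything on the Contentful
// side (field IDs, the meta-key map) is an assumption for illustration.
interface WpPost {
  ID: number;
  post_title: string;
  post_name: string;
  post_content: string;
}

interface WpPostMeta {
  post_id: number;
  meta_key: string;
  meta_value: string;
}

// Contentful stores a reference as a Link object pointing at another entry.
interface EntryLink {
  sys: { type: 'Link'; linkType: 'Entry'; id: string };
}

const toLink = (id: string): EntryLink => ({
  sys: { type: 'Link', linkType: 'Entry', id },
});

function toContentfulFields(
  post: WpPost,
  meta: WpPostMeta[],
  categoryEntryIds: string[],
  metaFieldMap: Map<string, string>, // wp meta_key -> Contentful field ID
): Record<string, unknown> {
  const fields: Record<string, unknown> = {
    title: post.post_title,
    slug: post.post_name,
    body: post.post_content,
    // Categories become a flat list of references instead of a nested tree
    categories: categoryEntryIds.map(toLink),
  };

  for (const row of meta) {
    if (row.post_id !== post.ID) continue;
    const fieldId = metaFieldMap.get(row.meta_key);
    // Unmapped keys (for example WordPress internals) are dropped on purpose
    if (fieldId !== undefined) fields[fieldId] = row.meta_value;
  }
  return fields;
}
```

Keeping the key map explicit means every custom field that reaches Contentful was a deliberate schema decision rather than an accident of the export.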

Legacy date-based permalinks require regex-based transformation to flat Contentful slugs. Edge-layer redirects must intercept requests before the headless framework renders to preserve link equity and prevent 404 cascades. Implement redirects at the Content Delivery Network (CDN) or edge runtime level. A redirect performed by client-side routing first returns a 200 response, so crawlers never receive the 301 status that signals a permanent move.

DX Tradeoff: Edge routing ensures crawlers receive a real 301 and resolves the redirect before any page is rendered. The tradeoff is increased configuration overhead. You must maintain a deterministic mapping file outside your application code.
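The regex transformation can be sketched as a small pure function. This version assumes the legacy site used WordPress's "Month and name" or "Day and name" permalink structure and that new URLs live under /posts. The `toFlatPath` name is hypothetical.

```typescript
// Matches date-based permalinks such as /2021/03/my-post/ or
// /2021/03/14/my-post (day segment and trailing slash are optional).
const DATE_PERMALINK = /^\/(\d{4})\/(\d{2})\/(?:(\d{2})\/)?([^/]+)\/?$/;

// Returns the flat destination path, or null when the path is not date-based.
// The "/posts" default prefix is an assumption, not a Contentful requirement.
function toFlatPath(legacyPath: string, prefix = '/posts'): string | null {
  const match = DATE_PERMALINK.exec(legacyPath);
  if (!match) return null;
  const slug = match[4];
  return `${prefix}/${slug}`;
}
```

Returning null for non-matching paths lets the caller log them for manual review instead of silently generating a wrong redirect.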

SEO Plugin Data Extraction & Injection

Yoast and Rank Math metadata stored as raw HTML or serialized arrays must be parsed into discrete Contentful fields. During Monolithic vs Headless CMS Migration, failing to map meta titles, descriptions, and Open Graph tags to Server-Side Rendering (SSR) or Static Site Generation (SSG) head components is a direct cause of indexing drops. Extract serialized PHP arrays using a Node.js script. Flatten them into a clean JSON structure before import.
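A minimal sketch of the flattening step, assuming the postmeta rows have already been exported as plain key/value pairs. The meta keys listed are the ones Yoast SEO and Rank Math commonly write, but treat them as assumptions and verify them against your own wp_postmeta table. The target field names are hypothetical, and decoding PHP-serialized values is not shown.

```typescript
interface MetaRow {
  meta_key: string;
  meta_value: string;
}

interface SeoFields {
  metaTitle?: string;
  metaDescription?: string;
  canonical?: string;
  ogTitle?: string;
  ogImage?: string;
}

// Assumed plugin meta keys mapped to hypothetical Contentful field IDs.
const SEO_KEY_MAP = new Map<string, keyof SeoFields>([
  ['_yoast_wpseo_title', 'metaTitle'],
  ['_yoast_wpseo_metadesc', 'metaDescription'],
  ['_yoast_wpseo_canonical', 'canonical'],
  ['_yoast_wpseo_opengraph-title', 'ogTitle'],
  ['_yoast_wpseo_opengraph-image', 'ogImage'],
  ['rank_math_title', 'metaTitle'],
  ['rank_math_description', 'metaDescription'],
  ['rank_math_canonical_url', 'canonical'],
  ['rank_math_facebook_title', 'ogTitle'],
  ['rank_math_facebook_image', 'ogImage'],
]);

function flattenSeoMeta(rows: MetaRow[]): SeoFields {
  const seo: SeoFields = {};
  for (const { meta_key, meta_value } of rows) {
    const field = SEO_KEY_MAP.get(meta_key);
    const value = meta_value.trim();
    // First non-empty value wins, so a second plugin's row never overwrites it
    if (field && value && seo[field] === undefined) seo[field] = value;
  }
  return seo;
}
```

Yoast values may still contain template variables such as %%title%%. This sketch leaves them untouched, so they would need to be resolved in a separate pass before import.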

DX Tradeoff: Headless SEO abstraction layers enforce strict TypeScript interfaces. This prevents runtime meta tag failures. The cost is additional build-time validation logic.
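One way to picture the build-time validation is a small parser that refuses to produce a typed SEO object from incomplete data. The `SeoMeta` interface, the helper names, and the length limits are all assumptions. The limits are rough editorial conventions, not rules enforced by any search engine.

```typescript
interface SeoMeta {
  metaTitle: string;
  metaDescription: string;
  canonicalPath: string;
}

// Hypothetical editorial limits; adjust to your own guidelines.
const MAX_TITLE = 60;
const MAX_DESCRIPTION = 160;

function collectSeoErrors(raw: Record<string, unknown>): string[] {
  const errors: string[] = [];
  const text = (key: string): string => {
    const value = raw[key];
    return typeof value === 'string' ? value.trim() : '';
  };

  const title = text('metaTitle');
  if (!title) errors.push('metaTitle is missing');
  else if (title.length > MAX_TITLE) errors.push('metaTitle is too long');

  const description = text('metaDescription');
  if (!description) errors.push('metaDescription is missing');
  else if (description.length > MAX_DESCRIPTION) errors.push('metaDescription is too long');

  if (!text('canonicalPath').startsWith('/')) errors.push('canonicalPath must start with "/"');
  return errors;
}

// Throws during the build instead of letting an empty tag reach production.
function parseSeoMeta(raw: Record<string, unknown>): SeoMeta {
  const errors = collectSeoErrors(raw);
  if (errors.length > 0) throw new Error(`Invalid SEO fields: ${errors.join('; ')}`);
  return {
    metaTitle: String(raw.metaTitle).trim(),
    metaDescription: String(raw.metaDescription).trim(),
    canonicalPath: String(raw.canonicalPath).trim(),
  };
}
```

Collecting all errors before throwing gives editors one complete report per entry rather than a fix-and-retry loop.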

Crawlability Testing & Post-Launch Monitoring

Dynamic sitemap generation, robots.txt configuration, and canonical tag validation must run in CI/CD pipelines. Post-deployment, Google Search Console (GSC) indexing reports and server log analysis confirm successful rank retention. Automate validation checks in your deployment workflow. Treat SEO as a first-class build artifact, not an afterthought.
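As a sketch of sitemap generation inside a pipeline step, the function below builds the XML from a list of paths. The `SitemapEntry` shape and `buildSitemap` name are hypothetical, and the entries are assumed to come from an earlier Contentful query that is not shown.

```typescript
interface SitemapEntry {
  path: string;         // e.g. "/posts/my-post"
  lastModified: string; // ISO 8601 date or datetime
}

const escapeXml = (value: string): string =>
  value
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');

function buildSitemap(baseUrl: string, entries: SitemapEntry[]): string {
  const urls = entries.map(({ path, lastModified }) => {
    // Resolve against the base so every <loc> is an absolute URL
    const loc = new URL(path, baseUrl).toString();
    return [
      '  <url>',
      `    <loc>${escapeXml(loc)}</loc>`,
      `    <lastmod>${escapeXml(lastModified)}</lastmod>`,
      '  </url>',
    ].join('\n');
  });

  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...urls,
    '</urlset>',
  ].join('\n');
}
```

In a Next.js App Router project, the built-in app/sitemap.ts convention may make a hand-rolled builder unnecessary. A standalone function is still convenient for asserting on the output in CI.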

Implementation Blueprint

The following TypeScript implementation covers redirect generation, edge routing, and metadata injection. It uses modern Next.js 14+ patterns and the Contentful Delivery API (CDA).

Steps 1 & 2: Extract & Normalize Legacy URL Inventory

// scripts/generate-redirects.ts
import { readFileSync, writeFileSync } from 'fs';
import { parse } from 'csv-parse/sync';

interface LegacyEntry {
  post_name: string;
  post_date: string; // e.g. "2021-03-14 09:30:00"
  meta_title: string;
}

// Critical: rebuilds the legacy /YYYY/MM/post-name permalink that must redirect
// to the flat slug. Assumes the WordPress "Month and name" permalink structure.
const buildLegacyPath = (entry: LegacyEntry): string => {
  const [year, month] = entry.post_date.split(' ')[0].split('-');
  return `/${year}/${month}/${entry.post_name}`;
};

export function generateRedirectMap(csvPath: string, outputPath: string) {
  const raw = readFileSync(csvPath, 'utf-8');
  const data = parse(raw, { columns: true, skip_empty_lines: true }) as LegacyEntry[];

  const redirects = data.map(entry => ({
    source: buildLegacyPath(entry),
    destination: `/posts/${entry.post_name}`,
    permanent: true,
  }));

  writeFileSync(outputPath, JSON.stringify(redirects, null, 2));
  console.log(`Mapped ${redirects.length} legacy routes.`);
}

Flow: Reads a CSV export produced with WP-CLI. Derives each legacy date-based path from post_date and post_name, pairs it with its flat /posts/ destination, and writes a JSON array for edge consumption.

Steps 3 & 4: Deploy Edge Redirects & Reconstruct Metadata

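// middleware.ts (Edge Runtime)
import { NextResponse } from 'next/server';
import type { NextRequest } from 'next/server';
import redirects from './redirects.json';

// Run on every path except Next.js internals and API routes, so legacy
// date-based URLs (e.g. /2021/03/my-post) actually reach the middleware.
export const config = { matcher: ['/((?!api|_next/static|_next/image|favicon.ico).*)'] };

// Build the lookup once instead of scanning the array on every request
const redirectMap = new Map(
  redirects.map((r): [string, string] => [r.source, r.destination]),
);

export function middleware(req: NextRequest) {
  // WordPress permalinks usually end with a slash; the map keys do not
  const pathname = req.nextUrl.pathname.replace(/\/+$/, '') || '/';
  const destination = redirectMap.get(pathname);

  if (destination) {
    // Critical: 301 status preserves link equity and signals permanent move.
    // NextResponse.redirect needs an absolute URL, so resolve against the request.
    return NextResponse.redirect(new URL(destination, req.url), 301);
  }
  return NextResponse.next();
}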

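// app/posts/[slug]/page.tsx
// (the default-exported page component is omitted here)
import type { Metadata } from 'next';
import { notFound } from 'next/navigation';
import { createClient } from 'contentful';

const client = createClient({
  space: process.env.CONTENTFUL_SPACE_ID!,
  accessToken: process.env.CONTENTFUL_CDA_TOKEN!,
});

// Next.js 14 signature; in Next.js 15 `params` is a Promise and must be awaited.
export async function generateMetadata({ params }: { params: { slug: string } }): Promise<Metadata> {
  const { items } = await client.getEntries({
    content_type: 'post',
    'fields.slug': params.slug,
    limit: 1,
  });

  // Untyped access for brevity; define an entry skeleton type for strict typing.
  const fields = items[0]?.fields as Record<string, any> | undefined;
  // Return a real 404 rather than a 500 so crawlers do not see a server error
  if (!fields) notFound();

  // Contentful asset URLs are protocol-relative (//images.ctfassets.net/...)
  const ogImageUrl: string | undefined = fields.ogImage?.fields?.file?.url;

  return {
    title: fields.metaTitle || fields.title,
    description: fields.metaDescription,
    openGraph: {
      title: fields.ogTitle || fields.title,
      images: ogImageUrl ? [`https:${ogImageUrl}`] : [],
    },
    // Critical: Declares the preferred URL so duplicates consolidate during migration
    alternates: { canonical: `https://yourdomain.com/posts/${params.slug}` },
  };
}

export async function generateStaticParams() {
  // The CDA returns 100 entries by default and caps `limit` at 1000;
  // paginate with `skip` for larger sites.
  const { items } = await client.getEntries({ content_type: 'post', limit: 1000 });
  return items.map(i => ({ slug: String(i.fields.slug) }));
}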

Flow: Middleware intercepts requests at the network edge before any page is rendered. generateMetadata pulls discrete fields from Contentful. Fallback logic keeps the title and Open Graph title from rendering empty. Cache-Control headers are automatically managed by Next.js ISR/SSG. If you fetch dynamically through API routes, add headers: { 'Cache-Control': 'public, s-maxage=86400, stale-while-revalidate=3600' } to those responses.

Common Pitfalls

  • 404/410 Spikes: Occur when date-based permalinks bypass the regex transformation. Validate mapping files against legacy CSV exports before deployment.
  • Duplicate Content Penalties: Caused by missing or mismatched canonical tags. Always inject dynamic canonicals pointing to the new flat URL.
  • Rich Snippet Loss: Stripped JSON-LD happens when structured data is hardcoded instead of mapped to Contentful fields. Use a dedicated schema injection component.
  • Broken Media References: Legacy media URLs bypass CDN rewrite rules. Run a bulk find/replace on wp-content/uploads paths during the export phase.
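The media find/replace from the last pitfall could look roughly like this. It assumes you already have a map from each legacy upload path to its new asset URL, built during the asset import. The `rewriteMediaUrls` name and the map are hypothetical.

```typescript
// Escape a literal string for safe use inside a RegExp
const escapeRegExp = (value: string): string =>
  value.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

interface RewriteResult {
  html: string;
  unmapped: string[]; // upload paths with no known replacement
}

function rewriteMediaUrls(
  html: string,
  legacyOrigin: string,          // e.g. "https://old.example.com"
  assetMap: Map<string, string>, // "/wp-content/uploads/..." -> new asset URL
): RewriteResult {
  const unmapped: string[] = [];
  // Matches absolute and root-relative references to the uploads folder
  const pattern = new RegExp(
    `(?:${escapeRegExp(legacyOrigin)})?(/wp-content/uploads/[^\\s"'<>)]+)`,
    'g',
  );

  const rewritten = html.replace(pattern, (whole: string, uploadPath: string) => {
    const replacement = assetMap.get(uploadPath);
    if (replacement === undefined) {
      // Leave the reference untouched and report it for manual review
      unmapped.push(uploadPath);
      return whole;
    }
    return replacement;
  });

  return { html: rewritten, unmapped };
}
```

WordPress also generates resized variants with a size suffix in the file name. This sketch does not map those back to the original upload, so they would show up in the unmapped list.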

Prevention Strategy

  • Automate URL diff validation in CI/CD using the Contentful Delivery API versus legacy CSV exports.
  • Enforce Contentful field validation rules to block publishing of entries missing required SEO fields.
  • Implement headless SEO abstraction layers with strict TypeScript interfaces.
  • Schedule quarterly 301 redirect audits and automated GSC coverage report parsing.
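The URL diff in the first bullet reduces to a set comparison. The sketch below assumes both slug lists have already been loaded, one from the legacy CSV and one from the Contentful Delivery API. The `diffSlugs` helper is hypothetical.

```typescript
interface SlugDiff {
  missing: string[];    // present in WordPress, absent in Contentful
  unexpected: string[]; // present in Contentful, absent in WordPress
}

// Normalizes case and surrounding slashes so "/My-Post/" equals "my-post"
const normalizeSlug = (slug: string): string =>
  slug.trim().toLowerCase().replace(/^\/+|\/+$/g, '');

function diffSlugs(legacySlugs: string[], contentfulSlugs: string[]): SlugDiff {
  const legacy = new Set(legacySlugs.map(normalizeSlug));
  const current = new Set(contentfulSlugs.map(normalizeSlug));

  return {
    missing: Array.from(legacy).filter(slug => !current.has(slug)).sort(),
    unexpected: Array.from(current).filter(slug => !legacy.has(slug)).sort(),
  };
}
```

A CI step could fail the build whenever `missing` is non-empty, since each missing slug is a legacy URL that would redirect to a 404.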

FAQ

Does migrating to headless inherently drop search rankings? No. Rankings drop when URL structures change without 301 redirects, or when metadata fails to render during SSR/SSG. Deterministic mapping preserves equity.

Should I use REST or GraphQL for SEO data fetching? GraphQL reduces over-fetching and allows precise field selection for meta tags. REST is simpler for flat exports. Choose GraphQL for production headless builds to optimize payload size.
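To make the field-selection point concrete, here is a hedged sketch of a GraphQL request for SEO fields only. It assumes a content type with the ID post and fields named as in the blueprint above, which Contentful's GraphQL API would expose as postCollection. The helper names are hypothetical, and the default environment is assumed.

```typescript
// Selects only the fields needed for <head>; field names are assumptions.
const POST_SEO_QUERY = `
  query PostSeo($slug: String!) {
    postCollection(where: { slug: $slug }, limit: 1) {
      items {
        title
        metaTitle
        metaDescription
        ogTitle
        ogImage { url }
      }
    }
  }
`;

interface SeoRequest {
  url: string;
  init: { method: 'POST'; headers: Record<string, string>; body: string };
}

function buildSeoRequest(spaceId: string, token: string, slug: string): SeoRequest {
  return {
    url: `https://graphql.contentful.com/content/v1/spaces/${spaceId}`,
    init: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${token}`,
      },
      body: JSON.stringify({ query: POST_SEO_QUERY, variables: { slug } }),
    },
  };
}

// Resolves to the first matching item, or null when no entry has that slug.
async function fetchPostSeo(spaceId: string, token: string, slug: string): Promise<unknown> {
  const { url, init } = buildSeoRequest(spaceId, token, slug);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`GraphQL request failed with status ${res.status}`);
  const json = (await res.json()) as { data?: { postCollection?: { items?: unknown[] } } };
  return json.data?.postCollection?.items?.[0] ?? null;
}
```

Separating request construction from the network call keeps the query shape unit-testable without hitting the API.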

How do I handle WordPress shortcodes during migration? Strip shortcodes during the Node.js export phase. Replace them with React components or Markdown equivalents. Injecting raw shortcodes breaks headless rendering.
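A minimal sketch of shortcode replacement during export, assuming a registry of per-tag handlers. The handler registry, the `convertShortcodes` name, and the output markup are hypothetical. Tags without a handler are left untouched so ordinary bracketed text is not mangled, and nested shortcodes are not handled.

```typescript
type ShortcodeHandler = (attrs: Record<string, string>, inner: string) => string;

// Parses key="value", key='value', and key=value attribute pairs
function parseAttrs(raw: string): Record<string, string> {
  const attrs: Record<string, string> = {};
  const pattern = /([\w-]+)=(?:"([^"]*)"|'([^']*)'|(\S+))/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(raw)) !== null) {
    attrs[match[1]] = match[2] ?? match[3] ?? match[4] ?? '';
  }
  return attrs;
}

function convertShortcodes(content: string, handlers: Map<string, ShortcodeHandler>): string {
  // Enclosing form: [tag attrs]inner[/tag]
  const enclosing = /\[([a-z][\w-]*)([^\]]*)\]([\s\S]*?)\[\/\1\]/gi;
  // Self-closing form: [tag attrs] or [tag attrs /]
  const single = /\[([a-z][\w-]*)((?:\s[^\]]*)?)\]/gi;

  const apply = (tag: string, rawAttrs: string, inner: string, original: string): string => {
    const handler = handlers.get(tag.toLowerCase());
    // Unknown tags pass through unchanged for manual review
    return handler ? handler(parseAttrs(rawAttrs), inner) : original;
  };

  return content
    .replace(enclosing, (whole: string, tag: string, attrs: string, inner: string) =>
      apply(tag, attrs, inner, whole))
    .replace(single, (whole: string, tag: string, attrs: string) =>
      apply(tag, attrs, '', whole));
}
```

To strip a shortcode outright, register a handler that returns an empty string.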

Can I automate canonical tag validation? Yes. Run a CI/CD script that fetches rendered HTML from staging. Parse <link rel="canonical"> and compare against the expected Contentful slug. Fail the build on mismatch.
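The canonical check can be sketched as two small functions that operate on HTML already fetched from staging. The names are hypothetical, and the regex-based parsing is a simplification that assumes a quoted href attribute. A full HTML parser would be more robust.

```typescript
// Returns the href of the first <link rel="canonical">, or null if absent
function extractCanonical(html: string): string | null {
  const linkTags = html.match(/<link\b[^>]*>/gi) ?? [];
  for (const tag of linkTags) {
    if (!/\brel\s*=\s*["']?canonical["']?/i.test(tag)) continue;
    const href = /\bhref\s*=\s*["']([^"']+)["']/i.exec(tag);
    if (href) return href[1];
  }
  return null;
}

interface CanonicalCheck {
  ok: boolean;
  found: string | null;
}

function checkCanonical(html: string, expectedUrl: string): CanonicalCheck {
  const found = extractCanonical(html);
  // Ignore a trailing slash so /posts/a and /posts/a/ compare as equal
  const strip = (url: string): string => url.replace(/\/+$/, '');
  return { ok: found !== null && strip(found) === strip(expectedUrl), found };
}
```

In a pipeline, a script would fetch each staging URL, call `checkCanonical` with the URL derived from the Contentful slug, and exit with a non-zero code when `ok` is false.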