Schema stitching for multi-vendor headless architectures

Schema stitching unifies disparate vendor GraphQL endpoints — CMS, commerce, DAM, search — into one frontend-facing schema without asking any vendor to adopt a shared SDK. The gateway introspects each endpoint, normalizes conflicting types, and resolves cross-service references at runtime. It’s the integration path when vendors export static schemas but block the runtime federation hooks that Advanced GraphQL Federation Patterns require.

The distinction matters: federation relies on explicit @key directives, subgraph ownership, and vendor-side federation SDKs. Stitching operates purely at the introspection level. That makes it the option for legacy CMS instances, third-party SaaS, and governed enterprise endpoints that prohibit federation hooks.

The gateway abstraction

The stitching gateway is a transparent proxy: it aggregates multiple schemas into one executable surface, intercepts client queries, delegates field resolution to the right downstream service, and merges results. No vendor API changes. This fits Headless CMS Architecture & Platform Selection decisions that prioritize composability over monolithic coupling.

The catch is normalization. Vendors expose conflicting type names, divergent scalars, and inconsistent pagination (offset vs. cursor). The gateway must apply deterministic transforms during schema construction or hit runtime collisions and unpredictable client typing.
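As an illustration of one such transform, here is a minimal sketch that converts an offset-paginated vendor page into a cursor-style connection. The shapes `OffsetPage` and `Connection`, the helper names, and the `offset:N` cursor encoding are assumptions made for this example, not part of any vendor API.

```typescript
// Hypothetical shapes: an offset-paginated vendor response and the
// cursor-style connection the unified schema exposes to clients.
interface OffsetPage<T> {
  items: T[];
  offset: number; // index of the first item in this page
  total: number; // total number of items available upstream
}

interface Connection<T> {
  edges: { cursor: string; node: T }[];
  pageInfo: { hasNextPage: boolean; endCursor: string | null };
}

// Cursors are opaque to clients; here they simply encode the absolute offset.
const encodeCursor = (index: number): string => `offset:${index}`;

const decodeCursor = (cursor: string): number => {
  const match = /^offset:(\d+)$/.exec(cursor);
  if (!match) throw new Error(`Malformed cursor: ${cursor}`);
  return Number(match[1]);
};

// Convert one offset page into a connection with ordered cursors.
function toConnection<T>(page: OffsetPage<T>): Connection<T> {
  const edges = page.items.map((node, i) => ({
    cursor: encodeCursor(page.offset + i),
    node,
  }));
  const last = edges.length > 0 ? edges[edges.length - 1] : undefined;
  return {
    edges,
    pageInfo: {
      hasNextPage: page.offset + page.items.length < page.total,
      endCursor: last ? last.cursor : null,
    },
  };
}

// Translate a client's `after` cursor back into the vendor's offset argument.
function offsetAfter(cursor: string | null): number {
  return cursor === null ? 0 : decodeCursor(cursor) + 1;
}
```

Offset-derived cursors are only as stable as the vendor's ordering, so a sketch like this suits lists that change rarely.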

The gateway introspects each vendor, normalizes types, and delegates cross-service fields at query time:

flowchart TD
  Client["Frontend client"] --> GW["Stitching gateway"]
  subgraph Construction["Schema construction"]
    Intro["Introspect each vendor"] --> Rename["RenameTypes + FilterRootFields"]
    Rename --> Merge["stitchSchemas merged surface"]
  end
  GW -->|"delegate by selectionSet"| CMS["CMS vendor"]
  GW -->|"delegateToSchema"| Commerce["Commerce vendor"]
  GW -->|"delegateToSchema"| DAM["DAM vendor"]
  CMS --> GW
  Commerce --> GW
  DAM --> GW
  GW -->|"merged result"| Client

Introspection and type normalization

The flow starts with runtime introspection against each vendor. Disable introspection caching in development to catch schema drift; in production, cache snapshots to cut startup latency and avoid vendor rate limits during initialization.
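A minimal sketch of that policy, assuming snapshots are held as SDL text in memory and keyed by vendor URI. `SnapshotCache`, `snapshotTtlMs`, the fifteen-minute TTL, and the use of `NODE_ENV` are illustrative choices, not something the gateway library provides.

```typescript
// Hypothetical in-memory cache for introspection snapshots stored as SDL text.
interface Snapshot {
  sdl: string;
  fetchedAt: number; // epoch milliseconds
}

class SnapshotCache {
  private readonly entries = new Map<string, Snapshot>();
  private readonly ttlMs: number;

  // A TTL of 0 disables caching, which suits development.
  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // Return the cached SDL while it is still fresh, otherwise undefined.
  get(vendorUri: string, now: number = Date.now()): string | undefined {
    const entry = this.entries.get(vendorUri);
    if (!entry || now - entry.fetchedAt >= this.ttlMs) return undefined;
    return entry.sdl;
  }

  set(vendorUri: string, sdl: string, now: number = Date.now()): void {
    this.entries.set(vendorUri, { sdl, fetchedAt: now });
  }
}

// Assumed convention: NODE_ENV distinguishes production from development.
const snapshotTtlMs = (env: string | undefined): number =>
  env === 'production' ? 15 * 60 * 1000 : 0;
```

On a miss, the caller would run introspection, serialize the result (for example with `printSchema` from `graphql`), and store it with `set`. A shared store would be needed for the cache to survive restarts.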

import { stitchSchemas } from '@graphql-tools/stitch';
import { introspectSchema, RenameTypes, FilterRootFields } from '@graphql-tools/wrap';
import { buildHTTPExecutor } from '@graphql-tools/executor-http';
import { delegateToSchema } from '@graphql-tools/delegate';
import { GraphQLSchema, OperationTypeNode, validateSchema } from 'graphql';
import type { SubschemaConfig } from '@graphql-tools/delegate';

interface VendorConfig {
  uri: string;
  headers: Record<string, string>;
  prefix: string; // e.g. 'Cms', 'Commerce', 'Dam'
  excludeRootFields?: string[];
}

async function buildStitchedGateway(vendors: VendorConfig[]): Promise<GraphQLSchema> {
  const subschemas: SubschemaConfig[] = [];
  // Index subschemas by prefix so resolvers do not depend on array order
  const byPrefix = new Map<string, SubschemaConfig>();

  for (const vendor of vendors) {
    // The executor performs introspection now and serves delegated queries at runtime
    const executor = buildHTTPExecutor({ endpoint: vendor.uri, headers: vendor.headers });

    // Production optimization: cache introspection results
    // to avoid repeated vendor rate-limit exhaustion
    const schema = await introspectSchema(executor);

    const excluded = vendor.excludeRootFields ?? [];
    const subschema: SubschemaConfig = {
      schema,
      executor,
      transforms: [
        new RenameTypes((name) => `${vendor.prefix}${name}`),
        ...(excluded.length > 0
          ? [new FilterRootFields((_operation, fieldName) => !excluded.includes(fieldName))]
          : []),
      ],
    };
    subschemas.push(subschema);
    byPrefix.set(vendor.prefix, subschema);
  }

  const commerce = byPrefix.get('Commerce');
  if (!commerce) {
    throw new Error("No vendor configured with prefix 'Commerce'.");
  }

  // Extend types to establish cross-service relationships.
  // The type names below assume the prefixes 'Cms', 'Commerce', and 'Dam'.
  const mergedSchema = stitchSchemas({
    subschemas,
    typeDefs: `
      extend type CmsProduct {
        commerceInventory: CommerceInventory
        damAssets: [DamAsset!]
      }
    `,
    resolvers: {
      CmsProduct: {
        commerceInventory: {
          // Fetch sku from the CMS even when the client did not select it
          selectionSet: `{ sku }`,
          resolve: (parent, _args, context, info) =>
            delegateToSchema({
              schema: commerce,
              operation: OperationTypeNode.QUERY,
              fieldName: 'inventoryBySku',
              args: { sku: parent.sku },
              context,
              info,
            }),
        },
        // damAssets still needs a resolver that delegates to the DAM subschema;
        // until one is added the field resolves to null
      },
    },
  });

  // Validate merged schema before exposing to clients
  const errors = validateSchema(mergedSchema);
  if (errors.length > 0) {
    const details = errors.map((error) => error.message).join('\n');
    throw new Error(`Schema merge validation failed:\n${details}`);
  }

  return mergedSchema;
}

Three production requirements show up above:

Deterministic prefixing: RenameTypes prevents collisions when vendors share names like Page, User, Asset.
Root filtering: FilterRootFields strips vendor-specific mutations or queries that shouldn’t reach the unified client.
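Explicit delegation: selectionSet declares the parent fields the resolver depends on (here, sku), so the gateway fetches them from the CMS even when the client did not request them, and the delegated call receives exactly the arguments it needs.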
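For orientation, a vendor list for buildStitchedGateway might look like the sketch below. The endpoints, environment variable names, header names, and excluded root fields are all hypothetical. The prefixes are chosen to match the CmsProduct, CommerceInventory, and DamAsset names used in the type extension, and the commerce vendor is listed second.

```typescript
// Mirrors the VendorConfig shape used by buildStitchedGateway.
interface VendorConfig {
  uri: string;
  headers: Record<string, string>;
  prefix: string;
  excludeRootFields?: string[];
}

// Hypothetical endpoints and variable names; real values come from deployment config.
function vendorsFromEnv(env: Record<string, string | undefined>): VendorConfig[] {
  const required = (name: string): string => {
    const value = env[name];
    if (!value) throw new Error(`Missing environment variable ${name}`);
    return value;
  };

  return [
    {
      uri: 'https://cms.example.com/graphql',
      headers: { Authorization: `Bearer ${required('CMS_INTROSPECTION_TOKEN')}` },
      prefix: 'Cms',
      // Keep destructive vendor mutations out of the unified schema
      excludeRootFields: ['deleteEntry', 'publishAll'],
    },
    {
      uri: 'https://commerce.example.com/graphql',
      headers: { 'X-Vendor-Auth': required('COMMERCE_INTROSPECTION_KEY') },
      prefix: 'Commerce',
    },
    {
      uri: 'https://dam.example.com/graphql',
      headers: { Authorization: `Bearer ${required('DAM_INTROSPECTION_TOKEN')}` },
      prefix: 'Dam',
    },
  ];
}
```

A startup script would then call `buildStitchedGateway(vendorsFromEnv(process.env))`. Failing fast on a missing variable keeps a misconfigured vendor from surfacing later as an opaque introspection error.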

Type collisions and drift

Most stitching failures surface during type resolution. When two vendors define overlapping Page, Asset, or User types without transforms, the gateway has to reconcile them, and depending on its merge and conflict settings it will merge the definitions, let one candidate override the other, or fail validation. Silent overrides are the worst case: two vendors with identical field names but mismatched scalar types produce unpredictable client behavior.
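A pre-merge check can surface these overlaps before any transform runs. The sketch below assumes each vendor's type names have already been pulled from its introspection result. `findTypeCollisions` and the ignore list are illustrative, and the conventional root type names are assumed.

```typescript
// Map of vendor name -> type names taken from that vendor's introspection result.
type VendorTypes = Record<string, string[]>;

// Names every schema shares by design: built-in scalars and (assumed) root type names.
const SHARED_BY_DESIGN = new Set([
  'String', 'Int', 'Float', 'Boolean', 'ID',
  'Query', 'Mutation', 'Subscription',
]);

// Return every type name declared by more than one vendor, with its owners.
function findTypeCollisions(vendors: VendorTypes): Map<string, string[]> {
  const owners = new Map<string, string[]>();
  for (const [vendor, typeNames] of Object.entries(vendors)) {
    for (const name of new Set(typeNames)) {
      // Skip introspection types (prefixed with "__") and intentionally shared names
      if (name.startsWith('__') || SHARED_BY_DESIGN.has(name)) continue;
      const list = owners.get(name) ?? [];
      list.push(vendor);
      owners.set(name, list);
    }
  }

  const collisions = new Map<string, string[]>();
  for (const [name, list] of owners) {
    if (list.length > 1) collisions.set(name, list);
  }
  return collisions;
}
```

A non-empty result is a signal to add a prefix or an explicit merge rule for the listed types before stitching.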

Guard against drift with CI schema validation: diff introspection snapshots against a baseline using a schema diff tool such as GraphQL Inspector or the findBreakingChanges utility in graphql-js. When a vendor updates an API without versioning, the gateway must reject the incompatible merge or apply a fallback transform. The GraphQL specification does not define schema merging, so the gateway has to enforce type compatibility itself; skipping that check produces runtime failures that slip past standard error boundaries.
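To make the CI check concrete, here is a deliberately simplified diff over flattened snapshots. It stands in for a real schema diff tool and assumes each snapshot has been reduced to a map of type name to field name to printed field type. `SchemaShape`, `DriftIssue`, and `findDrift` are names invented for this sketch.

```typescript
// Simplified snapshot: type name -> field name -> printed field type (e.g. "String!").
type SchemaShape = Record<string, Record<string, string>>;

interface DriftIssue {
  kind: 'TYPE_REMOVED' | 'FIELD_REMOVED' | 'FIELD_TYPE_CHANGED';
  path: string;
  detail: string;
}

// Report changes that can break existing client queries.
// Added types and fields are ignored because they are backward compatible.
function findDrift(baseline: SchemaShape, current: SchemaShape): DriftIssue[] {
  const issues: DriftIssue[] = [];
  for (const [typeName, fields] of Object.entries(baseline)) {
    const currentFields = current[typeName];
    if (!currentFields) {
      issues.push({ kind: 'TYPE_REMOVED', path: typeName, detail: 'missing from current snapshot' });
      continue;
    }
    for (const [fieldName, fieldType] of Object.entries(fields)) {
      const path = `${typeName}.${fieldName}`;
      const currentType = currentFields[fieldName];
      if (currentType === undefined) {
        issues.push({ kind: 'FIELD_REMOVED', path, detail: `was ${fieldType}` });
      } else if (currentType !== fieldType) {
        issues.push({ kind: 'FIELD_TYPE_CHANGED', path, detail: `${fieldType} -> ${currentType}` });
      }
    }
  }
  return issues;
}
```

A CI job would fail the build when `findDrift` returns anything. Treating every type change as breaking is conservative, since some changes (such as an output field becoming non-null) are safe for clients.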

Delegation latency

Misaligned selectionSet config drives latency. Over-fetching trips vendor rate limits and bloats payloads; under-fetching propagates null when a required field is missing from the parent. To tune delegation:

Precise selection sets: request only the fields downstream resolution needs.
Batched delegation: use batchDelegateToSchema from @graphql-tools/batch-delegate, or set batch: true on the subschema config, when resolving multiple parents to cut round trips.
Response caching: add DataLoader or a Redis cache at the gateway for high-frequency queries like inventoryBySku or assetByHash.
Timeout enforcement: wrap delegation in AbortController so a degraded vendor doesn’t cascade.
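The caching and timeout items can be sketched without any gateway dependency. `cachedLoader` below is a small in-process stand-in for DataLoader or Redis, and `withTimeout` shows the AbortController pattern. Both helper names and their signatures are assumptions for this example.

```typescript
// Wrap an async loader so concurrent and repeated calls for the same key
// share one upstream request until the entry expires.
function cachedLoader<V>(
  load: (key: string) => Promise<V>,
  ttlMs: number,
  now: () => number = Date.now,
): (key: string) => Promise<V> {
  const entries = new Map<string, { value: Promise<V>; expiresAt: number }>();
  return (key) => {
    const hit = entries.get(key);
    if (hit && hit.expiresAt > now()) return hit.value;

    const value = load(key);
    entries.set(key, { value, expiresAt: now() + ttlMs });
    // Do not keep failed lookups around; the next call should retry.
    value.catch(() => {
      if (entries.get(key)?.value === value) entries.delete(key);
    });
    return value;
  };
}

// Run a vendor call with a deadline. The deadline only takes effect if `run`
// honors the signal, for example by passing it to fetch.
async function withTimeout<T>(
  run: (signal: AbortSignal) => Promise<T>,
  timeoutMs: number,
): Promise<T> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await run(controller.signal);
  } finally {
    clearTimeout(timer);
  }
}
```

The two compose naturally: the function passed to `cachedLoader` can wrap its vendor request in `withTimeout`, so a slow vendor fails fast and the failed entry is evicted rather than cached.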

Instrument the gateway with OpenTelemetry spans to trace cross-service resolution and catch bottlenecks in TTFB, resolver execution time, and cache hit ratio before they reach the frontend.
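In a real deployment these figures would come from span attributes and the tracing backend. As a dependency-free stand-in, the sketch below aggregates hypothetical per-delegation samples into a cache hit ratio and a p95 duration per vendor; the `DelegationSample` shape and `summarize` are invented for illustration.

```typescript
// One record per delegated call, as a span exporter or middleware might capture it.
interface DelegationSample {
  vendor: string;
  durationMs: number;
  cacheHit: boolean;
}

interface VendorSummary {
  count: number;
  cacheHitRatio: number;
  p95Ms: number;
}

function summarize(samples: DelegationSample[]): Map<string, VendorSummary> {
  // Group samples by vendor
  const byVendor = new Map<string, DelegationSample[]>();
  for (const sample of samples) {
    const list = byVendor.get(sample.vendor) ?? [];
    list.push(sample);
    byVendor.set(sample.vendor, list);
  }

  const summaries = new Map<string, VendorSummary>();
  for (const [vendor, list] of byVendor) {
    const sorted = list.map((s) => s.durationMs).sort((a, b) => a - b);
    // Nearest-rank percentile: the smallest value with at least 95% of samples at or below it
    const rank = Math.max(1, Math.ceil(0.95 * sorted.length));
    summaries.set(vendor, {
      count: list.length,
      cacheHitRatio: list.filter((s) => s.cacheHit).length / list.length,
      p95Ms: sorted[rank - 1],
    });
  }
  return summaries;
}
```

Tracking these per vendor rather than gateway-wide makes it easier to see which downstream service is responsible for a latency regression.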

Governance and header propagation

Multi-vendor stitching complicates authorization. Never hardcode vendor tokens. Extract auth headers from the incoming client request, validate scopes, and forward only the credentials each subschema needs.

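import type { IncomingMessage } from 'node:http';

// extractScopedToken is an application-specific helper that is not shown here.
// The signature below is assumed: it checks the caller's scope and returns
// the credential to forward for that scope.
declare function extractScopedToken(authorization: string | undefined, scope: string): string;

const context = async ({ req }: { req: IncomingMessage }) => {
  const cmsToken = extractScopedToken(req.headers.authorization, 'cms:read');
  const commerceKey = extractScopedToken(req.headers.authorization, 'commerce:inventory');

  return {
    cmsHeaders: { Authorization: `Bearer ${cmsToken}` },
    commerceHeaders: { 'X-Vendor-Auth': commerceKey },
    // Propagate tenant ID for multi-tenant routing
    tenantId: req.headers['x-tenant-id'],
  };
};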
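The context object only helps if each subschema receives its own slice of it. A minimal sketch of that selection step follows, assuming the context shape produced above with a single string tenant ID. `GatewayContext`, `Vendor`, and `headersFor` are names chosen for illustration.

```typescript
// Shape of the per-request context built from the incoming request.
interface GatewayContext {
  cmsHeaders: Record<string, string>;
  commerceHeaders: Record<string, string>;
  tenantId?: string;
}

type Vendor = 'cms' | 'commerce';

// Select only the credentials a given vendor needs, plus the tenant header.
// Credentials for other vendors are never included in the result.
function headersFor(vendor: Vendor, ctx: GatewayContext): Record<string, string> {
  const scoped = vendor === 'cms' ? ctx.cmsHeaders : ctx.commerceHeaders;
  return {
    ...scoped,
    ...(ctx.tenantId ? { 'x-tenant-id': ctx.tenantId } : {}),
  };
}
```

Executors receive the request context alongside the query document, so a per-vendor executor could call a helper like this just before issuing its HTTP request.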

Compliance requires audit trails for cross-service access. Log delegation paths, response sizes, and error classes at the gateway without exposing PII or tokens. Rotate vendor credentials and enforce least-privilege scopes.
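One possible shape for such a log entry is sketched below. The `AuditRecord` fields, the list of sensitive header names, and both helper functions are assumptions for this example rather than a prescribed format.

```typescript
// Header names whose values must never reach the logs (compared case-insensitively).
const SENSITIVE_HEADERS = new Set(['authorization', 'cookie', 'x-vendor-auth']);

function redactHeaders(headers: Record<string, string>): Record<string, string> {
  const redacted: Record<string, string> = {};
  for (const [name, value] of Object.entries(headers)) {
    redacted[name] = SENSITIVE_HEADERS.has(name.toLowerCase()) ? '[redacted]' : value;
  }
  return redacted;
}

interface AuditRecord {
  at: string; // ISO timestamp
  tenantId: string | null;
  path: string; // e.g. "CmsProduct.commerceInventory -> inventoryBySku"
  responseBytes: number;
  errorClass: string | null;
}

function auditRecord(input: {
  now: Date;
  tenantId?: string;
  path: string;
  responseBytes: number;
  error?: unknown;
}): AuditRecord {
  let errorClass: string | null = null;
  if (input.error instanceof Error) {
    // Keep the class name only; messages can echo tokens or user data
    errorClass = input.error.name;
  } else if (input.error !== undefined) {
    errorClass = 'UnknownError';
  }
  return {
    at: input.now.toISOString(),
    tenantId: input.tenantId ?? null,
    path: input.path,
    // Size only; the response body itself may contain PII
    responseBytes: input.responseBytes,
    errorClass,
  };
}
```

Recording the error class instead of the message trades some debugging detail for a log that is safe to retain and share with auditors.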

When to stitch instead of federate

Stitching wins when vendor APIs lack federation support, enforce strict schema-export policies, or span organizational boundaries with limited API governance. It gives fast integration and minimal lock-in. But past roughly five stitched services, delegation complexity and latency usually justify migrating to federation with explicit subgraph ownership. Treat stitching as transitional or permanent based on vendor maturity, compliance needs, and DX targets — done right, it delivers one GraphQL surface without forcing vendor cooperation.