Rate limiting and query complexity in federated GraphQL

Request-counting rate limits fail in federated GraphQL because one client operation fans out across many subgraphs — content, media, localization, commerce — and the computational cost compounds before any payload returns. You need two independent controls: deterministic complexity scoring at the router, and frequency-based rate limiting keyed per tenant. Without them, agency and Jamstack teams hit unpredictable TTFB spikes, falling cache hit ratios, and cascading gateway timeouts. Both are decided during Headless CMS Architecture & Platform Selection, where gateway configuration sets the performance ceiling and tenant isolation boundary.

The execution-cost problem

Unbounded complexity comes from resolver-depth multiplication plus uncoordinated pagination defaults. A single operation can trigger parallel execution across dozens of subgraphs; if each applies a default first: 50 or limit: 100, the aggregated result blows past payload thresholds. That shows up as router memory pressure, serialization latency, and eventual 504s.

Diagnosing it means tracing the execution plan before the query reaches CMS data stores. REST rate limiters measure HTTP request frequency, not computational weight: a lightweight introspection query might cost 5 units while a deeply nested content tree with unbounded lists costs 15,000. A limiter that treats the two identically at the network layer lets a handful of heavy queries exhaust resources while the tenant stays well under its request quota.
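To make the multiplication concrete, here is a minimal sketch of the worst-case arithmetic. The function name worstCaseNodes and the page sizes are illustrative assumptions, not figures from any particular CMS.

```typescript
// Worst-case number of nodes resolved when every list level returns a full page.
// pageSizes[i] is the page size applied at nesting depth i (outermost first).
function worstCaseNodes(pageSizes: number[]): number {
  let nodesAtLevel = 1; // parents at the current depth, starting from the root
  let total = 0;
  for (const size of pageSizes) {
    nodesAtLevel *= size; // each parent fans out into `size` children
    total += nodesAtLevel;
  }
  return total;
}

// Hypothetical query shapes: a flat list versus three nested lists.
const flat = worstCaseNodes([50]);            // 50
const nested = worstCaseNodes([50, 100, 50]); // 50 + 5,000 + 250,000 = 255,050
console.log({ flat, nested });
```

Both shapes arrive as a single HTTP request, which is why a request counter cannot tell them apart.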

The two controls act as independent gates an operation must clear before subgraphs execute:

flowchart TD
  Req["Incoming operation"] --> Build{"Build token?"}
  Build -->|"yes"| CScore["Complexity scoring"]
  Build -->|"no"| RL{"Within tenant rate limit?"}
  RL -->|"no"| R429["429 Too Many Requests"]
  RL -->|"yes"| CScore
  CScore --> Cgate{"Score under threshold?"}
  Cgate -->|"no"| Rej["Reject: COMPLEXITY_LIMIT_EXCEEDED"]
  Cgate -->|"yes"| Plan["Execute federated query plan"]
  Plan --> Subgraphs["Content / media / commerce subgraphs"]
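The same decision flow can be sketched as a pure function. This is an illustration under stated assumptions: the Operation shape, the admit name, and the RATE_LIMITED reason code are hypothetical, and the rate-limit check is injected so the sketch stays independent of any particular store.

```typescript
type Decision =
  | { action: 'execute' }
  | { action: 'reject'; reason: 'RATE_LIMITED' | 'COMPLEXITY_LIMIT_EXCEEDED' };

interface Operation {
  tenantId: string;
  isBuildToken: boolean; // true for machine-to-machine build credentials
  cost: number;          // complexity score computed from the parsed operation
}

interface GateDeps {
  maxCost: number;
  withinRateLimit: (tenantId: string) => boolean;
}

function admit(op: Operation, deps: GateDeps): Decision {
  // Build tokens skip the frequency gate but never the complexity gate.
  if (!op.isBuildToken && !deps.withinRateLimit(op.tenantId)) {
    return { action: 'reject', reason: 'RATE_LIMITED' }; // surfaced as HTTP 429
  }
  // A score equal to the threshold is allowed here; only higher scores are rejected.
  if (op.cost > deps.maxCost) {
    return { action: 'reject', reason: 'COMPLEXITY_LIMIT_EXCEEDED' };
  }
  return { action: 'execute' };
}
```

Keeping the two checks separate means either policy can be tuned without touching the other.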

Deterministic complexity scoring

Scoring starts at the routing layer: parse the AST, assign static and dynamic weights to field selections, and reject queries over a threshold before execution. Validation plugins evaluate field depth, connection multipliers, and custom scalar costs. The config below uses graphql-validation-complexity, which applies a static listFactor to every list field rather than reading pagination arguments, so the score is a fixed worst-case estimate. Multipliers driven by first or limit would need argument-aware cost logic on top.

TypeScript
import { createComplexityLimitRule } from 'graphql-validation-complexity';
import { GraphQLError } from 'graphql';

const MAX_COST = 1000;

const complexityRule = createComplexityLimitRule(MAX_COST, {
  scalarCost: 1,   // cost of each scalar field
  objectCost: 1,   // cost of each object field
  listFactor: 10,  // static multiplier applied at every list level
  onCost: (cost) => console.warn(`Query complexity score: ${cost}`),
  // createError receives the computed cost and the document node, not the limit
  createError: (cost, documentNode) => new GraphQLError(
    `Query complexity ${cost} exceeds maximum allowed ${MAX_COST}`,
    { nodes: [documentNode], extensions: { code: 'COMPLEXITY_LIMIT_EXCEEDED' } }
  ),
});

export const validationRules = [complexityRule];

This rejects expensive resolver trees before they materialize. The standardized error code lets frontends fall back gracefully or prompt editors to simplify a query. For gateway-level policy patterns, see Advanced GraphQL Federation Patterns.
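To illustrate how pagination arguments can feed a worst-case estimate, here is a library-independent sketch over a simplified selection tree. The FieldSelection type, the estimateCost function, and the default page size of 10 are assumptions made for the example; a real router would walk the parsed GraphQL AST and read list types from the schema.

```typescript
// Simplified stand-in for a parsed field selection.
interface FieldSelection {
  name: string;
  isList?: boolean;                // true when the field returns a list
  args?: Record<string, number>;   // numeric arguments such as first or limit
  selections?: FieldSelection[];
}

const PAGE_ARGS = ['first', 'last', 'limit', 'pageSize'];
const DEFAULT_PAGE_SIZE = 10; // assumed fallback for list fields without a page argument

function multiplier(field: FieldSelection): number {
  for (const arg of PAGE_ARGS) {
    const value = field.args?.[arg];
    if (value !== undefined) return value;
  }
  return field.isList ? DEFAULT_PAGE_SIZE : 1;
}

function estimateCost(field: FieldSelection): number {
  const children = field.selections ?? [];
  const childCost = children.reduce((sum, child) => sum + estimateCost(child), 0);
  // One unit for the field itself plus its subtree, repeated once per returned item.
  return multiplier(field) * (1 + childCost);
}

// articles(first: 50) { title related(limit: 100) { title } }
const query: FieldSelection = {
  name: 'articles', isList: true, args: { first: 50 },
  selections: [
    { name: 'title' },
    { name: 'related', isList: true, args: { limit: 100 }, selections: [{ name: 'title' }] },
  ],
};
console.log(estimateCost(query)); // 50 * (1 + 1 + 100 * 2) = 10100
```

Against a threshold of 1000 this query would be rejected, while the same selection with first: 5 and no nested list scores 10.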

Distributed rate limiting

Complexity analysis stops expensive queries; it does nothing against credential stuffing, rapid polling, or scraping. Rate limiting handles frequency per tenant, API key, or IP. In multi-tenant deployments, attach sliding-window counters to the gateway router, not individual subgraphs — centralizing the state avoids inconsistencies and keeps throttling accurate across boundaries.

YAML
# Illustrative per-tenant rate-limit policy for a gateway plugin or sidecar.
# The keys below are a sketch, not Apollo Router's built-in configuration schema.
rate_limit:
  - name: tenant_throttle
    source: header
    header_name: x-tenant-id
    algorithm: sliding_window
    window_size: 60s
    max_requests: 300
    redis:
      url: redis://cache.internal:6379
      key_prefix: "cms:ratelimit:"

When a tenant exceeds quota, return 429 Too Many Requests per RFC 6585, with a Retry-After header and a structured JSON body so frontend SDKs can back off exponentially without manual handling.
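A minimal in-memory sketch of the sliding window and the 429 payload follows. It stands in for the shared Redis-backed counter described above and would not be correct across multiple gateway instances. The class and function names, the RATE_LIMITED code, and the retryAfterSeconds field are assumptions for illustration.

```typescript
class SlidingWindowLimiter {
  private hits = new Map<string, number[]>(); // tenant -> admitted request timestamps (ms)
  private maxRequests: number;
  private windowMs: number;

  constructor(maxRequests: number, windowMs: number) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
  }

  // Returns 0 when the request is admitted, otherwise seconds until a slot frees up.
  check(tenantId: string, nowMs: number): number {
    const cutoff = nowMs - this.windowMs;
    const recent = (this.hits.get(tenantId) ?? []).filter((t) => t > cutoff);
    if (recent.length >= this.maxRequests) {
      this.hits.set(tenantId, recent);
      const oldest = recent[0] ?? nowMs;
      return Math.ceil((oldest + this.windowMs - nowMs) / 1000);
    }
    recent.push(nowMs);
    this.hits.set(tenantId, recent);
    return 0;
  }
}

// Shape of the throttled response: status, Retry-After header, GraphQL-style error body.
function tooManyRequests(retryAfterSeconds: number) {
  return {
    status: 429,
    headers: { 'Retry-After': String(retryAfterSeconds) },
    body: {
      errors: [{
        message: 'Rate limit exceeded for tenant',
        extensions: { code: 'RATE_LIMITED', retryAfterSeconds },
      }],
    },
  };
}

// Client side: exponential backoff that never retries sooner than the server asked.
function backoffDelayMs(attempt: number, retryAfterSeconds: number, baseMs = 500): number {
  return Math.max(retryAfterSeconds * 1000, baseMs * 2 ** attempt);
}
```

The backoff here is deterministic to keep the sketch easy to follow; production clients usually add random jitter so throttled tenants do not retry in lockstep.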

Build-time exemptions

Static site generation breaks naive limits: Jamstack builds fire hundreds of parallel GraphQL queries in CI, tripping false-positive rate limits and exhausting tenant quotas. Run a dual-tier policy — one for runtime client traffic, one for machine-to-machine build processes. Build tokens bypass the sliding window but still honor complexity caps so a runaway query can’t slip through. Add a cache-warming layer that materializes frequent queries into CDN edge nodes to cut origin load; paired with ISR, content updates propagate without overloading the gateway.

Monitoring and governance

Governance needs observability into both complexity scores and throttle events. Track avg_query_cost, throttle_rate, and cache_miss_ratio alongside standard DX metrics. Log rejected queries with their AST signatures so you can find high-cost resolvers, add DataLoader batching, or refactor unbounded lists into cursor-based pagination. That turns rate limiting from a defensive patch into an architecture control aligned with compliance boundaries.
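As a rough sketch of how those three metrics could be derived from gateway logs, the aggregation below works over a hypothetical GatewayEvent record. The event shape and the choice to compute cache_miss_ratio only over requests that were actually served are assumptions.

```typescript
interface GatewayEvent {
  cost: number;       // complexity score assigned at the router
  throttled: boolean; // true if the request was rejected with 429
  cacheHit: boolean;  // only meaningful when the request was served
}

interface GovernanceMetrics {
  avg_query_cost: number;
  throttle_rate: number;
  cache_miss_ratio: number;
}

function summarize(events: GatewayEvent[]): GovernanceMetrics {
  if (events.length === 0) {
    return { avg_query_cost: 0, throttle_rate: 0, cache_miss_ratio: 0 };
  }
  const totalCost = events.reduce((sum, e) => sum + e.cost, 0);
  const throttled = events.filter((e) => e.throttled).length;
  const served = events.filter((e) => !e.throttled);
  const misses = served.filter((e) => !e.cacheHit).length;
  return {
    avg_query_cost: totalCost / events.length,
    throttle_rate: throttled / events.length,
    // Throttled requests never reach the cache, so they are excluded here.
    cache_miss_ratio: served.length === 0 ? 0 : misses / served.length,
  };
}
```

Grouping the same aggregation by tenant or by query signature is one way to surface which resolvers drive the average cost.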

Conclusion

Federated rate limiting is execution-weight modeling, not request counting. Score complexity at the router, decouple frequency limits from computational cost, and exempt build traffic deliberately — that combination keeps latency predictable and tenant isolation intact as the frontend scales.