Skip to content

URL normalization

Overview

URL normalization ensures that different URL variations pointing to the same content share a single cache entry. Proper normalization reduces the number of renders and decreases cache storage usage.

Many query parameters serve only tracking or marketing purposes. Parameters like utm_source, utm_content, and fbclid do not affect page content and can be stripped before caching.

Other parameters affect content but their order does not. For example, ?sort=price&limit=10 and ?limit=10&sort=price return identical results. Edge Gateway automatically sorts query parameters alphabetically, ensuring both URLs generate the same cache key.

Normalization rules

Scheme

  • Converted to lowercase
  • URLs without scheme default to https://

Host

  • Converted to lowercase
  • Trailing dots removed (example.com.example.com)
  • Default ports removed (:80 for HTTP, :443 for HTTPS)

Path

  • Empty paths become /
  • Duplicate slashes collapsed (///)
  • Relative segments resolved (. and ..)
  • Trailing slashes preserved

Query parameters

  • Parameters sorted alphabetically by key
  • Multiple values for same key preserved in original order
  • Consistent URL encoding
  • Empty values handled (?key vs ?key=)

Fragments

  • Removed by default (content after #)

Non-ASCII characters

  • Percent-encoded using UTF-8 (e.g., α becomes %CE%B1)
  • Uppercase hex digits in encoding (e.g., %2f normalized to %2F)
  • Applies to paths, query parameter names, and values

Tracking parameter removal

Edge Gateway strips common tracking and analytics parameters by default. You can customize this list per host.

Default parameters

ParameterSource
utm_sourceGoogle Analytics
utm_contentGoogle Analytics
utm_mediumGoogle Analytics
utm_campaignGoogle Analytics
utm_termGoogle Analytics
gclidGoogle Ads
fbclidFacebook
msclkidMicrosoft Ads
_gaGoogle Analytics
_glGoogle Analytics
mc_cidMailchimp
mc_eidMailchimp
_keKlaviyo
refGeneric referrer
referrerGeneric referrer

Configuration

Use params to completely replace the built-in defaults, or params_add to extend them with additional patterns.

yaml
tracking_params:
  params:                       # Replaces all defaults
    - "utm_*"
    - "fbclid"
yaml
tracking_params:
  params_add:                   # Keep defaults, add custom
    - "affiliate_id"
    - "tracking_*"

Host-level configuration adds to global parameters:

yaml
hosts:
  - domain: example.com
    id: 1
    render_key: "secret-key"
    tracking_params:
      params_add:
        - "custom_param"
    render:
      timeout: 30s
      # ... other host settings

To disable tracking parameter stripping entirely:

yaml
tracking_params:
  strip: false
yaml
hosts:
  - domain: example.com
    id: 1
    render_key: "secret-key"
    tracking_params:
      strip: false
    # ...

URL pattern level configuration for specific paths:

yaml
hosts:
  - domain: example.com
    id: 1
    render_key: "secret-key"
    url_rules:
      - match: "/api/*"
        action: "bypass"
        tracking_params:
          strip: false              # Keep all params for API
      - match: "/campaigns/*"
        action: "render"
        tracking_params:
          params_add:
            - "campaign_*"          # Strip additional params

Configuration follows the standard hierarchy: Global → Host → URL pattern. Parameters accumulate down the hierarchy when using params_add. Use params at any level to discard all inherited parameters and start fresh with only the specified list.

Pattern types

TypeSyntaxExampleCase
ExactNo prefix"fbclid"Insensitive
Wildcard*"utm_*"Insensitive
Regexp~ prefix"~^gclid.*"Sensitive
Regexp~* prefix"~*^gclid.*"Insensitive

Cache key generation

After normalization, Edge Gateway generates a hash using XXHash64. This hash becomes part of the Redis cache key and HTML filename.

  • Algorithm: XXHash64 (fast, non-cryptographic hash)
  • Output format: 16-character lowercase hexadecimal (e.g., a1b2c3d4e5f67890)
  • Deterministic: Same normalized URL always produces same hash

Example transformation:

Input:  HTTPS://Example.Com:443/path//to/../page?z=1&a=2&utm_source=google#section
Output: https://example.com/path/page?a=2&z=1
Hash:   a1b2c3d4e5f67890

The hash is used in:

  • Redis key: cache:{host_id}:{dimension_id}:{hash}
  • Filename: {hash}_{dimension}.html

Normalization examples

Input URLNormalized URLNotes
HTTP://EXAMPLE.COM/Pagehttp://example.com/PageScheme/host lowercased, path case preserved
https://example.com:443/https://example.com/Default port removed
example.com/path//filehttps://example.com/path/fileScheme added, double slash collapsed
https://example.com/a/../bhttps://example.com/bParent reference resolved
https://example.com/page?b=2&a=1https://example.com/page?a=1&b=2Query params sorted
https://example.com/?a=2&b=1&a=1https://example.com/?a=2&a=1&b=1Multiple values preserved
https://example.com/page?utm_source=googlehttps://example.com/pageTracking param stripped
https://example.com/page?id=5&fbclid=abc123https://example.com/page?id=5fbclid stripped, id preserved
https://example.com/?gclid=xyz&ref=twitter&q=testhttps://example.com/?q=testMultiple tracking params stripped
/path?b=2&a=1ErrorMissing host