JSON Schema Tutorial Part 3 – Design and Structure

JSON Schemas provide a method for validating JSON data, enhancing maintainability and clarity through reusability of subschemas defined under $defs and referenced by $ref. Effective schema design includes normalization of data, versioning strategies for changes, and adherence to best practices. This ensures backward compatibility and organized data management for complex structures.


JSON Schemas are powerful tools for validating JSON data, ensuring consistency and correctness. As your data structures grow in complexity, so too can your schemas. This is where reusability becomes paramount. By defining common patterns once and referencing them throughout your schema, you can significantly improve maintainability, reduce redundancy, and enhance the clarity of your schema definitions.


$defs and $ref

JSON Schema provides the $defs keyword (named definitions, without the $ prefix, before Draft 2019-09) to define reusable subschemas. You can then reference these subschemas using the $ref keyword. This mechanism is crucial for building modular and organized schemas.

Defining and Referencing Internal Subschemas

Let’s illustrate with an example. Imagine an AddressType that might appear in various parts of your data, like a billingAddress or a shippingAddress.

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Order Schema",
  "$defs": {
    "AddressType": {
      "type": "object",
      "properties": {
        "streetAddress": { "type": "string" },
        "city": { "type": "string" },
        "postalCode": { "type": "string" }
      },
      "required": ["streetAddress", "city", "postalCode"]
    },
    "ProductType": {
      "type": "object",
      "properties": {
        "productId": { "type": "string" },
        "name": { "type": "string" }
      },
      "required": ["productId", "name"]
    }
  },
  "type": "object",
  "properties": {
    "orderId": { "type": "string" },
    "billingAddress": { "$ref": "#/$defs/AddressType" },
    "shippingAddress": { "$ref": "#/$defs/AddressType" },
    "products": {
      "type": "array",
      "items": { "$ref": "#/$defs/ProductType" }
    }
  },
  "required": ["orderId", "billingAddress", "products"]
}

A JSON Schema using $defs to define reusable definitions

In this example:

  • The $defs property contains all our reusable definitions. In this example we define two reusable subschemas: AddressType and ProductType.
  • A $ref is a URI reference to a JSON subschema; the part after the # is a JSON Pointer into the target document. Although the referenced schema can be anywhere in the file, it's good practice to only reference schemas in the $defs section.
  • Both billingAddress and shippingAddress properties reference the AddressType using "$ref": "#/$defs/AddressType".
  • The values in the products array are validated against the ProductType sub-schema using "$ref": "#/$defs/ProductType".
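To make the JSON Pointer part of a $ref more concrete, here is a minimal Python sketch of how a same-document reference such as #/$defs/AddressType could be followed. The function name resolve_local_ref is hypothetical, the schema is a trimmed copy of the one above, and percent-decoding of the URI fragment is left out for brevity. A real validator does considerably more than this.

```python
import json

# Trimmed copy of the Order Schema above (only what this demo needs).
ORDER_SCHEMA = json.loads("""
{
  "$defs": {
    "AddressType": {
      "type": "object",
      "required": ["streetAddress", "city", "postalCode"]
    }
  },
  "properties": {
    "billingAddress": { "$ref": "#/$defs/AddressType" }
  }
}
""")


def resolve_local_ref(schema, ref):
    """Follow a same-document $ref such as '#/$defs/AddressType'."""
    if not ref.startswith("#"):
        raise ValueError("only same-document references are handled here")
    node = schema
    pointer = ref[1:]
    if pointer == "":
        return node  # '#' alone refers to the whole document
    for token in pointer.split("/")[1:]:
        # JSON Pointer escapes (RFC 6901): ~1 means '/', ~0 means '~'.
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


ref = ORDER_SCHEMA["properties"]["billingAddress"]["$ref"]
address_schema = resolve_local_ref(ORDER_SCHEMA, ref)
print(address_schema["required"])  # ['streetAddress', 'city', 'postalCode']
```

Each pointer segment is simply a key (or array index) to step into, which is why keeping reusable definitions together under $defs makes references short and predictable.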

Referencing External Subschemas

For even greater modularity, you can define your subschemas in separate files or at external URLs and reference them. This is particularly useful when you have subschemas common across multiple, independent JSON Schema files.

Imagine we have a separate file named Common.json containing our common $defs:

Common.json

{
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "title": "Common Definitions",
    "$defs": {
        "AddressType": {
            "type": "object",
            "properties": {
                "streetAddress": { "type": "string" },
                "city": { "type": "string" },
                "postalCode": { "type": "string" }
            },
            "required": [
                "streetAddress",
                "city",
                "postalCode"
            ]
        },
        "ProductType": {
            "type": "object",
            "properties": {
                "productId": { "type": "string" },
                "name": { "type": "string" }
            },
            "required": [
                "productId",
                "name"
            ]
        }
    }
}

Now, in our OrderSchema.json, we can reference this external file.

{
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "object",
    "title": "Order Schema",
    "properties": {
        "orderId": { "type": "string" },
        "billingAddress": {
            "$ref": "Common.json#/$defs/AddressType"
        },
        "shippingAddress": {
            "$ref": "Common.json#/$defs/AddressType"
        },
        "products": {
            "type": "array",
            "items": {
                "$ref": "Common.json#/$defs/ProductType"
            }
        }
    },
    "required": [
        "orderId",
        "billingAddress",
        "products"
    ]
}

Examples of $ref to external files

  • Common.json#/$defs/ProductType if the two files are in the same folder (either on a file system or a web server).
  • ../Common.json#/$defs/ProductType if Common.json is in the parent folder.
  • file:///C:/Temp/Common.json#/$defs/ProductType if Common.json is in the Windows folder C:\Temp.
  • file:///home/user/documents/Common.json#/$defs/ProductType if Common.json is in the Linux folder /home/user/documents.
  • http://www.server.com/schemas/Common.json#/$defs/ProductType if Common.json is at the URL http://www.server.com/schemas/Common.json.
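Relative references like the first two above are resolved against the URI of the schema that contains them. The Python sketch below uses the standard library's urljoin and urldefrag to show the idea. The base URI for OrderSchema.json is an assumption made for illustration, and resolve_ref is a hypothetical helper, not part of any validator's API.

```python
from urllib.parse import urljoin, urldefrag

# Assumed location of OrderSchema.json, for illustration only.
BASE_URI = "http://www.server.com/schemas/OrderSchema.json"


def resolve_ref(base_uri, ref):
    """Return (document URI, JSON Pointer fragment) for a $ref value."""
    absolute = urljoin(base_uri, ref)           # apply relative-reference rules
    document, fragment = urldefrag(absolute)    # split off the part after '#'
    return document, fragment


for ref in (
    "Common.json#/$defs/ProductType",
    "../Common.json#/$defs/ProductType",
):
    print(resolve_ref(BASE_URI, ref))
# ('http://www.server.com/schemas/Common.json', '/$defs/ProductType')
# ('http://www.server.com/Common.json', '/$defs/ProductType')
```

The document part tells the validator which file or URL to load, and the fragment is the JSON Pointer to follow inside it.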

Note on naming conventions: It is good practice to use a naming convention that distinguishes your reusable types. In this example the type definitions are in PascalCase and end with ‘Type’. Choose your own conventions, but use them consistently.


Normalizing Data to Avoid Repetition

While JSON Schemas help in defining data structures, the principle of normalization also applies to the actual JSON data itself. Avoiding repeated data within your JSON instances can lead to smaller file sizes, faster processing, and reduced chances of inconsistencies.

Consider a scenario where you have a list of products, and each product might have a category. Instead of embedding the full category details within each product object, you could normalize the data by having a separate list of categories and referencing them by an ID.

Unnormalized Data:

{
  "products": [
    {
      "productId": "prod-001",
      "name": "Laptop",
      "category": {
        "categoryId": "cat-001",
        "categoryName": "Electronics",
        "description": "Things that need electricity"        
      }
    },    
    {
      "productId": "prod-002",
      "name": "Socks",
      "category": {
        "categoryId": "cat-002",
        "categoryName": "Clothing",
        "description": "Things you wear"
      }
    },
    {
      "productId": "prod-003",
      "name": "Mouse",
      "category": {
        "categoryId": "cat-001",
        "categoryName": "Electronics",
        "description": "Things that need electricity"        
      }
    }
  ]
}

Normalized Data:

{
  "products": [
    {
      "productId": "prod-001",
      "name": "Laptop",
      "categoryId": "cat-001"
    },
    {
      "productId": "prod-002",
      "name": "Socks",
      "categoryId": "cat-002"
    },
    {
      "productId": "prod-003",
      "name": "Mouse",
      "categoryId": "cat-001"
    }
  ],
  "categories": [
    {
      "categoryId": "cat-001",
      "categoryName": "Electronics",
      "description": "Things that need electricity"        
    },    
    {
      "categoryId": "cat-002",
      "categoryName": "Clothing",
      "description": "Things you wear"
    }
  ]
}

While JSON Schema doesn’t enforce data normalization directly, designing your schemas to accommodate normalized data structures is a best practice. You would then use a schema to validate the structure of both products and categories. Unfortunately, JSON Schema cannot validate the referential integrity of the categoryId references, so this must be done within your application.
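As a rough illustration of that application-level check, the Python sketch below reports products whose categoryId does not match any entry in categories. The function name find_dangling_category_ids and the deliberately broken prod-004 entry are invented for this example.

```python
def find_dangling_category_ids(document):
    """Return productIds whose categoryId matches no entry in 'categories'."""
    known = {category["categoryId"] for category in document.get("categories", [])}
    return [
        product["productId"]
        for product in document.get("products", [])
        if product.get("categoryId") not in known
    ]


# Normalized data as above, plus one product with a bad reference.
data = {
    "products": [
        {"productId": "prod-001", "name": "Laptop", "categoryId": "cat-001"},
        {"productId": "prod-002", "name": "Socks", "categoryId": "cat-002"},
        {"productId": "prod-004", "name": "Desk", "categoryId": "cat-999"},
    ],
    "categories": [
        {"categoryId": "cat-001", "categoryName": "Electronics"},
        {"categoryId": "cat-002", "categoryName": "Clothing"},
    ],
}

print(find_dangling_category_ids(data))  # ['prod-004']
```

A check like this would typically run after schema validation has succeeded, so the code can rely on the expected structure being present.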


Versioning Strategies for JSON Schema and Data

As your applications evolve, so too will your data structures and their corresponding JSON Schemas. Implementing robust versioning strategies is crucial for managing changes, ensuring backward compatibility, and facilitating smooth transitions.

Benefits of Versioning:

  • Backward Compatibility: Allows older clients or systems to continue processing data validated by previous schema versions.
  • Controlled Evolution: Provides a structured way to introduce changes without breaking existing functionalities.
  • Clear Communication: Explicitly defines the expected data format for different versions, improving communication between data producers and consumers.
  • Reduced Risk: Minimizes the risk of deploying breaking changes to production systems.

Best Ways to Achieve Versioning:

  1. Schema $id and title with Version Numbers: Include a version number in the $id (URI identifier) and title of your schema. This makes it clear which version of the schema a document adheres to.

    {
      "$schema": "https://json-schema.org/draft/2020-12/schema",
      "$id": "https://example.com/schemas/product-v1.0.json",
      "title": "Product Schema - Version 1.0",
      "type": "object",
      "properties": {
        "productId": { "type": "string" }
      }
    }
    

    When you make a breaking change, such as adding a new required property, increment the major version number:

    {
      "$schema": "https://json-schema.org/draft/2020-12/schema",
      "$id": "https://example.com/schemas/product-v2.0.json",
      "title": "Product Schema - Version 2.0",
      "type": "object",
      "properties": {
        "productId": { "type": "string" },
        "description": { "type": "string" }
      },
      "required": ["description"]
    }
    
  2. Versioning in the JSON Data:

    Include a schemaVersion or similar property directly within your JSON data. This allows consumers to quickly identify which schema version the data conforms to.

    {
      "schemaVersion": "1.0",
      "productId": "prod-A123"
    }
    

    When new data conforms to a new schema version, update this field:

    {
      "schemaVersion": "2.0",
      "productId": "prod-B456",
      "description": "A detailed description."
    }
    

    Your application logic would then use this schemaVersion field to determine which schema to validate against and how to process the data.

  3. API Versioning: If your JSON data is exposed via an API, consider versioning your API endpoints (e.g., /v1/products, /v2/products). This is a common and effective way to manage schema evolution.

  4. Content Negotiation: For RESTful APIs, use the Accept header to specify the desired media type and version (e.g., application/vnd.example.product.v1+json).
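For the schemaVersion approach in item 2, the application needs a small dispatch step before validation. The Python sketch below shows one possible shape for it. The lookup table and the select_schema_id function are hypothetical; the $id values are the ones used in the examples above.

```python
# Maps the schemaVersion found in the data to the $id of the matching schema.
SCHEMA_IDS_BY_VERSION = {
    "1.0": "https://example.com/schemas/product-v1.0.json",
    "2.0": "https://example.com/schemas/product-v2.0.json",
}


def select_schema_id(instance):
    """Pick the schema $id to validate against, based on schemaVersion."""
    version = instance.get("schemaVersion")
    if version not in SCHEMA_IDS_BY_VERSION:
        # Reject unknown or missing versions rather than guessing.
        raise ValueError(f"unsupported schemaVersion: {version!r}")
    return SCHEMA_IDS_BY_VERSION[version]


document = {
    "schemaVersion": "2.0",
    "productId": "prod-B456",
    "description": "A detailed description.",
}
print(select_schema_id(document))
# https://example.com/schemas/product-v2.0.json
```

Failing fast on an unknown version keeps data written for a newer schema from being silently processed with older rules.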


6 Best Practice Tips for Designing JSON Schemas

  1. Use $defs for Reusability: Always define reusable subschemas under the $defs keyword. This promotes modularity, reduces redundancy, and makes your schemas easier to read and maintain. Give meaningful and consistent names to your subschemas (e.g., AddressType, ProductDetailsType).

  2. Be Specific with Types and Formats: Use the type keyword (e.g., string, number, integer, boolean, object, array, null) to define the expected data type. For strings, use the format keyword (e.g., uuid, email, date-time, uri) to describe common data patterns. Note that from Draft 2019-09 onward, format is treated as an annotation by default, so check that your validator is configured to assert formats if you rely on it for validation.

  3. Define Required Properties Clearly: Explicitly list all mandatory properties using the required keyword within an object schema. This ensures that essential data fields are always present.

  4. Add Descriptive Metadata: Use title and description to provide human-readable explanations of your schema and its properties. This documentation is invaluable for understanding the schema’s purpose and how to use it correctly, especially in complex schemas or when shared across teams.

  5. Include Version Information: Use the $id property in the JSON Schema to indicate the version of the schema. Use a property (e.g., schemaVersion) in your JSON instance documents to indicate the version of the schema they were created for.

  6. Validate External References Thoroughly: When referencing external schemas or subschemas via $ref, ensure that the referenced schemas are themselves valid and accessible. Implement testing to confirm that external references resolve correctly during validation. Be mindful of potential issues like broken links or changes in external schema definitions that could impact your schema’s validity.
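As a starting point for testing external references (tip 6), the Python sketch below walks a schema and lists the external documents it depends on, so a test can then check that each one is reachable and valid. The helper names iter_refs and external_documents are hypothetical, and the walk is deliberately naive: it treats any key named $ref as a reference, which would misfire on a data property that happens to be called $ref.

```python
from urllib.parse import urldefrag


def iter_refs(node, path="#"):
    """Yield (location, ref) for every $ref keyword found in the schema tree."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "$ref" and isinstance(value, str):
                yield path, value
            else:
                yield from iter_refs(value, f"{path}/{key}")
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from iter_refs(item, f"{path}/{index}")


def external_documents(schema):
    """Return the set of documents referenced outside the schema itself."""
    documents = {urldefrag(ref)[0] for _, ref in iter_refs(schema)}
    documents.discard("")  # an empty document part means a same-document $ref
    return documents


order_schema = {
    "type": "object",
    "properties": {
        "billingAddress": {"$ref": "Common.json#/$defs/AddressType"},
        "products": {
            "type": "array",
            "items": {"$ref": "Common.json#/$defs/ProductType"},
        },
    },
}

print(sorted(external_documents(order_schema)))  # ['Common.json']
```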
