SJOT: Schemas for JSON Objects
==============================
by Robert van Engelen, September 28, 2016.
Updated November 15, 2017.
The [JSON schema](http://json-schema.org) draft was an important move forward
to make JSON more useful with APIs and other systems that require JSON content
validation.
However, working with JSON schema can be daunting and defeats the simplicity of
JSON.
We created a simpler alternative to JSON schema that is more compact and easier
to use. We call it *Schemas for JSON Objects* or simply *SJOT*.
SJOT aims at fast JSON validation and type-checking with lightweight schemas
and compact validators.
SJOT schemas are valid JSON, just like JSON schema. But SJOT schemas are faster,
more compact, and more intuitive. A SJOT schema of an object can be as simple
as a *JSON object template*. Because SJOT schemas have the look and feel of a
template, SJOT is easy to use.
Not convinced? Try a [live demo](get-sjot.html#demo) of SJOT and snapSJOT in
action.
SJOT by example {#example}
---------------
As a first example, say we have a JSON representation of a company product,
similar to the [json-schema.org example](http://json-schema.org/example1.html)
of a JSON schema (which is over 40 lines long!) that describes a company
product API. An example product in this API is:
[json]
{
"id": 1,
"name": "A green door",
"price": 12.50
}
The product properties `id`, `name`, and `price` are considered the bare
minimum properties of a product and should therefore be required. Other
products may contain optional `tags`, `dimensions`, and a `warehouseLocation`:
[json]
{
"id": 2,
"name": "An ice sculpture",
"price": 12.50,
"tags": ["cold", "ice"],
"dimensions": {
"length": 7.0,
"width": 12.0,
"height": 9.5
},
"warehouseLocation": {
"latitude": -78.75,
"longitude": 20.4
}
}
Let's give this a SJOT, pun intended *(comments are added for clarity and are not part of SJOT)*:
[json]
{
"@id": "http://example.com/product.json", ← identify this schema (this is optional!)
"@note": "A company product", ← describe what is defined
"product": { ← define a product object that has...
"id": "number", ← a required id number
"name": "string", ← a required name string
"price": "<0.0..", ← a required price in decimal greater than 0.0
"tags?": "string{1,}", ← an optional tags array of unique strings (a non-empty set)
"dimensions?": { ← optional dimensions, when provided has..
"length": "number", ← a required length numeric dimension
"width": "number", ← a required width numeric dimension
"height": "number" ← a required height numeric dimension
},
"warehouseLocation?": "http://example.com/geo.json#location"
← an optional warehouseLocation with a type defined by another SJOT
}
}
It's easy to see that property names ending in `?` are optional. Property
types are named (such as `"number"`) or `{}`-objects, `[]`-arrays (see later),
and references to other SJOT schemas, such as
`"http://example.com/geo.json#location"`.
Similar to the json-schema.org example, the `location` type of
`warehouseLocation` is defined in a separate SJOT schema:
[json]
{
"@id": "http://example.com/geo.json",
"location": { ← define location that has...
"latitude": "float", ← a required latitude single precision float
"longitude": "float" ← a required longitude single precision float
}
}
As a side note, if you don't want to write schemas at all, then consider
using the JS `snapSJOT.convert(data)` defined in the `snapSJOT` module of npm
package `snapsjot`. This converts JSON data and JS values to a SJOT schema,
see npm package [snapsjot](https://www.npmjs.com/package/snapsjot).
SJOT types include the basic JSON types `"string"`, `"number"`, `"boolean"`,
`"null"`, `"object"`, and `"array"` but also more specific types, such as
`"char[0,6]"`, `"float"`, integer ranges `"0..10"` and float ranges
`"0.0..10.0"`, and arrays of these, such as `"string[1,10]"` for an array of 1
to 10 strings and `"1..10[3][4]"` for an array of 4 arrays of 3 integers
between 1 and 10.
Object types are just the `{}`-brackets with members, as an "inline" style.
To create an array using the "inline style" (without requiring named types in
strings), simply use a pair of `[` `]` brackets to enclose the type. For
example, `[{"id":"number"}]` is an array of objects with numeric `id`
properties.
The json-schema.org example schema actually defines an array of products. In
SJOT, the product array type is referred to by a SJOT type reference
`"http://example.com/product.json#product[]"`. This reference uses an array
annotation and this suffices to describe and validate a JSON array of products.
Type referencing of the form *URI#name* is used to refer to a named type in
a schema, such as `http://example.com/geo.json#location` that references the
`location` object type defined in the `http://example.com/geo.json` schema.
As you can see, a SJOT type reference is very simple and clean. A type
reference string contains a `#` reference to a global type in a schema without
requiring deeper multi-hop paths (no JSON pointers or paths).
A reference to a type in the current schema (e.g. that has no `@id` attribute
property) is simply written as *#name* with an empty URI. A reference to the
root type in a schema is simply written as *URI#* and *#* for the root type of
the current schema.
Multiple schemas can be combined in a list of schemas, each schema with a
unique `@id`. Types can be referenced between these schemas and the schemas in
the array are used to validate JSON data. See the [examples](#examples) in
this article and check out our live [demo](get-sjot.html#demo) of SJOT in
action.
SJOT can be translated to JSON schema draft v4 without loss of details. See
the [SJOT to JSON schema converter](get-sjot.html#demo).
SJOT schema basics {#basics}
------------------
A SJOT schema is a dictionary with named types and a `@root` type:
[json]
{
"@root": type,
"SomeType": type,
"AnotherType": type,
...
}
Each type is either atomic (i.e. a primitive type), an object type, an array
type, a reference to a named type, or unions thereof to define alternate
choices of types. Types are explained in the next section.
The `@root` property indicates the root type of the JSON document to validate.
For example:
[json]
{ "@root": "string[0,999]" }
This schema validates JSON arrays of strings. This array can contain up to
999 items.
If the schema has only one type, then `@root` can be replaced by any name of
your choosing:
[json]
{ "mystrings": "string[0,999]" }
However, if the schema has multiple named types, then a `@root` is mandatory
to avoid ambiguity.
The following example defines a `@root` document type that refers to a `Person`
type using the `#Person` type reference, a `Name` string type and a `Person`
object with `firstname` and `lastname` properties:
[json]
{
"@id": "http://example.com/sjot.json",
"@root": "#Person",
"Name": "string",
"Person": {
"firstname": "#Name",
"lastname": "#Name"
}
}
A SJOT schema may optionally include an `@id` property to declare a *namespace
URI* to identify the schema. Using a URL to identify a schema can be useful
when external schemas must be loaded by a validator.
Note that the `firstname` and `lastname` of a `Person` object refer to a `Name`
instead of just a string. This is useful, because if we decide later to
restrict the string content of names then we only have to do this once, for
example by changing the `Name` type as follows:
[json]
{
...
"Name": "(\\w(\\w|\\s)*)",
...
}
where `\\w` matches a letter, digit, or underscore and `\\s` matches a whitespace character.
SJOT schema types {#types}
-----------------
SJOT has a list of built-in primitive types that are commonly used, besides
`"boolean"`, `"number"`, `"string"`, and `"null"`. Objects, arrays, sets,
tuples, and unions are simply defined in a SJOT schema using an inline style.
It only takes two tables to list all SJOT schema constructs. A SJOT type is one of:
[json]
"any"                   any type (wildcard)
"atom"                  any non-null primitive type (boolean, number, or string)
"boolean"               Boolean with value true or false
"true"                  fixed value true
"false"                 fixed value false
"byte"                  8-bit integer
"short"                 16-bit integer
"int"                   32-bit integer
"long"                  64-bit integer
"ubyte"                 8-bit unsigned integer
"ushort"                16-bit unsigned integer
"uint"                  32-bit unsigned integer
"ulong"                 64-bit unsigned integer
"integer"               integer (unconstrained)
"float"                 single precision decimal
"double"                double precision decimal
"number"                decimal number (unconstrained)
"n..m"                  inclusive numeric range (n, m are optional integer/decimal values)
"<n..m>"                exclusive numeric range (n, m are optional integer/decimal values)
"n,m..k,l"              numeric enumeration with ranges (choice of integer/decimal values)
"string"                string
"base64"                string with base64 content
"hex"                   string with hexadecimal content
"uuid"                  string with UUID content, optionally starting with urn:uuid:
"date"                  string with RFC 3339 date YYYY-MM-DD
"time"                  string with RFC 3339 time and optional time zone HH:MM:SS[.s][[+|-]HH:MM|Z]
"datetime"              string with RFC 3339 datetime and optional time zone
"duration"              string with ISO-8601 duration PnYnMnDTnHnMnS
"char"                  string with a single character (ASCII, Unicode, UTF-8, etc.)
"char[n,m]"             string of n to m characters (n, m are optional)
"(regex)"               string that matches the regex
"type[]"                array of typed values, shorthand for [ type ]
"type[n,m]"             array of n to m typed values, shorthand for [ n, type, m ]
"type{}"                set of atoms (array of unique atoms)
"type{n,m}"             set of n to m atoms (n, m are optional)
"#name"                 reference to a named type in the current schema
"URI#name"              reference to a named type in schema "@id": "URI"
"object"                object, same as {}
"array"                 array, same as []
"null"                  fixed value null
[ type ]                array of typed values
[ n, type, m ]          array of n to m typed values (n, m, type are optional)
[ type, ..., type ]     tuple of typed values
[[ type, ..., type ]]   union (choice) of types
{ "name": type, ... }   object with typed properties
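The numeric entries of this table (inclusive `"n..m"` ranges, exclusive bounds marked with `<` and `>` as in `"<0.0.."`, and comma-separated enumerations) lend themselves to a small checker. The Python sketch below is only an illustration, not the SJOT validator: the function names are hypothetical, and it assumes that a range whose bounds are written without a decimal point admits only integer values.

```python
def _bound(text):
    """Parse one bound; returns (number, written_as_integer)."""
    is_int = not any(c in text for c in ".eE")
    return (int(text) if is_int else float(text)), is_int


def _in_item(item, value):
    """Check value against a constant "n" or a range "n..m" / "<n..m>"."""
    item = item.strip()
    lo_open, hi_open = item.startswith("<"), item.endswith(">")
    body = item[1:] if lo_open else item
    body = body[:-1] if hi_open else body
    if ".." not in body:
        return value == _bound(body)[0]
    lo_text, _, hi_text = body.partition("..")
    integral = True
    if lo_text:
        lo, is_int = _bound(lo_text)
        integral = integral and is_int
        if value < lo or (lo_open and value == lo):
            return False
    if hi_text:
        hi, is_int = _bound(hi_text)
        integral = integral and is_int
        if value > hi or (hi_open and value == hi):
            return False
    # Assumption: integer bounds imply that only integer values are accepted.
    return value == int(value) if integral else True


def check_numeric(spec, value):
    """True if value satisfies a range or enumeration such as "4,6,8..10"."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    return any(_in_item(item, value) for item in spec.split(","))


print(check_numeric("<0.0..", 12.5))   # True
print(check_numeric("<0.0..", 0.0))    # False
print(check_numeric("4,6,8..10", 9))   # True
```

An omitted bound is simply skipped, which is how `"1.."` and `"<0.0.."` stay open-ended on one side.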
The property names of object types can be annotated to make them optional or
match a pattern:
[json]
"name" property is required
"name?" property is optional
"name?value" property with a default value (primitive types only!)
"(regex)" property name(s) that match the regex
If the character `?` is to be part of a property name, then we write it as a
regex `(who\\?)`, with a double backslash to escape the `?` (a single backslash
will be removed by most JSON parsers). Likewise, if a property name starts
with a `(` then we write it as a regex.
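As an illustration of these annotations, the following Python sketch splits a property key into its parts. The function name and the returned tuple layout are hypothetical, and the default value is returned as raw text because converting it depends on the property type, which this sketch does not model.

```python
def parse_property_key(key):
    """Split a SJOT object-type key into (kind, name, optional, default)."""
    if key.startswith("("):
        # "(regex)": a pattern over property names, so "?" is not an
        # annotation here; optionality is not modeled for patterns.
        return ("regex", key[1:-1], None, None)
    name, mark, default = key.partition("?")
    if not mark:
        return ("name", name, False, None)        # "name": required
    return ("name", name, True, default or None)  # "name?" or "name?value"


print(parse_property_key("id"))         # ('name', 'id', False, None)
print(parse_property_key("tags?"))      # ('name', 'tags', True, None)
print(parse_property_key("counter?1"))  # ('name', 'counter', True, '1')
```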
### Objects with required, optional, and default properties
An example object type with a required, optional, and default property is:
[json]
{
"Widget": { ← a widget has...
"id": "string", ← a required id
"tags?": "string{1,}", ← an optional non-empty array of unique string tags
"counter?1": "ulong" ← an optional counter with default value 1
}
}
To disallow additional properties, add the `"@final": true` attribute property.
To permit optional properties to occur depending on other optional properties,
see the SJOT [dependencies](#deps) described further below.
An object with any properties is `"object"` or just `{}`. An empty object that
does not permit any properties is `{ "@final": true }`.
### Regex properties and values
Regex anchoring with `^` and `$` is unnecessary (JSON and SJOT are language and
regex library neutral: regex patterns match entire strings). For example,
this dictionary object maps words to words:
[json]
{ "(\\w+)": "(\\w+)" }
To match strings partially, simply use a `.*` at the ends of the regex.
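The whole-string rule can be illustrated with Python's `re.fullmatch`. This is only a sketch: the `matches` helper is hypothetical, and Python's regex dialect is used here as a stand-in for whatever regex library a SJOT validator relies on.

```python
import re


def matches(sjot_regex, text):
    """Whole-string match for a SJOT "(regex)" type such as "(\\w+)"."""
    pattern = sjot_regex[1:-1]  # strip the enclosing parentheses
    return re.fullmatch(pattern, text) is not None


print(matches(r"(\w+)", "door"))               # True
print(matches(r"(\w+)", "green door"))         # False: \w does not match the space
print(matches(r"(.*door.*)", "a green door"))  # True: .* allows a partial match
```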
Additional types with constraints can be easily added to a SJOT schema, for
example the ISO 6709 Annex H latitude and longitude type values (see the Google
[JSON Style Guide](https://google.github.io/styleguide/jsoncstyleguide.xml?showone=Latitude/Longitude_Property_Values#Latitude/Longitude_Property_Values)):
[json]
{
"@id": "http://example.com/iso-6709.json",
"@note": "ISO 6709 Annex H latitude and longitude location",
"LatLon": "([+- ]\\d{2}(\\.\\d+)?[+- ]\\d{3}(\\.\\d+)?)"
}
Special string types such as ID, URI, email, hostname, and so on can be easily
defined with a regex and put in a schema for reuse.
### Tuples
A tuple is a fixed-length list of values, such as `[ "point", true ]`, which is
defined by the tuple type:
[json]
[ "string", "boolean" ]
### Arrays and sets
Arrays of named types are simply defined by `"type[]"` without bounds and
`"type[n,m]"` with bounds. The lower and upper bounds are optional, so
`"type[n,]"` and `"type[,m]"` can be used. Use `"type[n]"` for a fixed-size
array.
The inline style for arrays is `[type]` without bounds and `[n, type, m]` with
bounds, where `n` and `m` are non-negative integers. The lower and upper
bounds are optional, so `[n, type]` and `[type, m]` can be used. The `type` is
also optional and is `"any"` when omitted. Thus, `[]` is an array of any type
with any length, `[0]` is an empty array, `[2]` is an array with two items, and
`[1,3]` is an array of one to three items of any type.
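The inline bound forms above can be normalized into a (minimum, type, maximum) triple. The Python sketch below is one possible reading of these rules and not the SJOT implementation; `array_bounds` is a hypothetical name, and tuples and unions are rejected rather than handled.

```python
def array_bounds(spec):
    """Return (min_items, item_type, max_items); max_items None means unbounded."""
    def is_count(x):
        return isinstance(x, int) and not isinstance(x, bool)

    types = [x for x in spec if not is_count(x)]
    if len(types) > 1:
        raise ValueError("several types: a tuple, not an array")
    if types and isinstance(types[0], list) and len(spec) == 1:
        raise ValueError("[[ ... ]] is a union, not an array")
    if not types:
        if len(spec) == 0:
            return (0, "any", None)           # []: any type, any length
        if len(spec) == 1:
            return (spec[0], "any", spec[0])  # [n]: exactly n items
        if len(spec) == 2:
            return (spec[0], "any", spec[1])  # [n, m]
        raise ValueError("too many bounds")
    at = next(i for i, x in enumerate(spec) if not is_count(x))
    before, after = spec[:at], spec[at + 1:]
    if len(before) > 1 or len(after) > 1:
        raise ValueError("at most one bound on each side of the type")
    return (before[0] if before else 0, types[0], after[0] if after else None)


print(array_bounds([1, "string", 10]))  # (1, 'string', 10)
print(array_bounds([2]))                # (2, 'any', 2)
```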
For example, extending the Widget object type example to include an array of
quantity-price objects:
[json]
{
"Widget": { ← a widget has...
"id": "string", ← a required id
"tags?": "string{1,}", ← an optional non-empty array of unique string tags
"counter?1": "ulong", ← an optional counter with default value 1
"pricing?": [ ← an optional array of quantity-price objects
{
"quantity": "1..", ← quantity
"price": "<0.0.." ← price per quantity
}
]
}
}
Sets of named types `"type{}"` without bounds and `"type{n,m}"` with bounds are
essentially arrays of atomic values that are unique. The lower and upper
bounds are optional.
Uniqueness of atomic values is well defined. By contrast, object equality is
often semantic instead of structural. That is, two objects may still be
considered equivalent when structurally different, such as when extra
properties are to be ignored. Therefore, SJOT does not admit sets of
non-atomic values. This requirement makes sorting stable and validation of
sets (with sorting) fast.
### Enumerations
To enumerate numbers for a numeric type, use constants and ranges:
[json]
"Composite": "4,6,8..10,12,14..16"
To enumerate strings, use regex alternations:
[json]
"Color": "(RED|GREEN|YELLOW|BLUE)"
Enumerations of mixed types are modeled with a union:
[json]
"TrueOrColorOrByte": [[ "true", "(RED|GREEN|YELLOW|BLUE)", "byte" ]]
### Unions
A union of types describes the range of possible types that a value may have.
For example, this union represents a string or a number value:
[json]
[[ "string", "number" ]]
Array types and object types in the union must be *distinct*. Objects are
distinct if they do not share properties. For example, the following union has
two distinct object types:
[json]
[[ { "a": "number" }, { "b": "string" } ]]
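The distinctness rule for objects can be checked mechanically. The Python sketch below assumes the union is passed as the inner list of the `[[ ... ]]` notation; the helper names are hypothetical and regex property names are not handled.

```python
from itertools import combinations


def property_names(object_type):
    """Property names of an inline object type, without "?" annotations."""
    return {key.partition("?")[0] for key in object_type
            if not key.startswith("@")}


def objects_are_distinct(union):
    """True if no two object types in the union share a property name."""
    objects = [t for t in union if isinstance(t, dict)]
    return all(property_names(a).isdisjoint(property_names(b))
               for a, b in combinations(objects, 2))


print(objects_are_distinct([{"a": "number"}, {"b": "string"}]))  # True
print(objects_are_distinct([{"a": "string", "b": "number"},
                            {"b": "string"}]))                   # False
```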
To combine objects that are not distinct in a union, you should define new
objects that use a new outer property name that acts as a unique tag:
[json]
[[
{ "t1": { "a": "string", "b": "number" } },
{ "t2": { "b": "string" } }
]]
Why is this recommended? The goal of SJOT is to make validation fast and
scalable with predictable validation times, similar to XML schema validators
for XML data bindings. Therefore, the SJOT validator must be able to
determine the type of the value efficiently among the choices in the union,
*using constant algorithmic complexity*. By contrast, JSON schema's "oneOf"
and "anyOf" are not always efficient because the validator may have to revisit
the data multiple times.
This recommendation also enhances readability of the JSON data by design.
Consider a counterexample where we have a choice of two objects that are not distinct:
[json]
{ "data": [ a long array of objects ], "id": 456 }
and
[json]
{ "data": [ a long array of objects ], "date": "01-01-2017" }
Since both objects have a `data` array, they overlap. By just looking at the
JSON text, one has to search past the array to find the potentially
distinguishing properties. This is not acceptable from a performance point of
view. A compounding problem is that JSON does not require properties to be
ordered in any way, so a fast object identification check cannot be
guaranteed.
A tag is needed to distinguish these objects properly, making them immediately
recognizable and distinct:
[json]
{ "locations": { "data": [ a long array of objects ], "id": 456 } }
and
[json]
{ "invoices": { "data": [ a long array of objects ], "date": "01-01-2017" } }
Arrays in a union are distinct if the item types of the arrays are distinct.
This takes care of notorious problems with JSON schema when using "oneOf"
instead of "anyOf" for type choices. A "oneOf" over *M* arrays of length *N*
may require *M* x *N* time to validate while SJOT takes at most *M*+*N* time.
Worse, validation with this JSON schema "oneOf" fails for an empty array
because it matches all arrays in the "oneOf" (surprise!).
You may have guessed by now that a union is a smart combination of "oneOf" and
"anyOf". The validator applies "anyOf" semantics for efficiency, but the
restriction on distinct types essentially forces "oneOf" semantics by avoiding
ambiguity.
Finally, unions should not be nested, either directly or indirectly via a type
reference to another union or array of unions.
### Type references
To refer to a named type we use a SJOT type reference of the form *URI#name* or
*#name*. The first form refers to the named type in the schema identified by
its `@id` URI value and the second form refers to a named type in the current
schema. If the reference is to the `@root` type then we use *URI#* and just
*#*, respectively.
For example, a linked list of numbers can be very compactly defined as:
[json]
{ "@root": { "value": "number", "next?": "#" } }
Spaghetti references are not allowed: a type reference must refer to a type and
that type cannot directly be another referenced type.
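Because a reference is just a URI, a `#`, and a top-level name, resolving one takes a single dictionary lookup. The Python sketch below illustrates this under a few assumptions: `root_type` and `resolve` are hypothetical names, schemas are plain dictionaries, external schemas are already loaded into a list, and array suffixes such as `#product[]` are not handled.

```python
def root_type(schema):
    """The "@root" type, or the only named type when the schema has just one."""
    if "@root" in schema:
        return schema["@root"]
    names = [key for key in schema if not key.startswith("@")]
    if len(names) != 1:
        raise ValueError("a schema with several named types needs a @root")
    return schema[names[0]]


def resolve(ref, current, schemas):
    """Resolve "URI#name", "#name", "URI#" or "#" to the type it refers to."""
    uri, _, name = ref.partition("#")
    schema = current
    if uri:
        found = [s for s in schemas if s.get("@id") == uri]
        if not found:
            raise KeyError(f"no schema with @id {uri}")
        schema = found[0]
    return schema[name] if name else root_type(schema)


linked_list = {"@root": {"value": "number", "next?": "#"}}
node = resolve("#", linked_list, [])
print(resolve(node["next?"], linked_list, []) is node)  # True
```

The last line shows why the linked list definition is so compact: the `next?` property resolves back to the very same root type.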
SJOT in JSON {#embed}
------------
A SJOT schema can be embedded within a JSON object by using the `@sjot`
property. The embedded schema describes and validates that object. For
example:
[json]
{
"@sjot": {
"Person": {
"@note": "Person with a first name and a last name",
"firstname": "string",
"lastname": "string"
}
},
"firstname": "Jason",
"lastname": "Bourne"
}
When embedded, the SJOT schema should have only one type or define a `@root`
object type (if several types are defined) that defines the JSON document
content. In this example the `Person` object type describes the content. The
JSON content is valid because it includes the required `firstname` and
`lastname` properties of a `Person` object type.
An embedded SJOT may refer to an external schema's root using `URL#`. For
example, the same object above with a schema reference:
[json]
{
"@sjot": "http://example.com/sjot.json#",
"firstname": "Jason",
"lastname": "Bourne"
}
The `@sjot` URL points to a SJOT schema that has a `Person` object type as the
root, such as the SJOT schema that we [described earlier](#basics) in this
article.
An embedded SJOT may refer to a specific type in a schema:
[json]
{
"@sjot": "http://example.com/sjot.json#Person",
"firstname": "Jason",
"lastname": "Bourne"
}
When you invoke the validator with a specific type and schema, then only that
type and schema are used to validate the data. Use `null` as a type when
invoking the validator to permit an embedded `@sjot` to override the type.
A `@sjot` in a JSON object may occur anywhere in JSON, not just in the
root-level object.
A `@sjot` may contain an array of schemas, each identified with a unique `@id`.
SJOT attribute properties {#props}
-------------------------
A `@sjot` attribute property of an object in JSON contains an embedded SJOT
that defines the JSON object. An embedded `@sjot` value can be a type
reference to a SJOT schema. If multiple types are defined in the embedded SJOT
schema, the type that defines the JSON object should be named `@root`.
An `@id` attribute property in a SJOT schema identifies the schema by a URI
namespace string.
A `@note` attribute property can be added to a SJOT schema and to the object
types that the schema defines. The `@note` value should be a string.
A `@root` attribute property refers to the root type of the schema. An
embedded SJOT should have a `@root` attribute property or the schema should
define only one type.
A `@one`, `@any`, `@all`, or `@dep` attribute property of an object type in a
SJOT schema restricts the use of optional object properties. See the SJOT
[dependencies](#deps) described further below.
An `@extends` attribute property of an object type in a SJOT schema introduces
a derived object type. A derived object type includes the properties of a base
object type. We will discuss the use of base and derived object types below.
A `@final` attribute property declares an object type final and it cannot be
extended. Also extra properties for this object in JSON are not permitted.
SJOT base and derived object types {#extend}
----------------------------------
You can extend a base object by adding properties to define a derived object.
The `@extends` attribute property in an object type refers to a base object
type that is extended. For example:
[json]
{
"@id": "http://www.example.com/sjot.json",
"@note": "Schema to store personal information",
"Person": {
"@note": "Person with a first name and a last name",
"firstname": "string",
"lastname": "string"
},
"PersonDetails": {
"@note": "Person with optional age and gender",
"@extends": "http://www.example.com/sjot.json#Person",
"age?": "0..",
"gender?": "(MALE|FEMALE)"
}
}
The `age?` property is optional and has a non-negative integer value. The
`gender?` property is optional and has one of the two string values `MALE` or
`FEMALE`.
When creating derived object types, it is not permitted to override the base
properties. Only new properties can be added that are not already in the base
object type to create a derived object type.
This ensures that a derived object can be used in place of a base object in
JSON and will pass validation by ignoring the extra properties in the derived
object. This permits upgrading of a JSON API with backward compatibility to a
base API.
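The import of base properties and the no-override rule can be sketched as follows. This Python fragment is an illustration under stated assumptions rather than the validator's actual procedure: `import_base` is a hypothetical name, both object types are already resolved to dictionaries, and attribute properties of the base (such as `@note`) are not carried over.

```python
def import_base(base_type, derived_type):
    """Return derived_type with the properties of base_type added."""
    if base_type.get("@final"):
        raise ValueError("a @final object type cannot be extended")

    def bare(key):
        return key.partition("?")[0]  # drop the "?" annotation and any default

    merged = {k: v for k, v in base_type.items() if not k.startswith("@")}
    inherited = {bare(k) for k in merged}
    for key, value in derived_type.items():
        if key == "@extends":
            continue
        if not key.startswith("@") and bare(key) in inherited:
            raise ValueError(f"cannot override base property '{bare(key)}'")
        merged[key] = value
    return merged


person = {"firstname": "string", "lastname": "string"}
details = {"@extends": "http://www.example.com/sjot.json#Person",
           "age?": "0..", "gender?": "(MALE|FEMALE)"}
print(sorted(import_base(person, details)))
# ['age?', 'firstname', 'gender?', 'lastname']
```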
A derived object type can change a base property from optional to required by
using a `@one` singleton propset with that property name.
SJOT final object types {#final}
----------------------
A `@final` object cannot have any extra properties that are not defined in the
schema. Consider the `PersonDetails` example from the previous section but now
declared `@final`:
[json]
{
"PersonDetails": {
"@note": "Person with optional age and gender",
"@extends": "http://www.example.com/sjot.json#Person",
"@final": true,
"age?": "0..",
"gender?": "(MALE|FEMALE)"
}
}
Additional properties that are used in a JSON `PersonDetails` object will cause
the validator to reject this JSON content.
SJOT any, one, and all dependencies {#deps}
-----------------------------------
When object type properties are optional, you can make their use dependent on
the presence of other properties in the object. You can enforce exactly one
property of a set of properties to be present, or at least one property of a
set to be present, or all properties of a group to be present or none of that
group. More specific property dependencies can be enforced as well.
### SJOT one
The SJOT `@one` attribute property of an object type is a list of sets of
object property names. Each property set defines the properties that should be
exclusive, meaning exactly one of the properties must be present.
For example, the `choices` object type defined below has one of the properties
`a`, `b`, or `c`, and one of the properties `x` or `y`:
[json]
{
"choices": {
"a?": "int",
"b?": "int",
"c?": "int",
"x?": "float",
"y?": "float",
"@one": [
[ "a", "b", "c" ],
[ "x", "y" ]
]
}
}
The property sets in the `@one` list should be mutually disjoint and only refer
to properties that are optional (without default values) in the schema.
### SJOT any
The SJOT `@any` attribute property of an object type is a list of sets of
object property names. Each property set defines the properties of which one
or more should be used in this object.
For example, the `anyabc` object type defined below must have at least one of
the properties `a`, `b`, and `c` and therefore cannot be empty:
[json]
{
"anyabc": {
"a?": "int",
"b?": "int",
"c?": "int",
"@any": [
[ "a", "b", "c" ]
]
}
}
The property sets in the `@any` list should be mutually disjoint and only refer
to properties that are optional (without default values) in the schema.
### SJOT all
The SJOT `@all` attribute property of an object type is a list of sets of
object property names. Each property set defines which properties should all
be included when at least one of them is used, meaning that all properties
should be present or none of them at all.
For example, the `allornone` object type defined below must have both of the
properties `x` and `y` or none of them:
[json]
{
"allornone": {
"x?": "int",
"y?": "int",
"@all": [
[ "x", "y" ]
]
}
}
The property sets in the `@all` list should be mutually disjoint and only refer
to properties that are optional (without default values) in the schema.
### SJOT dep
The SJOT `@dep` attribute property of an object type enforces properties to be
present when a specific property is present.
For example, the `ifxthenyz` object type defined below must have properties `y`
and `z` if property `x` is present:
[json]
{
"ifxthenyz": {
"x?": "int",
"y?": "int",
"z?": "int",
"@dep": {
"x": [ "y", "z" ]
}
}
}
To simplify this notation, if a property list has only one property, the
property name can be directly used instead of the singleton list.
The property sets in each `@dep` list should only refer to properties that are
optional (without default values) in the schema.
Note that the `@all` attribute property enforces the *N* dependencies for a
group of *N* properties that are all dependent on each other.
SJOT validation {#validation}
---------------
Validation proceeds recursively over objects, arrays, and tuples.
Primitive values (atoms) are verified against the value type constraints that
are imposed on a value by using the type information in the SJOT schema.
The property names of an object are matched against the property names of a
SJOT object type. For each matching property name the value is recursively
validated.
If a property is required but is absent, validation fails.
If a property is optional and is absent or its value is `null`, validation
succeeds, meaning that `null` is equivalent to absent for optional properties.
In this case the `null` property can be deleted by the validator.
If an optional property has a default value and is absent or its value is
`null`, the default value is assumed and the default value can be assigned to
this property by the validator.
The `@one`, `@any`, `@all`, and `@dep` constraints on object properties are
enforced. For the `@one` constraints, exactly one property must occur for each
property set specified. For the `@any` constraints, at least one of the
properties must occur for each property set specified. For the `@all`
constraints, all or none of the properties must occur for each property set
specified. For the `@dep` constraints, if an optional property is present then
the properties in the specified property set must all be present.
Extra properties of an object are ignored unless the object type is `@final`.
Validation fails when extra properties are present in a final object.
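The object rules above (required properties, `null` as absent, the four property-set constraints, and `@final`) fit in a short function. The Python sketch below is a simplified illustration and not the SJOT validator: `check_object` is a hypothetical name, property values are not type-checked, and regex property names are not handled.

```python
def present(obj, name):
    """A property counts as present only if it exists and is not null."""
    return obj.get(name) is not None


def check_object(obj, object_type):
    """Return error messages for one object; an empty list means it passes."""
    errors = []
    declared = set()
    for key in object_type:
        if key.startswith("@"):
            continue  # attribute properties such as @one or @final
        name, optional, _default = key.partition("?")
        declared.add(name)
        if not optional and name not in obj:
            errors.append(f"missing required property '{name}'")

    for props in object_type.get("@one", []):
        if sum(present(obj, p) for p in props) != 1:
            errors.append(f"@one: exactly one of {props} must be present")
    for props in object_type.get("@any", []):
        if not any(present(obj, p) for p in props):
            errors.append(f"@any: at least one of {props} must be present")
    for props in object_type.get("@all", []):
        if sum(present(obj, p) for p in props) not in (0, len(props)):
            errors.append(f"@all: all or none of {props} must be present")
    for prop, needed in object_type.get("@dep", {}).items():
        needed = [needed] if isinstance(needed, str) else needed
        if present(obj, prop) and not all(present(obj, n) for n in needed):
            errors.append(f"@dep: '{prop}' requires {needed}")

    if object_type.get("@final"):
        for name in obj:
            if name not in declared:
                errors.append(f"extra property '{name}' in a final object")
    return errors


vehicle = {"color?": "(WHITE|GRAY|BLACK)", "rgb?": "([0-9a-fA-F]{6})",
           "make": "string", "year?": "1970..", "@one": [["color", "rgb"]]}
print(check_object({"rgb": "D71E1E", "make": "Honda", "year": 2006}, vehicle))  # []
```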
An array is validated by checking constraints on its length and the uniqueness
of atomic items in case of a set. In case of a set of atoms `atom{}`, it is
assumed that integers and floating point values are compared based on their
mathematical value, not their type. So a set cannot contain both 0 and 0.0.
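A uniqueness check along these lines might look as follows in Python. This is a sketch with a hypothetical function name; it assumes the JSON has already been parsed, and it keys booleans separately because Python would otherwise treat `True` and `1` as equal.

```python
def is_set_of_atoms(items):
    """True if items are atoms (boolean, number, string) with no duplicates."""
    seen = set()
    for item in items:
        if isinstance(item, bool):
            key = ("boolean", item)
        elif isinstance(item, (int, float)):
            key = ("number", item)  # 0 and 0.0 compare and hash as equal
        elif isinstance(item, str):
            key = ("string", item)
        else:
            return False  # null, objects, and arrays are not atoms
        if key in seen:
            return False
        seen.add(key)
    return True


print(is_set_of_atoms(["cold", "ice"]))  # True
print(is_set_of_atoms([0, 0.0]))         # False
```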
A `null` value in an array is converted when validated against a primitive
type. The result is `false` for Boolean, `0` for numeric types, and `""` for
string types. An array of objects, arrays, or tuples cannot contain `null`
values; a `null` item in such an array triggers a validation error.
A tuple is validated by validating its members, with the same validation rule
for `null` as for arrays stated above. Tuple sizes are fixed. Validation
fails when tuples are not of the correct size.
An object that is validated against the types `any` or `object` is validated
using its embedded `@sjot` schema, when present.
SJOT examples {#examples}
-------------
### Vehicle data with embedded schema
[json]
{
"@sjot": {
"vehicle": {
"color?": "(WHITE|GRAY|BLACK)",
"rgb?": "([0-9a-fA-F]{6})",
"make": "string",
"year?": "1970..",
"@one": [
[ "color", "rgb" ]
]
}
},
"rgb": "D71E1E",
"make": "Honda",
"year": 2006
}
### Product catalog with embedded schemas
[json]
{
"@sjot": [
{
"@id": "http://example.com/product.json",
"@note": "Company product catalog",
"@root": {
"products": "http://example.com/product.json#product[]"
},
"product": {
"@note": "A company product",
"id": "number",
"name": "string",
"price": "<0.0..",
"tags?": "string{1,}",
"dimensions?": {
"length": "number",
"width": "number",
"height": "number"
},
"warehouseLocation?": "http://example.com/geo.json#location"
}
},
{
"@id": "http://example.com/geo.json",
"location": {
"latitude": "float",
"longitude": "float"
}
}
],
"products": [
{
"id": 1,
"name": "A green door",
"price": 12.50
},
{
"id": 2,
"name": "An ice sculpture",
"price": 12.50,
"tags": ["cold", "ice"],
"dimensions": {
"length": 7.0,
"width": 12.0,
"height": 9.5
},
"warehouseLocation": {
"latitude": -78.75,
"longitude": 20.4
}
}
]
}
SJOT chameleon objects: trick or treat? {#trick}
---------------------------------------
A tricky situation arises when a derived object type extends a base object type
that is defined in another schema.
Assuming that one or more of the base object properties refer to a *type* in
the current base schema by using a local *#type* reference, then the scope of
these type references changes as the base object properties are literally
imported into the derived object.
We call this type of base object a *chameleon object*. A chameleon object
(ab)uses local type references and tricks its properties into changing shape!
An example chameleon object is the `Base` object type in the top SJOT schema of
the following two SJOT schemas:
[json]
[
{
"@id": "http://example.com/base.json",
"Base": {
"id": "#ID"
},
"ID": "any"
},
{
"@id": "http://example.com/derived.json",
"Derived": {
"@extends": "http://example.com/base.json#Base"
},
"ID": "string"
}
]
The `Base` object `id` property changes type, from `"any"` to `"string"` when
imported into `Derived` with the SJOT `@extends` attribute property. To see
why, consider the derived object that results after the import and after
substituting the `#ID` type reference:
[json]
{
"@id": "http://example.com/derived.json",
"Derived": {
"id": "#ID"
},
"ID": "string"
}
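The change of scope can be made concrete by resolving the same `#ID` string in each schema. The Python sketch below uses a hypothetical `resolve_local` helper and writes the `Derived` type in its flattened form, that is, with the `Base` properties already imported.

```python
base_schema = {
    "@id": "http://example.com/base.json",
    "Base": {"id": "#ID"},
    "ID": "any",
}
derived_schema = {
    "@id": "http://example.com/derived.json",
    "Derived": {"id": "#ID"},  # Base properties after the import
    "ID": "string",
}


def resolve_local(ref, schema):
    """Look up a local "#name" reference in the schema that contains it."""
    return schema[ref[1:]]


print(resolve_local(base_schema["Base"]["id"], base_schema))           # any
print(resolve_local(derived_schema["Derived"]["id"], derived_schema))  # string
```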
Chameleons allow us to define *type generics* that change shape via local type
references. A real treat to the expressiveness of SJOT.
However, danger lurks here! When a JSON API relies on a base object with fixed
property types and this base is a chameleon, then the use of a derived
object in place of the base object may cause validation failures.
A local *#type* reference should only be used when the current schema has no
`@id` so this schema cannot be referenced. If an `@id` is used and the
resulting chameleon type generics are extended, then it makes sense that local
type references should be generic types, such as `any`, `atom`, or `object`.
SJOT versus JSON schema
-----------------------
- JSON schema is **verbose**, doubling the nesting level compared to the JSON
content it describes. By contrast, SJOT schema levels are one-on-one with
JSON data.
- JSON schema validation performance is **not scalable**, because validation
cost may exceed linear time processing cost (meaning linear in the size of
the input), in the worst case taking exponential time or memory to validate
constraints, see the [exploding JSON Schema states](#JSON-schema-sucks)
examples. By contrast, SJOT validators are very fast and scalable. The
asymptotic running time of JSON validity checking is linear in the size of
the given JSON data.
- JSON schema permits constraining primitive type value ranges, but offers
**few predeclared primitive types** to choose from, even though almost all
programming languages offer byte, short, int, float, and double precision
types. You can use `minimum`, `maximum`, and `multipleOf` to constrain the
decimal representation in JSON schema, but keep in mind that floating point
values are typically stored in IEEE 754 format, which rounds decimals. As a
result, values such as 1234567890123.0099 also validate when `multipleOf` is
0.01, so fractional constraints are not reliable. By contrast, SJOT offers a
wide choice of predefined types, and its value range constraints work as
expected and are simple to use.
- JSON schema is **non-strict by default**, meaning that all object properties
are optional and any additional properties are permitted by default, that is,
schemas accept almost anything by default. For example, JSON with typos in
property names will not be rejected by a JSON Schema validator by default.
By contrast, SJOT is strict by default.
- JSON schemas are **not extensible**: you can only add more constraints when
combining schemas. There is no easy way to achieve object inheritance.
Worse, combining schemas may lead to a schema that rejects too much or even
rejects everything. By contrast, SJOT objects are extensible or final.
- JSON schema **violates the encapsulation principle** because it permits
referencing local schema types via JSON Pointer such as nested objects, which
means that you cannot update local types without breaking all the schemas
that point to the updated local type structures. By contrast, SJOT groups
all types at the top level in the schema as a simple dictionary of named
types.
- JSON schema design **violates the orthogonality principle** for several
constructs. For example, `[` and `]` can sometimes be used to indicate choices,
but in other cases they cannot (perhaps `oneOf` should be used, but that has
its own problems).
- Checking if a JSON schema's constraints reject everything is an **NP-complete
problem**. Worse, constraints may depend on property values in the JSON
data, not just property occurrences. By contrast, the SJOT schema checker
verifies your schemas and detects blocking constraints.
- The **principle of least surprise** does not apply to JSON schema: a
construct may work well in one case while the same construct causes problems
elsewhere. For example, using `oneOf` to select among primitive types, say
"string" and "number", makes sense, but using `oneOf` to select among schemas
may not always work and can lead to surprising rejections. Consider the simple
case of an empty JSON array, which matches both the "array of strings" and
"array of numbers" schemas!
Converting SJOT to JSON schema is easy and automatic with the tools included
with SJOT. Try our [live demo](get-sjot.html#demo) to convert SJOT to JSON
schema and vice versa.
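The floating point caveat in the `multipleOf` bullet above is easy to check for yourself. The following Python sketch is not a JSON schema validator; `float_multiple_of` is a hypothetical naive check on IEEE 754 doubles, and `decimal_multiple_of` is a hypothetical exact check on the decimal text, shown only for comparison.

```python
from decimal import Decimal

# Both literals round to the same IEEE 754 double, so a validator that
# parses JSON numbers into doubles cannot tell them apart.
same_double = (1234567890123.0099 == 1234567890123.01)


def float_multiple_of(value, step):
    """Naive check on doubles: is value / step a whole number?"""
    return (value / step).is_integer()


def decimal_multiple_of(text, step):
    """Exact check on the decimal text as it appears in the JSON document."""
    return Decimal(text) % Decimal(step) == 0


print(same_double)
print(float_multiple_of(1234567890123.0099, 0.01))
print(decimal_multiple_of("1234567890123.0099", "0.01"))
```

The double-based check accepts the value, while the exact decimal check rejects it. A real validator may implement `multipleOf` differently, so treat this as an illustration of the rounding problem only.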
Want to give it a SJOT? {#ps}
-----------------------
SJOT for JS is licensed under the BSD 3-clause license and available for download from GitHub
[SJOT](https://github.com/Genivia/SJOT) and npm package
[sjot](https://www.npmjs.com/package/sjot).
In addition, the snapSJOT converter that creates SJOT schemas for JSON data is
available for download from GitHub [snapSJOT](https://github.com/Genivia/SJOT)
and npm package [snapsjot](https://www.npmjs.com/package/snapsjot).
Try a [live demo](get-sjot.html#demo) of SJOT and snapSJOT in action.
APPENDIX A: Exploding JSON Schema states {#JSON-schema-sucks}
----------------------------------------
The first "ping-pong" JSON schema example randomly alternates between a "ping"
and a "pong" schema for nested objects `x` until we find a boolean `y` that is
a final "pong":
[json]
{"x":{"x":{"x":{"x":{"x":{"x":{"y":true}}}}}}}
If the nesting level exceeds 16 then JSON schema validators can take minutes
(or crash) using the following schema:
[json]
{
"$schema" : "http://json-schema.org/draft-04/schema#",
"$ref": "#/definitions/ping",
"definitions": {
"ping": {
"type": "object",
"properties": {
"x": {
"anyOf": [
{ "$ref": "#/definitions/ping" },
{ "$ref": "#/definitions/pong" }
]
}
},
"additionalProperties": false
},
"pong": {
"type": "object",
"properties": {
"x": {
"anyOf": [
{ "$ref": "#/definitions/ping" },
{ "$ref": "#/definitions/pong" }
]
},
"y": { "type": "boolean" }
},
"additionalProperties": false
}
}
}
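To get a feel for the blow-up, here is a minimal Python sketch of a naive validator hard-wired to the ping/pong schema above. It assumes a validator that evaluates every `anyOf` branch (for instance to collect all errors) without short-circuiting or caching results; real validators differ, and the names `nested`, `validate`, and `count_calls` are made up for this illustration.

```python
def nested(depth):
    """Build {"x": {"x": ... {"y": true}}} with `depth` levels of "x"."""
    value = {"y": True}
    for _ in range(depth):
        value = {"x": value}
    return value


# Property names permitted by each definition (additionalProperties: false).
ALLOWED = {"ping": {"x"}, "pong": {"x", "y"}}
calls = 0


def validate(definition, value):
    """Naive validator for the ping/pong schema that tries every anyOf branch."""
    global calls
    calls += 1
    if not isinstance(value, dict):
        return False
    ok = set(value) <= ALLOWED[definition]
    if "x" in value:
        # Evaluate both branches, with no short-circuit and no memoization.
        branches = [validate("ping", value["x"]), validate("pong", value["x"])]
        ok = ok and any(branches)
    if "y" in value:
        ok = ok and isinstance(value["y"], bool)
    return ok


def count_calls(depth):
    """Validate a document of the given nesting depth and count the calls."""
    global calls
    calls = 0
    valid = validate("ping", nested(depth))
    return valid, calls


for depth in (4, 8, 16):
    print(depth, count_calls(depth))
```

In this model the number of calls is 2^(depth+1) - 1, so every extra nesting level doubles the work even though the input grows by only a few characters.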
For the second example, let's implement a finite state machine in a JSON
schema. The JSON Schema has *N* definitions.
The "words" we validate with the schema are defined by the regular expression
`(a{N}|a(a|b+){0,N-1}b)*x` that describes a sequence of `a` and `b` ending in
`x`. The word `abbx` is represented by the JSON pointer `a/b/b/x` which is
`{"a":{"b":{"b":{"x":true}}}}`.
The first definition for "0" has the following schema:
[json]
{
"$schema": "http://json-schema.org/draft-04/schema#",
"$ref": "#/definitions/0",
"definitions": {
"0": {
"type": "object",
"properties": {
"a": { "$ref": "#/definitions/1" },
"x": { "type": "boolean" }
},
"additionalProperties": false
},
Then we add *N*-1 definitions `i` to the schema enumerated "1", "2", "3",
... "*N*-1":
[json]
"i": {
"type": "object",
"properties": {
"a": { "$ref": "#/definitions/i+1" },
"b": {
"anyOf": [
{ "$ref": "#/definitions/0" },
{ "$ref": "#/definitions/i" }
]
}
},
"additionalProperties": false
},
where "`i`+1" wraps back to "0" when `i` is equal to *N*-1.
This "NFA" on a two-letter alphabet has *N* states, only one initial and one
final state. Its equivalent minimal DFA has 2^*N* (2 to the power *N*) states.
In the worst case, a validator that uses this JSON schema either takes 2^*N*
time or uses 2^*N* memory "cells" to validate the input.
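Writing out all *N* definitions by hand gets tedious, so here is a small Python sketch that generates the schema for a given *N* and encodes a word as nested objects. It follows the construction described above; the function names `make_fsm_schema` and `encode_word` are our own, and the sketch assumes *N* is at least 2.

```python
import json


def make_fsm_schema(n):
    """Build the N-definition JSON schema described above for a given n >= 2."""
    definitions = {
        "0": {
            "type": "object",
            "properties": {
                "a": {"$ref": "#/definitions/1"},
                "x": {"type": "boolean"},
            },
            "additionalProperties": False,
        }
    }
    for i in range(1, n):
        definitions[str(i)] = {
            "type": "object",
            "properties": {
                # i+1 wraps back to 0 when i equals n-1
                "a": {"$ref": "#/definitions/%d" % ((i + 1) % n)},
                "b": {
                    "anyOf": [
                        {"$ref": "#/definitions/0"},
                        {"$ref": "#/definitions/%d" % i},
                    ]
                },
            },
            "additionalProperties": False,
        }
    return {
        "$schema": "http://json-schema.org/draft-04/schema#",
        "$ref": "#/definitions/0",
        "definitions": definitions,
    }


def encode_word(word):
    """Encode a word such as "abbx" as nested objects ending in {"x": true}."""
    value = True
    for letter in reversed(word):
        value = {letter: value}
    return value


print(json.dumps(make_fsm_schema(3), indent=2))
print(json.dumps(encode_word("abbx")))
```

Feeding the generated schema and a long encoded word to a JSON schema validator of your choice is one way to observe how the running time grows with *N*.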
APPENDIX B: Tips and tricks {#tricks}
---------------------------
### What does SJOT stand for?
Schemas for JSON Objects.
SJOT is also "To JS" spelled backwards.
### How to define a schema for JSON when the JSON content may have alternate types
If the alternate types are distinguishable and you must use the same schema for
validation then use a union as the schema root:
[json]
{
"@root": [[ type1, type2, type3, ... ]]
}
### How to define a property with a ? in the name
Use a regex:
[json]
"(PropWithA\\?InItsName)": "string",
This regex property is optional. To make the property required, see below.
Use the same approach when a property name starts with a `(`.
### How to make regex properties required instead of optional
Regex properties are optional by design. If the property is required, add an
`@any` attribute property to force its presence:
[json]
"(PropWithA\\?InItsName)": "string",
"@any": [ ["PropWithA?InItsName"], ... ]
### How to define a property with a default empty string value
Because `null` is converted to an empty string when used as a string type, use
`null` as the default value for a property that needs an empty string default
value:
[json]
"name?null": "string"
By contrast, `"name?"` is an optional property without a default value.
### How to define a singleton tuple
Use unit lower and upper bounds:
[json]
[1, type, 1]
By contrast, `[type]` denotes an array of any length, not a singleton tuple.
### How to define an array of tuples
Use an array lower bound and/or upper bound:
[json]
[0, [type1, type2] ]
By contrast, `[[ type1, type2 ]]` denotes a union.
### How to define an object that rejects additional properties
Use the `@final` attribute property to restrict the object type:
[json]
{
"@final": true,
"name": "string"
}
This validates objects with a required `"name"` property that is a string and
rejects all objects that include other properties. An object type may have
regex properties, which means that additional properties are permitted when
they match the regex:
[json]
{
"@final": true,
"name": "string",
"(extra.*)": "any"
}
This permits additional properties with names that start with `"extra"`.
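The interplay between `@final` and regex properties can be mimicked with a toy Python check. This is not the SJOT validator: `check_final_object` is a hypothetical helper that only understands the two schemas shown above (plainly named required properties, the `"string"` and `"any"` types, and parenthesized regex properties), and it assumes a regex has to match the whole property name.

```python
import re


def check_final_object(schema, obj):
    """Toy check of property names and "string"/"any" types; not the SJOT library."""
    final = schema.get("@final", False)
    named = {k: t for k, t in schema.items() if not k.startswith(("@", "("))}
    patterns = {k[1:-1]: t for k, t in schema.items() if k.startswith("(")}

    def type_ok(type_name, value):
        return type_name == "any" or (type_name == "string" and isinstance(value, str))

    # Every plainly named property is required in this toy model.
    if any(name not in obj for name in named):
        return False
    for name, value in obj.items():
        if name in named:
            if not type_ok(named[name], value):
                return False
            continue
        matches = [t for p, t in patterns.items() if re.fullmatch(p, name)]
        if matches:
            if not all(type_ok(t, value) for t in matches):
                return False
        elif final:
            return False  # additional property on a final object
    return True


strict = {"@final": True, "name": "string"}
relaxed = {"@final": True, "name": "string", "(extra.*)": "any"}
door = {"name": "A green door", "extraColor": "green"}

print(check_final_object(strict, door))   # False: extraColor is not permitted
print(check_final_object(relaxed, door))  # True: extraColor matches (extra.*)
```

Optional properties, defaults, and the other SJOT types are deliberately left out of this sketch.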
### How to define an empty object
Use the following:
[json]
{ "@final": true }
By contrast, `"object"` and `{}` denote extensible object types.
### How to define an empty array
Use the following:
[json]
[0]
By contrast, `"array"` and `[]` denote arrays of any type and of any length.
Copyright (c) 2016, Robert van Engelen, Genivia Inc. All rights reserved.