JSON/YAML models
Calcite models can be represented as JSON/YAML files. This page describes the structure of those files.
Models can also be built programmatically using the Schema
SPI.
Elements
Root
JSON
YAML
version
(required string) must have value 1.0
.
defaultSchema
(optional string). If specified, it is
the name (case-sensitive) of a schema defined in this model, and will
become the default schema for connections to Calcite that use this model.
schemas
(optional list of Schema elements).
types
(optional list of Type elements shared by all schemas).
Schema
Occurs within root.schemas
.
JSON
YAML
name
(required string) is the name of the schema.
type
(optional string, default map
) indicates sub-type. Values are:
-
map
for Map Schema -
custom
for Custom Schema -
jdbc
for JDBC Schema
path
(optional list) is the SQL path that is used to
resolve functions used in this schema. If specified it must be a list,
and each element of the list must be either a string or a list of
strings. For example,
JSON
YAML
declares a path with two elements: the schema ‘/usr/lib’ and the schema ‘/lib’. Most schemas are at the top level, and for these you can use a string.
materializations
(optional list of
Materialization) defines the tables
in this schema that are materializations of queries.
cache
(optional boolean, default true) tells Calcite whether to
cache metadata (tables, functions and sub-schemas) generated
by this schema.
-
If
false
, Calcite will go back to the schema each time it needs metadata, for example, each time it needs a list of tables in order to validate a query against the schema. -
If
true
, Calcite will cache the metadata the first time it reads it. This can lead to better performance, especially if name-matching is case-insensitive.
However, it also leads to the problem of cache staleness.
A particular schema implementation can override the
Schema.contentsHaveChangedSince
method to tell Calcite
when it should consider its cache to be out of date.
Tables, functions, types, and sub-schemas explicitly created in a schema are not affected by this caching mechanism. They always appear in the schema immediately, and are never flushed.
Map Schema
Like base class Schema, occurs within root.schemas
.
JSON
YAML
name
, type
, path
, cache
, materializations
inherited from
Schema.
tables
(optional list of Table elements)
defines the tables in this schema.
functions
(optional list of Function elements)
defines the functions in this schema.
types
defines the types in this schema.
Custom Schema
Like base class Schema, occurs within root.schemas
.
JSON
YAML
name
, type
, path
, cache
, materializations
inherited from
Schema.
factory
(required string) is the name of the factory class for this
schema. Must implement interface
org.apache.calcite.schema.SchemaFactory
and have a public default constructor.
operand
(optional map) contains attributes to be passed to the
factory.
JDBC Schema
Like base class Schema, occurs within root.schemas
.
JSON
YAML
name
, type
, path
, cache
, materializations
inherited from
Schema.
jdbcDriver
(optional string) is the name of the JDBC driver class. If not
specified, uses whichever class the JDBC DriverManager chooses.
jdbcUrl
(optional string) is the JDBC connect string, for example
“jdbc:mysql://localhost/foodmart”.
jdbcUser
(optional string) is the JDBC user name.
jdbcPassword
(optional string) is the JDBC password.
jdbcCatalog
(optional string) is the name of the initial catalog in the JDBC
data source.
jdbcSchema
(optional string) is the name of the initial schema in the JDBC
data source.
Materialization
Occurs within root.schemas.materializations
.
JSON
YAML
view
(optional string) is the name of the view; null means that the table
already exists and is populated with the correct data.
table
(required string) is the name of the table that materializes the data in
the query. If view
is not null, the table might not exist, and if it does not,
Calcite will create and populate an in-memory table.
sql
(optional string, or list of strings that will be concatenated as a
multi-line string) is the SQL definition of the materialization.
Table
Occurs within root.schemas.tables
.
JSON
YAML
name
(required string) is the name of this table. Must be unique within the schema.
type
(optional string, default custom
) indicates sub-type. Values are:
-
custom
for Custom Table -
view
for View
columns
(list of Column elements, required for
some kinds of table, optional for others such as View)
View
Like base class Table, occurs within root.schemas.tables
.
JSON
YAML
name
, type
, columns
inherited from Table.
sql
(required string, or list of strings that will be concatenated as a
multi-line string) is the SQL definition of the view.
path
(optional list) is the SQL path to resolve the query. If not
specified, defaults to the current schema.
modifiable
(optional boolean) is whether the view is modifiable.
If null or not specified, Calcite deduces whether the view is modifiable.
A view is modifiable if contains only SELECT, FROM, WHERE (no JOIN, aggregation or sub-queries) and every column:
- is specified once in the SELECT clause; or
- occurs in the WHERE clause with a
column = literal
predicate; or - is nullable.
The second clause allows Calcite to automatically provide the correct value for
hidden columns. It is useful in multi-tenant environments, where the tenantId
column is hidden, mandatory (NOT NULL), and has a constant value for a
particular view.
Errors regarding modifiable views:
- If a view is marked
modifiable: true
and is not modifiable, Calcite throws an error while reading the schema. - If you submit an INSERT, UPDATE or UPSERT command to a non-modifiable view, Calcite throws an error when validating the statement.
- If a DML statement creates a row that would not appear in the view
(for example, a row in
female_emps
, above, withgender = 'M'
), Calcite throws an error when executing the statement.
Custom Table
Like base class Table, occurs within root.schemas.tables
.
JSON
YAML
name
, type
, columns
inherited from Table.
factory
(required string) is the name of the factory class for this
table. Must implement interface
org.apache.calcite.schema.TableFactory
and have a public default constructor.
operand
(optional map) contains attributes to be passed to the
factory.
Stream
Information about whether a table allows streaming.
Occurs within root.schemas.tables.stream
.
JSON
YAML
stream
(optional; default true) is whether the table allows streaming.
history
(optional; default false) is whether the history of the stream is
available.
Column
Occurs within root.schemas.tables.columns
.
JSON
YAML
name
(required string) is the name of this column.
Function
Occurs within root.schemas.functions
.
JSON
YAML
name
(required string) is the name of this function.
className
(required string) is the name of the class that implements this
function.
methodName
(optional string) is the name of the method that implements this
function.
If methodName
is specified, the method must exist (case-sensitive) and Calcite
will create a scalar function. The method may be static or non-static, but
if non-static, the class must have a public constructor with no parameters.
If methodName
is “*”, Calcite creates a function for every method
in the class.
If methodName
is not specified, Calcite looks for a method called “eval”, and
if found, creates a table macro or scalar function.
It also looks for methods “init”, “add”, “merge”, “result”, and
if found, creates an aggregate function.
path
(optional list of string) is the path for resolving this function.
Type
Occurs within root.types
and root.schemas.types
.
JSON
YAML
name
(required string) is the name of this type.
type
(optional) is the SQL type.
attributes
(optional) is the attribute list of this type.
If attributes
and type
both exist at the same level,
type
takes precedence.
Lattice
Occurs within root.schemas.lattices
.
JSON
YAML
name
(required string) is the name of this lattice.
sql
(required string, or list of strings that will be concatenated as a
multi-line string) is the SQL statement that defines the fact table, dimension
tables, and join paths for this lattice.
auto
(optional boolean, default true) is whether to materialize tiles on need
as queries are executed.
algorithm
(optional boolean, default false) is whether to use an optimization
algorithm to suggest and populate an initial set of tiles.
algorithmMaxMillis
(optional long, default -1, meaning no limit) is the
maximum number of milliseconds for which to run the algorithm. After this point,
takes the best result the algorithm has come up with so far.
rowCountEstimate
(optional double, default 1000.0) estimated number of rows in
the lattice
tiles
(optional list of Tile elements) is a list of
materialized aggregates to create up front.
defaultMeasures
(optional list of Measure elements)
is a list of measures that a tile should have by default.
Any tile defined in tiles
can still define its own measures, including
measures not on this list. If not specified, the default list of measures is
just ‘count(*)’:
JSON
YAML
statisticProvider
(optional name of a class that implements
org.apache.calcite.materialize.LatticeStatisticProvider)
provides estimates of the number of distinct values in each column.
You can use a class name, or a class plus a static field. Example:
If not set, Calcite will generate and execute a SQL query to find the real value, and cache the results.
See also: Lattices.
Tile
Occurs within root.schemas.lattices.tiles
.
YAML
dimensions
(list of strings or string lists, required, but may be empty)
defines the dimensionality of this tile.
Each dimension is a column from the lattice, like a GROUP BY
clause.
Each element can be either a string
(the unique label of the column within the lattice)
or a string list (a pair consisting of a table alias and a column name).
measures
(optional list of Measure elements) is a list
of aggregate functions applied to arguments. If not specified, uses the
lattice’s default measure list.
Measure
Occurs within root.schemas.lattices.defaultMeasures
and root.schemas.lattices.tiles.measures
.
JSON
YAML
agg
is the name of an aggregate function (usually ‘count’, ‘sum’, ‘min’,
‘max’).
args
(optional) is a column label (string), or list of zero or more column
labels
Valid values are:
- Not specified: no arguments
- null: no arguments
- Empty list: no arguments
- String: single argument, the name of a lattice column
- List: multiple arguments, each a column label
Unlike lattice dimensions, measures can not be specified in qualified format, {@code [“table”, “column”]}. When you define a lattice, make sure that each column you intend to use as a measure has a unique label within the lattice (using “{@code AS label}” if necessary), and use that label when you want to pass the column as a measure argument.