MySQL is one of the most popular open source SQL database management
systems and is developed, distributed, and supported by Oracle
Corporation. InnoDB is a general-purpose storage engine that balances
high reliability and high performance in MySQL. Since MySQL 5.5,
InnoDB has been the default MySQL storage engine.
Calcite’s InnoDB adapter allows you to query data directly from InnoDB
data files, also known as `.ibd` files, as illustrated below. It
leverages
[innodb-java-reader](https://github.com/alibaba/innodb-java-reader).
This adapter differs from the JDBC adapter, which maps a schema in a
JDBC data source and requires a running MySQL server to serve
responses.
With .ibd files and the corresponding DDLs, the InnoDB adapter acts
as a simple “MySQL server”: it accepts SQL queries and attempts to
compile each query based on InnoDB file access APIs provided by
innodb-java-reader.
It projects, filters and sorts directly in the InnoDB data files where
possible.
In addition, because it reads the DDL statements, the adapter is
“index aware”. It uses rules to choose an appropriate index to scan
(for example, using the primary key or a secondary key to look up
data) and then tries to push some conditions down into the storage
engine. The adapter also supports hints, so that users can tell the
optimizer to use a particular index.
A basic example of a model file is given below. This schema reads from
a MySQL “scott” database:
`sqlFilePath` is a list of DDL files. You can generate table
definitions by running `mysqldump -d -u -p -h` on the command line,
with your user name, password, host, and database name filled in.
The file content of `/path/scott.sql` is as follows:
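For orientation, a DDL file consistent with the table, key, and index names used in this section could look roughly like the sketch below. Only the `EMP` table, `empno` as primary key, a secondary key on `ename`, and the three `DEPTNO_*` index names come from the text. The column types, the `HIREDATE` column, the `ENAME_KEY` index name, the index column lists (inferred from the index names), and the engine and charset options are assumptions for illustration.

```sql
-- Illustrative sketch only: column types, HIREDATE, and ENAME_KEY are assumptions.
CREATE TABLE `EMP` (
  `EMPNO` INT NOT NULL,
  `ENAME` VARCHAR(100) NOT NULL,
  `JOB` VARCHAR(15) NOT NULL,
  `MGR` BIGINT,
  `HIREDATE` DATE,
  `SAL` DECIMAL(8,2) NOT NULL,
  `COMM` DECIMAL(6,2),
  `DEPTNO` TINYINT,
  PRIMARY KEY (`EMPNO`),                               -- clustering index
  KEY `ENAME_KEY` (`ENAME`),                           -- single-column secondary index
  KEY `DEPTNO_JOB_KEY` (`DEPTNO`, `JOB`),              -- composite secondary indexes
  KEY `DEPTNO_SAL_COMM_KEY` (`DEPTNO`, `SAL`, `COMM`),
  KEY `DEPTNO_MGR_KEY` (`DEPTNO`, `MGR`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
```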
`ibdDataFileBasePath` is the parent directory of the `.ibd` files.
Assuming the model file is stored as `model.json`, you can connect to
the InnoDB data files and run queries via
[sqlline](https://github.com/julianhyde/sqlline) as follows:
We can query all employees by writing standard SQL:
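A minimal sketch of such a query is shown here. It assumes the `EMP` table exposes `empno`, `ename`, `job`, and `mgr` columns, which are names used elsewhere in this section; the exact column list is illustrative.

```sql
-- Full scan of EMP; only the listed columns are projected.
select empno, ename, job, mgr from "EMP";
```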
While executing this query, the InnoDB adapter scans the InnoDB data
file `EMP.ibd` using the primary key, known in MySQL as the clustering
B+ tree index, and is able to push the projection down to the
underlying storage engine. Projection can reduce the size of the data
fetched from the storage engine.
We can look up one employee by filtering. The InnoDB adapter retrieves
all indexes through DDL file provided in `model.json`.
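A point lookup of this kind might look like the following sketch. The literal `7782` is a hypothetical employee number chosen for illustration.

```sql
-- Equality filter on the primary key column empno.
select empno, ename from "EMP" where empno = 7782;
```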
The InnoDB adapter recognizes that `empno` is the primary key and
performs a point-lookup by using the clustering index instead of a
full table scan.
We can also do range queries on the primary key:
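As a sketch, a range query on the primary key could be written as below. The bounds are hypothetical values chosen for illustration.

```sql
-- Lower and upper bound on the primary key column empno.
select empno, ename from "EMP" where empno > 7782 and empno < 7900;
```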
Note that such a query over a reasonably sized range is usually
efficient in MySQL with the InnoDB storage engine because, in a
clustering B+ tree index, records that are close in the index are also
close in the data file, which is good for scanning.
We can look up an employee by a secondary key. For example, in the
following query, the filter condition is on the field `ename` of type
`VARCHAR`.
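A minimal sketch of a lookup by secondary key follows. It assumes a secondary index exists on `ename`; the name value is hypothetical.

```sql
-- Equality filter on ename, which is assumed to have a secondary index.
select empno, ename from "EMP" where ename = 'smith';
```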
The InnoDB adapter works well with almost all of the commonly used
data types in MySQL. For more information on supported data types,
please refer to
[innodb-java-reader](https://github.com/alibaba/innodb-java-reader#3-features).
We can query by a composite key. For example, consider the secondary
index `DEPTNO_MGR_KEY`.
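A query that matches both columns of this index could be sketched as follows, assuming `DEPTNO_MGR_KEY` is defined on `(deptno, mgr)` as its name suggests. The literal values are the ones used in the discussion of this query.

```sql
-- Both key columns of DEPTNO_MGR_KEY are constrained by equality.
select empno, ename from "EMP" where deptno = 20 and mgr = 7566;
```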
The InnoDB adapter leverages the matched key `DEPTNO_MGR_KEY` to push
down filtering condition of `deptno = 20 and mgr = 7566`.
In some cases, only part of the conditions can be pushed down since
there is a limitation in the underlying storage engine API; other
conditions remain in the rest of the plan. Given the following SQL,
only `deptno = 20` is pushed down.
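As an illustration of partial pushdown, consider the sketch below. It assumes a hypothetical `hiredate` column of type `DATE` that is not part of any index whose leading column is `deptno`; under that assumption, only `deptno = 20` would be expected to reach the storage engine, and the date predicate would be evaluated afterwards.

```sql
-- deptno = 20 can use an index prefix; the hiredate predicate (hypothetical,
-- unindexed column) stays in the rest of the plan.
select empno, deptno, ename from "EMP"
where deptno = 20 and hiredate > DATE '1981-06-01';
```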
`innodb-java-reader` only supports range queries with a lower and an
upper bound on an index; it does not support full Index Condition
Pushdown (ICP). The storage engine returns a range of rows and Calcite
evaluates the rest of the `WHERE` condition against the rows fetched.
For the following SQL, multiple indexes satisfy the left-prefix index
rule: the candidates are `DEPTNO_JOB_KEY`, `DEPTNO_SAL_COMM_KEY` and
`DEPTNO_MGR_KEY`. The InnoDB adapter chooses one of them according to
the order in which they are defined in the DDL. Only the `deptno = 20`
condition is pushed down, leaving the rest of the `WHERE` condition to
be handled by Calcite's built-in execution engine.
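A query of this shape could be sketched as below. The `sal` threshold is a hypothetical value, and the column names are inferred from the index names.

```sql
-- deptno is the leading column of three candidate indexes.
select empno, deptno, sal from "EMP" where deptno = 20 and sal > 2000;
```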
Accessing rows through a secondary key requires scanning the secondary
index and then looking up each record in the clustering index. For a
"big" scan this introduces many random I/O operations, so performance
is usually poor. Note that the query above can be more performant when
it uses the `DEPTNO_SAL_COMM_KEY` index, because a covering index does
not need to go back to the clustering index. We can force the use of
the `DEPTNO_SAL_COMM_KEY` index with a hint, as follows.
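A sketch of the hinted query, assuming the `index` hint has been registered and that Calcite's table hint comment syntax is placed directly after the table name:

```sql
-- Ask the optimizer to scan DEPTNO_SAL_COMM_KEY, which covers empno, deptno and sal
-- (the primary key is stored in every secondary index entry).
select empno, deptno, sal from "EMP"/*+ index(DEPTNO_SAL_COMM_KEY) */
where deptno = 20 and sal > 2000;
```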
Hints can be configured in `SqlToRelConverter`. To enable hints,
register the `index` HintStrategy on `TableScan` in
`SqlToRelConverter.ConfigBuilder`. An index hint takes effect on the
base `TableScan` relational node; if there are conditions matching the
index, those conditions can be pushed down as well. For the following
SQL, none of the indexes matches the filter condition, but scanning a
covering index still performs better than a full table scan, so we can
force a scan of the secondary index `DEPTNO_MGR_KEY`.
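Such a query might be sketched as follows, assuming `DEPTNO_MGR_KEY` is defined on `(deptno, mgr)`. The manager number is a hypothetical value.

```sql
-- mgr is not a leading index column, so no index condition applies, but
-- DEPTNO_MGR_KEY still covers the selected columns empno and mgr.
select empno, mgr from "EMP"/*+ index(DEPTNO_MGR_KEY) */ where mgr = 7839;
```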
Ordering can be pushed down if it matches the natural collation of the index used.
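For instance, a query ordered by the primary key while scanning the clustering index, as sketched below, would be a candidate for having its sort pushed down. The bound is a hypothetical value, and whether the sort is actually removed from the plan depends on the adapter's rules.

```sql
-- The requested order matches the natural order of the primary key index.
select empno, ename from "EMP" where empno > 7782 order by empno;
```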
## About time zone
MySQL converts `TIMESTAMP` values from the current time zone to UTC
for storage, and back from UTC to the current time zone for
retrieval. This adapter therefore maps MySQL's `TIMESTAMP` to
Calcite's `TIMESTAMP WITH LOCAL TIME ZONE`. The per-session time zone
can be configured with the Calcite connection property `timeZone`,
which tells the MySQL server which time zone the `TIMESTAMP` value is
in. Currently the InnoDB adapter cannot pass this property to the
underlying storage engine, but you can specify `timeZone` in
`model.json` as shown below. You only need to specify the property if
`timeZone` is set in the connection config and differs from the system
default time zone of the machine where the InnoDB adapter runs.
## Limitations
`innodb-java-reader` has some prerequisites for `.ibd` files.
* The `COMPACT` and `DYNAMIC` row formats are supported. `COMPRESSED`,
`REDUNDANT` and `FIXED` are not supported.
* `innodb_file_per_table` should be set to `ON`. `innodb_file_per_table`
is enabled by default in MySQL 5.6 and higher.
* Page size should be set to `16K`, which is also the default value.
For more information, please refer to
[prerequisites](https://github.com/alibaba/innodb-java-reader#2-prerequisites).
In terms of data consistency, you can think of the adapter as a simple
MySQL server that can query InnoDB data files directly and dump data,
offloading that work from MySQL. If pages have not been flushed from
the InnoDB buffer pool to disk, the result may be inconsistent (the
LSN in the `.ibd` file might be smaller than that of the in-memory
pages). InnoDB relies on a write-ahead log for performance, so there
is no command available to flush all dirty pages. Only internal
mechanisms, such as the page cleaner thread and adaptive flushing,
decide when and where to persist pages to disk.
Currently the InnoDB adapter is not aware of the row count or
cardinality of an `.ibd` data file, so it relies on simple rules to
perform optimization. If, in future, the underlying storage engine can
provide such metrics and metadata, they could be integrated into
Calcite to enable cost-based optimization.