Apache Geode adapter

For instructions on downloading and building Calcite, start with the tutorial.

Optionally: add -Puberjdbc to your maven build to create a single self-contained Geode JDBC adapter jar.

Once you’ve managed to compile the project, you can return here to start querying Apache Geode with Calcite. First, we need a model definition. The model gives Calcite the necessary parameters to create an instance of the Geode adapter. The models can contain definitions of materializations. The name of the tables defined in the model definition corresponds to Regions in Geode.

A basic example of a model file is given below:

{
  "version": "1.0",
  "defaultSchema": "geode",
  "schemas": [
    {
      "name": "geode_raw",
      "type": "custom",
      "factory": "org.apache.calcite.adapter.geode.rel.GeodeSchemaFactory",
      "operand": {
        "locatorHost": "localhost",
        "locatorPort": "10334",
        "regions": "Zips",
        "pdxSerializablePackagePath": ".*"
      }
    }
  ]
}

This adapter is targeted for Geode 1.3.x. The regions field allows to list (comma separated) all Geode regions to appear as relational tables.

Assuming this file is stored as model.json, you can connect to Geode via sqlline as follows:

$ ./sqlline
sqlline> !connect jdbc:calcite:model=model.json admin admin

sqlline will now accept SQL queries which access your Regions using OQL. However, you’re not restricted to issuing queries supported by OQL. Calcite allows you to perform complex operations such as aggregations or joins. The adapter will attempt to compile the query into the most efficient OQL possible by exploiting filtering, sorting and aggregation directly in Geode where possible.

For example, in the example Bookshop dataset there is a Regions BookMaster.

We can issue a SQL query to fetch the annual retail cost ordered by the cost:

sqlline> SELECT
           "yearPublished",
           SUM("retailCost") AS "totalCost"
         FROM "TEST"."BookMaster"
         GROUP BY "yearPublished"
         ORDER BY "totalCost";
+---------------+--------------------+
| yearPublished | totalCost          |
+---------------+--------------------+
| 1971          | 11.989999771118164 |
| 2011          | 94.9800033569336   |
+---------------+--------------------+

While executing this query, the Geode adapter is able to recognize that the projection, grouping and ordering can be performed natively by Geode.

The final OQL query given to Geode is below:

SELECT  yearPublished AS yearPublished,  SUM(retailCost) AS totalCost
FROM /BookMaster
GROUP BY yearPublished
ORDER BY totalCost ASC

Operations that are not supported in Geode are handled by Calcite itself. For example the following JOIN query on the same Bookshop dataset

sqlline> SELECT
           "i"."itemNumber",
           "m"."author",
           "m"."retailCost"
         FROM "TEST"."BookInventory" "i"
           JOIN "TEST"."BookMaster" "m" ON "i"."itemNumber" = "m"."itemNumber"
         WHERE "m"."retailCost" > 20;
+------------+----------------+------------+
| itemNumber | author         | retailCost |
+------------+----------------+------------+
| 123        | Daisy Mae West | 34.99      |
+------------+----------------+------------+

Will result in two separate OQL queries:

SELECT  itemNumber AS itemNumber, retailCost AS retailCost, author AS author
FROM /BookMaster
WHERE retailCost > 20

SELECT  itemNumber AS itemNumber
FROM /BookInventory

And the result will be joined in Calcite.

To select a particular item in Geode array field use the fieldName[index] syntax:

sqlline> SELECT
           "loc" [0] AS "lon",
           "loc" [1] AS "lat"
         FROM "geode".ZIPS

To select a nested fields use the map fieldName[nestedFiledName] syntax:

sqlline> SELECT "primaryAddress" ['postalCode'] AS "postalCode"
         FROM "TEST"."BookCustomer"
         WHERE "primaryAddress" ['postalCode'] > '0';

This will project BookCustomer.primaryAddress.postalCode value field.

The following presentations and video tutorials provide further dails about Geode adapter:

Enable SQL/JDBC Access to Apache Geode/GemFire Using Apache Calcite (GeodeSummit/SpringOne 2017)
Access Apache Geode/GemFire over SQL/JDBC
Explore Geode & GemFire Data with IntelliJ SQL/Database tool
Advanced Apache Geode Data Analytics with Apache Zeppelin over SQL/JDBC
Unified Access to Geode/Greenplum/…
Apache Calcite for Enabling SQL Access to NoSQL Data Systems such as Apache Geode (ApacheCon Big Data, 2016)

There is still significant work to do in improving the flexibility and performance of the adapter, but if you’re looking for a quick way to gain additional insights into data stored in Geode, Calcite should prove useful.