OS adapter and sqlsh
The OS (operating system) adapter allows you to access data in your operating system and environment using SQL queries.
It aims to solve similar problems that have traditionally been solved using UNIX command pipelines, but with the power and type-safety of SQL.
The adapter also includes a wrapper called
sqlsh that allows you to execute
commands from your favorite shell.
The OS adapter launches processes, and is potentially a security loop-hole. It is included in Calcite’s “plus” module, which is not enabled by default. You must think carefully before enabling it in a security-sensitive situation.
We try to support all tables on every operating system, and to make sure that the tables have the same columns. But we rely heavily on operating system commands, and these differ widely. So:
- These commands only work on Linux and macOS (not Windows, even with Cygwin);
vmstathas very different columns between Linux and macOS;
pshave the same column names but semantics differ;
- Other commands work largely the same.
A simple example
Every bash hacker knows that to find the 3 largest files you type
This actually a pipeline of relational operations, each tuple represented
by line of space-separated fields. What if we were able to access the list of
files as a relation and use it in a SQL query? And what if we could easily
execute that SQL query from the shell? This is what
sqlsh launches a connection to Calcite whose default schema is the OS adapter.
It uses the JAVA lexical mode, which means that unquoted table and column names remain in the case that they were written. This is consistent with how shells like bash behave.
Shell meta-characters such as
) have to be treated with
care. Often adding a back-slash will suffice.
Tables and commands
The OS adapter contains the following tables:
du- Disk usage (based on
ps- Processes (based on
stdin- Standard input
files- Files (based on the
git_commits- Git commits (based on
vmstat- Virtual memory (based on
Most tables are implemented as views on top of table functions.
New data sources are straightforward to add; please contribute yours!
How many class files, and what is their total size? In
The back-slashes are necessary because
' are shell meta-characters.
How many files and directories? In
bash, you would use
sqlsh, use the
Which users have processes running? In
ps. qualifier and back-quotes are necessary because USER is a SQL reserved word.
Now a ‘top N’ problem: Which three users have the most processes? In
How’s my memory?
To find out what columns a table has, use
How many commits and distinct authors per year?
git_commits table is based upon the
git log command.
group by y is possible because
sqlsh uses Calcite’s
Print the stdin, adding a number to each line.
Example: output format
-o option controls output format.
provides a display of all current java process pids In
- spaced - spaces between fields (the default)
- headers - as spaced, but with headers
- csv - comma-separated values
- json - JSON, one object per row
- mysql - an aligned table, in the same format used by MySQL
The OS adapter was created in [CALCITE-1896] but is not complete.
Some ideas for further work:
- Allow ‘-‘and ‘.’ in unquoted table names (to match typical file names)
- Allow ordinal field references, for example ‘$3’. This would help for files
that do not have named fields, for instance
stdin, but you could use them even if fields have names. Also ‘$0’ to mean the whole input line.
- Use the file adapter, e.g.
select * from file.scott.empwould use the file adapter to open the file
- More tables based on git, e.g. branches, tags, files changed in each commit
select path, lineCount from git_ls_files cross apply wc(path)
sqlshcommand, or at least the java code underneath it, into sqlline