sjrusso8 commented Jan 2, 2023

@andialbrecht & @mrmasterplan I've updated my initial PR with the lexer changes. See below!

This PR adds frequently used Databricks and Delta table syntax. Databricks SQL has many special operations for working with Delta tables, which means many new keywords.

Here is an example of standard Databricks SQL operations on a newly created Delta table.

CREATE TABLE IF NOT EXISTS default.event 
(
    id INT, 
    name STRING, 
    description VARCHAR(30)
)
USING delta
LOCATION '/mnt/data/location'
PARTITIONED BY (id)
COMMENT 'this is a comment'
TBLPROPERTIES (
    'foo'='bar',
    delta.autoOptimize.optimizeWrite = true, 
    delta.autoOptimize.autoCompact = true
);

OPTIMIZE event 
WHERE date >= current_timestamp() - INTERVAL 1 day 
ZORDER BY (id);

VACUUM event;

CREATE BLOOMFILTER INDEX ON TABLE event 
FOR COLUMNS(description OPTIONS (fpp=0.1, numItems=50000000));

CREATE TABLE default.event_clone SHALLOW CLONE default.event;

DESCRIBE HISTORY event;

DESCRIBE TABLE EXTENDED event;

SHOW DETAIL event;

MSCK REPAIR TABLE event SYNC METADATA;

REFRESH TABLE event;

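Parsing those statements should then pick out the additional keywords, as in the output below.

import sqlparse

# `sql` holds the statements above as a single string.
statements = sqlparse.parse(sql)

for statement in statements:
    # Keep only the top-level tokens that the lexer tagged as keywords.
    result = [v.value for v in sqlparse.sql.IdentifierList(statement.tokens).get_identifiers() if v.is_keyword]

    print(result)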

>>> ['CREATE', 'TABLE', 'IF', 'NOT', 'EXISTS', 'USING', 'LOCATION', 'PARTITIONED BY', 'COMMENT', 'TBLPROPERTIES']
>>> ['OPTIMIZE', 'ZORDER BY']
>>> ['VACUUM']
>>> ['CREATE', 'BLOOMFILTER INDEX', 'ON', 'TABLE', 'FOR']
>>> ['CREATE', 'TABLE', 'SHALLOW CLONE']
>>> ['DESCRIBE', 'HISTORY']
>>> ['DESCRIBE', 'TABLE', 'EXTENDED']
>>> ['SHOW', 'DETAIL']
>>> ['MSCK REPAIR', 'TABLE', 'SYNC', 'METADATA']
>>> ['REFRESH', 'TABLE']
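
For readers unfamiliar with how a regex-based lexer can report compound keywords such as 'ZORDER BY' or 'MSCK REPAIR' as a single token, here is a minimal standard-library sketch of the idea. It is not sqlparse's implementation: the keyword lists, `TOKEN_RE`, and `extract_keywords` are hypothetical names made up for illustration, and the sketch ignores string literals and comments. The one point it shows is that the multi-word patterns are tried before the generic word pattern.

```python
import re

# Hypothetical keyword tables for illustration only; not sqlparse's internals.
MULTI_WORD_KEYWORDS = [
    "PARTITIONED BY", "ZORDER BY", "BLOOMFILTER INDEX",
    "SHALLOW CLONE", "MSCK REPAIR",
]
SINGLE_WORD_KEYWORDS = {
    "CREATE", "TABLE", "OPTIMIZE", "VACUUM", "ON", "FOR", "SYNC", "METADATA",
}

# Each multi-word keyword tolerates any run of whitespace between its words.
_multi = "|".join(
    r"\s+".join(re.escape(word) for word in kw.split())
    for kw in MULTI_WORD_KEYWORDS
)
# The multi-word alternatives come first so they win over the plain word match.
TOKEN_RE = re.compile(rf"\b(?:{_multi})\b|\w+", re.IGNORECASE)


def extract_keywords(sql):
    """Return the keywords found in `sql`, upper-cased, in order of appearance."""
    found = []
    for match in TOKEN_RE.finditer(sql):
        # Normalise case and collapse internal whitespace to a single space.
        text = " ".join(match.group(0).upper().split())
        if " " in text or text in SINGLE_WORD_KEYWORDS:
            found.append(text)
    return found


print(extract_keywords("OPTIMIZE event ZORDER BY (id);"))
print(extract_keywords("MSCK REPAIR TABLE event SYNC METADATA;"))
```

If the order of the alternatives were swapped, the generic word pattern would consume 'ZORDER' and 'BY' separately and the compound keyword would never be reported.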
