|
| 1 | +# **RFC-0010 for Presto** |
| 2 | + |
| 3 | +## Mixed case identifiers |
| 4 | + |
| 5 | +Proposers |
| 6 | + |
| 7 | +* Reetika Agrawal |
| 8 | + |
| 9 | +## Summary |
| 10 | + |
| 11 | +Improve Presto's identifier (schema, table & column names) handling to align with SQL standards, ensuring better interoperability with case-sensitive |
| 12 | +and case-normalizing databases while minimizing SPI-breaking changes. |
| 13 | + |
| 14 | +## Background |
| 15 | + |
| 16 | +Presto treats all identifiers as case-insensitive, normalizing them (typically to lowercase). This creates issues when |
| 17 | +querying databases that are case-sensitive (e.g., MySQL, PostgreSQL) or case-normalizing to uppercase (e.g., Oracle, |
| 18 | +DB2). Without a standard approach, identifiers might not match the actual names in the underlying data sources, leading |
| 19 | +to unexpected query failures or incorrect results. Additionally, inconsistent handling of delimited and non-delimited |
| 20 | +identifiers across different connectors further complicates cross-engine compatibility. |
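
The mismatch described above can be made concrete with a minimal sketch. It is not Presto code: `LowercaseCollisionDemo` and `groupByLowercase` are hypothetical names, and the table names are borrowed from the `mysql.TestDb` example later in this RFC. It only shows that unconditional lowercasing maps distinct remote names to the same key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical demo class for illustration; not part of Presto.
final class LowercaseCollisionDemo
{
    private LowercaseCollisionDemo() {}

    // Groups remote table names by their lowercased form, which is how an
    // engine that lowercases every identifier would see them.
    static Map<String, List<String>> groupByLowercase(List<String> remoteNames)
    {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String name : remoteNames) {
            groups.computeIfAbsent(name.toLowerCase(Locale.ENGLISH), key -> new ArrayList<>()).add(name);
        }
        return groups;
    }

    public static void main(String[] args)
    {
        // Table names as they exist in a case-sensitive MySQL schema.
        List<String> tables = Arrays.asList("Test", "TestTable", "testtable");
        System.out.println(groupByLowercase(tables));
    }
}
```

Running `main` prints `{test=[Test], testtable=[TestTable, testtable]}`: two distinct remote tables share one lowercased name, so a lowercased identifier alone cannot say which one a query refers to.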
| 21 | + |
| 22 | +The goal here is to improve interoperability with storage engines by aligning identifier handling with SQL standards |
| 23 | +while ensuring a seamless user experience. Ideally, the change should minimize backward-incompatible changes to the SPI, |
| 24 | +so that connectors can adopt the new approach according to their own requirements without significant |
| 25 | +modifications. |
| 26 | + |
| 27 | +### Proposed Plan |
| 28 | + |
| 29 | +Connectors handle identifiers in three ways: |
| 30 | + |
| 31 | +1. SQL-Compliant (e.g., Oracle, DB2) |
| 32 | + - Delimited identifiers keep their original case. |
| 33 | + - Non-delimited identifiers are converted to uppercase. |
| 34 | + |
| 35 | +2. Case-Sensitive (e.g., PostgreSQL) |
| 36 | + - Delimited identifiers keep their original case. |
| 37 | + - Non-delimited identifiers may be converted to lowercase or another case. |
| 38 | + |
| 39 | +3. Case-Insensitive (e.g., Hive) |
| 40 | + - Delimited and non-delimited identifiers are treated the same. |
| 41 | + - Identifiers may be automatically converted to a specific case. |
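
The three behaviors listed above can be sketched as small normalization functions. This is an illustration only: `IdentifierPolicy` and its constants are hypothetical names that do not exist in Presto. Where the list says "lowercase or another case" or "a specific case", the sketch assumes lowercase.

```java
import java.util.Locale;

// Hypothetical enum for illustration; these names do not exist in Presto.
enum IdentifierPolicy
{
    // Oracle/DB2 style: non-delimited names fold to uppercase, delimited names are kept.
    SQL_COMPLIANT {
        @Override
        String normalize(String identifier, boolean delimited)
        {
            return delimited ? identifier : identifier.toUpperCase(Locale.ENGLISH);
        }
    },
    // PostgreSQL style (assumed lowercase folding): delimited names are kept.
    CASE_SENSITIVE {
        @Override
        String normalize(String identifier, boolean delimited)
        {
            return delimited ? identifier : identifier.toLowerCase(Locale.ENGLISH);
        }
    },
    // Hive style (assumed lowercase folding): delimiting makes no difference.
    CASE_INSENSITIVE {
        @Override
        String normalize(String identifier, boolean delimited)
        {
            return identifier.toLowerCase(Locale.ENGLISH);
        }
    };

    abstract String normalize(String identifier, boolean delimited);
}
```

For instance, `IdentifierPolicy.SQL_COMPLIANT.normalize("TestTable", false)` yields `TESTTABLE`, while the same call with `delimited` set to `true` returns `TestTable` unchanged.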
| 42 | + |
| 43 | +Presto handles identifiers in several ways: |
| 44 | + |
| 45 | + - Matching identifiers to locate entities such as catalogs, schemas, tables, and views. |
| 46 | + - Resolving column names based on table metadata provided by connectors. |
| 47 | + - Passing identifiers to connectors when creating new entities. |
| 48 | + - Processing and displaying entity names retrieved from connectors, including column resolution and metadata introspection commands like SHOW and DESCRIBE. |
| 49 | + |
| 50 | +## Proposed Implementation |
| 51 | + |
| 52 | +#### Core Changes |
| 53 | + |
| 54 | +* In the common code path, pass the original identifier (schema, table, and column names) through unchanged. |
| 55 | +* Introduce a new API in Metadata that normalizes identifiers to lowercase by default, preserving backward compatibility. |
| 56 | +* Introduce a new connector-specific API in ConnectorMetadata that connectors can override. |
| 57 | + |
| 58 | +Metadata.java |
| 59 | + |
| 60 | +```java |
| 61 | + String normalizeIdentifier(Session session, String catalogName, String identifier, boolean delimited); |
| 62 | +``` |
| 63 | + |
| 64 | +MetadataManager.java |
| 65 | + |
| 66 | +```java |
| 67 | +    @Override |
| 68 | +    public String normalizeIdentifier(Session session, String catalogName, String identifier, boolean delimited) |
| 69 | +    { |
| 70 | +        Optional<CatalogMetadata> catalogMetadata = getOptionalCatalogMetadata(session, transactionManager, catalogName); |
| 71 | +        if (catalogMetadata.isPresent()) { |
| 72 | +            ConnectorId connectorId = catalogMetadata.get().getConnectorId(); |
| 73 | +            ConnectorMetadata metadata = catalogMetadata.get().getMetadataFor(connectorId); |
| 74 | +            return metadata.normalizeIdentifier(session.toConnectorSession(connectorId), identifier, delimited); |
| 75 | +        } |
| 76 | +        return identifier.toLowerCase(ENGLISH); |
| 77 | +    } |
| 78 | +``` |
| 79 | + |
| 80 | +ConnectorMetadata.java |
| 81 | +```java |
| 82 | + |
| 83 | +    /** |
| 84 | +     * Normalize the provided SQL identifier according to connector-specific rules. |
| 85 | +     */ |
| 86 | +    default String normalizeIdentifier(ConnectorSession session, String identifier, boolean delimited) |
| 87 | +    { |
| 88 | +        return identifier.toLowerCase(ENGLISH); |
| 89 | +    } |
| 90 | +``` |
| 91 | + |
| 92 | +JDBC connector-specific implementation |
| 93 | + |
| 94 | +JdbcMetadata.java |
| 95 | + |
| 96 | +```java |
| 97 | +    @Override |
| 98 | +    public String normalizeIdentifier(ConnectorSession session, String identifier, boolean delimited) |
| 99 | +    { |
| 100 | +        return jdbcClient.normalizeIdentifier(session, identifier, delimited); |
| 101 | +    } |
| 102 | +``` |
| 103 | + |
| 104 | +JdbcClient.java |
| 105 | +```java |
| 106 | + String normalizeIdentifier(ConnectorSession session, String identifier, boolean delimited); |
| 107 | +``` |
| 108 | + |
| 109 | +BaseJdbcClient.java |
| 110 | +```java |
| 111 | +    @Override |
| 112 | +    public String normalizeIdentifier(ConnectorSession session, String identifier, boolean delimited) |
| 113 | +    { |
| 114 | +        return identifier.toLowerCase(ENGLISH); |
| 115 | +    } |
| 116 | +``` |
| 117 | + |
| 118 | +Example of a connector-specific implementation: |
| 119 | +MySqlClient.java |
| 120 | + |
| 121 | +```java |
| 122 | +    @Override |
| 123 | +    public String normalizeIdentifier(ConnectorSession session, String identifier, boolean delimited) |
| 124 | +    { |
| 125 | +        return identifier; |
| 126 | +    } |
| 127 | +``` |
| 128 | + |
| 129 | +#### Example Queries |
| 130 | + |
| 131 | +#### MySQL Table Handling |
| 132 | + |
| 133 | +``` |
| 134 | +presto> show schemas from mysql; |
| 135 | + Schema |
| 136 | +-------------------- |
| 137 | + Test |
| 138 | + TestDb |
| 139 | + information_schema |
| 140 | + performance_schema |
| 141 | + sys |
| 142 | + testdb |
| 143 | +(6 rows) |
| 144 | + |
| 145 | +presto> show tables from mysql.TestDb; |
| 146 | + Table |
| 147 | +----------- |
| 148 | + Test |
| 149 | + TestTable |
| 150 | + testtable |
| 151 | +(3 rows) |
| 152 | + |
| 153 | +presto> SHOW CREATE TABLE mysql.TestDb.Test; |
| 154 | + Create Table |
| 155 | +-------------------------------------- |
| 156 | + CREATE TABLE mysql."TestDb"."Test" ( |
| 157 | + "id" integer, |
| 158 | + "Name" char(10) |
| 159 | + ) |
| 160 | +(1 row) |
| 161 | + |
| 162 | +presto> select * from mysql.TestDb.Test; |
| 163 | + id | Name |
| 164 | +----+------------ |
| 165 | + 2 | Tom |
| 166 | +(1 row) |
| 167 | +``` |
| 168 | + |
| 169 | +## Backward Compatibility Considerations |
| 170 | + |
| 171 | +* Existing connectors that do not implement normalizeIdentifier will default to lowercase normalization. |
| 172 | +* Any connectors requiring case preservation can override the default behavior. |
| 173 | +* A configuration flag could be introduced to allow backward-compatible identifier handling at the catalog level. |
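
The three points above can be illustrated with a minimal sketch. It assumes a simplified interface: `SketchConnectorMetadata` stands in for `ConnectorMetadata` and omits the `ConnectorSession` argument. The `preserveCase` flag is a hypothetical catalog-level setting, not an existing Presto property.

```java
import java.util.Locale;

// Simplified stand-in for the connector-facing interface; the method proposed
// for ConnectorMetadata also takes a ConnectorSession.
interface SketchConnectorMetadata
{
    default String normalizeIdentifier(String identifier, boolean delimited)
    {
        return identifier.toLowerCase(Locale.ENGLISH);
    }
}

// A connector that has not been updated: it inherits lowercase normalization.
final class LegacyConnectorMetadata implements SketchConnectorMetadata {}

// A connector that opts in to case preservation behind a hypothetical
// per-catalog flag.
final class FlaggedConnectorMetadata implements SketchConnectorMetadata
{
    private final boolean preserveCase;

    FlaggedConnectorMetadata(boolean preserveCase)
    {
        this.preserveCase = preserveCase;
    }

    @Override
    public String normalizeIdentifier(String identifier, boolean delimited)
    {
        if (!preserveCase) {
            // Flag off: fall back to the inherited lowercase behavior.
            return SketchConnectorMetadata.super.normalizeIdentifier(identifier, delimited);
        }
        return identifier;
    }
}
```

With the flag off, `FlaggedConnectorMetadata` behaves exactly like a connector that never overrode the method, which is the property that would let a catalog keep its existing behavior after an upgrade.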
| 174 | + |
| 175 | +## Test Plan |
| 176 | + |
| 177 | +* Ensure that existing CI tests pass for connectors where no specific implementation is added. |
| 178 | +* Add support for mixed-case identifiers in at least one JDBC connector (e.g., MySQL, PostgreSQL) and create relevant unit tests. |
| 179 | +* Cover cases such as: |
| 180 | + - Queries with mixed-case identifiers. |
| 181 | + - Queries with delimited and non-delimited identifiers. |
| 182 | + - Metadata retrieval commands (SHOW SCHEMAS, SHOW TABLES, DESCRIBE). |
| 183 | + - Joins, subqueries, and alias usage with mixed-case identifiers. |
| 184 | + |
| 185 | +In short, the change is backward compatible if the existing CI tests pass unchanged for every connector that has no connector-specific implementation. |
| 186 | +The mixed-case path itself is covered by adding support to one JDBC connector (e.g., MySQL or PostgreSQL) together with the relevant unit tests. |
| 187 | + |
| 188 | +## Modules involved |
| 189 | +- `presto-main` |
| 190 | +- `presto-common` |
| 191 | +- `presto-spi` |
| 192 | +- `presto-parser` |
| 193 | +- `presto-base-jdbc` |
| 194 | + |
| 195 | +## Final Thoughts |
| 196 | + |
| 197 | +This RFC enhances Presto's identifier handling for improved cross-engine compatibility. The proposed changes ensure |
| 198 | +better adherence to SQL standards while maintaining backward compatibility. Implementing connector-specific identifier |
| 199 | +normalization will help prevent unexpected query failures and improve user experience when working with different |
| 200 | +databases. |
| 201 | +Feedback on additional cases or edge scenarios that should be covered is welcome. |