# **RFC-0010 for Presto**

## Mixed case identifiers

Proposers

* Reetika Agrawal

## Summary

Improve Presto's handling of identifiers (schema, table, and column names) to align with SQL standards, ensuring better
interoperability with case-sensitive and case-normalizing databases while minimizing SPI-breaking changes.

## Background

Presto treats all identifiers as case-insensitive and normalizes them to lowercase. This creates issues when querying
databases that treat some or all identifiers as case-sensitive (e.g., MySQL, PostgreSQL) or that normalize unquoted
identifiers to uppercase (e.g., Oracle, DB2). Without a standard approach, identifiers might not match the actual names
in the underlying data sources, leading to unexpected query failures or incorrect results.

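The following minimal, self-contained sketch makes the failure mode concrete. It assumes a remote database that compares table names verbatim. The `lookup` helper is hypothetical and stands in for Presto's name resolution. Neither the class nor its methods exist in Presto.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.Set;

// Hypothetical illustration only: not Presto code.
class LowercaseMismatchSketch
{
    // Assumed remote table names, compared verbatim as a case-sensitive database would.
    static final Set<String> REMOTE_TABLES = Set.of("TestTable", "customers");

    // Mimics the current behavior: every identifier is folded to lowercase.
    static String normalize(String identifier)
    {
        return identifier.toLowerCase(Locale.ENGLISH);
    }

    // Hypothetical lookup: succeeds only if the normalized name exists remotely as-is.
    static Optional<String> lookup(String userIdentifier)
    {
        String normalized = normalize(userIdentifier);
        return REMOTE_TABLES.contains(normalized) ? Optional.of(normalized) : Optional.empty();
    }
}
```

Here `lookup("TestTable")` finds nothing, because the name is rewritten to `testtable` before matching. By contrast, `lookup("CUSTOMERS")` succeeds only because the remote name happens to be lowercase.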
The goal is to improve interoperability with storage engines by aligning identifier handling with SQL standards while
ensuring a seamless user experience. Ideally, the change should minimize breaking changes to the SPI, allowing
connectors to adopt the new approach without significant impact.

### Goals

- Align Presto’s identifier handling with SQL standards to improve interoperability with case-sensitive and
  case-normalizing databases.
- Minimize SPI-breaking changes to maintain backward compatibility for existing connectors.
- Introduce a mechanism for connectors to define their own identifier normalization behavior.
- Allow identifiers to retain their original case where necessary, preventing unexpected query failures.
- Ensure the Access Control SPI can correctly normalize identifiers.
- Preserve a seamless user experience while making these changes.

### Proposed Plan

Presto's default behavior is as follows:

- Identifiers are converted to lowercase unless a connector enforces a specific behavior.
- Identifiers are normalized when:
  - Resolving schemas, tables, columns, and views.
  - Retrieving metadata from connectors.
  - Displaying entity names in metadata introspection commands such as SHOW TABLES and DESCRIBE.

Presto uses identifiers in several ways:

- Matching identifiers to locate entities such as catalogs, schemas, tables, and views.
- Resolving column names based on table metadata provided by connectors.
- Passing identifiers to connectors when creating new entities.
- Processing and displaying entity names retrieved from connectors, including column resolution and metadata
  introspection commands such as SHOW and DESCRIBE.

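A small sketch illustrates the column-resolution step. The `resolveColumn` helper and its `caseSensitive` parameter are hypothetical. With case-sensitive matching disabled, the helper compares lowercased names (the current behavior). With it enabled, the helper requires an exact match. In both cases it returns the name as stored by the connector.

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;

// Hypothetical sketch of resolving a column name against connector-provided metadata.
class ColumnResolutionSketch
{
    static Optional<String> resolveColumn(List<String> tableColumns, String requested, boolean caseSensitive)
    {
        for (String column : tableColumns) {
            boolean matches = caseSensitive
                    ? column.equals(requested)
                    : column.toLowerCase(Locale.ENGLISH).equals(requested.toLowerCase(Locale.ENGLISH));
            if (matches) {
                // Return the connector's spelling, not the user's.
                return Optional.of(column);
            }
        }
        return Optional.empty();
    }
}
```

The MySQL `SHOW CREATE TABLE` example later in this RFC has the columns `id` and `Name`. Against those columns, `resolveColumn(columns, "name", false)` yields `Name`, whereas `resolveColumn(columns, "name", true)` finds nothing. A real implementation would also need to reject ambiguous matches rather than return the first hit. One such case is a table containing both `Mixed` and `mixed` in case-insensitive mode.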
## Proposed Implementation

### Core Changes

* In `presto-spi`, add a new API to pass the original identifiers (schema, table, and column names) to connectors.
* In `Metadata`, introduce a new API that lowercases identifiers by default, preserving backward compatibility.
* In `ConnectorMetadata`, introduce a new connector-specific API.

Metadata.java

```java
    /**
     * Normalize the identifier according to the rules of the connector backing the given catalog.
     */
    String normalizeIdentifier(Session session, String catalogName, String identifier);
```

MetadataManager.java

```java
    @Override
    public String normalizeIdentifier(Session session, String catalogName, String identifier)
    {
        Optional<CatalogMetadata> catalogMetadata = getOptionalCatalogMetadata(session, transactionManager, catalogName);
        if (catalogMetadata.isPresent()) {
            // Delegate to the connector that backs this catalog
            ConnectorId connectorId = catalogMetadata.get().getConnectorId();
            ConnectorMetadata metadata = catalogMetadata.get().getMetadataFor(connectorId);
            return metadata.normalizeIdentifier(session.toConnectorSession(connectorId), identifier);
        }
        // Catalog not found: fall back to the existing lowercase behavior
        return identifier.toLowerCase(ENGLISH);
    }
```

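The control flow of `normalizeIdentifier` above can be exercised without a running Presto: it delegates when the catalog exists and lowercases otherwise. This sketch replaces `CatalogMetadata` and `ConnectorMetadata` with a hypothetical map from catalog name to a normalization function. It ignores sessions and transactions, and the catalog names are assumptions for illustration.

```java
import java.util.Locale;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical model of MetadataManager.normalizeIdentifier's dispatch logic.
class NormalizeDispatchSketch
{
    // Assumed catalogs: one preserves case (like MySqlClient), one keeps the lowercase default.
    static final Map<String, UnaryOperator<String>> CATALOGS = Map.of(
            "mysql", identifier -> identifier,
            "hive", identifier -> identifier.toLowerCase(Locale.ENGLISH));

    static String normalizeIdentifier(String catalogName, String identifier)
    {
        UnaryOperator<String> connectorRule = CATALOGS.get(catalogName);
        if (connectorRule != null) {
            // Catalog found: let its connector decide.
            return connectorRule.apply(identifier);
        }
        // Unknown catalog: fall back to lowercase.
        return identifier.toLowerCase(Locale.ENGLISH);
    }
}
```

With this setup, `normalizeIdentifier("mysql", "TestDb")` keeps `TestDb`. Both a lowercase-default catalog and an unknown catalog return `testdb`.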
ConnectorMetadata.java

```java
    /**
     * Normalize the provided SQL identifier according to connector-specific rules.
     * The default lowercases the identifier, matching Presto's existing behavior.
     */
    default String normalizeIdentifier(ConnectorSession session, String identifier)
    {
        return identifier.toLowerCase(ENGLISH);
    }
```

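The default passes an explicit `ENGLISH` locale to `toLowerCase`. The no-argument `String.toLowerCase()` uses the JVM's default locale, and in the Turkish locale an uppercase `I` lowercases to a dotless `ı`. This standalone sketch (not Presto code) shows why a fixed locale matters for name matching:

```java
import java.util.Locale;

// Shows why identifier folding should not depend on the JVM default locale.
class LocaleFoldingSketch
{
    static final Locale TURKISH = Locale.forLanguageTag("tr");

    // Fold an identifier with an explicit locale, as the default normalizeIdentifier does with ENGLISH.
    static String fold(String identifier, Locale locale)
    {
        return identifier.toLowerCase(locale);
    }

    // True if two spellings of a name normalize to the same key under the given locale.
    static boolean sameName(String a, String b, Locale locale)
    {
        return fold(a, locale).equals(fold(b, locale));
    }
}
```

Under `Locale.ENGLISH`, `sameName("TITLE", "title", ...)` is `true`. Under `TURKISH` it is `false`, because `TITLE` folds to `tıtle`.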
JDBC connector-specific implementation:

JdbcMetadata.java

```java
    @Override
    public String normalizeIdentifier(ConnectorSession session, String identifier)
    {
        // Delegate to the JDBC client so each JDBC connector can apply its own rules
        return jdbcClient.normalizeIdentifier(session, identifier);
    }
```

JdbcClient.java

```java
    String normalizeIdentifier(ConnectorSession session, String identifier);
```

BaseJdbcClient.java

```java
    @Override
    public String normalizeIdentifier(ConnectorSession session, String identifier)
    {
        // Default for JDBC connectors: lowercase, as before
        return identifier.toLowerCase(ENGLISH);
    }
```

Example connector-specific implementation:

MySqlClient.java

```java
    @Override
    public String normalizeIdentifier(ConnectorSession session, String identifier)
    {
        // Return the identifier unchanged so its original case is preserved
        return identifier;
    }
```

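Some engines fold unquoted identifiers to uppercase, such as Oracle and DB2 mentioned in the Background. A connector for such an engine could plausibly override the same hook to uppercase instead. To keep the sketch compilable on its own, a hypothetical `IdentifierNormalizer` interface stands in for `JdbcClient` and the session parameter is omitted. None of the types below exist in Presto.

```java
import java.util.Locale;

// Hypothetical stand-in for the JdbcClient.normalizeIdentifier hook (session parameter omitted).
interface IdentifierNormalizer
{
    String normalizeIdentifier(String identifier);
}

// Mirrors the BaseJdbcClient default shown above.
class LowercaseNormalizer
        implements IdentifierNormalizer
{
    @Override
    public String normalizeIdentifier(String identifier)
    {
        return identifier.toLowerCase(Locale.ENGLISH);
    }
}

// Hypothetical client for an engine that stores unquoted names in uppercase.
class UppercaseNormalizer
        implements IdentifierNormalizer
{
    @Override
    public String normalizeIdentifier(String identifier)
    {
        return identifier.toUpperCase(Locale.ENGLISH);
    }
}
```

Because the hook receives only a `String`, this sketch folds quoted identifiers too. Honoring quoting would presumably require the original-identifier information that the first Core Changes bullet proposes to pass through the SPI.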
### Example Queries

#### MySQL Table Handling

```
presto> show schemas from mysql;
       Schema
--------------------
 Test
 TestDb
 information_schema
 performance_schema
 sys
 testdb
(6 rows)

presto> show tables from mysql.TestDb;
   Table
-----------
 Test
 TestTable
 testtable
(3 rows)

presto> SHOW CREATE TABLE mysql.TestDb.Test;
             Create Table
--------------------------------------
 CREATE TABLE mysql."TestDb"."Test" (
    "id" integer,
    "Name" char(10)
 )
(1 row)

presto> select * from mysql.TestDb.Test;
 id |    Name
----+------------
  2 | Tom
(1 row)
```

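The `SHOW TABLES` output above lists `TestTable` and `testtable` as separate tables, which lowercase normalization cannot represent. This short plain-Java sketch counts the distinct names left after applying each rule. The class and method names are illustrative only.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.UnaryOperator;

// Counts how many distinct names survive a normalization rule.
class NameCollisionSketch
{
    // Table names from the SHOW TABLES output above.
    static final List<String> TABLES = List.of("Test", "TestTable", "testtable");

    static Set<String> normalizedNames(List<String> names, UnaryOperator<String> rule)
    {
        Set<String> result = new TreeSet<>();
        for (String name : names) {
            result.add(rule.apply(name));
        }
        return result;
    }
}
```

Identity normalization, as in the `MySqlClient` override, keeps all three names distinct. Lowercasing maps `TestTable` and `testtable` to the same key, so the two tables can no longer be told apart by name.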
## Behavioral Examples with `case-sensitive-name-matching` Flag

When a connector supports the `case-sensitive-name-matching` configuration flag and the flag is enabled, Presto lets
the connector handle identifier normalization. For example, if the Postgres connector does not normalize identifiers
to lowercase, the original case from the Presto DDL is preserved, including for unquoted identifiers.

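One way to picture the flag is as a switch in front of the connector hook. In this minimal sketch, `caseSensitiveNameMatching` is an illustrative parameter rather than the actual configuration plumbing. The connector rule is assumed to be the identity, which matches the Postgres examples that follow.

```java
import java.util.Locale;
import java.util.function.UnaryOperator;

// Hypothetical model of how case-sensitive-name-matching selects a normalization rule.
class NameMatchingFlagSketch
{
    // Assumed connector rule for the examples below: keep identifiers as written.
    static final UnaryOperator<String> PRESERVE_CASE = identifier -> identifier;

    static String normalize(String identifier, boolean caseSensitiveNameMatching, UnaryOperator<String> connectorRule)
    {
        if (!caseSensitiveNameMatching) {
            // Default: lowercase, regardless of the connector.
            return identifier.toLowerCase(Locale.ENGLISH);
        }
        // Flag enabled: defer to the connector's own rule.
        return connectorRule.apply(identifier);
    }
}
```

With the flag off, `TeSt1` becomes `test1`. With it on, `UPR` and `lwr` pass through unchanged.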
**When case-sensitive-name-matching = false (default behavior)**

This is the default behavior in Presto. Identifiers are normalized to lowercase, regardless of quoting.

Presto DDL:

```sql
CREATE TABLE TeSt1 (
    ID INT,
    "Name" VARCHAR
);
```

Underlying DDL sent to Postgres, which uses double quotes as its identifier quote character:

```sql
CREATE TABLE "test1" (
    "id" INT,
    "name" VARCHAR
);
```

* Table and column names are normalized to lowercase.
* Quoting is added as needed by the connector, but the case is not preserved.

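The translation from the Presto DDL to the Postgres DDL can be sketched as normalize-then-quote. In this hypothetical sketch, `quoted` wraps a name in the connector's identifier quote and doubles any embedded quote characters, following the usual SQL escaping rule. `renderCreateTable` is likewise illustrative and handles only column names and types. Passing `caseSensitive = true` models the case-preserving behavior described next.

```java
import java.util.Locale;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical normalize-then-quote rendering for a connector whose identifier quote is '"'.
class CreateTableRenderingSketch
{
    static final String IDENTIFIER_QUOTE = "\"";

    // Wrap a name in quotes, doubling any embedded quote characters.
    static String quoted(String name)
    {
        String escaped = name.replace(IDENTIFIER_QUOTE, IDENTIFIER_QUOTE + IDENTIFIER_QUOTE);
        return IDENTIFIER_QUOTE + escaped + IDENTIFIER_QUOTE;
    }

    // Default mode lowercases; case-sensitive mode keeps the name as written.
    static String normalize(String identifier, boolean caseSensitive)
    {
        return caseSensitive ? identifier : identifier.toLowerCase(Locale.ENGLISH);
    }

    // columns maps column name to type; iteration order is used as declaration order.
    static String renderCreateTable(String table, Map<String, String> columns, boolean caseSensitive)
    {
        StringJoiner body = new StringJoiner(",\n", " (\n", "\n);");
        for (Map.Entry<String, String> column : columns.entrySet()) {
            body.add("    " + quoted(normalize(column.getKey(), caseSensitive)) + " " + column.getValue());
        }
        return "CREATE TABLE " + quoted(normalize(table, caseSensitive)) + body;
    }
}
```

Consider a `LinkedHashMap` holding `ID -> INT` and `Name -> VARCHAR`. With it, `renderCreateTable("TeSt1", columns, false)` produces the lowercase, quoted DDL shown above. The sketch never sees whether the user quoted a name, which is consistent with the default mode ignoring quoting.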
**When case-sensitive-name-matching = true**

The connector is responsible for identifier normalization, so it can preserve case or apply its own casing rules.

Presto DDL:

```sql
CREATE TABLE test1 (
    UPR INTEGER,
    lwr INTEGER,
    "Mixed" INTEGER
);
```

Resulting Postgres DDL:

```sql
CREATE TABLE "test1" (
    "UPR" INTEGER,
    "lwr" INTEGER,
    "Mixed" INTEGER
);
```

* The table name is preserved as "test1" (the unquoted input is quoted when sent to Postgres).
* Column names retain their original case, whether quoted or unquoted.
* Identifiers reach the underlying database exactly as the user wrote them, which matches user intent.

If users prefer lowercase identifiers, they can write them in lowercase:

Presto DDL:

```sql
CREATE TABLE test1 (
    upr INTEGER,
    lwr INTEGER,
    mixed INTEGER
);
```

Underlying DDL sent to Postgres:

```sql
CREATE TABLE "test1" (
    "upr" INTEGER,
    "lwr" INTEGER,
    "mixed" INTEGER
);
```

### Rationale

This behavior gives users full control over identifier casing and improves compatibility with case-sensitive engines
like Postgres. It also ensures a smooth migration path by defaulting to the existing behavior
(`case-sensitive-name-matching = false`), avoiding surprises for current users.

## Backward Compatibility Considerations

* Existing connectors that do not implement `normalizeIdentifier` default to lowercase normalization.
* Connectors that require case preservation can override the default behavior.
* A configuration flag, such as `case-sensitive-name-matching` (default `false`), allows backward-compatible
  identifier handling at the catalog level.

## Test Plan

* Ensure that existing CI tests pass for connectors without a connector-specific implementation, confirming backward
  compatibility.
* Add mixed-case identifier support to a JDBC connector (e.g., MySQL, PostgreSQL), along with unit tests.
* Cover cases such as:
  - Queries with mixed-case identifiers.
  - Metadata retrieval commands (SHOW SCHEMAS, SHOW TABLES, DESCRIBE).
  - Joins, subqueries, and alias usage with mixed-case identifiers.

## Modules involved

- `presto-main`
- `presto-common`
- `presto-spi`
- `presto-parser`
- `presto-base-jdbc`

## Final Thoughts

This RFC enhances Presto's identifier handling for improved cross-engine compatibility. The proposed changes ensure
better adherence to SQL standards while maintaining backward compatibility. Connector-specific identifier
normalization will help prevent unexpected query failures and improve the user experience when working with
different databases.

We would appreciate feedback on any additional cases or edge scenarios that should be covered!

## WIP - Draft PR Changes

https://github.com/prestodb/presto/pull/24551