Commit 1d5a0eb

Add mixed case identifiers RFC
1 parent 4a28092 commit 1d5a0eb

1 file changed: 289 additions, 0 deletions

# **RFC-0010 for Presto**

## Mixed case identifiers

Proposers

* Reetika Agrawal

## Summary

Improve Presto's handling of identifiers (schema, table, and column names) to align with SQL standards, ensuring better
interoperability with case-sensitive and case-normalizing databases while minimizing SPI-breaking changes.

## Background

Presto treats all identifiers as case-insensitive, normalizing them to lowercase. This creates issues when
querying databases that are case-sensitive (e.g., MySQL, PostgreSQL) or that normalize to uppercase (e.g., Oracle,
DB2). Without a standard approach, identifiers might not match the actual names in the underlying data sources, leading
to unexpected query failures or incorrect results.

The goal is to improve interoperability with storage engines by aligning identifier handling with SQL standards
while keeping the user experience seamless. Ideally, the change should minimize breaking changes to the SPI, so that
connectors can adopt the new approach without significant impact.

### Goals

- Align Presto's identifier handling with SQL standards to improve interoperability with case-sensitive and
  case-normalizing databases.
- Minimize SPI-breaking changes to maintain backward compatibility for existing connectors.
- Introduce a mechanism for connectors to define their own identifier normalization behavior.
- Allow identifiers to retain their original case where necessary, preventing unexpected query failures.
- Ensure the Access Control SPI can correctly normalize identifiers.
- Preserve a seamless user experience while making these changes.

### Proposed Plan

Presto's default behavior is:

- Identifiers are converted to lowercase by default unless a connector enforces a specific behavior.
- Identifiers are normalized when:
  - Resolving schemas, tables, columns, and views.
  - Retrieving metadata from connectors.
  - Displaying entity names in metadata introspection commands such as SHOW TABLES and DESCRIBE.

Presto uses identifiers in several ways:

- Matching identifiers to locate entities such as catalogs, schemas, tables, and views.
- Resolving column names based on table metadata provided by connectors.
- Passing identifiers to connectors when creating new entities.
- Processing and displaying entity names retrieved from connectors, including column resolution and metadata introspection commands such as SHOW and DESCRIBE.

## Proposed Implementation

#### Core Changes

* In `presto-spi`, add a new API to pass the original identifier (schema, table, and column names).
* Introduce a new API in `Metadata` that lowercases identifiers by default, preserving backward compatibility.
* Introduce a new connector-specific API in `ConnectorMetadata`.

Metadata.java

```java
String normalizeIdentifier(Session session, String catalogName, String identifier);
```

MetadataManager.java

```java
@Override
public String normalizeIdentifier(Session session, String catalogName, String identifier)
{
    Optional<CatalogMetadata> catalogMetadata = getOptionalCatalogMetadata(session, transactionManager, catalogName);
    if (catalogMetadata.isPresent()) {
        // Delegate to the connector that backs this catalog.
        ConnectorId connectorId = catalogMetadata.get().getConnectorId();
        ConnectorMetadata metadata = catalogMetadata.get().getMetadataFor(connectorId);
        return metadata.normalizeIdentifier(session.toConnectorSession(connectorId), identifier);
    }
    // Catalog not found: fall back to lowercase (ENGLISH is a static import of java.util.Locale.ENGLISH).
    return identifier.toLowerCase(ENGLISH);
}
```

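The delegate-or-fall-back shape of this method can be tried out without any Presto classes. The following is only a minimal sketch: `CatalogNormalizerSketch` and its map of per-catalog rules are hypothetical stand-ins for `MetadataManager` and the catalog lookup, not part of the proposed API.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for the engine-side lookup; not Presto's MetadataManager.
final class CatalogNormalizerSketch
{
    // Catalog name -> connector-supplied normalization rule (illustrative registry).
    private final Map<String, UnaryOperator<String>> connectorRules;

    CatalogNormalizerSketch(Map<String, UnaryOperator<String>> connectorRules)
    {
        this.connectorRules = Map.copyOf(connectorRules);
    }

    String normalizeIdentifier(String catalogName, String identifier)
    {
        // Delegate when the catalog resolves; otherwise keep the legacy lowercase behavior.
        return Optional.ofNullable(connectorRules.get(catalogName))
                .map(rule -> rule.apply(identifier))
                .orElseGet(() -> identifier.toLowerCase(Locale.ENGLISH));
    }
}
```

With a case-preserving rule registered for a `mysql` catalog, `TestDb` would pass through unchanged, while the same identifier under an unregistered catalog would come back as `testdb`.
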
ConnectorMetadata.java

```java
/**
 * Normalize the provided SQL identifier according to connector-specific rules
 */
default String normalizeIdentifier(ConnectorSession session, String identifier)
{
    return identifier.toLowerCase(ENGLISH);
}
```

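Because the new method is a `default` method, connectors that never mention it keep compiling and keep lowercasing, while a connector that needs case preservation overrides it. A simplified sketch of that opt-in mechanism follows; the interface and both classes are hypothetical, and the `ConnectorSession` parameter is omitted for brevity.

```java
import java.util.Locale;

// Hypothetical, simplified stand-in for ConnectorMetadata (session parameter omitted).
interface SketchConnectorMetadata
{
    default String normalizeIdentifier(String identifier)
    {
        return identifier.toLowerCase(Locale.ENGLISH);
    }
}

// A connector written before the new method existed: compiles unchanged and lowercases.
final class LegacyConnectorMetadata
        implements SketchConnectorMetadata
{
}

// A connector that opts in to case preservation by overriding the default.
final class CasePreservingConnectorMetadata
        implements SketchConnectorMetadata
{
    @Override
    public String normalizeIdentifier(String identifier)
    {
        return identifier;
    }
}
```

Here `new LegacyConnectorMetadata().normalizeIdentifier("TestTable")` yields `testtable`, and the case-preserving variant returns `TestTable` as written.
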
JDBC Connector specific implementation

JdbcMetadata.java

```java
@Override
public String normalizeIdentifier(ConnectorSession session, String identifier)
{
    // Delegate to the JDBC client so each database can apply its own rules.
    return jdbcClient.normalizeIdentifier(session, identifier);
}
```

JdbcClient.java

```java
String normalizeIdentifier(ConnectorSession session, String identifier);
```

BaseJdbcClient.java

```java
@Override
public String normalizeIdentifier(ConnectorSession session, String identifier)
{
    // Default for all JDBC connectors: keep the existing lowercase behavior.
    return identifier.toLowerCase(ENGLISH);
}
```

Example of a connector-specific implementation:

MySqlClient.java

```java
@Override
public String normalizeIdentifier(ConnectorSession session, String identifier)
{
    // Preserve the identifier exactly as written.
    return identifier;
}
```

#### Example Queries

#### MySQL Table Handling

```
presto> show schemas from mysql;
       Schema
--------------------
 Test
 TestDb
 information_schema
 performance_schema
 sys
 testdb
(6 rows)

presto> show tables from mysql.TestDb;
   Table
-----------
 Test
 TestTable
 testtable
(3 rows)

presto> SHOW CREATE TABLE mysql.TestDb.Test;
             Create Table
--------------------------------------
 CREATE TABLE mysql."TestDb"."Test" (
    "id" integer,
    "Name" char(10)
 )
(1 row)

presto> select * from mysql.TestDb.Test;
 id |    Name
----+------------
  2 | Tom
(1 row)
```

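The listing above contains names that differ only by case (`TestDb` and `testdb`, `TestTable` and `testtable`), which is exactly where unconditional lowercasing becomes ambiguous. As a small illustration, the hypothetical helper below (not part of Presto) groups names by their lowercase form and reports the groups that collide.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

final class LowercaseCollisionSketch
{
    private LowercaseCollisionSketch() {}

    // Groups names by their lowercase form and keeps only groups with more than one member.
    static Map<String, List<String>> collisions(List<String> names)
    {
        Map<String, List<String>> byLowercase = new TreeMap<>();
        for (String name : names) {
            byLowercase.computeIfAbsent(name.toLowerCase(Locale.ENGLISH), key -> new ArrayList<>()).add(name);
        }
        byLowercase.values().removeIf(group -> group.size() < 2);
        return byLowercase;
    }

    public static void main(String[] args)
    {
        // Table names taken from the SHOW TABLES output above.
        System.out.println(collisions(List.of("Test", "TestTable", "testtable")));
        // prints {testtable=[TestTable, testtable]}
    }
}
```

Under lowercase normalization both `TestTable` and `testtable` resolve to the same key, so only one of them can be addressed; preserving case keeps them distinct.
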
## Behavioral Examples with `case-sensitive-name-matching` Flag

Presto lets the connector handle identifier normalization if the connector supports the `case-sensitive-name-matching`
configuration flag. For example, if the Postgres connector does not normalize identifiers to lowercase, the
original case from the Presto DDL is preserved, including for unquoted identifiers.

**When case-sensitive-name-matching = false (default behavior)**

Identifiers are normalized to lowercase, regardless of quoting.

Presto DDL:

```sql
CREATE TABLE TeSt1 (
    ID INT,
    "Name" VARCHAR
);
```

Underlying DDL sent to Postgres (identifiers are double-quoted because the Postgres `identifierQuote` is the double quote):

```sql
CREATE TABLE "test1" (
    "id" INT,
    "name" VARCHAR
);
```

* Table and column names are normalized to lowercase.
* Quoting is added as needed by the connector, but the case is not preserved.

**When case-sensitive-name-matching = true**

The connector is responsible for identifier normalization, which allows case preservation or other casing rules.

Presto DDL:

```sql
CREATE TABLE test1 (
    UPR INTEGER,
    lwr INTEGER,
    "Mixed" INTEGER
);
```

Resulting Postgres DDL:

```sql
CREATE TABLE "test1" (
    "UPR" INTEGER,
    "lwr" INTEGER,
    "Mixed" INTEGER
);
```

* The table name is preserved as `"test1"` (unquoted input becomes quoted).
* Column names retain their original case, whether quoted or unquoted.
* This behavior aligns with SQL standard semantics and matches user intent.

If users prefer lowercase identifiers, they can write:

Presto DDL:

```sql
CREATE TABLE test1 (
    upr INTEGER,
    lwr INTEGER,
    mixed INTEGER
);
```

Underlying DDL sent to Postgres:

```sql
CREATE TABLE "test1" (
    "upr" INTEGER,
    "lwr" INTEGER,
    "mixed" INTEGER
);
```

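The two modes shown in this section can be reproduced with a few lines of plain Java. This is a sketch under stated assumptions, not the connector's actual code: `PostgresDdlSketch`, `normalize`, `quote`, and `createTable` are hypothetical names, the flag is modeled as a plain boolean, and identifiers are assumed to arrive with the user's original casing and with any surrounding quotes already removed by the parser.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class PostgresDdlSketch
{
    private PostgresDdlSketch() {}

    // Lowercases unless case-sensitive matching is on; mirrors the two modes described above.
    static String normalize(String identifier, boolean caseSensitiveNameMatching)
    {
        return caseSensitiveNameMatching ? identifier : identifier.toLowerCase(Locale.ENGLISH);
    }

    // Wraps an identifier in double quotes, doubling any embedded quote character.
    static String quote(String identifier)
    {
        return "\"" + identifier.replace("\"", "\"\"") + "\"";
    }

    // Renders a CREATE TABLE statement; columns maps column name -> type in declaration order.
    static String createTable(String table, Map<String, String> columns, boolean caseSensitiveNameMatching)
    {
        List<String> definitions = new ArrayList<>();
        for (Map.Entry<String, String> column : columns.entrySet()) {
            definitions.add("    " + quote(normalize(column.getKey(), caseSensitiveNameMatching)) + " " + column.getValue());
        }
        return "CREATE TABLE " + quote(normalize(table, caseSensitiveNameMatching)) + " (\n"
                + String.join(",\n", definitions) + "\n);";
    }

    public static void main(String[] args)
    {
        Map<String, String> columns = new LinkedHashMap<>();
        columns.put("UPR", "INTEGER");
        columns.put("lwr", "INTEGER");
        columns.put("Mixed", "INTEGER");
        System.out.println(createTable("test1", columns, true));   // case preserved
        System.out.println(createTable("test1", columns, false));  // everything lowercased
    }
}
```

Quoting is applied in both modes; only the normalization step differs, which is why the same input produces `"UPR"` with the flag enabled and `"upr"` with it disabled.
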
### Rationale

This behavior gives users full control over identifier casing, matching SQL standard semantics and improving
compatibility with case-sensitive engines like Postgres. It also ensures a smooth migration path by defaulting to
existing behavior (`case-sensitive-name-matching = false`), avoiding surprises for current users.

## Backward Compatibility Considerations

* Existing connectors that do not implement `normalizeIdentifier` will default to lowercase normalization.
* Any connector that requires case preservation can override the default behavior.
* A configuration flag could be introduced to allow backward-compatible identifier handling at the catalog level.

## Test Plan

* Ensure that existing CI tests pass for connectors where no specific implementation is added.
* Add unit tests for mixed-case identifier support in a JDBC connector (e.g., MySQL, PostgreSQL).
* Cover cases such as:
  - Queries with mixed-case identifiers.
  - Metadata retrieval commands (SHOW SCHEMAS, SHOW TABLES, DESCRIBE).
  - Joins, subqueries, and alias usage with mixed-case identifiers.

For backward compatibility, existing CI tests should pass unchanged for connectors that do not add a connector-specific implementation.
Mixed-case support will be added for a JDBC connector (e.g., MySQL or PostgreSQL), together with the relevant unit tests.

## Modules involved

- `presto-main`
- `presto-common`
- `presto-spi`
- `presto-parser`
- `presto-base-jdbc`

## Final Thoughts

This RFC enhances Presto's identifier handling for improved cross-engine compatibility. The proposed changes ensure
better adherence to SQL standards while maintaining backward compatibility. Implementing connector-specific identifier
normalization will help prevent unexpected query failures and improve the user experience when working with different
databases.

Feedback on any additional cases or edge scenarios that should be covered is welcome.

## WIP - Draft PR Changes

https://github.com/prestodb/presto/pull/24551
