Commit f46e9a3

Add mixed case identifiers RFC
1 parent 8d4f04c commit f46e9a3

1 file changed

+201
-0
lines changed

@@ -0,0 +1,201 @@
# **RFC-0010 for Presto**

## Mixed case identifiers

Proposers

* Reetika Agrawal

## Summary

Improve Presto's handling of identifiers (schema, table, and column names) to align with the SQL standard, ensuring
better interoperability with case-sensitive and case-normalizing databases while minimizing SPI-breaking changes.

## Background

Presto treats all identifiers as case-insensitive, normalizing them (typically to lowercase). This creates issues when
querying databases that are case-sensitive (e.g., MySQL, PostgreSQL) or that normalize to uppercase (e.g., Oracle,
DB2). Without a standard approach, identifiers might not match the actual names in the underlying data sources, leading
to unexpected query failures or incorrect results. Additionally, inconsistent handling of delimited and non-delimited
identifiers across different connectors further complicates cross-engine compatibility.

The goal here is to improve interoperability with storage engines by aligning identifier handling with the SQL standard
while ensuring a seamless user experience. Ideally, the change should be implemented in a way that minimizes
backward-incompatible changes to the SPI, so that each connector can adopt the new approach according to its own
requirements and without significant modifications.

### Proposed Plan

Connectors handle identifiers in three ways:

1. SQL-Compliant (e.g., Oracle, DB2)
   - Delimited identifiers keep their original case.
   - Non-delimited identifiers are converted to uppercase.

2. Case-Sensitive (e.g., PostgreSQL)
   - Delimited identifiers keep their original case.
   - Non-delimited identifiers may be converted to lowercase or another case.

3. Case-Insensitive (e.g., Hive)
   - Delimited and non-delimited identifiers are treated the same.
   - Identifiers may be automatically converted to a specific case.
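
The three styles above can be sketched as small normalization functions. This is a hypothetical illustration, not Presto code: the class `IdentifierPolicies` and its method names are invented for this example, and lowercase is assumed as the folding target for the case-sensitive and case-insensitive styles, which the list above leaves open.

```java
import java.util.Locale;

/** Hypothetical illustration of the three identifier-handling styles; not part of Presto. */
final class IdentifierPolicies
{
    private IdentifierPolicies() {}

    // SQL-compliant (e.g., Oracle, DB2): delimited identifiers keep their case,
    // non-delimited identifiers are folded to uppercase.
    static String sqlCompliant(String identifier, boolean delimited)
    {
        return delimited ? identifier : identifier.toUpperCase(Locale.ENGLISH);
    }

    // Case-sensitive (e.g., PostgreSQL): delimited identifiers keep their case,
    // non-delimited identifiers are folded. Lowercase is an assumption of this sketch.
    static String caseSensitive(String identifier, boolean delimited)
    {
        return delimited ? identifier : identifier.toLowerCase(Locale.ENGLISH);
    }

    // Case-insensitive (e.g., Hive): delimiting makes no difference.
    // Lowercase is again assumed as the target case.
    static String caseInsensitive(String identifier, boolean delimited)
    {
        return identifier.toLowerCase(Locale.ENGLISH);
    }

    public static void main(String[] args)
    {
        // Same input, non-delimited and delimited, under each style.
        System.out.println(sqlCompliant("TestTable", false));
        System.out.println(sqlCompliant("TestTable", true));
        System.out.println(caseSensitive("TestTable", false));
        System.out.println(caseInsensitive("TestTable", true));
    }
}
```

The `delimited` flag is the only input that distinguishes the first two styles from the third, which is why the proposed API carries it alongside the identifier.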

Presto handles identifiers in several ways:

- Matching identifiers to locate entities such as catalogs, schemas, tables, and views.
- Resolving column names based on table metadata provided by connectors.
- Passing identifiers to connectors when creating new entities.
- Processing and displaying entity names retrieved from connectors, including column resolution and metadata introspection commands such as `SHOW` and `DESCRIBE`.

## Proposed Implementation

#### Core Changes

* In the common code path, pass the original identifier (schema, table, and column names) through unchanged instead of lowercasing it early.
* Introduce a new API in `Metadata` that normalizes identifiers to lowercase by default, which preserves backward compatibility.
* Introduce a new connector-specific API in `ConnectorMetadata`.

Metadata.java

```java
String normalizeIdentifier(Session session, String catalogName, String identifier, boolean delimited);
```

MetadataManager.java

```java
@Override
public String normalizeIdentifier(Session session, String catalogName, String identifier, boolean delimited)
{
    // Delegate to the connector that backs the catalog, when the catalog exists.
    Optional<CatalogMetadata> catalogMetadata = getOptionalCatalogMetadata(session, transactionManager, catalogName);
    if (catalogMetadata.isPresent()) {
        ConnectorId connectorId = catalogMetadata.get().getConnectorId();
        ConnectorMetadata metadata = catalogMetadata.get().getMetadataFor(connectorId);
        return metadata.normalizeIdentifier(session.toConnectorSession(connectorId), identifier, delimited);
    }
    // Unknown catalog: fall back to the legacy lowercase behavior.
    // ENGLISH is java.util.Locale.ENGLISH, statically imported.
    return identifier.toLowerCase(ENGLISH);
}
```

ConnectorMetadata.java

```java
/**
 * Normalize the provided SQL identifier according to connector-specific rules
 */
default String normalizeIdentifier(ConnectorSession session, String identifier, boolean delimited)
{
    return identifier.toLowerCase(ENGLISH);
}
```

JDBC connector-specific implementation

JdbcMetadata.java

```java
@Override
public String normalizeIdentifier(ConnectorSession session, String identifier, boolean delimited)
{
    // The JDBC metadata layer delegates to the database-specific client.
    return jdbcClient.normalizeIdentifier(session, identifier, delimited);
}
```

JdbcClient.java

```java
String normalizeIdentifier(ConnectorSession session, String identifier, boolean delimited);
```

BaseJdbcClient.java

```java
@Override
public String normalizeIdentifier(ConnectorSession session, String identifier, boolean delimited)
{
    // Default for JDBC connectors: keep the existing lowercase normalization.
    return identifier.toLowerCase(ENGLISH);
}
```

Example of a connector-specific implementation:

MySqlClient.java

```java
@Override
public String normalizeIdentifier(ConnectorSession session, String identifier, boolean delimited)
{
    // Pass the identifier through unchanged so that its original case is preserved.
    return identifier;
}
```
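
The delegation chain above can be exercised end to end with a minimal, self-contained sketch. The types below (`SketchConnectorMetadata`, `LegacyConnectorMetadata`, `CasePreservingConnectorMetadata`, `NormalizationSketch`) are hypothetical stand-ins invented for this example and drop the `Session` and `ConnectorSession` parameters; they are not the Presto SPI. The sketch shows the two behaviors the design relies on: a connector that overrides nothing inherits lowercase normalization, and a missing catalog falls back to lowercase as well.

```java
import java.util.Locale;
import java.util.Optional;

/** Hypothetical stand-in for ConnectorMetadata, reduced to the one method under discussion. */
interface SketchConnectorMetadata
{
    // Default: legacy lowercase normalization, so existing connectors need no changes.
    default String normalizeIdentifier(String identifier, boolean delimited)
    {
        return identifier.toLowerCase(Locale.ENGLISH);
    }
}

/** A connector that does not override the default. */
final class LegacyConnectorMetadata implements SketchConnectorMetadata {}

/** A connector that preserves case, in the spirit of the MySqlClient example. */
final class CasePreservingConnectorMetadata implements SketchConnectorMetadata
{
    @Override
    public String normalizeIdentifier(String identifier, boolean delimited)
    {
        return identifier;
    }
}

final class NormalizationSketch
{
    private NormalizationSketch() {}

    // Engine-side entry point: delegate to the connector if the catalog resolved,
    // otherwise fall back to lowercase.
    static String normalize(Optional<SketchConnectorMetadata> metadata, String identifier, boolean delimited)
    {
        return metadata
                .map(connector -> connector.normalizeIdentifier(identifier, delimited))
                .orElseGet(() -> identifier.toLowerCase(Locale.ENGLISH));
    }

    public static void main(String[] args)
    {
        System.out.println(normalize(Optional.of(new LegacyConnectorMetadata()), "TestDb", false));
        System.out.println(normalize(Optional.of(new CasePreservingConnectorMetadata()), "TestDb", false));
        System.out.println(normalize(Optional.empty(), "TestDb", false));
    }
}
```

Putting the lowercase behavior in a default method, rather than in the engine, is what lets connectors opt in one at a time without an SPI break.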

#### Example Queries

#### MySQL Table Handling

```
presto> show schemas from mysql;
       Schema
--------------------
 Test
 TestDb
 information_schema
 performance_schema
 sys
 testdb
(6 rows)

presto> show tables from mysql.TestDb;
   Table
-----------
 Test
 TestTable
 testtable
(3 rows)

presto> SHOW CREATE TABLE mysql.TestDb.Test;
             Create Table
--------------------------------------
 CREATE TABLE mysql."TestDb"."Test" (
    "id" integer,
    "Name" char(10)
 )
(1 row)

presto> select * from mysql.TestDb.Test;
 id |    Name
----+------------
  2 | Tom
(1 row)
```
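
The listings above contain names that differ only by case (`TestDb` and `testdb`, `TestTable` and `testtable`). The following sketch, which is illustrative only and uses a hypothetical helper class `LowercaseCollisions`, groups names by their lowercase form to show which entries become indistinguishable when every identifier is lowercased.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: finds names that collide once lowercased. */
final class LowercaseCollisions
{
    private LowercaseCollisions() {}

    // Returns lowercase form -> original names, keeping only groups with two or more members.
    static Map<String, List<String>> collisions(List<String> names)
    {
        Map<String, List<String>> byLowercase = new TreeMap<>();
        for (String name : names) {
            byLowercase.computeIfAbsent(name.toLowerCase(Locale.ENGLISH), key -> new ArrayList<>()).add(name);
        }
        // Drop names that are unique after lowercasing.
        byLowercase.values().removeIf(group -> group.size() < 2);
        return byLowercase;
    }

    public static void main(String[] args)
    {
        List<String> schemas = Arrays.asList(
                "Test", "TestDb", "information_schema", "performance_schema", "sys", "testdb");
        List<String> tables = Arrays.asList("Test", "TestTable", "testtable");
        System.out.println(collisions(schemas));
        System.out.println(collisions(tables));
    }
}
```

With lowercase-only normalization, each colliding group can resolve to at most one of its members, which is the failure mode that case-preserving normalization avoids.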

## Backward Compatibility Considerations

* Existing connectors that do not override `normalizeIdentifier` will default to lowercase normalization.
* Any connector that requires case preservation can override the default behavior.
* A configuration flag could be introduced to allow backward-compatible identifier handling at the catalog level.
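
One possible shape for such a catalog-level flag is sketched below. Everything in it is an assumption made for illustration: the RFC does not name the flag, so the property key `case-sensitive-name-matching`, the class `CatalogIdentifierConfig`, and the choice to default to `false` are all hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical per-catalog switch between legacy lowercasing and case preservation. */
final class CatalogIdentifierConfig
{
    // Hypothetical property key, chosen only for this sketch.
    static final String CASE_SENSITIVE_PROPERTY = "case-sensitive-name-matching";

    private final boolean caseSensitive;

    CatalogIdentifierConfig(Map<String, String> catalogProperties)
    {
        // A missing property is read as "false", so existing catalogs keep lowercase normalization.
        this.caseSensitive = Boolean.parseBoolean(
                catalogProperties.getOrDefault(CASE_SENSITIVE_PROPERTY, "false"));
    }

    String normalizeIdentifier(String identifier)
    {
        return caseSensitive ? identifier : identifier.toLowerCase(Locale.ENGLISH);
    }

    public static void main(String[] args)
    {
        Map<String, String> optedIn = new HashMap<>();
        optedIn.put(CASE_SENSITIVE_PROPERTY, "true");

        // Catalog without the property: legacy behavior.
        System.out.println(new CatalogIdentifierConfig(Collections.emptyMap()).normalizeIdentifier("TestDb"));
        // Catalog that opted in: case is preserved.
        System.out.println(new CatalogIdentifierConfig(optedIn).normalizeIdentifier("TestDb"));
    }
}
```

Defaulting the flag to off keeps an upgrade behavior-neutral: a catalog changes its identifier handling only when its configuration says so.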

## Test Plan

* Ensure that existing CI tests pass for connectors where no connector-specific implementation is added; this confirms backward compatibility.
* Add support for mixed-case identifiers in at least one JDBC connector (e.g., MySQL, PostgreSQL) and create relevant unit tests.
* Cover cases such as:
  - Queries with mixed-case identifiers.
  - Queries with delimited and non-delimited identifiers.
  - Metadata retrieval commands (`SHOW SCHEMAS`, `SHOW TABLES`, `DESCRIBE`).
  - Joins, subqueries, and alias usage with mixed-case identifiers.

## Modules involved

- `presto-main`
- `presto-common`
- `presto-spi`
- `presto-parser`
- `presto-base-jdbc`

## Final Thoughts

This RFC enhances Presto's identifier handling for improved cross-engine compatibility. The proposed changes ensure
better adherence to the SQL standard while maintaining backward compatibility. Implementing connector-specific identifier
normalization will help prevent unexpected query failures and improve the user experience when working with different
databases.
I would appreciate feedback on any additional cases or edge scenarios that should be covered!
