Support for JSONB expressions #1699
@@ -2099,8 +2120,9 @@ extension SQLExpression {
        case .jsonValue:
            if isJSONValue {
                return self
            } else if sqlite3_libversion_number() >= 3045000 {
                return .function("JSONB", [self])
My types are all built from raw SQL so I don’t have a sense of the usage here.
It does bring to mind the SQLite bug that can make it seem like you’re inserting valid JSONB when really you’re not (as I experienced).
Probably not pertinent here but I do think GRDB should include some guardrails (even if just documentation) so users don’t confuse regular JSON stored as a byte array with actual JSONB.
Hello @Jason-Abbott!
I know you use raw SQL, so I'm not sure this PR is for you. Actually, I'm not sure which JSONB-related features you'd like to see in GRDB. Maybe you already wrote about them, but I cannot put my finger on it.
This PR is still a draft, and I need to review it seriously. Its intent is just to define JSONB functions for people who use the SQL builder, and to use JSONB instead of JSON in case of implicit JSON conversions.
Those "implicit conversions" are the JSON conversions that GRDB performs when the user declares that a value should be interpreted as JSON:
// The address column contains JSON objects
let plainColumn = Column("address") // Plain column, not interpreted as JSON
let jsonColumn = JSONColumn("address") // JSON column interpreted as JSON
// -- Not an array of JSON objects
// SELECT JSON_GROUP_ARRAY(address) FROM player
Player.select(Database.jsonGroupArray(plainColumn))
// -- An array of JSON objects
// SELECT JSON_GROUP_ARRAY(JSON(address)) FROM player -- GRDB 6
// SELECT JSON_GROUP_ARRAY(JSONB(address)) FROM player -- This PR, if SQLite 3.45+
Player.select(Database.jsonGroupArray(jsonColumn))
// Equivalent
// -- An array of JSON objects
// SELECT JSON_GROUP_ARRAY(JSON(address)) FROM player -- GRDB 6
// SELECT JSON_GROUP_ARRAY(JSONB(address)) FROM player -- This PR, if SQLite 3.45+
Player.select(Database.jsonGroupArray(plainColumn.asJSON))
It does bring to mind the SQLite bug that can make it seem like you’re inserting valid JSONB when really you’re not (as I experienced, see #1656 (comment)).
I'm not sure I understand. Which specific problem are you thinking about? Please be specific, because JSON is a large topic and I may guess wrong.
Probably not pertinent here but I do think GRDB should include some guardrails (even if just documentation) so users don’t confuse regular JSON stored as a byte array with actual JSONB.
Please be explicit as well. I do not understand which confusion and guardrails you are referring to.
Your experience with JSONB is precious, whereas mine is basically zero. I'd really appreciate it if you could be as clear and specific as possible.
Personally, I don’t expect GRDB to ensure well-formed JSONB any more than it ensures well-formed JSON (which is not at all, I think).
I would only be cautious about anything that could imply that standard JSONEncoder bytes become JSONB just because they’re inserted into a BLOB column.
Example
I put together a small example to illustrate the meaning of “real” JSONB (in excruciating detail!). If I define this type
struct Example: Codable {
struct Nested: Codable {
let name: String
}
let text: String
let number: Int
let array: [String]
let nested: Nested
}
and create this instance
let example = Example(
text: "five",
number: 5,
array: ["a", "b", "c"],
nested: Example.Nested(name: "inner")
)
the standard JSONEncoder will encode this as the Data
7B 22 74 65 78 74 22 3A 22 66 69 76 65 22 2C 22 6E 75 6D 62 65 72 22 3A 35 2C 22 6E 65 73 74 65 64 22 3A 7B 22 6E 61 6D 65 22 3A 22 69 6E 6E 65 72 22 7D 2C 22 61 72 72 61 79 22 3A 5B 22 61 22 2C 22 62 22 2C 22 63 22 5D 7D
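which are simply the 74 bytes for the string below (JSONEncoder does not guarantee key order, which is why the bytes above happen to list nested before array)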
{"text":"five","number":5,"array":["a","b","c"],"nested":{"name":"inner"}}
If, on the other hand, and with exactly the same JSON string, I do
SELECT jsonb('{"text":"five","number":5,"array":["a","b","c"],"nested":{"name":"inner"}}')
it produces 53 bytes (reduced size being one of the JSONB benefits):
CC 33 57 61 72 72 61 79 6B 17 61 17 62 17 63 67 6E 65 73 74 65 64 BC 47 6E 61 6D 65 57 69 6E 6E 65 72 67 6E 75 6D 62 65 72 13 35 47 74 65 78 74 47 66 69 76 65
Not the same (and not decodable by the JSONDecoder). But how do they behave? If I have the table
CREATE TABLE test (value BLOB);
I can insert both of those byte arrays, in the order described, and then do
SELECT
json(value) AS json_text,
json_valid(value, 6) AS any_valid_json,
json_valid(value, 8) AS valid_jsonb,
value->>'text' AS text,
value->>'number' AS number,
json_patch(value, '{"added":95}') AS updated
FROM test;
The results are equal except for valid_jsonb, which is 0 for the first row and 1 for the row created with the SQLite jsonb() function.
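As a rough illustration of why the two blobs differ, here is a minimal Swift sketch that reads only the outermost JSONB header, following my reading of the SQLite JSONB format description (low nibble of the first byte is the element type, high nibble is the payload size or a code saying how many big-endian size bytes follow). The names JSONBHeader, parseJSONBHeader and looksLikeJSONB are hypothetical, and this is only a superficial check: json_valid(value, 8) in SQLite remains the authoritative test.

```swift
/// Hypothetical type: the outermost header of a JSONB blob.
struct JSONBHeader {
    let typeCode: UInt8   // 0...12 are defined; 11 = ARRAY, 12 = OBJECT
    let headerSize: Int   // 1 byte, plus 0, 1, 2, 4 or 8 size bytes
    let payloadSize: Int
}

/// Hypothetical helper: parses the first header, or returns nil if it cannot be one.
func parseJSONBHeader(_ bytes: [UInt8]) -> JSONBHeader? {
    guard let first = bytes.first else { return nil }
    let typeCode = first & 0x0F
    let sizeCode = Int(first >> 4)
    guard typeCode <= 12 else { return nil }
    if sizeCode <= 11 {
        // Small payloads store their size directly in the high nibble.
        return JSONBHeader(typeCode: typeCode, headerSize: 1, payloadSize: sizeCode)
    }
    // Size codes 12, 13, 14, 15: the size follows as 1, 2, 4 or 8 big-endian bytes.
    let extra = 1 << (sizeCode - 12)
    guard bytes.count >= 1 + extra else { return nil }
    var size: UInt64 = 0
    for byte in bytes[1...extra] { size = (size << 8) | UInt64(byte) }
    guard size <= UInt64(Int.max) else { return nil }
    return JSONBHeader(typeCode: typeCode, headerSize: 1 + extra, payloadSize: Int(size))
}

/// Superficial check only: the outermost element must span the whole blob.
func looksLikeJSONB(_ bytes: [UInt8]) -> Bool {
    guard let header = parseJSONBHeader(bytes) else { return false }
    return bytes.count - header.headerSize == header.payloadSize
}

// The two byte sequences from the example above.
let jsonText = Array(#"{"text":"five","number":5,"array":["a","b","c"],"nested":{"name":"inner"}}"#.utf8)
let jsonb: [UInt8] = [
    0xCC, 0x33, 0x57, 0x61, 0x72, 0x72, 0x61, 0x79, 0x6B, 0x17, 0x61, 0x17, 0x62, 0x17,
    0x63, 0x67, 0x6E, 0x65, 0x73, 0x74, 0x65, 0x64, 0xBC, 0x47, 0x6E, 0x61, 0x6D, 0x65,
    0x57, 0x69, 0x6E, 0x6E, 0x65, 0x72, 0x67, 0x6E, 0x75, 0x6D, 0x62, 0x65, 0x72, 0x13,
    0x35, 0x47, 0x74, 0x65, 0x78, 0x74, 0x47, 0x66, 0x69, 0x76, 0x65,
]

print(looksLikeJSONB(jsonText)) // false
print(looksLikeJSONB(jsonb))    // true
```

The first byte of the JSON text is 0x7B ("{"), which read as a JSONB header would announce a 7-byte array, so the sizes cannot match the 74-byte blob. The JSONB blob starts with CC 33: an OBJECT whose 51-byte payload plus 2 header bytes gives the 53 bytes shown.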
Implications
The SQLite json* functions work seamlessly with JSON stored as TEXT, JSON stored as BLOB and JSONB stored as BLOB so there’s no functional harm in having a mixture of these.
The only negative implication I can think of is for those who want the size and speed advantages of JSONB but whose types are inadvertently saved as regular JSON in a BLOB column.
Ensuring JSONB
Although I think it should be left to users to produce and ingest well-formed JSONB, a brute force approach could involve something like
let data = try JSONEncoder().encode(example)
let text = String(data: data, encoding: .utf8)!
let sql = "INSERT INTO test (value) VALUES (jsonb('\(text)'))"
It would then be necessary on the way out to do something like
SELECT json(value) FROM test
to transform JSONB back to JSON that the standard JSONDecoder can process.
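A variant of the same idea that avoids splicing the JSON text into the SQL string (which breaks as soon as the text contains a single quote) is to keep a jsonb(?) placeholder and bind the text as an argument. The sketch below only builds the SQL and its argument; BoundStatement and jsonbInsert are hypothetical names, the table and column names are assumed to be trusted identifiers, and the pair would then be handed to whatever executes statements (for example GRDB's execute(sql:arguments:)).

```swift
import Foundation

struct Example: Codable {
    struct Nested: Codable { let name: String }
    let text: String
    let number: Int
    let array: [String]
    let nested: Nested
}

/// Hypothetical helper type: an SQL string plus the values to bind to its `?` placeholders.
struct BoundStatement {
    let sql: String
    let arguments: [String]
}

/// Hypothetical helper: encodes `value` as JSON text and lets SQLite convert it with jsonb(?).
/// `table` and `column` are assumed to be trusted identifiers, not user input.
func jsonbInsert<T: Encodable>(_ value: T, table: String, column: String) throws -> BoundStatement {
    let encoder = JSONEncoder()
    encoder.outputFormatting = [.sortedKeys] // stable key order; optional
    let data = try encoder.encode(value)
    // JSONEncoder emits UTF-8, so this decoding does not lose information.
    let text = String(decoding: data, as: UTF8.self)
    return BoundStatement(
        sql: "INSERT INTO \(table) (\(column)) VALUES (jsonb(?))",
        arguments: [text])
}

let tricky = Example(
    text: "it's five", // the apostrophe would break the interpolated SQL
    number: 5,
    array: ["a", "b", "c"],
    nested: Example.Nested(name: "inner"))
let statement = try jsonbInsert(tricky, table: "test", column: "value")
print(statement.sql)
```

Reading stays the same as described above: SELECT json(value) FROM test returns text that JSONDecoder can process.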
Chapter 8 😄
In my own project, I went nuts and wrote a JSONBEncoder and JSONBDecoder and a JSONBConvertible protocol that conforms to DatabaseValueConvertible to handle all this. That experience convinced me it’s not functionality that belongs in GRDB itself. But I would like to see GRDB help users make smart choices about JSON storage.
🎁 Thank you very much for your detailed response 🤩
Personally, I don’t expect GRDB to ensure well-formed JSONB any more than it ensures well-formed JSON (which is not at all, I think).
Yes, this is true. The library provides ways to give SQLite precise instructions and, on top of them, conveniences and shortcuts.
GRDB 6 provides conveniences for Codable record types: if the app does not provide precise SQLite instructions, then GRDB encodes and decodes "complex" properties as JSON strings.
I don't know yet how and when GRDB 7 will allow those Codable record types to specify that they prefer JSONB. Be assured that it will be opt-in, i.e. apps will have to be explicit in order to trigger JSONB storage.
My preliminary explorations have revealed that it will be... difficult. I do not want to introduce breaking changes for people who do not know about JSONB. GRDB currently assumes that Swift and SQLite can communicate through the DatabaseValue type, which encodes SQLite data types. But JSONB is not a data type. Support for JSONB is not performed at the value level, but at the SQL level:
-- What GRDB does when inserting a record
INSERT INTO player(id, name, address) VALUES (?, ?, ?)
-- What support for JSONB requires
INSERT INTO player(id, name, address) VALUES (?, ?, JSONB(?))
-- What GRDB does when selecting a record
SELECT * FROM player
-- What support for JSONB requires
SELECT id, name, JSON(address) FROM player
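To make the "SQL level" point concrete, here is a minimal sketch of the rewriting involved: given the list of columns and the subset that should hold JSONB, it produces the two statements shown above. It is not GRDB's implementation; insertSQL, selectSQL and the jsonbColumns parameter are hypothetical names used for illustration only.

```swift
/// Hypothetical helper: wraps the placeholders of JSONB columns in JSONB(?).
func insertSQL(table: String, columns: [String], jsonbColumns: Set<String>) -> String {
    let columnList = columns.joined(separator: ", ")
    let placeholders = columns
        .map { jsonbColumns.contains($0) ? "JSONB(?)" : "?" }
        .joined(separator: ", ")
    return "INSERT INTO \(table)(\(columnList)) VALUES (\(placeholders))"
}

/// Hypothetical helper: wraps JSONB columns in JSON(...) so that readers get JSON text back.
func selectSQL(table: String, columns: [String], jsonbColumns: Set<String>) -> String {
    let selection = columns
        .map { jsonbColumns.contains($0) ? "JSON(\($0))" : $0 }
        .joined(separator: ", ")
    return "SELECT \(selection) FROM \(table)"
}

let columns = ["id", "name", "address"]
print(insertSQL(table: "player", columns: columns, jsonbColumns: ["address"]))
// INSERT INTO player(id, name, address) VALUES (?, ?, JSONB(?))
print(selectSQL(table: "player", columns: columns, jsonbColumns: ["address"]))
// SELECT id, name, JSON(address) FROM player
```

The sketch shows why a value-level type such as DatabaseValue is not enough: the same bound argument ends up as JSON text or JSONB depending only on the SQL that surrounds the placeholder.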
I would only be cautious about anything that could imply that standard JSONEncoder bytes become JSONB just because they’re inserted into a BLOB column.
We're on the same track. This will not happen.
Example
I put together a small example to illustrate the meaning of “real” JSONB (in excruciating detail!).
Thanks. I added the missing options for JSON_VALID 😅
Implications
The SQLite json* functions work seamlessly with JSON stored as TEXT, JSON stored as BLOB and JSONB stored as BLOB so there’s no functional harm in having a mixture of these.
Yes. I think I got this right.
JSON stored as BLOB
There lies the SQLite "bug", right? JSON stored as BLOB was supposed to be invalid, and documented so, but they let it slip through, and now they're stuck with it.
GRDB tries hard to be a good citizen and stores JSON as TEXT. Codable record types can even instruct GRDB to store their Swift Data properties that contain JSON as TEXT (DatabaseDataEncodingStrategy.text).
The only negative implication I can think of is for those who wanted the size and speed advantages of JSONB but their types are inadvertently saved as regular JSON in a BLOB column.
I suppose those people can write a migration that converts to JSONB the values that are JSON_VALID(_, 2).
Ensuring JSONB
Yes. This is a real challenge for GRDB, as I mentioned at the beginning of this post.
Chapter 8 😄
In my own project, I went nuts and wrote a JSONBEncoder and JSONBDecoder and a JSONBConvertible protocol that conforms to DatabaseValueConvertible to handle all this. That experience convinced me it’s not functionality that belongs in GRDB itself. But I would like to see GRDB help users make smart choices about JSON storage.
Wow. Since JSONB is a private format, I guess you need to use a private SQLite connection that performs the decoding, right?
Do we agree that you only need this because GRDB is unable to ensure JSONB when writing, and convert JSONB to JSON when reading?
-- What GRDB does when selecting a record
SELECT * FROM player
-- What support for JSONB requires
SELECT id, name, JSON(address) FROM player
This one is already possible today:
struct Address: Codable { ... }
struct Player: Codable, FetchableRecord, TableRecord {
var id: Int64
var name: String
var address: Address
static var databaseSelection: [any SQLSelectable] {
[
Column("id"),
Column("name"),
Database.json(Column("address")),
]
}
}
#1700 (now merged) makes it better, because you do not have to think about updating the selection when you add a column to the database table:
struct Player: Codable, FetchableRecord, TableRecord {
var id: Int64
var name: String
var address: Address
// <- you can add properties for new columns here...
static var databaseSelection: [any SQLSelectable] {
// ... without updating this method
[
.allColumns(excluding: ["address"]), // NEW with #1700
Database.json(Column("address")),
]
}
}
-- What support for JSONB requires
INSERT INTO player(id, name, address) VALUES (?, ?, JSONB(?))
Right. In the original JSONB announcement discussion, we see
I think one thing that might be helpful is to explain that the expected way to insert JSONB into an SQL table is something to the effect of:
CREATE TABLE log (tstamp, json);
INSERT INTO log VALUES (datetime(), jsonb(?));
Where you bind RFC 8259 JSON text to the statement, and SQLite does the JSON to JSONB conversion. Then to pull data from the database, you again leverage SQLite to get the output as RFC 8259 JSON text:
SELECT tstamp, json(json) FROM log;
(emphasis in original)
There lies the SQLite "bug", right? JSON stored as BLOB was supposed to be invalid, and documented so, but they let it slip through, and now they're stuck with it.
Exactly. It’s the thing that made me jump the gun a bit 🙂 and think SQLite was making proper JSONB out of JSONEncoder Data starting with the first GRDB 7 beta.
Since JSONB is a private format, I guess you need to use a private SQLite connection that performs the decoding, right?
Although brief, the SQLite documentation does describe the format. And someone had already implemented it in Rust. Between those, it wasn’t hard to read and write the correct bytes (creating Encoder/Decoder conforming classes was another matter 😵‍💫).
The integration with GRDB looks like
public extension JSONBConvertible {
init?(json data: Data?) {
guard let data else { return nil }
do {
self = try Self(from: JSONBDecoder(from: data))
} catch {
log.error("Failed to JSONB decode \(data.bytes) as \(Self.Type.self): \(error)")
return nil
}
}
var databaseValue: DatabaseValue {
do {
return try JSONBEncoder.encode(self).databaseValue
} catch {
log.error("Failed to JSONB encode \(Self.Type.self) as Data")
return DatabaseValue.null
}
}
static func fromDatabaseValue(_ dbValue: DatabaseValue) -> Self? {
if case let .blob(data) = dbValue.storage {
self.init(json: data)
} else {
nil
}
}
}
Nothing too crazy in that, I hope.
Do we agree that you only need this because GRDB is unable to ensure JSONB when writing, and convert JSONB to JSON when reading?
Yes, that seems right. 👍
In my own project, I went nuts and wrote a JSONBEncoder and JSONBDecoder and a JSONBConvertible protocol that conforms to DatabaseValueConvertible to handle all this. That experience convinced me it’s not functionality that belongs in GRDB itself. But I would like to see GRDB help users make smart choices about JSON storage.
Sorry to pry but is this something you're open to open sourcing? Even as a throwaway repo. Thanks for considering @Jason-Abbott
@aehlke Happy to share what I can. I don’t have time to make it a proper library right now but I’ll wrap it into a repo that I’ll link in a bit.
@aehlke This will have to do for now https://github.com/Jason-Abbott/swift-sqlite-jsonb
Thanks so much!! Excited to dig into this. I will share back if I adopt it.
Hello @Jason-Abbott, I have released GRDB 7 (not beta) with this PR, but without JSONB support from record types. I think it will be possible to add it later at some point, with an extra configuration. Something like:
protocol TableRecord {
// An array of columns that contain JSONB.
static var jsonbDatabaseColumns: [String] { get }
}
// Usage
struct Player: TableRecord {
static var jsonbDatabaseColumns: [String] { ["address"] }
}
(TBD)
Unless I'm mistaken, this should be enough to help GRDB generate the required SQL.
I must say that this feature has low priority: I don't have any use for JSONB myself, and unless I'm mistaken no Apple OS ships with an SQLite version that supports JSONB.
Based on all the GRDB activity I’m seeing, I think you must have a couple clones of yourself! This is all super and totally agree on the lower priority for JSONB.
Thanks for your kind message. Your support and insights were much appreciated :-) I feel confident that the library has solid foundations and can make progress!
This pull request adds support for the JSONB SQL functions. It does not add support for JSONB columns in record types.
They are available from SQLite 3.45. Currently only custom SQLite builds provide this recent SQLite version.
In this pull request, GRDB will also prefer using JSONB representations whenever possible. Compare: