Support for JSONB expressions #1699

Merged
merged 6 commits into from
Jan 26, 2025

@groue groue commented Jan 18, 2025

This pull request adds support for the JSONB SQL functions. It does not add support for JSONB columns in record types.

These functions are available from SQLite 3.45. Currently, only custom SQLite builds provide such a recent SQLite version.

Database.jsonb            // JSONB
Database.jsonbArray       // JSONB_ARRAY
Database.jsonbExtract     // JSONB_EXTRACT
Database.jsonbGroupArray  // JSONB_GROUP_ARRAY
Database.jsonbGroupObject // JSONB_GROUP_OBJECT
Database.jsonbInsert      // JSONB_INSERT
Database.jsonbObject      // JSONB_OBJECT
Database.jsonbPatch       // JSONB_PATCH
Database.jsonbRemove      // JSONB_REMOVE
Database.jsonbReplace     // JSONB_REPLACE
Database.jsonbSet         // JSONB_SET
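
Since these functions require SQLite 3.45, an app may want to check the SQLite version before relying on them. The following is only a sketch: the helper names are hypothetical and not part of GRDB. It assumes the standard `SQLITE_VERSION_NUMBER` encoding (major * 1,000,000 + minor * 1,000 + patch), which is the value returned by `sqlite3_libversion_number()`.

```swift
/// Minimum SQLite version that ships the JSONB functions (3.45.0), in the
/// SQLITE_VERSION_NUMBER encoding: major * 1_000_000 + minor * 1_000 + patch.
let jsonbMinimumVersionNumber: Int32 = 3_045_000

/// Hypothetical helper: splits an encoded version number into its components.
func sqliteVersionComponents(_ number: Int32) -> (major: Int32, minor: Int32, patch: Int32) {
    return (number / 1_000_000, (number / 1_000) % 1_000, number % 1_000)
}

/// Hypothetical helper: whether the JSONB functions are expected to exist.
/// In an app, pass the result of `sqlite3_libversion_number()`.
func supportsJSONB(libVersionNumber: Int32) -> Bool {
    return libVersionNumber >= jsonbMinimumVersionNumber
}

print(supportsJSONB(libVersionNumber: 3_045_000)) // true
print(supportsJSONB(libVersionNumber: 3_044_002)) // false
```

The version is passed as a parameter so that the sketch does not depend on a particular SQLite build.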

With this pull request, GRDB also prefers JSONB representations whenever possible. Compare:

// Fetch a JSON array of addresses
//
// SELECT JSON_GROUP_ARRAY(JSON(address)) FROM player  -- Before
// SELECT JSON_GROUP_ARRAY(JSONB(address)) FROM player -- NEW
let jsonColumn = JSONColumn("address")
let request = Player.select(Database.jsonGroupArray(jsonColumn))
let jsonData = try Data.fetchOne(db, request)

@@ -2099,8 +2120,9 @@ extension SQLExpression {
case .jsonValue:
if isJSONValue {
return self
} else if sqlite3_libversion_number() >= 3045000 {
return .function("JSONB", [self])
Contributor

My types are all built from raw SQL so I don’t have a sense of the usage here.

It does bring to mind the SQLite bug that can make it seem like you’re inserting valid JSONB when really you’re not (as I experienced).

Probably not pertinent here but I do think GRDB should include some guardrails (even if just documentation) so users don’t confuse regular JSON stored as a byte array with actual JSONB.

Owner Author

@groue groue Jan 18, 2025

Hello @Jason-Abbott!

I know you use raw SQL, so I'm not sure this PR is really for you. Actually, I'm not sure which JSONB-related features you'd like to see in GRDB. Maybe you already wrote about them, but I cannot put my finger on them.

This PR is still a draft, and I need to review it seriously. Its intent is just to define JSONB functions for people who use the SQL builder, and to use JSONB instead of JSON in case of implicit JSON conversions.

Those "implicit conversions" are the JSON conversions that GRDB performs when the user declares that a value should be interpreted as JSON:

// The address column contains JSON objects
let plainColumn = Column("address") // Plain column, not interpreted as JSON
let jsonColumn = JSONColumn("address") // JSON column interpreted as JSON

// -- Not an array of JSON objects
// SELECT JSON_GROUP_ARRAY(address) FROM player
Player.select(Database.jsonGroupArray(plainColumn))

// -- An array of JSON objects
// SELECT JSON_GROUP_ARRAY(JSON(address)) FROM player -- GRDB 6
// SELECT JSON_GROUP_ARRAY(JSONB(address)) FROM player -- This PR, if SQLite 3.45+
Player.select(Database.jsonGroupArray(jsonColumn))

// Equivalent
// -- An array of JSON objects
// SELECT JSON_GROUP_ARRAY(JSON(address)) FROM player -- GRDB 6
// SELECT JSON_GROUP_ARRAY(JSONB(address)) FROM player -- This PR, if SQLite 3.45+
Player.select(Database.jsonGroupArray(plainColumn.asJSON))

It does bring to mind the SQLite bug that can make it seem like you’re inserting valid JSONB when really you’re not (as I experienced in #1656 (comment)).

I'm not sure I understand. Which specific problem are you thinking about? Please be specific, because JSON is a large topic and I may guess wrong.

Probably not pertinent here but I do think GRDB should include some guardrails (even if just documentation) so users don’t confuse regular JSON stored as a byte array with actual JSONB.

Please be explicit as well. I do not understand which confusion and guardrails you are referring to.

Your experience with JSONB is precious, while mine is basically zero. I'd really appreciate it if you could be as clear and specific as possible.

Contributor

Personally, I don’t expect GRDB to ensure well-formed JSONB any more than it ensures well-formed JSON (which is not at all, I think).

I would only be cautious about anything that could imply that standard JSONEncoder bytes become JSONB just because they’re inserted into a BLOB column.

Example

I put together a small example to illustrate the meaning of “real” JSONB (in excruciating detail!). If I define this type

struct Example: Codable {
    struct Nested: Codable {
        let name: String
    }
    let text: String
    let number: Int
    let array: [String]
    let nested: Nested
}

and create this instance

let example = Example(
    text: "five",
    number: 5,
    array: ["a", "b", "c"],
    nested: Example.Nested(name: "inner")
)

the standard JSONEncoder will encode this as the Data

7B 22 74 65 78 74 22 3A 22 66 69 76 65 22 2C 22 6E 75 6D 62 65 72 22 3A 35 2C 22 6E 65 73 74 65 64 22 3A 7B 22 6E 61 6D 65 22 3A 22 69 6E 6E 65 72 22 7D 2C 22 61 72 72 61 79 22 3A 5B 22 61 22 2C 22 62 22 2C 22 63 22 5D 7D

which are simply the 74 bytes for the string

{"text":"five","number":5,"array":["a","b","c"],"nested":{"name":"inner"}}
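
As a quick check of the claim above, here is a minimal sketch that encodes the same value and inspects the bytes. It assumes Foundation's `JSONEncoder` with `.sortedKeys`, used here only to make the key order deterministic (the default key order may differ from run to run, which does not change the byte count). The `encodeAsJSONText` helper is hypothetical.

```swift
import Foundation

struct Example: Codable {
    struct Nested: Codable {
        let name: String
    }
    let text: String
    let number: Int
    let array: [String]
    let nested: Nested
}

/// Hypothetical helper: encodes with sorted keys so that the output is stable.
func encodeAsJSONText(_ value: Example) throws -> Data {
    let encoder = JSONEncoder()
    encoder.outputFormatting = .sortedKeys
    return try encoder.encode(value)
}

let example = Example(
    text: "five",
    number: 5,
    array: ["a", "b", "c"],
    nested: Example.Nested(name: "inner")
)

if let data = try? encodeAsJSONText(example) {
    // The encoded Data is plain UTF-8 JSON text: the first byte is "{" (0x7B).
    let hex = data.map { String(format: "%02X", $0) }.joined(separator: " ")
    print(data.count)
    print(hex)
    print(String(decoding: data, as: UTF8.self))
}
```

Whatever the key order, the result is text that `JSONDecoder` can read back, which is the point of comparison with the JSONB bytes.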

If, on the other hand, and with exactly the same JSON string, I do

SELECT jsonb('{"text":"five","number":5,"array":["a","b","c"],"nested":{"name":"inner"}}')

it produces 53 bytes (reduced size being one of the JSONB benefits):

CC 33 57 61 72 72 61 79 6B 17 61 17 62 17 63 67 6E 65 73 74 65 64 BC 47 6E 61 6D 65 57 69 6E 6E 65 72 67 6E 75 6D 62 65 72 13 35 47 74 65 78 74 47 66 69 76 65

Not the same (and not decodable by the JSONDecoder). But how do they behave? If I have the table

CREATE TABLE test (value BLOB);

I can insert both of those byte arrays, in the order described, and then do

SELECT
    json(value) AS json_text,
    json_valid(value, 6) AS any_valid_json,
    json_valid(value, 8) AS valid_jsonb,
    value->>'text' AS text,
    value->>'number' AS number,
    json_patch(value, '{"added":95}') AS updated
FROM test;

The results are equal except for valid_jsonb, which is 0 for the first and 1 for the row created with the SQLite jsonb() function.
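
To make the difference between the two byte sequences more concrete, here is a sketch that decodes a single JSONB element header, following the header layout described in the SQLite JSONB documentation: the low four bits of the first byte give the element type, and the high four bits give either the payload size or the number of size bytes that follow. The `JSONBHeader` type and `decodeJSONBHeader` function are hypothetical names for this illustration, not part of GRDB or SQLite.

```swift
/// JSONB element type names, indexed by the low nibble of a header byte
/// (values 13...15 are reserved).
let jsonbTypeNames = [
    "NULL", "TRUE", "FALSE", "INT", "INT5", "FLOAT", "FLOAT5",
    "TEXT", "TEXTJ", "TEXT5", "TEXTRAW", "ARRAY", "OBJECT",
]

/// Hypothetical type for this sketch: one decoded element header.
struct JSONBHeader: Equatable {
    var typeName: String
    var headerSize: Int
    var payloadSize: Int
}

/// Decodes the header found at the start of `bytes`, or returns nil if the
/// bytes are too short or use a reserved element type.
func decodeJSONBHeader(_ bytes: [UInt8]) -> JSONBHeader? {
    guard let first = bytes.first else { return nil }
    let type = Int(first & 0x0F)
    let sizeNibble = Int(first >> 4)
    guard type < jsonbTypeNames.count else { return nil }

    // Nibbles 0...11 hold the payload size directly; 12, 13, 14 and 15 mean
    // that the size follows as a 1, 2, 4 or 8-byte big-endian integer.
    let extraBytes: Int
    switch sizeNibble {
    case 12: extraBytes = 1
    case 13: extraBytes = 2
    case 14: extraBytes = 4
    case 15: extraBytes = 8
    default:
        return JSONBHeader(typeName: jsonbTypeNames[type], headerSize: 1, payloadSize: sizeNibble)
    }
    guard bytes.count > extraBytes else { return nil }
    let payloadSize = bytes[1...extraBytes].reduce(0) { ($0 << 8) | Int($1) }
    return JSONBHeader(typeName: jsonbTypeNames[type], headerSize: 1 + extraBytes, payloadSize: payloadSize)
}

// First two bytes of the jsonb() output above: an OBJECT with a 51-byte payload.
if let header = decodeJSONBHeader([0xCC, 0x33]) {
    print(header.typeName, header.headerSize, header.payloadSize) // OBJECT 2 51
}
```

Note that the first byte of the text encoding, 0x7B, also decodes to a plausible-looking header (ARRAY, payload size 7), so parsing a header is not a validity check; that is the job of json_valid(value, 8).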

Implications

The SQLite json* functions work seamlessly with JSON stored as TEXT, JSON stored as BLOB and JSONB stored as BLOB so there’s no functional harm in having a mixture of these.

The only negative implication I can think of is for those who wanted the size and speed advantages of JSONB but their types are inadvertently saved as regular JSON in a BLOB column.

Ensuring JSONB

Although I think it should be left to users to produce and ingest well-formed JSONB, a brute force approach could involve something like

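let data = try JSONEncoder().encode(example)
let text = String(decoding: data, as: UTF8.self)
// `db` is assumed to be a GRDB Database connection.
// Bind the JSON text as an argument instead of interpolating it into the SQL:
// interpolation breaks as soon as the text contains a single quote.
try db.execute(sql: "INSERT INTO test (value) VALUES (jsonb(?))", arguments: [text])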

It would then be necessary on the way out to do something like

SELECT json(value) FROM test

to transform JSONB back to JSON that the standard JSONDecoder can process.

Chapter 8 😄

In my own project, I went nuts and wrote a JSONBEncoder and JSONBDecoder and a JSONBConvertible protocol that conforms to DatabaseValueConvertible to handle all this. That experience convinced me it’s not functionality that belongs in GRDB itself. But I would like to see GRDB help users make smart choices about JSON storage.

Owner Author

🎁 Thank you very much for your detailed response 🤩

Personally, I don’t expect GRDB to ensure well-formed JSONB any more than it ensures well-formed JSON (which is not at all, I think).

Yes, this is true. The library provides ways to give SQLite precise instructions, and, on top of them, conveniences/shortcuts.

GRDB 6 provides conveniences for Codable record types: if the app does not provide precise SQLite instructions, then GRDB encodes and decodes "complex" properties as JSON strings.

I don't know yet how and when GRDB 7 will allow those Codable record types to specify that they prefer JSONB. Be assured that it will be opt-in, i.e. that apps will have to be explicit in order to trigger JSONB storage.

My preliminary explorations have revealed that it will be... difficult. I do not want to introduce breaking changes for people who do not know about JSONB. GRDB currently assumes that Swift and SQLite can communicate through the DatabaseValue type, which encodes SQLite data types. But JSONB is not a data type. Support for JSONB is not performed at the value level, but at the SQL level:

-- What GRDB does when inserting a record
INSERT INTO player(id, name, address) VALUES (?, ?, ?)
-- What support for JSONB requires
INSERT INTO player(id, name, address) VALUES (?, ?, JSONB(?))
-- What GRDB does when selecting a record
SELECT * FROM player
-- What support for JSONB requires
SELECT id, name, JSON(address) FROM player

I would only be cautious about anything that could imply that standard JSONEncoder bytes become JSONB just because they’re inserted into a BLOB column.

We're on the same track. This will not happen.

Example

I put together a small example to illustrate the meaning of “real” JSONB (in excruciating detail!).

Thanks. I added the missing options for JSON_VALID 😅

Implications

The SQLite json* functions work seamlessly with JSON stored as TEXT, JSON stored as BLOB and JSONB stored as BLOB so there’s no functional harm in having a mixture of these.

Yes. I think I got this right.

JSON stored as BLOB

There lies the SQLite "bug", right? JSON stored as BLOB was supposed to be invalid, and documented so, but they let it slip through, and now they're stuck with it.

GRDB tries hard to be a good citizen and stores JSON as TEXT. Codable record types can even instruct GRDB to store their Swift Data properties that contain JSON as TEXT (DatabaseDataEncodingStrategy.text).

The only negative implication I can think of is for those who wanted the size and speed advantages of JSONB but their types are inadvertently saved as regular JSON in a BLOB column.

I suppose those people can write a migration that converts to JSONB the values that are JSON_VALID(_, 2).

Ensuring JSONB

Yes. This is a real challenge for GRDB, as I mentioned at the beginning of this post.

Chapter 8 😄

In my own project, I went nuts and wrote a JSONBEncoder and JSONBDecoder and a JSONBConvertible protocol that conforms to DatabaseValueConvertible to handle all this. That experience convinced me it’s not functionality that belongs in GRDB itself. But I would like to see GRDB help users make smart choices about JSON storage.

Wow. Since JSONB is a private format, I guess you need to use a private SQLite connection that performs the decoding, right?

Do we agree that you only need this because GRDB is unable to ensure JSONB when writing, and convert JSONB to JSON when reading?

Owner Author

@groue groue Jan 19, 2025

-- What GRDB does when selecting a record
SELECT * FROM player
-- What support for JSONB requires
SELECT id, name, JSON(address) FROM player

This one is already possible today:

struct Address: Codable { ... }

struct Player: Codable, FetchableRecord, TableRecord {
    var id: Int64
    var name: String
    var address: Address

    static var databaseSelection: [any SQLSelectable] {
        [
            Column("id"),
            Column("name"),
            Database.json(Column("address")),
        ]
    }
}

#1700 (now merged) makes it better, because you do not have to think about updating the selection when you add a column to the database table:

struct Player: Codable, FetchableRecord, TableRecord {
    var id: Int64
    var name: String
    var address: Address
    // <- you can add properties for new columns here...

    static var databaseSelection: [any SQLSelectable] {
        // ... without updating this property
        [
            .allColumns(excluding: ["address"]), // NEW with #1700
            Database.json(Column("address")),
        ]
    }
}

Contributor

-- What support for JSONB requires
INSERT INTO player(id, name, address) VALUES (?, ?, JSONB(?))

Right. In the original JSONB announcement discussion, we see

I think one thing that might be helpful is to explain that the expected way to insert JSONB into an SQL table is something to the effect of:

CREATE TABLE log (tstamp, json);
INSERT INTO log VALUES (datetime(), jsonb(?));

Where you bind RFC 8259 JSON text to the statement, and SQLite does the JSON to JSONB conversion. Then to pull data from the database, you again leverage SQLite to get the output as RFC 8259 JSON text:

SELECT tstamp, json(json) FROM log;

(emphasis in original)

There lies the SQLite "bug", right? JSON stored as BLOB was supposed to be invalid, and documented so, but they let it slip through, and now they're stuck with it.

Exactly. It’s the thing that made me jump the gun a bit 🙂 and think SQLite was making proper JSONB out of JSONEncoder Data starting with the first GRDB 7 beta.

Since JSONB is a private format, I guess you need to use a private SQLite connection that performs the decoding, right?

Although brief, the SQLite documentation does describe the format. And someone had already implemented it in Rust. Between those, it wasn’t hard to read and write the correct bytes (creating Encoder/Decoder conforming classes was another matter 😵‍💫).

The integration with GRDB looks like

public extension JSONBConvertible {
    init?(json data: Data?) {
        guard let data else { return nil }

        do {
            self = try Self(from: JSONBDecoder(from: data))
        } catch {
            log.error("Failed to JSONB decode \(data.bytes) as \(Self.Type.self): \(error)")
            return nil
        }
    }

    var databaseValue: DatabaseValue {
        do {
            return try JSONBEncoder.encode(self).databaseValue
        } catch {
            log.error("Failed to JSONB encode \(Self.Type.self) as Data")
            return DatabaseValue.null
        }
    }

    static func fromDatabaseValue(_ dbValue: DatabaseValue) -> Self? {
        if case let .blob(data) = dbValue.storage {
            self.init(json: data)
        } else {
            nil
        }
    }
}

Nothing too crazy in that, I hope.

Do we agree that you only need this because GRDB is unable to ensure JSONB when writing, and convert JSONB to JSON when reading?

Yes, that seems right. 👍

@aehlke aehlke Mar 1, 2025

In my own project, I went nuts and wrote a JSONBEncoder and JSONBDecoder and a JSONBConvertible protocol that conforms to DatabaseValueConvertible to handle all this. That experience convinced me it’s not functionality that belongs in GRDB itself. But I would like to see GRDB help users make smart choices about JSON storage.

Sorry to pry but is this something you're open to open sourcing? Even as a throwaway repo. Thanks for considering @Jason-Abbott

Contributor

@aehlke Happy to share what I can. I don’t have time to make it a proper library right now but I’ll wrap it into a repo that I’ll link in a bit.

Thanks so much!! Excited to dig into this. I will share back if I adopt it.

@groue groue merged commit 2e2c0ea into development Jan 26, 2025
8 checks passed
@groue groue deleted the dev/jsonb branch January 26, 2025 14:22

groue commented Jan 26, 2025

Hello @Jason-Abbott,

I have released GRDB 7 (not beta) with this PR, but without JSONB support from record types. I think it will be possible to add it later at some point, with an extra configuration. Something like:

protocol TableRecord {
    // An array of columns that contain JSONB.
    static var jsonbDatabaseColumns: [String] { get }
}

// Usage
struct Player: TableRecord {
    static var jsonbDatabaseColumns: [String] { ["address"] }
}

(TBD)

Unless I'm mistaken, this should be enough to help GRDB generate INSERT statements with JSONB(?) values, and SELECT statements with JSON(...) columns, so that JSONB blobs remain in the database file, and the app only has to deal with JSON data/strings. It will be important to check the interaction of JSONB with other record and Codable record features.
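
As a rough illustration of the SQL generation described above, here is a sketch in plain Swift. The `insertSQL` and `selectSQL` helpers are hypothetical and not GRDB API; they only show how a list of JSONB columns could drive the generated statements. Identifier quoting is deliberately left out to keep the sketch short.

```swift
/// Hypothetical helper, not GRDB API: builds an INSERT statement that wraps
/// the placeholders of JSONB columns in JSONB(?).
func insertSQL(table: String, columns: [String], jsonbColumns: Set<String>) -> String {
    let columnList = columns.joined(separator: ", ")
    let placeholders = columns
        .map { jsonbColumns.contains($0) ? "JSONB(?)" : "?" }
        .joined(separator: ", ")
    return "INSERT INTO \(table)(\(columnList)) VALUES (\(placeholders))"
}

/// Hypothetical helper, not GRDB API: builds a SELECT statement that converts
/// JSONB columns back to JSON text with JSON(...).
func selectSQL(table: String, columns: [String], jsonbColumns: Set<String>) -> String {
    let selection = columns
        .map { jsonbColumns.contains($0) ? "JSON(\($0))" : $0 }
        .joined(separator: ", ")
    return "SELECT \(selection) FROM \(table)"
}

let columns = ["id", "name", "address"]
print(insertSQL(table: "player", columns: columns, jsonbColumns: ["address"]))
// INSERT INTO player(id, name, address) VALUES (?, ?, JSONB(?))
print(selectSQL(table: "player", columns: columns, jsonbColumns: ["address"]))
// SELECT id, name, JSON(address) FROM player
```

With such statements, the app binds and reads JSON text only, and SQLite performs the conversion to and from JSONB.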

I must say that this feature has low priority: I don't have any use for JSONB myself, and unless I'm mistaken no Apple OS ships with an SQLite version that supports JSONB.

@Jason-Abbott
Contributor

Based on all the GRDB activity I’m seeing, I think you must have a couple clones of yourself! This is all super and totally agree on the lower priority for JSONB.

groue commented Jan 26, 2025

Thanks for your kind message. Your support and insights were much appreciated :-) I feel confident that the library has solid foundations and can make progress!
