Hi everyone,
I've run into some issues with the `.toSQL()` function on Drizzle query objects, and I'd like to discuss potential issues with the internal SQL query building.

To provide some background, I've written an ingestion pipeline which uses Drizzle ORM to manage an extensive relational schema (over 100 related tables). The process runs on a frequent schedule, pulling new/modified data out of a PostgreSQL database and bulk inserting it into an OpenSearch cluster. Managing the schema with Drizzle has made my life much easier: I can accept dynamic query objects in the base service layer and append logic to the query, such as restricting by timestamp to rows changed since the last successful run of the given ingestion process.
This process works very smoothly; however, due to external factors, it's possible the ingestion pipeline could incur some lag. In that scenario, the internal ingestion service layer exports the raw Postgres SQL query from Drizzle using the `.toSQL()` function. We do this so we can run the potentially high-volume, long-running query independently of the Drizzle connection, using a Postgres cursor to bound memory usage and protect database performance.

The issue begins here: the exported raw SQL joins related tables as JSON arrays, using `JSON_BUILD_ARRAY`, which creates an array of unnamed fields. Each value in the array appears in the order in which the columns were defined on your Drizzle schema object.

Suppose the example Drizzle schema objects:
```
// table 1
{
  id: uuid,
  group: string,
  subAttributeID: uuid,
}

// table 2
{
  id: uuid,
  subAttribute: string,
  category: string,
  style: string,
}

// imagine the relation called subAttribute is defined on table 1
// and exported with the schema for the drizzle client
```
Take the following Drizzle query:

```ts
this.drizzle.query.table1.findMany({
  with: {
    subAttribute: true,
  },
})
```
Calling `.toSQL()` on the above would generate a SQL query like the one below:
```sql
SELECT
  "table1"."id",
  "table1"."group",
  "table1_table2"."data" AS "subAttribute"
FROM
  "table_1" "table1"
  LEFT JOIN LATERAL (
    SELECT
      JSON_BUILD_ARRAY (
        "table1_table2"."id",
        "table1_table2"."subAttribute",
        "table1_table2"."category",
        "table1_table2"."style"
      ) AS "data"
    FROM
      (
        SELECT * FROM "table_2" "table1_table2"
        WHERE "table1_table2"."id" = "table1"."subAttributeID"
        LIMIT 1
      ) "table1_table2"
  ) "table1_table2" ON TRUE
```
The query would result in output like the below, where the `subAttribute` values are unnamed array elements:

```
{
  id: <some uuid>,
  group: "group name",
  subAttribute: [<some uuid>, "subattribute name", "category name", "style type"]
}
```
This presents an issue: I cannot immediately deserialize the results from Postgres into the nested objects of my in-memory schema representation. To overcome this, I've implemented a recursive function which walks depth-first through the schema tree and, for every record returned from the database query, converts each joined table's value array into an object with its expected key-value pairs. This works dynamically because I can fetch the column information from the given Drizzle table object.
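For concreteness, here is a minimal sketch of what such a recursive deserializer might look like. The names `RelationShape` and `reviveRow` are illustrative, not Drizzle APIs, and it assumes you can supply each relation's column names in declaration order (which, as noted, can be read off the Drizzle table objects):

```typescript
// A RelationShape describes one table: its column names in schema order,
// plus any nested relations that were joined as positional JSON arrays.
interface RelationShape {
  columns: string[];                         // column names in declaration order
  relations?: Record<string, RelationShape>; // key = relation field name on the row
}

// Recursively rebuild keyed objects from Drizzle's positional JSON arrays.
function reviveRow(
  row: Record<string, unknown>,
  shape: RelationShape,
): Record<string, unknown> {
  const out: Record<string, unknown> = { ...row };
  for (const [field, childShape] of Object.entries(shape.relations ?? {})) {
    const value = row[field];
    if (Array.isArray(value)) {
      // Zip the positional array with the child's column names, then recurse
      // in case the child itself contains further joined relations.
      const obj: Record<string, unknown> = {};
      childShape.columns.forEach((col, i) => (obj[col] = value[i]));
      out[field] = reviveRow(obj, childShape);
    }
  }
  return out;
}

// Usage with the example tables above:
const shape: RelationShape = {
  columns: ["id", "group", "subAttributeID"],
  relations: {
    subAttribute: { columns: ["id", "subAttribute", "category", "style"] },
  },
};

const revived = reviveRow(
  {
    id: "a-uuid",
    group: "group name",
    subAttribute: ["b-uuid", "subattribute name", "category name", "style type"],
  },
  shape,
);
// revived.subAttribute is now a keyed object:
// { id: "b-uuid", subAttribute: "subattribute name", category: "category name", style: "style type" }
```

Note this has to run over every row of every result set, which is where the performance cost mentioned below comes from.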
The primary question is: why does Drizzle generate SQL for joined tables which does not use the user's column names? Postgres has several functions for building JSON objects with named keys, so it seems it should be simple to use the ORM table's column names as the object keys. I am able to manually modify an exported SQL query so that the joined tables do return proper key/value objects: replacing `JSON_BUILD_ARRAY` in the above example with `JSON_BUILD_OBJECT` (supplying the object keys) makes deserialization of the Drizzle execution and the manual raw SQL execution match 1:1.

Was this an intentional design decision by the Drizzle team? How does the Drizzle package internally deserialize the results into ORM objects? Was there some performance issue with using `JSON_BUILD_OBJECT` instead of `JSON_BUILD_ARRAY`? Should I open a feature request/bug for this issue?
Thanks in advance for any input on the above scenario. The recursive workaround to enable manual query execution comes at a serious performance cost and introduces plenty of room for error. Any insight on better architecture, decisions, etc. would be so helpful!!