Example to produce SQL without Entity Framework Core! #1361
Why
We've always claimed it's possible to use JsonApiDotNetCore without Entity Framework Core. Just implement your own resource service or repository, right?
There's an implementation for MongoDB using its LINQ provider and there's an example that takes a LINQ expression, compiles it, then executes it against an in-memory list.
But we never told you what it takes to translate complex JSON:API requests to SQL yourself. So let's put our money where our mouth is: this PR shows how it can be done!
What
This PR provides an implementation for most of the JsonApiDotNetCore features. It supports all JSON:API endpoints (including atomic operations) and query string parameters (both top-level and deeply nested), as well as custom resource definition callbacks.
`DapperRepository` implements `IResourceRepository` and uses Dapper to execute ADO.NET database queries and to materialize the returned result set into JSON:API resources. This example lets Entity Framework Core generate the database at startup (for convenience) but doesn't use it for serving requests. Information about the underlying database model (tables, columns, and foreign keys) is needed to produce SQL. This is provided by `IDataModelService`. For convenience again, `FromEntitiesDataModelService` obtains that information from the Entity Framework Core model at startup, but feel free to plug in something else.
At a high level, `QueryLayer` is translated into a tree of `SqlTreeNode` objects representing the SQL query. `SqlQueryBuilder` takes that as input and produces SQL text from it. It's mostly SQL-92 compliant and supports PostgreSQL, MySQL, and SQL Server. Adapting it to your own flavor should be straightforward.
For example, the following GET request:
Gets translated by JsonApiDotNetCore (unchanged) into:
Then `DapperRepository` (with the help of `SelectStatementBuilder`) translates that into:
For less involved requests, simpler SQL is produced where possible. For example:
Produces the following SQL (no sub-queries):
In the SQL above, the ordering on `Priority` and `LastModifiedAt` originates from a resource definition.
Limitations
First of all, this is not a mature, battle-tested, and optimized implementation. If you can, please use Entity Framework Core instead, because:
That said, if you're not too concerned about performance or absolute correctness (there are likely bugs; please report them via issues or PRs), you're welcome to try it out or use it as an inspiration to implement your own data access.
The following limitations apply:
`JOIN LATERAL`/`OUTER APPLY`, `ROW_NUMBER() OVER (PARTITION BY...)`. I've spent a long time trying to pull it off but eventually gave up. I challenge you!
`[EagerLoad]` support. It could be done, but it's rarely used.
`IResourceDefinition.OnRegisterQueryableHandlersForQueryStringParameters()`. Because no `IQueryable` is involved, it doesn't apply.
Implementation
At a high level, there are many similarities with how Entity Framework Core performs the translation to SQL. I often struggled to grasp patterns from its source code, so I inferred most using trial and error.
The tree of SQL nodes
In this example, all nodes derive from `SqlTreeNode`. Most of them are straightforward and don't require explanation.
All nodes are immutable, yet they expose members as read-only collections. This has two reasons:
`Dictionary<,>` with a string key. This is not true with `ImmutableDictionary<,>`, because it relies on the non-deterministic `String.GetHashCode()`. We need to know the exact SQL in tests.
The abstract type `TableSourceNode` contains a list of `ColumnNode`s. Derived type `TableNode` represents a database table, while `SelectNode` represents a sub-query. `ColumnNode` is also abstract, with derived types `ColumnInTableNode` and `ColumnInSelectNode`.
`SelectNode` contains a list of abstract `SelectorNode`s per table, with implementations `ColumnSelectorNode` (`SELECT t1.Name`), `CountSelectorNode` (`SELECT COUNT(*)`), and `OneSelectorNode` (`SELECT 1`). `ColumnSelectorNode` points to a `ColumnNode` (optionally aliased), so it can be a column in a table or a sub-query.
These abstract columns in `TableSourceNode` don't occur in the produced SQL. They are used to trace references back to an underlying database column. When a sub-query joins multiple tables, duplicate column names will be aliased to make them uniquely referenceable. In the example request above, `t7.Id00 DESC` points to the selector `t6."Id" AS Id00`, which points to the selector `t5."Id"`, which points to the `Id` column in the `Tags` table.
Another need for tracing references is that it's not always possible to remap in-place. A post-processor pulls stale references back into scope.
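The reference tracing described above can be illustrated with a minimal sketch. It assumes heavily simplified node shapes: the members `Name`, `TableAlias`, and `Source`, as well as the `ColumnTracer` helper, are hypothetical and chosen for illustration only. The actual node types in this PR carry more information.

```csharp
// Hypothetical, simplified node shapes; not the actual types from this PR.
public abstract record SqlTreeNode;

public abstract record ColumnNode(string Name, string TableAlias) : SqlTreeNode;

// A column that physically exists in a database table.
public sealed record ColumnInTableNode(string Name, string TableAlias)
    : ColumnNode(Name, TableAlias);

// A column exposed by a sub-query. Source is the column its selector points to (assumed member).
public sealed record ColumnInSelectNode(string Name, string TableAlias, ColumnNode Source)
    : ColumnNode(Name, TableAlias);

public static class ColumnTracer
{
    // Follows the chain of sub-query columns until a table column is reached.
    public static ColumnInTableNode FindUnderlyingTableColumn(ColumnNode column)
    {
        ColumnNode current = column;

        while (current is ColumnInSelectNode columnInSelect)
        {
            current = columnInSelect.Source;
        }

        // In this sketch, only two column types exist, so the cast is safe.
        return (ColumnInTableNode)current;
    }
}
```

With the chain from the example request modeled as `new ColumnInSelectNode("Id00", "t7", new ColumnInSelectNode("Id", "t6", new ColumnInTableNode("Id", "t5")))`, tracing ends at the `Id` column of the table aliased as `t5`.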
Joins and sub-queries
At a fundamental level, all tables are joined using `LEFT JOIN`. If the foreign key is defined at the left side of the JSON:API relationship and it's non-nullable, it gets optimized into an `INNER JOIN`, which is more efficient. This optimization is not applicable when joining with a nested `QueryLayer`. For example, todo-items without any tags must still be returned at `/todoItems?include=tags`.
Initially, I thought another exception was needed for `has` and `count` in filters (see dotnet/efcore#32103). Ultimately, it comes down to interpreting what "null safe" means, so I chose to follow the Entity Framework Core behavior.
It is generally safest to join every `include` (or nested `QueryLayer`) as a sub-query. But that makes the SQL harder to read and slower to execute. Depending on the nested query layer shape, the use of a sub-query can be optimized into a simple join against a table. Determining whether that optimization can be applied is non-trivial when pagination is supported. Entity Framework Core is very flexible: it employs several techniques to push the current query down into a sub-query and pull it out again at various stages while processing the input.
In this example, we determine upfront whether a sub-query is needed. Orderings from sub-queries without pagination only need to appear in the top-level query. That just leaves filtering, which may constrain the set of related resources. So, a sub-query is only produced if the nested query layer has a filter. Due to all the complexity in Entity Framework Core, we sometimes generate simpler SQL than it does (it misses some opportunities; there's an open issue for that).
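The two decisions described above (which join type to use, and whether a sub-query is needed) can be condensed into a small sketch. The `RelationshipJoin` record and the `JoinPlanner` class are hypothetical names invented for illustration; they are not types from this PR. The sketch also assumes pagination is out of scope.

```csharp
public enum JoinType
{
    LeftJoin,
    InnerJoin
}

// Hypothetical description of a single join; not a type from this PR.
public sealed record RelationshipJoin(
    bool ForeignKeyIsAtLeftSide,
    bool ForeignKeyIsNullable,
    bool IsNestedQueryLayer,
    bool NestedLayerHasFilter);

public static class JoinPlanner
{
    // LEFT JOIN is the default. INNER JOIN is only chosen when the foreign key
    // is at the left side, is non-nullable, and no nested query layer is joined.
    public static JoinType ChooseJoinType(RelationshipJoin join)
    {
        bool canUseInnerJoin =
            join.ForeignKeyIsAtLeftSide &&
            !join.ForeignKeyIsNullable &&
            !join.IsNestedQueryLayer;

        return canUseInnerJoin ? JoinType.InnerJoin : JoinType.LeftJoin;
    }

    // Without pagination, orderings move to the top-level query, so only a
    // filter on the nested query layer requires a sub-query.
    public static bool NeedsSubQuery(RelationshipJoin join)
    {
        return join.IsNestedQueryLayer && join.NestedLayerHasFilter;
    }
}
```

For `/todoItems?include=tags`, the include is a nested query layer without a filter, so this sketch yields a plain `LEFT JOIN` against the table and no sub-query.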
Materialization
As mentioned earlier, Dapper is used to parse the result set into .NET objects. It scans the returned column names and starts a new object each time an `Id` property occurs. To make that work, we must feed the list of expected object types upfront. This is easy to determine from the requested includes.
We implement a `Map` method that Dapper calls with an array of all objects found in a single row. From there, we call `ResourceFieldAttribute.SetValue` repeatedly, while caching instances to preserve reference identity (which matches NoTrackingWithIdentityResolution). This is more flexible than the Entity Framework Core materializer, which requires everything to be ordered to support streaming. Therefore, you'll see Entity Framework Core often adds `Id` to orderings to achieve total ordering (with the downside of potentially not fully using an index). We're not doing that, which reduces pressure on the database server.
A downside of Dapper usage is that column names in the result set must match property names exactly, so we use a post-processing step to compensate. You can see this in the example request above, where we turn:
into:
Resource/relationship updates
The tricky part is ensuring changes are sent in the right order so you won't hit a foreign key constraint violation. For example, updating a one-to-one relationship where the foreign key exists at the right side requires first updating a row in another table.
Some of our relationship updates are more efficient because we update/delete related records in one go using `WHERE "Id" IN (...)`, instead of issuing a SQL statement per match. On the other hand, the dynamic contents of `IN` reduce the effectiveness of execution plan caching, so I'm not sure it matters much.
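To make the trade-off concrete, here is a minimal sketch of building such a statement with one parameter per ID. The `InClauseBuilder` class, its method name, and the `@p1`-style parameter naming are assumptions made for illustration and are not taken from this PR. The table name is assumed to come from trusted metadata, because it is embedded in the SQL text.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; not a type from this PR.
public static class InClauseBuilder
{
    // Produces a single DELETE statement covering all IDs, plus its parameter values.
    public static (string Sql, Dictionary<string, object> Parameters) BuildDeleteByIds(
        string tableName, IReadOnlyList<long> ids)
    {
        if (ids.Count == 0)
        {
            throw new ArgumentException("At least one ID is required.", nameof(ids));
        }

        var parameters = new Dictionary<string, object>();
        var placeholders = new List<string>();

        for (int index = 0; index < ids.Count; index++)
        {
            string name = $"@p{index + 1}";
            placeholders.Add(name);
            parameters[name] = ids[index];
        }

        // The number of placeholders varies with the input, so each distinct
        // count yields different SQL text (and thus a separate execution plan).
        string placeholderList = string.Join(", ", placeholders);
        string sql = $"DELETE FROM \"{tableName}\" WHERE \"Id\" IN ({placeholderList})";

        return (sql, parameters);
    }
}
```

Calling `InClauseBuilder.BuildDeleteByIds("Tags", new long[] { 4, 8, 15 })` yields `DELETE FROM "Tags" WHERE "Id" IN (@p1, @p2, @p3)` with three parameter values, so one round trip replaces three.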