

@jroith

@jroith jroith commented Jan 24, 2024

This is a fork that I'm currently maintaining internally, which optimises performance in certain cases. The implementation shown here is not particularly efficient and it's not fully general-purpose; I'm providing it to share some ideas and because @angrykoala asked about it in another MR in the Cypher Builder.

Let me try to quickly break down what is going on here.

We have a schema with many interfaces and wanted to achieve two goals:

  1. It should be possible to select very common interfaces efficiently in Cypher without OR-ing together a large number of labels.
  2. The library tends to build large, often nested UNIONs unnecessarily, which results in long query-planning times and giant query plans that also execute slowly. We wanted to avoid that.

We have addressed these goals for our case as follows:

  1. We preprocess our schema from a meta-schema. Here we declare, amongst other things, which interfaces should get a Neo4j label and which should not. We do this to strike a balance between adding too many labels for Neo4j to pack into its label bitfield and having long selectors for base interfaces.
  2. Some types can also have a composite label where the combination of labels identifies the type.
  3. We add a mainType field to certain (in practice most) types that redundantly stores the node's type (set via the @populatedBy directive). This makes it easy to determine the node's type even when base-interface labels are present, without looking at the schema and, more importantly, without putting a string literal in the __resolveType property, for reasons explained below.
  4. We then add a LabelManager to the application (not to the library; it is passed in via the context object) that knows how to build short node selectors (such as "A", "B", "B&C" or "I1&I2") for a given type or interface. It does so based on the labels available for each type, looking up or down the hierarchy. This code is not included because it lives outside the library and could be implemented differently.
  5. The LabelManager also indicates whether the mainType property is known to be present on a given type.
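Since the LabelManager itself is not part of the MR, here is a minimal sketch of the selector-building idea, assuming a hand-written metadata object. The names SchemaMeta, TypeInfo, typeSelector, interfaceSelector and hasMainType are hypothetical and are not part of @neo4j/graphql or the fork. The sketch only looks down the hierarchy (from an interface to its implementing types) and assumes Cypher 5 label expressions (`|`, `&` and parentheses).

```typescript
// Hypothetical metadata derived from the meta-schema (not a @neo4j/graphql API).
interface TypeInfo {
  labels: string[];     // labels identifying the type; more than one = composite label
  interfaces: string[]; // interfaces the type implements
  hasMainType: boolean; // whether mainType is known to be stored on this type
}

interface SchemaMeta {
  types: Record<string, TypeInfo>;
  labelledInterfaces: Record<string, string>; // interface -> label, only for interfaces that get one
}

class LabelManager {
  constructor(private readonly meta: SchemaMeta) {}

  // Selector for one concrete type, e.g. "A", or "B&Extra" for a composite label.
  typeSelector(typeName: string): string {
    const info = this.meta.types[typeName];
    if (!info) throw new Error(`Unknown type: ${typeName}`);
    return info.labels.join("&");
  }

  // Short selector for an interface: its own label if it has one,
  // otherwise the OR of all implementing types' selectors.
  interfaceSelector(interfaceName: string): string {
    const own = this.meta.labelledInterfaces[interfaceName];
    if (own) return own;
    return Object.entries(this.meta.types)
      .filter(([, info]) => info.interfaces.includes(interfaceName))
      .map(([name]) => {
        const sel = this.typeSelector(name);
        // Composite selectors need parentheses inside an OR: (B&Extra)|D
        return sel.includes("&") ? `(${sel})` : sel;
      })
      .join("|");
  }

  // Only types that store mainType can use it instead of a __resolveType literal.
  hasMainType(typeName: string): boolean {
    return this.meta.types[typeName]?.hasMainType ?? false;
  }
}

// Usage with made-up types: Base has no label of its own, Other does.
const lm = new LabelManager({
  types: {
    A: { labels: ["A"], interfaces: ["Base"], hasMainType: true },
    B: { labels: ["B", "Extra"], interfaces: ["Base"], hasMainType: true },
    D: { labels: ["D"], interfaces: ["Base", "Other"], hasMainType: false },
  },
  labelledInterfaces: { Other: "Other" },
});
console.log(lm.interfaceSelector("Base"));  // A|(B&Extra)|D
console.log(lm.interfaceSelector("Other")); // Other
```

Returning the interface's own label whenever the meta-schema grants one keeps the MATCH pattern short; the OR of type selectors is only the fallback.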

These changes are made, amongst other reasons, to make the branches of a UNION as similar as possible, ideally identical. If an interface is queried, we try to use a common label instead, or a common expression if no label is available. Likewise, we replace the __resolveType literal with the mainType property. This is usually enough to make the code in the different UNION branches identical. If an inline fragment such as "... on Foo" is used, or in other cases such as authorisation, the code may differ. We then find all cases that are identical, collapse them into a single case and leave the other cases in place, using a predicate to keep the combined case and the remaining cases from overlapping.
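The collapsing step can be sketched as follows, assuming each concrete type's UNION branch has already been rendered in its normalised form (interface selector instead of the type label, mainType instead of the __resolveType literal). planUnion and exclusionPredicate are hypothetical helpers, not the fork's actual code. This sketch collapses only the largest group of identical branches, and the predicate uses Cypher 5 label-expression syntax.

```typescript
// Result of comparing normalised UNION branches (hypothetical shape).
interface UnionPlan {
  collapsedTypes: string[]; // served by one shared branch on the interface selector
  separateTypes: string[];  // keep their own branch
}

// normalised: concrete type -> Cypher of its branch rendered with the interface
// selector and mainType, so identical text means the branches can be merged.
function planUnion(normalised: Map<string, string>): UnionPlan {
  const groups = new Map<string, string[]>();
  for (const [typeName, cypher] of normalised) {
    const group = groups.get(cypher) ?? [];
    group.push(typeName);
    groups.set(cypher, group);
  }
  // Collapse the largest group; "merging" a single branch gains nothing.
  let largest: string[] = [];
  for (const group of groups.values()) {
    if (group.length > largest.length) largest = group;
  }
  const allTypes = [...normalised.keys()];
  if (largest.length < 2) return { collapsedTypes: [], separateTypes: allTypes };
  const collapsed = new Set(largest);
  return {
    collapsedTypes: largest,
    separateTypes: allTypes.filter((t) => !collapsed.has(t)),
  };
}

// Predicate for the shared branch so it does not also return nodes whose
// type keeps its own branch.
function exclusionPredicate(variable: string, selectors: string[]): string {
  if (selectors.length === 0) return "";
  return `NOT (${selectors.map((s) => `${variable}:${s}`).join(" OR ")})`;
}

// Usage: A and B render identically; C differs (e.g. because of "... on C").
const shared =
  "MATCH (this)-[this0:HAS_BASE]->(this1:Base) RETURN this1 { .name, __resolveType: this1.mainType }";
const plan = planUnion(
  new Map([
    ["A", shared],
    ["B", shared],
    ["C", shared.replace(".name", ".name, .different")],
  ]),
);
console.log(plan.collapsedTypes, plan.separateTypes); // A and B collapse, C stays separate
console.log(exclusionPredicate("this1", plan.separateTypes)); // NOT (this1:C)
```

Comparing rendered strings rather than AST nodes is what makes this robust but costly, matching the trade-off described in the next paragraph.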

In practice, this sometimes brings execution time down from 5 seconds to 100 ms, largely by avoiding long query-planning times. A drawback is that query generation itself is inefficient: the query has to be built twice, recursively, before being compressed, and we only compare the resulting Cypher strings, which is robust but, again, not efficient. Since the library has no cache, this is not optimal.

Nevertheless, it is still much better for our cases, and the cost is negligible.

The patched library is a drop-in replacement: it has no effect unless a labelManager is present on the context, and it does not expose any new APIs.

Although I don't really expect this MR to be merged, perhaps it can provide some useful ideas for improving query generation for interfaces to the point where the fork is no longer necessary and can be dropped.

…erfaces

# Conflicts:
#	packages/graphql/src/translate/create-projection-and-params.ts
#	packages/graphql/src/translate/queryAST/ast/operations/ReadOperation.ts
#	packages/graphql/src/translate/queryAST/ast/operations/composite/CompositeReadOperation.ts
#	packages/graphql/src/translate/queryAST/ast/operations/composite/CompositeReadPartial.ts
@changeset-bot

changeset-bot bot commented Jan 24, 2024

⚠️ No Changeset found

Latest commit: 5e52353

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


@github-actions
Contributor

CLA Assistant Lite bot:
Thank you for your submission, we really appreciate it. Like many open-source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution. You can sign the CLA by just posting a Pull Request Comment same as the below format.


I have read the CLA Document and I hereby sign the CLA


You can retrigger this bot by commenting recheck in this Pull Request

@angrykoala
Member

Hi @jroith
Thanks for sending this for reference. To make it a bit easier to follow, could you send me an example of a GraphQL query and the Cypher your approach generates?

@jroith
Author

jroith commented Jan 24, 2024

Yes. So for example, let's say you have a schema like:

interface Base {
 name: String
}

type Start {
 id: ID! @id @unique
 bases: [Base!]!  @relationship(type: "HAS_BASE", direction: OUT)
}

type A implements Base {
 name: String
 something: String
}

type B implements Base {
 name: String
 somethingElse: String
}

type C implements Base {
 name: String
 different: String
}

[...]



type Z implements Base {
 name: String
}


and a simple query:

{
  starts(where: {id: "foo"}) {
    bases {
      name
    }
  }
}

The 4.4.5 library might generate Cypher like:

MATCH (this:Start)
WHERE this.id = $param0
CALL {
    WITH this
    CALL {
        WITH *
        MATCH (this)-[this0:HAS_BASE]->(this1:A)
        WITH this1 { .name, __resolveType: "A", __id: id(this1) } AS this1
        RETURN this1 AS var2
        UNION
        WITH *
        MATCH (this)-[this3:HAS_BASE]->(this4:B)
        WITH this4 { .name, __resolveType: "B", __id: id(this4) } AS this4
        RETURN this4 AS var2
        UNION
        WITH *
        MATCH (this)-[this5:HAS_BASE]->(this6:C)
        WITH this6 { .name, __resolveType: "C", __id: id(this6) } AS this6
        RETURN this6 AS var2
        UNION
        WITH *
        MATCH (this)-[this7:HAS_BASE]->(this8:D)
        WITH this8 { .name, __resolveType: "D", __id: id(this8) } AS this8
        RETURN this8 AS var2
        UNION
        
        
        ... MANY MORE ...
        
        UNION
        WITH *
        MATCH (this)-[this35:HAS_BASE]->(this36:Z)
        WITH this36 { .name, __resolveType: "Z", __id: id(this36) } AS this36
        RETURN this36 AS var2
    }
    WITH var2
    RETURN collect(var2) AS var2
}

This patch, in contrast, generates:

MATCH (this:Start)
WHERE this.id = $param0
CALL {
    WITH this
    CALL {
        WITH *
        MATCH (this)-[this0:HAS_BASE]->(this1:Base)
        WITH this1 { .name, __resolveType: this1.mainType, __id: id(this1) } AS this1
        RETURN this1 AS var2
    }
    WITH var2
    RETURN collect(var2) AS var2
}

Only differences that cannot be unified create an extra UNION case.
If Base has not been declared as an interface that should get a label, something like MATCH (this)-[this0:HAS_BASE]->(this1:A|B|C|...|Z) might be used instead. This part is up to the LabelManager and is not included in the MR.

@jroith
Author

jroith commented Jan 24, 2024

Just to clarify the mechanism a bit more: in the example above, if you replace the concrete type label (A in the first UNION branch) with the interface selector, e.g. "Base" or "A|B|...", and replace the __resolveType string literal with "this1.mainType", then calling the Cypher Builder independently for each UNION case produces an identical string.

This is then used to identify which cases can be unified and which can't, and the translation is run a second time to generate only the required cases (just one here, since no "... on" fragment was used and everything can be folded).
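A toy illustration of the first pass, assuming a hypothetical renderBranch function as a stand-in for the library's translator (which is far more involved). Rendered with the concrete label and a string literal, the branches for A and B differ; rendered with the shared selector and this1.mainType, they produce one distinct string, and only a type whose selection differs (e.g. via "... on C { different }") needs its own branch.

```typescript
interface RenderOptions {
  selector: string;    // label expression used in the MATCH pattern
  resolveType: string; // Cypher expression for __resolveType
}

// Toy stand-in for the translator: renders one UNION branch in the shape of
// the 4.4.5 output above, projecting the requested fields.
function renderBranch(fields: string[], opts: RenderOptions): string {
  const projection = fields.map((f) => `.${f}`).join(", ");
  return [
    "WITH *",
    `MATCH (this)-[this0:HAS_BASE]->(this1:${opts.selector})`,
    `WITH this1 { ${projection}, __resolveType: ${opts.resolveType}, __id: id(this1) } AS this1`,
    "RETURN this1 AS var2",
  ].join("\n");
}

// First pass: render each concrete type with the shared selector and mainType,
// and count how many distinct branches the second pass actually has to emit.
function countDistinctBranches(selections: Record<string, string[]>, sharedSelector: string): number {
  const seen = new Set<string>();
  for (const fields of Object.values(selections)) {
    seen.add(renderBranch(fields, { selector: sharedSelector, resolveType: "this1.mainType" }));
  }
  return seen.size;
}

// With literal labels and type names, A and B render differently...
const a = renderBranch(["name"], { selector: "A", resolveType: '"A"' });
const b = renderBranch(["name"], { selector: "B", resolveType: '"B"' });
console.log(a === b); // false
// ...but normalised, { bases { name } } needs one branch; "... on C { different }" adds one.
console.log(countDistinctBranches({ A: ["name"], B: ["name"], C: ["name"] }, "Base")); // 1
console.log(countDistinctBranches({ A: ["name"], B: ["name"], C: ["name", "different"] }, "Base")); // 2
```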

@angrykoala angrykoala marked this pull request as draft February 29, 2024 11:51
@angrykoala
Member

Hi @jroith
As this is more of a reference for making performance improvements, I've changed this PR to a draft, since it is not intended to be merged.

Thanks again for sharing

@darrellwarde
Contributor

This has been logged on our internal backlog and we're closing this reference PR to clear our PR backlog. Thanks so much for the suggestion!


4 participants