Description
As identified in icgc-argo/roadmap#1057, some filters incorrectly produce 0 buckets.
This ticket will serve as a documentation log for the research into this issue, and will link to its eventual fix.
Thus far, the working theory is that something is wrong with the aggregations filtering for nested array fields, specifically for "in" operations.
Example
Following the Argo ticket, we can run a test GraphQL query like this one, with no filters.
query ($SQON: JSON) {
  file {
    hits(filters: $SQON) {
      total
    }
    aggregations(
      filters: $SQON
      include_missing: true
      aggregations_filter_themselves: true
    ) {
      donors__donor_id {
        bucket_count
        buckets {
          doc_count
          key
        }
      }
    }
  }
}
Any anonymous user can see 1660 docs in the dev environment, as seen in Arranger's GraphQL response:
{
  "data": {
    "file": {
      "hits": {
        "total": 1660
      },
      "aggregations": {
        "donors__donor_id": {
          "bucket_count": 6,
          "buckets": [
            { "doc_count": 877, "key": "DO250472" },
            { "doc_count": 478, "key": "DO253000" },
            { "doc_count": 163, "key": "DO35085" },
            { "doc_count": 138, "key": "DO252999" },
            { "doc_count": 3, "key": "DO250326" },
            { "doc_count": 1, "key": "DO250391" }
          ]
        }
      }
    }
  }
}
Now let's assume the following SQON:
{
  "content": {
    "fieldName": "donors.specimens.specimen_tissue_source",
    "value": "Solid tissue"
  },
  "op": "in"
}
Note: donors here is technically an array of those, and so is specimens.
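To make switching between the two operations compared in this ticket less error-prone, here is a minimal sketch of a helper that inverts the SQON's op. The type names (SqonOp, FieldFilter) and the invert function are hypothetical, made up for illustration; they are not taken from Arranger's source.

```typescript
// Hypothetical types for illustration only; not Arranger's own definitions.
type SqonOp = "in" | "not_in";

interface FieldFilter {
  op: SqonOp;
  content: { fieldName: string; value: string | string[] };
}

// Return a copy of the filter with the operation flipped, leaving the input untouched.
function invert(filter: FieldFilter): FieldFilter {
  return { ...filter, op: filter.op === "in" ? "not_in" : "in" };
}

// The SQON from this ticket, as written above.
const solidTissue: FieldFilter = {
  op: "in",
  content: {
    fieldName: "donors.specimens.specimen_tissue_source",
    value: "Solid tissue",
  },
};

// The "not_in" variant used for comparison.
const notSolidTissue = invert(solidTissue);
```

Either object can then be passed as the $SQON variable of the query above.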
Applying that SQON results in this response (aka the problem):
{
  "data": {
    "file": {
      "hits": {
        "total": 18
      },
      "aggregations": {
        "donors__donor_id": {
          "bucket_count": 0,
          "buckets": []
        }
      }
    }
  }
}
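The symptom can be expressed as a simple consistency check: the aggregation is requested with the same filters as the hits and with include_missing: true, so a non-zero hit total alongside an empty bucket list looks wrong. This is a rough sketch; the interface and function names are hypothetical, and it assumes every matching document should fall into at least one bucket (a value bucket or a missing-value bucket).

```typescript
// Hypothetical shapes mirroring the fields requested in the query above.
interface Bucket {
  key: string;
  doc_count: number;
}

interface AggResult {
  bucket_count: number;
  buckets: Bucket[];
}

// Flags the pattern reported in this ticket: documents matched, but no buckets came back.
function isSuspicious(hitsTotal: number, agg: AggResult): boolean {
  return hitsTotal > 0 && agg.buckets.length === 0;
}

// Values copied from the "in" response above.
const problemCase = isSuspicious(18, { bucket_count: 0, buckets: [] });
```

A check like this could be run against each filter and field combination to find other occurrences of the problem.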
However, if you change the SQON to use a "not_in" operation, we get this correct response:
{
  "data": {
    "file": {
      "hits": {
        "total": 1642
      },
      "aggregations": {
        "donors__donor_id": {
          "bucket_count": 5,
          "buckets": [
            { "doc_count": 877, "key": "DO250472" },
            { "doc_count": 478, "key": "DO253000" },
            { "doc_count": 148, "key": "DO35085" },
            { "doc_count": 138, "key": "DO252999" },
            { "doc_count": 1, "key": "DO250391" }
          ]
        }
      }
    }
  }
}
Notice the totals add up (1660 = 18 + 1642), which tracks with the fact that the SQONs are not entirely broken 🤣
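The same arithmetic can be applied per bucket to estimate what the "in" aggregation should have returned. This is only a sketch under an assumption: that the "in" and "not_in" results split each donor's files between them, just as they split the totals. The bucket counts in both responses sum to their hit totals (1660 and 1642), which is consistent with that assumption but does not prove it. The helper name subtractCounts is made up for this example.

```typescript
type Counts = Record<string, number>;

// doc_count per donor, copied from the unfiltered response.
const unfiltered: Counts = {
  DO250472: 877,
  DO253000: 478,
  DO35085: 163,
  DO252999: 138,
  DO250326: 3,
  DO250391: 1,
};

// doc_count per donor, copied from the "not_in" response.
const notIn: Counts = {
  DO250472: 877,
  DO253000: 478,
  DO35085: 148,
  DO252999: 138,
  DO250391: 1,
};

// Per-key difference; keys whose count drops to zero are omitted.
function subtractCounts(all: Counts, excluded: Counts): Counts {
  const result: Counts = {};
  for (const [key, count] of Object.entries(all)) {
    const remaining = count - (excluded[key] ?? 0);
    if (remaining > 0) result[key] = remaining;
  }
  return result;
}

const expectedIn = subtractCounts(unfiltered, notIn);
console.log(expectedIn); // { DO35085: 15, DO250326: 3 }
```

Under that assumption, the "in" query would be expected to return 2 buckets whose counts add up to the 18 hits, instead of 0 buckets.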