Description
Running the command
bzcat latest-all.json.bz2 | wikibase-dump-filter --simplify --claim 'P698' | jq '[.id,.claims.P698,.claims.P921]' -c > PMID.ndjson
produces more than 30M lines like these:
["Q94880466",["19484558"],null]
["Q17485067",["21609473"],["Q18123741","Q12156","Q193430"]]
where the first line is an item with a P698 claim but no P921 claims, and the second has both P698 and P921 claims. However, out of these 30M lines, at least six are different:
ralf@ark:~/wikidata> grep '\[\]' PMID.ndjson
["Q30573040",["23057853"],[]]
["Q30523792",["22888462"],[]]
["Q48835971",[],null]
["Q50125628",[],null]
["Q58616403",[],null]
["Q31128925",["27613570"],[]]
Note that three have an empty P698 array (which should not happen given the --claim 'P698' filter), and three have [] instead of null where there is no P921.
I'm not claiming there is a bug in wikibase-dump-filter, only that this needs investigating, and this ticket is a start. But maybe you have seen this before and have an immediate explanation?
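To make the anomaly easier to reproduce and count, here is a minimal Python sketch that labels each [id, P698, P921] row by whether either claim array is empty. The function names classify and tally and the label strings are made up for illustration; the sample rows are the ones quoted above. It assumes every line is a three-element JSON array, as produced by the jq step.

```python
import json
from collections import Counter

# Rows quoted in this issue: two typical lines plus the six odd ones.
SAMPLE = [
    '["Q94880466",["19484558"],null]',
    '["Q17485067",["21609473"],["Q18123741","Q12156","Q193430"]]',
    '["Q30573040",["23057853"],[]]',
    '["Q30523792",["22888462"],[]]',
    '["Q48835971",[],null]',
    '["Q50125628",[],null]',
    '["Q58616403",[],null]',
    '["Q31128925",["27613570"],[]]',
]


def classify(line):
    """Label one [id, P698, P921] row (labels are illustrative only)."""
    _item_id, pmid, subjects = json.loads(line)
    if pmid is None:
        return "missing_P698"  # no P698 key at all
    if pmid == []:
        return "empty_P698"    # key present, but no simplified values
    if subjects == []:
        return "empty_P921"    # [] where other rows have null
    return "ok"


def tally(lines):
    """Count labels over an iterable of NDJSON lines, skipping blank lines."""
    return Counter(classify(line) for line in lines if line.strip())


for label, count in sorted(tally(SAMPLE).items()):
    print(label, count)
```

For the full output, passing an open file handle for PMID.ndjson to tally in place of SAMPLE should give the overall counts, since it reads the lines one at a time.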