Description
Describe the bug
The neo4j-upload CLI command fails to upload my KGX-formatted JSON Lines files and emits the following warnings:
[KGX][jsonl_source.py][ parse] WARNING: Parse function cannot resolve the KGX file type in name nodes-tiny.jsonl. Skipped...
[KGX][jsonl_source.py][ parse] WARNING: Parse function cannot resolve the KGX file type in name edges-tiny.jsonl. Skipped...
To Reproduce
You can reproduce this by running the following command, where nodes-tiny.jsonl and edges-tiny.jsonl are any KGX-formatted nodes/edges JSON Lines files, with Neo4j running on localhost.
kgx neo4j-upload --uri bolt://localhost:7687 --username neo4j --password [password] --input-format jsonl nodes-tiny.jsonl edges-tiny.jsonl
Expected behavior
I would expect that command to upload my files to Neo4j successfully.
Additional context
I eventually figured out that if I rename my nodes/edges files so that their names end with nodes.jsonl and edges.jsonl, the command completes successfully. In other words, this command works (it differs only in the file names):
kgx neo4j-upload --uri bolt://localhost:7687 --username neo4j --password [password] --input-format jsonl tiny-nodes.jsonl tiny-edges.jsonl
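Until this is resolved, a small script can compute names that follow the pattern that worked for me. This is only a sketch: kgx_friendly_name is a hypothetical helper (not part of kgx), and it assumes the only requirement is that the file name ends with nodes.jsonl or edges.jsonl, which is what I observed rather than anything documented.

```python
from pathlib import Path


def kgx_friendly_name(path: str) -> str:
    """Return a file name whose stem ends with 'nodes' or 'edges'.

    Hypothetical helper: assumes kgx only needs the name to end with
    nodes.<ext> or edges.<ext>, as observed in this issue.
    """
    p = Path(path)
    stem = p.stem
    for kind in ("nodes", "edges"):
        if stem.endswith(kind):
            # Already matches the pattern that worked; leave it alone.
            return path
        if kind in stem:
            # Move the first occurrence of the token to the end of the stem.
            rest = stem.replace(kind, "", 1).strip("-_")
            new_stem = f"{rest}-{kind}" if rest else kind
            return str(p.with_name(new_stem + p.suffix))
    # No 'nodes'/'edges' token found: nothing sensible to do.
    return path


if __name__ == "__main__":
    for name in ("nodes-tiny.jsonl", "edges-tiny.jsonl"):
        print(name, "->", kgx_friendly_name(name))
```

The returned name can then be passed to Path.rename (or mv) before running the upload command.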
I might have missed it, but I don't see this file naming requirement in the documentation. Could this requirement either be loosened (e.g., accept nodes/edges anywhere in the file name rather than only at the end) or be documented clearly somewhere?
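To make the "looser" option concrete, here is a rough sketch of the kind of check I have in mind. It is not taken from jsonl_source.py and I have not looked at how kgx actually resolves the file type; infer_kgx_file_type is a hypothetical name used only for illustration.

```python
from pathlib import Path
from typing import Optional


def infer_kgx_file_type(filename: str) -> Optional[str]:
    """Guess whether a file holds nodes or edges from its name.

    Hypothetical sketch of a looser rule: accept the token anywhere in
    the base name, and give up if the name is ambiguous or has neither.
    """
    name = Path(filename).name.lower()
    has_nodes = "nodes" in name
    has_edges = "edges" in name
    if has_nodes == has_edges:
        # Either both tokens or neither are present: cannot decide.
        return None
    return "nodes" if has_nodes else "edges"


if __name__ == "__main__":
    for name in ("nodes-tiny.jsonl", "tiny-edges.jsonl", "data.jsonl"):
        print(name, "->", infer_kgx_file_type(name))
```

With a rule like this, both nodes-tiny.jsonl and tiny-nodes.jsonl would be recognized, while a name containing neither token (or both) would still be skipped with a warning.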
(As a side note, I see that the KGX specification lists the file names as nodes.jsonl and edges.jsonl, but that exact naming doesn't appear to be expected in practice: examples in the kgx package documentation use other file names, such as test_nodes.jsonl.)