Using Python, I am downloading Google Drive files to a local server and caching them in a tmp folder on the server so that a failed run can restart from a checkpoint. I load each file into a Document object, which I then parse with semantic parsing. I also want to submit the Document object to a local nlm-ingestor server for processing. But if I submit the filename and the Document object, the request fails with a 404. I don't want to create a publicly available downloads folder on the nlm-ingestor server. Is there a way to submit the document object itself, rather than a URL, to [self.parse_pdf(pdf_file)] in [file_reader.py]?
The URL in the example does not match the routing rule in the server code. It should be [http://yourserverip/api/parsedocument?renderFormat=all]. The additional folders in the example URL are not part of the server's REST API routing path. The REST route is [api/parsedocument].
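For anyone who wants to send file bytes already held in memory instead of a URL, here is a minimal sketch of a multipart upload to that route, using only the standard library. The form field name `file` is an assumption about what the server reads from the request, and `build_multipart` and `post_document` are hypothetical helper names, not part of nlm-ingestor. Check the field name against the server's request handler before relying on it.

```python
import urllib.request
import uuid


def build_multipart(filename: str, data: bytes, field: str = "file",
                    mime: str = "application/pdf") -> tuple[bytes, str]:
    """Encode one in-memory file as a multipart/form-data body.

    Returns the request body and the matching Content-Type header value.
    The default field name "file" is an assumption, not confirmed here.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def post_document(server: str, filename: str, data: bytes) -> bytes:
    """POST the bytes to <server>/api/parsedocument and return the raw response."""
    url = f"{server}/api/parsedocument?renderFormat=all"
    body, content_type = build_multipart(filename, data)
    req = urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": content_type},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

With the cached file read into `pdf_bytes`, the call would look like `post_document("http://yourserverip", "cv.pdf", pdf_bytes)`. Nothing has to be exposed in a public downloads folder, because the bytes travel in the request body.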
The PDF rule parser looks for a style attribute, which did not exist in the Tika text extraction from the CV PDF documents I was using. It looks like there was an attempt to assign a default value when the style attribute was not found, which caused the document to flush with an opaque error [404 NOT FOUND]. I tried adding conditionals based on the style not being found, but the check propagates down through the code. So instead I added a single condition: if the style attribute is not found, report it to the console log and flush the document. The calling client API then switches to another parsing algorithm, which does work.
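The workaround described above might look roughly like the sketch below. It shows the general pattern only, not the actual nlm-ingestor code: `MissingStyleError`, `require_style`, and `parse_with_fallback` are hypothetical names, and the block is assumed to be a plain dict.

```python
import logging
from typing import Any, Callable, Iterable, Optional

logger = logging.getLogger("ingest")


class MissingStyleError(Exception):
    """Raised when an extracted block carries no style attribute."""


def require_style(block: dict) -> Any:
    """Return the block's style, or fail loudly instead of guessing a default."""
    style = block.get("style")
    if style is None:
        raise MissingStyleError("block has no 'style' attribute")
    return style


def parse_with_fallback(doc: Any,
                        parsers: Iterable[Callable[[Any], Any]]) -> Optional[Any]:
    """Try each parser in order; log the failure and move on to the next one."""
    for parser in parsers:
        try:
            return parser(doc)
        except Exception as exc:
            name = getattr(parser, "__name__", repr(parser))
            logger.warning("parser %s failed: %s", name, exc)
    return None  # every parser failed
```

Raising a named error at the point where the attribute is missing keeps the failure readable in the log, rather than letting it surface later as an unrelated HTTP status.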
BUG: The style parser bug needs to be fixed for the parser to work.
You may need to download the most recent jar file, 2.9.2_v2 (tika-server-standard-nlm-modified-2.9.2_v2.jar), or downgrade to 2.4.1v6. There was a big update to bring nlm-ingestor in line with Apache Tika's most recent releases, which also required modifications to Tika's jars. That update introduced bugs in the style parser in 2.9.2_v1, which may be fixed in v2.