Automating Data Extraction from Scientific Literature and General PDF Files Using Large Language Models and KNIME: An Application in Toxicology
Thank you for your interest in this data extraction workflow! If you are interested in running the workflow locally, either to make edits or just see how it works, follow the directions below to install it on your computer.
Before you can use the workflow, you will need to install its dependencies with Anaconda. If you don't have it already, download Miniconda3. To install the dependencies:
- Download the environment.yml file.
- Open the Anaconda/Miniconda terminal.
- Navigate to the folder containing environment.yml:
  cd /path/to/folder
- Run the following command:
  conda env create -f environment.yml
  Press y to confirm the install if prompted.
- Wait for the packages to install (this may take a while).
- GROBID Installation Guide
  - Requires Java:
    To build GROBID yourself, a JDK must be installed on your machine. We tested the tool successfully from JDK 11 up to JDK 17. Other recent JDK versions should work correctly.
  - Source Code Download:
    - Download from GitHub: grobid-0.8.0.zip
    - Unzip it to get the grobid-0.8.0 folder.
    - Place it in a folder without any spaces in the name.
  - Building and Running GROBID:
    - Navigate to the grobid-0.8.0 folder:
      cd grobid-0.8.0
    - Run the build command:
      ./gradlew clean install
    - Start the local server with the command:
      ./gradlew run
    - Check the server at: http://localhost:8070/
    - Important: Ensure the local server is started before executing the workflow.
  - Platform Note:
    - Windows-related issues: Windows, unfortunately, is currently not supported, due to lack of experience and time constraints.
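If you are unsure which JDK is on your machine, you can check it before building GROBID. The sketch below is only an illustration and is not part of the workflow: the function names are hypothetical, and it assumes that `java -version` prints a line containing `version "..."`, as common JDK builds do. It treats JDK 11 to 17 as the tested range, not as a hard limit, since other recent versions should also work.

```python
import re
import subprocess


def parse_java_major(version_output):
    """Return the major Java version found in `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first = int(match.group(1))
    second = match.group(2)
    # Legacy numbering: "1.8.0_381" means Java 8.
    if first == 1 and second is not None:
        return int(second)
    return first


def java_in_tested_range(version_output, lowest=11, highest=17):
    """True if the reported major version lies within the tested JDK range."""
    major = parse_java_major(version_output)
    return major is not None and lowest <= major <= highest


def installed_java_output():
    """Run `java -version` and return its text, or '' if java is not on PATH."""
    try:
        result = subprocess.run(
            ["java", "-version"], capture_output=True, text=True, check=False
        )
    except FileNotFoundError:
        return ""
    # `java -version` usually writes to stderr, so both streams are combined.
    return result.stderr + result.stdout


if __name__ == "__main__":
    output = installed_java_output()
    major = parse_java_major(output)
    if major is None:
        print("No JDK found on PATH.")
    elif java_in_tested_range(output):
        print(f"JDK {major} is within the tested range.")
    else:
        print(f"JDK {major} is outside the tested range; the build may still work.")
```

Run it from the same terminal you will use to build GROBID, so that it sees the same PATH.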
To set up the workflow on your computer:
- Download the workflow: Data Extraction Workflow.
- In KNIME Analytics Platform, select your Local Space.
- In your Local Space, select Import Workflow.
- Browse to the workflow file you downloaded; it will have the .knwf extension.
- KNIME may prompt you to install extensions; follow the on-screen instructions to do so. You may need to restart KNIME when done.
- Once KNIME restarts, open File > Preferences. In the left panel, navigate to KNIME > Conda. Click Browse or enter the path to your Anaconda/Miniconda installation (on Windows, this will often be in your AppData or User folder).
- Exit Preferences and run the workflow with your data!
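Because the workflow expects the local GROBID server to be running, a quick check before executing it can save a failed run. The sketch below is an illustration, not part of the workflow: the function names are hypothetical, and it assumes the server exposes the `/api/isalive` health endpoint described in the GROBID service documentation, answering with the text `true`. The default address is the one given in the GROBID steps above.

```python
import http.client
import urllib.error
import urllib.request


def isalive_url(base_url):
    """Build the health-check URL from the server's base address."""
    return base_url.rstrip("/") + "/api/isalive"


def grobid_is_alive(base_url="http://localhost:8070", timeout=5.0):
    """Return True only if the server answers the health check with 'true'."""
    try:
        with urllib.request.urlopen(isalive_url(base_url), timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace").strip()
            return response.status == 200 and body.lower() == "true"
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Connection refused, timeout, malformed URL, or an HTTP error status.
        return False


if __name__ == "__main__":
    if grobid_is_alive():
        print("GROBID server is running.")
    else:
        print("GROBID server is not reachable; start it with ./gradlew run.")
```

If the check fails, start the server from the grobid-0.8.0 folder and try again before running the workflow.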