NIEHS/Data_extraction_workflow

Automating Data Extraction from Scientific Literature and General PDF Files Using Large Language Models and KNIME: An Application in Toxicology

Thank you for your interest in this data extraction workflow! If you want to run the workflow locally, whether to make edits or just to see how it works, follow the directions below to install it on your computer.

Installation

Prerequisites - Python and packages

Before you can use the workflow, you will need to install its Python dependencies with conda, which ships with both Anaconda and Miniconda. If you don't have either already, download Miniconda3. To install the dependencies:

1. Download the environment.yml file from this repository.
2. Open the Anaconda/Miniconda terminal.
3. Navigate to the folder containing environment.yml using cd /path/to/folder (the directory that holds the file, not the file itself).
4. Run the following command: conda env create -f environment.yml. Press y to confirm the install if prompted.
5. Wait for the packages to install (this may take a while).
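If you want to confirm the environment was created before opening KNIME, the script below is a minimal sketch of such a check. It is a hypothetical helper, not part of the workflow: it assumes environment.yml has a top-level `name:` line (which `conda env create -f` uses to name the environment), reads that line with a plain line scan instead of a YAML library, and compares it with the environment prefixes reported by `conda env list --json`.

```python
"""Hypothetical check that the conda environment from environment.yml exists.

Assumes environment.yml declares a top-level `name:` entry and that the
`conda` executable is on PATH when the script runs.
"""
from __future__ import annotations

import json
import subprocess
import sys
from pathlib import Path


def env_name_from_yaml(text: str) -> str | None:
    """Return the value of the top-level `name:` key using a simple line scan."""
    for line in text.splitlines():
        # Only unindented lines are top-level keys in environment.yml.
        if line.startswith("name:"):
            return line.split(":", 1)[1].strip().strip("'\"") or None
    return None


def env_exists(name: str, env_paths: list[str]) -> bool:
    """True if any environment prefix reported by conda ends in `name`."""
    return any(Path(p).name == name for p in env_paths)


def conda_env_paths() -> list[str]:
    """Ask conda for every known environment prefix."""
    out = subprocess.run(
        ["conda", "env", "list", "--json"],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)["envs"]


def main(yml_path: str = "environment.yml") -> None:
    yml = Path(yml_path)
    if not yml.is_file():
        print(f"{yml} not found; run this from the folder that contains it.")
        return
    name = env_name_from_yaml(yml.read_text())
    if name is None:
        print(f"No top-level 'name:' found in {yml}.")
        return
    try:
        envs = conda_env_paths()
    except (OSError, subprocess.CalledProcessError):
        print("Could not run `conda env list --json`; is conda on PATH?")
        return
    if env_exists(name, envs):
        print(f"Environment '{name}' is installed.")
    else:
        print(f"Environment '{name}' not found; rerun: conda env create -f {yml}")


if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else "environment.yml")
```

Run it from the folder containing environment.yml (the script's file name is up to you). It only prints a status message; it does not modify any environment.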

Prerequisites - GROBID

GROBID Installation Guide
- Requires Java:
  Building GROBID yourself requires a JDK installed on your machine. GROBID has been tested successfully with JDK 11 up to JDK 17, and other recent JDK versions should work as well.
- Source Code Download:
  - Download grobid-0.8.0.zip from GitHub
  - Unzip it to obtain the grobid-0.8.0 folder
  - Place that folder in a location whose path contains no spaces
- Building and Running GROBID:
  1. Navigate to the grobid-0.8.0 folder:
    - cd grobid-0.8.0
  2. Run the build command:
    - ./gradlew clean install
  3. Start the local server with the command:
    - ./gradlew run
    - Check that the server is running by opening http://localhost:8070/ in a browser.
  4. Important: make sure the local server is running before you execute the workflow.
- Platform Note:
  - Windows-related issues:
    GROBID does not currently support Windows (its developers cite lack of experience and time constraints).
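The GROBID prerequisites above can be checked from a script before running the workflow. This is a hedged sketch, not part of GROBID or the workflow: it assumes `java` is on PATH, that GROBID listens on the default http://localhost:8070, and that the server exposes GROBID's `/api/isalive` endpoint, which answers `true` when the service is ready. The helper names (`java_major_version`, `has_spaces`, `grobid_is_alive`) are illustrative.

```python
"""Hypothetical preflight checks for the GROBID prerequisites.

Assumptions (not part of GROBID or the workflow): `java` is on PATH, the
GROBID folder is passed as the first argument, and the server listens on
the default http://localhost:8070.
"""
from __future__ import annotations

import os
import re
import subprocess
import sys
import urllib.request


def java_major_version(version_output: str) -> int | None:
    """Parse the major version from `java -version` output.

    Handles both the legacy "1.8.0_381" scheme and the modern "17.0.9" scheme.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if not match:
        return None
    major, minor = match.group(1), match.group(2)
    # Java 8 and earlier report "1.x"; the real major version is x.
    if major == "1" and minor is not None:
        return int(minor)
    return int(major)


def has_spaces(path: str) -> bool:
    """GROBID should sit in a folder whose full path contains no spaces."""
    return " " in path


def grobid_is_alive(base_url: str = "http://localhost:8070", timeout: float = 5.0) -> bool:
    """Return True if GROBID's /api/isalive endpoint answers "true"."""
    url = base_url.rstrip("/") + "/api/isalive"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200 and resp.read().decode().strip().lower() == "true"
    except OSError:
        # URLError, HTTPError, refused connections and timeouts are all OSError subclasses.
        return False


def main(grobid_dir: str = "grobid-0.8.0") -> None:
    try:
        # `java -version` writes its banner to stderr, not stdout.
        proc = subprocess.run(["java", "-version"], capture_output=True, text=True)
        major = java_major_version(proc.stderr + proc.stdout)
    except OSError:
        major = None
    if major is None:
        print("Could not detect Java; install a JDK (tested: 11 to 17).")
    elif major < 11:
        print(f"Java {major} found; GROBID was tested with JDK 11 to 17.")
    else:
        print(f"Java {major} found.")
    if has_spaces(os.path.abspath(grobid_dir)):
        print(f"Warning: the path to '{grobid_dir}' contains spaces; move GROBID elsewhere.")
    if grobid_is_alive():
        print("GROBID server is up; you can run the workflow.")
    else:
        print("GROBID is not responding; start it with ./gradlew run first.")


if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else "grobid-0.8.0")
```

Pass the GROBID folder as the first argument. The script only prints warnings; it does not build or start GROBID.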

Running the workflow

To set up the workflow on your computer:

1. Download the workflow file, Data Extraction Workflow - 58 - Publication.knwf, from this repository.
2. In KNIME Analytics Platform, select your Local Space.
3. In your Local Space, select Import Workflow.
4. Browse to the workflow file you downloaded; it has the .knwf extension.
5. KNIME may prompt you to install extensions; follow the on-screen instructions to do so. You may need to restart KNIME afterwards.
6. Once KNIME restarts, open File > Preferences. In the left panel, navigate to KNIME > Conda. Click Browse or enter the path to your Anaconda/Miniconda installation (on Windows, this is often in your AppData or User folder).
7. Exit Preferences and run the workflow with your data!
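For the Preferences step, the following sketch prints the installation path to enter under KNIME > Conda. It is a hypothetical helper, not part of KNIME or the workflow: it assumes the `CONDA_EXE` environment variable is set (as it usually is in an activated Anaconda/Miniconda terminal) and otherwise falls back to `conda info --base`, which prints the base installation directory.

```python
"""Hypothetical helper to find the path to enter under KNIME > Conda.

Assumes either CONDA_EXE is set (usual in an Anaconda/Miniconda terminal)
or `conda` is on PATH.
"""
from __future__ import annotations

import os
import subprocess
from pathlib import PurePosixPath, PureWindowsPath


def base_from_conda_exe(conda_exe: str) -> str:
    """Derive the installation root from the conda executable's path.

    conda usually lives in <root>/bin/conda on macOS/Linux and in
    <root>\\Scripts\\conda.exe on Windows, so the root is two levels up.
    """
    # Choose Windows or POSIX path rules based on the separator in use.
    pure = PureWindowsPath(conda_exe) if "\\" in conda_exe else PurePosixPath(conda_exe)
    return str(pure.parent.parent)


def conda_base_dir() -> str:
    """Prefer CONDA_EXE; fall back to `conda info --base`."""
    exe = os.environ.get("CONDA_EXE")
    if exe:
        return base_from_conda_exe(exe)
    out = subprocess.run(
        ["conda", "info", "--base"], capture_output=True, text=True, check=True
    )
    return out.stdout.strip()


def main() -> None:
    try:
        base = conda_base_dir()
    except (OSError, subprocess.CalledProcessError):
        print("Could not locate conda; open an Anaconda/Miniconda terminal and retry.")
        return
    print("Enter this path under File > Preferences > KNIME > Conda:")
    print(base)


if __name__ == "__main__":
    main()
```

Deriving the root from `CONDA_EXE` avoids launching conda at all; the fallback covers shells where conda is on PATH but `CONDA_EXE` is not set.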
