PEMT: A tool for extracting patent literature in drug discovery

DISCLAIMER: SureChEMBL is currently undergoing database restructuring and maintenance, so the tool might not function during this period. The tool will be updated once a stable version of the database is ready.

General Info

PEMT is a patent extractor tool that enables users to retrieve patents relevant to drug discovery. The overall workflow of the tool can be seen in the figure below:

Installation

The latest release can be installed from PyPI with:

$ pip install pemt

The most recent code can be installed from the source on GitHub with:

$ pip install git+https://github.com/Fraunhofer-ITMP/PEMT.git

Alternatively, developers can clone the repository from GitHub and install the tool in editable mode, here inside a dedicated conda environment:

$ git clone https://github.com/Fraunhofer-ITMP/PEMT.git
$ conda create --name pemt python=3.8
$ conda activate pemt
$ cd PEMT
$ pip install -e .

The conda steps are optional. Without them, the editable installation reduces to:

$ git clone https://github.com/Fraunhofer-ITMP/PEMT.git
$ cd PEMT
$ pip install -e .
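
To confirm that one of the installation routes above succeeded, the installed version can be queried from Python. This is a minimal sketch using only the standard library; the helper name installed_version is hypothetical and not part of PEMT, and the distribution name "pemt" is taken from the pip command above.

```python
from importlib import metadata


def installed_version(package="pemt"):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
if version is None:
    print("pemt is not installed in this environment")
else:
    print(f"pemt {version} is installed")
```

If the second message is printed inside the conda environment but not outside it, the package was installed into that environment only, which is usually the intended outcome.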

Documentation

Read the official docs for more information.

Input Data Formats

Data

For running PEMT from the gene level, you need the input file with the following structure:

symbol	uniprot
HGNC_Symbol_1	Uniprot_ID_1
HGNC_Symbol_2	Uniprot_ID_2
HGNC_Symbol_3	Uniprot_ID_3

For running PEMT from the chemical level, you need the input file with the following structure:

chembl
ChEMBL_ID_1
ChEMBL_ID_2
ChEMBL_ID_3

Note: The data must be in a comma- or tab-separated file format. If the file does not follow the exact structure shown above, it should contain at least one of the columns shown above.
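
As a rough illustration of the two layouts above, the following Python sketch checks which kind of input a file header corresponds to. It is not part of PEMT: the helper names read_header and input_level are hypothetical, and only the column names symbol, uniprot, and chembl come from the formats shown above.

```python
import csv
import io

# Column names taken from the input formats shown above.
GENE_COLUMNS = {"symbol", "uniprot"}
CHEMICAL_COLUMNS = {"chembl"}


def read_header(text, delimiter="\t"):
    """Return the set of column names found on the first row of delimited text."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return {name.strip() for name in next(reader, [])}


def input_level(text, delimiter="\t"):
    """Guess whether the text is gene-level or chemical-level input."""
    header = read_header(text, delimiter)
    if header & CHEMICAL_COLUMNS:
        return "chemical"
    if header & GENE_COLUMNS:
        return "gene"
    raise ValueError(f"No expected column found in header: {sorted(header)}")


# Placeholder rows mirroring the examples above.
gene_tsv = "symbol\tuniprot\nHGNC_Symbol_1\tUniprot_ID_1\n"
chemical_csv = "chembl\nChEMBL_ID_1\nChEMBL_ID_2\n"

print(input_level(gene_tsv, delimiter="\t"))
print(input_level(chemical_csv, delimiter=","))
```

Running a check like this before starting an analysis catches a wrong separator or a misspelled column name early, before any database queries are made.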

Usage

In order to use PEMT, an installation of chromedriver is required.

As mentioned above, the tool has a two-step approach. Each of these steps can be run individually as well as together, as shown below:
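
One way to check whether chromedriver is already available is to look it up on the PATH from Python. This is only a sketch: the helper name find_chromedriver is hypothetical, and it assumes the executable is called "chromedriver". PEMT itself receives the location through the --chromedriver-path option, so a driver stored outside the PATH works as well.

```python
import shutil


def find_chromedriver(name="chromedriver"):
    """Return the full path of the executable if it is on the PATH, else None."""
    return shutil.which(name)


path = find_chromedriver()
if path is None:
    print("chromedriver was not found on the PATH; pass its location explicitly")
else:
    print(f"chromedriver found at {path}")
```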

Chemical enrichment

The following command links chemicals to genes of interest based on causality. The --uniprot or --no-uniprot parameter must be given to indicate whether the file contains UniProt IDs.

$ pemt run-chemical-extractor --name=<ANALYSIS NAME> --data=<DATA FILE PATH> --input-type=<DATA FILE SEPARATOR> --uniprot
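
The choice between --uniprot and --no-uniprot can be derived from the header of the input file. The sketch below assembles the command shown above without running it. The helper names and the example values ("demo", "genes.tsv", "tsv") are hypothetical; in particular, the accepted values for --input-type are not listed here, so "tsv" is only a placeholder.

```python
import csv
import shlex


def has_uniprot_column(header_line, delimiter="\t"):
    """Check whether the header row contains a 'uniprot' column."""
    columns = next(csv.reader([header_line], delimiter=delimiter))
    return "uniprot" in [column.strip() for column in columns]


def chemical_extractor_command(name, data_path, input_type, uniprot):
    """Assemble the run-chemical-extractor call from the flags shown above."""
    return [
        "pemt",
        "run-chemical-extractor",
        f"--name={name}",
        f"--data={data_path}",
        f"--input-type={input_type}",
        "--uniprot" if uniprot else "--no-uniprot",
    ]


# Hypothetical values; "tsv" is an assumed placeholder for --input-type.
uniprot = has_uniprot_column("symbol\tuniprot")
command = chemical_extractor_command("demo", "genes.tsv", "tsv", uniprot)
print(shlex.join(command))
```

The resulting list can be passed to subprocess.run once the placeholder values have been replaced with real ones.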

Patent enrichment

The following command links chemicals to publicly available patent literature.

$ pemt run-patent-extractor --name=<ANALYSIS NAME> --chromedriver-path=<PATH TO CHROMEDRIVER> --os=<OS NAME> --no-chemical

The pipeline can also be started from this step if the user has a list of chemicals in the format indicated above. In that case, the user has to set the --chemical flag and provide the corresponding --chemical-data path.
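
The two ways of calling the patent extractor can be summarised in a small helper that switches between --no-chemical and --chemical with --chemical-data. This is a sketch only: the function name and the example values are hypothetical, the "--chemical-data=" spelling with an equals sign is assumed by analogy with the other options, and the accepted values for --os are not listed here.

```python
import shlex


def patent_extractor_command(name, chromedriver_path, os_name, chemical_data=None):
    """Assemble the run-patent-extractor call for either starting point."""
    args = [
        "pemt",
        "run-patent-extractor",
        f"--name={name}",
        f"--chromedriver-path={chromedriver_path}",
        f"--os={os_name}",
    ]
    if chemical_data is None:
        # Continue from the output of the chemical enrichment step.
        args.append("--no-chemical")
    else:
        # Start from a user-supplied list of ChEMBL identifiers.
        args.extend(["--chemical", f"--chemical-data={chemical_data}"])
    return args


# Hypothetical values; "linux" is an assumed placeholder for --os.
chained = patent_extractor_command("demo", "/path/to/chromedriver", "linux")
standalone = patent_extractor_command(
    "demo", "/path/to/chromedriver", "linux", chemical_data="chemicals.tsv"
)
print(shlex.join(chained))
print(shlex.join(standalone))
```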

PEMT workflow

The following command runs the full patent enrichment on gene data, where the gene data file is a TSV file containing UniProt identifiers.

$ pemt run-pemt --name=<ANALYSIS NAME> --data=<DATA FILE PATH> --input-type=<DATA FILE SEPARATOR> --chromedriver-path=<PATH TO CHROMEDRIVER> --os=<OS NAME>

Issues

If you have difficulties using PEMT, please open an issue at our GitHub repository.

Citation

If you have found PEMT useful in your work, please consider citing:

Yojana Gadiya, Andrea Zaliani, Philip Gribbon, Martin Hofmann-Apitius, PEMT: a patent enrichment tool for drug discovery, Bioinformatics, 2022, btac716, https://doi.org/10.1093/bioinformatics/btac716

Disclaimer

PEMT is a scientific tool that has been developed in an academic capacity, and thus comes with no warranty or guarantee of maintenance, support, or back-up of data.

Funding

This project has been funded by EOSC-Life, which has received funding from the European Union's Horizon 2020 programme under grant agreement number 824087.
