Initial Commit: July 2016
***** Cavatica has been adopted by the incertae-sedis group. *****
Code and pipeline for fetching PubMed and PubMed Central data and co-author network analysis. This tool can be used to identify author trends among several search terms.
As an example, I've used these scripts to do a multi-network analysis of network analysis papers and their software (see the Wiki page).
The name comes from Charlotte's Web, since Charlotte's full name was Charlotte A. Cavatica. Cavatica also refers to the barn spider.
***** The Cavatica pipeline has been modified so it no longer relies on Ebot. *****
- A Linux-style terminal that can run Bash (Cygwin if you're on Windows; Terminal is already preinstalled on Mac)
- R (check if installed by typing Rscript --version)
- Perl (check if installed by typing perl --version)
- Mango Graph Studio for multi-network analysis
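To confirm the command-line prerequisites in one step before running the pipeline, a small Bash helper like the one below can be used. It is a sketch, not part of the repository (the function name check_deps is hypothetical), and it only checks that the listed commands are on your PATH; Mango Graph Studio is a desktop application and is not checked.

```shell
#!/usr/bin/env bash
# check_deps CMD...: hypothetical helper, not part of Cavatica.
# Prints "missing: CMD" to stderr for every command not found on PATH
# and returns non-zero if at least one is missing.
check_deps() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage: the command-line tools listed above.
# check_deps bash Rscript perl && echo "all prerequisites found"
```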
git clone https://github.com/incertae-sedis/cavatica.git
Here is a basic example fetching PubMed and PMC papers containing the terms "Neo4j" and "Cytoscape".
cd cavatica/data
mkdir test
cd test
echo "Neo4j" > config.txt
echo "Cytoscape" >> config.txt
../../code/script.sh
This will create tabular files: a list of papers (for example, Neo4j_papers_pm.tsv) and a list of authors (for example, Neo4j_authors_pm.tsv). Open the PNG files (for example, Neo4j_pm.png) to see a bar chart of the number of papers by year.
You can also open the HTML files to check the one-sentence usages of Neo4j and Cytoscape.
Example rows (year, PubMed ID, title) from the Neo4j papers list:
2018 29377902 Reactome graph database: Efficient access to complex pathway data.
2018 28936969 Systematic integration of biomedical knowledge prioritizes drugs for repurposing.
2017 28416946 Use of Graph Database for the Integration of Heterogeneous Biological Data.
...
Example rows from the Cytoscape papers list:
2018 29894068 Identification of potential miRNAs and candidate genes of cervical intraepithelial neoplasia by bioinformatic analysis.
2018 29872319 An integrated analysis of key microRNAs, regulatory pathways and clinical relevance in bladder cancer.
2018 29760609 Identification of potential crucial genes and construction of microRNA-mRNA negative regulatory networks in osteosarcoma.
...
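The bar chart is essentially a per-year count of rows in a papers table. To get the same numbers as text, a sketch like the following should work, assuming the .tsv files are tab-separated with the publication year in the first column, as in the example rows above. The function name papers_per_year is hypothetical.

```shell
#!/usr/bin/env bash
# papers_per_year FILE: hypothetical helper, not part of Cavatica.
# ASSUMPTION: FILE is tab-separated and column 1 holds the year.
# Lines whose first field is not a four-digit year (e.g. a header) are skipped.
# Output: "year<TAB>count", sorted by year.
papers_per_year() {
  awk -F'\t' '$1 ~ /^[0-9][0-9][0-9][0-9]$/ { n[$1]++ }
              END { for (y in n) print y "\t" n[y] }' "$1" | sort -n
}

# Usage:
# papers_per_year Neo4j_papers_pm.tsv
```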
It will also create a script, pubmed.gel. Open Mango Graph Studio, open pubmed.gel, and type the following into the Mango Console:
run "pubmed.gel";
This will create a transition table and export the file. It will also load and visualize the author-paper networks.
(Figure: author-paper networks for Neo4j and Cytoscape.)
Then rerun the script:
../../code/script.sh
The transitions should be saved in trends_pm.txt. The following trends_pm.txt indicates that authors switched from Cytoscape to Neo4j 9 times, while authors switched from Neo4j to Cytoscape 3 times.
Cytoscape:Neo4j 9
Neo4j:Cytoscape 3
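Cavatica builds the transition table inside Mango through the generated .gel script. Purely as an illustration of the idea, here is one plausible way to count such switches in Bash and awk. Both the input format (a hypothetical author<TAB>year<TAB>term table) and the counting rule (order each author's papers by year and count every change of term between consecutive papers as one From:To transition) are assumptions, not necessarily how Cavatica counts.

```shell
#!/usr/bin/env bash
# count_transitions FILE: hypothetical illustration, not Cavatica's own code.
# ASSUMED input: tab-separated lines "author<TAB>year<TAB>term".
# For each author, papers are ordered by year (same-year ties are ordered
# by sort's whole-line fallback), and each change of term between
# consecutive papers counts as one "From:To" transition.
# Output mirrors the trends file layout: "From:To count".
count_transitions() {
  LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k2,2n "$1" |
    awk -F'\t' '
      $1 != author { author = $1; prev = "" }   # new author: reset history
      {
        if (prev != "" && prev != $3) n[prev ":" $3]++
        prev = $3
      }
      END { for (t in n) print t, n[t] }' |
    LC_ALL=C sort
}

# Usage:
# count_transitions author_year_term.tsv
```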
The script will then search PMC, fetch the lists of papers and authors, and generate a pmc.gel file. As before, open pmc.gel in Mango and type the following into the Mango Console:
run "pmc.gel";
Then rerun the script to continue tabulating the trends, which should be saved in trends_pmc.txt.
The output of a 2017 run comparing "Neo4j", "Gephi", "GraphViz" and "iGraph" is shown below:
=============PubMed Transitions
Neo4j:Gephi 1
Neo4j:GraphViz 1
Neo4j:iGraph 1
=============PubMed Central Transitions
Gephi:GraphViz 2
Gephi:Neo4j 3
Gephi:iGraph 31
GraphViz:Gephi 19
GraphViz:Neo4j 10
GraphViz:iGraph 58
Neo4j:Gephi 4
Neo4j:GraphViz 4
Neo4j:iGraph 1
iGraph:Gephi 34
iGraph:GraphViz 9
iGraph:Neo4j 13
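Because each pair of terms can appear in both directions, it can help to reduce a trends table to the net direction of switching. The sketch below assumes only the "From:To count" layout shown above; net_flow is a hypothetical helper, not part of Cavatica. Section header lines such as =============PubMed Central Transitions are skipped, so in practice you would run it on one section at a time.

```shell
#!/usr/bin/env bash
# net_flow FILE: hypothetical helper, not part of Cavatica.
# Reads lines of the form "From:To count" and, for each pair of terms,
# prints the dominant direction with count(A:B) - count(B:A).
# Lines that do not look like "X:Y N" (e.g. "=====" headers) are ignored.
net_flow() {
  awk '
    NF == 2 && $1 ~ /^[^:]+:[^:]+$/ && $2 ~ /^[0-9]+$/ {
      split($1, t, ":")
      n[t[1], t[2]] += $2
      # remember each unordered pair once, in a canonical order
      pair[(t[1] < t[2]) ? t[1] SUBSEP t[2] : t[2] SUBSEP t[1]] = 1
    }
    END {
      for (p in pair) {
        split(p, t, SUBSEP)
        d = n[t[1], t[2]] - n[t[2], t[1]]
        if (d > 0)      print t[1] " -> " t[2] " net " d
        else if (d < 0) print t[2] " -> " t[1] " net " (-d)
        else            print t[1] " <-> " t[2] " balanced"
      }
    }' "$1" | LC_ALL=C sort
}

# Usage:
# net_flow trends_pmc.txt
```

For example, the PMC lines Gephi:iGraph 31 and iGraph:Gephi 34 above would reduce to "iGraph -> Gephi net 3".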
PMC searches usually return more papers because search terms like "Neo4j" or "Cytoscape" are matched against the full text instead of just the title and abstract. This may yield more accurate trend tables, since software names are sometimes mentioned only in the methods section and not in the abstract.
This repo provides containers for easily reproducing and running Cavatica. The pipeline was run with both Singularity and Docker on an Ubuntu 18.04 instance on Jetstream, a national science and engineering cloud led by the Indiana University Pervasive Technology Institute.
A Singularity container of Cavatica is available on Singularity Hub. Using Singularity, you can download the container with the following command:
singularity pull shub://TeamMango/cavatica:latest
When run, the container will look for a text file called config.txt in a directory called output, located in the same directory as the .simg image you just downloaded. Place the terms that you want Cavatica to search for in this file. In Ubuntu, you can use the following commands to create this file:
mkdir output
echo "YOURSEARCHTERM" > ./output/config.txt
Your search terms can also be followed by a year range, separated by commas:
echo "YOURSEARCHTERM,1996,2006" > ./output/config.txt
Each search term and year range should occupy its own line. If you want to search for VisANT between 1999 and 2006 and for Cytoscape between 1994 and 2003, config.txt would look like this:
visant,1999,2006
cytoscape,1994,2003
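Since a malformed line is easy to miss before a long run, a quick sanity check of config.txt can save time. This is a hypothetical helper (check_config is not part of Cavatica) that accepts only the two line forms described above, "term" and "term,start,end" with start no later than end, and reports everything else, including blank lines.

```shell
#!/usr/bin/env bash
# check_config FILE: hypothetical pre-flight check, not part of Cavatica.
# Accepted forms: "term" or "term,start,end" (numeric years, start <= end).
# Prints each other line (including blank ones) and returns non-zero if any.
check_config() {
  awk -F',' '
    NF == 1 { next }                                      # term only
    NF == 3 && $2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ && $2 + 0 <= $3 + 0 { next }
    { printf "line %d looks malformed: %s\n", NR, $0; bad = 1 }
    END { exit bad }' "$1"
}

# Usage:
# check_config ./output/config.txt && echo "config.txt looks fine"
```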
Once you have entered the terms in the config.txt file, return to the same directory as the .simg image and run the following command:
singularity run --bind output:/cavatica/data/output TeamMango-cavatica-master-latest.simg
The results of the search will appear in the output directory next to your config.txt file.
A docker container of Cavatica is available on Docker Hub. You can pull the docker container with the following command:
docker pull incertaesedis/cavatica
To run the Docker container, move into the directory where you want Cavatica to write its output. Create three files called multitool-pubmed.tsv, multitool-pmc.tsv, and config.txt. In Ubuntu you can do this with the following command:
touch multitool-pubmed.tsv multitool-pmc.tsv config.txt
All three files must be present in the directory where you run the container. In config.txt, enter the search terms that you want Cavatica to search for, with each term on a new line. Optional year ranges can be indicated with commas:
visant,1999,2006
cytoscape,1994,2003
In the same directory as config.txt, run the docker container:
docker run -v "${PWD}":/cavatica/data/output incertaesedis/cavatica
If on Windows, ${PWD} should be replaced with the absolute path to your current directory. The files produced by Cavatica should appear in that directory once the container has run. If you wish to rerun the search with different terms, make sure that the multitool-pubmed.tsv and multitool-pmc.tsv files are still in the folder.
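Because the Docker image expects all three files to exist, a small helper can create any that are missing without disturbing existing results. This is a hypothetical convenience function, not part of the image. It relies on the fact that touch creates an empty file when none exists but never truncates an existing one, so earlier multitool-*.tsv files and your config.txt are preserved.

```shell
#!/usr/bin/env bash
# prepare_cavatica_dir [DIR]: hypothetical helper, not part of the image.
# Ensures multitool-pubmed.tsv, multitool-pmc.tsv and config.txt exist in
# DIR (default: current directory). `touch` only creates missing files and
# does not truncate existing ones. Warns if config.txt is still empty.
prepare_cavatica_dir() {
  local dir="${1:-.}" f
  for f in multitool-pubmed.tsv multitool-pmc.tsv config.txt; do
    touch "$dir/$f" || return 1
  done
  if [ ! -s "$dir/config.txt" ]; then
    echo "warning: $dir/config.txt is empty; add one search term per line" >&2
  fi
}

# Usage (docker command from the instructions above):
# prepare_cavatica_dir . && docker run -v "${PWD}":/cavatica/data/output incertaesedis/cavatica
```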
Accomplishments and opportunities of reproducing and containerizing this project
- J. Chang and H. Chou, "Cavatica: A pipeline for identifying author adoption trends among software or methods," 2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Kansas City, MO, USA, 2017, pp. 2145-2150.