support sciencedirect suppl
Miachol committed Jul 30, 2020
1 parent 6646591 commit 5a855b3
Showing 6 changed files with 160 additions and 33 deletions.
30 changes: 27 additions & 3 deletions README.md
@@ -23,6 +23,30 @@ For website spider (optional):

- Headless Chrome is required for some websites whose pages are rendered by JavaScript. Windows users may need to create an alias for Chrome to make [chromedp](https://github.com/chromedp/chromedp) work.

```bash
# To resolve `[FATA] exec: "google-chrome": executable file not found in $PATH` error:
# option 1: install Chrome in your OS
## centos
sudo yum install liberation-fonts
sudo yum -y install libXss*
sudo yum install libappindicator*
wget https://dl.google.com/linux/direct/google-chrome-stable_current_x86_64.rpm
sudo rpm -ivh google-chrome-stable_current_x86_64.rpm

## ubuntu
wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
sudo dpkg -i google-chrome-stable_current_amd64.deb
sudo apt install -f

# option 2: run bget in the headless-shell docker container
docker run -d -p 9222:9222 --rm --name headless-shell -v /path_contains_bget/:/tmp/bget chromedp/headless-shell
docker exec -it headless-shell /bin/bash

# increase the timeout for poor network access
bget doi 10.1016/j.devcel.2017.03.001 --suppl --timeout 100
```
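
The `executable file not found in $PATH` error mentioned above can be checked for before bget is run at all. The following is a minimal preflight sketch using only the Go standard library; the `findBrowser` helper and the list of candidate executable names are assumptions made for illustration and are not taken from bget or chromedp.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

// candidateNames is an illustrative list of executable names under which
// Chrome or Chromium is commonly installed. It is an assumption, not the
// list that bget or chromedp actually searches.
var candidateNames = []string{
	"google-chrome",
	"google-chrome-stable",
	"chromium",
	"chromium-browser",
	"headless-shell",
}

// findBrowser (hypothetical helper) returns the path of the first candidate
// that exec.LookPath can resolve in $PATH, or an error if none resolves.
func findBrowser(names []string) (string, error) {
	for _, name := range names {
		if path, err := exec.LookPath(name); err == nil {
			return path, nil
		}
	}
	return "", errors.New("no Chrome executable found in $PATH")
}

func main() {
	path, err := findBrowser(candidateNames)
	if err != nil {
		fmt.Println("preflight failed:", err)
		return
	}
	fmt.Println("browser found at:", path)
}
```

If the check fails, either install option from the README block applies: install Chrome locally, or run bget inside the `chromedp/headless-shell` container.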


For raw sequencing data query (optional):

- [sra-tools](https://github.com/ncbi/sra-tools) for SRA and dbGAP database: `bget i sratools`;
@@ -33,15 +57,15 @@ For raw sequencing data query (optional):

```bash
# windows
-wget https://github.com/openanno/bget/releases/download/v0.3.0/bget.exe
+wget https://github.com/openanno/bget/releases/download/v0.3.1/bget.exe

# osx
-wget https://github.com/openanno/bget/releases/download/v0.3.0/bget_osx
+wget https://github.com/openanno/bget/releases/download/v0.3.1/bget_osx
mv bget_osx bget
chmod a+x bget

# linux
-wget https://github.com/openanno/bget/releases/download/v0.3.0/bget_linux64
+wget https://github.com/openanno/bget/releases/download/v0.3.1/bget_linux64
mv bget_linux64 bget
chmod a+x bget

4 changes: 2 additions & 2 deletions chrome/doi.go
@@ -32,7 +32,7 @@ func DoiSupplURLs(url string, timeout time.Duration, proxy string) []string {
//err := cdp.Run(ctx, visibleNejm("https://www.nejm.org/doi/full/10.1056/NEJMoa1902226", &attbs))
if strings.Contains(url, "www.nejm.org") {
err = cdp.Run(ctx, visibleNejm(url, &attbs))
-} else if stringo.StrDetect(url, "sciencedirect.com|/10.1016/") {
+} else if stringo.StrDetect(url, "sciencedirect.com|/10.1016/|www.cell.com") {
err = cdp.Run(ctx, visibleScienceDirect(url, &attbs))
} else if strings.Contains(url, "www.ncbi.nlm.nih.gov/Traces/study") {
err = cdp.Run(ctx, visibleSraRunSelect(url, &attbs, ctx))
@@ -177,5 +177,5 @@ func visibleDownloadTask(url string, ctx context.Context) cdp.Tasks {
}

//func main() {
-//GetURLFile("https://linkinghub.elsevier.com/retrieve/pii/S2215036619303943", 145*time.Second, "http://lee_jianfeng:[email protected]:8000")
+//DoiSupplURLs("https://www.sciencedirect.com/science/article/pii/S1934590919303078?via=ihub", 145*time.Second)
//}
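
The change to `DoiSupplURLs` routes `www.cell.com` URLs to the same task as ScienceDirect pages. The sketch below reproduces that dispatch with the standard library only. It assumes that `stringo.StrDetect` treats its second argument as a regular expression, so that `|` acts as alternation; `pickHandler` is a hypothetical helper that returns the task name instead of running chromedp, the `"unknown"` fallback is invented for illustration because the diff does not show the remaining branches, and the `www.cell.com` URL in `main` is a made-up example.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// supplPattern is the alternation string from the updated branch in the diff.
// Assumption: stringo.StrDetect interprets it as a regular expression.
const supplPattern = `sciencedirect.com|/10.1016/|www.cell.com`

var supplRe = regexp.MustCompile(supplPattern)

// pickHandler (hypothetical) returns the name of the task that the branch
// order shown in the diff would select for a given URL.
func pickHandler(url string) string {
	switch {
	case strings.Contains(url, "www.nejm.org"):
		return "visibleNejm"
	case supplRe.MatchString(url):
		return "visibleScienceDirect"
	case strings.Contains(url, "www.ncbi.nlm.nih.gov/Traces/study"):
		return "visibleSraRunSelect"
	default:
		// Not shown in the diff; placeholder for the remaining branches.
		return "unknown"
	}
}

func main() {
	urls := []string{
		"https://www.nejm.org/doi/full/10.1056/NEJMoa1902226",
		"https://www.sciencedirect.com/science/article/pii/S1934590919303078?via=ihub",
		"https://doi.org/10.1016/j.devcel.2017.03.001",
		"https://www.cell.com/example", // made-up URL for illustration
	}
	for _, u := range urls {
		fmt.Printf("%-22s %s\n", pickHandler(u), u)
	}
}
```

If the pattern is indeed compiled as a regular expression, the unescaped dots match any character, so the match is slightly looser than a literal host comparison; `regexp.QuoteMeta` on each alternative would tighten it.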
1 change: 1 addition & 0 deletions go.mod
@@ -21,6 +21,7 @@ require (
github.com/openbiox/ligo v0.0.0-20200607024921-dd2356ca56a1
github.com/sirupsen/logrus v1.6.0
github.com/spf13/cobra v1.0.0
+github.com/tebeka/selenium v0.9.9 // indirect
github.com/tidwall/pretty v1.0.1
github.com/vbauerster/mpb/v5 v5.2.2
golang.org/x/crypto v0.0.0-20200604202706-70a84ac30bf9 // indirect