v-bible/bible-scraper

🌟 About the Project

🎯 Features

  • Scrape the Bible from biblegateway.com and bible.com. Currently supports:
    • Verses (with poetry).
    • Footnotes.
    • Headings.
    • References.
    • Psalm metadata (like author, title, etc.).
  • Progress logging.
  • Save to Postgres database.

🔑 Environment Variables

To run this project, you will need to add the following environment variables to your .env file:

  • App configs:

    DB_URL: Postgres database connection URL.

    LOG_LEVEL: Log level.

For example:

# .env
DB_URL="postgres://postgres:postgres@localhost:65439/bible"
LOG_LEVEL=info

You can also check out the file .env.example to see all required environment variables.
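
As a minimal sketch of how these two variables might be validated at startup: the helper below is hypothetical and not taken from the repository. The names `AppConfig` and `loadConfig`, the fallback to `info` when LOG_LEVEL is unset, and the check on the URL scheme are all assumptions made for illustration.

```typescript
// Hypothetical helper, not part of the repository's code.
interface AppConfig {
  dbUrl: string;
  logLevel: string;
}

function loadConfig(env: Record<string, string | undefined>): AppConfig {
  const dbUrl = env.DB_URL;
  if (!dbUrl) {
    throw new Error("DB_URL is not set; add it to your .env file");
  }

  // new URL() throws on malformed input, so a typo fails fast here.
  const protocol = new URL(dbUrl).protocol;
  if (protocol !== "postgres:" && protocol !== "postgresql:") {
    throw new Error(`DB_URL must be a Postgres URL, got "${protocol}"`);
  }

  // Assumption: fall back to "info" when LOG_LEVEL is not provided.
  return { dbUrl, logLevel: env.LOG_LEVEL ?? "info" };
}
```

In a Node.js script this could be called as `loadConfig(process.env)` once the .env file has been loaded.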

🧰 Getting Started

‼️ Prerequisites

This project uses pnpm as its package manager:

npm install --global pnpm

Playwright:

Run the following command to download new browser binaries:

npx playwright install

🏃 Run Locally

Clone the project:

git clone https://github.com/v-bible/bible-scraper.git

Go to the project directory:

cd bible-scraper

Install dependencies:

pnpm install

Set up the Postgres database using Docker Compose:

docker-compose up -d

Migrate the database:

pnpm prisma:migrate

Generate Prisma client:

pnpm prisma:generate

👀 Usage

Note

To prevent the error net::ERR_NETWORK_CHANGED, you can temporarily disable IPv6 on your network adapter:

sudo sysctl -w net.ipv6.conf.all.disable_ipv6=1
sudo sysctl -w net.ipv6.conf.default.disable_ipv6=1

Scrape biblegateway.com:

npx tsx ./src/biblegateway/main.ts

Scrape bible.com:

npx tsx ./src/bibledotcom/main.ts

Note

The bible.com script does not use the local version code, which may vary between languages. For example, in Vietnamese, the version "VCB" has the local code "KTHD".
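
If you need the local code yourself, for instance when displaying a version name, a small lookup table is one way to handle it. The sketch below is hypothetical and not part of the script: only the VCB/KTHD pair comes from the note above, while the `vi` language key and the names `localVersionCodes` and `resolveLocalCode` are assumptions.

```typescript
// Hypothetical lookup table keyed by language, then by version code.
// Only the VCB -> KTHD pair is taken from the README; "vi" is an assumed key.
const localVersionCodes: Record<string, Record<string, string>> = {
  vi: { VCB: "KTHD" },
};

// Returns the local code when one is known, otherwise the version code unchanged.
function resolveLocalCode(language: string, versionCode: string): string {
  return localVersionCodes[language]?.[versionCode] ?? versionCode;
}
```

Falling back to the unchanged version code keeps the helper safe for languages and versions that have no entry in the table.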

The Lectionary for Mass - Second USA Edition (Sunday Volume, 1998; Weekday Volumes, 2002)

npx tsx ./src/catholic-resources/main.ts

Note

The script get-ordinary-time.ts logs mismatched Gospel readings for weekdays in Ordinary Time (OT) between Year I and Year II. You can find them in dumps/catholic-resources/note-ot.txt.
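
The kind of comparison described in this note can be sketched as follows. This is an illustration only, not the code in get-ordinary-time.ts: the `WeekdayReading` and `GospelMismatch` shapes and the function name `findGospelMismatches` are hypothetical, and the format of note-ot.txt is not assumed here.

```typescript
// Hypothetical data shapes; the real script may model readings differently.
interface WeekdayReading {
  week: number; // week of Ordinary Time
  weekday: string; // e.g. "Monday"
  gospel: string; // Gospel reference as scraped
}

interface GospelMismatch {
  week: number;
  weekday: string;
  yearI: string;
  yearII: string;
}

// Pairs up readings by (week, weekday) and reports days whose Gospel differs.
function findGospelMismatches(
  yearI: WeekdayReading[],
  yearII: WeekdayReading[],
): GospelMismatch[] {
  const key = (r: WeekdayReading) => `${r.week}:${r.weekday}`;
  const yearIIByDay = new Map<string, string>();
  for (const r of yearII) yearIIByDay.set(key(r), r.gospel);

  const mismatches: GospelMismatch[] = [];
  for (const r of yearI) {
    const other = yearIIByDay.get(key(r));
    // Days missing from Year II are skipped rather than reported.
    if (other !== undefined && other !== r.gospel) {
      mismatches.push({ week: r.week, weekday: r.weekday, yearI: r.gospel, yearII: other });
    }
  }
  return mismatches;
}
```

Each returned entry identifies one weekday and carries both references, so it can be written to a notes file or printed to the console as needed.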

npx tsx ./src/ktcgkpv/main.ts

👋 Contributing

Contributions are always welcome!

📜 Code of Conduct

Please read the Code of Conduct.

⚠️ License

Distributed under the MIT License. See LICENSE for more information.

🤝 Contact

Duong Vinh - @duckymomo20012 - [email protected]

Project Link: https://github.com/v-bible/bible-scraper.

💎 Acknowledgements

Here are useful resources and libraries that we have used in our projects: