A Python web scraper for locally archiving .txt versions of the web serial fictions of Wildbow (J.C. McCrae).

charmsRace/Crawler

Crawler

written by J. Alex Ruble (Calamitizer)

(GitHub repository)

Overview

This is a Python script that web-scrapes the serial fictions of Wildbow (J.C. McCrae) and bundles them into local .txt files for offline / e-reader viewing.

Usage Guide

Navigate to the downloaded directory (e.g. ~/Downloads/WildbowCrawler/) in your terminal, then input the command

python main.py <storyname> <format> <arcsep> <chapsep>

with arguments

  1. <storyname> -- The name of the story to archive locally. At present, this should be one of
  • worm
  • pact
  • twig
  2. <format> -- The keyword selecting the file structure for the output. Select one of
  • single -- creates the <storyname> directory, containing <storyname>.txt with the full text of the story.
  • per-arc -- creates the <storyname> directory, containing one <arcnumber>_<arcname>.txt file for each arc (e.g. 1_Gestation.txt).
  3. <arcsep> and <chapsep> -- The strings inserted at the beginning of each arc and chapter, respectively, for CTRL+F purposes. Arguments containing characters that are special to the shell must be wrapped in quotes, as in '#A'. Pass '' for no separator.
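
The two formats and the separators described above can be sketched roughly as follows. This is only an illustration of the described behaviour, not the code in main.py: the function names (arc_filename, render_arc, plan_output), the shape of the arcs data, and the choice to put each separator directly in front of the arc or chapter title are all assumptions.

```python
import os


def arc_filename(arc_number, arc_name):
    """Build '<arcnumber>_<arcname>.txt', e.g. '1_Gestation.txt'."""
    return "{0}_{1}.txt".format(arc_number, arc_name)


def render_arc(arc_name, chapters, arcsep, chapsep):
    """Join one arc's chapters into a single string.

    chapters is a list of (title, body) pairs. The separators are placed
    directly before the arc and chapter titles (an assumption); an empty
    separator adds nothing.
    """
    parts = [arcsep + arc_name]
    for title, body in chapters:
        parts.append(chapsep + title)
        parts.append(body)
    return "\n\n".join(parts)


def plan_output(storyname, fmt, arcs, arcsep, chapsep):
    """Return {relative path: text} for the chosen format.

    arcs is a list of (arc_name, chapters) pairs in reading order.
    Nothing is written to disk here; the caller decides how to save.
    """
    rendered = [
        (number, name, render_arc(name, chapters, arcsep, chapsep))
        for number, (name, chapters) in enumerate(arcs, start=1)
    ]
    if fmt == "single":
        path = os.path.join(storyname, storyname + ".txt")
        return {path: "\n\n".join(text for _, _, text in rendered)}
    if fmt == "per-arc":
        return {
            os.path.join(storyname, arc_filename(number, name)): text
            for number, name, text in rendered
        }
    raise ValueError("unknown format: {0!r}".format(fmt))
```

Returning a mapping of paths to text, rather than writing files directly, keeps the layout logic easy to check by hand before anything touches the disk.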

Some example usages follow.

python main.py worm per-arc '[ARC]' '[CHAPTER]'
python main.py pact single '#A' '#C'
python main.py twig per-arc '' Chapter:
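
For readers curious how four positional arguments like these might be checked, here is a minimal sketch that sticks to plain sys.argv-style lists. The names STORIES, FORMATS and parse_args are hypothetical and the error messages are invented for illustration; the actual main.py may validate its input differently or not at all.

```python
STORIES = ("worm", "pact", "twig")
FORMATS = ("single", "per-arc")
USAGE = "usage: python main.py <storyname> <format> <arcsep> <chapsep>"


def parse_args(argv):
    """Validate [storyname, format, arcsep, chapsep] and return them as a tuple.

    argv is the argument list without the script name, i.e. sys.argv[1:].
    The shell strips quotes before Python sees the arguments, so '' arrives
    as an empty string and '#A' arrives as #A.
    """
    if len(argv) != 4:
        raise ValueError(USAGE)
    storyname, fmt, arcsep, chapsep = argv
    if storyname not in STORIES:
        raise ValueError("storyname must be one of: " + ", ".join(STORIES))
    if fmt not in FORMATS:
        raise ValueError("format must be one of: " + ", ".join(FORMATS))
    return storyname, fmt, arcsep, chapsep
```

A script would typically call parse_args(sys.argv[1:]) once at startup and print the ValueError message before exiting if the arguments are wrong.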

Changelog

v0.9 (12 March '16)

This is the first stable version of WildbowCrawler. It successfully passes through the entirety of Worm.

To-Do List

  • Support Pact and Twig
  • Figure out packaging for imported modules (?)
  • Replace sys.argv with the argparse module (never mind, I don't really see a benefit)
  • Go back in time to prevent the birth of anyone somewhat responsible for unicode

Jolly Cooperation

Contributions to and optimizations of this (small) project are welcome if you'd like. Kindly alert me if you notice any anomalies in transcription, including disordering, jumbling, skipping and the like -- no, I am not rereading Worm to write this crawler. Also, let me know if (probably unicode-related) bugs arise upon further Twig publication.

Implementation

This code is written in Python 2.7.3. The non-standard library modules used are requests (for HTTP request handling) and bs4 (for HTML parsing).
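
To show how the two modules fit together, here is a minimal sketch of fetching one page with requests and pulling text out of it with bs4. It is not taken from main.py. The function names are hypothetical, and the page layout it looks for (a title in an h1 with class entry-title, body paragraphs inside a div with class entry-content) is an assumption about a typical blog template rather than a confirmed description of the story sites.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url):
    """Download one page and return its HTML as text."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response.text


def extract_chapter(html):
    """Return (title, body) for one chapter page.

    ASSUMPTION: the title lives in <h1 class="entry-title"> and the chapter
    text in <p> tags inside <div class="entry-content">. Adjust the
    selectors if the real pages are laid out differently.
    """
    soup = BeautifulSoup(html, "html.parser")
    title_tag = soup.find("h1", class_="entry-title")
    content = soup.find("div", class_="entry-content")
    if title_tag is None or content is None:
        raise ValueError("page does not match the assumed layout")
    paragraphs = [p.get_text().strip() for p in content.find_all("p")]
    # Skip paragraphs that are empty after stripping whitespace.
    body = "\n\n".join(text for text in paragraphs if text)
    return title_tag.get_text().strip(), body
```

Keeping the download and the parsing in separate functions means the parser can be tried on a saved HTML file without making any network requests.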

Wildbow-Related Links

Wildbow's personal blog
Worm
Pact
Twig
/r/parahumans (for discussion of all Wildbow's work)
Wildbow's Patreon

In my opinion, Wildbow is a very gifted writer. Support him if Worm gets picked up for publishing!

Author

You can reach the author (Alex Ruble) most easily via GitHub (Calamitizer), email ([email protected]), or Twitter (@aknifeallblade).

License

This software has no associated copyrights whatsoever (i.e. an unlicense). See LICENSE.txt for the full description.
