dodgejoel/mtg_gatherer_database
This bundle of code scrapes Wizards of the Coast's Magic: the Gathering website at 'http://gatherer.wizards.com' and builds an SQLite database called mtg_gatherer.db containing all of the card information.

First run 'initialize_database' to create the database with the appropriate tables. Next run 'gather_card_data' to extract all of the data. Finally run 'input_to_database' to load the data into the database. The first and last programs take very little time to run, but gather_card_data takes about 20 minutes to complete. If gather_card_data.py is interrupted and then called again, it will pick up where it left off, i.e. the card data already retrieved will not be fetched again.

This is an ongoing project. The next step is to bundle this into a single script that, when called once, executes the three parts of this program without further user interaction. After this, I will implement a GUI for interacting with the database. In essence I would like to recreate locally all of the functionality of the original website.

Modules required: bs4 and sqlite3. (sqlite3 ships with the Python standard library, so only bs4 needs to be installed separately.)

Bugs/Problems:

1. Fuse/Split/Flip/Werewolf cards are supported but are quirky. Only the name, mana cost and card text are currently shown for both halves, with each half's data separated by a '//'. This is a little undesirable, as some of these cards' types/subtypes change upon flipping. The parsing of the type/subtype data is a little different from the rest of the data, so I am putting this off for now. Currently, there is a problem with cards that are creatures on one side but flip to non-creatures. This is sloppy on my part, but I have fixed the problem of only collecting power and toughness data for one side of the werewolf cards. Gatherer's treatment of these cards is not uniform, so there are some funny things. Currently all werewolves have two multiverse_ids, for example, whereas split cards only have one. This will need to be dealt with in a more heavy-handed way to avoid duplicate werewolves in the resulting database. Another little problem with these cards is the computation of the converted mana cost of the split and fuse cards. As it currently stands, the database entries for these cards will have cmcs of the form '3//3', for example, and this compares as greater than any single number.

2. There is a difficult-to-reproduce bug involving URL request timeouts, in which some of the threads in the gather_card_data script do not finish their tasks and do not terminate correctly. Perhaps it has something to do with the speed of the internet connection? It is not a serious problem, as the program can always be run a second time to clean up the lost jobs. I still have not understood this well.
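
The converted mana cost problem from item 1 can be reproduced in a few lines. The sketch below is only an illustration: the 'cards' table layout, the sample rows and the 'split_cmc' helper are hypothetical and are not taken from this repository. It shows that SQLite ranks a text value such as '3//3' above every number, and one possible way to recover the per-half costs.

```python
import sqlite3


def split_cmc(raw):
    """Return the per-half converted mana costs as a list of ints.

    Hypothetical helper: '3//3' gives [3, 3] and 4 gives [4]. The '//'
    separator is the one described in the Bugs/Problems list.
    """
    return [int(part) for part in str(raw).split("//")]


conn = sqlite3.connect(":memory:")
# Hypothetical table layout; the real schema of mtg_gatherer.db may differ.
conn.execute("CREATE TABLE cards (name TEXT, cmc INTEGER)")
conn.executemany(
    "INSERT INTO cards VALUES (?, ?)",
    [("Single card", 5), ("Split card", "3//3")],
)

# '3//3' cannot be converted to a number, so SQLite stores it as text,
# and text values compare as greater than any numeric value.
rows = conn.execute("SELECT name FROM cards WHERE cmc > 1000").fetchall()
over_1000 = [name for (name,) in rows]
conn.close()
```

Storing a single number derived from 'split_cmc' (the sum or the larger half, for instance) in a numeric column would make such comparisons behave. Which value is the right one is a design decision that this sketch leaves open.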
About
Web scraper which gathers magic card data