A Vue.js and Flask Web Application designed to provide a quick way to search for quotes from NBC's "The Office".
- Vue.js based, providing fast SPA functionality
- Setup via Vue CLI
- Vue Router
- Vue Bootstrap
- Axios
- Vue Instantsearch (Algolia)
- Backend API provided by Flask
- Instant Search provided by Algolia
- Sleek, responsive design that is easy on the eyes
Credit to officequotes.net for providing all quote data.
Credit to imdb.com for episode descriptions.
Quotes are scraped directly from the website as of this moment.
This repository will hold the current pre-processed raw quote data, but the application has the ability to fetch and parse HTML pages directly as needed.
python server/cli.py fetch
--season SEASON Fetches all episodes from a specific season.
--episode EPISODE Fetches a specific episode. Requires SEASON to be specified.
--all Fetches data for every episode from every season.
--skip SEASON:EPISODE When specified, it will skip a given episode.
The data has to be parsed, but due to high irregularity (at least too much for me to handle), the files will have to be inspected and manually processed.
python server/cli.py preprocess
-s --season SEASON Pre-processes all episodes from a specific season.
-e --episode EPISODE Pre-processes a specific episode. Requires SEASON to be specified.
-a --all Pre-processes all episodes from every season.
-o --overwrite DANGER: Will overwrite files. May cause manually processed files to be lost forever.
From then on, once all files have been pre-processed, you will have to begin the long, annoying process of editing them into my custom format.
These raw pre-processed files are located in ./server/data/processed/
Each section (barring the first) is preceded by a hyphen.
CharacterName: Text that character says.
OtherCharacter: More text that other character says.
-
ThirdCharacter: Text that character says in a second scene/section.
-!1
Fourth Character With Spaces In Name: Text that fourth character says in a deleted scene.
Fifth-Character: Which deleted scene? Deleted scene number one.
Deleted scenes are marked by an initial exclamation mark, followed by one or more digits identifying which deleted scene they are a part of.
Please note that extra text like 'Deleted Scenes 3' might appear before a hyphen - this is expected and is helpful when deciding which scene goes with which Deleted Scene ID. If you don't know, do what I did - go look at the web page it's based on. Otherwise, read the quotes and figure it out based on context.
This concept is rather loose, slow, and dumb. It simply allows me to mark which deleted scenes go together while working with an incredibly inconsistent, human-curated data format.
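The section and deleted scene markers described above can be read mechanically. The following is a minimal sketch, not the parser used by server/cli.py: the name parse_sections and the dictionary keys "deleted" and "quotes" are hypothetical, and it assumes every quote line has the "Speaker: text" form shown in the example.

```python
import re

# A section marker line: "-" starts a normal scene, "-!N" starts deleted scene N.
MARKER = re.compile(r"^-(?:!(\d+))?$")


def parse_sections(text):
    """Split custom-format text into sections of (speaker, spoken) pairs.

    Hypothetical helper: the keys "deleted" and "quotes" are assumptions,
    not the schema used by the real application.
    """
    # The first section has no marker in front of it.
    sections = [{"deleted": None, "quotes": []}]
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        marker = MARKER.match(line)
        if marker:
            deleted = int(marker.group(1)) if marker.group(1) else None
            sections.append({"deleted": deleted, "quotes": []})
            continue
        # Split on the first colon only, so colons inside the quote survive.
        speaker, sep, spoken = line.partition(":")
        if not sep:
            continue  # stray text such as "Deleted Scenes 3" is skipped
        sections[-1]["quotes"].append((speaker.strip(), spoken.strip()))
    return sections
```

Lines without a colon are skipped rather than rejected, since stray labels are expected in these files.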
To ease text processing, I did come up with regular expressions for search and replacement:
^(?!-)([\w\s’'0-9\-\.\/#]*):\s+
$1|
From then on, the process becomes much simpler: 95% of the work needed to process quotes is already done.
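The search and replace pair above is written in editor syntax. As a rough sketch of the same substitution in Python (the function name to_pipe_format is hypothetical, and this is not necessarily how the project applies it), the replacement "$1|" becomes r"\1|" and the MULTILINE flag is needed so that ^ matches at the start of every line:

```python
import re

# The search pattern from above; MULTILINE makes ^ match at every line start.
SPEAKER = re.compile(r"^(?!-)([\w\s’'0-9\-\.\/#]*):\s+", re.MULTILINE)


def to_pipe_format(text):
    """Rewrite "Speaker: text" lines as "Speaker|text".

    Lines starting with a hyphen (section markers) are left untouched
    because of the negative lookahead in the pattern.
    """
    # Python writes the editor-style replacement "$1|" as r"\1|".
    return SPEAKER.sub(r"\1|", text)


if __name__ == "__main__":
    print(to_pipe_format("CharacterName: Text that character says.\n-\nThirdCharacter: More text."))
```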
Now that quotes are in a consistent (although custom) format, they need to be processed into individual episodes. In reality, they are just the JSON format of the previous stage.
python server/cli.py process
-s --season SEASON Processes all episodes from a specific season.
-e --episode EPISODE Processes a specific episode. Requires SEASON to be specified.
-a --all Processes all episodes from all seasons.
Now that they're all in individual files, the final commands can be run to compile them into one file, a static 'database' of sorts. Technically, they could be kept scattered, but I decided to make it simpler with just one big file.
This also is where Algolia comes in.
python server/cli.py build [algolia|final]
Each command is run with no special arguments (as of now), generating an algolia.json or data.json in the ./server/data/ folder.
This data.json file is loaded by the Flask server, and the algolia.json can be uploaded to your primary index.
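To illustrate the idea of compiling scattered episode files into one static file, here is a minimal sketch. It is not the code behind the build command: the function name build_single_file, the assumption of one JSON document per episode file, and the list-shaped output are all hypothetical, and the real schema may differ.

```python
import json
from pathlib import Path


def build_single_file(episode_dir, output_path):
    """Merge every per-episode JSON file in episode_dir into one JSON list.

    Assumption: each *.json file holds one episode; the layout and schema
    used by the real build command may be different.
    """
    episodes = []
    # Sort by file name so the output order is stable between runs.
    for path in sorted(Path(episode_dir).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            episodes.append(json.load(handle))
    with Path(output_path).open("w", encoding="utf-8") as handle:
        json.dump(episodes, handle, ensure_ascii=False, indent=2)
    return len(episodes)
```

Keeping everything in one file means the server only has to read and parse a single document at startup.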
For every command mentioned, you can read all arguments with --help:
$ python cli.py preprocess --help
Usage: cli.py preprocess [OPTIONS]
Pre-processes raw HTML files into mangled custom quote data.
Custom quote data requires manual inspection and formatting, making it a
dangerous operation that may overwrite precious quote data.
Options:
-s, --season INTEGER Season to be fetched. Without --episode, will
download all episodes in a season.
-e, --episode INTEGER Specific episode to be fetched. Requires
--season to be specified.
--all Fetch all episodes, regardless of previous
specifications.
-o, --overwrite Overwrite if a file already exists.
-ss, --silent-skip Skip missing/existing files silently
-ssm, --silent-skip-missing Skip missing files silently
-sse, --silent-skip-existing Skip overwrite skips silently
--help Show this message and exit.
This project was built on Python 3.7 and Node v12.18.3 / npm 6.14.6.
To install all Node/NPM dependencies, run
npm install
To install Python's dependencies, run
pip install -r ./requirements.txt
I recommend that you use a virtualenv in order to keep dependencies separate from other projects, as I do. Personally, I use PyCharm Professional to maintain virtualenvs, just because it's easy to start, use, update and maintain them.
- Vue.js can be run via npm run serve. Run this in ./client/.
- Flask can be run via flask run. Run this in ./server/. Add --host=0.0.0.0 to the end to allow connections from LAN.
Note: Readying this application for production and wider development is still in progress.
While The Office is a large show, it unfortunately does not have any readily accessible short descriptions of characters.
Wikis provide long and comprehensive descriptions and some high quality images, but overall, it was rather difficult to find material for characters introduced late in the show or with a more minor overall impact (e.g. David Wallace, Robert California, Clark Green).
Episode and Character images will be clickable, opening a Lightbox of the full body image.
Character Images are made to have 2 different shots: 1 large full body image, and a cropped square of their face.
Additionally, all images will be paired with thumbnails of their faces, used for progressive image rendering.
All images should be as high quality as possible, minimum 500px in each dimension for the full body shot. Cropped squares should also look to be as large as possible.
Thumbnails are manually generated with ImageMagick v7: magick input.webp -resize 128 -quality 75 input_thumb.webp
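Running that command by hand for every image gets tedious, so a small wrapper can help. The sketch below only builds and runs the same magick invocation shown above for each .webp file in a folder. The function names are hypothetical, and it assumes ImageMagick v7 is available on the PATH as magick.

```python
import subprocess
from pathlib import Path


def thumbnail_command(source):
    """Build the ImageMagick command for one image, e.g. a.webp -> a_thumb.webp."""
    source = Path(source)
    target = source.with_name(f"{source.stem}_thumb{source.suffix}")
    return ["magick", str(source), "-resize", "128", "-quality", "75", str(target)]


def generate_thumbnails(image_dir):
    """Run the command for every .webp in image_dir that is not itself a thumbnail."""
    for source in sorted(Path(image_dir).glob("*.webp")):
        if source.stem.endswith("_thumb"):
            continue  # do not make thumbnails of thumbnails
        # check=True raises CalledProcessError if magick exits with an error.
        subprocess.run(thumbnail_command(source), check=True)
```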
Episode Images should mirror Netflix's episode images if possible. Otherwise, each episode will have to be painstakingly reviewed for what scene and moment(s) best represent the episode, then screenshotted.
Then, like the character images, a square crop will be made, and thumbnails will be generated just the same.
One single image will represent each Episode on the Season page, but multiple images will be available in a carousel to be viewed.
Small to-do list to complete.
- Font Awesome Icons
- SeasonList Chevron
- Quote Permalink
- Attempt Algolia Query Suggestions
- Redirect to SearchResults page on Enter press
- Process all quote data
- Site Meta Tags
- Better Mobile Season List
- Smaller, collapsible?
- Heroku Production Deployment
- Possible solution via Docker
- Axios 'Fetch' Error Handling
- Navigation Bar
- Navbar Logo (?)
- Overall Responsiveness Improvements
- Algolia Clickthrough Events
- Search Results Page
- Character List
- Quote Permalink
- Deleted Scenes Marker
- Possible 'Flashback' Scene Marker
- Season List Episode Modal Popover
- Preview Image, Description, Episode Stats
- Quote Likes Database
- Requires difficult implementation of Flask controlled Postgres database
- Requires funding, longterm free hosting not possible with database requirements