The chanalysis Python scripts in this repository offer a convenient way to map trends, shifts, and other characteristics in 4chan data, or in any data with a similar structure. The scripts can help researchers move beyond treating the anonymous imageboard as an un-researchable and homogeneous blob.
Note that these scripts are a work in progress and may contain bugs.
The current chanalysis scripts include:
createHistogram.py
Frequency histograms: Visualise the occurrences of a particular word over time.
getReplies.py
Identifying popular posts: Show which posts received the most replies and thus garnered the most attention.
createTokens.py
Tokenization: Tokenise the text (lemmatization and stemming).
createLongString.py
Creating a text file: Takes the text in a CSV column and writes it out as one long string in a .txt file. Useful in tandem with jasondavies.com/wordtree/.
getImages.py
Downloading images: Download images from a set of posts.
createImageWall.py
Making an image wall: Use downloaded images to create an image wall.
getTfidf.py
Get popular terms: (work in progress) Outputs popular words in the dataset via tf-idf.
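To give a feel for what a frequency histogram over time involves, here is a minimal sketch using only the standard library. It is not the code of createHistogram.py. The column names (timestamp, comment), the timestamp format, and the sample rows are assumptions made up for illustration, and your own data may be laid out differently.

```python
import csv
import io
import re
from collections import Counter
from datetime import datetime


def count_word_per_month(rows, word, text_column="comment", time_column="timestamp"):
    """Count whole-word, case-insensitive occurrences of `word` per month.

    `text_column`, `time_column`, and the timestamp format are assumptions
    about the CSV layout, not something the chanalysis scripts prescribe.
    """
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    counts = Counter()
    for row in rows:
        hits = len(pattern.findall(row[text_column] or ""))
        if hits:
            posted = datetime.strptime(row[time_column], "%Y-%m-%d %H:%M:%S")
            counts[posted.strftime("%Y-%m")] += hits
    # Sort by month so the histogram reads chronologically.
    return dict(sorted(counts.items()))


# Tiny made-up sample standing in for a real CSV export.
SAMPLE_CSV = """timestamp,comment
2017-01-03 12:00:00,The force is strong
2017-01-15 08:30:00,No mention here
2017-02-01 19:45:00,Use the Force Luke. The force!
2017-02-20 10:00:00,forceful words do not count
"""

rows = csv.DictReader(io.StringIO(SAMPLE_CSV))
monthly = count_word_per_month(rows, "force")

# Print a crude text histogram, one bar per month.
for month, n in monthly.items():
    print(month, "#" * n)
```

Matching on word boundaries keeps "forceful" from being counted as "force". Whether partial matches should count depends on the research question.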
More scripts will be added later.
The scripts require Python 3.
Once you have installed Python 3 on your computer, clone or download this repository.
Go to the folder containing the scripts in a terminal and install the requirements (python -m pip install -r requirements.txt). When a script is executed without parameters (e.g. python3 createLongString.py), it shows its functions and the available options. Run a script like so, for example: python createLongString.py --source=input/star-wars.csv
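The command-line pattern described above (show help when no parameters are given, otherwise read the file passed via --source) can be sketched roughly as follows. This is an illustration and not the actual createLongString.py. Only --source comes from this README. The --column and --output-dir options, the function names, and the throwaway sample file are hypothetical.

```python
import argparse
import csv
import os
import tempfile


def build_parser():
    parser = argparse.ArgumentParser(
        description="Join the text of one CSV column into a single long string."
    )
    parser.add_argument("--source", help="path to the input CSV file")
    # The two options below are hypothetical, added only for this sketch.
    parser.add_argument("--column", default="comment", help="name of the text column")
    parser.add_argument("--output-dir", default="output", help="folder for the .txt file")
    return parser


def write_long_string(source, column, output_dir):
    """Read `column` from the CSV at `source` and write it as one string."""
    os.makedirs(output_dir, exist_ok=True)
    with open(source, newline="", encoding="utf-8") as handle:
        texts = [row[column] for row in csv.DictReader(handle) if row.get(column)]
    base = os.path.splitext(os.path.basename(source))[0]
    target = os.path.join(output_dir, base + ".txt")
    with open(target, "w", encoding="utf-8") as handle:
        handle.write(" ".join(texts))
    return target


def main(argv=None):
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.source is None:
        # No parameters given: show the available options instead of failing.
        parser.print_help()
        return None
    return write_long_string(args.source, args.column, args.output_dir)


# Demo on a throwaway CSV so the sketch runs without any real data.
with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "sample.csv")
    with open(source, "w", newline="", encoding="utf-8") as handle:
        handle.write("comment\nfirst post\nsecond post\n")
    target = main(["--source=" + source, "--output-dir=" + os.path.join(tmp, "output")])
    with open(target, encoding="utf-8") as handle:
        result = handle.read()

print(result)
```

Passing the argument list into main() explicitly keeps the sketch easy to try out from an interpreter. A real script would usually let argparse read sys.argv.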
All resulting data files are saved in the output/ folder.