A Shiny app to browse the results of a topic model trained on 30,000+ blog articles about the statistical programming language R.
Blog data was scraped from the public archives of r-bloggers.com using the super convenient rvest package. The process is split into three phases so that I can change my mind later about how articles are processed.
- Obtain a list of links to articles published on R-Bloggers
- Download each article from the link collection
- Parse each document and combine them into a single data frame
For each document, meta information such as blog title, publication date, author and URL of the original blog was collected. In total, more than 30,000 articles were collected, the earliest published in January 2010.
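The three phases above could be sketched roughly as follows. This is a minimal illustration, not the project's actual scraping code: the function names, the file naming scheme and the CSS selectors (`h2 a`, `h1`, `.author`) are all assumptions and would need to be adapted to the real markup of the site.

```r
library(rvest)

# Phase 1: extract article links from a parsed archive page.
# ASSUMPTION: the selector "h2 a" is a placeholder, not the site's real markup.
extract_links <- function(page, selector = "h2 a") {
  unique(html_attr(html_elements(page, selector), "href"))
}

# Phase 2: store the raw HTML of every link on disk, so that parsing
# can be repeated later without downloading the articles again.
download_articles <- function(links, dir) {
  dir.create(dir, showWarnings = FALSE, recursive = TRUE)
  files <- file.path(dir, sprintf("article-%05d.html", seq_along(links)))
  for (i in seq_along(links)) {
    if (!file.exists(files[i])) download.file(links[i], files[i], quiet = TRUE)
  }
  files
}

# Phase 3: parse one stored document into a one-row data frame.
# ASSUMPTION: "h1" and ".author" are hypothetical selectors; fields that
# are not found come back as NA.
parse_article <- function(file) {
  doc <- read_html(file)
  data.frame(
    title = html_text2(html_element(doc, "h1")),
    author = html_text2(html_element(doc, ".author")),
    stringsAsFactors = FALSE
  )
}

# Combine all parsed documents into a single data frame
combine_articles <- function(files) {
  do.call(rbind, lapply(files, parse_article))
}
```

Because phase 2 keeps the raw HTML on disk, changing the selectors in `parse_article()` only requires re-running phase 3.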
For the ongoing data refresh, the R-Bloggers RSS feeds are downloaded once a day by a scheduled script running in a Docker container on AWS. The RSS feed is an XML file that contains the latest few articles published on the website. At the end of each month a second script in a second container combines the articles in all collected feeds with the 30,000 scraped articles and refreshes the topic model behind this application. So every month something new should appear in this Shiny app.
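The monthly combination step might look something like the sketch below. It is only an illustration under assumptions: the function names are hypothetical, the feeds are assumed to be plain RSS 2.0 with `<item>` elements, and the scraped articles are assumed to live in a data frame with the columns `title`, `url` and `published`. Since consecutive daily feeds overlap, duplicates are dropped by URL.

```r
library(xml2)

# Parse one RSS 2.0 feed (file path or XML string) into a data frame
# with one row per <item>. Missing fields become NA.
parse_feed <- function(feed) {
  items <- xml_find_all(read_xml(feed), "//item")
  data.frame(
    title = xml_text(xml_find_first(items, "./title")),
    url = xml_text(xml_find_first(items, "./link")),
    published = xml_text(xml_find_first(items, "./pubDate")),
    stringsAsFactors = FALSE
  )
}

# Append the articles of all collected feeds to the scraped articles and
# keep only the first occurrence of every URL.
# ASSUMPTION: `scraped` has exactly the columns title, url, published.
merge_articles <- function(scraped, feeds) {
  from_feeds <- do.call(rbind, lapply(feeds, parse_feed))
  combined <- rbind(scraped, from_feeds)
  combined[!duplicated(combined$url), , drop = FALSE]
}
```

Keeping the first occurrence means a scraped article wins over a later feed entry with the same URL; the opposite choice would be just as easy by passing `fromLast = TRUE` to `duplicated()`.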
The app is deployed through RStudio's web service shinyapps.io. Additionally, the app is published on RStudio Cloud, which provides a complete development environment for the project.
The development environment of this project is encapsulated in a Docker container.
- Install Docker. Follow the instructions on https://docs.docker.com/install/
- Make docker run without sudo

      sudo groupadd docker
      sudo usermod -aG docker $USER

  Log out and log back in so that your group membership is re-evaluated
- Clone the Git repository

      git clone https://github.com/nz-stefan/blog-explorer.git

- Set up the development Docker container

      cd blog-explorer
      bin/setup-environment.sh

  You should see lots of container build messages
- Spin up the container

      bin/start_rstudio.sh

- Open http://localhost:8790 in your browser to start a new RStudio session
- Install R packages required for this app. Type the following instructions into the R session window of RStudio

      renv::restore()

  The installation will take a few minutes. The package library will be installed into the `renv/lib` directory of the project path.
- Open the file `app/global.R` and hit the "Run app" button in the toolbar of the script editor (or type `shiny::runApp("app")` in the R session window). The Shiny app should open in a new window. You may need to instruct your browser to not block popup windows for this URL.