This repository was archived by the owner on Jul 18, 2020. It is now read-only.

Documentation

FAN 2 TUNNING edited this page Jun 23, 2019 · 6 revisions

Q/A

Q: What's this?

A: Hashtagsbattle is a web app that displays real-time analytics based on Twitter, such as hourly trending hashtags, daily hashtags, and worldwide activity. It was inspired by the awesome One Million Tweet Map.

Q: How does it work?

A:

  • First off, there's a tweet listener built with Tweepy that retrieves tweets sent back by the Twitter API. It does some basic cleaning and filtering before publishing them to a Pub/Sub topic. The listener runs on a Google App Engine instance.

  • Then, there's a small Express server using Socket.IO, also running on App Engine. It exists because I wanted to show the location of current tweets on a Mapbox map in real time. Storing this data (in Firebase, for example) isn't useful and would be quite expensive, so it is streamed rather than persisted.

  • The heart of the project is the Apache Beam stream-processing pipeline running on the Cloud Dataflow runner. The pipeline consumes events from the source Pub/Sub topic and applies some data transformations (grouping, counting, filtering, batching...) before publishing the pre-aggregated output to another Pub/Sub topic. I'm experimenting with windows and triggers to achieve fairly low latency.

  • Finally, the output Pub/Sub topic triggers Cloud Functions instances that do some further computation on the data before saving it to Firestore.
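
To make the grouping and counting step more concrete, here is a minimal sketch in plain Python. It is not the project's actual Beam pipeline: it only imitates the idea of fixed windows with standard-library code. The tweet shape (`timestamp`, `text`), the one-hour window size, and every function name are assumptions made for illustration.

```python
import re
from collections import Counter, defaultdict

# Assumption: a hashtag is "#" followed by word characters.
HASHTAG_RE = re.compile(r"#(\w+)")
# Assumption: one-hour fixed windows, matching "hourly trending hashtags".
WINDOW_SECONDS = 3600


def extract_hashtags(text):
    """Return the lowercase hashtags found in a tweet's text."""
    return [tag.lower() for tag in HASHTAG_RE.findall(text)]


def window_start(timestamp, size=WINDOW_SECONDS):
    """Map a Unix timestamp to the start of the fixed window containing it."""
    return timestamp - (timestamp % size)


def count_hashtags_per_window(tweets, size=WINDOW_SECONDS):
    """Group tweets into fixed windows and count the hashtags in each one."""
    counts = defaultdict(Counter)
    for tweet in tweets:
        start = window_start(tweet["timestamp"], size)
        counts[start].update(extract_hashtags(tweet["text"]))
    return counts


def top_hashtags(counts, n=3):
    """Return the n most common hashtags for each window."""
    return {start: counter.most_common(n) for start, counter in counts.items()}


# Hypothetical sample data: timestamps are seconds since the epoch.
sample_tweets = [
    {"timestamp": 0, "text": "Hello #World #python"},
    {"timestamp": 1800, "text": "More #Python today"},
    {"timestamp": 3600, "text": "Next hour #world"},
    {"timestamp": 4000, "text": "no hashtags here"},
]

trending = top_hashtags(count_hashtags_per_window(sample_tweets))
```

In a real streaming pipeline the windows never "finish" on their own, which is why triggers matter: they decide when a partial count for a window is emitted downstream. This sketch sidesteps that by processing a finite list.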

The web app is built with Stencil and deployed to Firebase Hosting.

As you can see, everything runs on managed Google Cloud Platform services.

Figure: GCP implementation

Installation

Work in progress.

The application is made of 4 components. Almost every component is Dockerized and has its own CI/CD pipeline using Cloud Build.
