Documentation
Q: What's this?
A: Hashtagsbattle is a web app that displays real-time analytics based on Twitter, such as hourly trending hashtags, daily hashtags, and worldwide activity. It is inspired by the awesome One Million Tweet Map.
Q: How does it work?
A:
- First off, a tweet listener built with Tweepy retrieves the tweets sent back by the Twitter API. It does some basic cleaning and filtering before publishing them to a Pub/Sub topic. The listener runs on a Google App Engine instance.
- Then, there's a small Express server using Socket.IO, also running on App Engine. It's useful because I wanted to show the locations of current tweets in real time on a Mapbox map. Storing this data (in Firebase, for example) would serve no purpose and would be quite expensive.
- The heart of the project is the Apache Beam stream-processing pipeline, running on the Cloud Dataflow runner. The pipeline consumes events from the source Pub/Sub topic and applies some data transformations (grouping, counting, filtering, batching...) before publishing the pre-aggregated output to another Pub/Sub topic. I'm experimenting with windows and triggers to achieve fairly low latency.
- Finally, the output Pub/Sub topic triggers Cloud Functions instances, which do some further computation on the data before saving it to Firestore.
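To make the data flow a bit more concrete, here is a minimal sketch of the listener's cleaning step and the pipeline's windowed counting. It is plain standard-library Python standing in for Tweepy, Pub/Sub and Apache Beam, not the project's actual code. The function names (`clean_tweet`, `to_message`, `count_per_window`), the event shape, and the one-hour fixed window are all assumptions made for illustration; the input field names follow the Twitter API v1.1 tweet JSON.

```python
import json
from collections import Counter, defaultdict

WINDOW_SECONDS = 3600  # assumption: one-hour fixed windows for "hourly trending"


def clean_tweet(raw):
    """Keep only the fields later stages need; return None to drop the tweet."""
    entities = raw.get("entities") or {}
    hashtags = [h["text"].lower() for h in entities.get("hashtags", [])]
    if not hashtags:
        return None  # filtering: a tweet without hashtags has nothing to count
    # "coordinates" is null or absent for tweets that are not geotagged.
    coords = (raw.get("coordinates") or {}).get("coordinates")
    return {
        "timestamp": int(raw["timestamp_ms"]) // 1000,  # event time in seconds
        "hashtags": hashtags,
        "coordinates": coords,  # [longitude, latitude] or None
    }


def to_message(event):
    """Serialize an event to the bytes payload a Pub/Sub publisher would send."""
    return json.dumps(event).encode("utf-8")


def count_per_window(messages, window_seconds=WINDOW_SECONDS):
    """Assign each event to a fixed window by event time and count hashtags."""
    windows = defaultdict(Counter)
    for payload in messages:
        event = json.loads(payload.decode("utf-8"))
        window_start = event["timestamp"] - event["timestamp"] % window_seconds
        windows[window_start].update(event["hashtags"])
    return windows


if __name__ == "__main__":
    raw_tweets = [
        {"timestamp_ms": "10000", "entities": {"hashtags": [{"text": "Beam"}]}},
        {"timestamp_ms": "20000", "entities": {"hashtags": [{"text": "beam"}]}},
        {"timestamp_ms": "30000", "entities": {"hashtags": []}},  # dropped
    ]
    cleaned = [clean_tweet(t) for t in raw_tweets]
    messages = [to_message(e) for e in cleaned if e is not None]
    for start, counts in sorted(count_per_window(messages).items()):
        print(start, counts.most_common(3))
```

In the real pipeline, Beam's windowing and triggers take care of what the dictionary of counters does here, including late data and early firings, which this sketch ignores.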
The web app is built with Stencil and deployed to Firebase Hosting.
As you can see, everything is fully managed by Google Cloud Platform.
Work in progress.
The application is made up of 4 components. Almost every component is Dockerized and has its own CI/CD pipeline using Cloud Build.
🔥