Infraestruturas e Modelos de Programação para Análise de Dados em Stream (Infrastructure and Programming Models for Stream Data Analysis, IMPADS) is a thesis project whose main objective is to evaluate, test, and benchmark several stream data processing systems.
Some popular Sistemas de Processamento de Dados em Stream (Streaming Data Processing Systems, SPDS) are:
- Apache Flink | Github
- Apache Storm | Github
- Apache Kafka (with the Kafka Streams API) | Github
- Apache Samza | Github
- Apache Heron (Incubator) | Github
Because of the time constraints imposed by the thesis deadline, the SPDS covered are Apache Storm, Apache Flink, and Apache Kafka (using the Kafka Streams API).
The main objectives of this project are:
- Evaluate the infrastructure each system requires
- Evaluate the programming model of each system
- Propose a deployment process for the infrastructure
- Evaluate the development process of a topology
- Benchmark the topology built for each chosen SPDS
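
To give a rough idea of what the benchmarking objective could measure, here is a minimal, framework-agnostic sketch in Python. It is not the project's actual harness and it does not use the Storm, Flink, or Kafka Streams APIs. The names `run_benchmark`, `BenchmarkResult`, and `make_word_counter` are hypothetical, and the word counter is only a stand-in for a real topology. It assumes throughput (events per second) and per-event latency are the metrics of interest.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass
class BenchmarkResult:
    events: int
    elapsed_s: float
    throughput_eps: float  # events processed per second
    p50_latency_ms: float
    p99_latency_ms: float


def percentile(sorted_values: List[float], fraction: float) -> float:
    """Simple index-based percentile of a sorted, non-empty list."""
    index = min(len(sorted_values) - 1, int(fraction * len(sorted_values)))
    return sorted_values[index]


def run_benchmark(process: Callable[[str], object],
                  events: Iterable[str]) -> BenchmarkResult:
    """Feed every event through `process`, timing each call individually."""
    latencies_ms: List[float] = []
    start = time.perf_counter()
    for event in events:
        t0 = time.perf_counter()
        process(event)
        latencies_ms.append((time.perf_counter() - t0) * 1000.0)
    elapsed = time.perf_counter() - start

    if not latencies_ms:
        raise ValueError("the benchmark needs at least one event")

    latencies_ms.sort()
    count = len(latencies_ms)
    throughput = count / elapsed if elapsed > 0 else float("inf")
    return BenchmarkResult(
        events=count,
        elapsed_s=elapsed,
        throughput_eps=throughput,
        p50_latency_ms=percentile(latencies_ms, 0.50),
        p99_latency_ms=percentile(latencies_ms, 0.99),
    )


def make_word_counter() -> Tuple[Callable[[str], None], Dict[str, int]]:
    """Hypothetical stand-in for a topology: a stateful word count."""
    counts: Dict[str, int] = {}

    def process(line: str) -> None:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1

    return process, counts


if __name__ == "__main__":
    process, counts = make_word_counter()
    result = run_benchmark(process, ["to be or not to be"] * 10_000)
    print(result)
```

In a real comparison, the `process` callable would be replaced by a client that submits events to the deployed topology and observes its output, so that network and serialization costs are included in the measurements.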