PySpark code to build and evaluate predictive models on a dataset of movie reviews.
The task is to classify each review as positive (1) or negative (0) based on the text of the user's comment. The dataset contains 2,000 examples, 1,000 per class, and is stored in HDFS. The steps are feature engineering, model selection, and model evaluation. A data exploration step was also carried out but is not included here.