This repository contains a Jupyter Notebook with the code used in a paper on the automated classification of campaign negativity that I coauthored with political communication scholar Alessandro Nai, "Political Attacks in 280 Characters or Less: A New Tool for the Automated Classification of Campaign Negativity on Social Media" (Petkevic & Nai, 2021). Trained on a sample of manually annotated data, the code reliably classifies four dimensions of negativity (tone, personal attacks, policy attacks, and incivility) in tweets posted by candidates in the 2018 US Senate elections.
Negativity in election campaigns matters. To what extent can the content of social media posts provide a reliable indicator of candidates' campaign negativity? We introduce and critically assess an automated classification procedure that we trained to annotate more than 16,000 tweets of candidates competing in the 2018 Senate Midterms. The algorithm is able to identify the presence of political attacks (both in general, and specifically for character and policy attacks) and incivility. Due to the novel nature of the instrument, the article discusses the external and convergent validity of these measures. Results suggest that automated classifications are able to provide reliable measurements of campaign negativity. Triangulations with independent data show that our automatic classification is strongly associated with the experts’ perceptions of the candidates’ campaign. Furthermore, variations in our measures of negativity can be explained by theoretically relevant factors at the candidate and context levels (e.g., incumbency status and candidate gender); theoretically meaningful trends are also found when replicating the analysis using tweets for the 2020 Senate election, coded using the automated classifier developed for 2018. The implications of such results for the automated coding of campaign negativity in social media are discussed.
All of the code used in the paper is contained in the campaign_negativity.ipynb Jupyter Notebook. It covers a range of text preprocessing steps, word embedding with a pre-trained model (spaCy), model building with parameter fine-tuning, and application of the trained model to classify new data. A multilayer perceptron (MLP) neural network classifies the four dimensions of negativity concurrently, treating each dimension as a binary classification. The model performs reliably, with F1 scores on the validation dataset ranging from 0.75 (presence of policy attacks) to 0.94 (absence of incivility).
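To make the reported metric concrete, the sketch below shows one way per-class F1 scores (separately for the presence and the absence of each dimension) could be computed from binary multi-label predictions. It is an illustration only and not taken from the notebook: the dimension keys, the encoding (1 = present, 0 = absent), and the toy labels are all assumptions, and the numbers it prints are not results from the paper.

```python
from typing import Dict, List, Sequence

# Hypothetical keys for the four dimensions; the notebook may name them differently.
DIMENSIONS = ["tone", "personal_attack", "policy_attack", "incivility"]


def f1_for_class(y_true: Sequence[int], y_pred: Sequence[int], positive: int) -> float:
    """F1 score treating `positive` (0 or 1) as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    # Return 0.0 when the class never occurs and is never predicted.
    return 2 * tp / denom if denom else 0.0


def per_dimension_f1(
    y_true: List[List[int]], y_pred: List[List[int]]
) -> Dict[str, Dict[str, float]]:
    """Compute presence/absence F1 for every dimension (one column each)."""
    report = {}
    for j, name in enumerate(DIMENSIONS):
        true_col = [row[j] for row in y_true]
        pred_col = [row[j] for row in y_pred]
        report[name] = {
            "presence": f1_for_class(true_col, pred_col, positive=1),
            "absence": f1_for_class(true_col, pred_col, positive=0),
        }
    return report


# Toy annotations for four tweets (made up for illustration).
# Assumed encoding: 1 = dimension present, 0 = absent.
y_true = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [1, 0, 1, 1],
]
# Toy model output in the same layout.
y_pred = [
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 1],
]

report = per_dimension_f1(y_true, y_pred)
for name, scores in report.items():
    print(f"{name}: presence={scores['presence']:.2f}, absence={scores['absence']:.2f}")
```

Reporting both classes per dimension matters when a dimension is rare: a classifier can score well on the absence of a dimension while still missing many of the tweets in which it is present.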
Petkevic, V., & Nai, A. (2022). Political Attacks in 280 Characters or Less: A New Tool for the Automated Classification of Campaign Negativity on Social Media. American Politics Research, 50(3), 279-302.