Hadis-mrd/Twitter-Hate-Speech-Dataset

Twitter-Hate-Speech-Dataset

Important note: all user IDs are anonymized. No personal information is disclosed in the datasets published here.

In this repository, we release the hate speech dataset used in our paper "Hate speech on online social media through the lens of language and behavior: Analysis and prediction".

Our dataset consists of tweets belonging to two groups of users:

(1) a hate group, whose members posted content promoting anti-Asian hate speech during the COVID-19 crisis

(2) a control group, whose members either posted general ideas about COVID-19 and China/Asians or posted counter hate speech in support of Asians

We collect tweets posted between 15 Jan 2020 and 26 Mar 2021. We use the Twitter API to collect each user's timeline, i.e. their most recent 3,200 tweets. For analysis, we use a user's timeline posts up to their first hate tweet, and keep only users with at least 100 tweets. For control users, who have no first hate tweet, we instead sample a cutoff date from a normal distribution whose mean and variance match those of the first hate post dates in the Hate data.
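The cutoff procedure described above can be sketched roughly as follows. This is a minimal illustration, not the code used in the paper: the function names `sample_control_cutoffs` and `truncate_timeline`, the `(created_at, text)` tuple layout, and the seed are hypothetical. It also assumes that the cutoff is exclusive, that the 100-tweet minimum applies to the truncated timeline, and that the sample standard deviation is used; the description above does not settle any of these.

```python
import random
import statistics
from datetime import datetime, timezone


def sample_control_cutoffs(first_hate_dates, n_control, seed=0):
    """Draw one cutoff date per control user from a normal distribution
    fitted to the hate users' first-hate-tweet dates.

    Needs at least two dates, because the sample standard deviation is used.
    """
    rng = random.Random(seed)
    # Work on POSIX timestamps so mean and standard deviation are well defined.
    stamps = [d.timestamp() for d in first_hate_dates]
    mu = statistics.mean(stamps)
    sigma = statistics.stdev(stamps)
    return [
        datetime.fromtimestamp(rng.gauss(mu, sigma), tz=timezone.utc)
        for _ in range(n_control)
    ]


def truncate_timeline(tweets, cutoff, min_tweets=100):
    """Keep tweets posted before `cutoff` (assumed exclusive).

    `tweets` is a list of (created_at, text) tuples. Returns None when
    fewer than `min_tweets` remain, i.e. the user would be dropped.
    """
    kept = [t for t in tweets if t[0] < cutoff]
    return kept if len(kept) >= min_tweets else None
```

Sampled cutoffs are not clipped to the collection window here. A draw far out in the tail could fall outside 15 Jan 2020 to 26 Mar 2021, so a real pipeline may want to handle that case explicitly.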

The Hate and Control datasets are published on Kaggle.
