Twitter-Hate-Speech-Dataset

Important Note: All of the User IDs Are Anonymized. No Personal Info Is Disclosed in the Datasets Publicized Here.

In this repository, we release the Hate Speech Dataset used in our paper Hate speech on online social media through the lens of language and behavior: Analysis and prediction.

our dataset consists of tweets belonging to two groups of users:

(1) hate group who posted content promoting hate speech targeted towards Anti-Asian Hate during the COVID-19 Crisis

(2) control group who either posted general ideas about COVID-19 and China/Asians or posted counter hate speech to support Asians

We collect tweets between 15 Jan 2020 and 26 Mar 2021. We use Twitter API to collect their timelines i.e. most recent 3,200 tweets. We use the user’s timeline posts up to the first hate tweet for analysis (only users with at least 100 tweets). For control users, we sample such a date from a normal distribution having mean and variance of first hate posts of Hate data.

Hate and Control dataset are published on Kaggle.

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Twitter-Hate-Speech-Dataset

About

Uh oh!

Releases

Packages

Hadis-mrd/Twitter-Hate-Speech-Dataset

Folders and files

Latest commit

History

Repository files navigation

Twitter-Hate-Speech-Dataset

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Packages