Social media posts contain a large number of claims, and many of them spread misinformation or fake news. Identifying claims is therefore a crucial first step towards claim verification. Given the huge number of social media posts, this identification needs to be automated.
This competition deals with the task of 'Claim Span Identification': given a text, identify the parts (spans) of it that correspond to claims. This task is more challenging than the traditional binary classification of a text as claim or non-claim, and will require state-of-the-art methods in Pattern Recognition, Natural Language Processing and Machine Learning. See the Evaluation tab for details.
For this task, we will use a newly developed dataset containing about 8K posts in English and about 8K posts in Hindi with claim-spans marked by human annotators.
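To make the difference from binary classification concrete, the sketch below turns one annotated span into per-token labels. It is only an illustration under assumptions: the dataset's actual annotation format is not described here, so the character-offset spans, the whitespace tokenization, the BIO label names (`B-CLAIM`, `I-CLAIM`, `O`), the example sentence, and the helper `spans_to_bio` are all hypothetical.

```python
def spans_to_bio(text, spans):
    """Label whitespace tokens as B-CLAIM, I-CLAIM or O.

    `spans` is a list of (start, end) character offsets, end exclusive.
    This input format is an assumption made for illustration.
    """
    tokens, labels = [], []
    pos = 0
    for tok in text.split():
        start = text.index(tok, pos)  # locate this token in the original text
        end = start + len(tok)
        pos = end
        label = "O"
        for s, e in spans:
            if start < e and end > s:  # token overlaps the span
                # The token that covers the span's first character gets B-.
                label = "B-CLAIM" if start <= s else "I-CLAIM"
                break
        tokens.append(tok)
        labels.append(label)
    return tokens, labels


# Made-up example post with one claim span.
text = "I read that drinking coffee cures headaches. Not sure though."
claim = "drinking coffee cures headaches"
begin = text.index(claim)
spans = [(begin, begin + len(claim))]

tokens, labels = spans_to_bio(text, spans)
for tok, lab in zip(tokens, labels):
    print(f"{tok}\t{lab}")
```

A binary classifier would assign a single label to the whole post, whereas here every token receives its own label, so the model has to locate where the claim begins and ends.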
• Dataset Preparation: Used a dataset of about 8K English and about 8K Hindi posts annotated for claim spans, and prepared it as a token classification task for the model.
• Model Development: Fine-tuned BERT and Multilingual BERT models for claim-span identification. Implemented extended architectures, such as BERT with a CNN head, with the aim of improving performance.
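Preparing data for token classification with BERT-style models usually involves one more step: word-level labels have to be aligned with subword tokens. The sketch below shows one common convention (label only the first subword of each word and mask the rest), which may differ from what was actually done in this project. The `word_ids` list is written by hand to mimic what the `word_ids()` method of a Hugging Face fast tokenizer's output returns; the helper `align_labels` and the label ids are hypothetical.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's CrossEntropyLoss


def align_labels(word_ids, word_label_ids):
    """Map word-level label ids onto subword positions.

    word_ids[i] is the index of the word that subword i belongs to,
    or None for special tokens such as [CLS] and [SEP].
    """
    aligned = []
    previous = None
    for wid in word_ids:
        if wid is None:
            aligned.append(IGNORE_INDEX)         # special token: no loss
        elif wid != previous:
            aligned.append(word_label_ids[wid])  # first subword keeps the label
        else:
            aligned.append(IGNORE_INDEX)         # later subwords are masked
        previous = wid
    return aligned


# Hypothetical label ids: 0 = O, 1 = B-CLAIM, 2 = I-CLAIM.
# Three words, where the second word is split into two subwords.
word_ids = [None, 0, 1, 1, 2, None]
word_label_ids = [0, 1, 2]

aligned = align_labels(word_ids, word_label_ids)
print(aligned)  # [-100, 0, 1, -100, 2, -100]
```

Masked positions are skipped by the loss, so each word contributes exactly once during fine-tuning regardless of how many subwords it is split into.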