Rkarande1/Spam-Detection

Spam-Detection

This code prepares text data for a text classification task. I first use Scikit-Learn's train_test_split function to split the dataset into training and testing sets, allocating 85% of the data for training and 15% for testing. I then use a TF-IDF vectorizer to convert the text into numerical features that a machine learning model can consume. The vectorizer is fit on the training data, which learns the vocabulary and the inverse document frequency (IDF) weights, and both the training and testing data are then transformed into TF-IDF representations with that fitted vectorizer. The results are sparse matrices, which store only the non-zero entries and are efficient when most values are zero. For model training and evaluation, I convert the training and testing TF-IDF matrices into dense NumPy arrays. These preprocessing steps put the data in a suitable format for machine learning and set the stage for a text classification model that predicts whether news articles are true or false based on their content.
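
The steps described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the toy `texts` and `labels` lists, the variable names, and `random_state=42` are assumptions made for the example, and the vectorizer is used with its default settings.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

# Hypothetical toy corpus standing in for the real dataset (not from the repository).
texts = [
    "government announces new budget plan",
    "scientists publish study on climate data",
    "city council approves road repairs",
    "local team wins regional championship",
    "central bank keeps interest rates steady",
    "university opens new research lab",
    "court rules on long running dispute",
    "hospital expands emergency department",
    "airline adds routes for summer season",
    "museum unveils restored painting",
    "miracle pill cures every disease overnight",
    "aliens secretly control the stock market",
    "celebrity found living on the moon",
    "drinking seawater makes you immortal",
    "secret law bans all birthdays next year",
    "scientists admit gravity is a hoax",
    "ancient robot discovered running a bakery",
    "town replaces mayor with a talking cat",
    "chocolate declared a vegetable by decree",
    "time traveler warns about missing tuesday",
]
labels = [1] * 10 + [0] * 10  # assumed encoding: 1 = true, 0 = false

# 85% of the samples go to training, 15% to testing.
X_train_text, X_test_text, y_train, y_test = train_test_split(
    texts, labels, test_size=0.15, random_state=42
)

# Fit on the training text only, then reuse the fitted vectorizer for the test text.
vectorizer = TfidfVectorizer()
X_train_sparse = vectorizer.fit_transform(X_train_text)
X_test_sparse = vectorizer.transform(X_test_text)

# Convert the SciPy sparse matrices to dense NumPy arrays.
X_train = X_train_sparse.toarray()
X_test = X_test_sparse.toarray()

print(X_train.shape, X_test.shape)
```

Both arrays have one column per vocabulary term learned from the training set, so words that appear only in the test set are ignored. Dense arrays are convenient for small datasets like this one, but for a large vocabulary they can use far more memory than the sparse matrices they replace.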
