Optimizing the Binary Classification Problem: Fake News Classification

This project studies fake news detection as a binary classification problem, with a primary focus on optimization behavior rather than model architecture. Logistic regression with L2 regularization is used as a convex learning objective, enabling a controlled comparison of multiple numerical optimization algorithms applied to the same loss function. Beyond classification accuracy, the project analyzes the convergence speed, numerical stability, runtime, and memory efficiency of the different optimization methods.

*This is a course project for CMSE 831 Computational Optimization at Michigan State University.*

Table of Contents

  1. Dataset, Preprocessing and Feature Representation
  2. Problem Setup
  3. Implementation and Results

Dataset, Preprocessing and Feature Representation

The WELFake dataset combines political news articles from the Kaggle, McIntire, Reuters, and BuzzFeed sources. Articles are cleaned to remove HTML artifacts, excessive whitespace, and duplicates, and very short articles are dropped to ensure meaningful textual content. The data is split into 80–10–10 train/validation/test sets with stratification.
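
A minimal sketch of that stratified 80–10–10 split, assuming a pandas DataFrame with hypothetical `text` and `label` columns and scikit-learn's `train_test_split` (the notebook's exact code may differ):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

def split_80_10_10(df: pd.DataFrame, seed: int = 42):
    """Stratified 80/10/10 train/validation/test split on the label column."""
    # Hold out 20% of the data, preserving the class balance.
    train_df, holdout_df = train_test_split(
        df, test_size=0.20, stratify=df["label"], random_state=seed
    )
    # Split the held-out 20% evenly into validation (10%) and test (10%).
    val_df, test_df = train_test_split(
        holdout_df, test_size=0.50, stratify=holdout_df["label"], random_state=seed
    )
    return train_df, val_df, test_df
```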

TF-IDF vectorization is applied with 5,000 features, unigrams and bigrams, stop-word removal, and sublinear term-frequency scaling. This produces a high-dimensional sparse feature space, well suited for studying optimizer behavior.
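
These settings map directly onto scikit-learn's `TfidfVectorizer`; a sketch, continuing from the split above with the same hypothetical column names (the notebook's exact arguments may differ):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

vectorizer = TfidfVectorizer(
    max_features=5000,     # cap the vocabulary at 5,000 features
    ngram_range=(1, 2),    # unigrams and bigrams
    stop_words="english",  # stop-word removal
    sublinear_tf=True,     # sublinear term-frequency scaling: 1 + log(tf)
)

# Fit on the training split only, then transform all splits to sparse matrices.
X_train = vectorizer.fit_transform(train_df["text"])
X_val = vectorizer.transform(val_df["text"])
X_test = vectorizer.transform(test_df["text"])
```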

Problem Setup

Minimize the regularized logistic regression loss:

  • Objective: negative log-likelihood + L2 regularization (written out below)
  • Strongly convex thanks to the L2 term, guaranteeing a unique global optimum
  • Optimization variables: a 5,000-dimensional weight vector and a bias term
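
Written out, a standard form of this objective (with labels $y_i \in \{-1, +1\}$ and a regularization strength $\lambda > 0$ whose exact value is set in the notebook) is

```math
\min_{\mathbf{w} \in \mathbb{R}^{5000},\; b \in \mathbb{R}} \;
\frac{1}{n} \sum_{i=1}^{n} \log\!\bigl(1 + e^{-y_i(\mathbf{w}^{\top}\mathbf{x}_i + b)}\bigr)
\;+\; \frac{\lambda}{2}\,\lVert \mathbf{w} \rVert_2^{2}
```

where the quadratic penalty provides the strong convexity and unique minimizer noted above.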

All methods use approximate line search and identical stopping criteria (a minimal sketch of this shared loop follows the list of methods):

  • Gradient Descent (GD)
  • Non-linear Conjugate Gradient (Polak–Ribière)
  • Non-linear Conjugate Gradient (Fletcher–Reeves)
  • Newton’s Method
  • Quasi-Newton (L-BFGS)
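
To make that shared structure concrete, here is a minimal sketch (not the notebook's implementation) of gradient descent with Armijo backtracking line search and a gradient-norm stopping rule on the objective above, with the bias folded into the weight vector for brevity:

```python
import numpy as np

def loss_and_grad(w, X, y, lam):
    """L2-regularized logistic loss and its gradient; labels y in {-1, +1}."""
    z = y * (X @ w)                                   # margins y_i * x_i^T w
    loss = np.mean(np.logaddexp(0.0, -z)) + 0.5 * lam * (w @ w)
    grad = -(X.T @ (y / (1.0 + np.exp(z)))) / len(y) + lam * w
    return loss, grad

def gradient_descent(X, y, lam=1e-4, tol=1e-6, max_iter=5000):
    w = np.zeros(X.shape[1])
    for _ in range(max_iter):
        f, g = loss_and_grad(w, X, y, lam)
        if np.linalg.norm(g) < tol:                   # shared stopping criterion
            break
        # Armijo backtracking line search along the steepest-descent direction.
        t, c = 1.0, 1e-4
        while loss_and_grad(w - t * g, X, y, lam)[0] > f - c * t * (g @ g):
            t *= 0.5
        w = w - t * g
    return w
```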

Implementation and Results

All optimizers converge to essentially the same solution, achieving ~90% test accuracy and ~0.969 AUC.

Key observations:

  • Gradient Descent converges reliably but slowly.
  • CG (Fletcher–Reeves) improves substantially over both GD and CG (Polak–Ribière).
  • Newton’s Method converges in only a few iterations but is computationally expensive.
  • L-BFGS offers the best overall trade-off, reaching near-Newton precision with far lower runtime and memory usage (a SciPy-based cross-check sketch follows this list).
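
For reference, the same objective can be handed to SciPy's L-BFGS implementation as a cross-check. This sketch reuses the `loss_and_grad` helper from the earlier snippet; `X_train` and `y_train` denote the TF-IDF feature matrix and ±1 label vector for the training split, and the regularization strength is illustrative:

```python
import numpy as np
from scipy.optimize import minimize

# jac=True tells SciPy that the objective returns (loss, gradient) together.
result = minimize(
    fun=loss_and_grad,
    x0=np.zeros(X_train.shape[1]),
    args=(X_train, y_train, 1e-4),
    jac=True,
    method="L-BFGS-B",
    options={"maxiter": 500, "gtol": 1e-6},
)
w_lbfgs = result.x
```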

The notebook includes loss vs. iteration plots, gradient norm vs. iteration plots (log scale), and 2D projections of the optimization paths around the optimum. These plots visually confirm the theoretical convergence properties and the efficiency differences between first-order, second-order, and quasi-Newton methods.

The project demonstrates that while many optimization algorithms can successfully solve convex machine-learning objectives, their practical efficiency differs dramatically. For high-dimensional logistic regression on sparse text data, L-BFGS emerges as the most effective optimizer, balancing speed, accuracy, and memory usage.
